Class Xhtml5BaseParser

All Implemented Interfaces:
HtmlMarkup, Markup, XmlMarkup, Parser
Direct Known Subclasses:
FmlContentParser, Xhtml1BaseParser, Xhtml5Parser

public class Xhtml5BaseParser extends AbstractXmlParser implements HtmlMarkup
Common base parser for xhtml5 events.
  • Field Details

    • LOGGER

      private static final org.slf4j.Logger LOGGER
    • BODYTABLEBORDER_CLASS_PATTERN

      private static final Pattern BODYTABLEBORDER_CLASS_PATTERN
      Used to identify if a class string contains `bodyTableBorder`
    • UNMATCHED_XHTML5_ELEMENTS

      private static final Set<String> UNMATCHED_XHTML5_ELEMENTS
    • UNMATCHED_XHTML5_SIMPLE_ELEMENTS

      private static final Set<String> UNMATCHED_XHTML5_SIMPLE_ELEMENTS
    • scriptBlock

      private boolean scriptBlock
      True if a <script></script> or <style></style> block is read. CDATA sections within are handled as rawText.
    • isAnchor

      private boolean isAnchor
      Used to distinguish <a href=""> from <a name="">.
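A flag of this kind is needed because both kinds of `<a>` element share one closing tag, so the end-tag handler must remember which event was opened. The following is a minimal sketch of that idea, not the class's actual code: `AnchorOrLinkSketch`, its string "events", and the rule that `href` wins over `name` are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical model: remembers whether the open <a> was an anchor or a link. */
class AnchorOrLinkSketch {
    private boolean isAnchor;
    final List<String> events = new ArrayList<>();

    /** Start of an <a> element; assumes it carries either href or name. */
    void aStart(Map<String, String> attribs) {
        String href = attribs.get("href");
        String name = attribs.get("name");
        if (href != null) {
            events.add("link(" + href + ")");
            isAnchor = false;
        } else if (name != null) {
            events.add("anchor(" + name + ")");
            isAnchor = true;
        } else {
            throw new IllegalArgumentException("sketch expects href or name");
        }
    }

    /** End of an <a> element: the flag decides which closing event to emit. */
    void aEnd() {
        if (isAnchor) {
            events.add("anchor_");
            isAnchor = false;
        } else {
            events.add("link_");
        }
    }
}
```

Without the flag, the end-tag handler could not tell whether to close a link or an anchor, since `</a>` carries no attributes.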
    • orderedListDepth

      private int orderedListDepth
      Used for nested lists.
    • sectionLevel

      private int sectionLevel
      Counts the nesting level of the sections set manually in the HTML document.
    • headingLevel

      private int headingLevel
      Counts the current heading level. This is either the sectionLevel, if no artificial sections are currently open for headings, or a number higher or lower than sectionLevel (for all sections currently opened/closed for a preceding heading). The heading level only changes when a new heading starts, or a section starts or ends.
    • inVerbatim

      private boolean inVerbatim
      Verbatim flag, true whenever we are inside a <pre> tag.
    • divStack

      private Stack<String> divStack
      Used to keep track of closing tags for content events.
    • hasDefinitionListItem

      boolean hasDefinitionListItem
      Used to wrap the definedTerm with its definition, even when one is omitted
    • capturedSinkEventNames

      private LinkedList<String> capturedSinkEventNames
  • Constructor Details

    • Xhtml5BaseParser

      public Xhtml5BaseParser()
  • Method Details

    • parse

      public void parse(Reader source, Sink sink, String reference) throws ParseException
      Parses the given source model and emits Doxia events into the given sink.
      Specified by:
      parse in interface Parser
      Overrides:
      parse in class AbstractXmlParser
      Parameters:
      source - not null reader that provides the source document. You could use newReader methods from ReaderFactory.
      sink - A sink that consumes the Doxia events.
      reference - a string identifying the source (for file based documents the source file path)
      Throws:
      ParseException - if the model could not be parsed.
    • initXmlParser

      protected void initXmlParser(org.codehaus.plexus.util.xml.pull.XmlPullParser parser) throws org.codehaus.plexus.util.xml.pull.XmlPullParserException
      Initializes the parser with custom entities or other options. Adds all XHTML (HTML 5.2) entities to the parser so that they can be recognized and resolved without additional DTD.
      Overrides:
      initXmlParser in class AbstractXmlParser
      Parameters:
      parser - A parser, not null.
      Throws:
      org.codehaus.plexus.util.xml.pull.XmlPullParserException - if there's a problem initializing the parser
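Registering entities up front means a reference such as `&nbsp;` can be resolved from a lookup table instead of a DTD. The sketch below models only that idea with plain JDK classes. `EntityTableSketch`, `define` and `resolve` are hypothetical names and are not part of the parser API described here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical model of a pre-registered entity table (no DTD involved). */
class EntityTableSketch {
    private static final Pattern ENTITY = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");
    private final Map<String, String> replacements = new HashMap<>();

    /** Registers the replacement text for a named entity. */
    void define(String name, String replacement) {
        replacements.put(name, replacement);
    }

    /** Replaces known entity references; unknown ones are left untouched. */
    String resolve(String text) {
        Matcher m = ENTITY.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = replacements.get(m.group(1));
            String result = replacement != null ? replacement : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(result));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, after `define("nbsp", "\u00A0")`, a call to `resolve("a&nbsp;b")` yields the two letters separated by a non-breaking space, while an undefined reference passes through unchanged.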
    • baseStartTag

      protected boolean baseStartTag(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, Sink sink)

      Goes through a common list of possible html5 start tags. These include only tags that can go into the body of an xhtml5 document and so should be re-usable by different xhtml-based parsers.

      The currently handled tags are:

      <article>, <nav>, <aside>, <section>, <h1>, <h2>, <h3>, <h4>, <h5>, <header>, <main>, <footer>, <em>, <strong>, <small>, <s>, <cite>, <q>, <dfn>, <abbr>, <i>, <b>, <code>, <samp>, <kbd>, <sub>, <sup>, <u>, <mark>, <ruby>, <rb>, <rt>, <rtc>, <rp>, <bdi>, <bdo>, <span>, <ins>, <del>, <p>, <pre>, <ul>, <ol>, <li>, <dl>, <dt>, <dd>, <a>, <table>, <tr>, <th>, <td>, <caption>, <br/>, <wbr/>, <hr/>, <img/>.

      Parameters:
      parser - A parser.
      sink - the sink to receive the events.
      Returns:
      True if the event has been handled by this method, i.e. the tag was recognized, false otherwise.
    • baseStartTag

      protected boolean baseStartTag(String elementName, SinkEventAttributeSet attribs, Sink sink)
    • baseEndTag

      protected boolean baseEndTag(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, Sink sink)

      Goes through a common list of possible html end tags. These should be re-usable by different xhtml-based parsers. The tags handled here are the same as for baseStartTag(XmlPullParser,Sink), except for the empty elements (<br/>, <hr/>, <img/>).

      Parameters:
      parser - A parser.
      sink - the sink to receive the events.
      Returns:
      True if the event has been handled by this method, false otherwise.
    • baseEndTag

      protected boolean baseEndTag(String elementName, SinkEventAttributeSet attribs, Sink sink)
    • handleStartTag

      protected void handleStartTag(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, Sink sink) throws org.codehaus.plexus.util.xml.pull.XmlPullParserException, MacroExecutionException
      Goes through the possible start tags. By default this just calls baseStartTag(XmlPullParser,Sink); implementing parsers should override it to include additional tags.
      Specified by:
      handleStartTag in class AbstractXmlParser
      Parameters:
      parser - A parser, not null.
      sink - the sink to receive the events.
      Throws:
      org.codehaus.plexus.util.xml.pull.XmlPullParserException - if there's a problem parsing the model
      MacroExecutionException - if there's a problem executing a macro
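The boolean return value of baseStartTag is what makes the override pattern work: a subclass tries the shared tags first and handles only the ones that were not recognized. The sketch below is a simplified, self-contained model of that pattern. The class names, the `List<String>` used in place of a Sink, and the three sample tags are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the base parser: recognizes a few body tags. */
class BaseTagHandlerSketch {
    /** Returns true if the tag was recognized and an event was recorded. */
    boolean baseStartTag(String elementName, List<String> events) {
        switch (elementName) {
            case "p":
                events.add("paragraph");
                return true;
            case "em":
                events.add("inline(emphasis)");
                return true;
            case "br":
                events.add("lineBreak");
                return true;
            default:
                return false;
        }
    }

    /** Default behaviour: delegate to the shared tag list and nothing else. */
    void handleStartTag(String elementName, List<String> events) {
        baseStartTag(elementName, events);
    }
}

/** Hypothetical concrete parser that adds one tag of its own. */
class ExtendedTagHandlerSketch extends BaseTagHandlerSketch {
    @Override
    void handleStartTag(String elementName, List<String> events) {
        if (baseStartTag(elementName, events)) {
            return; // already handled by the shared list
        }
        if ("title".equals(elementName)) {
            events.add("title");
        } else {
            events.add("unknown(" + elementName + ")");
        }
    }
}
```

The same shape applies to the end-tag side: baseEndTag reports whether it handled the tag, and handleEndTag in a subclass deals with the rest.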
    • handleEndTag

      protected void handleEndTag(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, Sink sink) throws org.codehaus.plexus.util.xml.pull.XmlPullParserException, MacroExecutionException
      Goes through the possible end tags. By default this just calls baseEndTag(XmlPullParser,Sink); implementing parsers should override it to include additional tags.
      Specified by:
      handleEndTag in class AbstractXmlParser
      Parameters:
      parser - A parser, not null.
      sink - the sink to receive the events.
      Throws:
      org.codehaus.plexus.util.xml.pull.XmlPullParserException - if there's a problem parsing the model
      MacroExecutionException - if there's a problem executing a macro
    • handleText

      protected void handleText(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, Sink sink) throws org.codehaus.plexus.util.xml.pull.XmlPullParserException
      Handles text events.

      This is a default implementation: if the parser points to a non-empty text element, it is emitted as a text event into the specified sink.

      Overrides:
      handleText in class AbstractXmlParser
      Parameters:
      parser - A parser, not null.
      sink - the sink to receive the events. Not null.
      Throws:
      org.codehaus.plexus.util.xml.pull.XmlPullParserException - if there's a problem parsing the model
    • handleComment

      protected void handleComment(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, Sink sink) throws org.codehaus.plexus.util.xml.pull.XmlPullParserException
      Handles comments.

      This is a default implementation: all data are emitted as comment events into the specified sink.

      Overrides:
      handleComment in class AbstractXmlParser
      Parameters:
      parser - A parser, not null.
      sink - the sink to receive the events. Not null.
      Throws:
      org.codehaus.plexus.util.xml.pull.XmlPullParserException - if there's a problem parsing the model
    • handleCdsect

      protected void handleCdsect(org.codehaus.plexus.util.xml.pull.XmlPullParser parser, Sink sink) throws org.codehaus.plexus.util.xml.pull.XmlPullParserException
      Handles CDATA sections.

      This is a default implementation: all data are emitted as text events into the specified sink.

      Overrides:
      handleCdsect in class AbstractXmlParser
      Parameters:
      parser - A parser, not null.
      sink - the sink to receive the events. Not null.
      Throws:
      org.codehaus.plexus.util.xml.pull.XmlPullParserException - if there's a problem parsing the model
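The scriptBlock field description notes that CDATA sections inside a `<script>` or `<style>` block are handled as rawText rather than as ordinary text. A minimal sketch of that routing, assuming a hypothetical `CdataRoutingSketch` class that records events as strings instead of calling a Sink:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model: routes CDATA to raw text inside script/style blocks. */
class CdataRoutingSketch {
    private boolean scriptBlock;
    final List<String> events = new ArrayList<>();

    private static boolean isScriptOrStyle(String name) {
        return "script".equals(name) || "style".equals(name);
    }

    void startTag(String name) {
        if (isScriptOrStyle(name)) {
            scriptBlock = true;
        }
    }

    void endTag(String name) {
        if (isScriptOrStyle(name)) {
            scriptBlock = false;
        }
    }

    /** CDATA content is passed through unescaped only inside a script block. */
    void cdata(String content) {
        events.add((scriptBlock ? "rawText:" : "text:") + content);
    }
}
```

Raw text matters here because script and style content must reach the output without the escaping that a normal text event would receive.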
    • consecutiveSections

      @Deprecated protected void consecutiveSections(int newLevel, Sink sink, SinkEventAttributeSet attribs)
      Deprecated.
      Shortcut for emitHeadingSections(int, Sink, boolean) with the last argument being true.
      Parameters:
      newLevel -
      sink -
      attribs -
    • emitHeadingSections

      protected void emitHeadingSections(int newLevel, Sink sink, boolean enforceNewSection)
      Makes sure sections are nested consecutively and correctly inserted for the given heading level.

      HTML5 heading tags H1 to H5 imply same-level sections in the Sink API (compare with Sink.sectionTitle(int, SinkEventAttributes)). However, (X)HTML5 allows headings without explicit surrounding section elements and is also less strict about non-consecutive heading levels. This method both closes open sections which have been added for previous headings and/or opens sections necessary for the new heading level. At least one section needs to be opened directly prior to the heading due to Sink API restrictions.

      For instance, if the following sequence is parsed:

       <h2></h2>
       <h5></h5>
       

      we have to insert two section starts before we open the <h5>. In the following sequence

       <h5></h5>
       <h2></h2>
       

      we have to close two sections before we open the <h2>.

      The current heading level is set to newLevel afterwards.

      Parameters:
      newLevel - the new section level, all upper levels have to be closed.
      sink - the sink to receive the events.
      enforceNewSection - whether to enforce a new section or not
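The close-then-open logic can be illustrated with a deliberately simplified model. It is a sketch under stated assumptions, not the implementation of this method: `HeadingSectionSketch` is a hypothetical class, it ignores explicit `<section>` elements and the enforceNewSection flag, and it always starts a fresh section for each heading. The event strings only mirror the names of the Sink methods section, section_ and sectionTitle.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical model of heading-driven section nesting. */
class HeadingSectionSketch {
    private int headingLevel = 0;
    final List<String> events = new ArrayList<>();

    /** Records the section events needed before a heading of the given level. */
    void heading(int newLevel) {
        // Close sections at the new level or deeper, so the heading gets its own section.
        while (headingLevel >= newLevel) {
            events.add("section_(" + headingLevel + ")");
            headingLevel--;
        }
        // Open every missing level so that nesting stays consecutive.
        while (headingLevel < newLevel) {
            headingLevel++;
            events.add("section(" + headingLevel + ")");
        }
        events.add("sectionTitle(" + newLevel + ")");
    }

    /** Closes whatever is still open at the end of the document. */
    void end() {
        while (headingLevel > 0) {
            events.add("section_(" + headingLevel + ")");
            headingLevel--;
        }
    }

    public static void main(String[] args) {
        HeadingSectionSketch sketch = new HeadingSectionSketch();
        sketch.heading(1);
        sketch.heading(3); // skips level 2, so an extra section is opened
        sketch.heading(2); // moves back up, so deeper sections are closed first
        sketch.end();
        sketch.events.forEach(System.out::println);
    }
}
```

In this model a jump from level 1 to level 3 opens sections 2 and 3 before the title, and a step back to level 2 closes sections 3 and 2 before opening a new section 2.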
    • isLastEventSectionStart

      private boolean isLastEventSectionStart()
    • closeOpenHeadingSections

      private void closeOpenHeadingSections(int newLevel, Sink sink)
      Close open heading sections.
      Parameters:
      newLevel - the new section level, all upper levels have to be closed.
      sink - the sink to receive the events.
    • openMissingHeadingSections

      private void openMissingHeadingSections(int newLevel, Sink sink)
      Open missing heading sections.
      Parameters:
      newLevel - the new section level, all lower levels have to be opened.
      sink - the sink to receive the events.
    • getSectionLevel

      protected int getSectionLevel()
      Return the current section level.
      Returns:
      the current section level.
    • setSectionLevel

      protected void setSectionLevel(int newLevel)
      Set the current section level.
      Parameters:
      newLevel - the new section level.
    • verbatim_

      protected void verbatim_()
      Stop verbatim mode.
    • verbatim

      protected void verbatim()
      Start verbatim mode.
    • isVerbatim

      protected boolean isVerbatim()
      Checks if we are currently inside a <pre> tag.
      Returns:
      true if we are currently in verbatim mode.
    • isScriptBlock

      protected boolean isScriptBlock()
      Checks if we are currently inside a <script> tag.
      Returns:
      true if we are currently inside <script> tags.
      Since:
      1.1.1.
    • validAnchor

      protected String validAnchor(String id)
      Checks if the given id is a valid Doxia id and if not, returns a transformed one.
      Parameters:
      id - The id to validate.
      Returns:
      A transformed id or the original id if it was already valid.
    • init

      protected void init()
      Initialize the parser. This is called first by AbstractParser.parse(java.io.Reader, org.apache.maven.doxia.sink.Sink) and can be used to set the parser into a clear state so it can be re-used.
      Overrides:
      init in class AbstractParser
    • handleAEnd

      private void handleAEnd(Sink sink)
    • handleAStart

      private void handleAStart(Sink sink, SinkEventAttributeSet attribs)
    • handleDivStart

      private boolean handleDivStart(SinkEventAttributeSet attribs, Sink sink)
    • handleDivEnd

      private boolean handleDivEnd(Sink sink)
    • handleImgStart

      private void handleImgStart(Sink sink, SinkEventAttributeSet attribs)
    • handleLIStart

      private void handleLIStart(Sink sink, SinkEventAttributeSet attribs)
    • handleListItemEnd

      private void handleListItemEnd(Sink sink)
    • handleOLStart

      private void handleOLStart(Sink sink, SinkEventAttributeSet attribs)
    • handlePStart

      private void handlePStart(Sink sink, SinkEventAttributeSet attribs)
    • handlePreStart

      private void handlePreStart(SinkEventAttributeSet attribs, Sink sink)
    • handleSectionStart

      private void handleSectionStart(Sink sink, SinkEventAttributeSet attribs)
    • handleHeadingStart

      private void handleHeadingStart(Sink sink, int level, SinkEventAttributeSet attribs)
    • handleSectionEnd

      private void handleSectionEnd(Sink sink)
    • handleTableStart

      private void handleTableStart(Sink sink, SinkEventAttributeSet attribs)