The RULES Tag

RULES tags must be placed inside the MODE tag. Each RULES tag defines a ruleset. A ruleset consists of a number of parser rules, each of which specifies how to highlight a specific syntax token. Every edit mode must contain at least one ruleset. A mode can also contain several, with different rulesets being used to highlight different parts of a buffer (for example, in HTML mode, one ruleset highlights HTML tags, and another highlights inline JavaScript). For information about using more than one ruleset, see the section called "The SPAN Rule".

The RULES tag supports several attributes, all of which are optional. Here is an example RULES tag that sets two of them, IGNORE_CASE and HIGHLIGHT_DIGITS:

<RULES IGNORE_CASE="FALSE" HIGHLIGHT_DIGITS="TRUE">
    ... parser rules go here ...
</RULES>

Rule Ordering Requirements

Incorrect rule ordering is a very common pitfall when writing your own modes.

jEdit checks buffer text against parser rules in the order they appear in the ruleset, so more specific rules must be placed before more general ones; otherwise, the general rules will match first and the specific rules will never be applied.

This is best demonstrated with an example. The following is incorrect rule ordering:

<SPAN TYPE="MARKUP">
    <BEGIN>[</BEGIN>
    <END>]</END>
</SPAN>

<SPAN TYPE="KEYWORD1">
    <BEGIN>[!</BEGIN>
    <END>]</END>
</SPAN>

If you write the above in a ruleset, any occurrence of "[" (even in text such as "[!DEFINE") will be highlighted using the first rule, because it will be the first to match. This is most likely not the intended behavior.

The problem can be solved by placing the more specific rule before the general one:

<SPAN TYPE="KEYWORD1">
    <BEGIN>[!</BEGIN>
    <END>]</END>
</SPAN>

<SPAN TYPE="MARKUP">
    <BEGIN>[</BEGIN>
    <END>]</END>
</SPAN>

Now, if the buffer contains the text "[!SPECIAL]", the rules will be checked in order, and the first rule (KEYWORD1) will match. If the buffer contains "[FOO]", the first rule will not match, so the text will be highlighted using the second rule (MARKUP), which is exactly what you would expect.

Per-Ruleset Properties

The PROPS tag (described in the section called "The PROPS Tag") can also be placed inside the RULES tag to define ruleset-specific properties. Only the following properties can be set on a per-ruleset basis:

  • commentEnd - the comment end string.

  • commentStart - the comment start string.

  • lineComment - the line comment string.

These properties are used by the commenting commands to implement context-sensitive comments; see the section called "Commenting Out Code".
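
As a rough sketch, a ruleset for inline JavaScript inside another mode might declare its own comment strings as shown below. The PROPERTY child elements with NAME and VALUE attributes are an assumption based on the PROPS syntax covered in the section called "The PROPS Tag"; they are not shown in this section, so check that section before relying on them.

```xml
<RULES>
    <PROPS>
        <!-- Assumed PROPERTY syntax; see the section called "The PROPS Tag" -->
        <PROPERTY NAME="commentStart" VALUE="/*" />
        <PROPERTY NAME="commentEnd" VALUE="*/" />
        <PROPERTY NAME="lineComment" VALUE="//" />
    </PROPS>
    ... parser rules go here ...
</RULES>
```

With something like this in place, the commenting commands would be expected to use "//" and "/* */" while the caret is inside text highlighted by this ruleset.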

The TERMINATE Rule

The TERMINATE rule specifies that parsing should stop after the specified number of characters have been read from a line. The number of characters to terminate after should be specified with the AT_CHAR attribute. Here is an example:

<TERMINATE AT_CHAR="1" />

This rule is used in Patch mode, for example, because only the first character of each line affects highlighting.
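
The following is a minimal sketch of how TERMINATE might be combined with other rules in a patch-like ruleset. It is an illustration only, not a copy of the actual Patch mode; the choice of token types is arbitrary.

```xml
<RULES>
    <!-- Only the first character of each line needs to be examined -->
    <TERMINATE AT_CHAR="1" />

    <!-- Added lines start with "+", removed lines with "-" -->
    <EOL_SPAN TYPE="LITERAL1" AT_LINE_START="TRUE">+</EOL_SPAN>
    <EOL_SPAN TYPE="LITERAL2" AT_LINE_START="TRUE">-</EOL_SPAN>
</RULES>
```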

The WHITESPACE Rule

The WHITESPACE rule specifies characters which are to be treated as keyword delimiters. Most rulesets will have WHITESPACE tags for spaces and tabs. Here is an example (the first tag contains a single space, the second a tab character):

<WHITESPACE> </WHITESPACE>
<WHITESPACE>        </WHITESPACE>

The SPAN Rule

The SPAN rule highlights text between a start and end string. The start and end strings are specified inside child elements of the SPAN tag. The following attributes are supported:

  • TYPE - The token type to highlight the span with. See the section called "Token Types" for a list of token types

  • AT_LINE_START - If set to TRUE, the span will only be highlighted if the start sequence occurs at the beginning of a line

  • EXCLUDE_MATCH - If set to TRUE, the start and end sequences will not be highlighted, only the text between them will

  • NO_LINE_BREAK - If set to TRUE, the span will be highlighted with the INVALID token type if it spans more than one line

  • NO_WORD_BREAK - If set to TRUE, the span will be highlighted with the INVALID token type if it includes whitespace

  • DELEGATE - text inside the span will be highlighted with the specified ruleset. To delegate to a ruleset defined in the current mode, just specify its name. To delegate to a ruleset defined in another mode, specify a name of the form mode::ruleset. Note that the first (unnamed) ruleset in a mode is called "MAIN".

Note

Do not delegate to rulesets that define a TERMINATE rule (examples of such rulesets include text::MAIN and patch::MAIN). It won't work.

Here is a SPAN that highlights Java string literals, which cannot include line breaks:

<SPAN TYPE="LITERAL1" NO_LINE_BREAK="TRUE">
   <BEGIN>"</BEGIN>
   <END>"</END>
</SPAN>

Here is a SPAN that highlights Java documentation comments by delegating to the "JAVADOC" ruleset defined elsewhere in the current mode:

<SPAN TYPE="COMMENT2" DELEGATE="JAVADOC">
   <BEGIN>/**</BEGIN>
   <END>*/</END>
</SPAN>
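
For the delegation above to work, the mode must also contain a ruleset named "JAVADOC". A hypothetical definition could look like the sketch below. The SET attribute used to name the ruleset is an assumption (the attribute list for the RULES tag is not reproduced in this section), and the single rule inside is only an illustration.

```xml
<!-- SET is assumed to be the attribute that names a ruleset -->
<RULES SET="JAVADOC">
    <WHITESPACE> </WHITESPACE>

    <!-- Highlight tags such as "@param" and "@return" -->
    <MARK_FOLLOWING TYPE="LABEL">@</MARK_FOLLOWING>
</RULES>
```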

Here is a SPAN that highlights HTML cascading stylesheets inside <STYLE> tags by delegating to the main ruleset in the CSS edit mode:

<SPAN TYPE="MARKUP" DELEGATE="css::MAIN">
   <BEGIN>&lt;style&gt;</BEGIN>
   <END>&lt;/style&gt;</END>
</SPAN>

Tip

The <END> tag is optional. If it is not specified, any occurrence of the start string will cause the remainder of the buffer to be highlighted with this rule.

This can be very useful when combined with delegation.
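
As an illustration, the first sketch below treats everything after a line beginning with "__END__" as a comment. The second delegates the remainder of the buffer to a ruleset called "TRAILER", which is a hypothetical name and would have to be defined elsewhere in the same mode.

```xml
<!-- No <END>: once "__END__" is seen, the rest of the buffer is COMMENT1 -->
<SPAN TYPE="COMMENT1" AT_LINE_START="TRUE">
    <BEGIN>__END__</BEGIN>
</SPAN>

<!-- No <END>, with delegation: the rest of the buffer uses the
     hypothetical "TRAILER" ruleset -->
<SPAN TYPE="MARKUP" AT_LINE_START="TRUE" DELEGATE="TRAILER">
    <BEGIN>__DATA__</BEGIN>
</SPAN>
```

In a real ruleset you would normally pick one of the two approaches for a given start string.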

The EOL_SPAN Rule

An EOL_SPAN is similar to a SPAN except that highlighting stops at the end of the line, not after the end sequence is found. The text to match is specified between the opening and closing EOL_SPAN tags. The following attributes are supported:

  • TYPE - The token type to highlight the span with. See the section called "Token Types" for a list of token types

  • AT_LINE_START - If set to TRUE, the span will only be highlighted if the start sequence occurs at the beginning of a line

  • EXCLUDE_MATCH - If set to TRUE, the start sequence will not be highlighted, only the text after it will

Here is an EOL_SPAN that highlights C++ comments:

<EOL_SPAN TYPE="COMMENT1">//</EOL_SPAN>
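
The AT_LINE_START attribute is useful for constructs that are only meaningful at the start of a line. As a sketch, the following rule would highlight C preprocessor directives such as "#include" and "#define", but only when the "#" is the first character on the line; the token type is an arbitrary choice.

```xml
<!-- "#" elsewhere on a line is not matched by this rule -->
<EOL_SPAN TYPE="KEYWORD2" AT_LINE_START="TRUE">#</EOL_SPAN>
```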

The MARK_PREVIOUS Rule

The MARK_PREVIOUS rule highlights from the end of the previous syntax token to the matched text. The text to match is specified between opening and closing MARK_PREVIOUS tags. The following attributes are supported:

  • TYPE - The token type to highlight the text with. See the section called "Token Types" for a list of token types

  • AT_LINE_START - If set to TRUE, the text will only be highlighted if it occurs at the beginning of the line

  • EXCLUDE_MATCH - If set to TRUE, the match will not be highlighted, only the text before it will

Here is a rule that highlights labels in Java mode (for example, "XXX:"):

<MARK_PREVIOUS TYPE="LABEL" AT_LINE_START="TRUE"
    EXCLUDE_MATCH="TRUE">:</MARK_PREVIOUS>
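
Another possible use, sketched here as an illustration rather than taken from any particular mode, is highlighting function or method names: the text between the previous token and an opening parenthesis is marked as FUNCTION, while EXCLUDE_MATCH keeps the parenthesis itself unhighlighted.

```xml
<!-- In "foo(bar)", "foo" is highlighted; "(" is not -->
<MARK_PREVIOUS TYPE="FUNCTION" EXCLUDE_MATCH="TRUE">(</MARK_PREVIOUS>
```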

The MARK_FOLLOWING Rule

The MARK_FOLLOWING rule highlights from the start of the match to the next syntax token. The text to match is specified between opening and closing MARK_FOLLOWING tags. The following attributes are supported:

  • TYPE - The token type to highlight the text with. See the section called "Token Types" for a list of token types

  • AT_LINE_START - If set to TRUE, the text will only be highlighted if the start sequence occurs at the beginning of a line

  • EXCLUDE_MATCH - If set to TRUE, the match will not be highlighted, only the text after it will

Here is a rule that highlights variables in Unix shell scripts ("$CLASSPATH", "$IFS", etc.):

<MARK_FOLLOWING TYPE="KEYWORD2">$</MARK_FOLLOWING>

The SEQ Rule

The SEQ rule highlights fixed sequences of text. The text to highlight is specified between opening and closing SEQ tags. The following attributes are supported:

  • TYPE - the token type to highlight the sequence with. See the section called "Token Types" for a list of token types

  • AT_LINE_START - If set to TRUE, the sequence will only be highlighted if it occurs at the beginning of a line

The following rules highlight a few Java operators:

<SEQ TYPE="OPERATOR">+</SEQ>
<SEQ TYPE="OPERATOR">-</SEQ>
<SEQ TYPE="OPERATOR">*</SEQ>
<SEQ TYPE="OPERATOR">/</SEQ>
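
Because mode files are XML, operators containing "<", ">" or "&" must be written as entities, as in the stylesheet SPAN example earlier. The sketch below also lists two-character operators before their one-character prefixes, following the advice in "Rule Ordering Requirements". Here every rule uses the same token type, so the order makes little visible difference, but it would matter if the token types differed.

```xml
<!-- "<=" and "<" -->
<SEQ TYPE="OPERATOR">&lt;=</SEQ>
<SEQ TYPE="OPERATOR">&lt;</SEQ>

<!-- "&&" and "&" -->
<SEQ TYPE="OPERATOR">&amp;&amp;</SEQ>
<SEQ TYPE="OPERATOR">&amp;</SEQ>
```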

The KEYWORDS Rule

There can only be one KEYWORDS tag per ruleset. The KEYWORDS rule defines keywords to highlight. Keywords are similar to SEQs, except that SEQs match anywhere in the text, whereas keywords only match whole words.

The KEYWORDS tag supports only one attribute, IGNORE_CASE. If set to FALSE, keywords will be case sensitive. Otherwise, case will not matter. Default is TRUE.

Each child element of the KEYWORDS tag should be named after the desired token type, with the keyword text between the start and end tags. For example, the following rule highlights the most common Java keywords:

<KEYWORDS IGNORE_CASE="FALSE">
   <KEYWORD1>if</KEYWORD1>
   <KEYWORD1>else</KEYWORD1>
   <KEYWORD3>int</KEYWORD3>
   <KEYWORD3>void</KEYWORD3>
</KEYWORDS>
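
Child elements are not limited to the KEYWORD token types. As a sketch for a hypothetical SQL-like language, the following rule relies on case-insensitive matching and uses LITERAL2 for a built-in constant:

```xml
<!-- IGNORE_CASE="TRUE": "select", "SELECT" and "Select" all match -->
<KEYWORDS IGNORE_CASE="TRUE">
    <KEYWORD1>select</KEYWORD1>
    <KEYWORD1>from</KEYWORD1>
    <KEYWORD1>where</KEYWORD1>
    <LITERAL2>null</LITERAL2>
</KEYWORDS>
```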

Token Types

Parser rules can highlight tokens using any of the following token types:

  • NULL - no special highlighting is performed on tokens of type NULL

  • COMMENT1

  • COMMENT2

  • FUNCTION

  • INVALID - tokens of this type are automatically added if a NO_WORD_BREAK or NO_LINE_BREAK SPAN spans more than one word or line, respectively.

  • KEYWORD1

  • KEYWORD2

  • KEYWORD3

  • LABEL

  • LITERAL1

  • LITERAL2

  • MARKUP

  • OPERATOR
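
To show how the token types fit together, here is a small sketch of a complete ruleset for a hypothetical C-like language. It uses only the rules and attributes described in this chapter; the language itself and the choice of token types are made up for illustration.

```xml
<RULES IGNORE_CASE="FALSE" HIGHLIGHT_DIGITS="TRUE">
    <WHITESPACE> </WHITESPACE>

    <!-- "/**" is more specific than "/*", so it must come first -->
    <SPAN TYPE="COMMENT2">
        <BEGIN>/**</BEGIN>
        <END>*/</END>
    </SPAN>
    <SPAN TYPE="COMMENT1">
        <BEGIN>/*</BEGIN>
        <END>*/</END>
    </SPAN>
    <EOL_SPAN TYPE="COMMENT1">//</EOL_SPAN>

    <!-- String literals may not contain line breaks -->
    <SPAN TYPE="LITERAL1" NO_LINE_BREAK="TRUE">
        <BEGIN>"</BEGIN>
        <END>"</END>
    </SPAN>

    <MARK_PREVIOUS TYPE="FUNCTION" EXCLUDE_MATCH="TRUE">(</MARK_PREVIOUS>
    <SEQ TYPE="OPERATOR">=</SEQ>
    <SEQ TYPE="OPERATOR">+</SEQ>

    <KEYWORDS IGNORE_CASE="FALSE">
        <KEYWORD1>if</KEYWORD1>
        <KEYWORD1>else</KEYWORD1>
        <KEYWORD3>int</KEYWORD3>
        <LITERAL2>true</LITERAL2>
        <LITERAL2>false</LITERAL2>
    </KEYWORDS>
</RULES>
```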