JavaCC

1. Introduction to JavaCC

JavaCC (Java Compiler Compiler) is an open source parser generator and lexical analyzer generator written in the Java programming language. JavaCC is similar to yacc in that it generates a parser from a formal grammar written in EBNF notation. Unlike yacc, however, JavaCC generates top-down parsers.

Reference: https://javacc.java.net/
Note: the fastest way to learn JavaCC is to work through the examples that ship with it.

1.1. JavaCC features

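  • TOP-DOWN: JavaCC generates top-down parsers, whereas tools such as YACC generate bottom-up parsers. Top-down parsing allows more general grammars (although left-recursive grammars are not allowed). Top-down parsers have other advantages as well: they are easier to debug, they can start parsing from any nonterminal in the grammar, and they can pass values both up and down the parse tree during parsing.
  • LARGE USER COMMUNITY: JavaCC is the most popular parser generator used with Java applications, with a very large number of downloads and users. More than 1000 people participate in the mailing list (https://javacc.dev.java.net/doc/mailinglist.html) and the newsgroup (comp.compilers.tools.JavaCC).
  • LEXICAL AND GRAMMAR SPECIFICATIONS IN ONE FILE: The lexical specifications (regular expressions, strings, and so on) and the grammar specifications (the BNF) are written together in the same file. This makes grammars easier to read and to maintain.
  • TREE BUILDING PREPROCESSOR: JavaCC comes with JJTree, a powerful tree-building preprocessor.
  • EXTREMELY CUSTOMIZABLE: JavaCC offers many options to customize its own behavior and the behavior of the parsers it generates.
  • CERTIFIED TO BE 100% PURE JAVA: JavaCC runs on any Java platform of version 1.1 or later. It runs on a variety of machines with no special porting effort, which demonstrates the "Write Once, Run Everywhere" property of the Java language.
  • DOCUMENT GENERATION: JavaCC includes a tool called JJDoc that converts grammar files into documentation files (plain text or HTML).
  • MANY MANY EXAMPLES: The JavaCC release includes a range of examples, including Java and HTML grammars. The examples and their documentation are a quick way to learn JavaCC.
  • INTERNATIONALIZED: The JavaCC lexical analyzer can handle full Unicode input, and lexical specifications may include any Unicode character. This makes it easy to describe language elements such as Java identifiers.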
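  • SYNTACTIC AND SEMANTIC LOOKAHEAD SPECIFICATIONS: By default, JavaCC generates an LL(1) parser, but many grammars are not LL(1) everywhere. JavaCC offers syntactic and semantic lookahead to resolve shift-shift ambiguities (choice conflicts) locally at those points. For example, the parser can be LL(k) only at the points where a conflict occurs and remain LL(1) everywhere else for better performance. Shift-reduce and reduce-reduce conflicts are not an issue for top-down parsers.
  • PERMITS EXTENDED BNF SPECIFICATIONS: JavaCC allows extended BNF constructs such as (A)* and (A)+. Extended BNF relieves the need for left recursion to some extent. In fact, the extended BNF form A ::= y(x)* is often easier to read than the left-recursive form A ::= Ax|y.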
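  • LEXICAL STATES AND LEXICAL ACTIONS: JavaCC offers lex-like lexical states and lexical actions.
  • CASE-INSENSITIVE LEXICAL ANALYSIS: Lexical specifications can define tokens that are not case sensitive, either globally for the whole lexical specification or for individual lexical specifications.
  • EXTENSIVE DEBUGGING CAPABILITIES: With the options DEBUG_PARSER, DEBUG_LOOKAHEAD, and DEBUG_TOKEN_MANAGER, users can obtain in-depth traces of parsing and token processing.
  • SPECIAL TOKENS: Tokens defined as special tokens in the lexical specification are ignored during parsing, but remain available for processing by tools.
  • VERY GOOD ERROR REPORTING: JavaCC's error reporting is among the best of the parser generators. Generated parsers clearly point out the location of parse errors and provide complete diagnostic information.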
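
The remark in the feature list about extended BNF and left recursion can be illustrated with a hand-written top-down recognizer. This is a minimal sketch and not JavaCC output: the class name EbnfSketch and the single-character terminals x and y are assumptions made for illustration. It recognizes A ::= y (x)* with a loop, whereas a direct top-down transcription of the left-recursive A ::= A x | y would call itself without consuming any input.

```java
/** Hand-written recognizer for A ::= "y" ( "x" )* (illustrative, not generated by JavaCC). */
final class EbnfSketch {
    /** Returns true if the whole input is a "y" followed by zero or more "x". */
    static boolean matchesA(String input) {
        int pos = 0;
        // Match the single leading "y".
        if (pos >= input.length() || input.charAt(pos) != 'y') {
            return false;
        }
        pos++;
        // The (x)* part becomes a loop, so no recursive call on A is needed.
        while (pos < input.length() && input.charAt(pos) == 'x') {
            pos++;
        }
        // Accept only if every character was consumed.
        return pos == input.length();
    }

    public static void main(String[] args) {
        System.out.println(matchesA("yxxx")); // prints true
        System.out.println(matchesA("xy"));   // prints false
    }
}
```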

Reference: https://javacc.java.net/doc/features.html

1.2. Where is it used?

Here is a non-comprehensive list of software built using JavaCC:

  • Apache Derby
  • BeanShell
  • FreeMarker
  • PMD
  • Vaadin
  • Apache Lucene
  • JavaParser

Reference: http://en.wikipedia.org/wiki/JavaCC

1.3. What does JavaCC not do?

JavaCC does not automate the building of trees (or any other specific parser output), although there are at least two tree building tools based on JavaCC, JJTree and JTB, and building trees "by hand" with a JavaCC based parser is easy.
JavaCC does not generate output languages. However once you have a tree, it is easy to generate string output from it.

Note: JavaCC does not build syntax trees automatically; JJTree can be used to build them.

1.4. JavaCC output files

Under the default options, javacc generates 7 files: 4 common files, which are the same for every grammar, and 3 files specific to the grammar.
The 4 common files are:
– SimpleCharStream.java — represents the stream of input characters.
– Token.java — represents a single input token.
– TokenMgrError.java — an error thrown from the token manager.
– ParseException.java — an exception indicating that the input did not conform to the parser's grammar.

The 3 grammar-specific files are (XXX is whatever name you choose):
– XXX.java — the parser class.
– XXXTokenManager.java — the token manager class.
– XXXConstants.java — an interface associating token classes with symbolic names.

Unlike ANTLR, JavaCC generates every file needed to run the parser; no runtime library is required.

2. Token Manager (the JavaCC lexical analyzer)

JavaCC [tm]: TokenManager MiniTutorial
Excerpted from: https://javacc.java.net/doc/tokenmanager.html

The JavaCC [tm] lexical specification is organized into a set of "lexical states". Each lexical state is named with an identifier. There is a standard lexical state called DEFAULT. The generated token manager is at any moment in one of these lexical states. When the token manager is initialized, it starts off in the DEFAULT state, by default. The starting lexical state can also be specified as a parameter while constructing a token manager object.

Each lexical state contains an ordered list of regular expressions; the order is derived from the order of occurrence in the input file. There are four kinds of regular expressions: SKIP, MORE, TOKEN, and SPECIAL_TOKEN.

All regular expressions that occur as expansion units in the grammar are considered to be in the DEFAULT lexical state and their order of occurrence is determined by their position in the grammar file.

A token is matched as follows: All regular expressions in the current lexical state are considered as potential match candidates. The token manager consumes the maximum number of characters from the input stream possible that match one of these regular expressions. That is, the token manager prefers the longest possible match. If there are multiple longest matches (of the same length), the regular expression that is matched is the one with the earliest order of occurrence in the grammar file.
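
The matching rule above (longest match first, earliest declaration on ties) can be sketched in plain Java. This is only an illustration using java.util.regex, not the generated token manager: the class name LongestMatchSketch and the three rules IF, IDENTIFIER and SPACE are assumptions, and the patterns are kept simple enough that the regex engine's match is also the longest one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative longest-match tokenizer; the rules are hypothetical. */
final class LongestMatchSketch {
    // LinkedHashMap keeps declaration order, which decides ties.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("IF", Pattern.compile("if"));
        RULES.put("IDENTIFIER", Pattern.compile("[a-z]+"));
        RULES.put("SPACE", Pattern.compile(" +")); // treated like SKIP below
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestKind = null;
            int bestLength = 0;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // Strictly longer wins; on equal length the earlier rule is kept.
                if (m.lookingAt() && m.end() - pos > bestLength) {
                    bestKind = rule.getKey();
                    bestLength = m.end() - pos;
                }
            }
            if (bestKind == null) {
                throw new IllegalArgumentException("no rule matches at position " + pos);
            }
            if (!bestKind.equals("SPACE")) {
                tokens.add(bestKind + ":" + input.substring(pos, pos + bestLength));
            }
            pos += bestLength;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "if" ties between IF and IDENTIFIER, so the earlier rule IF wins;
        // "ifx" is matched longer by IDENTIFIER.
        System.out.println(tokenize("if ifx")); // prints [IF:if, IDENTIFIER:ifx]
    }
}
```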

As mentioned above, the token manager is in exactly one state at any moment. At this moment, the token manager only considers the regular expressions defined in this state for matching purposes. After a match, one can specify an action to be executed as well as a new lexical state to move to. If a new lexical state is not specified, the token manager remains in the current state.

The regular expression kind specifies what to do when a regular expression has been successfully matched:

SKIP
Simply throw away the matched string (after executing any lexical action).
MORE
Continue (to whatever the next state is) taking the matched string along. This string will be a prefix of the new matched string.
TOKEN
Create a token using the matched string and send it to the parser (or any caller).
SPECIAL_TOKEN
Create a special token that does not participate in parsing. (How special tokens are accessed is described in section 2.3.)

Whenever the end of file <EOF> is detected, it causes the creation of an <EOF> token (regardless of the current state of the lexical analyzer). However, if an <EOF> is detected in the middle of a match for a regular expression, or immediately after a MORE regular expression has been matched, an error is reported.

After the regular expression is matched, the lexical action is executed. All the variables (and methods) declared in the TOKEN_MGR_DECLS region (see below) are available here for use. In addition, the variables and methods listed below are also available for use.

Immediately after this, the token manager changes state to that specified (if any).

After that the action specified by the kind of the regular expression is taken (SKIP, MORE, ... ). If the kind is TOKEN, the matched token is returned. If the kind is SPECIAL_TOKEN, the matched token is saved to be returned along with the next TOKEN that is matched.

2.1. Variables available for use within lexical actions

The following variables are available for use within lexical actions:

  1. StringBuffer image (READ/WRITE)
  2. int lengthOfMatch (READ ONLY)
  3. int curLexState (READ ONLY)
  4. inputStream (READ ONLY)
  5. Token matchedToken (READ/WRITE)
  6. void SwitchTo(int)

2.1.1. StringBuffer image (READ/WRITE)

"image" (different from the "image" field of the matched token) is a StringBuffer variable that contains all the characters that have been matched since the last SKIP, TOKEN, or SPECIAL_TOKEN. You are free to make whatever changes you wish to it so long as you do not assign it to null (since this variable is used by the generated token manager also). If you make changes to "image", this change is passed on to subsequent matches (if the current match is a MORE). The content of "image" does not automatically get assigned to the "image" field of the matched token. If you wish this to happen, you must explicitly assign it in a lexical action of a TOKEN or SPECIAL_TOKEN regular expression.

Example:

<DEFAULT> MORE : { "a" : S1 }

<S1> MORE :
{
  "b"
    { int l = image.length()-1; image.setCharAt(l, Character.toUpperCase(image.charAt(l))); }
    ^1                                                                                      ^2
    : S2
}

<S2> TOKEN :
{
  "cd" { x = image; } : DEFAULT
       ^3
}

In the above example, the values of "image" at the 3 points marked by ^1, ^2, and ^3 are:

At ^1: "ab"
At ^2: "aB"
At ^3: "aBcd"
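
The three values can be reproduced outside JavaCC with an ordinary StringBuffer. The following is a sketch that replays the three matches by hand; the class name ImageBufferSketch and the method trace are assumptions, and the final reset only mirrors the statement that "image" holds what was matched since the last TOKEN.

```java
import java.util.ArrayList;
import java.util.List;

/** Replays the MORE/MORE/TOKEN example by hand (illustrative only). */
final class ImageBufferSketch {
    /** Returns the contents of "image" at the points ^1, ^2 and ^3. */
    static List<String> trace() {
        List<String> snapshots = new ArrayList<>();
        StringBuffer image = new StringBuffer();

        image.append("a");                 // <DEFAULT> MORE "a", then switch to S1
        image.append("b");                 // <S1> MORE "b" is appended to the prefix
        snapshots.add(image.toString());   // ^1: before the lexical action runs

        int l = image.length() - 1;        // the lexical action upper-cases the last char
        image.setCharAt(l, Character.toUpperCase(image.charAt(l)));
        snapshots.add(image.toString());   // ^2: after the lexical action

        image.append("cd");                // <S2> TOKEN "cd" is appended as well
        snapshots.add(image.toString());   // ^3: inside the TOKEN action

        image.setLength(0);                // a TOKEN ends the accumulated match
        return snapshots;
    }

    public static void main(String[] args) {
        System.out.println(trace()); // prints [ab, aB, aBcd]
    }
}
```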

2.1.2. int lengthOfMatch (READ ONLY)

This is the length of the current match (it is not cumulative over MOREs). See the example below. You should not modify this variable.

Example:
Using the same example as above, the values of "lengthOfMatch" are:

At ^1: 1 (the size of "b")
At ^2: 1 (does not change due to lexical actions)
At ^3: 2 (the size of "cd")

2.1.3. int curLexState (READ ONLY)

This is the index of the current lexical state. You should not modify this variable. Integer constants whose names are those of the lexical states are generated into the ...Constants file, so you can refer to lexical states without worrying about their actual index value.

2.1.4. inputStream (READ ONLY)

This is an input stream of the appropriate type (one of ASCII_CharStream, ASCII_UCodeESC_CharStream, UCode_CharStream, or UCode_UCodeESC_CharStream depending on the values of options UNICODE_INPUT and JAVA_UNICODE_ESCAPE). The stream is currently at the last character consumed for this match. Methods of inputStream can be called. For example, getEndLine and getEndColumn can be called to get the line and column number information for the current match. inputStream may not be modified.

2.1.5. Token matchedToken (READ/WRITE)

This variable may be used only in actions associated with TOKEN and SPECIAL_TOKEN regular expressions. This is set to be the token that will get returned to the parser. You may change this variable and thereby cause the changed token to be returned to the parser instead of the original one. It is here that you can assign the value of variable "image" to "matchedToken.image". Typically that is how your changes to "image" take effect outside the lexical actions.

Example:
If we modify the last regular expression specification of the above example to:

<S2> TOKEN :
{
  "cd" { matchedToken.image = image.toString(); } : DEFAULT
}

Then the token returned to the parser will have its ".image" field set to "aBcd". If this assignment is not performed, the ".image" field will remain as "abcd".

2.1.6. void SwitchTo(int)

Calling this method switches you to the specified lexical state. This method may be called from parser actions also (in addition to being called from lexical actions). However, care must be taken when using this method to switch states from the parser since the lexical analysis could be many tokens ahead of the parser in the presence of large lookaheads. When you use this method within a lexical action, you must ensure that it is the last statement executed in the action (otherwise, strange things could happen). If there is a state change specified using the ": state" syntax, it overrides all SwitchTo calls, hence there is no point having a SwitchTo call when there is an explicit state change specified. In general, calling this method should be resorted to only when you cannot do it any other way. Using this method of switching states also causes you to lose some of the semantic checking that JavaCC does when you use the standard syntax.

2.2. Lexical actions

Lexical actions have access to a set of class level declarations. These declarations are introduced within the JavaCC file using the following syntax:

token_manager_decls ::=
  "TOKEN_MGR_DECLS" ":"
  "{" java_declarations_and_code "}"

These declarations are accessible from all lexical actions.

2.2.1. Examples

Example 1: Comments

SKIP :
{
  "/*" : WithinComment
}

<WithinComment> SKIP :
{
  "*/" : DEFAULT
}

<WithinComment> MORE :
{
  <~[]>
}
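
The state switching in Example 1 can be mimicked with a small hand-written state machine. This is a sketch under assumptions, not the generated token manager: the class name CommentStateSketch is made up, and keeping every non-comment character in the DEFAULT state is an assumption, since Example 1 defines no other DEFAULT rules. Inside WithinComment, "*/" wins over <~[]> because it is the longer match.

```java
/** Hand-written model of the DEFAULT / WithinComment states (illustrative only). */
final class CommentStateSketch {
    private static final int DEFAULT = 0;
    private static final int WITHIN_COMMENT = 1;

    /** Returns the input with comments removed. */
    static String stripComments(String input) {
        StringBuilder kept = new StringBuilder();
        int state = DEFAULT;
        int pos = 0;
        while (pos < input.length()) {
            if (state == DEFAULT && input.startsWith("/*", pos)) {
                state = WITHIN_COMMENT;   // SKIP "/*" : WithinComment
                pos += 2;
            } else if (state == WITHIN_COMMENT && input.startsWith("*/", pos)) {
                state = DEFAULT;          // SKIP "*/" : DEFAULT (2 chars beat <~[]>)
                pos += 2;
            } else if (state == WITHIN_COMMENT) {
                pos++;                    // MORE <~[]>: text is dropped with the final SKIP
            } else {
                kept.append(input.charAt(pos)); // assumption: other input is kept
                pos++;
            }
        }
        if (state == WITHIN_COMMENT) {
            // End of file right after a MORE match is reported as an error.
            throw new IllegalStateException("unterminated comment");
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripComments("a/* x */b")); // prints ab
    }
}
```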

Example 2: String Literals with actions to print the length of the string

TOKEN_MGR_DECLS :
{
  int stringSize;
}

MORE :
{
  "\"" {stringSize = 0;} : WithinString
}

<WithinString> TOKEN :
{
  <STRLIT: "\""> {System.out.println("Size = " + stringSize);} : DEFAULT
}

<WithinString> MORE :
{
  <~["\n","\r"]> {stringSize++;}
}

2.3. How special tokens are sent to the parser

Special tokens are like tokens, except that they are permitted to appear anywhere in the input file (between any two tokens). Special tokens can be specified in the grammar input file using the reserved word "SPECIAL_TOKEN" instead of "TOKEN" as in:

SPECIAL_TOKEN :
{
  <SINGLE_LINE_COMMENT: "//" (~["\n","\r"])* ("\n"|"\r"|"\r\n")>
}

Any regular expression defined to be a SPECIAL_TOKEN may be accessed in a special manner from user actions in the lexical and grammar specifications. This allows these tokens to be recovered during parsing while at the same time these tokens do not participate in the parsing.

JavaCC has been bootstrapped to use this feature to automatically copy relevant comments from the input grammar file into the generated files.

Details:
The class Token now has an additional field:

   Token specialToken;

This field points to the special token immediately prior to the current token (special or otherwise). If the token immediately prior to the current token is a regular token (and not a special token), then this field is set to null. The "next" fields of regular tokens continue to have the same meaning, i.e., they point to the next regular token except in the case of the EOF token where the "next" field is null. The "next" field of a special token points to the special token immediately following the current token. If the token immediately following the current token is a regular token, the "next" field is set to null.

This is clarified by the following example. Suppose you wish to print all special tokens prior to the regular token "t" (but only those that are after the regular token before "t"):

if (t.specialToken == null) return;
  // The above statement determines that there are no special tokens
  // and returns control to the caller.
Token tmp_t = t.specialToken;
while (tmp_t.specialToken != null) tmp_t = tmp_t.specialToken;
  // The above line walks back the special token chain until it
  // reaches the first special token after the previous regular
  // token.
while (tmp_t != null) {
  System.out.println(tmp_t.image);
  tmp_t = tmp_t.next;
}
  // The above loop now walks the special token chain in the forward
  // direction printing them in the process.
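
To try the walk above without generating a parser, the links can be built by hand. This sketch uses a minimal stand-in for the generated Token class with only the three fields involved (image, next, specialToken); the class name SpecialTokenSketch and the two comment images are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Walks the special-token chain on a hand-built example (illustrative only). */
final class SpecialTokenSketch {
    /** Minimal stand-in for the generated Token class. */
    static final class Token {
        final String image;
        Token next;          // following token (special -> special, regular -> regular)
        Token specialToken;  // special token immediately before this one, or null
        Token(String image) { this.image = image; }
    }

    /** Collects the images of all special tokens directly preceding t, in input order. */
    static List<String> specialTokensBefore(Token t) {
        List<String> images = new ArrayList<>();
        if (t.specialToken == null) {
            return images;                      // no special tokens before t
        }
        Token tmp = t.specialToken;
        while (tmp.specialToken != null) {      // walk back to the first special token
            tmp = tmp.specialToken;
        }
        while (tmp != null) {                   // walk forward; next is null at the end
            images.add(tmp.image);
            tmp = tmp.next;
        }
        return images;
    }

    /** Builds: "// first", "// second", then the regular token "x". */
    static Token example() {
        Token c1 = new Token("// first");
        Token c2 = new Token("// second");
        Token t = new Token("x");
        c1.next = c2;          // special token followed by another special token
        c2.specialToken = c1;  // c2.next stays null because a regular token follows
        t.specialToken = c2;
        return t;
    }

    public static void main(String[] args) {
        System.out.println(specialTokensBefore(example())); // prints [// first, // second]
    }
}
```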

3. JavaCC grammar file format

javacc_input ::=
	javacc_options
	"PARSER_BEGIN" "(" <IDENTIFIER> ")"
	java_compilation_unit
	"PARSER_END" "(" <IDENTIFIER> ")"
	( production )*
	<EOF>


production ::=
	javacode_production
|	regular_expr_production
|	bnf_production
|	token_manager_decls


javacode_production ::=
	"JAVACODE"
	java_access_modifier java_return_type java_identifier "(" java_parameter_list ")"
	java_block


regular_expr_production ::=
	[ lexical_state_list ]
	regexpr_kind [ "[" "IGNORE_CASE" "]" ] ":"
	"{" regexpr_spec ( "|" regexpr_spec )* "}"


bnf_production ::=
	java_access_modifier java_return_type java_identifier "(" java_parameter_list ")" ":"
	java_block
	"{" expansion_choices "}"


token_manager_decls ::=
	"TOKEN_MGR_DECLS" ":" java_block

For the full description, see: https://javacc.java.net/doc/javaccgrm.html

4. JJTree

JJTree is a preprocessor for JavaCC that inserts parse tree building actions at various places in the JavaCC source.

4.1. JJTree output files

Under the default options, in addition to the corresponding .jj file, JJTree generates the following files:
Node.java
SimpleNode.java
XXXTreeConstants.java
JJTXXXState.java

4.2. JJTree Reference Documentation

JavaCC [tm]: JJTree Reference Documentation
Excerpted from: https://javacc.java.net/doc/JJTree.html

4.2.1. Introduction

JJTree is a preprocessor for JavaCC [tm] that inserts parse tree building actions at various places in the JavaCC source. The output of JJTree is run through JavaCC to create the parser. This document describes how to use JJTree, and how you can interface your parser to it.

By default JJTree generates code to construct parse tree nodes for each nonterminal in the language. This behavior can be modified so that some nonterminals do not have nodes generated, or so that a node is generated for a part of a production's expansion.

JJTree defines a Java interface Node that all parse tree nodes must implement. The interface provides methods for operations such as setting the parent of the node, and for adding children and retrieving them.

JJTree operates in one of two modes, simple and multi (for want of better terms). In simple mode each parse tree node is of concrete type SimpleNode; in multi mode the type of the parse tree node is derived from the name of the node. If you don't provide implementations for the node classes JJTree will generate sample implementations based on SimpleNode for you. You can then modify the implementations to suit.

Although JavaCC is a top-down parser, JJTree constructs the parse tree from the bottom up. To do this it uses a stack where it pushes nodes after they have been created. When it finds a parent for them, it pops the children from the stack and adds them to the parent, and finally pushes the new parent node itself. The stack is open, which means that you have access to it from within grammar actions: you can push, pop and otherwise manipulate its contents however you feel appropriate. See Node Scopes and User Actions below for more important information.

JJTree provides decorations for two basic varieties of nodes, and some syntactic shorthand to make their use convenient.

  1. A definite node is constructed with a specific number of children. That many nodes are popped from the stack and made the children of the new node, which is then pushed on the stack itself. You notate a definite node like this:

    #ADefiniteNode(INTEGER EXPRESSION)

    A definite node descriptor expression can be any integer expression, although literal integer constants are by far the most common expressions.

  2. A conditional node is constructed with all of the children that were pushed on the stack within its node scope if and only if its condition evaluates to true. If it evaluates to false, the node is not constructed, and all of the children remain on the node stack. You notate a conditional node like this:

    #ConditionalNode(BOOLEAN EXPRESSION)

    A conditional node descriptor expression can be any boolean expression. There are two common shorthands for conditional nodes:

    2.1 Indefinite nodes
    #IndefiniteNode is short for #IndefiniteNode(true)

    2.2 Greater-than nodes
    #GTNode(>1) is short for #GTNode(jjtree.arity() > 1)

    The indefinite node shorthand (2.1) can lead to ambiguities in the JJTree source when it is followed by a parenthesized expansion. In those cases the shorthand must be replaced by the full expression. For example:

        ( ... ) #N ( a() )

    is ambiguous; you have to use the explicit condition:

        ( ... ) #N(true) ( a() )

WARNING: node descriptor expressions should not have side-effects. JJTree doesn't specify how many times the expression will be evaluated.

By default JJTree treats each nonterminal as an indefinite node and derives the name of the node from the name of its production. You can give it a different name with the following syntax:

    void P1() #MyNode : { ... } { ... }

When the parser recognizes a P1 nonterminal it begins an indefinite node. It marks the stack, so that any parse tree nodes created and pushed on the stack by nonterminals in the expansion for P1 will be popped off and made children of the node MyNode.

If you want to suppress the creation of a node for a production you can use the following syntax:

    void P2() #void : { ... } { ... }

Now any parse tree nodes pushed by nonterminals in the expansion of P2 will remain on the stack, to be popped and made children of a production further up the tree. You can make this the default behavior for non-decorated nodes by using the NODE_DEFAULT_VOID option.

    void P3() : {}
    {
        P4() ( P5() )+ P6()
    }

In this example, an indefinite node P3 is begun, marking the stack, and then a P4 node, one or more P5 nodes and a P6 node are parsed. Any nodes that they push are popped and made the children of P3. You can further customize the generated tree:

    void P3() : {}
    {
        P4() ( P5() )+ #ListOfP5s P6()
    }

Now the P3 node will have a P4 node, a ListOfP5s node and a P6 node as children. The #Name construct acts as a postfix operator, and its scope is the immediately preceding expansion unit.
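
The stack discipline described in this section can be modelled in a few lines of Java. The sketch below is a simplified model and not the generated JJTXXXState class: the names NodeStackSketch, TreeNode, openScope, closeDefinite, closeConditional, closeIndefinite and leaf are all assumptions. It replays the P3 example with the #ListOfP5s decoration, and a hypothetical #Add(>1) conditional node.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Simplified model of the JJTree node stack (illustrative only). */
final class NodeStackSketch {
    static final class TreeNode {
        final String name;
        final List<TreeNode> children = new ArrayList<>();
        TreeNode(String name) { this.name = name; }
        @Override public String toString() {
            return children.isEmpty() ? name : name + children;
        }
    }

    private final List<TreeNode> stack = new ArrayList<>();   // finished nodes
    private final Deque<TreeNode> open = new ArrayDeque<>();  // nodes under construction
    private final Deque<Integer> marks = new ArrayDeque<>();  // stack size at scope entry

    /** Begins a node scope: creates the node and marks the stack. */
    void openScope(String name) {
        open.push(new TreeNode(name));
        marks.push(stack.size());
    }

    /** Number of nodes pushed so far within the current scope. */
    int arity() { return stack.size() - marks.peek(); }

    /** Definite node: adopts exactly count nodes from the top of the stack. */
    void closeDefinite(int count) {
        marks.pop();
        TreeNode node = open.pop();
        List<TreeNode> top = stack.subList(stack.size() - count, stack.size());
        node.children.addAll(top); // keeps left-to-right order
        top.clear();               // pops the children
        stack.add(node);           // pushes the new parent
    }

    /** Conditional node: built only if condition holds, else children stay on the stack. */
    void closeConditional(boolean condition) {
        if (condition) {
            closeDefinite(arity());
        } else {
            marks.pop();
            open.pop();            // the node is abandoned
        }
    }

    /** Indefinite node: shorthand for a conditional node with condition true. */
    void closeIndefinite() { closeConditional(true); }

    void leaf(String name) { openScope(name); closeIndefinite(); }

    TreeNode peek() { return stack.get(stack.size() - 1); }

    /** void P3() : {} { P4() ( P5() )+ #ListOfP5s P6() } with two P5 nodes. */
    static String p3Demo() {
        NodeStackSketch s = new NodeStackSketch();
        s.openScope("P3");
        s.leaf("P4");
        s.openScope("ListOfP5s");
        s.leaf("P5");
        s.leaf("P5");
        s.closeIndefinite();
        s.leaf("P6");
        s.closeIndefinite();
        return s.peek().toString();
    }

    /** Hypothetical #Add(>1) scope containing the given number of Term nodes. */
    static String addDemo(int terms) {
        NodeStackSketch s = new NodeStackSketch();
        s.openScope("Add");
        for (int i = 0; i < terms; i++) s.leaf("Term");
        s.closeConditional(s.arity() > 1);
        return s.peek().toString();
    }

    public static void main(String[] args) {
        System.out.println(p3Demo());   // prints P3[P4, ListOfP5s[P5, P5], P6]
        System.out.println(addDemo(1)); // prints Term
    }
}
```

With a single Term, the conditional Add node is not created and the Term node stays on the stack for an enclosing scope to adopt, which is the usual reason for writing #Name(>1) on expression productions.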

4.2.2. Node Scopes and User Actions

Each node is associated with a node scope. User actions within this scope can access the node under construction by using the special identifier jjtThis to refer to the node. This identifier is implicitly declared to be of the correct type for the node, so any fields and methods that the node has can be easily accessed.

A scope is the expansion unit immediately preceding the node decoration. This can be a parenthesized expression. When the production signature is decorated (perhaps implicitly with the default node), the scope is the entire right hand side of the production including its declaration block.

You can also use an expression involving jjtThis on the left hand side of an expansion reference. For example:

    ... ( jjtThis.my_foo = foo() ) #Baz ...

Here jjtThis refers to a Baz node, which has a field called my_foo. The result of parsing the production foo() is assigned to that my_foo.

The final user action in a node scope is different from all the others. When the code within it executes, the node's children have already been popped from the stack and added to the node, which has itself been pushed onto the stack. The children can now be accessed via the node's methods such as jjtGetChild().

User actions other than the final one can only access the children on the stack. They have not yet been added to the node, so they aren't available via the node's methods.

A conditional node that has a node descriptor expression that evaluates to false will not get added to the stack, nor have children added to it. The final user action within a conditional node scope can determine whether the node was created or not by calling the nodeCreated() method. This returns true if the node's condition was satisfied and the node was created and pushed on the node stack, and false otherwise.

4.2.3. Exception handling

An exception thrown by an expansion within a node scope that is not caught within the node scope is caught by JJTree itself. When this occurs, any nodes that have been pushed on to the node stack within the node scope are popped and thrown away. Then the exception is rethrown.

The intention is to make it possible for parsers to implement error recovery and continue with the node stack in a known state.

WARNING: JJTree currently cannot detect whether exceptions are thrown from user actions within a node scope. Such an exception will probably be handled incorrectly.

4.2.4. Node Scope Hooks

If the NODE_SCOPE_HOOK option is set to true, JJTree generates calls to two user-defined parser methods on the entry and exit of every node scope. The methods must have the following signatures:

    void jjtreeOpenNodeScope(Node n)
    void jjtreeCloseNodeScope(Node n)

If the parser is STATIC then these methods will have to be declared as static as well. They are both called with the current node as a parameter.

One use might be to store the parser object itself in the node so that state that should be shared by all nodes produced by that parser can be provided. For example, the parser might maintain a symbol table.

    void jjtreeOpenNodeScope(Node n)
    {
      ((SimpleNode)n).jjtSetValue(getSymbolTable());
    }

    void jjtreeCloseNodeScope(Node n)
    {
    }

Where getSymbolTable() is a user-defined method to return a symbol table structure for the node.

4.2.5. Tracking Tokens

It is often useful to keep track of each node's first and last token so that input can be easily reproduced again. By setting the TRACK_TOKENS option the generated SimpleNode class will contain 4 extra methods:

      public Token jjtGetFirstToken()
      public void jjtSetFirstToken(Token token)
      public Token jjtGetLastToken()
      public void jjtSetLastToken(Token token)

The first and last token for each node will be set up automatically when the parser is run.
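
One way the first and last tokens might be used is to rebuild the text a node covers by following the tokens' "next" links. This is a sketch only: it uses a minimal stand-in for the generated Token class, the names TrackTokensSketch, sourceText and chain are assumptions, and joining images with a single space is a simplification because skipped whitespace is not stored in the tokens.

```java
/** Rebuilds the text between a node's first and last token (illustrative only). */
final class TrackTokensSketch {
    /** Minimal stand-in for the generated Token class (only the fields used here). */
    static final class Token {
        final String image;
        Token next;
        Token(String image) { this.image = image; }
    }

    /** Joins the token images from first through last (inclusive) with single spaces. */
    static String sourceText(Token first, Token last) {
        StringBuilder text = new StringBuilder();
        for (Token t = first; t != null; t = t.next) {
            if (text.length() > 0) {
                text.append(' ');
            }
            text.append(t.image);
            if (t == last) {
                break; // stop at the node's last token
            }
        }
        return text.toString();
    }

    /** Links the given images into a token chain and returns the tokens in order. */
    static Token[] chain(String... images) {
        Token[] tokens = new Token[images.length];
        for (int i = 0; i < images.length; i++) {
            tokens[i] = new Token(images[i]);
            if (i > 0) {
                tokens[i - 1].next = tokens[i];
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Token[] t = chain("CREATE", "VIEW", "V1", "AS", "SELECT");
        // In a real parser the two arguments would come from
        // jjtGetFirstToken() and jjtGetLastToken() of some node.
        System.out.println(sourceText(t[1], t[2])); // prints VIEW V1
    }
}
```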

4.2.6. The Life Cycle of a Node

A node goes through a well determined sequence of steps as it is built. This is that sequence viewed from the perspective of the node itself:

  1. the node's constructor is called with a unique integer parameter. This parameter identifies the kind of node and is especially useful in simple mode. JJTree automatically generates a file called XXXTreeConstants.java (where XXX is the parser name) that declares valid constants. The names of constants are derived by prepending JJT to the uppercase names of nodes, with dot symbols (".") replaced by underscore symbols ("_"). For convenience, an array of Strings called jjtNodeName[] that maps the constants to the unmodified names of nodes is maintained in the same file.
  2. the node's jjtOpen() method is called.
  3. if the option NODE_SCOPE_HOOK is set, the user-defined parser method jjtreeOpenNodeScope() is called and passed the node as its parameter. This method can initialize fields in the node or call its methods. For example, it might store the node's first token in the node.
  4. if an unhandled exception is thrown while the node is being parsed then the node is abandoned. JJTree will never refer to it again. It will not be closed, and the user-defined node scope hook jjtreeCloseNodeScope() will not be called with it as a parameter.
  5. otherwise, if the node is conditional and its conditional expression evaluates to false then the node is abandoned. It will not be closed, although the user-defined node scope hook jjtreeCloseNodeScope() might be called with it as a parameter.
  6. otherwise, all of the children of the node as specified by the integer expression of a definite node, or all the nodes that were pushed on the stack within a conditional node scope are added to the node. The order they are added is not specified.
  7. the node's jjtClose() method is called.
  8. the node is pushed on the stack.
  9. if the option NODE_SCOPE_HOOK is set, the user-defined parser method jjtreeCloseNodeScope() is called and passed the node as its parameter.
  10. if the node is not the root node, it is added as a child of another node and its jjtSetParent() method is called.

4.2.7. Visitor Support

JJTree provides some basic support for the visitor design pattern. If the VISITOR option is set to true, JJTree will insert a jjtAccept() method into all of the node classes it generates, and also generate a visitor interface that can be implemented and passed to the nodes to accept.

The name of the visitor interface is constructed by appending Visitor to the name of the parser. The interface is regenerated every time that JJTree is run, so that it accurately represents the set of nodes used by the parser. This will cause compile time errors if the implementation class has not been updated for the new nodes. This is a feature.

4.2.8. Options

JJTree supports the following options on the command line and in the JavaCC options statement:

BUILD_NODE_FILES (default: true)
Generate sample implementations for SimpleNode and any other nodes used in the grammar.
MULTI (default: false)
Generate a multi mode parse tree. The default for this is false, generating a simple mode parse tree.
NODE_DEFAULT_VOID (default: false)
Instead of making each non-decorated production an indefinite node, make it void instead.
NODE_CLASS (default: "")
If set defines the name of a user-supplied class that will extend SimpleNode. Any tree nodes created will then be subclasses of NODE_CLASS.
NODE_FACTORY (default: "")
Specify a class containing a factory method with following signature to construct nodes:
public static Node jjtCreate(int id)
For backwards compatibility, the value false may also be specified, meaning that SimpleNode will be used as the factory class.
NODE_PACKAGE (default: "")
The package to generate the node classes into. The default for this is the parser package.
NODE_EXTENDS (default: "") Deprecated
The superclass for the SimpleNode class. By providing a custom superclass you may be able to avoid the need to edit the generated SimpleNode.java. See the examples/Interpreter for an example usage.
NODE_PREFIX (default: "AST")
The prefix used to construct node class names from node identifiers in multi mode. The default for this is AST.
NODE_SCOPE_HOOK (default: false)
Insert calls to user-defined parser methods on entry and exit of every node scope. See Node Scope Hooks above.
NODE_USES_PARSER (default: false)
JJTree will use an alternate form of the node construction routines where it passes the parser object in. For example,

    public static Node MyNode.jjtCreate(MyParser p, int id);
    MyNode(MyParser p, int id);

TRACK_TOKENS (default: false)
Insert jjtGetFirstToken(), jjtSetFirstToken(), jjtGetLastToken(), and jjtSetLastToken() methods in SimpleNode. The FirstToken is automatically set up on entry to a node scope; the LastToken is automatically set up on exit from a node scope.
STATIC (default: true)
Generate code for a static parser. The default for this is true. This must be used consistently with the equivalent JavaCC options. The value of this option is emitted in the JavaCC source.
VISITOR (default: false)
Insert a jjtAccept() method in the node classes, and generate a visitor implementation with an entry for every node type used in the grammar.
VISITOR_DATA_TYPE (default: "Object")
If this option is set, it is used in the signature of the generated jjtAccept() methods and the visit() methods as the type of the data argument.
VISITOR_RETURN_TYPE (default: "Object")
If this option is set, it is used in the signature of the generated jjtAccept() methods and the visit() methods as the return type of the method.
VISITOR_EXCEPTION (default: "")
If this option is set, it is used in the signature of the generated jjtAccept() methods and the visit() methods.
JJTREE_OUTPUT_DIRECTORY (default: use value of OUTPUT_DIRECTORY)
By default, JJTree generates its output in the directory specified in the global OUTPUT_DIRECTORY setting. Explicitly setting this option allows the user to separate the parser from the tree files.

4.2.9. JJTree state

JJTree keeps its state in a parser class field called jjtree. You can use methods in this member to manipulate the node stack.

    final class JJTreeState {
      /* Call this to reinitialize the node stack.  */
      void reset();

      /* Return the root node of the AST. */
      Node rootNode();

      /* Determine whether the current node was actually closed and
	 pushed */
      boolean nodeCreated();

      /* Return the number of nodes currently pushed on the node
         stack in the current node scope. */
      int arity();

      /* Push a node on to the stack. */
      void pushNode(Node n);

      /* Return the node on the top of the stack, and remove it from the
	 stack.  */
      Node popNode();

      /* Return the node currently on the top of the stack. */
      Node peekNode();
    }

4.2.10. Node Objects

    /* All AST nodes must implement this interface.  It provides basic
       machinery for constructing the parent and child relationships
       between nodes. */

    public interface Node {
      /** This method is called after the node has been made the current
          node.  It indicates that child nodes can now be added to it. */
      public void jjtOpen();

      /** This method is called after all the child nodes have been
          added. */
      public void jjtClose();

      /** This pair of methods are used to inform the node of its
          parent. */
      public void jjtSetParent(Node n);
      public Node jjtGetParent();

      /** This method tells the node to add its argument to the node's
          list of children.  */
      public void jjtAddChild(Node n, int i);

      /** This method returns a child node.  The children are numbered
          from zero, left to right. */
      public Node jjtGetChild(int i);

      /** Return the number of children the node has. */
      int jjtGetNumChildren();
    }

The class SimpleNode implements the Node interface, and is automatically generated by JJTree if it doesn't already exist. You can use this class as a template or superclass for your node implementations, or you can modify it to suit. SimpleNode additionally provides a rudimentary mechanism for recursively dumping the node and its children. You might use this in an action like this:

    {
        ((SimpleNode)jjtree.rootNode()).dump(">");
    }

The String parameter to dump() is used as padding to indicate the tree hierarchy.
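
To make the role of the padding string concrete, here is a small self-contained sketch of a recursive dump. DumpNode and DumpDemo are hypothetical stand-ins written for this example, not the generated SimpleNode. The sketch also assumes one extra space of padding per level and collects its output in a StringBuilder rather than printing directly, purely so the result is easy to inspect.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for SimpleNode, reduced to what a dump needs. */
class DumpNode {
    private final String name;
    private final List<DumpNode> children = new ArrayList<>();

    DumpNode(String name) { this.name = name; }

    DumpNode add(DumpNode child) {
        children.add(child);
        return this;
    }

    /** Write this node with the given prefix, then each child with a longer prefix. */
    void dump(String prefix, StringBuilder out) {
        out.append(prefix).append(name).append('\n');
        for (DumpNode child : children) {
            child.dump(prefix + " ", out);  // one extra space per tree level (assumption)
        }
    }
}

class DumpDemo {
    static String render() {
        DumpNode root = new DumpNode("CreateViewStatement")
            .add(new DumpNode("SelectStatment"));
        StringBuilder out = new StringBuilder();
        root.dump(">", out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(render());
    }
}
```

In this sketch the root line starts with ">" and the child line with "> ", so the depth of a node can be read from the length of its prefix.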

Another utility method is generated if the VISITOR option is set:

    {
        public void childrenAccept(MyParserVisitor visitor);
    }

This walks over the node's children in turn, asking them to accept the visitor. This can be useful when implementing preorder and postorder traversals.
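
The difference between the two traversal orders comes down to where the visitor calls childrenAccept(). The sketch below shows this with hypothetical stand-in types (DemoVisitor, VisitNode, PreorderVisitor, PostorderVisitor, TraversalDemo), not the classes JJTree generates. The generated methods typically also carry a data argument and a return value, which are left out here to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical visitor interface; the generated one is named after the parser. */
interface DemoVisitor {
    void visit(VisitNode node);
}

/** Hypothetical node with the two methods that matter for traversal. */
class VisitNode {
    final String name;
    private final List<VisitNode> children = new ArrayList<>();

    VisitNode(String name, VisitNode... kids) {
        this.name = name;
        for (VisitNode kid : kids) {
            children.add(kid);
        }
    }

    void accept(DemoVisitor v) { v.visit(this); }

    /** Ask each child, left to right, to accept the visitor. */
    void childrenAccept(DemoVisitor v) {
        for (VisitNode child : children) {
            child.accept(v);
        }
    }
}

class PreorderVisitor implements DemoVisitor {
    final List<String> order = new ArrayList<>();
    public void visit(VisitNode node) {
        order.add(node.name);        // handle the node first
        node.childrenAccept(this);   // then descend into its children
    }
}

class PostorderVisitor implements DemoVisitor {
    final List<String> order = new ArrayList<>();
    public void visit(VisitNode node) {
        node.childrenAccept(this);   // descend first
        order.add(node.name);        // handle the node afterwards
    }
}

class TraversalDemo {
    static VisitNode sample() {
        return new VisitNode("Add",
            new VisitNode("1"),
            new VisitNode("Mul", new VisitNode("2"), new VisitNode("3")));
    }

    static String preorder() {
        PreorderVisitor v = new PreorderVisitor();
        sample().accept(v);
        return String.join(",", v.order);
    }

    static String postorder() {
        PostorderVisitor v = new PostorderVisitor();
        sample().accept(v);
        return String.join(",", v.order);
    }

    public static void main(String[] args) {
        System.out.println(preorder());   // Add,1,Mul,2,3
        System.out.println(postorder());  // 1,2,3,Mul,Add
    }
}
```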

4.2.11. Examples

JJTree is distributed with some simple examples containing a grammar that parses arithmetic expressions. See the file examples/JJTreeExamples/README for further details.

There is also an interpreter for a simple language that uses JJTree to build the program representation. See the file examples/Interpreter/README for more information.

Information about an example using the visitor support is in examples/VTransformer/README.

5. A Simple JavaCC Example

As an example, we recognize a simplified form of the DB2 CREATE VIEW statement:
CREATE VIEW V1 AS SELECT COL1 FROM T1 WITH CASCADED/LOCAL CHECK OPTION;

Step 1: Create the grammar file sql.jj

$ cat sql.jj

options {
  IGNORE_CASE=true;
}

PARSER_BEGIN(SQLParser)

public class SQLParser {

  /** Main entry point. */
  public static void main(String args[]) throws ParseException {
    SQLParser parser = new SQLParser(System.in);
    parser.CreateViewStatement();
  }
}

PARSER_END(SQLParser)

/* WHITE SPACE */
SKIP :
{
  " "
| "\t"
| "\n"
| "\r"
}

TOKEN:
{
  <CREATE: "create">
| <VIEW: "view">
| <AS: "as">
| <SELECT: "select">
| <FROM: "from">
| <WITH: "with">
| <CASCADED: "cascaded">
| <LOCAL: "local">
| <CHECK: "check">
| <OPTION: "option">
}

TOKEN :
{
 < SEMICOLON: ";" >
}

TOKEN :
{
  < Id: ["a"-"z","A"-"Z"] ( ["a"-"z","A"-"Z","0"-"9"] )* >
}

void CreateViewStatement() :
{}
{
  <CREATE> <VIEW> <Id> <AS> SelectStatment()
  [ <WITH> [ ( <CASCADED> | <LOCAL> ) ] <CHECK> <OPTION> ]
  <SEMICOLON>
  <EOF>
}

void SelectStatment() :
{}
{
  <SELECT> <Id> <FROM> <Id>
}

Step 2: Test the grammar file

$ javacc sql.jj
$ javac *.java
$ java SQLParser

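Enter a valid CREATE VIEW statement, for example: CREATE VIEW V1 AS SELECT COL1 FROM T1 WITH CASCADED CHECK OPTION;
Because the grammar ends with <EOF>, signal end of input afterwards (Ctrl-D on most Unix-like terminals).
The program accepts this input: it exits without throwing a ParseException.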
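
To make the structure of the grammar concrete, here is a hand-written Java recognizer that mirrors the two productions of sql.jj one method at a time. It is a sketch for reading the grammar, not the code JavaCC generates: the class CreateViewRecognizer and its methods are hypothetical, the lexer is a single regular expression, and errors are reported with IllegalArgumentException instead of ParseException.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hand-written recognizer mirroring sql.jj; NOT the code JavaCC generates. */
class CreateViewRecognizer {
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
        "create", "view", "as", "select", "from",
        "with", "cascaded", "local", "check", "option"));
    // Same shapes as the TOKEN rules: an identifier-like word, or ";".
    private static final Pattern TOKEN = Pattern.compile("[A-Za-z][A-Za-z0-9]*|;");

    private final List<String> kinds = new ArrayList<>();
    private int pos = 0;

    private CreateViewRecognizer(String input) {
        String rest = input.trim();                 // plays the role of the SKIP rule
        while (!rest.isEmpty()) {
            Matcher m = TOKEN.matcher(rest);
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Lexical error near: " + rest);
            }
            kinds.add(classify(m.group()));
            rest = rest.substring(m.end()).trim();
        }
    }

    /** Keywords win over Id, and case is ignored (IGNORE_CASE=true). */
    private static String classify(String image) {
        if (image.equals(";")) {
            return ";";
        }
        String lower = image.toLowerCase(Locale.ROOT);
        return KEYWORDS.contains(lower) ? lower.toUpperCase(Locale.ROOT) : "ID";
    }

    private boolean peek(String kind) {
        return pos < kinds.size() && kinds.get(pos).equals(kind);
    }

    private void expect(String kind) {
        if (!peek(kind)) {
            throw new IllegalArgumentException("Expected " + kind + " at token " + pos);
        }
        pos++;
    }

    private void createViewStatement() {
        expect("CREATE"); expect("VIEW"); expect("ID"); expect("AS");
        selectStatement();
        if (peek("WITH")) {                         // [ <WITH> ... <CHECK> <OPTION> ]
            pos++;
            if (peek("CASCADED") || peek("LOCAL")) {
                pos++;                              // [ ( <CASCADED> | <LOCAL> ) ]
            }
            expect("CHECK"); expect("OPTION");
        }
        expect(";");
        if (pos != kinds.size()) {                  // <EOF>
            throw new IllegalArgumentException("Expected end of input");
        }
    }

    private void selectStatement() {
        expect("SELECT"); expect("ID"); expect("FROM"); expect("ID");
    }

    static boolean accepts(String input) {
        try {
            new CreateViewRecognizer(input).createViewStatement();
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts(
            "CREATE VIEW V1 AS SELECT COL1 FROM T1 WITH CASCADED CHECK OPTION;"));
    }
}
```

Each optional group in the grammar becomes an if on the next token kind, which is the one-token lookahead decision that a top-down parser makes at such a choice point.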

Step 3: Generate a tree with jjtree
First, rename sql.jj to sql.jjt.

Step 3.1: Change the CreateViewStatement() production as follows.

void CreateViewStatement() :
{}
{
  <CREATE> <VIEW> <Id> <AS> SelectStatment()
  [ <WITH> [ ( <CASCADED> | <LOCAL> ) ] <CHECK> <OPTION> ]
  <SEMICOLON>
  <EOF>
}

------ change to ----->

SimpleNode CreateViewStatement() :
{}
{
  <CREATE> <VIEW> <Id> <AS> SelectStatment()
  [ <WITH> [ ( <CASCADED> | <LOCAL> ) ] <CHECK> <OPTION> ]
  <SEMICOLON>
  <EOF>
  { return jjtThis; }
}

Step 3.2: Change the main function as follows.

  public static void main(String args[]) throws ParseException {
    SQLParser parser = new SQLParser(System.in);
    parser.CreateViewStatement();
  }

------ change to ----->

  public static void main(String args[]) throws ParseException {
    SQLParser parser = new SQLParser(System.in);
    SimpleNode node = parser.CreateViewStatement();
    node.dump("");
  }

Step 4: Test the modified file, that is, test node.dump("")

$ jjtree sql.jjt
$ javacc sql.jj
$ javac *.java
$ java SQLParser

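Step 5: Use a visitor
jjtree provides basic support for the Visitor pattern.
Set VISITOR = true; in the options of the *.jjt file.
After jjtree has processed the file, every generated node class has a jjtAccept method.
jjtree also generates a visitor interface file XXXVisitor (XXX is the parser name, so here SQLParserVisitor); users can implement this interface to "visit" the nodes.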

See the example shipped with JavaCC: examples/VTransformer

Author: cig01

Created: <2014-11-02 Sun>

Last updated: <2017-12-13 Wed>
