Just as JFlex generates lexers, Jacc generates parsers, but what’s the difference? A lexer can recognize words and a parser can recognize whole sentences, or more formally, use lexers to work with regular grammars, and parsers to work with context-free grammars.
This is the reason the two are often used together — you first use a lexer to recognize words and pass those words to a parser which is able to determine if the words form a valid sentence.
Now, Java being Java, there is a choice of parser generators out there: Antlr is ubiquitous, but there are also CoCo/R, JavaCC, SableCC, Cup, Byacc/J and probably many others. Even the venerable Bison is capable of generating Java parsers. Some parsers, like Antlr, CoCo/R and JavaCC don’t even need a separate lexer to feed them words — they can generate one of their own!
So why Jacc? Why not?
Ok so how JFlex and Jacc work together:
JFlex reads the input as a stream of characters and produces a token for Jacc when Jacc asks for one. A token is a string with a meaning. For instance, +, true and 3.14 are all tokens — some of them don’t really need a value aside from their type: true is a Boolean literal, but some of them do: 3.14 is an integer literal with the value of 3.14.
As with JFlex, a Jacc file has three distinct sections:
directives section
%%
rules section
%%
additional code section
Enter fullscreen mode Exit fullscreen mode
Jacc creates a list of all the tokens it expects in a separate file. You specify the file and the token list like this:
%interface ParserTokens
%token X NL
Enter fullscreen mode Exit fullscreen mode
Let’s look at the generated file:
// Output created by jacc on Mon Mar 11 09:54:05 CET 2019
interface ParserTokens {
int ENDINPUT = 0;
int NL = 1;
int X = 2;
int error = 3;
}
Enter fullscreen mode Exit fullscreen mode
It is an interface that doubles as an enumeration. Apart from the two token types we asked for, NL and X, two more are created: one for the end of input and one for errors. Back in the JFlex file you “implement” this “interface”:
%class Lexer
%implements ParserTokens
Enter fullscreen mode Exit fullscreen mode
just so our lexer could see the ENDINPUT, NL, X AND error constants. There are a few more things Jacc expects:
- A function that returns integer values that represent token types (0, 1, or 3 in our example). Naming that function yylex is a tradition, so let’s do that:
%function yylex
%int
Enter fullscreen mode Exit fullscreen mode
- Three more functions: getToken to get the current token code, nextToken to read the next token code and getSemantic to get the current token value:
%{
private int token;
private String semantic;
public int getToken()
{
return token;
}
public String getSemantic()
{
return semantic;
}
public int nextToken()
{
try
{
token = yylex();
}
catch (java.io.IOException e)
{
System.out.println(
"IO exception occured:\n" + e);
}
return token;
}
%}
Enter fullscreen mode Exit fullscreen mode
You may notice that we decided to make the token semantic value a String so we also need to indicate that in the parser:
%semantic String
Enter fullscreen mode Exit fullscreen mode
For our example we’ll just have the lexer recognize the following words: word, Word, wOrd, worD, … , wORD and WORD and new lines. We’ll ignore whitespace.
x = [wW][oO][rR][dD]
nl = \n | \r | \r\n
space = [\t]
%%
Enter fullscreen mode Exit fullscreen mode
When the lexer finds a word it will return the X token (ie. 2 from the ParserTokens interface) with a value of word, Word,… — this is what semantic = yytext(); does. When it encounters a new line it will return ENDINPUT (ie. 0):
{x} { semantic = yytext(); return X; }
{space} { /\* Ignore space \*/ }
{nl} { return ENDINPUT; }
[^] { System.out.println("Error?"); }
Enter fullscreen mode Exit fullscreen mode
That’s it. Now the parser “grammar” in all its glory:
sentence : X { System.out.println("X found: " + $1); }
| sentence X { System.out.println("X found: " + $2); }
;
Enter fullscreen mode Exit fullscreen mode
The grammar is left-recursive, which allows us to have one or more X words in a sentence. $1 and $2 will hold the X semantic value, passed there by the lexer, and we’ll simply print it out.
In the main method we create Lexer and Parser instances and start parsing. One thing to note here is that we have to “prime” the lexer with
parser.lexer.nextToken();
Enter fullscreen mode Exit fullscreen mode
before the parser can use it. For reference, here is the lexer full source
import java.io.\*;
%%
%class Lexer
%implements ParserTokens
%function yylex
%int
%{
private int token;
private String semantic;
public int getToken()
{
return token;
}
public String getSemantic()
{
return semantic;
}
public int nextToken()
{
try
{
token = yylex();
}
catch (java.io.IOException e)
{
System.out.println(
"IO exception occured:\n" + e);
}
return token;
}
%}
x = [wW][oO][rR][dD]
nl = \n | \r | \r\n
space = [\t]
%%
{x} { semantic = yytext(); return X; }
{space} { /\* Ignore space \*/ }
{nl} { return ENDINPUT; }
[^] { System.out.println("Error?"); }
Enter fullscreen mode Exit fullscreen mode
and that of the parser:
%{
import java.io.\*;
%}
%class Parser
%interface ParserTokens
%semantic String
%token X NL
%%
sentence : X { System.out.println("X found: " + $1); }
| sentence X { System.out.println("X found: " + $2); }
;
%%
private Lexer lexer;
public Parser(Reader reader)
{
lexer = new Lexer(reader);
}
public void yyerror(String error)
{
System.err.println("Error: " + error);
}
public static void main(String args[]) throws IOException
{
System.out.println("Interactive evaluation:");
Parser parser = new Parser(
new InputStreamReader(System.in));
parser.lexer.nextToken();
parser.parse();
}
Enter fullscreen mode Exit fullscreen mode
You need to compile Lexer.flex
jflex lexer.flex
Enter fullscreen mode Exit fullscreen mode
and Parser.jacc
jacc parser.jacc
Enter fullscreen mode Exit fullscreen mode
and the three generated Java files ( Lexer.java , Parser.java and ParserTokens.java ):
javac \*.java
Enter fullscreen mode Exit fullscreen mode
to finally be able to run the parser:
java Parser
Enter fullscreen mode Exit fullscreen mode
Here is an example terminal session:
Interactive evaluation:
word Word wOrD WORD
X found: word
X found: Word
X found: wOrD
X found: WORD
Enter fullscreen mode Exit fullscreen mode
You can find the full source code on Github.
Actually this was pretty boring, but we are now free to play with context-free grammars!
暂无评论内容