The AST is constructed using a scanner and parser generator of the
implementor's choice with calls to the KHEPERA library AST construction
routines. At the level of the scanner, KHEPERA provides support for
source code line number and token offset tracking. This support is
optional, but is very helpful for debugging. If the implementor desires
line number and token offset tracking, the scanner must interact with
KHEPERA in three ways: first, each line of source code must be registered.
In versions of lex that support states, providing this information is
trivial (although inefficient), as show in Figure 5. For other scanner
generators, or if scanning efficiency is of great concern, other techniques
can be used. The routine src_line
stores a copy of the line using
low-level string-handling support. While the routines used in these
examples are tailored for lex semantics, the routines are generally
wrapper routines for lower-level KHEPERA functions and would, therefore,
be easy to implement for other front-end tools.
KHEPERA also handles interpretation of line number information generated by the C preprocessor. This requires a simple lex action:
^#\ .* src_cpp_line(yytext, yyleng);
Finally, every scanner action must advance a pointer to the current
position on the current line. This is accomplished by having every action
make a call to src_get(yyleng)
, a minor inconvenience that can be
encapsulated in a macro.
The productions in the parser need only call KHEPERA tree-building
routines--all other work can be reserved for later tree walking. This
tends to simplify the parser description file, and allows the implementor
to concentrate on parsing issues during this phase of development. A few
example yacc productions are shown in Figure 6. The second argument to
tre_mk
is a pointer to the (optional) source position information
obtained during scanning. The abstract representation of the constructed
AST is that of an n-ary tree, and routines are available to walk the
tree using this viewpoint.
Immediately after the parsing phase, the AST is available for printing. Without any pretty-printer description, the AST is printed as a nested S-expression, as shown in Figure 7.