It is recommended that, when a parser is generated, the -trace 3 option is given, which will print all the LR states that are created. This may be helpful when training the parser. Each time the parser is regenerated the states may have different numbers identifying them, even if the grammar is unchanged.
With a newly generated parser, when a parser error is encountered, the line and column numbers are printed and an "unexpected symbol" error message is given. To print more helpful error messages, the parser can be trained interactively. Interactive training also produces a script for future, automatic retraining when a new parser is generated.
As an example, one can use the crate created from calcauto.grammar in Chapter 2.
Modify main.rs by changing the call to parse_with to
let tree4= calcautoparser::parse_train_with(&mut parser4,"src/calcautoparser.rs");
For input with no errors, this call works the same way as parse_with. The parse_train_with function takes a path to a copy of the parser being trained (training the copy that is currently in use this way is not recommended).
Running cargo run will lead to a training session such as the following, depending on user input:
PARSER ERROR: unexpected symbol ( on line 2, column 2 ..
>>>TRAINER: if this message is not adequate (for state 1), enter a replacement (default no change): missing an operator symbol such as *
>>>TRAINER: should this message be given for all unexpected symbols in the current state? (default yes) no
PARSER ERROR: unexpected symbol % on line 3, column 2 ..
>>>TRAINER: if this message is not adequate (for state 1), enter a replacement (default no change): this symbol is not recognized as a valid operator in this language
Expression tree from parse: Seq([Minus(Negative(Val(5)), Times(Minus(Val(4), Val(2)), Val(5))), Minus(Minus(Val(5), Val(7)), Negative(Val(9))), Minus(Times(Val(4), Val(3)), Val(9)), Plus(Val(2), Divide(Val(1), Minus(Minus(Val(2), Val(1)), Val(1)))), Letexp("x", Val(10), Plus(Val(2), Var("x"))), Letexp("x", Val(1), Plus(Plus(Var("x"), Letexp("x", Val(10), Plus(Var("x"), Var("x")))), Var("x"))), Plus(Letexp("x", Val(2), Plus(Var("x"), Var("x"))), Var("x")), Plus(Letexp("x", Val(4), Divide(Var("x"), Val(2))), Letexp("x", Val(10), Times(Var("x"), Letexp("y", Val(100), Divide(Var("y"), Var("x"))))))])
---------------------------------------
result for line 1: -15 ;
result for line 4: 7 ;
result for line 5: 3 ;
Division by zero (expression starting at column 5) on line 6 of Val(1) at column 3 ... Error evaluating line 6;
result for line 7: 12 ;
result for line 8: 22 ;
UNBOUND VARIABLE x ... Error evaluating line 9;
result for line 10: 102 ;
Parser error, best effort after recovery: Some(102)
Notice that error recovery was effective and the parser still produced a usable parse tree; however, the parser's error_occurred flag will be set. Whether future editions of Rustlr will also allow the error-recovery strategy to be trained in the same way is still under consideration. For now, only a fixed number of strategies are available. In the opinion of the author, the resync technique is the simplest and most effective.
If the augmented parser is used on the same input, it will display the trained message in addition to "unexpected symbol...".
You can see how training augments the LR state transition table by examining the load_extras function at the end of the generated parser:
fn load_extras(parser:&mut RuntimeParser<Expr,Expr>)
{
  parser.RSM[1].insert("(",Stateaction::Error("missing an operator symbol such as *"));
  parser.RSM[1].insert("ANY_ERROR",Stateaction::Error("this symbol is not recognized as a valid operator in this language"));
}//end of load_extras: don't change this line as it affects augmentation
When the "unexpected symbol" is recognized as a declared symbol of the grammar, the trainer is given the option of entering the error message for either just that symbol or all unexpected symbols in the same state. If the latter is chosen, an entry is created for the reserved ANY_ERROR symbol. If the unexpected symbol is not recognized as a terminal symbol of the grammar, an ANY_ERROR entry is always created. You can see the contents of "state 1" if you generated the parser with the -trace 3 option. You will of course have to understand the LR parsing algorithm to make use of the information.
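The rule for choosing between a symbol-specific entry and an ANY_ERROR entry can be summarized in a few lines of Rust. This is only a sketch of the decision as described above, not Rustlr's actual code: the function entry_key and its parameters is_terminal and apply_to_all are hypothetical names introduced for illustration.

```rust
/// Reserved key for a message that applies to every unexpected symbol in a state.
const ANY_ERROR: &str = "ANY_ERROR";

/// Hypothetical helper (not part of Rustlr): picks the key under which a
/// trained message would be stored for the current state.
/// `is_terminal`: the unexpected symbol is a declared terminal of the grammar.
/// `apply_to_all`: the trainer asked for the message to cover all unexpected
/// symbols in the current state.
fn entry_key<'a>(symbol: &'a str, is_terminal: bool, apply_to_all: bool) -> &'a str {
    if is_terminal && !apply_to_all {
        symbol // message applies to this symbol only
    } else {
        ANY_ERROR // unrecognized symbol, or trainer chose "all symbols"
    }
}
```

In the session above, "(" is a declared symbol and the trainer answered "no", so the key is "("; "%" is not a symbol of the grammar, so the key is ANY_ERROR.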
When the modified parser runs and encounters another unexpected symbol in the same state, it will first see if there is an entry for that symbol; if none exists, it will look for an ANY_ERROR entry for a message to display. Thus the two entries do not conflict with each other.
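The lookup order just described can be sketched as follows. This is an illustration under stated assumptions rather than Rustlr's implementation: a plain HashMap stands in for the entries of one LR state, and the names StateMessages, trained_message and state_one are hypothetical.

```rust
use std::collections::HashMap;

const ANY_ERROR: &str = "ANY_ERROR";

/// Hypothetical stand-in for the trained error entries of a single LR state.
type StateMessages = HashMap<&'static str, &'static str>;

/// Looks for an entry for the specific symbol first, then falls back to the
/// ANY_ERROR entry; returns None if the state has no trained message at all.
fn trained_message<'a>(state: &'a StateMessages, symbol: &str) -> Option<&'a str> {
    state
        .get(symbol)
        .or_else(|| state.get(ANY_ERROR))
        .copied()
}

/// Builds the two entries that the training session above added to state 1.
fn state_one() -> StateMessages {
    let mut state = StateMessages::new();
    state.insert("(", "missing an operator symbol such as *");
    state.insert(
        ANY_ERROR,
        "this symbol is not recognized as a valid operator in this language",
    );
    state
}
```

With these entries, looking up "(" yields the operator message, while any other unexpected symbol such as "%" falls through to the ANY_ERROR message.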
The interactive session also generated a script file, which would be called "calc4parser.rs_script.txt", with the following contents:
# Rustlr training script for calc4parser.rs
2 2 ( ::: missing an operator symbol such as *
3 2 ANY_ERROR ::: this symbol is not recognized as a valid operator in this language
This script can be used with the train_from_script function to retrain a newly generated parser (with different state numbers), provided that the same input from the original training session is used. The line and column numbers where the errors are expected are recorded in the script. Please note that training from a script has not yet been tested on a large scale.
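To make the script layout concrete, here is a minimal sketch of how one line of it could be read. The format is inferred only from the sample above (line number, column number, symbol, then ":::" and the message; lines beginning with "#" are treated as comments). The ScriptEntry type and parse_script_line function are hypothetical and are not how train_from_script is actually implemented.

```rust
/// One entry of a training script, as inferred from the sample script.
#[derive(Debug, PartialEq)]
struct ScriptEntry {
    line: usize,     // input line on which the error is expected
    column: usize,   // input column on which the error is expected
    symbol: String,  // the unexpected symbol, or ANY_ERROR
    message: String, // the trained error message
}

/// Parses one script line. Returns None for comment lines, blank lines,
/// and lines that do not match the assumed layout.
fn parse_script_line(text: &str) -> Option<ScriptEntry> {
    let text = text.trim();
    if text.is_empty() || text.starts_with('#') {
        return None;
    }
    // Everything before the first ":::" is the position and symbol.
    let (header, message) = text.split_once(":::")?;
    let mut fields = header.split_whitespace();
    let line = fields.next()?.parse().ok()?;
    let column = fields.next()?.parse().ok()?;
    let symbol = fields.next()?.to_string();
    Some(ScriptEntry {
        line,
        column,
        symbol,
        message: message.trim().to_string(),
    })
}
```

Because each entry is keyed by input position rather than by state number, the same script stays meaningful after the parser is regenerated and its states are renumbered.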