CI-37 Syntactic analysis Input: token sequence Tasks: Parsing: construct derivation according to concrete syntax, Tree construction according to abstract syntax, Error handling (detection, message, recovery) Result: abstract program tree Compiler module parser: deterministic stack automaton, augmented by actions for tree construction top-down parsers: leftmost derivation; tree construction top-down or bottom-up bottom-up parsers: rightmost derivation backwards; tree construction bottom-up Abstract program tree (condensed derivation tree): represented by a data structure in memory for the translation phase to operate on, linear sequence of nodes on a file (costly in runtime), sequence of calls of functions of the translation phase. (c) 2001 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Compiler I WS 2001/2002 / Slide 37 Objectives: Relation between parsing and tree construction In the lecture: * Explain the tasks, use example on CI-16. * Sources of prerequisites: * context-free grammars: "Grundlagen der Programmiersprachen (2nd Semester), or "Berechenbarkeit und formale Sprachen" (3rd Semester), * Tree representation in prefix form, postfix form: "Modellierung" (st Semester); see CI-5. Suggested reading: Kastens / Übersetzerbau, Section 4.1 -------------------------------------------------------------------------------- CI-38 Concrete and abstract syntax concrete syntax abstract syntax context-free grammar context-free grammar defines the structure of source programs defines abstract program trees unambigous usually ambiguous specifies derivation and parser translation phase is based on it parser actions specify the ---> tree construction some chain productions only for syntactic purposekeep only semantically relevant ones Expr ::= Fact have no action no node created symbols of syntactic chain productions comprised in symbol classes Exp={Expr,Fact} same action at structural equivalent productions: Expr ::= Expr AddOpr Fact &BinEx Fact ::= Fact MulOpr Opd &BinEx terminal symbols keep only semantically relevant ones as tree nodes given the concrete syntax and the symbol classes the actions and the abstract syntax can be generated (c) 2001 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Compiler I WS 2001/2002 / Slide 38 Objectives: Distinguish roles and properties of concrete and abstract syntax In the lecture: * Use the expression grammar of CI-39, CI-40 for comparison. * Construct abstract syntax systematically. * Context-free grammars specify trees - not only strings! Is also used in software engineering to specify interfaces. Suggested reading: Kastens / Übersetzerbau, Section 4.1 Assignments: * Generate abstract syntaxes from concrete syntaxes and symbol classes. * Use Eli for that task. Exercise 10 Questions: * Why is no information lost, when an expression is represented by an abstract program tree? * Give examples for semantically irrelevant chain productions outside of expressions. * Explain: XML-based languages are defined by context-free grammars. Their sentences are textual representations of trees. -------------------------------------------------------------------------------- CI-39 Example: concrete expression grammar name production action p1: Expr ::= Expr AddOpr Fact BinEx p2: Expr ::= Fact p3: Fact ::= Fact MulOpr Opd BinEx p4: Fact ::= Opd p5: Opd ::= quotesingle (quotesingle Expr quotesingle )quotesingle Expr p6: Opd ::= Ident IdEx p2 p7: AddOpr ::= quotesingle +quotesingle PlusOpr Fact p8: AddOpr ::= quotesingle -quotesingle MinusOpr p3 p9: MulOpr ::= quotesingle *quotesingle TimesOpr Fact MulOpr Opd p10: MulOpr ::= quotesingle /quotesingle DivOpr p4 p9 p5 Opd * ( Expr ) p6 p1 a Expr AddOpr Fact p2 p7 p4 Fact + Opd p4 p6 derivation tree for a * (b + c) Opd c p6 b (c) 2001 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Compiler I WS 2001/2002 / Slide 39 Objectives: Illustrate comparison of concrete and abstract syntax In the lecture: * Repeat concepts of "GdP" (slide GdP-2.5): Grammar expresses operator precedences and associativity. * The derivation tree is constructed by the parser - not necessarily stored as a data structure. * Chain productions have only one non-terminal symbol on their right-hand side. Suggested reading: Kastens / Übersetzerbau, Section 4.1 Suggested reading: slide GdP-2.5 Questions: * How does a grammar express operator precedences and associativity? * What is the purpose of the chain productions in this example. * What other purposes can chain productions serve? -------------------------------------------------------------------------------- CI-40 Example: abstract expression grammar name production BinEx: Exp ::= Exp BinOpr Exp IdEx: Exp ::= Ident PlusOpr: BinOpr ::= quotesingle +quotesingle MinusOpr: BinOpr ::= quotesingle -quotesingle TimesOpr: BinOpr ::= quotesingle *quotesingle abstract program tree for a * (b + c) DivOpr: BinOpr ::= quotesingle /quotesingle Exp BinEx Exp BinOpr Exp IdEx TimesOpr BinEx a Exp BinOpr Exp * IdEx PlusOpr IdEx b + c symbol classes: Exp = { Expr, Fact, Opd }, BinOpr = { AddOpr, MulOpr } Actions of the concrete syntax: productions of the abstract syntax to create tree nodes for no action at a concrete chain production: no tree node is created (c) 2001 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Compiler I WS 2001/2002 / Slide 40 Objectives: Illustrate comparison of concrete and abstract syntax In the lecture: * Repeat concepts of "GdP" (slide GdP-2.9): * Compare grammars and trees. * Actions create nodes of the abstract program tree. * Symbol classes shrink node pairs that represent chain productions into one node Suggested reading: Kastens / Übersetzerbau, Section 4.1 Suggested reading: slide GdP-2.9 Questions: * Is this abstract grammar unambiguous? * Why is that irrelevant? -------------------------------------------------------------------------------- CI-41 Recursive descent parser top-down (construction of the derivation tree), predictive method Sytematic transformation of a context-free grammar into a set of functions: non-terminal symbol X function X alternative productions for X branches in the function body decision set of production pi decision for branch pi non-terminal occurrence X ::= ... Y ... function call Y() terminal occurrence X ::= ... t ... accept a token t an read the next token Example: p1: Stmt ::= Variable quotesingle :=quotesingle Expr p2: Stmt ::= quotesingle whilequotesingle Expr quotesingle doquotesingle Stmt Function: void Stmt () { switch (CurrSymbol) { case decision set for p1: case decision set for p2: Variable(); accept(whileSym); accept(assignSym); Expr(); Expr(); accept(doSym); break; Stmt(); break; default: Fehlerbehandlung(); } } (c) 2001 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Compiler I WS 2001/2002 / Slide 41 Objectives: Understand the construction schema In the lecture: Explanation of the method: * Relate grammar constructs to function constructs. * Each function plays the role of an acceptor for a symbol. * accept function for reading and checking of the next token (scanner). * Computation of decision sets on CI-42. * Decision sets must be pairwise disjoint! Suggested reading: Kastens / Übersetzerbau, Section 4.2 Questions: * A parser algorithm is based on a stack automaton. Where is the stack of a recursive descent parser? What corresponds to the states of the stack automaton? * Where can actions be inserted into the functions to output production sequences in postfix or in prefix form? -------------------------------------------------------------------------------- CI-42 Grammar conditions for recursive descent A context-free grammar is strong LL(1), if for any pair of productions that have the same symbol on their left-hand sides, the decision sets are disjoint: productions: A ::= u A ::= v decision sets: First (u Follow(A)) intersection First (v Follow(A)) = emptyset First set and follow set: First (u) := { t element T | v element V* exists and a derivation u arrowdblright * t v } and epsilon element First (u) if u arrowdblright * epsilon exists Follow (A) := { t element T | u,v element V* exist, A element N and a derivation S arrowdblright * u A v such that t element First (v) } Example: production decision set p1: Prog ::= Block # begin non-terminal X p2: Block ::= begin Decls Stmts end begin First(X) Follow(X) p3: Decls ::= Decl ; Decls new p4: Decls ::= Ident begin Prog begin p5: Decls ::= new Ident new Block begin # ; end p6: Stmts ::= Stmts ; Stmt begin Ident Decls epsilon new Ident begin p7: Stmts ::= Stmt begin Ident Decl new ; p8: Stmt ::= Block begin Stmts begin Ident ; end p9: Stmt ::= Ident := Ident Ident Stmt begin Ident ; end (c) 2001 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Compiler I WS 2001/2002 / Slide 42 Objectives: Strong LL(1) can easily be checked In the lecture: * Explain the definitions using the example. * First set: set of terminal symbols, which may begin some token sequence that is derivable from u. * Follow set: set of terminal symbols, which may follow an A in some derivation. Suggested reading: Kastens / Übersetzerbau, Section 4.2, LL(k) conditions, computation of First sets and Follow sets Questions: The example grammar is not strong LL(1). * Show where the condition is violated. * Explain the reason for the violation. * What would happen if we constructed a recursive descent parser although the condition is violated? -------------------------------------------------------------------------------- CI-43 Grammar transformations for LL(1) Consequences of strong LL(1) condition: A strong LL(1) grammar can not have * alternative productions that begin with the same symbols * productions that are directly or indirectly left-recursive. Simple grammar transformations that keep the defined language invariant: * left-factorization: non-LL(1) productions transformed u, v, w element V* X element N does not occur in the A ::= v u A ::= v X original grammar A ::= v w X ::= u X ::= w * elimination of direct recursion : A ::= A u A ::= v X A ::= v X ::= u X X ::= EBNF constructs can avoid violation of strong LL(1) condition: for example repetition of u: A ::= v ( u )* w additional condition: First(u) intersection First(w Follow(A)) = emptyset branch in the function body: v while (CurrToken in First(u)) { u } w correspondingly for EBNF constructs u+, [u] (c) 2006 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Compiler I WS 2001/2002 / Slide 43 Objectives: Understand transformations and their need In the lecture: * Argue why strong LL(1) grammars can not have such productions. * Show why the transformations remove those problems. * Replacing left-recursion by right recursion would usually distort the structure. * There are more general rules for indirect recursion. * Show EBNF productions in recursive descent parsers. Questions: * Apply recursion elimination for expression grammars. * Write a strong LL(1) expression grammar using EBNF. -------------------------------------------------------------------------------- CI-44 Comparison: top-down vs. bottom-up Information a stack automata has when it decides to apply production A ::= x : top-down, predictive bottom-up leftmost derivation rightmost derivation backwards contents of A the stack A direction of u u v tree construction x x input input accepted k accepted k lookahead lookahead A bottom-up parser has seen more of the input when it decides to apply a production. Consequence: bottom-up parsers and their grammar classes are more powerful. (c) 2001 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Compiler I WS 2001/2002 / Slide 44 Objectives: Understand the decision basis of the automata In the lecture: Explain the meaning of the graphics: * role of the stack: contains states of the automaton, * accepted input: will not be considered again, * lookahead: the next k symbols, not yet accepted * leftmost derivation: leftmost non-terminal is derived next; rightmost correspondingly, * consequences for the direction of tree construction, Abbreviations * LL: (L)eft-to-right, (L)eftmost derivation, * LR: (L)eft-to-right, (R)ightmost derivation, * LALR: (L)ook(A)head LR Suggested reading: Kastens / Übersetzerbau, Section Text zu Abb. 4.2-1, 4.3-1 Questions: Use the graphics to explain why a bottom-up parser without lookahead (k=0) is reasonable, but a top-down parser is not. -------------------------------------------------------------------------------- CI-45 LR(1) automata LR(k) grammars introduced 1965 by Donald Knuth; non-practical until subclasses were defined. LR parsers construct the derivation tree bottom-up, a right-derivation backwards. LR(k) grammar condition can not be checked directly, but a context-free grammar is LR(k), iff the (canonical) LR(k) automaton is deterministic. We consider only 1 token lookahead: LR(1). The stacks of LR(k) (and LL(k)) automata contain states. The construction of LR and LL states is based on the notion of items (also called situations): An item represents the progress of analysis with respect to one production: [ A ::= u . v R ] z. B. [ B ::= ( . D ; S ) {#}] . position of analysis R expected right context, i. e. a set of terminals which may follow after the application of the complete production. (for general k: R contains terminal sequences not longer than k) Reduce item: [ A ::= u v . R ] z. B. [ B ::= ( D ; S ) . {#}] characterizes a reduction using this production if the next input token is in R. Each state of an automaton represents LL: one item LR: a set of items (c) 2001 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Compiler I WS 2001/2002 / Slide 45 Objectives: Fundamental notions of LR automata In the lecture: Explain * meaning of an item, * lookahead in the input and right context in the automaton. * There is no right context set in case of an LR(0) automaton. Suggested reading: Kastens / Übersetzerbau, Section 4.3 Questions: * What contains the right context set in case of a LR(3) automaton? -------------------------------------------------------------------------------- CI-46 LR(1) states and operations A state of an LR automaton represents a set of items Each item represents a way in which analysis may proceed from that state. 2 B ::= ( . D ; S ) {#} D ::= . D ; a {;} A shift transition is made under D ::= . a {;} a token read from input or a non-terminal symbol D obtained from a preceding reduction. 4 B ::= ( D . ; S ) {#} The state is pushed. a D ::= D . ; a {;} A reduction is made according to a reduce item. n states are popped for a production of length n. 3 D ::= a . {;} red. p3 Operations: shift read and push the next state on the stack reduce reduce with a certain production, pop n states from the stack error error recognized, report it, recover stop input accepted (c) 2001 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Compiler I WS 2001/2002 / Slide 46 Objectives: Understand LR(1) states and operations In the lecture: Explain * Sets of items, * shift transitions, * reductions. Suggested reading: Kastens / Übersetzerbau, Section 4.3 Questions: * Explain: A state is encoded by a number. A state represents complex information which is important for construction of the automaton. -------------------------------------------------------------------------------- CI-47 Example for a LR(1) automaton Grammar: 1 p1 B ::= ( D ; S ) B ::= . ( D ; S ) {#} p2 D ::= D ; a ( 3 2 a p3 D ::= a B ::= ( . D ; S ) {#} D ::= a . {;} red. p3 D ::= . D ; a {;} p4 S ::= b ; S D ::= . a {;} p5 S ::= b D 6 D ::= D ; a . {;} red. p2 4 B ::= ( D . ; S ) {#} D ::= D . ; a {;} a 7 ; S ::= b . ; S {)} . {)} red. p5 5 S ::= b B ::= ( D ; . S ) {#} D ::= D ; . a {;} ; b 8 S ::= . b ; S {)} S ::= b ; . S {)} S ::= . b {)} S ::= . b ; S {)} S b S ::= . b {)} 10 B ::= ( D ; S . ) {#} S 9 ) 11 S ::= b ; S . {)} red. p4 B ::= ( D ; S ) . {#} red. p1, stop (c) 2006 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Compiler I WS 2001/2002 / Slide 47 Objectives: Example for states, transitions, and automaton construction In the lecture: Use the example to explain * the start state, * the creation of new states, * transitions into successor states, * transitive closure of item set, * push and pop of states, * consequences of left-recursive and right-recursive productions, * use of right context to decide upon a reduction, erläutern. Suggested reading: Kastens / Übersetzerbau, Section 4.3 Questions: * Describe the subgraphs for left-recursive and right-recursive productions. How do they differ? * How does a LR(0) automaton decide upon reductions? -------------------------------------------------------------------------------- CI-48 Construction of LR(1) automata Create the start state; create transitions and states as long as new ones can be created. Transitive closure is to be applied to each state: before: B ::= ( . D ; S ) {#} If [ A ::= u . B v R ] is in state q, with the analysis position before a non-terminal B, after: 2 B ::= ( . D ; S ) {#} then for each production B ::= w D ::= . D ; a {;} [ B ::= . w First (v R) ] D ::= . a {;} has to be added to state q. Start state: Closure of [ S ::= . u {#} ] 1 S ::= u is the unique start production, B ::= . ( D ; S ) {#} # is an artificial end symbol (eof) Successor states: For each symbol x (terminal or non-terminal), which occurs in some items after the analysis position, 2 B ::= ( . D ; S ) {#} a transition is created to a successor state. That D ::= . D ; a {;} contains a corresponding item with the analysis position D ::= . a {;} advanced behind the x occurrence. D a 4 B ::= ( D . ; S ) {#} 3 D ::= D . ; a {;} D ::= a . {;} (c) 2001 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Compiler I WS 2001/2002 / Slide 48 Objectives: Understand the method In the lecture: Explain using the example on CI-47: * transitive closure, * computation of the right context sets, * relation between the items of a state and those of one of its successor Suggested reading: Kastens / Übersetzerbau, Section 4.3 Questions: * Explain the role of the right context. * Explain its computation. -------------------------------------------------------------------------------- CI-49 Operations of the LR(1) automaton shift x (terminal or non-terminal): Example: from current state q under x into thesuccessor state q` , stack input reduction push q` 1 ( a ; a ; b ; b ) # reduce p: 1 2 a ; a ; b ; b ) # apply production p B ::= u , 1 2 3 ; a ; b ; b ) # p3 pop as many states, 1 2 ; a ; b ; b ) # as there are symbols in u, from the 1 2 4 ; a ; b ; b ) # new current state make a shift with B 1 2 4 5 a ; b ; b ) # 1 2 4 5 6 ; b ; b ) # p2 error: 1 2 ; b ; b ) # the current state has no transition 1 2 4 ; b ; b ) # under the next input token, 1 2 4 5 b ; b ) # issue a message and recover 1 2 4 5 7 ; b ) # 1 2 4 5 7 8 b ) # stop: 1 2 4 5 7 8 7 ) # p5 recuce start production, 1 2 4 5 7 8 ) # see # in the input 1 2 4 5 7 8 9 ) # p4 1 2 4 5 ) # 1 2 4 5 10 ) # 1 2 3 5 10 11 # p1 1 # (c) 2001 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Compiler I WS 2001/2002 / Slide 49 Objectives: Understand how the automaton works In the lecture: Explain operations Questions: * Why does the automaton behave differently on a-sequences and b-sequences? * Which behaviour is better? -------------------------------------------------------------------------------- CI-50 LR conflicts An LR(1) automaton that has conflicts is not deterministic. Its grammar is not LR(1); correspondingly defined for any other LR class. 2 kinds of conflicts: reduce-reduce conflict: ... A state contains two reduce items, the A ::= u . R1 R1, R2 right context sets of which are not disjoint: B ::= v . R2 not ... disjoint shift-reduce conflict: A state contains a shift item with the analysis position in front of a t and ... A ::= u .t v R1 a reduce item with t in its right context set. B ::= w . R2 t element R2 ... (c) 2001 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Compiler I WS 2001/2002 / Slide 50 Objectives: Understand LR conflicts In the lecture: Explain: In certain situations the given input token t can not determine * Reduce-reduce: which reduction is to be taken; * Shift-reduce: whether the next token is to be shifted, a reduction is to be made. Suggested reading: Kastens / Übersetzerbau, Section 4.3 Questions: * Why can a shift-shift conflict not exist? * In LR(0) automata items do not have a right-context set. Explain why a state with a reduce item may not contain any other item. -------------------------------------------------------------------------------- CI-51 Shift-reduce conflict for "dangling else" ambiguity 1 S ::= . Stmt {#} Stmt Stmt ::= . if ... then Stmt {#} Stmt ::= . if ... then Stmt else Stmt {#} Stmt ::= . a {#} a if ... then 3 Stmt ::= if ... then . Stmt {#} Stmt ::= if ... then . Stmt else Stmt {#} Stmt Stmt ::= . if ... then Stmt {# else} Stmt ::= . if ... then Stmt else Stmt {# else} a Stmt ::= . a {# else} if ... then 5 Stmt ::= if ... then . Stmt {# else} Stmt ::= if ... then . Stmt else Stmt {# else} if Stmt ::= . if ... then Stmt {# else} Stmt ::= . if ... then Stmt else Stmt {# else} Stmt ::= . a {# else} a Stmt 6 Stmt ::= if ... then Stmt . {# else} else Stmt ::= if ... then Stmt . else Stmt {# else} shift-reduce conflict (c) 2006 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Compiler I WS 2001/2002 / Slide 51 Objectives: See a conflict in an automaton In the lecture: Explain * the construction * a solution of the conflict: The automaton can be modified such that in state 6, if an else is the next input token, it is shifted rather than a reduction is made. In that case the ambiguity is solved such that the else part is bound to the inner if. That is the structure required in Pascal and C. Some parser generators can be instructed to resolve conflicts in this way. Suggested reading: Kastens / Übersetzerbau, Section 4.3 -------------------------------------------------------------------------------- CI-52 Simplified LR grammar classes LR(1): too many states for practical use Grammar hierarchy: Reason: right-contexts distinguish many states (strict inclusions) Strategy: simplify right-contexts sets, fewer states, grammar classes are less powerful context-free LR(0): unambiguous all items without right-context Consequence: reduce items only in LR(k) singleton sets LL(k) LR(1) SLR(1): LALR(1) LR(0) states; in reduce items use larger right-context sets for decision: [ A ::= u . Follow (A) ] strong LL(1) = LL(1) SLR(1) LR(0) LALR(1): identify LR(1) states if their items differ only in their right-context sets, unite the sets for those items; yields the states of the LR(0) automaton augmented by the "exact" LR(1) right-context. State-of-the-art parser generators accept LALR(1) (c) 2001 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Compiler I WS 2001/2002 / Slide 52 Objectives: Understand relations between LR classes In the lecture: Explain: * LALR(1), SLR(1), LR(0) automata have the same number of states, * compare their states, * discuss the grammar classes for the example on slide CI-47. Suggested reading: Kastens / Übersetzerbau, Section 4.3 Questions: * Assume that the LALR(1) contruction for a given grammar yields conflicts. Classify the potential reasons using the LR hierarchy. -------------------------------------------------------------------------------- CI-53 Implementation of LR automata Table-driven: terminals nonterminals sq: shift into sq state q sq rp: reduce states rp ~ production p e ~ e: error ~: never reached Compress tables: * merge rows or columns that differ only in irrelevant entries; method: graph coloring * extract a separate error matrix (bit matrix); increases the chances for merging * normalize the values of rows or columns; yields smaller domain; supports merging * eliminate LR(0) reduce states; new operation in predecessor state: shift-reduce eliminates about 30\% of the states in practical cases About 10-20\% of the original table sizes can be achieved! Directly programmed LR-automata are possible - but usually too large. (c) 2001 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Compiler I WS 2001/2002 / Slide 53 Objectives: Implementation of LR tables In the lecture: Explanation of * pair of tables and their entries, * unreachable entries, * compression techniques, derived from general table compression, * Singleton reduction states yield an effective optimization. Questions: * Why are there no error entries in the nonterminal part? * Why are there unreachable entries? * Why does a parser need a shift-reduce operation if the optimization of LR(0)-reduction states is applied? -------------------------------------------------------------------------------- CI-54 Error handling: general criteria * recognize error as early as possible LL and LR can do that * report the symptom in terms of the source text * continue parsing short after the error position * avoid avalanche errors * build a tree that has a correct structure * do not backtrack, do not undo actions * no runtime penalty for correct programs (c) 2001 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Compiler I WS 2001/2002 / Slide 54 Objectives: Accept strong requirements In the lecture: * The reasons for and the consequences of the requirements are discussed. * Some of the requirements hold for error handling in general - not only that of the syntactic analysis. -------------------------------------------------------------------------------- CI-55 Error position Error recovery: Means that are taken by the parser after recognition of a syntactic error in order to continue parsing Correct prefix: The token sequence w element T* is a correct prefix in the language L(G), if there is an u element T* such that w u element L(G); i. e. w can be extended to a sentence in L(G). Error position: t is the (first) error position in the input w t x , where t element T and w, x element T*, if w is a correct prefix in L(G) and w t is not a correct prefix. Example: int compute (int i) { a = i * / c; return i;} w t LL and LR parsers recognize an error at the error position; they can not accept t in the current state. (c) 2001 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Compiler I WS 2001/2002 / Slide 55 Objectives: Error position from the view of the parser In the lecture: Explain the notions with respect to parser actions using the examples. Questions: Assume the programmer omitted an opening parenthesis. * Where is the error position? * What is the symptom the parser recognizes? -------------------------------------------------------------------------------- CI-56 Error recovery Continuation point: The token d at or behind the error position t such that parsing of the input continues at d. Error repair with respect to a consistent derivation - regardless the intension of the programmer! Let the input be w t x with the error position at t and let w t x = w y d z, then the recovery (conceptually) deletes y and inserts v, such that w v d is a correct prefix in L(G), with d element T and w, y, v, z element T*. Examples: w y d z w yd z w y d z a = i * / c;... a = i * / c;... a = i * / c;... a = i * c;... a = i *e/ c;... a = i * e ;... delete / insert error id. e delete / c and insert error id. e (c) 2001 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Compiler I WS 2001/2002 / Slide 56 Objectives: Understand error recovery In the lecture: Explain the notions with respect to parser actions using the examples. Questions: Assume the programmer omitted an opening parenthesis. * What could be a suitable repair? -------------------------------------------------------------------------------- CI-57 Recovery method: simulated continuation Problem: Determine a continuation point close to the error position and reach it. Idea: Use parse stack to determine a set of tokens as potential continuation points. Steps of the method: 1. Save the contents of the parse stack when an error is recognized. Skip the error token. 2. Compute a set D reflexsubset T of tokens that may be used as continuation point (anchor set) Let a modified parser run to completion: Instead of reading a token from input it is inserted into D; (modification given below) 3. Find a continuation point d: Skip input tokens until a token of D is found. 4. Reach the continuation point d: Restore the saved parser stack as the current stack. Perform dedicated transitions until d is acceptable. Instead of reading tokens (conceptually) insert tokens. Thus a correct prefix is constructed. 5. Continue normal parsing. Augment parser construction for steps 2 and 4: For each parser state select a transition and its token, such that the parser empties its stack and terminates as fast as possible. This selection can be generated automatically. The quality of the recovery can be improved by influence on the computation of D. (c) 2001 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Compiler I WS 2001/2002 / Slide 57 Objectives: Error recovery can be generated In the lecture: * Explain the idea and the steps of the method. * The method yields a correct parse for any input! * Other, less powerful methods determine sets D statically at parser construction time, e. g. semicolon and curly bracket for errors in statements. Questions: * How does this method fit to the general requirements for error handling? -------------------------------------------------------------------------------- CI-58 Parser generators PGS Univ. Karlsruhe; in Eli LALR(1), table-driven Cola Univ. Paderborn; in Eli LALR(1), optional: table-driven or directly programmed Lalr Univ. / GMD Karlsruhe LALR(1), table-driven Yacc Unix tool LALR(1), table-driven Bison Gnu LALR(1), table-driven Llgen Amsterdam Compiler Kit LL(1), recursive descent Deer Univ. Colorado, Bouder LL(1), recursive descent Form of grammar specification: EBNF: Cola, PGS, Lalr; BNF: Yacc, Bison Error recovery: simulated continuation, automatically generated: Cola, PGS, Lalr error productions, hand-specified: Yacc, Bison Actions: statements in the implementation language at the end of productions: Yacc, Bison anywhere in productions: Cola, PGS, Lalr Conflict resolution: modification of states (reduce if ...) Cola, PGS, Lalr order of productions: Yacc, Bison rules for precedence and associativity: Yacc, Bison Implementation languages: C: Cola, Yacc, Bison C, Pascal, Modula-2, Ada: PGS, Lalr (c) 2001 bei Prof. Dr. Uwe Kastens -------------------------------------------------------------------------------- Lecture Compiler I WS 2001/2002 / Slide 58 Objectives: Overview over parser generators In the lecture: * Explain the significance of properties Suggested reading: Kastens / Übersetzerbau, Section 4.5 --------------------------------------------------------------------------------