FreeBSD Bugzilla – Attachment 19963 Details for
Bug 35345
Restore old yacc documentation
Home
|
New
|
Browse
|
Search
|
[?]
|
Reports
|
Help
|
New Account
|
Log In
Remember
[x]
|
Forgot Password
Login:
[x]
[patch]
file.diff
file.diff (text/plain), 81.89 KB, created by
Tony Finch
on 2002-02-26 14:10:01 UTC
(
hide
)
Description:
file.diff
Filename:
MIME Type:
Creator:
Tony Finch
Created:
2002-02-26 14:10:01 UTC
Size:
81.89 KB
patch
obsolete
>--- /usr/src/share/doc/psd/Makefile Fri Nov 26 09:27:36 1999 >+++ /usr/src/share/doc/psd.new/Makefile Tue Feb 26 13:50:20 2002 >@@ -4,15 +4,15 @@ > # The following modules do not build/install: > # 10.gdb > >-# The following modules are encumbered: >+# The following modules are missing because they were encumbered: > # 01.cacm 02.implement 03.iosys 04.uprog 06.Clang 11.adb 14.sccs >-# 15.yacc 16.lex 17.m4 >+# 16.lex 17.m4 > > # The following modules do not apply to FreeBSD: > # 07.pascal 08.f77 09.f77io > > SUBDIR= title contents >-SUBDIR+= 05.sysman 12.make 13.rcs 18.gprof 20.ipctut 21.ipc >+SUBDIR+= 05.sysman 12.make 13.rcs 15.yacc 18.gprof 20.ipctut 21.ipc > > # The following modules are new in FreeBSD: > SUBDIR+= 22.rpcgen 23.rpc 24.xdr 25.xdrrfc 26.rpcrfc 27.nfsrpc >--- /usr/src/share/doc/psd/15.yacc/Makefile Thu Jan 1 00:00:00 1970 >+++ /usr/src/share/doc/psd.new/15.yacc/Makefile Tue Feb 26 13:53:48 2002 >@@ -0,0 +1,10 @@ >+# From: @(#)Makefile 8.1 (Berkeley) 6/8/93 >+# $FreeBSD$ >+ >+VOLUME= psd/15.yacc >+SRCS= ss.. ss0 ss1 ss2 ss3 ss4 ss5 ss6 ss7 ss8 ss9 ssA ssB ssa ssb ssc ssd >+MACROS= -ms >+ >+USE_REFER= yes >+ >+.include <bsd.doc.mk> >--- /usr/src/share/doc/psd/15.yacc/ss.. Thu Jan 1 00:00:00 1970 >+++ /usr/src/share/doc/psd.new/15.yacc/ss.. Tue Feb 26 13:49:21 2002 >@@ -0,0 +1,56 @@ >+.\" @(#)ss.. 6.1 (Berkeley) 5/8/86 >+.\" >+.EH 'PSD:15-%''Yacc: Yet Another Compiler-Compiler' >+.OH 'Yacc: Yet Another Compiler-Compiler''PSD:15-%' >+.\".RP >+.ND "July 31, 1978" >+.TL >+Yacc: >+Yet Another Compiler-Compiler >+.AU "MH 2C-559" 3968 >+Stephen C. Johnson >+.AI >+.MH >+.AB >+.PP >+Computer program input generally has some structure; >+in fact, every computer program that does input can be thought of as defining >+an ``input language'' which it accepts. >+An input language may be as complex as a programming language, or as simple as >+a sequence of numbers. >+Unfortunately, usual input facilities >+are limited, difficult to use, >+and often are lax about checking their inputs for validity. >+.PP >+Yacc provides a general tool for describing >+the input to a computer program. >+The Yacc user specifies the structures >+of his input, together with code to be invoked as >+each such structure is recognized. >+Yacc turns such a specification into a subroutine that >+handles the input process; >+frequently, it is convenient and appropriate to have most >+of the flow of control in the user's application >+handled by this subroutine. >+.PP >+The input subroutine produced by Yacc calls a user-supplied routine to >+return the next basic input item. >+Thus, the user can specify his input in terms of individual input characters, or >+in terms of higher level constructs such as names and numbers. >+The user-supplied routine may also handle idiomatic features such as >+comment and continuation conventions, which typically defy easy grammatical specification. >+.PP >+Yacc is written in portable C. >+The class of specifications accepted is a very general one: LALR(1) >+grammars with disambiguating rules. >+.PP >+In addition to compilers for C, APL, Pascal, RATFOR, etc., Yacc >+has also been used for less conventional languages, >+including a phototypesetter language, several desk calculator languages, a document retrieval system, >+and a Fortran debugging system. >+.AE >+.OK >+.\"Computer Languages >+.\"Compilers >+.\"Formal Language Theory >+.CS 23 11 34 0 0 8 >--- /usr/src/share/doc/psd/15.yacc/ss0 Thu Jan 1 00:00:00 1970 >+++ /usr/src/share/doc/psd.new/15.yacc/ss0 Tue Feb 26 13:49:21 2002 >@@ -0,0 +1,201 @@ >+.\" @(#)ss0 6.1 (Berkeley) 5/8/86 >+.\" >+.SH >+0: Introduction >+.PP >+Yacc provides a general tool for imposing structure on the input to a computer program. >+The Yacc user prepares a >+specification of the input process; this includes rules >+describing the input structure, code to be invoked when these >+rules are recognized, and a low-level routine to do the >+basic input. >+Yacc then generates a function to control the input process. >+This function, called a >+.I parser , >+calls the user-supplied low-level input routine >+(the >+.I "lexical analyzer" ) >+to pick up the basic items >+(called >+.I tokens ) >+from the input stream. >+These tokens are organized according to the input structure rules, >+called >+.I "grammar rules" \|; >+when one of these rules has been recognized, >+then user code supplied for this rule, an >+.I action , >+is invoked; actions have the ability to return values and >+make use of the values of other actions. >+.PP >+Yacc is written in a portable dialect of C >+.[ >+Ritchie Kernighan Language Prentice >+.] >+and the actions, and output subroutine, are in C as well. >+Moreover, many of the syntactic conventions of Yacc follow C. >+.PP >+The heart of the input specification is a collection of grammar rules. >+Each rule describes an allowable structure and gives it a name. >+For example, one grammar rule might be >+.DS >+date : month\_name day \',\' year ; >+.DE >+Here, >+.I date , >+.I month\_name , >+.I day , >+and >+.I year >+represent structures of interest in the input process; >+presumably, >+.I month\_name , >+.I day , >+and >+.I year >+are defined elsewhere. >+The comma ``,'' is enclosed in single quotes; this implies that the >+comma is to appear literally in the input. >+The colon and semicolon merely serve as punctuation in the rule, and have >+no significance in controlling the input. >+Thus, with proper definitions, the input >+.DS >+July 4, 1776 >+.DE >+might be matched by the above rule. >+.PP >+An important part of the input process is carried out by the >+lexical analyzer. >+This user routine reads the input stream, recognizing the lower level structures, >+and communicates these tokens >+to the parser. >+For historical reasons, a structure recognized by the lexical analyzer is called a >+.I "terminal symbol" , >+while the structure recognized by the parser is called a >+.I "nonterminal symbol" . >+To avoid confusion, terminal symbols will usually be referred to as >+.I tokens . >+.PP >+There is considerable leeway in deciding whether to recognize structures using the lexical >+analyzer or grammar rules. >+For example, the rules >+.DS >+month\_name : \'J\' \'a\' \'n\' ; >+month\_name : \'F\' \'e\' \'b\' ; >+ >+ . . . >+ >+month\_name : \'D\' \'e\' \'c\' ; >+.DE >+might be used in the above example. >+The lexical analyzer would only need to recognize individual letters, and >+.I month\_name >+would be a nonterminal symbol. >+Such low-level rules tend to waste time and space, and may >+complicate the specification beyond Yacc's ability to deal with it. >+Usually, the lexical analyzer would >+recognize the month names, >+and return an indication that a >+.I month\_name >+was seen; in this case, >+.I month\_name >+would be a token. >+.PP >+Literal characters such as ``,'' must also be passed through the lexical >+analyzer, and are also considered tokens. >+.PP >+Specification files are very flexible. >+It is realively easy to add to the above example the rule >+.DS >+date : month \'/\' day \'/\' year ; >+.DE >+allowing >+.DS >+7 / 4 / 1776 >+.DE >+as a synonym for >+.DS >+July 4, 1776 >+.DE >+In most cases, this new rule could be ``slipped in'' to a working system with minimal effort, >+and little danger of disrupting existing input. >+.PP >+The input being read may not conform to the >+specifications. >+These input errors are detected as early as is theoretically possible with a >+left-to-right scan; >+thus, not only is the chance of reading and computing with bad >+input data substantially reduced, but the bad data can usually be quickly found. >+Error handling, >+provided as part of the input specifications, >+permits the reentry of bad data, >+or the continuation of the input process after skipping over the bad data. >+.PP >+In some cases, Yacc fails to produce a parser when given a set of >+specifications. >+For example, the specifications may be self contradictory, or they may >+require a more powerful recognition mechanism than that available to Yacc. >+The former cases represent design errors; >+the latter cases >+can often be corrected >+by making >+the lexical analyzer >+more powerful, or by rewriting some of the grammar rules. >+While Yacc cannot handle all possible specifications, its power >+compares favorably with similar systems; >+moreover, the >+constructions which are difficult for Yacc to handle are >+also frequently difficult for human beings to handle. >+Some users have reported that the discipline of formulating valid >+Yacc specifications for their input revealed errors of >+conception or design early in the program development. >+.PP >+The theory underlying Yacc has been described elsewhere. >+.[ >+Aho Johnson Surveys LR Parsing >+.] >+.[ >+Aho Johnson Ullman Ambiguous Grammars >+.] >+.[ >+Aho Ullman Principles Compiler Design >+.] >+Yacc has been extensively used in numerous practical applications, >+including >+.I lint , >+.[ >+Johnson Lint >+.] >+the Portable C Compiler, >+.[ >+Johnson Portable Compiler Theory >+.] >+and a system for typesetting mathematics. >+.[ >+Kernighan Cherry typesetting system CACM >+.] >+.PP >+The next several sections describe the >+basic process of preparing a Yacc specification; >+Section 1 describes the preparation of grammar rules, >+Section 2 the preparation of the user supplied actions associated with these rules, >+and Section 3 the preparation of lexical analyzers. >+Section 4 describes the operation of the parser. >+Section 5 discusses various reasons why Yacc may be unable to produce a >+parser from a specification, and what to do about it. >+Section 6 describes a simple mechanism for >+handling operator precedences in arithmetic expressions. >+Section 7 discusses error detection and recovery. >+Section 8 discusses the operating environment and special features >+of the parsers Yacc produces. >+Section 9 gives some suggestions which should improve the >+style and efficiency of the specifications. >+Section 10 discusses some advanced topics, and Section 11 gives >+acknowledgements. >+Appendix A has a brief example, and Appendix B gives a >+summary of the Yacc input syntax. >+Appendix C gives an example using some of the more advanced >+features of Yacc, and, finally, >+Appendix D describes mechanisms and syntax >+no longer actively supported, but >+provided for historical continuity with older versions of Yacc. >--- /usr/src/share/doc/psd/15.yacc/ss1 Thu Jan 1 00:00:00 1970 >+++ /usr/src/share/doc/psd.new/15.yacc/ss1 Tue Feb 26 13:49:21 2002 >@@ -0,0 +1,138 @@ >+.\" @(#)ss1 6.1 (Berkeley) 5/8/86 >+.\" >+.tr *\(** >+.tr |\(or >+.SH >+1: Basic Specifications >+.PP >+Names refer to either tokens or nonterminal symbols. >+Yacc requires >+token names to be declared as such. >+In addition, for reasons discussed in Section 3, it is often desirable >+to include the lexical analyzer as part of the specification file; >+it may be useful to include other programs as well. >+Thus, every specification file consists of three sections: >+the >+.I declarations , >+.I "(grammar) rules" , >+and >+.I programs . >+The sections are separated by double percent ``%%'' marks. >+(The percent ``%'' is generally used in Yacc specifications as an escape character.) >+.PP >+In other words, a full specification file looks like >+.DS >+declarations >+%% >+rules >+%% >+programs >+.DE >+.PP >+The declaration section may be empty. >+Moreover, if the programs section is omitted, the second %% mark may be omitted also; >+thus, the smallest legal Yacc specification is >+.DS >+%% >+rules >+.DE >+.PP >+Blanks, tabs, and newlines are ignored except >+that they may not appear in names or multi-character reserved symbols. >+Comments may appear wherever a name is legal; they are enclosed >+in /* . . . */, as in C and PL/I. >+.PP >+The rules section is made up of one or more grammar rules. >+A grammar rule has the form: >+.DS >+A : BODY ; >+.DE >+A represents a nonterminal name, and BODY represents a sequence of zero or more names and literals. >+The colon and the semicolon are Yacc punctuation. >+.PP >+Names may be of arbitrary length, and may be made up of letters, dot ``.'', underscore ``\_'', and >+non-initial digits. >+Upper and lower case letters are distinct. >+The names used in the body of a grammar rule may represent tokens or nonterminal symbols. >+.PP >+A literal consists of a character enclosed in single quotes ``\'''. >+As in C, the backslash ``\e'' is an escape character within literals, and all the C escapes >+are recognized. >+Thus >+.DS >+\'\en\' newline >+\'\er\' return >+\'\e\'\' single quote ``\''' >+\'\e\e\' backslash ``\e'' >+\'\et\' tab >+\'\eb\' backspace >+\'\ef\' form feed >+\'\exxx\' ``xxx'' in octal >+.DE >+For a number of technical reasons, the >+\s-2NUL\s0 >+character (\'\e0\' or 0) should never >+be used in grammar rules. >+.PP >+If there are several grammar rules with the same left hand side, the vertical bar ``|'' >+can be used to avoid rewriting the left hand side. >+In addition, >+the semicolon at the end of a rule can be dropped before a vertical bar. >+Thus the grammar rules >+.DS >+A : B C D ; >+A : E F ; >+A : G ; >+.DE >+can be given to Yacc as >+.DS >+A : B C D >+ | E F >+ | G >+ ; >+.DE >+It is not necessary that all grammar rules with the same left side appear together in the grammar rules section, >+although it makes the input much more readable, and easier to change. >+.PP >+If a nonterminal symbol matches the empty string, this can be indicated in the obvious way: >+.DS >+empty : ; >+.DE >+.PP >+Names representing tokens must be declared; this is most simply done by writing >+.DS >+%token name1 name2 . . . >+.DE >+in the declarations section. >+(See Sections 3 , 5, and 6 for much more discussion). >+Every name not defined in the declarations section is assumed to represent a nonterminal symbol. >+Every nonterminal symbol must appear on the left side of at least one rule. >+.PP >+Of all the nonterminal symbols, one, called the >+.I "start symbol" , >+has particular importance. >+The parser is designed to recognize the start symbol; thus, >+this symbol represents the largest, >+most general structure described by the grammar rules. >+By default, >+the start symbol is taken to be the left hand side of the first >+grammar rule in the rules section. >+It is possible, and in fact desirable, to declare the start >+symbol explicitly in the declarations section using the %start keyword: >+.DS >+%start symbol >+.DE >+.PP >+The end of the input to the parser is signaled by a special token, called the >+.I endmarker . >+If the tokens up to, but not including, the endmarker form a structure >+which matches the start symbol, the parser function returns to its caller >+after the endmarker is seen; it >+.I accepts >+the input. >+If the endmarker is seen in any other context, it is an error. >+.PP >+It is the job of the user-supplied lexical analyzer >+to return the endmarker when appropriate; see section 3, below. >+Usually the endmarker represents some reasonably obvious >+I/O status, such as ``end-of-file'' or ``end-of-record''. >--- /usr/src/share/doc/psd/15.yacc/ss2 Thu Jan 1 00:00:00 1970 >+++ /usr/src/share/doc/psd.new/15.yacc/ss2 Tue Feb 26 13:49:21 2002 >@@ -0,0 +1,151 @@ >+.\" @(#)ss2 6.1 (Berkeley) 5/8/86 >+.\" >+.SH >+2: Actions >+.PP >+With each grammar rule, the user may associate actions to be performed each time >+the rule is recognized in the input process. >+These actions may return values, and may obtain the values returned by previous >+actions. >+Moreover, the lexical analyzer can return values >+for tokens, if desired. >+.PP >+An action is an arbitrary C statement, and as such can do >+input and output, call subprograms, and alter >+external vectors and variables. >+An action is specified by >+one or more statements, enclosed in curly braces ``{'' and ``}''. >+For example, >+.DS >+A : \'(\' B \')\' >+ { hello( 1, "abc" ); } >+.DE >+and >+.DS >+XXX : YYY ZZZ >+ { printf("a message\en"); >+ flag = 25; } >+.DE >+are grammar rules with actions. >+.PP >+To facilitate easy communication between the actions and the parser, the action statements are altered >+slightly. >+The symbol ``dollar sign'' ``$'' is used as a signal to Yacc in this context. >+.PP >+To return a value, the action normally sets the >+pseudo-variable ``$$'' to some value. >+For example, an action that does nothing but return the value 1 is >+.DS >+ { $$ = 1; } >+.DE >+.PP >+To obtain the values returned by previous actions and the lexical analyzer, the >+action may use the pseudo-variables $1, $2, . . ., >+which refer to the values returned by the >+components of the right side of a rule, reading from left to right. >+Thus, if the rule is >+.DS >+A : B C D ; >+.DE >+for example, then $2 has the value returned by C, and $3 the value returned by D. >+.PP >+As a more concrete example, consider the rule >+.DS >+expr : \'(\' expr \')\' ; >+.DE >+The value returned by this rule is usually the value of the >+.I expr >+in parentheses. >+This can be indicated by >+.DS >+expr : \'(\' expr \')\' { $$ = $2 ; } >+.DE >+.PP >+By default, the value of a rule is the value of the first element in it ($1). >+Thus, grammar rules of the form >+.DS >+A : B ; >+.DE >+frequently need not have an explicit action. >+.PP >+In the examples above, all the actions came at the end of their rules. >+Sometimes, it is desirable to get control before a rule is fully parsed. >+Yacc permits an action to be written in the middle of a rule as well >+as at the end. >+This rule is assumed to return a value, accessible >+through the usual \$ mechanism by the actions to >+the right of it. >+In turn, it may access the values >+returned by the symbols to its left. >+Thus, in the rule >+.DS >+A : B >+ { $$ = 1; } >+ C >+ { x = $2; y = $3; } >+ ; >+.DE >+the effect is to set >+.I x >+to 1, and >+.I y >+to the value returned by C. >+.PP >+Actions that do not terminate a rule are actually >+handled by Yacc by manufacturing a new nonterminal >+symbol name, and a new rule matching this >+name to the empty string. >+The interior action is the action triggered off by recognizing >+this added rule. >+Yacc actually treats the above example as if >+it had been written: >+.DS >+$ACT : /* empty */ >+ { $$ = 1; } >+ ; >+ >+A : B $ACT C >+ { x = $2; y = $3; } >+ ; >+.DE >+.PP >+In many applications, output is not done directly by the actions; >+rather, a data structure, such as a parse tree, is constructed in memory, >+and transformations are applied to it before output is generated. >+Parse trees are particularly easy to >+construct, given routines to build and maintain the tree >+structure desired. >+For example, suppose there is a C function >+.I node , >+written so that the call >+.DS >+node( L, n1, n2 ) >+.DE >+creates a node with label L, and descendants n1 and n2, and returns the index of >+the newly created node. >+Then parse tree can be built by supplying actions such as: >+.DS >+expr : expr \'+\' expr >+ { $$ = node( \'+\', $1, $3 ); } >+.DE >+in the specification. >+.PP >+The user may define other variables to be used by the actions. >+Declarations and definitions can appear in >+the declarations section, >+enclosed in the marks ``%{'' and ``%}''. >+These declarations and definitions have global scope, >+so they are known to the action statements and the lexical analyzer. >+For example, >+.DS >+%{ int variable = 0; %} >+.DE >+could be placed in the declarations section, >+making >+.I variable >+accessible to all of the actions. >+The Yacc parser uses only names beginning in ``yy''; >+the user should avoid such names. >+.PP >+In these examples, all the values are integers: a discussion of >+values of other types will be found in Section 10. >--- /usr/src/share/doc/psd/15.yacc/ss3 Thu Jan 1 00:00:00 1970 >+++ /usr/src/share/doc/psd.new/15.yacc/ss3 Tue Feb 26 13:49:21 2002 >@@ -0,0 +1,104 @@ >+.\" @(#)ss3 6.1 (Berkeley) 5/8/86 >+.\" >+.SH >+3: Lexical Analysis >+.PP >+The user must supply a lexical analyzer to read the input stream and communicate tokens >+(with values, if desired) to the parser. >+The lexical analyzer is an integer-valued function called >+.I yylex . >+The function returns an integer, the >+.I "token number" , >+representing the kind of token read. >+If there is a value associated with that token, it should be assigned >+to the external variable >+.I yylval . >+.PP >+The parser and the lexical analyzer must agree on these token numbers in order for >+communication between them to take place. >+The numbers may be chosen by Yacc, or chosen by the user. >+In either case, the ``# define'' mechanism of C is used to allow the lexical analyzer >+to return these numbers symbolically. >+For example, suppose that the token name DIGIT has been defined in the declarations section of the >+Yacc specification file. >+The relevant portion of the lexical analyzer might look like: >+.DS >+yylex(){ >+ extern int yylval; >+ int c; >+ . . . >+ c = getchar(); >+ . . . >+ switch( c ) { >+ . . . >+ case \'0\': >+ case \'1\': >+ . . . >+ case \'9\': >+ yylval = c\-\'0\'; >+ return( DIGIT ); >+ . . . >+ } >+ . . . >+.DE >+.PP >+The intent is to return a token number of DIGIT, and a value equal to the numerical value of the >+digit. >+Provided that the lexical analyzer code is placed in the programs section of the specification file, >+the identifier DIGIT will be defined as the token number associated >+with the token DIGIT. >+.PP >+This mechanism leads to clear, >+easily modified lexical analyzers; the only pitfall is the need >+to avoid using any token names in the grammar that are reserved >+or significant in C or the parser; for example, the use of >+token names >+.I if >+or >+.I while >+will almost certainly cause severe >+difficulties when the lexical analyzer is compiled. >+The token name >+.I error >+is reserved for error handling, and should not be used naively >+(see Section 7). >+.PP >+As mentioned above, the token numbers may be chosen by Yacc or by the user. >+In the default situation, the numbers are chosen by Yacc. >+The default token number for a literal >+character is the numerical value of the character in the local character set. >+Other names are assigned token numbers >+starting at 257. >+.PP >+To assign a token number to a token (including literals), >+the first appearance of the token name or literal >+.I >+in the declarations section >+.R >+can be immediately followed by >+a nonnegative integer. >+This integer is taken to be the token number of the name or literal. >+Names and literals not defined by this mechanism retain their default definition. >+It is important that all token numbers be distinct. >+.PP >+For historical reasons, the endmarker must have token >+number 0 or negative. >+This token number cannot be redefined by the user; thus, all >+lexical analyzers should be prepared to return 0 or negative as a token number >+upon reaching the end of their input. >+.PP >+A very useful tool for constructing lexical analyzers is >+the >+.I Lex >+program developed by Mike Lesk. >+.[ >+Lesk Lex >+.] >+These lexical analyzers are designed to work in close >+harmony with Yacc parsers. >+The specifications for these lexical analyzers >+use regular expressions instead of grammar rules. >+Lex can be easily used to produce quite complicated lexical analyzers, >+but there remain some languages (such as FORTRAN) which do not >+fit any theoretical framework, and whose lexical analyzers >+must be crafted by hand. >--- /usr/src/share/doc/psd/15.yacc/ss4 Thu Jan 1 00:00:00 1970 >+++ /usr/src/share/doc/psd.new/15.yacc/ss4 Tue Feb 26 13:49:21 2002 >@@ -0,0 +1,330 @@ >+.\" @(#)ss4 6.1 (Berkeley) 5/8/86 >+.\" >+.SH >+4: How the Parser Works >+.PP >+Yacc turns the specification file into a C program, which >+parses the input according to the specification given. >+The algorithm used to go from the >+specification to the parser is complex, and will not be discussed >+here (see >+the references for more information). >+The parser itself, however, is relatively simple, >+and understanding how it works, while >+not strictly necessary, will nevertheless make >+treatment of error recovery and ambiguities much more >+comprehensible. >+.PP >+The parser produced by Yacc consists >+of a finite state machine with a stack. >+The parser is also capable of reading and remembering the next >+input token (called the >+.I lookahead >+token). >+The >+.I "current state" >+is always the one on the top of the stack. >+The states of the finite state machine are given >+small integer labels; initially, the machine is in state 0, >+the stack contains only state 0, and no lookahead token has been read. >+.PP >+The machine has only four actions available to it, called >+.I shift , >+.I reduce , >+.I accept , >+and >+.I error . >+A move of the parser is done as follows: >+.IP 1. >+Based on its current state, the parser decides >+whether it needs a lookahead token to decide >+what action should be done; if it needs one, and does >+not have one, it calls >+.I yylex >+to obtain the next token. >+.IP 2. >+Using the current state, and the lookahead token >+if needed, the parser decides on its next action, and >+carries it out. >+This may result in states being pushed onto the stack, or popped off of >+the stack, and in the lookahead token being processed >+or left alone. >+.PP >+The >+.I shift >+action is the most common action the parser takes. >+Whenever a shift action is taken, there is always >+a lookahead token. >+For example, in state 56 there may be >+an action: >+.DS >+ IF shift 34 >+.DE >+which says, in state 56, if the lookahead token is IF, >+the current state (56) is pushed down on the stack, >+and state 34 becomes the current state (on the >+top of the stack). >+The lookahead token is cleared. >+.PP >+The >+.I reduce >+action keeps the stack from growing without >+bounds. >+Reduce actions are appropriate when the parser has seen >+the right hand side of a grammar rule, >+and is prepared to announce that it has seen >+an instance of the rule, replacing the right hand side >+by the left hand side. >+It may be necessary to consult the lookahead token >+to decide whether to reduce, but >+usually it is not; in fact, the default >+action (represented by a ``.'') is often a reduce action. >+.PP >+Reduce actions are associated with individual grammar rules. >+Grammar rules are also given small integer >+numbers, leading to some confusion. >+The action >+.DS >+ \fB.\fR reduce 18 >+.DE >+refers to >+.I "grammar rule" >+18, while the action >+.DS >+ IF shift 34 >+.DE >+refers to >+.I state >+34. >+.PP >+Suppose the rule being reduced is >+.DS >+A \fB:\fR x y z ; >+.DE >+The reduce action depends on the >+left hand symbol (A in this case), and the number of >+symbols on the right hand side (three in this case). >+To reduce, first pop off the top three states >+from the stack >+(In general, the number of states popped equals the number of symbols on the >+right side of the rule). >+In effect, these states were the ones >+put on the stack while recognizing >+.I x , >+.I y , >+and >+.I z , >+and no longer serve any useful purpose. >+After popping these states, a state is uncovered >+which was the state the parser was in before beginning to >+process the rule. >+Using this uncovered state, and the symbol >+on the left side of the rule, perform what is in >+effect a shift of A. >+A new state is obtained, pushed onto the stack, and parsing continues. >+There are significant differences between the processing of >+the left hand symbol and an ordinary shift of a token, >+however, so this action is called a >+.I goto >+action. >+In particular, the lookahead token is cleared by a shift, and >+is not affected by a goto. >+In any case, the uncovered state contains an entry such as: >+.DS >+ A goto 20 >+.DE >+causing state 20 to be pushed >+onto the stack, and become the current state. >+.PP >+In effect, the reduce action ``turns back the clock'' in the parse, >+popping the states off the stack to go back to the >+state where the right hand side of the rule was first seen. >+The parser then behaves as if it had seen the left side at that time. >+If the right hand side of the rule is empty, >+no states are popped off of the stack: the uncovered state >+is in fact the current state. >+.PP >+The reduce action is also important in the treatment of user-supplied >+actions and values. >+When a rule is reduced, the code supplied with the rule is executed >+before the stack is adjusted. >+In addition to the stack holding the states, another stack, >+running in parallel with it, holds the values returned >+from the lexical analyzer and the actions. >+When a shift takes place, the external variable >+.I yylval >+is copied onto the value stack. >+After the return from the user code, the reduction is carried out. >+When the >+.I goto >+action is done, the external variable >+.I yyval >+is copied onto the value stack. >+The pseudo-variables $1, $2, etc., refer to the value stack. >+.PP >+The other two parser actions are conceptually much simpler. >+The >+.I accept >+action indicates that the entire input has been seen and >+that it matches the specification. >+This action appears only when the lookahead token is >+the endmarker, and indicates that the parser has successfully >+done its job. >+The >+.I error >+action, on the other hand, represents a place where the parser >+can no longer continue parsing according to the specification. >+The input tokens it has seen, together with the lookahead token, >+cannot be followed by anything that would result >+in a legal input. >+The parser reports an error, and attempts to recover the situation and >+resume parsing: the error recovery (as opposed to the detection of error) >+will be covered in Section 7. >+.PP >+It is time for an example! >+Consider the specification >+.DS >+%token DING DONG DELL >+%% >+rhyme : sound place >+ ; >+sound : DING DONG >+ ; >+place : DELL >+ ; >+.DE >+.PP >+When Yacc is invoked with the >+.B \-v >+option, a file called >+.I y.output >+is produced, with a human-readable description of the parser. >+The >+.I y.output >+file corresponding to the above grammar (with some statistics >+stripped off the end) is: >+.DS >+state 0 >+ $accept : \_rhyme $end >+ >+ DING shift 3 >+ . error >+ >+ rhyme goto 1 >+ sound goto 2 >+ >+state 1 >+ $accept : rhyme\_$end >+ >+ $end accept >+ . error >+ >+state 2 >+ rhyme : sound\_place >+ >+ DELL shift 5 >+ . error >+ >+ place goto 4 >+ >+state 3 >+ sound : DING\_DONG >+ >+ DONG shift 6 >+ . error >+ >+state 4 >+ rhyme : sound place\_ (1) >+ >+ . reduce 1 >+ >+state 5 >+ place : DELL\_ (3) >+ >+ . reduce 3 >+ >+state 6 >+ sound : DING DONG\_ (2) >+ >+ . reduce 2 >+.DE >+Notice that, in addition to the actions for each state, there is a >+description of the parsing rules being processed in each >+state. The \_ character is used to indicate >+what has been seen, and what is yet to come, in each rule. >+Suppose the input is >+.DS >+DING DONG DELL >+.DE >+It is instructive to follow the steps of the parser while >+processing this input. >+.PP >+Initially, the current state is state 0. >+The parser needs to refer to the input in order to decide >+between the actions available in state 0, so >+the first token, >+.I DING , >+is read, becoming the lookahead token. >+The action in state 0 on >+.I DING >+is >+is ``shift 3'', so state 3 is pushed onto the stack, >+and the lookahead token is cleared. >+State 3 becomes the current state. >+The next token, >+.I DONG , >+is read, becoming the lookahead token. >+The action in state 3 on the token >+.I DONG >+is ``shift 6'', >+so state 6 is pushed onto the stack, and the lookahead is cleared. >+The stack now contains 0, 3, and 6. >+In state 6, without even consulting the lookahead, >+the parser reduces by rule 2. >+.DS >+ sound : DING DONG >+.DE >+This rule has two symbols on the right hand side, so >+two states, 6 and 3, are popped off of the stack, uncovering state 0. >+Consulting the description of state 0, looking for a goto on >+.I sound , >+.DS >+ sound goto 2 >+.DE >+is obtained; thus state 2 is pushed onto the stack, >+becoming the current state. >+.PP >+In state 2, the next token, >+.I DELL , >+must be read. >+The action is ``shift 5'', so state 5 is pushed onto the stack, which now has >+0, 2, and 5 on it, and the lookahead token is cleared. >+In state 5, the only action is to reduce by rule 3. >+This has one symbol on the right hand side, so one state, 5, >+is popped off, and state 2 is uncovered. >+The goto in state 2 on >+.I place , >+the left side of rule 3, >+is state 4. >+Now, the stack contains 0, 2, and 4. >+In state 4, the only action is to reduce by rule 1. >+There are two symbols on the right, so the top two states are popped off, >+uncovering state 0 again. >+In state 0, there is a goto on >+.I rhyme >+causing the parser to enter state 1. >+In state 1, the input is read; the endmarker is obtained, >+indicated by ``$end'' in the >+.I y.output >+file. >+The action in state 1 when the endmarker is seen is to accept, >+successfully ending the parse. >+.PP >+The reader is urged to consider how the parser works >+when confronted with such incorrect strings as >+.I "DING DONG DONG" , >+.I "DING DONG" , >+.I "DING DONG DELL DELL" , >+etc. >+A few minutes spend with this and other simple examples will >+probably be repaid when problems arise in more complicated contexts. >--- /usr/src/share/doc/psd/15.yacc/ss5 Thu Jan 1 00:00:00 1970 >+++ /usr/src/share/doc/psd.new/15.yacc/ss5 Tue Feb 26 13:49:21 2002 >@@ -0,0 +1,302 @@ >+.\" @(#)ss5 6.1 (Berkeley) 5/8/86 >+.\" >+.SH >+5: Ambiguity and Conflicts >+.PP >+A set of grammar rules is >+.I ambiguous >+if there is some input string that can be structured in two or more different ways. >+For example, the grammar rule >+.DS >+expr : expr \'\-\' expr >+.DE >+is a natural way of expressing the fact that one way of forming an arithmetic expression >+is to put two other expressions together with a minus sign between them. >+Unfortunately, this grammar rule does not >+completely specify the way that all complex inputs >+should be structured. >+For example, if the input is >+.DS >+expr \- expr \- expr >+.DE >+the rule allows this input to be structured as either >+.DS >+( expr \- expr ) \- expr >+.DE >+or as >+.DS >+expr \- ( expr \- expr ) >+.DE >+(The first is called >+.I "left association" , >+the second >+.I "right association" ). >+.PP >+Yacc detects such ambiguities when it is attempting to build the parser. >+It is instructive to consider the problem that confronts the parser when it is >+given an input such as >+.DS >+expr \- expr \- expr >+.DE >+When the parser has read the second expr, the input that it has seen: >+.DS >+expr \- expr >+.DE >+matches the right side of the grammar rule above. >+The parser could >+.I reduce >+the input by applying this rule; >+after applying the rule; >+the input is reduced to >+.I expr (the >+left side of the rule). >+The parser would then read the final part of the input: >+.DS >+\- expr >+.DE >+and again reduce. >+The effect of this is to take the left associative interpretation. >+.PP >+Alternatively, when the parser has seen >+.DS >+expr \- expr >+.DE >+it could defer the immediate application of the rule, and continue reading >+the input until it had seen >+.DS >+expr \- expr \- expr >+.DE >+It could then apply the rule to the rightmost three symbols, reducing them to >+.I expr >+and leaving >+.DS >+expr \- expr >+.DE >+Now the rule can be reduced once more; the effect is to >+take the right associative interpretation. >+Thus, having read >+.DS >+expr \- expr >+.DE >+the parser can do two legal things, a shift or a reduction, and has no way of >+deciding between them. >+This is called a >+.I "shift / reduce conflict" . >+It may also happen that the parser has a choice of two legal reductions; >+this is called a >+.I "reduce / reduce conflict" . >+Note that there are never any ``Shift/shift'' conflicts. >+.PP >+When there are shift/reduce or reduce/reduce conflicts, Yacc still produces a parser. >+It does this by selecting one of the valid steps wherever it has a choice. >+A rule describing which choice to make in a given situation is called >+a >+.I "disambiguating rule" . >+.PP >+Yacc invokes two disambiguating rules by default: >+.IP 1. >+In a shift/reduce conflict, the default is to do the shift. >+.IP 2. >+In a reduce/reduce conflict, the default is to reduce by the >+.I earlier >+grammar rule (in the input sequence). >+.PP >+Rule 1 implies that reductions are deferred whenever there is a choice, >+in favor of shifts. >+Rule 2 gives the user rather crude control over the behavior of the parser >+in this situation, but reduce/reduce conflicts should be avoided whenever possible. >+.PP >+Conflicts may arise because of mistakes in input or logic, or because the grammar rules, while consistent, >+require a more complex parser than Yacc can construct. >+The use of actions within rules can also cause conflicts, if the action must >+be done before the parser can be sure which rule is being recognized. >+In these cases, the application of disambiguating rules is inappropriate, >+and leads to an incorrect parser. >+For this reason, Yacc >+always reports the number of shift/reduce and reduce/reduce conflicts resolved by Rule 1 and Rule 2. >+.PP >+In general, whenever it is possible to apply disambiguating rules to produce a correct parser, it is also >+possible to rewrite the grammar rules so that the same inputs are read but there are no >+conflicts. >+For this reason, most previous parser generators >+have considered conflicts to be fatal errors. >+Our experience has suggested that this rewriting is somewhat unnatural, >+and produces slower parsers; thus, Yacc will produce parsers even in the presence of conflicts. >+.PP >+As an example of the power of disambiguating rules, consider a fragment from a programming >+language involving an ``if-then-else'' construction: >+.DS >+stat : IF \'(\' cond \')\' stat >+ | IF \'(\' cond \')\' stat ELSE stat >+ ; >+.DE >+In these rules, >+.I IF >+and >+.I ELSE >+are tokens, >+.I cond >+is a nonterminal symbol describing >+conditional (logical) expressions, and >+.I stat >+is a nonterminal symbol describing statements. >+The first rule will be called the >+.ul >+simple-if >+rule, and the >+second the >+.ul >+if-else >+rule. >+.PP >+These two rules form an ambiguous construction, since input of the form >+.DS >+IF ( C1 ) IF ( C2 ) S1 ELSE S2 >+.DE >+can be structured according to these rules in two ways: >+.DS >+IF ( C1 ) { >+ IF ( C2 ) S1 >+ } >+ELSE S2 >+.DE >+or >+.DS >+IF ( C1 ) { >+ IF ( C2 ) S1 >+ ELSE S2 >+ } >+.DE >+The second interpretation is the one given in most programming languages >+having this construct. >+Each >+.I ELSE >+is associated with the last preceding >+``un-\fIELSE'\fRd'' >+.I IF . >+In this example, consider the situation where the parser has seen >+.DS >+IF ( C1 ) IF ( C2 ) S1 >+.DE >+and is looking at the >+.I ELSE . >+It can immediately >+reduce >+by the simple-if rule to get >+.DS >+IF ( C1 ) stat >+.DE >+and then read the remaining input, >+.DS >+ELSE S2 >+.DE >+and reduce >+.DS >+IF ( C1 ) stat ELSE S2 >+.DE >+by the if-else rule. >+This leads to the first of the above groupings of the input. >+.PP >+On the other hand, the >+.I ELSE >+may be shifted, >+.I S2 >+read, and then the right hand portion of >+.DS >+IF ( C1 ) IF ( C2 ) S1 ELSE S2 >+.DE >+can be reduced by the if-else rule to get >+.DS >+IF ( C1 ) stat >+.DE >+which can be reduced by the simple-if rule. >+This leads to the second of the above groupings of the input, which >+is usually desired. >+.PP >+Once again the parser can do two valid things \- there is a shift/reduce conflict. >+The application of disambiguating rule 1 tells the parser to shift in this case, >+which leads to the desired grouping. >+.PP >+This shift/reduce conflict arises only when there is a particular current input symbol, >+.I ELSE , >+and particular inputs already seen, such as >+.DS >+IF ( C1 ) IF ( C2 ) S1 >+.DE >+In general, there may be many conflicts, and each one >+will be associated with an input symbol and >+a set of previously read inputs. >+The previously read inputs are characterized by the >+state >+of the parser. >+.PP >+The conflict messages of Yacc are best understood >+by examining the verbose (\fB\-v\fR) option output file. >+For example, the output corresponding to the above >+conflict state might be: >+.DS L >+23: shift/reduce conflict (shift 45, reduce 18) on ELSE >+ >+state 23 >+ >+ stat : IF ( cond ) stat\_ (18) >+ stat : IF ( cond ) stat\_ELSE stat >+ >+ ELSE shift 45 >+ . reduce 18 >+ >+.DE >+The first line describes the conflict, giving the state and the input symbol. >+The ordinary state description follows, giving >+the grammar rules active in the state, and the parser actions. >+Recall that the underline marks the >+portion of the grammar rules which has been seen. >+Thus in the example, in state 23 the parser has seen input corresponding >+to >+.DS >+IF ( cond ) stat >+.DE >+and the two grammar rules shown are active at this time. >+The parser can do two possible things. >+If the input symbol is >+.I ELSE , >+it is possible to shift into state >+45. >+State 45 will have, as part of its description, the line >+.DS >+stat : IF ( cond ) stat ELSE\_stat >+.DE >+since the >+.I ELSE >+will have been shifted in this state. >+Back in state 23, the alternative action, described by ``\fB.\fR'', >+is to be done if the input symbol is not mentioned explicitly in the above actions; thus, >+in this case, if the input symbol is not >+.I ELSE , >+the parser reduces by grammar rule 18: >+.DS >+stat : IF \'(\' cond \')\' stat >+.DE >+Once again, notice that the numbers following ``shift'' commands refer to other states, >+while the numbers following ``reduce'' commands refer to grammar >+rule numbers. >+In the >+.I y.output >+file, the rule numbers are printed after those rules which can be reduced. >+In most one states, there will be at most reduce action possible in the >+state, and this will be the default command. >+The user who encounters unexpected shift/reduce conflicts will probably want to >+look at the verbose output to decide whether the default actions are appropriate. >+In really tough cases, the user might need to know more about >+the behavior and construction of the parser than can be covered here. >+In this case, one of the theoretical references >+.[ >+Aho Johnson Surveys Parsing >+.] >+.[ >+Aho Johnson Ullman Deterministic Ambiguous >+.] >+.[ >+Aho Ullman Principles Design >+.] >+might be consulted; the services of a local guru might also be appropriate. >--- /usr/src/share/doc/psd/15.yacc/ss6 Thu Jan 1 00:00:00 1970 >+++ /usr/src/share/doc/psd.new/15.yacc/ss6 Tue Feb 26 13:49:21 2002 >@@ -0,0 +1,146 @@ >+.\" @(#)ss6 6.1 (Berkeley) 5/8/86 >+.\" >+.SH >+6: Precedence >+.PP >+There is one common situation >+where the rules given above for resolving conflicts are not sufficient; >+this is in the parsing of arithmetic expressions. >+Most of the commonly used constructions for arithmetic expressions can be naturally >+described by the notion of >+.I precedence >+levels for operators, together with information about left >+or right associativity. >+It turns out that ambiguous grammars with appropriate disambiguating rules >+can be used to create parsers that are faster and easier to >+write than parsers constructed from unambiguous grammars. >+The basic notion is to write grammar rules >+of the form >+.DS >+expr : expr OP expr >+.DE >+and >+.DS >+expr : UNARY expr >+.DE >+for all binary and unary operators desired. >+This creates a very ambiguous grammar, with many parsing conflicts. >+As disambiguating rules, the user specifies the precedence, or binding >+strength, of all the operators, and the associativity >+of the binary operators. >+This information is sufficient to allow Yacc to resolve the parsing conflicts >+in accordance with these rules, and construct a parser that realizes the desired >+precedences and associativities. >+.PP >+The precedences and associativities are attached to tokens in the declarations section. >+This is done by a series of lines beginning with a Yacc keyword: %left, %right, >+or %nonassoc, followed by a list of tokens. >+All of the tokens on the same line are assumed to have the same precedence level >+and associativity; the lines are listed in >+order of increasing precedence or binding strength. >+Thus, >+.DS >+%left \'+\' \'\-\' >+%left \'*\' \'/\' >+.DE >+describes the precedence and associativity of the four arithmetic operators. >+Plus and minus are left associative, and have lower precedence than >+star and slash, which are also left associative. >+The keyword %right is used to describe right associative operators, >+and the keyword %nonassoc is used to describe operators, like >+the operator .LT. in Fortran, that may not associate with themselves; thus, >+.DS >+A .LT. B .LT. C >+.DE >+is illegal in Fortran, and such an operator would be described with the keyword >+%nonassoc in Yacc. >+As an example of the behavior of these declarations, the description >+.DS >+%right \'=\' >+%left \'+\' \'\-\' >+%left \'*\' \'/\' >+ >+%% >+ >+expr : expr \'=\' expr >+ | expr \'+\' expr >+ | expr \'\-\' expr >+ | expr \'*\' expr >+ | expr \'/\' expr >+ | NAME >+ ; >+.DE >+might be used to structure the input >+.DS >+a = b = c*d \- e \- f*g >+.DE >+as follows: >+.DS >+a = ( b = ( ((c*d)\-e) \- (f*g) ) ) >+.DE >+When this mechanism is used, >+unary operators must, in general, be given a precedence. >+Sometimes a unary operator and a binary operator >+have the same symbolic representation, but different precedences. >+An example is unary and binary \'\-\'; unary minus may be given the same >+strength as multiplication, or even higher, while binary minus has a lower strength than >+multiplication. >+The keyword, %prec, changes the precedence level associated with a particular grammar rule. >+%prec appears immediately after the body of the grammar rule, before the action or closing semicolon, >+and is followed by a token name or literal. >+It >+causes the precedence of the grammar rule to become that of the following token name or literal. >+For example, to make unary minus have the same precedence as multiplication the rules might resemble: >+.DS >+%left \'+\' \'\-\' >+%left \'*\' \'/\' >+ >+%% >+ >+expr : expr \'+\' expr >+ | expr \'\-\' expr >+ | expr \'*\' expr >+ | expr \'/\' expr >+ | \'\-\' expr %prec \'*\' >+ | NAME >+ ; >+.DE >+.PP >+A token declared >+by %left, %right, and %nonassoc need not be, but may be, declared by %token as well. >+.PP >+The precedences and associativities are used by Yacc to >+resolve parsing conflicts; they give rise to disambiguating rules. >+Formally, the rules work as follows: >+.IP 1. >+The precedences and associativities are recorded for those tokens and literals >+that have them. >+.IP 2. >+A precedence and associativity is associated with each grammar rule; it is the precedence >+and associativity of the last token or literal in the body of the rule. >+If the %prec construction is used, it overrides this default. >+Some grammar rules may have no precedence and associativity associated with them. >+.IP 3. >+When there is a reduce/reduce conflict, or there is a shift/reduce conflict >+and either the input symbol or the grammar rule has no precedence and associativity, >+then the two disambiguating rules given at the beginning of the section are used, >+and the conflicts are reported. >+.IP 4. >+If there is a shift/reduce conflict, and both the grammar rule and the input character >+have precedence and associativity associated with them, then the conflict is resolved >+in favor of the action (shift or reduce) associated with the higher precedence. >+If the precedences are the same, then the associativity is used; left >+associative implies reduce, right associative implies shift, and nonassociating >+implies error. >+.PP >+Conflicts resolved by precedence are not counted in the number of shift/reduce and reduce/reduce >+conflicts reported by Yacc. >+This means that mistakes in the specification of precedences may >+disguise errors in the input grammar; it is a good idea to be sparing >+with precedences, and use them in an essentially ``cookbook'' fashion, >+until some experience has been gained. >+The >+.I y.output >+file >+is very useful in deciding whether the parser is actually doing >+what was intended. >--- /usr/src/share/doc/psd/15.yacc/ss7 Thu Jan 1 00:00:00 1970 >+++ /usr/src/share/doc/psd.new/15.yacc/ss7 Tue Feb 26 13:49:21 2002 >@@ -0,0 +1,124 @@ >+.\" @(#)ss7 6.1 (Berkeley) 5/8/86 >+.\" >+.SH >+7: Error Handling >+.PP >+Error handling is an extremely difficult area, and many of the problems are semantic ones. >+When an error is found, for example, it may be necessary to reclaim parse tree storage, >+delete or alter symbol table entries, and, typically, set switches to avoid generating any further output. >+.PP >+It is seldom acceptable to stop all processing when an error is found; it is more useful to continue >+scanning the input to find further syntax errors. >+This leads to the problem of getting the parser ``restarted'' after an error. >+A general class of algorithms to do this involves discarding a number of tokens >+from the input string, and attempting to adjust the parser so that input can continue. >+.PP >+To allow the user some control over this process, >+Yacc provides a simple, but reasonably general, feature. >+The token name ``error'' is reserved for error handling. >+This name can be used in grammar rules; >+in effect, it suggests places where errors are expected, and recovery might take place. >+The parser pops its stack until it enters a state where the token ``error'' is legal. >+It then behaves as if the token ``error'' were the current lookahead token, >+and performs the action encountered. >+The lookahead token is then reset to the token that caused the error. >+If no special error rules have been specified, the processing halts when an error is detected. >+.PP >+In order to prevent a cascade of error messages, the parser, after >+detecting an error, remains in error state until three tokens have been successfully >+read and shifted. >+If an error is detected when the parser is already in error state, >+no message is given, and the input token is quietly deleted. >+.PP >+As an example, a rule of the form >+.DS >+stat : error >+.DE >+would, in effect, mean that on a syntax error the parser would attempt to skip over the statement >+in which the error was seen. >+More precisely, the parser will >+scan ahead, looking for three tokens that might legally follow >+a statement, and start processing at the first of these; if >+the beginnings of statements are not sufficiently distinctive, it may make a >+false start in the middle of a statement, and end up reporting a >+second error where there is in fact no error. >+.PP >+Actions may be used with these special error rules. >+These actions might attempt to reinitialize tables, reclaim symbol table space, etc. >+.PP >+Error rules such as the above are very general, but difficult to control. >+Somewhat easier are rules such as >+.DS >+stat : error \';\' >+.DE >+Here, when there is an error, the parser attempts to skip over the statement, but >+will do so by skipping to the next \';\'. >+All tokens after the error and before the next \';\' cannot be shifted, and are discarded. >+When the \';\' is seen, this rule will be reduced, and any ``cleanup'' >+action associated with it performed. >+.PP >+Another form of error rule arises in interactive applications, where >+it may be desirable to permit a line to be reentered after an error. >+A possible error rule might be >+.DS >+input : error \'\en\' { printf( "Reenter last line: " ); } input >+ { $$ = $4; } >+.DE >+There is one potential difficulty with this approach; >+the parser must correctly process three input tokens before it >+admits that it has correctly resynchronized after the error. >+If the reentered line contains an error >+in the first two tokens, the parser deletes the offending tokens, >+and gives no message; this is clearly unacceptable. >+For this reason, there is a mechanism that >+can be used to force the parser >+to believe that an error has been fully recovered from. >+The statement >+.DS >+yyerrok ; >+.DE >+in an action >+resets the parser to its normal mode. >+The last example is better written >+.DS >+input : error \'\en\' >+ { yyerrok; >+ printf( "Reenter last line: " ); } >+ input >+ { $$ = $4; } >+ ; >+.DE >+.PP >+As mentioned above, the token seen immediately >+after the ``error'' symbol is the input token at which the >+error was discovered. >+Sometimes, this is inappropriate; for example, an >+error recovery action might >+take upon itself the job of finding the correct place to resume input. >+In this case, >+the previous lookahead token must be cleared. >+The statement >+.DS >+yyclearin ; >+.DE >+in an action will have this effect. >+For example, suppose the action after error >+were to call some sophisticated resynchronization routine, >+supplied by the user, that attempted to advance the input to the >+beginning of the next valid statement. >+After this routine was called, the next token returned by yylex would presumably >+be the first token in a legal statement; >+the old, illegal token must be discarded, and the error state reset. >+This could be done by a rule like >+.DS >+stat : error >+ { resynch(); >+ yyerrok ; >+ yyclearin ; } >+ ; >+.DE >+.PP >+These mechanisms are admittedly crude, but do allow for a simple, fairly effective recovery of the parser >+from many errors; >+moreover, the user can get control to deal with >+the error actions required by other portions of the program. >--- /usr/src/share/doc/psd/15.yacc/ss8 Thu Jan 1 00:00:00 1970 >+++ /usr/src/share/doc/psd.new/15.yacc/ss8 Tue Feb 26 13:49:21 2002 >@@ -0,0 +1,93 @@ >+.\" @(#)ss8 6.1 (Berkeley) 5/8/86 >+.\" >+.SH >+8: The Yacc Environment >+.PP >+When the user inputs a specification >+to Yacc, the output is a file of C programs, called >+.I y.tab.c >+on most >+systems >+(due to local file system conventions, the names may differ from >+installation to installation). >+The function produced by Yacc is called >+.I yyparse \|; >+it is an integer valued function. >+When it is called, it in turn repeatedly calls >+.I yylex , >+the lexical analyzer >+supplied by the user (see Section 3) >+to obtain input tokens. >+Eventually, either an error is detected, in which case >+(if no error recovery is possible) >+.I yyparse >+returns the value 1, >+or the lexical analyzer returns the endmarker token >+and the parser accepts. >+In this case, >+.I yyparse >+returns the value 0. >+.PP >+The user must provide a certain amount of environment for this >+parser in order to obtain a working program. >+For example, as with every C program, a program called >+.I main >+must be defined, that eventually calls >+.I yyparse . >+In addition, a routine called >+.I yyerror >+prints a message >+when a syntax error is detected. >+.PP >+These two routines must be supplied in one form or another by the >+user. >+To ease the initial effort of using Yacc, a library has been >+provided with default versions of >+.I main >+and >+.I yyerror . >+The name of this library is system dependent; >+on many systems the library is accessed by a >+.B \-ly >+argument to the loader. >+To show the triviality of these default programs, the source is >+given below: >+.DS >+main(){ >+ return( yyparse() ); >+ } >+.DE >+and >+.DS >+# include <stdio.h> >+ >+yyerror(s) char *s; { >+ fprintf( stderr, "%s\en", s ); >+ } >+.DE >+The argument to >+.I yyerror >+is a string containing an error message, usually >+the string ``syntax error''. >+The average application will want to do better than this. >+Ordinarily, the program should keep track of the input line number, and print it >+along with the message when a syntax error is detected. >+The external integer variable >+.I yychar >+contains the lookahead token number at the time the error was detected; >+this may be of some interest in giving better diagnostics. >+Since the >+.I main >+program is probably supplied by the user (to read arguments, etc.) >+the Yacc library is useful only in small >+projects, or in the earliest stages of larger ones. >+.PP >+The external integer variable >+.I yydebug >+is normally set to 0. >+If it is set to a nonzero value, the parser will output a >+verbose description of its actions, including >+a discussion of which input symbols have been read, and >+what the parser actions are. >+Depending on the operating environment, >+it may be possible to set this variable by using a debugging system. >--- /usr/src/share/doc/psd/15.yacc/ss9 Thu Jan 1 00:00:00 1970 >+++ /usr/src/share/doc/psd.new/15.yacc/ss9 Tue Feb 26 13:49:21 2002 >@@ -0,0 +1,169 @@ >+.\" @(#)ss9 6.1 (Berkeley) 5/8/86 >+.\" >+.SH >+9: Hints for Preparing Specifications >+.PP >+This section contains miscellaneous hints on preparing efficient, easy to change, >+and clear specifications. >+The individual subsections are more or less >+independent. >+.SH >+Input Style >+.PP >+It is difficult to >+provide rules with substantial actions >+and still have a readable specification file. >+The following style hints owe much to Brian Kernighan. >+.IP a. >+Use all capital letters for token names, all lower case letters for >+nonterminal names. >+This rule comes under the heading of ``knowing who to blame when >+things go wrong.'' >+.IP b. >+Put grammar rules and actions on separate lines. >+This allows either to be changed without >+an automatic need to change the other. >+.IP c. >+Put all rules with the same left hand side together. >+Put the left hand side in only once, and let all >+following rules begin with a vertical bar. >+.IP d. >+Put a semicolon only after the last rule with a given left hand side, >+and put the semicolon on a separate line. >+This allows new rules to be easily added. >+.IP e. >+Indent rule bodies by two tab stops, and action bodies by three >+tab stops. >+.PP >+The example in Appendix A is written following this style, as are >+the examples in the text of this paper (where space permits). >+The user must make up his own mind about these stylistic questions; >+the central problem, however, is to make the rules visible through >+the morass of action code. >+.SH >+Left Recursion >+.PP >+The algorithm used by the Yacc parser encourages so called ``left recursive'' >+grammar rules: rules of the form >+.DS >+name : name rest_of_rule ; >+.DE >+These rules frequently arise when >+writing specifications of sequences and lists: >+.DS >+list : item >+ | list \',\' item >+ ; >+.DE >+and >+.DS >+seq : item >+ | seq item >+ ; >+.DE >+In each of these cases, the first rule >+will be reduced for the first item only, and the second rule >+will be reduced for the second and all succeeding items. >+.PP >+With right recursive rules, such as >+.DS >+seq : item >+ | item seq >+ ; >+.DE >+the parser would be a bit bigger, and the items would be seen, and reduced, >+from right to left. >+More seriously, an internal stack in the parser >+would be in danger of overflowing if a very long sequence were read. >+Thus, the user should use left recursion wherever reasonable. >+.PP >+It is worth considering whether a sequence with zero >+elements has any meaning, and if so, consider writing >+the sequence specification with an empty rule: >+.DS >+seq : /* empty */ >+ | seq item >+ ; >+.DE >+Once again, the first rule would always be reduced exactly once, before the >+first item was read, >+and then the second rule would be reduced once for each item read. >+Permitting empty sequences >+often leads to increased generality. >+However, conflicts might arise if Yacc is asked to decide >+which empty sequence it has seen, when it hasn't seen enough to >+know! >+.SH >+Lexical Tie-ins >+.PP >+Some lexical decisions depend on context. >+For example, the lexical analyzer might want to >+delete blanks normally, but not within quoted strings. >+Or names might be entered into a symbol table in declarations, >+but not in expressions. >+.PP >+One way of handling this situation is >+to create a global flag that is >+examined by the lexical analyzer, and set by actions. >+For example, suppose a program >+consists of 0 or more declarations, followed by 0 or more statements. >+Consider: >+.DS >+%{ >+ int dflag; >+%} >+ ... other declarations ... >+ >+%% >+ >+prog : decls stats >+ ; >+ >+decls : /* empty */ >+ { dflag = 1; } >+ | decls declaration >+ ; >+ >+stats : /* empty */ >+ { dflag = 0; } >+ | stats statement >+ ; >+ >+ ... other rules ... >+.DE >+The flag >+.I dflag >+is now 0 when reading statements, and 1 when reading declarations, >+.ul >+except for the first token in the first statement. >+This token must be seen by the parser before it can tell that >+the declaration section has ended and the statements have >+begun. >+In many cases, this single token exception does not >+affect the lexical scan. >+.PP >+This kind of ``backdoor'' approach can be elaborated >+to a noxious degree. >+Nevertheless, it represents a way of doing some things >+that are difficult, if not impossible, to >+do otherwise. >+.SH >+Reserved Words >+.PP >+Some programming languages >+permit the user to >+use words like ``if'', which are normally reserved, >+as label or variable names, provided that such use does not >+conflict with the legal use of these names in the programming language. >+This is extremely hard to do in the framework of Yacc; >+it is difficult to pass information to the lexical analyzer >+telling it ``this instance of `if' is a keyword, and that instance is a variable''. >+The user can make a stab at it, using the >+mechanism described in the last subsection, >+but it is difficult. >+.PP >+A number of ways of making this easier are under advisement. >+Until then, it is better that the keywords be >+.I reserved \|; >+that is, be forbidden for use as variable names. >+There are powerful stylistic reasons for preferring this, anyway. >--- /usr/src/share/doc/psd/15.yacc/ssA Thu Jan 1 00:00:00 1970 >+++ /usr/src/share/doc/psd.new/15.yacc/ssA Tue Feb 26 13:49:21 2002 >@@ -0,0 +1,184 @@ >+.\" @(#)ssA 6.1 (Berkeley) 5/8/86 >+.\" >+.SH >+10: Advanced Topics >+.PP >+This section discusses a number of advanced features >+of Yacc. >+.SH >+Simulating Error and Accept in Actions >+.PP >+The parsing actions of error and accept can be simulated >+in an action by use of macros YYACCEPT and YYERROR. >+YYACCEPT causes >+.I yyparse >+to return the value 0; >+YYERROR causes >+the parser to behave as if the current input symbol >+had been a syntax error; >+.I yyerror >+is called, and error recovery takes place. >+These mechanisms can be used to simulate parsers >+with multiple endmarkers or context-sensitive syntax checking. >+.SH >+Accessing Values in Enclosing Rules. >+.PP >+An action may refer to values >+returned by actions to the left of the current rule. >+The mechanism is simply the same as with ordinary actions, >+a dollar sign followed by a digit, but in this case the >+digit may be 0 or negative. >+Consider >+.DS >+sent : adj noun verb adj noun >+ { \fIlook at the sentence\fR . . . } >+ ; >+ >+adj : THE { $$ = THE; } >+ | YOUNG { $$ = YOUNG; } >+ . . . >+ ; >+ >+noun : DOG >+ { $$ = DOG; } >+ | CRONE >+ { if( $0 == YOUNG ){ >+ printf( "what?\en" ); >+ } >+ $$ = CRONE; >+ } >+ ; >+ . . . >+.DE >+In the action following the word CRONE, a check is made that the >+preceding token shifted was not YOUNG. >+Obviously, this is only possible when a great deal is known about >+what might precede the symbol >+.I noun >+in the input. >+There is also a distinctly unstructured flavor about this. >+Nevertheless, at times this mechanism will save a great >+deal of trouble, especially when a few combinations are to >+be excluded from an otherwise regular structure. >+.SH >+Support for Arbitrary Value Types >+.PP >+By default, the values returned by actions and the lexical analyzer are integers. >+Yacc can also support >+values of other types, including structures. >+In addition, Yacc keeps track of the types, and inserts >+appropriate union member names so that the resulting parser will >+be strictly type checked. >+The Yacc value stack (see Section 4) >+is declared to be a >+.I union >+of the various types of values desired. >+The user declares the union, and associates union member names >+to each token and nonterminal symbol having a value. >+When the value is referenced through a $$ or $n construction, >+Yacc will automatically insert the appropriate union name, so that >+no unwanted conversions will take place. >+In addition, type checking commands such as >+.I Lint\| >+.[ >+Johnson Lint Checker 1273 >+.] >+will be far more silent. >+.PP >+There are three mechanisms used to provide for this typing. >+First, there is a way of defining the union; this must be >+done by the user since other programs, notably the lexical analyzer, >+must know about the union member names. >+Second, there is a way of associating a union member name with tokens >+and nonterminals. >+Finally, there is a mechanism for describing the type of those >+few values where Yacc can not easily determine the type. >+.PP >+To declare the union, the user includes in the declaration section: >+.DS >+%union { >+ body of union ... >+ } >+.DE >+This declares the Yacc value stack, >+and the external variables >+.I yylval >+and >+.I yyval , >+to have type equal to this union. >+If Yacc was invoked with the >+.B \-d >+option, the union declaration >+is copied onto the >+.I y.tab.h >+file. >+Alternatively, >+the union may be declared in a header file, and a typedef >+used to define the variable YYSTYPE to represent >+this union. >+Thus, the header file might also have said: >+.DS >+typedef union { >+ body of union ... >+ } YYSTYPE; >+.DE >+The header file must be included in the declarations >+section, by use of %{ and %}. >+.PP >+Once YYSTYPE is defined, >+the union member names must be associated >+with the various terminal and nonterminal names. >+The construction >+.DS >+< name > >+.DE >+is used to indicate a union member name. >+If this follows >+one of the >+keywords %token, >+%left, %right, and %nonassoc, >+the union member name is associated with the tokens listed. >+Thus, saying >+.DS >+%left <optype> \'+\' \'\-\' >+.DE >+will cause any reference to values returned by these two tokens to be >+tagged with >+the union member name >+.I optype . >+Another keyword, %type, is >+used similarly to associate >+union member names with nonterminals. >+Thus, one might say >+.DS >+%type <nodetype> expr stat >+.DE >+.PP >+There remain a couple of cases where these mechanisms are insufficient. >+If there is an action within a rule, the value returned >+by this action has no >+.I "a priori" >+type. >+Similarly, reference to left context values (such as $0 \- see the >+previous subsection ) leaves Yacc with no easy way of knowing the type. >+In this case, a type can be imposed on the reference by inserting >+a union member name, between < and >, immediately after >+the first $. >+An example of this usage is >+.DS >+rule : aaa { $<intval>$ = 3; } bbb >+ { fun( $<intval>2, $<other>0 ); } >+ ; >+.DE >+This syntax has little to recommend it, but the situation arises rarely. >+.PP >+A sample specification is given in Appendix C. >+The facilities in this subsection are not triggered until they are used: >+in particular, the use of %type will turn on these mechanisms. >+When they are used, there is a fairly strict level of checking. >+For example, use of $n or $$ to refer to something with no defined type >+is diagnosed. >+If these facilities are not triggered, the Yacc value stack is used to >+hold >+.I int' s, >+as was true historically. >--- /usr/src/share/doc/psd/15.yacc/ssB Thu Jan 1 00:00:00 1970 >+++ /usr/src/share/doc/psd.new/15.yacc/ssB Tue Feb 26 13:49:21 2002 >@@ -0,0 +1,26 @@ >+.\" @(#)ssB 6.1 (Berkeley) 5/8/86 >+.\" >+.SH >+11: Acknowledgements >+.PP >+Yacc owes much to a >+most stimulating collection of users, who have goaded >+me beyond my inclination, and frequently beyond my >+ability, in their endless search for ``one more feature''. >+Their irritating unwillingness to learn how to >+do things my way has usually led to my doing things their way; >+most of the time, they have been right. >+B. W. Kernighan, P. J. Plauger, S. I. Feldman, C. Imagna, >+M. E. Lesk, >+and A. Snyder will recognize some of their ideas in the current version >+of Yacc. >+C. B. Haley contributed to the error recovery algorithm. >+D. M. Ritchie, B. W. Kernighan, and M. O. Harris helped translate this document into English. >+Al Aho also deserves special credit for bringing >+the mountain to Mohammed, and other favors. >+.SG "MH-1273-SCJ-unix" >+.bp >+.[ >+$LIST$ >+.] >+.bp >--- /usr/src/share/doc/psd/15.yacc/ssa Thu Jan 1 00:00:00 1970 >+++ /usr/src/share/doc/psd.new/15.yacc/ssa Tue Feb 26 13:49:21 2002 >@@ -0,0 +1,113 @@ >+.\" @(#)ssa 6.1 (Berkeley) 5/8/86 >+.\" >+.SH >+Appendix A: A Simple Example >+.PP >+This example gives the complete Yacc specification for a small desk calculator; >+the desk calculator has 26 registers, labeled ``a'' through ``z'', and accepts >+arithmetic expressions made up of the operators +, \-, *, /, >+% (mod operator), & (bitwise and), | (bitwise or), and assignment. >+If an expression at the top level is an assignment, the value is not >+printed; otherwise it is. >+As in C, an integer that begins with 0 (zero) is assumed to be octal; >+otherwise, it is assumed to be decimal. >+.PP >+As an example of a Yacc specification, the desk calculator >+does a reasonable job of showing how precedences and ambiguities >+are used, and demonstrating simple error recovery. >+The major oversimplifications are that the >+lexical analysis phase is much simpler than for most applications, and the >+output is produced immediately, line by line. >+Note the way that decimal and octal integers are read in by the grammar rules; >+This job is probably better done by the lexical analyzer. >+.sp >+.nf >+.ta .5i 1i 1.5i 2i 2.5i >+ >+%{ >+# include <stdio.h> >+# include <ctype.h> >+ >+int regs[26]; >+int base; >+ >+%} >+ >+%start list >+ >+%token DIGIT LETTER >+ >+%left \'|\' >+%left \'&\' >+%left \'+\' \'\-\' >+%left \'*\' \'/\' \'%\' >+%left UMINUS /* supplies precedence for unary minus */ >+ >+%% /* beginning of rules section */ >+ >+list : /* empty */ >+ | list stat \'\en\' >+ | list error \'\en\' >+ { yyerrok; } >+ ; >+ >+stat : expr >+ { printf( "%d\en", $1 ); } >+ | LETTER \'=\' expr >+ { regs[$1] = $3; } >+ ; >+ >+expr : \'(\' expr \')\' >+ { $$ = $2; } >+ | expr \'+\' expr >+ { $$ = $1 + $3; } >+ | expr \'\-\' expr >+ { $$ = $1 \- $3; } >+ | expr \'*\' expr >+ { $$ = $1 * $3; } >+ | expr \'/\' expr >+ { $$ = $1 / $3; } >+ | expr \'%\' expr >+ { $$ = $1 % $3; } >+ | expr \'&\' expr >+ { $$ = $1 & $3; } >+ | expr \'|\' expr >+ { $$ = $1 | $3; } >+ | \'\-\' expr %prec UMINUS >+ { $$ = \- $2; } >+ | LETTER >+ { $$ = regs[$1]; } >+ | number >+ ; >+ >+number : DIGIT >+ { $$ = $1; base = ($1==0) ? 8 : 10; } >+ | number DIGIT >+ { $$ = base * $1 + $2; } >+ ; >+ >+%% /* start of programs */ >+ >+yylex() { /* lexical analysis routine */ >+ /* returns LETTER for a lower case letter, yylval = 0 through 25 */ >+ /* return DIGIT for a digit, yylval = 0 through 9 */ >+ /* all other characters are returned immediately */ >+ >+ int c; >+ >+ while( (c=getchar()) == \' \' ) { /* skip blanks */ } >+ >+ /* c is now nonblank */ >+ >+ if( islower( c ) ) { >+ yylval = c \- \'a\'; >+ return ( LETTER ); >+ } >+ if( isdigit( c ) ) { >+ yylval = c \- \'0\'; >+ return( DIGIT ); >+ } >+ return( c ); >+ } >+.fi >+.bp >--- /usr/src/share/doc/psd/15.yacc/ssb Thu Jan 1 00:00:00 1970 >+++ /usr/src/share/doc/psd.new/15.yacc/ssb Tue Feb 26 13:49:21 2002 >@@ -0,0 +1,110 @@ >+.\" @(#)ssb 6.1 (Berkeley) 5/8/86 >+.\" >+.SH >+Appendix B: Yacc Input Syntax >+.PP >+This Appendix has a description of the Yacc input syntax, as a Yacc specification. >+Context dependencies, etc., are not considered. >+Ironically, the Yacc input specification language >+is most naturally specified as an LR(2) grammar; the sticky >+part comes when an identifier is seen in a rule, immediately >+following an action. >+If this identifier is followed by a colon, it is the start of the >+next rule; otherwise >+it is a continuation of the current rule, which just happens to have >+an action embedded in it. >+As implemented, the lexical analyzer looks >+ahead after seeing an identifier, and >+decide whether the next token (skipping blanks, newlines, comments, etc.) >+is a colon. >+If so, it returns the token C_IDENTIFIER. >+Otherwise, it returns IDENTIFIER. >+Literals (quoted strings) are also returned as IDENTIFIERS, >+but never as part of C_IDENTIFIERs. >+.sp >+.nf >+.ta .6i 1.2i 1.8i 2.4i 3i 3.6i >+ >+ /* grammar for the input to Yacc */ >+ >+ /* basic entities */ >+%token IDENTIFIER /* includes identifiers and literals */ >+%token C_IDENTIFIER /* identifier (but not literal) followed by colon */ >+%token NUMBER /* [0-9]+ */ >+ >+ /* reserved words: %type => TYPE, %left => LEFT, etc. */ >+ >+%token LEFT RIGHT NONASSOC TOKEN PREC TYPE START UNION >+ >+%token MARK /* the %% mark */ >+%token LCURL /* the %{ mark */ >+%token RCURL /* the %} mark */ >+ >+ /* ascii character literals stand for themselves */ >+ >+%start spec >+ >+%% >+ >+spec : defs MARK rules tail >+ ; >+ >+tail : MARK { \fIIn this action, eat up the rest of the file\fR } >+ | /* empty: the second MARK is optional */ >+ ; >+ >+defs : /* empty */ >+ | defs def >+ ; >+ >+def : START IDENTIFIER >+ | UNION { \fICopy union definition to output\fR } >+ | LCURL { \fICopy C code to output file\fR } RCURL >+ | ndefs rword tag nlist >+ ; >+ >+rword : TOKEN >+ | LEFT >+ | RIGHT >+ | NONASSOC >+ | TYPE >+ ; >+ >+tag : /* empty: union tag is optional */ >+ | \'<\' IDENTIFIER \'>\' >+ ; >+ >+nlist : nmno >+ | nlist nmno >+ | nlist \',\' nmno >+ ; >+ >+nmno : IDENTIFIER /* NOTE: literal illegal with %type */ >+ | IDENTIFIER NUMBER /* NOTE: illegal with %type */ >+ ; >+ >+ /* rules section */ >+ >+rules : C_IDENTIFIER rbody prec >+ | rules rule >+ ; >+ >+rule : C_IDENTIFIER rbody prec >+ | '|' rbody prec >+ ; >+ >+rbody : /* empty */ >+ | rbody IDENTIFIER >+ | rbody act >+ ; >+ >+act : \'{\' { \fICopy action, translate $$, etc.\fR } \'}\' >+ ; >+ >+prec : /* empty */ >+ | PREC IDENTIFIER >+ | PREC IDENTIFIER act >+ | prec \';\' >+ ; >+.fi >+.bp >--- /usr/src/share/doc/psd/15.yacc/ssc Thu Jan 1 00:00:00 1970 >+++ /usr/src/share/doc/psd.new/15.yacc/ssc Tue Feb 26 13:49:21 2002 >@@ -0,0 +1,311 @@ >+.\" @(#)ssc 6.1 (Berkeley) 5/8/86 >+.\" >+.SH >+Appendix C: An Advanced Example >+.PP >+This Appendix gives an example of a grammar using some >+of the advanced features discussed in Section 10. >+The desk calculator example in Appendix A is >+modified to provide a desk calculator that >+does floating point interval arithmetic. >+The calculator understands floating point >+constants, the arithmetic operations +, \-, *, /, >+unary \-, and = (assignment), and has 26 floating >+point variables, ``a'' through ``z''. >+Moreover, it also understands >+.I intervals , >+written >+.DS >+ ( x , y ) >+.DE >+where >+.I x >+is less than or equal to >+.I y . >+There are 26 interval valued variables ``A'' through ``Z'' >+that may also be used. >+The usage is similar to that in Appendix A; assignments >+return no value, and print nothing, while expressions print >+the (floating or interval) value. >+.PP >+This example explores a number of interesting features >+of Yacc and C. >+Intervals are represented by a structure, consisting of the >+left and right endpoint values, stored as >+.I double 's. >+This structure is given a type name, INTERVAL, by using >+.I typedef . >+The Yacc value stack can also contain floating point scalars, and >+integers (used to index into the arrays holding the variable values). >+Notice that this entire strategy depends strongly on being able to >+assign structures and unions in C. >+In fact, many of the actions call functions that return structures >+as well. >+.PP >+It is also worth noting the use of YYERROR to handle error conditions: >+division by an interval containing 0, and an interval presented in >+the wrong order. >+In effect, the error recovery mechanism of Yacc is used to throw away the >+rest of the offending line. >+.PP >+In addition to the mixing of types on the value stack, >+this grammar also demonstrates an interesting use of syntax to >+keep track of the type (e.g. scalar or interval) of intermediate >+expressions. >+Note that a scalar can be automatically promoted to an interval if >+the context demands an interval value. >+This causes a large number of conflicts when the grammar is run through >+Yacc: 18 Shift/Reduce and 26 Reduce/Reduce. >+The problem can be seen by looking at the two input lines: >+.DS >+ 2.5 + ( 3.5 \- 4. ) >+.DE >+and >+.DS >+ 2.5 + ( 3.5 , 4. ) >+.DE >+Notice that the 2.5 is to be used in an interval valued expression >+in the second example, but this fact is not known until >+the ``,'' is read; by this time, 2.5 is finished, and the parser cannot go back >+and change its mind. >+More generally, it might be necessary to look ahead an arbitrary number of >+tokens to decide whether to convert a scalar to an interval. >+This problem is evaded by having two rules for each binary interval >+valued operator: one when the left operand is a scalar, and one when >+the left operand is an interval. >+In the second case, the right operand must be an interval, >+so the conversion will be applied automatically. >+Despite this evasion, there are still many cases where the >+conversion may be applied or not, leading to the above >+conflicts. >+They are resolved by listing the rules that yield scalars first >+in the specification file; in this way, the conflicts will >+be resolved in the direction of keeping scalar >+valued expressions scalar valued until they are forced to become >+intervals. >+.PP >+This way of handling multiple types is very instructive, but not very general. >+If there were many kinds of expression types, instead of just two, >+the number of rules needed would increase dramatically, and the conflicts >+even more dramatically. >+Thus, while this example is instructive, it is better practice in a >+more normal programming language environment to >+keep the type information as part of the value, and not as part >+of the grammar. >+.PP >+Finally, a word about the lexical analysis. >+The only unusual feature is the treatment of floating point constants. >+The C library routine >+.I atof >+is used to do the actual conversion from a character string >+to a double precision value. >+If the lexical analyzer detects an error, >+it responds by returning a token that >+is illegal in the grammar, provoking a syntax error >+in the parser, and thence error recovery. >+.DS L >+ >+%{ >+ >+# include <stdio.h> >+# include <ctype.h> >+ >+typedef struct interval { >+ double lo, hi; >+ } INTERVAL; >+ >+INTERVAL vmul(), vdiv(); >+ >+double atof(); >+ >+double dreg[ 26 ]; >+INTERVAL vreg[ 26 ]; >+ >+%} >+ >+%start lines >+ >+%union { >+ int ival; >+ double dval; >+ INTERVAL vval; >+ } >+ >+%token <ival> DREG VREG /* indices into dreg, vreg arrays */ >+ >+%token <dval> CONST /* floating point constant */ >+ >+%type <dval> dexp /* expression */ >+ >+%type <vval> vexp /* interval expression */ >+ >+ /* precedence information about the operators */ >+ >+%left \'+\' \'\-\' >+%left \'*\' \'/\' >+%left UMINUS /* precedence for unary minus */ >+ >+%% >+ >+lines : /* empty */ >+ | lines line >+ ; >+ >+line : dexp \'\en\' >+ { printf( "%15.8f\en", $1 ); } >+ | vexp \'\en\' >+ { printf( "(%15.8f , %15.8f )\en", $1.lo, $1.hi ); } >+ | DREG \'=\' dexp \'\en\' >+ { dreg[$1] = $3; } >+ | VREG \'=\' vexp \'\en\' >+ { vreg[$1] = $3; } >+ | error \'\en\' >+ { yyerrok; } >+ ; >+ >+dexp : CONST >+ | DREG >+ { $$ = dreg[$1]; } >+ | dexp \'+\' dexp >+ { $$ = $1 + $3; } >+ | dexp \'\-\' dexp >+ { $$ = $1 \- $3; } >+ | dexp \'*\' dexp >+ { $$ = $1 * $3; } >+ | dexp \'/\' dexp >+ { $$ = $1 / $3; } >+ | \'\-\' dexp %prec UMINUS >+ { $$ = \- $2; } >+ | \'(\' dexp \')\' >+ { $$ = $2; } >+ ; >+ >+vexp : dexp >+ { $$.hi = $$.lo = $1; } >+ | \'(\' dexp \',\' dexp \')\' >+ { >+ $$.lo = $2; >+ $$.hi = $4; >+ if( $$.lo > $$.hi ){ >+ printf( "interval out of order\en" ); >+ YYERROR; >+ } >+ } >+ | VREG >+ { $$ = vreg[$1]; } >+ | vexp \'+\' vexp >+ { $$.hi = $1.hi + $3.hi; >+ $$.lo = $1.lo + $3.lo; } >+ | dexp \'+\' vexp >+ { $$.hi = $1 + $3.hi; >+ $$.lo = $1 + $3.lo; } >+ | vexp \'\-\' vexp >+ { $$.hi = $1.hi \- $3.lo; >+ $$.lo = $1.lo \- $3.hi; } >+ | dexp \'\-\' vexp >+ { $$.hi = $1 \- $3.lo; >+ $$.lo = $1 \- $3.hi; } >+ | vexp \'*\' vexp >+ { $$ = vmul( $1.lo, $1.hi, $3 ); } >+ | dexp \'*\' vexp >+ { $$ = vmul( $1, $1, $3 ); } >+ | vexp \'/\' vexp >+ { if( dcheck( $3 ) ) YYERROR; >+ $$ = vdiv( $1.lo, $1.hi, $3 ); } >+ | dexp \'/\' vexp >+ { if( dcheck( $3 ) ) YYERROR; >+ $$ = vdiv( $1, $1, $3 ); } >+ | \'\-\' vexp %prec UMINUS >+ { $$.hi = \-$2.lo; $$.lo = \-$2.hi; } >+ | \'(\' vexp \')\' >+ { $$ = $2; } >+ ; >+ >+%% >+ >+# define BSZ 50 /* buffer size for floating point numbers */ >+ >+ /* lexical analysis */ >+ >+yylex(){ >+ register c; >+ >+ while( (c=getchar()) == \' \' ){ /* skip over blanks */ } >+ >+ if( isupper( c ) ){ >+ yylval.ival = c \- \'A\'; >+ return( VREG ); >+ } >+ if( islower( c ) ){ >+ yylval.ival = c \- \'a\'; >+ return( DREG ); >+ } >+ >+ if( isdigit( c ) || c==\'.\' ){ >+ /* gobble up digits, points, exponents */ >+ >+ char buf[BSZ+1], *cp = buf; >+ int dot = 0, exp = 0; >+ >+ for( ; (cp\-buf)<BSZ ; ++cp,c=getchar() ){ >+ >+ *cp = c; >+ if( isdigit( c ) ) continue; >+ if( c == \'.\' ){ >+ if( dot++ || exp ) return( \'.\' ); /* will cause syntax error */ >+ continue; >+ } >+ >+ if( c == \'e\' ){ >+ if( exp++ ) return( \'e\' ); /* will cause syntax error */ >+ continue; >+ } >+ >+ /* end of number */ >+ break; >+ } >+ *cp = \'\e0\'; >+ if( (cp\-buf) >= BSZ ) printf( "constant too long: truncated\en" ); >+ else ungetc( c, stdin ); /* push back last char read */ >+ yylval.dval = atof( buf ); >+ return( CONST ); >+ } >+ return( c ); >+ } >+ >+INTERVAL hilo( a, b, c, d ) double a, b, c, d; { >+ /* returns the smallest interval containing a, b, c, and d */ >+ /* used by *, / routines */ >+ INTERVAL v; >+ >+ if( a>b ) { v.hi = a; v.lo = b; } >+ else { v.hi = b; v.lo = a; } >+ >+ if( c>d ) { >+ if( c>v.hi ) v.hi = c; >+ if( d<v.lo ) v.lo = d; >+ } >+ else { >+ if( d>v.hi ) v.hi = d; >+ if( c<v.lo ) v.lo = c; >+ } >+ return( v ); >+ } >+ >+INTERVAL vmul( a, b, v ) double a, b; INTERVAL v; { >+ return( hilo( a*v.hi, a*v.lo, b*v.hi, b*v.lo ) ); >+ } >+ >+dcheck( v ) INTERVAL v; { >+ if( v.hi >= 0. && v.lo <= 0. ){ >+ printf( "divisor interval contains 0.\en" ); >+ return( 1 ); >+ } >+ return( 0 ); >+ } >+ >+INTERVAL vdiv( a, b, v ) double a, b; INTERVAL v; { >+ return( hilo( a/v.hi, a/v.lo, b/v.hi, b/v.lo ) ); >+ } >+.DE >+.bp >--- /usr/src/share/doc/psd/15.yacc/ssd Thu Jan 1 00:00:00 1970 >+++ /usr/src/share/doc/psd.new/15.yacc/ssd Tue Feb 26 13:49:21 2002 >@@ -0,0 +1,40 @@ >+.\" @(#)ssd 6.1 (Berkeley) 5/8/86 >+.\" >+.SH >+Appendix D: Old Features Supported but not Encouraged >+.PP >+This Appendix mentions synonyms and features which are supported for historical >+continuity, but, for various reasons, are not encouraged. >+.IP 1. >+Literals may also be delimited by double quotes ``"''. >+.IP 2. >+Literals may be more than one character long. >+If all the characters are alphabetic, numeric, or \_, the type number of the literal is defined, >+just as if the literal did not have the quotes around it. >+Otherwise, it is difficult to find the value for such literals. >+.IP >+The use of multi-character literals is likely to mislead those unfamiliar with >+Yacc, since it suggests that Yacc is doing a job which must be actually done by the lexical analyzer. >+.IP 3. >+Most places where % is legal, backslash ``\e'' may be used. >+In particular, \e\e is the same as %%, \eleft the same as %left, etc. >+.IP 4. >+There are a number of other synonyms: >+.DS >+%< is the same as %left >+%> is the same as %right >+%binary and %2 are the same as %nonassoc >+%0 and %term are the same as %token >+%= is the same as %prec >+.DE >+.IP 5. >+Actions may also have the form >+.DS >+={ . . . } >+.DE >+and the curly braces can be dropped if the action is a >+single C statement. >+.IP 6. >+C code between %{ and %} used to be permitted at the >+head of the rules section, as well as in the >+declaration section.
You cannot view the attachment while viewing its details because your browser does not support IFRAMEs.
View the attachment on a separate page
.
View Attachment As Diff
View Attachment As Raw
Actions:
View
|
Diff
Attachments on
bug 35345
: 19963