View | Details | Raw Unified | Return to bug 35345
Collapse All | Expand All

(-)/usr/src/share/doc/psd.new/Makefile (-3 / +3 lines)
Lines 4-18 Link Here
4
# The following modules do not build/install:
4
# The following modules do not build/install:
5
# 10.gdb
5
# 10.gdb
6
6
7
# The following modules are encumbered:
7
# The following modules are missing because they were encumbered:
8
# 01.cacm 02.implement 03.iosys 04.uprog 06.Clang 11.adb 14.sccs
8
# 01.cacm 02.implement 03.iosys 04.uprog 06.Clang 11.adb 14.sccs
9
# 15.yacc 16.lex 17.m4
9
# 16.lex 17.m4
10
10
11
# The following modules do not apply to FreeBSD:
11
# The following modules do not apply to FreeBSD:
12
# 07.pascal 08.f77 09.f77io
12
# 07.pascal 08.f77 09.f77io
13
13
14
SUBDIR=	 title contents 
14
SUBDIR=	 title contents 
15
SUBDIR+= 05.sysman 12.make 13.rcs 18.gprof 20.ipctut 21.ipc
15
SUBDIR+= 05.sysman 12.make 13.rcs 15.yacc 18.gprof 20.ipctut 21.ipc
16
16
17
# The following modules are new in FreeBSD:
17
# The following modules are new in FreeBSD:
18
SUBDIR+= 22.rpcgen 23.rpc 24.xdr 25.xdrrfc 26.rpcrfc 27.nfsrpc
18
SUBDIR+= 22.rpcgen 23.rpc 24.xdr 25.xdrrfc 26.rpcrfc 27.nfsrpc
(-)/usr/src/share/doc/psd.new/15.yacc/Makefile (+10 lines)
Line 0 Link Here
1
#	From: @(#)Makefile	8.1 (Berkeley) 6/8/93
2
# $FreeBSD$
3
4
VOLUME=	psd/15.yacc
5
SRCS=	ss.. ss0 ss1 ss2 ss3 ss4 ss5 ss6 ss7 ss8 ss9 ssA ssB ssa ssb ssc ssd
6
MACROS=	-ms
7
8
USE_REFER=	yes
9
10
.include <bsd.doc.mk>
(-)/usr/src/share/doc/psd.new/15.yacc/ss.. (+56 lines)
Line 0 Link Here
1
.\"	@(#)ss..	6.1 (Berkeley) 5/8/86
2
.\"
3
.EH 'PSD:15-%''Yacc: Yet Another Compiler-Compiler'
4
.OH 'Yacc: Yet Another Compiler-Compiler''PSD:15-%'
5
.\".RP
6
.ND "July 31, 1978"
7
.TL
8
Yacc:
9
Yet Another Compiler-Compiler
10
.AU "MH 2C-559" 3968
11
Stephen C. Johnson
12
.AI
13
.MH
14
.AB
15
.PP
16
Computer program input generally has some structure;
17
in fact, every computer program that does input can be thought of as defining
18
an ``input language'' which it accepts.
19
An input language may be as complex as a programming language, or as simple as
20
a sequence of numbers.
21
Unfortunately, usual input facilities
22
are limited, difficult to use,
23
and often are lax about checking their inputs for validity.
24
.PP
25
Yacc provides a general tool for describing
26
the input to a computer program.
27
The Yacc user specifies the structures
28
of his input, together with code to be invoked as
29
each such structure is recognized.
30
Yacc turns such a specification into a subroutine that
31
handles the input process;
32
frequently, it is convenient and appropriate to have most
33
of the flow of control in the user's application
34
handled by this subroutine.
35
.PP
36
The input subroutine produced by Yacc calls a user-supplied routine to
37
return the next basic input item.
38
Thus, the user can specify his input in terms of individual input characters, or
39
in terms of higher level constructs such as names and numbers.
40
The user-supplied routine may also handle idiomatic features such as
41
comment and continuation conventions, which typically defy easy grammatical specification.
42
.PP
43
Yacc is written in portable C.
44
The class of specifications accepted is a very general one: LALR(1)
45
grammars with disambiguating rules.
46
.PP
47
In addition to compilers for C, APL, Pascal, RATFOR, etc., Yacc
48
has also been used for less conventional languages,
49
including a phototypesetter language, several desk calculator languages, a document retrieval system,
50
and a Fortran debugging system.
51
.AE
52
.OK
53
.\"Computer Languages
54
.\"Compilers
55
.\"Formal Language Theory
56
.CS 23 11 34 0 0 8
(-)/usr/src/share/doc/psd.new/15.yacc/ss0 (+201 lines)
Line 0 Link Here
1
.\"	@(#)ss0	6.1 (Berkeley) 5/8/86
2
.\"
3
.SH
4
0: Introduction
5
.PP
6
Yacc provides a general tool for imposing structure on the input to a computer program.
7
The Yacc user prepares a
8
specification of the input process; this includes rules
9
describing the input structure, code to be invoked when these
10
rules are recognized, and a low-level routine to do the
11
basic input.
12
Yacc then generates a function to control the input process.
13
This function, called a
14
.I parser ,
15
calls the user-supplied low-level input routine
16
(the
17
.I "lexical analyzer" )
18
to pick up the basic items
19
(called
20
.I tokens )
21
from the input stream.
22
These tokens are organized according to the input structure rules,
23
called
24
.I "grammar rules" \|;
25
when one of these rules has been recognized,
26
then user code supplied for this rule, an
27
.I action ,
28
is invoked; actions have the ability to return values and
29
make use of the values of other actions.
30
.PP
31
Yacc is written in a portable dialect of C
32
.[
33
Ritchie Kernighan Language Prentice
34
.]
35
and the actions, and output subroutine, are in C as well.
36
Moreover, many of the syntactic conventions of Yacc follow C.
37
.PP
38
The heart of the input specification is a collection of grammar rules.
39
Each rule describes an allowable structure and gives it a name.
40
For example, one grammar rule might be
41
.DS
42
date  :  month\_name  day  \',\'  year   ;
43
.DE
44
Here,
45
.I date ,
46
.I month\_name ,
47
.I day ,
48
and
49
.I year
50
represent structures of interest in the input process;
51
presumably,
52
.I month\_name ,
53
.I day ,
54
and
55
.I year
56
are defined elsewhere.
57
The comma ``,'' is enclosed in single quotes; this implies that the
58
comma is to appear literally in the input.
59
The colon and semicolon merely serve as punctuation in the rule, and have
60
no significance in controlling the input.
61
Thus, with proper definitions, the input
62
.DS
63
July  4, 1776
64
.DE
65
might be matched by the above rule.
66
.PP
67
An important part of the input process is carried out by the
68
lexical analyzer.
69
This user routine reads the input stream, recognizing the lower level structures,
70
and communicates these tokens
71
to the parser.
72
For historical reasons, a structure recognized by the lexical analyzer is called a
73
.I "terminal symbol" ,
74
while the structure recognized by the parser is called a
75
.I "nonterminal symbol" .
76
To avoid confusion, terminal symbols will usually be referred to as
77
.I tokens .
78
.PP
79
There is considerable leeway in deciding whether to recognize structures using the lexical
80
analyzer or grammar rules.
81
For example, the rules
82
.DS
83
month\_name  :  \'J\' \'a\' \'n\'   ;
84
month\_name  :  \'F\' \'e\' \'b\'   ;
85
86
         . . .
87
88
month\_name  :  \'D\' \'e\' \'c\'   ;
89
.DE
90
might be used in the above example.
91
The lexical analyzer would only need to recognize individual letters, and
92
.I month\_name
93
would be a nonterminal symbol.
94
Such low-level rules tend to waste time and space, and may
95
complicate the specification beyond Yacc's ability to deal with it.
96
Usually, the lexical analyzer would
97
recognize the month names,
98
and return an indication that a
99
.I month\_name
100
was seen; in this case,
101
.I month\_name
102
would be a token.
103
.PP
104
Literal characters such as ``,'' must also be passed through the lexical
105
analyzer, and are also considered tokens.
106
.PP
107
Specification files are very flexible.
108
It is realively easy to add to the above example the rule
109
.DS
110
date  :  month \'/\' day \'/\' year   ;
111
.DE
112
allowing
113
.DS
114
7 / 4 / 1776
115
.DE
116
as a synonym for
117
.DS
118
July 4, 1776
119
.DE
120
In most cases, this new rule could be ``slipped in'' to a working system with minimal effort,
121
and little danger of disrupting existing input.
122
.PP
123
The input being read may not conform to the
124
specifications.
125
These input errors are detected as early as is theoretically possible with a
126
left-to-right scan;
127
thus, not only is the chance of reading and computing with bad
128
input data substantially reduced, but the bad data can usually be quickly found.
129
Error handling,
130
provided as part of the input specifications,
131
permits the reentry of bad data,
132
or the continuation of the input process after skipping over the bad data.
133
.PP
134
In some cases, Yacc fails to produce a parser when given a set of
135
specifications.
136
For example, the specifications may be self contradictory, or they may
137
require a more powerful recognition mechanism than that available to Yacc.
138
The former cases represent design errors;
139
the latter cases
140
can often be corrected
141
by making
142
the lexical analyzer
143
more powerful, or by rewriting some of the grammar rules.
144
While Yacc cannot handle all possible specifications, its power
145
compares favorably with similar systems;
146
moreover, the
147
constructions which are difficult for Yacc to handle are
148
also frequently difficult for human beings to handle.
149
Some users have reported that the discipline of formulating valid
150
Yacc specifications for their input revealed errors of
151
conception or design early in the program development.
152
.PP
153
The theory underlying Yacc has been described elsewhere.
154
.[
155
Aho Johnson Surveys LR Parsing
156
.]
157
.[
158
Aho Johnson Ullman Ambiguous Grammars
159
.]
160
.[
161
Aho Ullman Principles Compiler Design
162
.]
163
Yacc has been extensively used in numerous practical applications,
164
including
165
.I lint ,
166
.[
167
Johnson Lint
168
.]
169
the Portable C Compiler,
170
.[
171
Johnson Portable Compiler Theory
172
.]
173
and a system for typesetting mathematics.
174
.[
175
Kernighan Cherry typesetting system CACM
176
.]
177
.PP
178
The next several sections describe the
179
basic process of preparing a Yacc specification;
180
Section 1 describes the preparation of grammar rules,
181
Section 2 the preparation of the user supplied actions associated with these rules,
182
and Section 3 the preparation of lexical analyzers.
183
Section 4 describes the operation of the parser.
184
Section 5 discusses various reasons why Yacc may be unable to produce a
185
parser from a specification, and what to do about it.
186
Section 6 describes a simple mechanism for
187
handling operator precedences in arithmetic expressions.
188
Section 7 discusses error detection and recovery.
189
Section 8 discusses the operating environment and special features
190
of the parsers Yacc produces.
191
Section 9 gives some suggestions which should improve the
192
style and efficiency of the specifications.
193
Section 10 discusses some advanced topics, and Section 11 gives
194
acknowledgements.
195
Appendix A has a brief example, and Appendix B gives a
196
summary of the Yacc input syntax.
197
Appendix C gives an example using some of the more advanced
198
features of Yacc, and, finally,
199
Appendix D describes mechanisms and syntax
200
no longer actively supported, but
201
provided for historical continuity with older versions of Yacc.
(-)/usr/src/share/doc/psd.new/15.yacc/ss1 (+138 lines)
Line 0 Link Here
1
.\"	@(#)ss1	6.1 (Berkeley) 5/8/86
2
.\"
3
.tr *\(**
4
.tr |\(or
5
.SH
6
1: Basic Specifications
7
.PP
8
Names refer to either tokens or nonterminal symbols.
9
Yacc requires
10
token names to be declared as such.
11
In addition, for reasons discussed in Section 3, it is often desirable
12
to include the lexical analyzer as part of the specification file;
13
it may be useful to include other programs as well.
14
Thus, every specification file consists of three sections:
15
the
16
.I declarations ,
17
.I "(grammar) rules" ,
18
and
19
.I programs .
20
The sections are separated by double percent ``%%'' marks.
21
(The percent ``%'' is generally used in Yacc specifications as an escape character.)
22
.PP
23
In other words, a full specification file looks like
24
.DS
25
declarations
26
%%
27
rules
28
%%
29
programs
30
.DE
31
.PP
32
The declaration section may be empty.
33
Moreover, if the programs section is omitted, the second %% mark may be omitted also;
34
thus, the smallest legal Yacc specification is
35
.DS
36
%%
37
rules
38
.DE
39
.PP
40
Blanks, tabs, and newlines are ignored except
41
that they may not appear in names or multi-character reserved symbols.
42
Comments may appear wherever a name is legal; they are enclosed
43
in /* . . . */, as in C and PL/I.
44
.PP
45
The rules section is made up of one or more grammar rules.
46
A grammar rule has the form:
47
.DS
48
A  :  BODY  ;
49
.DE
50
A represents a nonterminal name, and BODY represents a sequence of zero or more names and literals.
51
The colon and the semicolon are Yacc punctuation.
52
.PP
53
Names may be of arbitrary length, and may be made up of letters, dot ``.'', underscore ``\_'', and
54
non-initial digits.
55
Upper and lower case letters are distinct.
56
The names used in the body of a grammar rule may represent tokens or nonterminal symbols.
57
.PP
58
A literal consists of a character enclosed in single quotes ``\'''.
59
As in C, the backslash ``\e'' is an escape character within literals, and all the C escapes
60
are recognized.
61
Thus
62
.DS
63
\'\en\'	newline
64
\'\er\'	return
65
\'\e\'\'	single quote ``\'''
66
\'\e\e\'	backslash ``\e''
67
\'\et\'	tab
68
\'\eb\'	backspace
69
\'\ef\'	form feed
70
\'\exxx\'	``xxx'' in octal
71
.DE
72
For a number of technical reasons, the
73
\s-2NUL\s0
74
character (\'\e0\' or 0) should never
75
be used in grammar rules.
76
.PP
77
If there are several grammar rules with the same left hand side, the vertical bar ``|''
78
can be used to avoid rewriting the left hand side.
79
In addition,
80
the semicolon at the end of a rule can be dropped before a vertical bar.
81
Thus the grammar rules
82
.DS
83
A	:	B  C  D   ;
84
A	:	E  F   ;
85
A	:	G   ;
86
.DE
87
can be given to Yacc as
88
.DS
89
A	:	B  C  D
90
	|	E  F
91
	|	G
92
	;
93
.DE
94
It is not necessary that all grammar rules with the same left side appear together in the grammar rules section,
95
although it makes the input much more readable, and easier to change.
96
.PP
97
If a nonterminal symbol matches the empty string, this can be indicated in the obvious way:
98
.DS
99
empty :   ;
100
.DE
101
.PP
102
Names representing tokens must be declared; this is most simply done by writing
103
.DS
104
%token   name1  name2 . . .
105
.DE
106
in the declarations section.
107
(See Sections 3 , 5, and 6 for much more discussion).
108
Every name not defined in the declarations section is assumed to represent a nonterminal symbol.
109
Every nonterminal symbol must appear on the left side of at least one rule.
110
.PP
111
Of all the nonterminal symbols, one, called the
112
.I "start symbol" ,
113
has particular importance.
114
The parser is designed to recognize the start symbol; thus,
115
this symbol represents the largest,
116
most general structure described by the grammar rules.
117
By default,
118
the start symbol is taken to be the left hand side of the first
119
grammar rule in the rules section.
120
It is possible, and in fact desirable, to declare the start
121
symbol explicitly in the declarations section using the %start keyword:
122
.DS
123
%start   symbol
124
.DE
125
.PP
126
The end of the input to the parser is signaled by a special token, called the
127
.I endmarker .
128
If the tokens up to, but not including, the endmarker form a structure
129
which matches the start symbol, the parser function returns to its caller
130
after the endmarker is seen; it
131
.I accepts
132
the input.
133
If the endmarker is seen in any other context, it is an error.
134
.PP
135
It is the job of the user-supplied lexical analyzer
136
to return the endmarker when appropriate; see section 3, below.
137
Usually the endmarker represents some reasonably obvious 
138
I/O status, such as ``end-of-file'' or ``end-of-record''.
(-)/usr/src/share/doc/psd.new/15.yacc/ss2 (+151 lines)
Line 0 Link Here
1
.\"	@(#)ss2	6.1 (Berkeley) 5/8/86
2
.\"
3
.SH
4
2: Actions
5
.PP
6
With each grammar rule, the user may associate actions to be performed each time
7
the rule is recognized in the input process.
8
These actions may return values, and may obtain the values returned by previous
9
actions.
10
Moreover, the lexical analyzer can return values
11
for tokens, if desired.
12
.PP
13
An action is an arbitrary C statement, and as such can do
14
input and output, call subprograms, and alter
15
external vectors and variables.
16
An action is specified by
17
one or more statements, enclosed in curly braces ``{'' and ``}''.
18
For example,
19
.DS
20
A	:	\'(\'  B  \')\'
21
			{	hello( 1, "abc" );  }
22
.DE
23
and
24
.DS
25
XXX	:	YYY  ZZZ
26
			{	printf("a message\en");
27
				flag = 25;   }
28
.DE
29
are grammar rules with actions.
30
.PP
31
To facilitate easy communication between the actions and the parser, the action statements are altered
32
slightly.
33
The symbol ``dollar sign'' ``$'' is used as a signal to Yacc in this context.
34
.PP
35
To return a value, the action normally sets the
36
pseudo-variable ``$$'' to some value.
37
For example, an action that does nothing but return the value 1 is
38
.DS
39
	{  $$ = 1;  }
40
.DE
41
.PP
42
To obtain the values returned by previous actions and the lexical analyzer, the
43
action may use the pseudo-variables $1, $2, . . .,
44
which refer to the values returned by the
45
components of the right side of a rule, reading from left to right.
46
Thus, if the rule is
47
.DS
48
A	:	B  C  D   ;
49
.DE
50
for example, then $2 has the value returned by C, and $3 the value returned by D.
51
.PP
52
As a more concrete example, consider the rule
53
.DS
54
expr	:	\'(\'  expr  \')\'   ;
55
.DE
56
The value returned by this rule is usually the value of the
57
.I expr
58
in parentheses.
59
This can be indicated by
60
.DS
61
expr	:	 \'(\'  expr  \')\'		{  $$ = $2 ;  }
62
.DE
63
.PP
64
By default, the value of a rule is the value of the first element in it ($1).
65
Thus, grammar rules of the form
66
.DS
67
A	:	B    ;
68
.DE
69
frequently need not have an explicit action.
70
.PP
71
In the examples above, all the actions came at the end of their rules.
72
Sometimes, it is desirable to get control before a rule is fully parsed.
73
Yacc permits an action to be written in the middle of a rule as well
74
as at the end.
75
This rule is assumed to return a value, accessible
76
through the usual \$ mechanism by the actions to
77
the right of it.
78
In turn, it may access the values
79
returned by the symbols to its left.
80
Thus, in the rule
81
.DS
82
A	:	B
83
			{  $$ = 1;  }
84
		C
85
			{   x = $2;   y = $3;  }
86
	;
87
.DE
88
the effect is to set
89
.I x
90
to 1, and
91
.I y
92
to the value returned by C.
93
.PP
94
Actions that do not terminate a rule are actually
95
handled by Yacc by manufacturing a new nonterminal
96
symbol name, and a new rule matching this
97
name to the empty string.
98
The interior action is the action triggered off by recognizing
99
this added rule.
100
Yacc actually treats the above example as if
101
it had been written:
102
.DS
103
$ACT	:	/* empty */
104
			{  $$ = 1;  }
105
	;
106
107
A	:	B  $ACT  C
108
			{   x = $2;   y = $3;  }
109
	;
110
.DE
111
.PP
112
In many applications, output is not done directly by the actions;
113
rather, a data structure, such as a parse tree, is constructed in memory,
114
and transformations are applied to it before output is generated.
115
Parse trees are particularly easy to
116
construct, given routines to build and maintain the tree
117
structure desired.
118
For example, suppose there is a C function
119
.I node ,
120
written so that the call
121
.DS
122
node( L, n1, n2 )
123
.DE
124
creates a node with label L, and descendants n1 and n2, and returns the index of
125
the newly created node.
126
Then parse tree can be built by supplying actions such as:
127
.DS
128
expr	:	expr  \'+\'  expr  
129
			{  $$ = node( \'+\', $1, $3 );  }
130
.DE
131
in the specification.
132
.PP
133
The user may define other variables to be used by the actions.
134
Declarations and definitions can appear in
135
the declarations section,
136
enclosed in the marks ``%{'' and ``%}''.
137
These declarations and definitions have global scope, 
138
so they are known to the action statements and the lexical analyzer.
139
For example,
140
.DS
141
%{   int variable = 0;   %}
142
.DE
143
could be placed in the declarations section,
144
making
145
.I variable
146
accessible to all of the actions.
147
The Yacc parser uses only names beginning in ``yy'';
148
the user should avoid such names.
149
.PP
150
In these examples, all the values are integers: a discussion of
151
values of other types will be found in Section 10.
(-)/usr/src/share/doc/psd.new/15.yacc/ss3 (+104 lines)
Line 0 Link Here
1
.\"	@(#)ss3	6.1 (Berkeley) 5/8/86
2
.\"
3
.SH
4
3: Lexical Analysis
5
.PP
6
The user must supply a lexical analyzer to read the input stream and communicate tokens
7
(with values, if desired) to the parser.
8
The lexical analyzer is an integer-valued function called
9
.I yylex .
10
The function returns an integer, the
11
.I "token number" ,
12
representing the kind of token read.
13
If there is a value associated with that token, it should be assigned
14
to the external variable
15
.I yylval .
16
.PP
17
The parser and the lexical analyzer must agree on these token numbers in order for
18
communication between them to take place.
19
The numbers may be chosen by Yacc, or chosen by the user.
20
In either case, the ``# define'' mechanism of C is used to allow the lexical analyzer
21
to return these numbers symbolically.
22
For example, suppose that the token name DIGIT has been defined in the declarations section of the
23
Yacc specification file.
24
The relevant portion of the lexical analyzer might look like:
25
.DS
26
yylex(){
27
	extern int yylval;
28
	int c;
29
	. . .
30
	c = getchar();
31
	. . .
32
	switch( c ) {
33
		. . .
34
	case \'0\':
35
	case \'1\':
36
	  . . .
37
	case \'9\':
38
		yylval = c\-\'0\';
39
		return( DIGIT );
40
		. . .
41
		}
42
	. . .
43
.DE
44
.PP
45
The intent is to return a token number of DIGIT, and a value equal to the numerical value of the
46
digit.
47
Provided that the lexical analyzer code is placed in the programs section of the specification file,
48
the identifier DIGIT will be defined as the token number associated
49
with the token DIGIT.
50
.PP
51
This mechanism leads to clear,
52
easily modified lexical analyzers; the only pitfall is the need
53
to avoid using any token names in the grammar that are reserved
54
or significant in C or the parser; for example, the use of
55
token names
56
.I if
57
or
58
.I while
59
will almost certainly cause severe
60
difficulties when the lexical analyzer is compiled.
61
The token name
62
.I error
63
is reserved for error handling, and should not be used naively
64
(see Section 7).
65
.PP
66
As mentioned above, the token numbers may be chosen by Yacc or by the user.
67
In the default situation, the numbers are chosen by Yacc.
68
The default token number for a literal
69
character is the numerical value of the character in the local character set.
70
Other names are assigned token numbers
71
starting at 257.
72
.PP
73
To assign a token number to a token (including literals),
74
the first appearance of the token name or literal
75
.I
76
in the declarations section
77
.R
78
can be immediately followed by
79
a nonnegative integer.
80
This integer is taken to be the token number of the name or literal.
81
Names and literals not defined by this mechanism retain their default definition.
82
It is important that all token numbers be distinct.
83
.PP
84
For historical reasons, the endmarker must have token
85
number 0 or negative.
86
This token number cannot be redefined by the user; thus, all
87
lexical analyzers should be prepared to return 0 or negative as a token number
88
upon reaching the end of their input.
89
.PP
90
A very useful tool for constructing lexical analyzers is
91
the
92
.I Lex
93
program developed by Mike Lesk.
94
.[
95
Lesk Lex
96
.]
97
These lexical analyzers are designed to work in close
98
harmony with Yacc parsers.
99
The specifications for these lexical analyzers
100
use regular expressions instead of grammar rules.
101
Lex can be easily used to produce quite complicated lexical analyzers,
102
but there remain some languages (such as FORTRAN) which do not
103
fit any theoretical framework, and whose lexical analyzers
104
must be crafted by hand.
(-)/usr/src/share/doc/psd.new/15.yacc/ss4 (+330 lines)
Line 0 Link Here
1
.\"	@(#)ss4	6.1 (Berkeley) 5/8/86
2
.\"
3
.SH
4
4: How the Parser Works
5
.PP
6
Yacc turns the specification file into a C program, which
7
parses the input according to the specification given.
8
The algorithm used to go from the
9
specification to the parser is complex, and will not be discussed
10
here (see
11
the references for more information).
12
The parser itself, however, is relatively simple,
13
and understanding how it works, while
14
not strictly necessary, will nevertheless make
15
treatment of error recovery and ambiguities much more
16
comprehensible.
17
.PP
18
The parser produced by Yacc consists
19
of a finite state machine with a stack.
20
The parser is also capable of reading and remembering the next
21
input token (called the
22
.I lookahead
23
token).
24
The
25
.I "current state"
26
is always the one on the top of the stack.
27
The states of the finite state machine are given
28
small integer labels; initially, the machine is in state 0,
29
the stack contains only state 0, and no lookahead token has been read.
30
.PP
31
The machine has only four actions available to it, called
32
.I shift ,
33
.I reduce ,
34
.I accept ,
35
and
36
.I error .
37
A move of the parser is done as follows:
38
.IP 1.
39
Based on its current state, the parser decides
40
whether it needs a lookahead token to decide
41
what action should be done; if it needs one, and does
42
not have one, it calls
43
.I yylex
44
to obtain the next token.
45
.IP 2.
46
Using the current state, and the lookahead token
47
if needed, the parser decides on its next action, and
48
carries it out.
49
This may result in states being pushed onto the stack, or popped off of
50
the stack, and in the lookahead token being processed
51
or left alone.
52
.PP
53
The
54
.I shift
55
action is the most common action the parser takes.
56
Whenever a shift action is taken, there is always
57
a lookahead token.
58
For example, in state 56 there may be
59
an action:
60
.DS
61
	IF	shift 34
62
.DE
63
which says, in state 56, if the lookahead token is IF,
64
the current state (56) is pushed down on the stack,
65
and state 34 becomes the current state (on the
66
top of the stack).
67
The lookahead token is cleared.
68
.PP
69
The
70
.I reduce
71
action keeps the stack from growing without
72
bounds.
73
Reduce actions are appropriate when the parser has seen
74
the right hand side of a grammar rule,
75
and is prepared to announce that it has seen
76
an instance of the rule, replacing the right hand side
77
by the left hand side.
78
It may be necessary to consult the lookahead token
79
to decide whether to reduce, but
80
usually it is not; in fact, the default
81
action (represented by a ``.'') is often a reduce action.
82
.PP
83
Reduce actions are associated with individual grammar rules.
84
Grammar rules are also given small integer
85
numbers, leading to some confusion.
86
The action
87
.DS
88
	\fB.\fR	reduce 18
89
.DE
90
refers to
91
.I "grammar rule"
92
18, while the action
93
.DS
94
	IF	shift 34
95
.DE
96
refers to
97
.I state
98
34.
99
.PP
100
Suppose the rule being reduced is
101
.DS
102
A	\fB:\fR	x  y  z    ;
103
.DE
104
The reduce action depends on the
105
left hand symbol (A in this case), and the number of
106
symbols on the right hand side (three in this case).
107
To reduce, first pop off the top three states
108
from the stack
109
(In general, the number of states popped equals the number of symbols on the
110
right side of the rule).
111
In effect, these states were the ones
112
put on the stack while recognizing
113
.I x ,
114
.I y ,
115
and
116
.I z ,
117
and no longer serve any useful purpose.
118
After popping these states, a state is uncovered
119
which was the state the parser was in before beginning to
120
process the rule.
121
Using this uncovered state, and the symbol
122
on the left side of the rule, perform what is in
123
effect a shift of A.
124
A new state is obtained, pushed onto the stack, and parsing continues.
125
There are significant differences between the processing of
126
the left hand symbol and an ordinary shift of a token,
127
however, so this action is called a
128
.I goto
129
action.
130
In particular, the lookahead token is cleared by a shift, and
131
is not affected by a goto.
132
In any case, the uncovered state contains an entry such as:
133
.DS
134
	A	goto 20
135
.DE
136
causing state 20 to be pushed
137
onto the stack, and become the current state.
138
.PP
139
In effect, the reduce action ``turns back the clock'' in the parse,
140
popping the states off the stack to go back to the
141
state where the right hand side of the rule was first seen.
142
The parser then behaves as if it had seen the left side at that time.
143
If the right hand side of the rule is empty,
144
no states are popped off of the stack: the uncovered state
145
is in fact the current state.
146
.PP
147
The reduce action is also important in the treatment of user-supplied
148
actions and values.
149
When a rule is reduced, the code supplied with the rule is executed
150
before the stack is adjusted.
151
In addition to the stack holding the states, another stack,
152
running in parallel with it, holds the values returned
153
from the lexical analyzer and the actions.
154
When a shift takes place, the external variable
155
.I yylval
156
is copied onto the value stack.
157
After the return from the user code, the reduction is carried out.
158
When the
159
.I goto
160
action is done, the external variable
161
.I yyval
162
is copied onto the value stack.
163
The pseudo-variables $1, $2, etc., refer to the value stack.
164
.PP
165
The other two parser actions are conceptually much simpler.
166
The
167
.I accept
168
action indicates that the entire input has been seen and
169
that it matches the specification.
170
This action appears only when the lookahead token is 
171
the endmarker, and indicates that the parser has successfully
172
done its job.
173
The
174
.I error
175
action, on the other hand, represents a place where the parser
176
can no longer continue parsing according to the specification.
177
The input tokens it has seen, together with the lookahead token,
178
cannot be followed by anything that would result
179
in a legal input.
180
The parser reports an error, and attempts to recover the situation and
181
resume parsing: the error recovery (as opposed to the detection of error)
182
will be covered in Section 7.
183
.PP
184
It is time for an example!
185
Consider the specification
186
.DS
187
%token  DING  DONG  DELL
188
%%
189
rhyme	:	sound  place
190
	;
191
sound	:	DING  DONG
192
	;
193
place	:	DELL
194
	;
195
.DE
196
.PP
197
When Yacc is invoked with the
198
.B \-v
199
option, a file called
200
.I y.output
201
is produced, with a human-readable description of the parser.
202
The
203
.I y.output
204
file corresponding to the above grammar (with some statistics
205
stripped off the end) is:
206
.DS
207
state 0
208
	$accept  :  \_rhyme  $end 
209
210
	DING  shift 3
211
	.  error
212
213
	rhyme  goto 1
214
	sound  goto 2
215
216
state 1
217
	$accept  :   rhyme\_$end 
218
219
	$end  accept
220
	.  error
221
222
state 2
223
	rhyme  :   sound\_place 
224
225
	DELL  shift 5
226
	.  error
227
228
	place   goto 4
229
230
state 3
231
	sound   :   DING\_DONG 
232
233
	DONG  shift 6
234
	.  error
235
236
state 4
237
	rhyme  :   sound  place\_    (1)
238
239
	.   reduce  1
240
241
state 5
242
	place  :   DELL\_    (3)
243
244
	.   reduce  3
245
246
state 6
247
	sound   :   DING  DONG\_    (2)
248
249
	.   reduce  2
250
.DE
251
Notice that, in addition to the actions for each state, there is a
252
description of the parsing rules being processed in each
253
state.  The \_ character is used to indicate
254
what has been seen, and what is yet to come, in each rule.
255
Suppose the input is
256
.DS
257
DING  DONG  DELL
258
.DE
259
It is instructive to follow the steps of the parser while
260
processing this input.
261
.PP
262
Initially, the current state is state 0.
263
The parser needs to refer to the input in order to decide
264
between the actions available in state 0, so
265
the first token,
266
.I DING ,
267
is read, becoming the lookahead token.
268
The action in state 0 on
269
.I DING
270
is
271
is ``shift 3'', so state 3 is pushed onto the stack,
272
and the lookahead token is cleared.
273
State 3 becomes the current state.
274
The next token,
275
.I DONG ,
276
is read, becoming the lookahead token.
277
The action in state 3 on the token
278
.I DONG
279
is ``shift 6'',
280
so state 6 is pushed onto the stack, and the lookahead is cleared.
281
The stack now contains 0, 3, and 6.
282
In state 6, without even consulting the lookahead,
283
the parser reduces by rule 2.
284
.DS
285
	sound  :   DING  DONG
286
.DE
287
This rule has two symbols on the right hand side, so
288
two states, 6 and 3, are popped off of the stack, uncovering state 0.
289
Consulting the description of state 0, looking for a goto on 
290
.I sound ,
291
.DS
292
	sound	goto 2
293
.DE
294
is obtained; thus state 2 is pushed onto the stack,
295
becoming the current state.
296
.PP
297
In state 2, the next token,
298
.I DELL ,
299
must be read.
300
The action is ``shift 5'', so state 5 is pushed onto the stack, which now has
301
0, 2, and 5 on it, and the lookahead token is cleared.
302
In state 5, the only action is to reduce by rule 3.
303
This has one symbol on the right hand side, so one state, 5,
304
is popped off, and state 2 is uncovered.
305
The goto in state 2 on
306
.I place ,
307
the left side of rule 3,
308
is state 4.
309
Now, the stack contains 0, 2, and 4.
310
In state 4, the only action is to reduce by rule 1.
311
There are two symbols on the right, so the top two states are popped off,
312
uncovering state 0 again.
313
In state 0, there is a goto on
314
.I rhyme
315
causing the parser to enter state 1.
316
In state 1, the input is read; the endmarker is obtained,
317
indicated by ``$end'' in the
318
.I y.output
319
file.
320
The action in state 1 when the endmarker is seen is to accept,
321
successfully ending the parse.
322
.PP
323
The reader is urged to consider how the parser works
324
when confronted with such incorrect strings as
325
.I "DING DONG DONG" ,
326
.I "DING DONG" ,
327
.I "DING DONG DELL DELL" ,
328
etc.
329
A few minutes spend with this and other simple examples will
330
probably be repaid when problems arise in more complicated contexts.
(-)/usr/src/share/doc/psd.new/15.yacc/ss5 (+302 lines)
Line 0 Link Here
1
.\"	@(#)ss5	6.1 (Berkeley) 5/8/86
2
.\"
3
.SH
4
5: Ambiguity and Conflicts
5
.PP
6
A set of grammar rules is
7
.I ambiguous
8
if there is some input string that can be structured in two or more different ways.
9
For example, the grammar rule
10
.DS
11
expr	:	expr  \'\-\'  expr
12
.DE
13
is a natural way of expressing the fact that one way of forming an arithmetic expression
14
is to put two other expressions together with a minus sign between them.
15
Unfortunately, this grammar rule does not
16
completely specify the way that all complex inputs
17
should be structured.
18
For example, if the input is
19
.DS
20
expr  \-  expr  \-  expr
21
.DE
22
the rule allows this input to be structured as either
23
.DS
24
(  expr  \-  expr  )  \-  expr
25
.DE
26
or as
27
.DS
28
expr  \-  (  expr  \-  expr  )
29
.DE
30
(The first is called
31
.I "left association" ,
32
the second
33
.I "right association" ).
34
.PP
35
Yacc detects such ambiguities when it is attempting to build the parser.
36
It is instructive to consider the problem that confronts the parser when it is
37
given an input such as
38
.DS
39
expr  \-  expr  \-  expr
40
.DE
41
When the parser has read the second expr, the input that it has seen:
42
.DS
43
expr  \-  expr
44
.DE
45
matches the right side of the grammar rule above.
46
The parser could
47
.I reduce
48
the input by applying this rule;
49
after applying the rule;
50
the input is reduced to
51
.I expr (the
52
left side of the rule).
53
The parser would then read the final part of the input:
54
.DS
55
\-  expr
56
.DE
57
and again reduce.
58
The effect of this is to take the left associative interpretation.
59
.PP
60
Alternatively, when the parser has seen
61
.DS
62
expr  \-  expr
63
.DE
64
it could defer the immediate application of the rule, and continue reading
65
the input until it had seen
66
.DS
67
expr  \-  expr  \-  expr
68
.DE
69
It could then apply the rule to the rightmost three symbols, reducing them to
70
.I expr
71
and leaving
72
.DS
73
expr  \-  expr
74
.DE
75
Now the rule can be reduced once more; the effect is to
76
take the right associative interpretation.
77
Thus, having read
78
.DS
79
expr  \-  expr
80
.DE
81
the parser can do two legal things, a shift or a reduction, and has no way of
82
deciding between them.
83
This is called a
84
.I "shift / reduce conflict" .
85
It may also happen that the parser has a choice of two legal reductions;
86
this is called a
87
.I "reduce / reduce conflict" .
88
Note that there are never any ``Shift/shift'' conflicts.
89
.PP
90
When there are shift/reduce or reduce/reduce conflicts, Yacc still produces a parser.
91
It does this by selecting one of the valid steps wherever it has a choice.
92
A rule describing which choice to make in a given situation is called
93
a
94
.I "disambiguating rule" .
95
.PP
96
Yacc invokes two disambiguating rules by default:
97
.IP 1.
98
In a shift/reduce conflict, the default is to do the shift.
99
.IP 2.
100
In a reduce/reduce conflict, the default is to reduce by the
101
.I earlier
102
grammar rule (in the input sequence).
103
.PP
104
Rule 1 implies that reductions are deferred whenever there is a choice,
105
in favor of shifts.
106
Rule 2 gives the user rather crude control over the behavior of the parser
107
in this situation, but reduce/reduce conflicts should be avoided whenever possible.
108
.PP
109
Conflicts may arise because of mistakes in input or logic, or because the grammar rules, while consistent,
110
require a more complex parser than Yacc can construct.
111
The use of actions within rules can also cause conflicts, if the action must
112
be done before the parser can be sure which rule is being recognized.
113
In these cases, the application of disambiguating rules is inappropriate,
114
and leads to an incorrect parser.
115
For this reason, Yacc
116
always reports the number of shift/reduce and reduce/reduce conflicts resolved by Rule 1 and Rule 2.
117
.PP
118
In general, whenever it is possible to apply disambiguating rules to produce a correct parser, it is also
119
possible to rewrite the grammar rules so that the same inputs are read but there are no
120
conflicts.
121
For this reason, most previous parser generators
122
have considered conflicts to be fatal errors.
123
Our experience has suggested that this rewriting is somewhat unnatural,
124
and produces slower parsers; thus, Yacc will produce parsers even in the presence of conflicts.
125
.PP
126
As an example of the power of disambiguating rules, consider a fragment from a programming
127
language involving an ``if-then-else'' construction:
128
.DS
129
stat	:	IF  \'(\'  cond  \')\'  stat
130
	|	IF  \'(\'  cond  \')\'  stat  ELSE  stat
131
	;
132
.DE
133
In these rules,
134
.I IF
135
and
136
.I ELSE
137
are tokens,
138
.I cond
139
is a nonterminal symbol describing
140
conditional (logical) expressions, and
141
.I stat
142
is a nonterminal symbol describing statements.
143
The first rule will be called the
144
.ul
145
simple-if
146
rule, and the
147
second the
148
.ul
149
if-else
150
rule.
151
.PP
152
These two rules form an ambiguous construction, since input of the form
153
.DS
154
IF  (  C1  )  IF  (  C2  )  S1  ELSE  S2
155
.DE
156
can be structured according to these rules in two ways:
157
.DS
158
IF  (  C1  )  {
159
	IF  (  C2  )  S1
160
	}
161
ELSE  S2
162
.DE
163
or
164
.DS
165
IF  (  C1  )  {
166
	IF  (  C2  )  S1
167
	ELSE  S2
168
	}
169
.DE
170
The second interpretation is the one given in most programming languages
171
having this construct.
172
Each
173
.I ELSE
174
is associated with the last preceding
175
``un-\fIELSE'\fRd''
176
.I IF .
177
In this example, consider the situation where the parser has seen
178
.DS
179
IF  (  C1  )  IF  (  C2  )  S1
180
.DE
181
and is looking at the
182
.I ELSE .
183
It can immediately
184
reduce
185
by the simple-if rule to get
186
.DS
187
IF  (  C1  )  stat
188
.DE
189
and then read the remaining input,
190
.DS
191
ELSE  S2
192
.DE
193
and reduce
194
.DS
195
IF  (  C1  )  stat  ELSE  S2
196
.DE
197
by the if-else rule.
198
This leads to the first of the above groupings of the input.
199
.PP
200
On the other hand, the
201
.I ELSE
202
may be shifted,
203
.I S2
204
read, and then the right hand portion of
205
.DS
206
IF  (  C1  )  IF  (  C2  )  S1  ELSE  S2
207
.DE
208
can be reduced by the if-else rule to get
209
.DS
210
IF  (  C1  )  stat
211
.DE
212
which can be reduced by the simple-if rule.
213
This leads to the second of the above groupings of the input, which
214
is usually desired.
215
.PP
216
Once again the parser can do two valid things \- there is a shift/reduce conflict.
217
The application of disambiguating rule 1 tells the parser to shift in this case,
218
which leads to the desired grouping.
219
.PP
220
This shift/reduce conflict arises only when there is a particular current input symbol,
221
.I ELSE ,
222
and particular inputs already seen, such as
223
.DS
224
IF  (  C1  )  IF  (  C2  )  S1
225
.DE
226
In general, there may be many conflicts, and each one
227
will be associated with an input symbol and
228
a set of previously read inputs.
229
The previously read inputs are characterized by the
230
state
231
of the parser.
232
.PP
233
The conflict messages of Yacc are best understood
234
by examining the verbose (\fB\-v\fR) option output file.
235
For example, the output corresponding to the above
236
conflict state might be:
237
.DS L
238
23: shift/reduce conflict (shift 45, reduce 18) on ELSE
239
240
state 23
241
242
	  stat  :  IF  (  cond  )  stat\_         (18)
243
	  stat  :  IF  (  cond  )  stat\_ELSE  stat
244
245
	 ELSE     shift 45
246
	 .        reduce 18
247
248
.DE
249
The first line describes the conflict, giving the state and the input symbol.
250
The ordinary state description follows, giving
251
the grammar rules active in the state, and the parser actions.
252
Recall that the underline marks the
253
portion of the grammar rules which has been seen.
254
Thus in the example, in state 23 the parser has seen input corresponding
255
to
256
.DS
257
IF  (  cond  )  stat
258
.DE
259
and the two grammar rules shown are active at this time.
260
The parser can do two possible things.
261
If the input symbol is
262
.I ELSE ,
263
it is possible to shift into state
264
45.
265
State 45 will have, as part of its description, the line
266
.DS
267
stat  :  IF  (  cond  )  stat  ELSE\_stat
268
.DE
269
since the
270
.I ELSE
271
will have been shifted in this state.
272
Back in state 23, the alternative action, described by ``\fB.\fR'',
273
is to be done if the input symbol is not mentioned explicitly in the above actions; thus,
274
in this case, if the input symbol is not
275
.I ELSE ,
276
the parser reduces by grammar rule 18:
277
.DS
278
stat  :  IF  \'(\'  cond  \')\'  stat
279
.DE
280
Once again, notice that the numbers following ``shift'' commands refer to other states,
281
while the numbers following ``reduce'' commands refer to grammar
282
rule numbers.
283
In the
284
.I y.output
285
file, the rule numbers are printed after those rules which can be reduced.
286
In most one states, there will be at most reduce action possible in the
287
state, and this will be the default command.
288
The user who encounters unexpected shift/reduce conflicts will probably want to
289
look at the verbose output to decide whether the default actions are appropriate.
290
In really tough cases, the user might need to know more about
291
the behavior and construction of the parser than can be covered here.
292
In this case, one of the theoretical references
293
.[
294
Aho Johnson Surveys Parsing
295
.]
296
.[
297
Aho Johnson Ullman Deterministic Ambiguous
298
.]
299
.[
300
Aho Ullman Principles Design
301
.]
302
might be consulted; the services of a local guru might also be appropriate.
(-)/usr/src/share/doc/psd.new/15.yacc/ss6 (+146 lines)
Line 0 Link Here
1
.\"	@(#)ss6	6.1 (Berkeley) 5/8/86
2
.\"
3
.SH
4
6: Precedence
5
.PP
6
There is one common situation
7
where the rules given above for resolving conflicts are not sufficient;
8
this is in the parsing of arithmetic expressions.
9
Most of the commonly used constructions for arithmetic expressions can be naturally
10
described by the notion of
11
.I precedence
12
levels for operators, together with information about left
13
or right associativity.
14
It turns out that ambiguous grammars with appropriate disambiguating rules
15
can be used to create parsers that are faster and easier to
16
write than parsers constructed from unambiguous grammars.
17
The basic notion is to write grammar rules
18
of the form
19
.DS
20
expr  :  expr  OP  expr
21
.DE
22
and
23
.DS
24
expr  :  UNARY  expr
25
.DE
26
for all binary and unary operators desired.
27
This creates a very ambiguous grammar, with many parsing conflicts.
28
As disambiguating rules, the user specifies the precedence, or binding
29
strength, of all the operators, and the associativity
30
of the binary operators.
31
This information is sufficient to allow Yacc to resolve the parsing conflicts
32
in accordance with these rules, and construct a parser that realizes the desired
33
precedences and associativities.
34
.PP
35
The precedences and associativities are attached to tokens in the declarations section.
36
This is done by a series of lines beginning with a Yacc keyword: %left, %right,
37
or %nonassoc, followed by a list of tokens.
38
All of the tokens on the same line are assumed to have the same precedence level
39
and associativity; the lines are listed in
40
order of increasing precedence or binding strength.
41
Thus,
42
.DS
43
%left  \'+\'  \'\-\'
44
%left  \'*\'  \'/\'
45
.DE
46
describes the precedence and associativity of the four arithmetic operators.
47
Plus and minus are left associative, and have lower precedence than
48
star and slash, which are also left associative.
49
The keyword %right is used to describe right associative operators,
50
and the keyword %nonassoc is used to describe operators, like
51
the operator .LT. in Fortran, that may not associate with themselves; thus,
52
.DS
53
A  .LT.  B  .LT.  C
54
.DE
55
is illegal in Fortran, and such an operator would be described with the keyword
56
%nonassoc in Yacc.
57
As an example of the behavior of these declarations, the description
58
.DS
59
%right  \'=\'
60
%left  \'+\'  \'\-\'
61
%left  \'*\'  \'/\'
62
63
%%
64
65
expr	:	expr  \'=\'  expr
66
	|	expr  \'+\'  expr
67
	|	expr  \'\-\'  expr
68
	|	expr  \'*\'  expr
69
	|	expr  \'/\'  expr
70
	|	NAME
71
	;
72
.DE
73
might be used to structure the input
74
.DS
75
a  =  b  =  c*d  \-  e  \-  f*g
76
.DE
77
as follows:
78
.DS
79
a = ( b = ( ((c*d)\-e) \- (f*g) ) )
80
.DE
81
When this mechanism is used,
82
unary operators must, in general, be given a precedence.
83
Sometimes a unary operator and a binary operator
84
have the same symbolic representation, but different precedences.
85
An example is unary and binary \'\-\'; unary minus may be given the same
86
strength as multiplication, or even higher, while binary minus has a lower strength than
87
multiplication.
88
The keyword, %prec, changes the precedence level associated with a particular grammar rule.
89
%prec appears immediately after the body of the grammar rule, before the action or closing semicolon,
90
and is followed by a token name or literal.
91
It
92
causes the precedence of the grammar rule to become that of the following token name or literal.
93
For example, to make unary minus have the same precedence as multiplication the rules might resemble:
94
.DS
95
%left  \'+\'  \'\-\'
96
%left  \'*\'  \'/\'
97
98
%%
99
100
expr	:	expr  \'+\'  expr
101
	|	expr  \'\-\'  expr
102
	|	expr  \'*\'  expr
103
	|	expr  \'/\'  expr
104
	|	\'\-\'  expr      %prec  \'*\'
105
	|	NAME
106
	;
107
.DE
108
.PP
109
A token declared
110
by %left, %right, and %nonassoc need not be, but may be, declared by %token as well.
111
.PP
112
The precedences and associativities are used by Yacc to
113
resolve parsing conflicts; they give rise to disambiguating rules.
114
Formally, the rules work as follows:
115
.IP 1.
116
The precedences and associativities are recorded for those tokens and literals
117
that have them.
118
.IP 2.
119
A precedence and associativity is associated with each grammar rule; it is the precedence
120
and associativity of the last token or literal in the body of the rule.
121
If the %prec construction is used, it overrides this default.
122
Some grammar rules may have no precedence and associativity associated with them.
123
.IP 3.
124
When there is a reduce/reduce conflict, or there is a shift/reduce conflict
125
and either the input symbol or the grammar rule has no precedence and associativity,
126
then the two disambiguating rules given at the beginning of the section are used,
127
and the conflicts are reported.
128
.IP 4.
129
If there is a shift/reduce conflict, and both the grammar rule and the input character
130
have precedence and associativity associated with them, then the conflict is resolved
131
in favor of the action (shift or reduce) associated with the higher precedence.
132
If the precedences are the same, then the associativity is used; left
133
associative implies reduce, right associative implies shift, and nonassociating
134
implies error.
135
.PP
136
Conflicts resolved by precedence are not counted in the number of shift/reduce and reduce/reduce
137
conflicts reported by Yacc.
138
This means that mistakes in the specification of precedences may
139
disguise errors in the input grammar; it is a good idea to be sparing
140
with precedences, and use them in an essentially ``cookbook'' fashion,
141
until some experience has been gained.
142
The
143
.I y.output
144
file
145
is very useful in deciding whether the parser is actually doing
146
what was intended.
(-)/usr/src/share/doc/psd.new/15.yacc/ss7 (+124 lines)
Line 0 Link Here
1
.\"	@(#)ss7	6.1 (Berkeley) 5/8/86
2
.\"
3
.SH
4
7: Error Handling
5
.PP
6
Error handling is an extremely difficult area, and many of the problems are semantic ones.
7
When an error is found, for example, it may be necessary to reclaim parse tree storage,
8
delete or alter symbol table entries, and, typically, set switches to avoid generating any further output.
9
.PP
10
It is seldom acceptable to stop all processing when an error is found; it is more useful to continue
11
scanning the input to find further syntax errors.
12
This leads to the problem of getting the parser ``restarted'' after an error.
13
A general class of algorithms to do this involves discarding a number of tokens
14
from the input string, and attempting to adjust the parser so that input can continue.
15
.PP
16
To allow the user some control over this process,
17
Yacc provides a simple, but reasonably general, feature.
18
The token name ``error'' is reserved for error handling.
19
This name can be used in grammar rules;
20
in effect, it suggests places where errors are expected, and recovery might take place.
21
The parser pops its stack until it enters a state where the token ``error'' is legal.
22
It then behaves as if the token ``error'' were the current lookahead token,
23
and performs the action encountered.
24
The lookahead token is then reset to the token that caused the error.
25
If no special error rules have been specified, the processing halts when an error is detected.
26
.PP
27
In order to prevent a cascade of error messages, the parser, after
28
detecting an error, remains in error state until three tokens have been successfully
29
read and shifted.
30
If an error is detected when the parser is already in error state,
31
no message is given, and the input token is quietly deleted.
32
.PP
33
As an example, a rule of the form
34
.DS
35
stat	:	error
36
.DE
37
would, in effect, mean that on a syntax error the parser would attempt to skip over the statement
38
in which the error was seen.
39
More precisely, the parser will
40
scan ahead, looking for three tokens that might legally follow
41
a statement, and start processing at the first of these; if
42
the beginnings of statements are not sufficiently distinctive, it may make a
43
false start in the middle of a statement, and end up reporting a
44
second error where there is in fact no error.
45
.PP
46
Actions may be used with these special error rules.
47
These actions might attempt to reinitialize tables, reclaim symbol table space, etc.
48
.PP
49
Error rules such as the above are very general, but difficult to control.
50
Somewhat easier are rules such as
51
.DS
52
stat	:	error  \';\'
53
.DE
54
Here, when there is an error, the parser attempts to skip over the statement, but
55
will do so by skipping to the next \';\'.
56
All tokens after the error and before the next \';\' cannot be shifted, and are discarded.
57
When the \';\' is seen, this rule will be reduced, and any ``cleanup''
58
action associated with it performed.
59
.PP
60
Another form of error rule arises in interactive applications, where
61
it may be desirable to permit a line to be reentered after an error.
62
A possible error rule might be
63
.DS
64
input	:	error  \'\en\'  {  printf( "Reenter last line: " );  }  input
65
			{	$$  =  $4;  }
66
.DE
67
There is one potential difficulty with this approach;
68
the parser must correctly process three input tokens before it
69
admits that it has correctly resynchronized after the error.
70
If the reentered line contains an error
71
in the first two tokens, the parser deletes the offending tokens,
72
and gives no message; this is clearly unacceptable.
73
For this reason, there is a mechanism that
74
can be used to force the parser
75
to believe that an error has been fully recovered from.
76
The statement
77
.DS
78
yyerrok ;
79
.DE
80
in an action
81
resets the parser to its normal mode.
82
The last example is better written
83
.DS
84
input	:	error  \'\en\'
85
			{	yyerrok;
86
				printf( "Reenter last line: " );   }
87
		input
88
			{	$$  =  $4;  }
89
	;
90
.DE
91
.PP
92
As mentioned above, the token seen immediately
93
after the ``error'' symbol is the input token at which the
94
error was discovered.
95
Sometimes, this is inappropriate; for example, an
96
error recovery action might
97
take upon itself the job of finding the correct place to resume input.
98
In this case,
99
the previous lookahead token must be cleared.
100
The statement
101
.DS
102
yyclearin ;
103
.DE
104
in an action will have this effect.
105
For example, suppose the action after error
106
were to call some sophisticated resynchronization routine,
107
supplied by the user, that attempted to advance the input to the
108
beginning of the next valid statement.
109
After this routine was called, the next token returned by yylex would presumably
110
be the first token in a legal statement;
111
the old, illegal token must be discarded, and the error state reset.
112
This could be done by a rule like
113
.DS
114
stat	:	error 
115
			{	resynch();
116
				yyerrok ;
117
				yyclearin ;   }
118
	;
119
.DE
120
.PP
121
These mechanisms are admittedly crude, but do allow for a simple, fairly effective recovery of the parser
122
from many errors;
123
moreover, the user can get control to deal with
124
the error actions required by other portions of the program.
(-)/usr/src/share/doc/psd.new/15.yacc/ss8 (+93 lines)
Line 0 Link Here
1
.\"	@(#)ss8	6.1 (Berkeley) 5/8/86
2
.\"
3
.SH
4
8: The Yacc Environment
5
.PP
6
When the user inputs a specification
7
to Yacc, the output is a file of C programs, called
8
.I y.tab.c
9
on most
10
systems
11
(due to local file system conventions, the names may differ from
12
installation to installation).
13
The function produced by Yacc is called
14
.I yyparse \|;
15
it is an integer valued function.
16
When it is called, it in turn repeatedly calls
17
.I yylex ,
18
the lexical analyzer
19
supplied by the user (see Section 3)
20
to obtain input tokens.
21
Eventually, either an error is detected, in which case
22
(if no error recovery is possible)
23
.I yyparse
24
returns the value 1,
25
or the lexical analyzer returns the endmarker token
26
and the parser accepts.
27
In this case,
28
.I yyparse
29
returns the value 0.
30
.PP
31
The user must provide a certain amount of environment for this
32
parser in order to obtain a working program.
33
For example, as with every C program, a program called
34
.I main
35
must be defined, that eventually calls
36
.I yyparse .
37
In addition, a routine called
38
.I yyerror
39
prints a message
40
when a syntax error is detected.
41
.PP
42
These two routines must be supplied in one form or another by the
43
user.
44
To ease the initial effort of using Yacc, a library has been
45
provided with default versions of
46
.I main
47
and
48
.I yyerror .
49
The name of this library is system dependent;
50
on many systems the library is accessed by a
51
.B \-ly
52
argument to the loader.
53
To show the triviality of these default programs, the source is
54
given below:
55
.DS
56
main(){
57
	return( yyparse() );
58
	}
59
.DE
60
and
61
.DS
62
# include <stdio.h>
63
64
yyerror(s) char *s; {
65
	fprintf( stderr, "%s\en", s );
66
	}
67
.DE
68
The argument to
69
.I yyerror
70
is a string containing an error message, usually
71
the string ``syntax error''.
72
The average application will want to do better than this.
73
Ordinarily, the program should keep track of the input line number, and print it
74
along with the message when a syntax error is detected.
75
The external integer variable
76
.I yychar
77
contains the lookahead token number at the time the error was detected;
78
this may be of some interest in giving better diagnostics.
79
Since the
80
.I main
81
program is probably supplied by the user (to read arguments, etc.)
82
the Yacc library is useful only in small
83
projects, or in the earliest stages of larger ones.
84
.PP
85
The external integer variable
86
.I yydebug
87
is normally set to 0.
88
If it is set to a nonzero value, the parser will output a
89
verbose description of its actions, including
90
a discussion of which input symbols have been read, and
91
what the parser actions are.
92
Depending on the operating environment,
93
it may be possible to set this variable by using a debugging system.
(-)/usr/src/share/doc/psd.new/15.yacc/ss9 (+169 lines)
Line 0 Link Here
1
.\"	@(#)ss9	6.1 (Berkeley) 5/8/86
2
.\"
3
.SH
4
9: Hints for Preparing Specifications
5
.PP
6
This section contains miscellaneous hints on preparing efficient, easy to change,
7
and clear specifications.
8
The individual subsections are more or less
9
independent.
10
.SH
11
Input Style
12
.PP
13
It is difficult to
14
provide rules with substantial actions
15
and still have a readable specification file.
16
The following style hints owe much to Brian Kernighan.
17
.IP a.
18
Use all capital letters for token names, all lower case letters for
19
nonterminal names.
20
This rule comes under the heading of ``knowing who to blame when
21
things go wrong.''
22
.IP b.
23
Put grammar rules and actions on separate lines.
24
This allows either to be changed without
25
an automatic need to change the other.
26
.IP c.
27
Put all rules with the same left hand side together.
28
Put the left hand side in only once, and let all
29
following rules begin with a vertical bar.
30
.IP d.
31
Put a semicolon only after the last rule with a given left hand side,
32
and put the semicolon on a separate line.
33
This allows new rules to be easily added.
34
.IP e.
35
Indent rule bodies by two tab stops, and action bodies by three
36
tab stops.
37
.PP
38
The example in Appendix A is written following this style, as are
39
the examples in the text of this paper (where space permits).
40
The user must make up his own mind about these stylistic questions;
41
the central problem, however, is to make the rules visible through
42
the morass of action code.
43
.SH
44
Left Recursion
45
.PP
46
The algorithm used by the Yacc parser encourages so called ``left recursive''
47
grammar rules: rules of the form
48
.DS
49
name	:	name  rest_of_rule  ;
50
.DE
51
These rules frequently arise when
52
writing specifications of sequences and lists:
53
.DS
54
list	:	item
55
	|	list  \',\'  item
56
	;
57
.DE
58
and
59
.DS
60
seq	:	item
61
	|	seq  item
62
	;
63
.DE
64
In each of these cases, the first rule
65
will be reduced for the first item only, and the second rule
66
will be reduced for the second and all succeeding items.
67
.PP
68
With right recursive rules, such as
69
.DS
70
seq	:	item
71
	|	item  seq
72
	;
73
.DE
74
the parser would be a bit bigger, and the items would be seen, and reduced,
75
from right to left.
76
More seriously, an internal stack in the parser
77
would be in danger of overflowing if a very long sequence were read.
78
Thus, the user should use left recursion wherever reasonable.
79
.PP
80
It is worth considering whether a sequence with zero
81
elements has any meaning, and if so, consider writing
82
the sequence specification with an empty rule:
83
.DS
84
seq	:	/* empty */
85
	|	seq  item
86
	;
87
.DE
88
Once again, the first rule would always be reduced exactly once, before the
89
first item was read,
90
and then the second rule would be reduced once for each item read.
91
Permitting empty sequences
92
often leads to increased generality.
93
However, conflicts might arise if Yacc is asked to decide
94
which empty sequence it has seen, when it hasn't seen enough to
95
know!
96
.SH
97
Lexical Tie-ins
98
.PP
99
Some lexical decisions depend on context.
100
For example, the lexical analyzer might want to
101
delete blanks normally, but not within quoted strings.
102
Or names might be entered into a symbol table in declarations,
103
but not in expressions.
104
.PP
105
One way of handling this situation is
106
to create a global flag that is
107
examined by the lexical analyzer, and set by actions.
108
For example, suppose a program
109
consists of 0 or more declarations, followed by 0 or more statements.
110
Consider:
111
.DS
112
%{
113
	int dflag;
114
%}
115
  ...  other declarations ...
116
117
%%
118
119
prog	:	decls  stats
120
	;
121
122
decls	:	/* empty */
123
			{	dflag = 1;  }
124
	|	decls  declaration
125
	;
126
127
stats	:	/* empty */
128
			{	dflag = 0;  }
129
	|	stats  statement
130
	;
131
132
    ...  other rules ...
133
.DE
134
The flag
135
.I dflag
136
is now 0 when reading statements, and 1 when reading declarations,
137
.ul
138
except for the first token in the first statement.
139
This token must be seen by the parser before it can tell that
140
the declaration section has ended and the statements have
141
begun.
142
In many cases, this single token exception does not
143
affect the lexical scan.
144
.PP
145
This kind of ``backdoor'' approach can be elaborated
146
to a noxious degree.
147
Nevertheless, it represents a way of doing some things
148
that are difficult, if not impossible, to
149
do otherwise.
150
.SH
151
Reserved Words
152
.PP
153
Some programming languages
154
permit the user to
155
use words like ``if'', which are normally reserved,
156
as label or variable names, provided that such use does not
157
conflict with the legal use of these names in the programming language.
158
This is extremely hard to do in the framework of Yacc;
159
it is difficult to pass information to the lexical analyzer
160
telling it ``this instance of `if' is a keyword, and that instance is a variable''.
161
The user can make a stab at it, using the
162
mechanism described in the last subsection,
163
but it is difficult.
164
.PP
165
A number of ways of making this easier are under advisement.
166
Until then, it is better that the keywords be
167
.I reserved \|;
168
that is, be forbidden for use as variable names.
169
There are powerful stylistic reasons for preferring this, anyway.
(-)/usr/src/share/doc/psd.new/15.yacc/ssA (+184 lines)
Line 0 Link Here
1
.\"	@(#)ssA	6.1 (Berkeley) 5/8/86
2
.\"
3
.SH
4
10: Advanced Topics
5
.PP
6
This section discusses a number of advanced features
7
of Yacc.
8
.SH
9
Simulating Error and Accept in Actions
10
.PP
11
The parsing actions of error and accept can be simulated
12
in an action by use of macros YYACCEPT and YYERROR.
13
YYACCEPT causes
14
.I yyparse
15
to return the value 0;
16
YYERROR causes
17
the parser to behave as if the current input symbol
18
had been a syntax error;
19
.I yyerror
20
is called, and error recovery takes place.
21
These mechanisms can be used to simulate parsers
22
with multiple endmarkers or context-sensitive syntax checking.
23
.SH
24
Accessing Values in Enclosing Rules.
25
.PP
26
An action may refer to values
27
returned by actions to the left of the current rule.
28
The mechanism is simply the same as with ordinary actions,
29
a dollar sign followed by a digit, but in this case the
30
digit may be 0 or negative.
31
Consider
32
.DS
33
sent	:	adj  noun  verb  adj  noun
34
			{  \fIlook at the sentence\fR . . .  }
35
	;
36
37
adj	:	THE		{	$$ = THE;  }
38
	|	YOUNG	{	$$ = YOUNG;  }
39
	. . .
40
	;
41
42
noun	:	DOG
43
			{	$$ = DOG;  }
44
	|	CRONE
45
			{	if( $0 == YOUNG ){
46
					printf( "what?\en" );
47
					}
48
				$$ = CRONE;
49
				}
50
	;
51
	. . .
52
.DE
53
In the action following the word CRONE, a check is made that the
54
preceding token shifted was not YOUNG.
55
Obviously, this is only possible when a great deal is known about
56
what might precede the symbol
57
.I noun
58
in the input.
59
There is also a distinctly unstructured flavor about this.
60
Nevertheless, at times this mechanism will save a great
61
deal of trouble, especially when a few combinations are to
62
be excluded from an otherwise regular structure.
63
.SH
64
Support for Arbitrary Value Types
65
.PP
66
By default, the values returned by actions and the lexical analyzer are integers.
67
Yacc can also support
68
values of other types, including structures.
69
In addition, Yacc keeps track of the types, and inserts
70
appropriate union member names so that the resulting parser will
71
be strictly type checked.
72
The Yacc value stack (see Section 4)
73
is declared to be a
74
.I union
75
of the various types of values desired.
76
The user declares the union, and associates union member names
77
to each token and nonterminal symbol having a value.
78
When the value is referenced through a $$ or $n construction,
79
Yacc will automatically insert the appropriate union name, so that
80
no unwanted conversions will take place.
81
In addition, type checking commands such as
82
.I Lint\|
83
.[
84
Johnson Lint Checker 1273
85
.]
86
will be far more silent.
87
.PP
88
There are three mechanisms used to provide for this typing.
89
First, there is a way of defining the union; this must be
90
done by the user since other programs, notably the lexical analyzer,
91
must know about the union member names.
92
Second, there is a way of associating a union member name with tokens
93
and nonterminals.
94
Finally, there is a mechanism for describing the type of those
95
few values where Yacc can not easily determine the type.
96
.PP
97
To declare the union, the user includes in the declaration section:
98
.DS
99
%union  {
100
	body of union ...
101
	}
102
.DE
103
This declares the Yacc value stack,
104
and the external variables
105
.I yylval
106
and
107
.I yyval ,
108
to have type equal to this union.
109
If Yacc was invoked with the
110
.B \-d
111
option, the union declaration
112
is copied onto the
113
.I y.tab.h
114
file.
115
Alternatively,
116
the union may be declared in a header file, and a typedef
117
used to define the variable YYSTYPE to represent
118
this union.
119
Thus, the header file might also have said:
120
.DS
121
typedef union {
122
	body of union ...
123
	} YYSTYPE;
124
.DE
125
The header file must be included in the declarations
126
section, by use of %{ and %}.
127
.PP
128
Once YYSTYPE is defined,
129
the union member names must be associated
130
with the various terminal and nonterminal names.
131
The construction
132
.DS
133
< name >
134
.DE
135
is used to indicate a union member name.
136
If this follows
137
one of the
138
keywords %token,
139
%left, %right, and %nonassoc,
140
the union member name is associated with the tokens listed.
141
Thus, saying
142
.DS
143
%left  <optype>  \'+\'  \'\-\'
144
.DE
145
will cause any reference to values returned by these two tokens to be
146
tagged with
147
the union member name
148
.I optype .
149
Another keyword, %type, is
150
used similarly to associate
151
union member names with nonterminals.
152
Thus, one might say
153
.DS
154
%type  <nodetype>  expr  stat
155
.DE
156
.PP
157
There remain a couple of cases where these mechanisms are insufficient.
158
If there is an action within a rule, the value returned
159
by this action has no
160
.I "a priori"
161
type.
162
Similarly, reference to left context values (such as $0 \- see the
163
previous subsection ) leaves Yacc with no easy way of knowing the type.
164
In this case, a type can be imposed on the reference by inserting
165
a union member name, between < and >, immediately after
166
the first $.
167
An example of this usage is
168
.DS
169
rule	:	aaa  {  $<intval>$  =  3;  } bbb
170
			{	fun( $<intval>2, $<other>0 );  }
171
	;
172
.DE
173
This syntax has little to recommend it, but the situation arises rarely.
174
.PP
175
A sample specification is given in Appendix C.
176
The facilities in this subsection are not triggered until they are used:
177
in particular, the use of %type will turn on these mechanisms.
178
When they are used, there is a fairly strict level of checking.
179
For example, use of $n or $$ to refer to something with no defined type
180
is diagnosed.
181
If these facilities are not triggered, the Yacc value stack is used to
182
hold
183
.I int' s,
184
as was true historically.
(-)/usr/src/share/doc/psd.new/15.yacc/ssB (+26 lines)
Line 0 Link Here
1
.\"	@(#)ssB	6.1 (Berkeley) 5/8/86
2
.\"
3
.SH
4
11: Acknowledgements
5
.PP
6
Yacc owes much to a
7
most stimulating collection of users, who have goaded
8
me beyond my inclination, and frequently beyond my
9
ability, in their endless search for ``one more feature''.
10
Their irritating unwillingness to learn how to
11
do things my way has usually led to my doing things their way;
12
most of the time, they have been right.
13
B. W. Kernighan, P. J. Plauger, S. I. Feldman, C. Imagna,
14
M. E. Lesk,
15
and A. Snyder will recognize some of their ideas in the current version
16
of Yacc.
17
C. B. Haley contributed to the error recovery algorithm.
18
D. M. Ritchie, B. W. Kernighan, and M. O. Harris helped translate this document into English.
19
Al Aho also deserves special credit for bringing
20
the mountain to Mohammed, and other favors.
21
.SG "MH-1273-SCJ-unix"
22
.bp
23
.[
24
$LIST$
25
.]
26
.bp
(-)/usr/src/share/doc/psd.new/15.yacc/ssa (+113 lines)
Line 0 Link Here
1
.\"	@(#)ssa	6.1 (Berkeley) 5/8/86
2
.\"
3
.SH
4
Appendix A:  A Simple Example
5
.PP
6
This example gives the complete Yacc specification for a small desk calculator;
7
the desk calculator has 26 registers, labeled ``a'' through ``z'', and accepts
8
arithmetic expressions made up of the operators +, \-, *, /,
9
% (mod operator), & (bitwise and), | (bitwise or), and assignment.
10
If an expression at the top level is an assignment, the value is not
11
printed; otherwise it is.
12
As in C, an integer that begins with 0 (zero) is assumed to be octal;
13
otherwise, it is assumed to be decimal.
14
.PP
15
As an example of a Yacc specification, the desk calculator
16
does a reasonable job of showing how precedences and ambiguities
17
are used, and demonstrating simple error recovery.
18
The major oversimplifications are that the
19
lexical analysis phase is much simpler than for most applications, and the
20
output is produced immediately, line by line.
21
Note the way that decimal and octal integers are read in by the grammar rules;
22
This job is probably better done by the lexical analyzer.
23
.sp
24
.nf
25
.ta .5i 1i 1.5i 2i 2.5i
26
27
%{
28
#  include  <stdio.h>
29
#  include  <ctype.h>
30
31
int  regs[26];
32
int  base;
33
34
%}
35
36
%start  list
37
38
%token  DIGIT  LETTER
39
40
%left  \'|\'
41
%left  \'&\'
42
%left  \'+\'  \'\-\'
43
%left  \'*\'  \'/\'  \'%\'
44
%left  UMINUS      /*  supplies  precedence  for  unary  minus  */
45
46
%%      /*  beginning  of  rules  section  */
47
48
list	:	/*  empty  */
49
	|	list  stat  \'\en\'
50
	|	list  error  \'\en\'
51
			{	yyerrok;  }
52
	;
53
54
stat	:	expr
55
			{	printf( "%d\en", $1 );  }
56
	|	LETTER  \'=\'  expr
57
			{	regs[$1]  =  $3;  }
58
	;
59
60
expr	:	\'(\'  expr  \')\'
61
			{	$$  =  $2;  }
62
	|	expr  \'+\'  expr
63
			{	$$  =  $1  +  $3;  }
64
	|	expr  \'\-\'  expr
65
			{	$$  =  $1  \-  $3;  }
66
	|	expr  \'*\'  expr
67
			{	$$  =  $1  *  $3;  }
68
	|	expr  \'/\'  expr
69
			{	$$  =  $1  /  $3;  }
70
	|	expr  \'%\'  expr
71
			{	$$  =  $1  %  $3;  }
72
	|	expr  \'&\'  expr
73
			{	$$  =  $1  &  $3;  }
74
	|	expr  \'|\'  expr
75
			{	$$  =  $1  |  $3;  }
76
	|	\'\-\'  expr        %prec  UMINUS
77
			{	$$  =  \-  $2;  }
78
	|	LETTER
79
			{	$$  =  regs[$1];  }
80
	|	number          
81
	;
82
83
number	:	DIGIT
84
			{	$$ = $1;    base  =  ($1==0)  ?  8  :  10;  }
85
	|	number  DIGIT
86
			{	$$  =  base * $1  +  $2;  }
87
	;
88
89
%%      /*  start  of  programs  */
90
91
yylex() {		/*  lexical  analysis  routine  */
92
              /*  returns  LETTER  for  a  lower  case  letter,  yylval = 0  through  25  */
93
              /*  return  DIGIT  for  a  digit,  yylval = 0  through  9  */
94
              /*  all  other  characters  are  returned  immediately  */
95
96
	int  c;
97
98
	while(  (c=getchar())  ==  \' \'  )  {	/*  skip  blanks  */  }
99
100
	/*  c  is  now  nonblank  */
101
102
	if(  islower(  c  )  )  {	
103
		yylval  =  c  \-  \'a\';
104
		return  (  LETTER  );
105
		}
106
	if(  isdigit(  c  )  )  {	
107
		yylval  =  c  \-  \'0\';
108
		return(  DIGIT  );
109
		}
110
	return(  c  );
111
	}
112
.fi
113
.bp
(-)/usr/src/share/doc/psd.new/15.yacc/ssb (+110 lines)
Line 0 Link Here
1
.\"	@(#)ssb	6.1 (Berkeley) 5/8/86
2
.\"
3
.SH
4
Appendix B: Yacc Input Syntax
5
.PP
6
This Appendix has a description of the Yacc input syntax, as a Yacc specification.
7
Context dependencies, etc., are not considered.
8
Ironically, the Yacc input specification language
9
is most naturally specified as an LR(2) grammar; the sticky
10
part comes when an identifier is seen in a rule, immediately
11
following an action.
12
If this identifier is followed by a colon, it is the start of the
13
next rule; otherwise
14
it is a continuation of the current rule, which just happens to have
15
an action embedded in it.
16
As implemented, the lexical analyzer looks
17
ahead after seeing an identifier, and
18
decide whether the next token (skipping blanks, newlines, comments, etc.)
19
is a colon.
20
If so, it returns the token C_IDENTIFIER.
21
Otherwise, it returns IDENTIFIER.
22
Literals (quoted strings) are also returned as IDENTIFIERS,
23
but never as part of C_IDENTIFIERs.
24
.sp
25
.nf
26
.ta .6i 1.2i 1.8i 2.4i 3i 3.6i
27
28
            /*  grammar  for  the  input  to  Yacc  */
29
30
	/*  basic  entities  */
31
%token	IDENTIFIER	/*   includes  identifiers   and  literals  */
32
%token	C_IDENTIFIER	/*    identifier  (but  not  literal)  followed  by  colon    */
33
%token	NUMBER		/*    [0-9]+    */
34
35
	/*  reserved  words:    %type  =>  TYPE,  %left  =>  LEFT,  etc.  */
36
37
%token	LEFT  RIGHT  NONASSOC  TOKEN  PREC  TYPE  START  UNION
38
39
%token	MARK	/*  the  %%  mark  */
40
%token	LCURL	/*  the  %{  mark  */
41
%token	RCURL	/*  the  %}  mark  */
42
43
	/*  ascii  character  literals  stand  for  themselves  */
44
45
%start	spec
46
47
%%
48
49
spec	:	defs  MARK  rules  tail
50
	;
51
52
tail	:	MARK	{    \fIIn  this  action,  eat  up  the  rest  of  the  file\fR    }
53
	|	/*  empty:  the  second  MARK  is  optional  */
54
	;
55
56
defs	:	/*  empty  */
57
	|	defs  def
58
	;
59
60
def	:	START  IDENTIFIER
61
	|	UNION  {  \fICopy union  definition  to  output\fR  }
62
	|	LCURL  {  \fICopy  C  code  to  output  file\fR   }  RCURL
63
	|	ndefs  rword  tag  nlist
64
	;
65
66
rword	:	TOKEN
67
	|	LEFT
68
	|	RIGHT
69
	|	NONASSOC
70
	|	TYPE
71
	;
72
73
tag	:	/*  empty:  union  tag  is  optional  */
74
	|	\'<\'  IDENTIFIER  \'>\'
75
	;
76
77
nlist	:	nmno
78
	|	nlist  nmno
79
	|	nlist  \',\'  nmno
80
	;
81
82
nmno	:	IDENTIFIER		/*  NOTE:  literal  illegal  with  %type  */
83
	|	IDENTIFIER  NUMBER      /*  NOTE:  illegal  with  %type  */
84
	;
85
86
	/*  rules  section  */
87
88
rules	:	C_IDENTIFIER  rbody  prec
89
	|	rules  rule
90
	;
91
92
rule	:	C_IDENTIFIER  rbody  prec
93
	|	'|'  rbody  prec
94
	;
95
96
rbody	:	/*  empty  */
97
	|	rbody  IDENTIFIER
98
	|	rbody  act
99
	;
100
101
act	:	\'{\'  {  \fICopy  action,  translate  $$,  etc.\fR  }  \'}\'
102
	;
103
104
prec	:	/*  empty  */
105
	|	PREC  IDENTIFIER
106
	|	PREC  IDENTIFIER  act
107
	|	prec  \';\'
108
	;
109
.fi
110
.bp
(-)/usr/src/share/doc/psd.new/15.yacc/ssc (+311 lines)
Line 0 Link Here
1
.\"	@(#)ssc	6.1 (Berkeley) 5/8/86
2
.\"
3
.SH
4
Appendix C: An Advanced Example
5
.PP
6
This Appendix gives an example of a grammar using some
7
of the advanced features discussed in Section 10.
8
The desk calculator example in Appendix A is
9
modified to provide a desk calculator that
10
does floating point interval arithmetic.
11
The calculator understands floating point
12
constants, the arithmetic operations +, \-, *, /,
13
unary \-, and = (assignment), and has 26 floating
14
point variables, ``a'' through ``z''.
15
Moreover, it also understands
16
.I intervals ,
17
written
18
.DS
19
	( x , y )
20
.DE
21
where
22
.I x
23
is less than or equal to
24
.I y .
25
There are 26 interval valued variables ``A'' through ``Z''
26
that may also be used.
27
The usage is similar to that in Appendix A; assignments
28
return no value, and print nothing, while expressions print
29
the (floating or interval) value.
30
.PP
31
This example explores a number of interesting features
32
of Yacc and C.
33
Intervals are represented by a structure, consisting of the
34
left and right endpoint values, stored as
35
.I double 's.
36
This structure is given a type name, INTERVAL, by using
37
.I typedef .
38
The Yacc value stack can also contain floating point scalars, and
39
integers (used to index into the arrays holding the variable values).
40
Notice that this entire strategy depends strongly on being able to
41
assign structures and unions in C.
42
In fact, many of the actions call functions that return structures
43
as well.
44
.PP
45
It is also worth noting the use of YYERROR to handle error conditions:
46
division by an interval containing 0, and an interval presented in
47
the wrong order.
48
In effect, the error recovery mechanism of Yacc is used to throw away the
49
rest of the offending line.
50
.PP
51
In addition to the mixing of types on the value stack,
52
this grammar also demonstrates an interesting use of syntax to
53
keep track of the type (e.g. scalar or interval) of intermediate
54
expressions.
55
Note that a scalar can be automatically promoted to an interval if
56
the context demands an interval value.
57
This causes a large number of conflicts when the grammar is run through
58
Yacc: 18 Shift/Reduce and 26 Reduce/Reduce.
59
The problem can be seen by looking at the two input lines:
60
.DS
61
	2.5 + ( 3.5 \- 4. )
62
.DE
63
and
64
.DS
65
	2.5 + ( 3.5 , 4. )
66
.DE
67
Notice that the 2.5 is to be used in an interval valued expression
68
in the second example, but this fact is not known until
69
the ``,'' is read; by this time, 2.5 is finished, and the parser cannot go back
70
and change its mind.
71
More generally, it might be necessary to look ahead an arbitrary number of
72
tokens to decide whether to convert a scalar to an interval.
73
This problem is evaded by having two rules for each binary interval
74
valued operator: one when the left operand is a scalar, and one when
75
the left operand is an interval.
76
In the second case, the right operand must be an interval,
77
so the conversion will be applied automatically.
78
Despite this evasion, there are still many cases where the
79
conversion may be applied or not, leading to the above
80
conflicts.
81
They are resolved by listing the rules that yield scalars first
82
in the specification file; in this way, the conflicts will
83
be resolved in the direction of keeping scalar
84
valued expressions scalar valued until they are forced to become
85
intervals.
86
.PP
87
This way of handling multiple types is very instructive, but not very general.
88
If there were many kinds of expression types, instead of just two,
89
the number of rules needed would increase dramatically, and the conflicts
90
even more dramatically.
91
Thus, while this example is instructive, it is better practice in a
92
more normal programming language environment to
93
keep the type information as part of the value, and not as part
94
of the grammar.
95
.PP
96
Finally, a word about the lexical analysis.
97
The only unusual feature is the treatment of floating point constants.
98
The C library routine
99
.I atof
100
is used to do the actual conversion from a character string
101
to a double precision value.
102
If the lexical analyzer detects an error,
103
it responds by returning a token that
104
is illegal in the grammar, provoking a syntax error
105
in the parser, and thence error recovery.
106
.DS L
107
108
%{
109
110
#  include  <stdio.h>
111
#  include  <ctype.h>
112
113
typedef  struct  interval  {
114
	double  lo,  hi;
115
	}  INTERVAL;
116
117
INTERVAL  vmul(),  vdiv();
118
119
double  atof();
120
121
double  dreg[ 26 ];
122
INTERVAL  vreg[ 26 ];
123
124
%}
125
126
%start    lines
127
128
%union    {
129
	int  ival;
130
	double  dval;
131
	INTERVAL  vval;
132
	}
133
134
%token  <ival>  DREG  VREG	/*  indices  into  dreg,  vreg  arrays  */
135
136
%token  <dval>  CONST		/*  floating  point  constant  */
137
138
%type  <dval>  dexp		/*  expression  */
139
140
%type  <vval>  vexp		/*  interval  expression  */
141
142
	/*  precedence  information  about  the  operators  */
143
144
%left	\'+\'  \'\-\'
145
%left	\'*\'  \'/\'
146
%left	UMINUS        /*  precedence  for  unary  minus  */
147
148
%%
149
150
lines	:	/*  empty  */
151
	|	lines  line
152
	;
153
154
line	:	dexp  \'\en\'
155
			{	printf(  "%15.8f\en",  $1  );  }
156
	|	vexp  \'\en\'
157
			{	printf(  "(%15.8f  ,  %15.8f  )\en",  $1.lo,  $1.hi  );  }
158
	|	DREG  \'=\'  dexp  \'\en\'
159
			{	dreg[$1]  =  $3;  }
160
	|	VREG  \'=\'  vexp  \'\en\'
161
			{	vreg[$1]  =  $3;  }
162
	|	error  \'\en\'
163
			{	yyerrok;  }
164
	;
165
166
dexp	:	CONST
167
	|	DREG
168
			{	$$  =  dreg[$1];  }
169
	|	dexp  \'+\'  dexp
170
			{	$$  =  $1  +  $3;  }
171
	|	dexp  \'\-\'  dexp
172
			{	$$  =  $1  \-  $3;  }
173
	|	dexp  \'*\'  dexp
174
			{	$$  =  $1  *  $3;  }
175
	|	dexp  \'/\'  dexp
176
			{	$$  =  $1  /  $3;  }
177
	|	\'\-\'  dexp	%prec  UMINUS
178
			{	$$  =  \- $2;  }
179
	|	\'(\'  dexp  \')\'
180
			{	$$  =  $2;  }
181
	;
182
183
vexp	:	dexp
184
			{	$$.hi  =  $$.lo  =  $1;  }
185
	|	\'(\'  dexp  \',\'  dexp  \')\'
186
			{
187
			$$.lo  =  $2;
188
			$$.hi  =  $4;
189
			if(  $$.lo  >  $$.hi  ){
190
				printf(  "interval  out  of  order\en"  );
191
				YYERROR;
192
				}
193
			}
194
	|	VREG
195
			{	$$  =  vreg[$1];    }
196
	|	vexp  \'+\'  vexp
197
			{	$$.hi  =  $1.hi  +  $3.hi;
198
				$$.lo  =  $1.lo  +  $3.lo;    }
199
	|	dexp  \'+\'  vexp
200
			{	$$.hi  =  $1  +  $3.hi;
201
				$$.lo  =  $1  +  $3.lo;    }
202
	|	vexp  \'\-\'  vexp
203
			{	$$.hi  =  $1.hi  \-  $3.lo;
204
				$$.lo  =  $1.lo  \-  $3.hi;    }
205
	|	dexp  \'\-\'  vexp
206
			{	$$.hi  =  $1  \-  $3.lo;
207
				$$.lo  =  $1  \-  $3.hi;    }
208
	|	vexp  \'*\'  vexp
209
			{	$$  =  vmul(  $1.lo,  $1.hi,  $3  );  }
210
	|	dexp  \'*\'  vexp
211
			{	$$  =  vmul(  $1,  $1,  $3  );  }
212
	|	vexp  \'/\'  vexp
213
			{	if(  dcheck(  $3  )  )  YYERROR;
214
				$$  =  vdiv(  $1.lo,  $1.hi,  $3  );  }
215
	|	dexp  \'/\'  vexp
216
			{	if(  dcheck(  $3  )  )  YYERROR;
217
				$$  =  vdiv(  $1,  $1,  $3  );  }
218
	|	\'\-\'  vexp	%prec  UMINUS
219
			{	$$.hi  =  \-$2.lo;    $$.lo  =  \-$2.hi;    }
220
	|	\'(\'  vexp  \')\'
221
			{	$$  =  $2;  }
222
	;
223
224
%%
225
226
#  define  BSZ  50        /*  buffer  size  for  floating  point  numbers  */
227
228
	/*  lexical  analysis  */
229
230
yylex(){
231
	register  c;
232
233
	while(  (c=getchar())  ==  \' \'  ){  /*  skip  over  blanks  */  }
234
235
	if(  isupper(  c  )  ){
236
		yylval.ival  =  c  \-  \'A\';
237
		return(  VREG  );
238
		}
239
	if(  islower(  c  )  ){
240
		yylval.ival  =  c  \-  \'a\';
241
		return(  DREG  );
242
		}
243
244
	if(  isdigit(  c  )  ||  c==\'.\'  ){
245
		/*  gobble  up  digits,  points,  exponents  */
246
247
		char  buf[BSZ+1],  *cp  =  buf;
248
		int  dot  =  0,  exp  =  0;
249
250
		for(  ;  (cp\-buf)<BSZ  ;  ++cp,c=getchar()  ){
251
252
			*cp  =  c;
253
			if(  isdigit(  c  )  )  continue;
254
			if(  c  ==  \'.\'  ){
255
				if(  dot++  ||  exp  )  return(  \'.\'  );    /*  will  cause  syntax  error  */
256
				continue;
257
				}
258
259
			if(  c  ==  \'e\'  ){
260
				if(  exp++  )  return(  \'e\'  );    /*  will  cause  syntax  error  */
261
				continue;
262
				}
263
264
			/*  end  of  number  */
265
			break;
266
			}
267
		*cp  =  \'\e0\';
268
		if(  (cp\-buf)  >=  BSZ  )  printf(  "constant  too  long:  truncated\en"  );
269
		else  ungetc(  c,  stdin  );    /*  push  back  last  char  read  */
270
		yylval.dval  =  atof(  buf  );
271
		return(  CONST  );
272
		}
273
	return(  c  );
274
	}
275
276
INTERVAL  hilo(  a,  b,  c,  d  )  double  a,  b,  c,  d;  {
277
	/*  returns  the  smallest  interval  containing  a,  b,  c,  and  d  */
278
	/*  used  by  *,  /  routines  */
279
	INTERVAL  v;
280
281
	if(  a>b  )  {  v.hi  =  a;    v.lo  =  b;  }
282
	else  {  v.hi  =  b;    v.lo  =  a;  }
283
284
	if(  c>d  )  {
285
		if(  c>v.hi  )  v.hi  =  c;
286
		if(  d<v.lo  )  v.lo  =  d;
287
		}
288
	else  {
289
		if(  d>v.hi  )  v.hi  =  d;
290
		if(  c<v.lo  )  v.lo  =  c;
291
		}
292
	return(  v  );
293
	}
294
295
INTERVAL  vmul(  a,  b,  v  )  double  a,  b;    INTERVAL  v;  {
296
	return(  hilo(  a*v.hi,  a*v.lo,  b*v.hi,  b*v.lo  )  );
297
	}
298
299
dcheck(  v  )  INTERVAL  v;  {
300
	if(  v.hi  >=  0.  &&  v.lo  <=  0.  ){
301
		printf(  "divisor  interval  contains  0.\en"  );
302
		return(  1  );
303
		}
304
	return(  0  );
305
	}
306
307
INTERVAL  vdiv(  a,  b,  v  )  double  a,  b;    INTERVAL  v;  {
308
	return(  hilo(  a/v.hi,  a/v.lo,  b/v.hi,  b/v.lo  )  );
309
	}
310
.DE
311
.bp
(-)/usr/src/share/doc/psd.new/15.yacc/ssd (+40 lines)
Line 0 Link Here
1
.\"	@(#)ssd	6.1 (Berkeley) 5/8/86
2
.\"
3
.SH
4
Appendix D: Old Features Supported but not Encouraged
5
.PP
6
This Appendix mentions synonyms and features which are supported for historical
7
continuity, but, for various reasons, are not encouraged.
8
.IP 1.
9
Literals may also be delimited by double quotes ``"''.
10
.IP 2.
11
Literals may be more than one character long.
12
If all the characters are alphabetic, numeric, or \_, the type number of the literal is defined,
13
just as if the literal did not have the quotes around it.
14
Otherwise, it is difficult to find the value for such literals.
15
.IP
16
The use of multi-character literals is likely to mislead those unfamiliar with
17
Yacc, since it suggests that Yacc is doing a job which must be actually done by the lexical analyzer.
18
.IP 3.
19
Most places where % is legal, backslash ``\e'' may be used.
20
In particular, \e\e is the same as %%, \eleft the same as %left, etc.
21
.IP 4.
22
There are a number of other synonyms:
23
.DS
24
%< is the same as %left
25
%> is the same as %right
26
%binary and %2 are the same as %nonassoc
27
%0 and %term are the same as %token
28
%= is the same as %prec
29
.DE
30
.IP 5.
31
Actions may also have the form
32
.DS
33
={ . . . }
34
.DE
35
and the curly braces can be dropped if the action is a
36
single C statement.
37
.IP 6.
38
C code between %{ and %} used to be permitted at the
39
head of the rules section, as well as in the
40
declaration section.

Return to bug 35345