A. BNF — The DRM

Appendix A

BNF

General Notes

Dylan syntax can be parsed with an LALR(1) grammar.

This appendix uses some special notation to make the presentation of the grammar more readable.

- The _opt suffix means that the preceding item is optional.
- A trailing ellipsis (…) is used in two different ways to signal possible repetition.
  - If there is only one item on the line preceding the ellipsis, the item may appear one or more times.
  - If more than one item precedes the ellipsis, the last of these items is designated a separator; the rest may appear one or more times, with the separator appearing after each occurrence but the last. (When only one item appears, the separator does not appear.)
- Identifiers for grammar rules are written with uppercase letters when the identifier is used in the phrase grammar but defined in the lexical grammar.
- The grammar does not use distinct identifiers for grammar rules that differ only in alphabetic case.
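
The _opt and ellipsis conventions above can be read as small parsing routines. The following Python sketch is illustrative only and is not part of the grammar: the helper names parse_repetition and parse_optional, the token list, and the is_item predicate are all assumptions made for this example.

```python
from typing import Callable, List, Optional, Tuple


def parse_repetition(
    tokens: List[str],
    pos: int,
    is_item: Callable[[str], bool],
    separator: Optional[str] = None,
) -> Tuple[List[str], int]:
    """Match `item ...` (no separator) or `item separator ...` starting at pos.

    Returns the matched items and the position just past them.
    Raises ValueError when not even one item is present.
    """
    if pos >= len(tokens) or not is_item(tokens[pos]):
        raise ValueError("at least one item is required")
    items = [tokens[pos]]
    pos += 1
    while pos < len(tokens):
        if separator is None:
            # One item before the ellipsis: the item simply repeats.
            if not is_item(tokens[pos]):
                break
            items.append(tokens[pos])
            pos += 1
        else:
            # Separator form: a separator counts only when another item
            # follows it, so a trailing separator is left unconsumed.
            if (tokens[pos] != separator or pos + 1 >= len(tokens)
                    or not is_item(tokens[pos + 1])):
                break
            items.append(tokens[pos + 1])
            pos += 2
    return items, pos


def parse_optional(
    tokens: List[str], pos: int, is_item: Callable[[str], bool]
) -> Tuple[Optional[str], int]:
    """Match `item_opt`: consume the item if present, otherwise nothing."""
    if pos < len(tokens) and is_item(tokens[pos]):
        return tokens[pos], pos + 1
    return None, pos
```

Under these assumptions, a rule line of the form "item , …" corresponds to parse_repetition(tokens, pos, is_item, ","), and a single item matches without any separator being consumed.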

In the following grammar, some tokens are used in multiple ways. For example, the hyphen, -, is punctuation, a unary operator, and a binary operator; also, method is a begin-word and a define-body-word. In some parsing implementations such multiple meanings of a token may not be possible. However, this is just an implementation issue, since the meaning of the grammar is clear.

method is used as punctuation in local-methods and method-definition; since method is not a core reserved word, this typically has to be implemented by accepting any macro-name and checking semantically that the word used is method.

The grammar as presented is not obviously LALR(1), because the changes required to make it so would tend to obscure its readability for human beings (especially in macro definitions and case-body). The grammar can be made LALR(1) through well-known standard transformations implemented by most parser generators.
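
The treatment of method as punctuation described above can be sketched as a two-step check: accept any macro name syntactically, then verify the word. This Python fragment is a minimal illustration under stated assumptions; the (kind, text) token representation, the kind label "MACRO-NAME", and the function name are hypothetical and are not taken from any Dylan implementation.

```python
from typing import List, Tuple

# Hypothetical token representation: (kind, text).
Token = Tuple[str, str]


def expect_method_word(tokens: List[Token], pos: int) -> int:
    """Accept any macro-name token, then check semantically that it is `method`.

    Returns the position just past the word, or raises SyntaxError.
    """
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    kind, text = tokens[pos]
    # Syntactic step: the parser only knows that a macro name is allowed here.
    if kind != "MACRO-NAME":
        raise SyntaxError(f"expected a macro name, got {text!r}")
    # Semantic step: the word must be `method`. The comparison ignores case
    # because case is not significant outside character and string literals.
    if text.lower() != "method":
        raise SyntaxError(f"expected 'method', got {text!r}")
    return pos + 1
```

Splitting the check this way keeps the parser's token classes unchanged; the cost is that a wrong word is reported after parsing the token rather than by the grammar itself.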

Lexical Notes

In the lexical grammar, the various elements that come together to form a single token on the right-hand sides of rules must not be separated by whitespace, so that the end result will be a single token. This is in contrast to the phrase grammar, where each element is already a complete token or a series of complete tokens.

Arbitrary whitespace is permitted between tokens, but it is required only as necessary to separate tokens that might otherwise blend together.
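
As an illustration of when whitespace is required, consider the hyphen, which can be a constituent of a name as well as an operator. The following Python sketch uses deliberately simplified token shapes (a name is a letter followed by letters, digits, or hyphens; a number is a run of digits; anything else is a one-character token). These shapes and the tokenize function are assumptions for this example and do not reproduce the full lexical grammar.

```python
import re

# Simplified token shapes, assumed for illustration only:
# group 1 = name, group 2 = number, group 3 = any other single character.
TOKEN = re.compile(r"\s*(?:([A-Za-z][A-Za-z0-9-]*)|(\d+)|(\S))")


def tokenize(text: str) -> list:
    """Split text into tokens, skipping whitespace between them."""
    text = text.rstrip()  # guarantees a token follows any remaining position
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        # lastindex is the number of the one group that matched.
        tokens.append(match.group(match.lastindex))
        pos = match.end()
    return tokens


print(tokenize("a-b"))    # ['a-b']
print(tokenize("a - b"))  # ['a', '-', 'b']
print(tokenize("f(x)"))   # ['f', '(', 'x', ')']
```

In this sketch, a-b blends into a single name, so whitespace is needed to obtain a subtraction, whereas f(x) needs none because the parentheses cannot blend with the neighboring names.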

Case is not significant except within character and string literals. The grammars do not reflect this; they write each item in one case or the other, but the rule still holds.
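
One way an implementation might honor this rule is to fold case before comparing names while leaving literals untouched. The Python sketch below is a minimal illustration, not a description of any particular implementation. It assumes that string literals are delimited by double quotes, character literals by single quotes, and that a backslash escapes the next character; it ignores comments and treats every quoted run as a literal.

```python
def fold_case(source: str) -> str:
    """Lowercase everything outside quoted literals; keep literals as written."""
    out = []
    quote = None  # delimiter of the literal currently open, if any
    i = 0
    while i < len(source):
        ch = source[i]
        if quote is None:
            if ch in "\"'":
                quote = ch  # a literal starts here
            out.append(ch.lower())
        else:
            out.append(ch)
            if ch == "\\" and i + 1 < len(source):
                # Copy the escaped character verbatim so that an escaped
                # quote does not end the literal.
                i += 1
                out.append(source[i])
            elif ch == quote:
                quote = None  # the literal ends here
        i += 1
    return "".join(out)


print(fold_case('Define Method Foo () "Hello" End'))
# define method foo () "Hello" end
```

Folding to lowercase is an arbitrary choice in this sketch; folding to uppercase would serve equally well, provided it is applied consistently.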