Lexical Syntax

Dylan source code is a sequence of tokens. Whitespace is required between tokens if the tokens would otherwise blend together. Whitespace is optional between self-delimiting tokens. Alphabetic case is not significant except within character and string literals.

Whitespace can be a space character, a tab character, a newline character, or a comment. Implementations can define additional whitespace characters.

A comment can be single-line or delimited. Although comments count as whitespace, the beginning of a comment can blend with a preceding token, so in general comments should be surrounded by genuine whitespace.

A single-line comment consists of two slash characters in a row, followed by any number of characters up to and including the first newline character or the end of the source record. For example, // This line is a kludge!

A delimited comment consists of a slash character immediately followed by a star character, any number of characters (including complete single-line and complete delimited comments), and finally a star character immediately followed by a slash character. For example, /* set x to 3 */.

A single-line comment may appear within a delimited comment; occurrences of slash-star or star-slash within the single-line comment are ignored.
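The comment rules above can be sketched as a small scanner. This is a minimal illustration in Python, not part of the Dylan specification: the function name `strip_comments` is hypothetical, and the sketch deliberately ignores string and character literals and the possibility that a comment start blends with a preceding token.

```python
def strip_comments(source):
    """Replace each Dylan comment in `source` with whitespace.

    Assumptions of this sketch: string and character literals are not
    recognized, and blending with a preceding token is not modeled.
    """
    out = []
    i, n = 0, len(source)
    depth = 0  # nesting depth of delimited comments
    while i < n:
        if source.startswith("//", i):
            # Single-line comment: runs through the newline or end of input,
            # so any /* or */ inside it is never examined.
            newline = source.find("\n", i)
            i = n if newline == -1 else newline + 1
            if depth == 0:
                out.append("\n" if newline != -1 else " ")
        elif source.startswith("/*", i):
            # Delimited comments nest, so track the depth.
            depth += 1
            i += 2
        elif depth > 0 and source.startswith("*/", i):
            depth -= 1
            i += 2
            if depth == 0:
                out.append(" ")  # the whole comment counts as whitespace
        elif depth > 0:
            i += 1  # ordinary character inside a delimited comment
        else:
            out.append(source[i])
            i += 1
    if depth > 0:
        raise ValueError("unterminated delimited comment")
    return "".join(out)
```

For instance, under these assumptions `strip_comments("a /* outer /* inner */ still */ b").split()` yields `["a", "b"]`, because the inner comment is complete and the outer one ends only at the second star-slash.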

A token is a name, a #-word, an operator, a number, a character literal, a string literal, a symbol literal, or punctuation.

A name is one of the following four possibilities:

An alphabetic character followed by zero or more name characters.
A numeric character followed by two or more name characters including at least two alphabetic characters in a row.
A graphic character followed by one or more name characters including at least one alphabetic character.
A \ (backslash) followed by a function operator or another name.

An alphabetic character is any of the 26 letters of the Roman alphabet in upper and lower case.

A numeric character is any of the 10 digits.

A graphic character is one of the following:

! & * < = > | ^ $ % @ _

A name character is an alphabetic character, a numeric character, a graphic character, or one of the following:

- + ~ ? /

The rich set of name characters means that name and operator tokens can blend. For example, a-b is a single name, whereas a - b is a subtraction. Thus Dylan programs usually set off operators with whitespace.

Implementations can add additional name characters, but programs using them will not be portable.
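The four name rules and the character classes above can be checked mechanically. The following Python sketch is one possible reading of the prose, not the normative grammar. The function `is_name` and the constant names are hypothetical, and two points are assumptions: that the "function operators" are all operators except `&`, `|`, and `:=`, and that the name after a backslash is not itself backslash-escaped.

```python
import re

ALPHA = "A-Za-z"
DIGIT = "0-9"
GRAPHIC = re.escape("!&*<=>|^$%@_")
# Name characters: alphabetic, numeric, graphic, plus - + ~ ? /
NAME_CHAR = ALPHA + DIGIT + GRAPHIC + re.escape("-+~?/")
NC = f"[{NAME_CHAR}]"

PLAIN_NAME_RULES = [
    # 1. Alphabetic character followed by zero or more name characters.
    re.compile(f"[{ALPHA}]{NC}*"),
    # 2. Numeric character, then name characters that include two
    #    alphabetic characters in a row.
    re.compile(f"[{DIGIT}]{NC}*[{ALPHA}]{{2}}{NC}*"),
    # 3. Graphic character, then name characters that include at least
    #    one alphabetic character.
    re.compile(f"[{GRAPHIC}]{NC}*[{ALPHA}]{NC}*"),
]

# Assumption: every operator except &, | and := counts as a function operator.
FUNCTION_OPERATORS = {
    "+", "-", "*", "/", "^", "=", "==", "<", ">", "<=", ">=", "~=", "~==", "~",
}


def _is_plain_name(token):
    return any(rule.fullmatch(token) for rule in PLAIN_NAME_RULES)


def is_name(token):
    """Return True if `token` is a name under the four rules above."""
    if token.startswith("\\"):
        # 4. Backslash followed by a function operator or another name.
        rest = token[1:]
        return rest in FUNCTION_OPERATORS or _is_plain_name(rest)
    return _is_plain_name(token)
```

Under this reading, `is_name("a=>b")` and `is_name("a-b")` are both true, which illustrates how operators and punctuation blend into names when whitespace is omitted, while `is_name("1e5")` is false because no two alphabetic characters appear in a row.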

A #-word is one of #t, #f, #next, #rest, #key, or #all-keys. The first two are literal constants; the others are used in parameter lists. Implementations can add additional implementation-defined #-words, but programmers cannot add their own #-words.

An operator is one of the following:

`+`	addition
`-`	subtraction and negation
`*`	multiplication
`/`	division
`^`	exponentiation
`=`	equality
`==`	identity
`<`	less than
`>`	greater than
`<=`	less than or equal
`>=`	greater than or equal
`~=`	not equal
`~==`	not identical
`&`	logical and
`|`	logical or
`:=`	assignment
`~`	logical negation

Programmers cannot add their own operators.

A number is a decimal integer with an optional leading sign, a binary integer, an octal integer, a hexadecimal integer, a ratio of two decimal integers with an optional leading sign, or a floating-point number. The complete syntax of numbers is given in Numbers on page 418. Note that ratios are not part of the Dylan language; the ratio syntax is reserved for future expansion and for implementation-specific extensions.

A character literal is a printing character (including space, but not ' or \) or a backslash escape sequence enclosed in a pair of single-quote characters '.

A string literal is a sequence of printing characters (including space, but not " or \) and backslash escape sequences enclosed in a pair of double-quote characters ".

A backslash escape sequence is a backslash followed by a character or character code. A backslash escape sequence in character and string literals allows quoting of the special characters ', ", and \, provides names for control characters, and allows Unicode characters to be specified by their hexadecimal codes.

The control characters are:

`a`	alarm
`b`	backspace
`e`	escape
`f`	form feed
`n`	newline
`r`	carriage return
`t`	tab
`0`	null

Unicode characters are represented by a series of hexadecimal digits enclosed in angle brackets. If Unicode does not explicitly define the character represented by the hexadecimal digits, the meaning in Dylan is undefined. If an implementation is not able to represent the entire Unicode character set, and the number specifies a character that is not representable in the implementation, an error must be signaled.

For example: "\<44>\<79>\<6c>\<61>\<6e>" = "Dylan"
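The escape rules above can be sketched as a decoder for the text between the quotes of a character or string literal. This Python sketch is illustrative only. The names `decode_literal_body`, `CONTROL`, and `QUOTED` are hypothetical, and two choices are assumptions: malformed escapes raise `ValueError`, and the representable range is taken to be Python's (code points up to 10FFFF hexadecimal).

```python
import string

# Control-character names from the table above.
CONTROL = {
    "a": "\a", "b": "\b", "e": "\x1b", "f": "\f",
    "n": "\n", "r": "\r", "t": "\t", "0": "\0",
}
# Special characters that a backslash quotes.
QUOTED = {"'", '"', "\\"}


def decode_literal_body(body):
    """Decode backslash escapes in the body of a Dylan literal."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        code = body[i + 1]
        if code in CONTROL:
            out.append(CONTROL[code])
            i += 2
        elif code in QUOTED:
            out.append(code)
            i += 2
        elif code == "<":
            # Unicode escape: hexadecimal digits enclosed in angle brackets.
            close = body.find(">", i + 2)
            if close == -1:
                raise ValueError("unterminated character code")
            digits = body[i + 2:close]
            if not digits or any(d not in string.hexdigits for d in digits):
                raise ValueError("invalid hexadecimal character code")
            value = int(digits, 16)
            if value > 0x10FFFF:
                # Assumption: this sketch's representable range is Python's.
                raise ValueError("character not representable")
            out.append(chr(value))
            i = close + 1
        else:
            raise ValueError("unknown escape sequence")
    return "".join(out)
```

Applied to the example above, `decode_literal_body(r"\<44>\<79>\<6c>\<61>\<6e>")` returns `"Dylan"`.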

A symbol literal is a keyword or a unique string. A keyword is a name followed immediately by a colon character :. A unique string is a sharp sign # followed immediately by a string literal.

Punctuation is one of the following:

`(` `)`	parentheses
`[` `]`	square brackets
`{` `}`	curly brackets
`,`	comma
`.`	period
`;`	semicolon
`=`	defaulting/initialization
`::`	type specialization
`==`	singleton specialization
`=>`	arrow
`#(`	list/pair literal
`#[`	vector literal
`?`, `??`	macro pattern variables
`...`	macro ellipsis

Note that some tokens are both punctuation and operators. This ambiguity is resolved by grammatical context.

Note also that some punctuation tokens (for example, period and equal sign) are capable of blending into some other tokens. Where this can occur, whitespace must be inserted to delimit the token boundary.