Chapter 2
Syntax
Lexical Syntax
Dylan source code is a sequence of tokens. Whitespace is required between tokens if the tokens would otherwise blend together. Whitespace is optional between self-delimiting tokens. Alphabetic case is not significant except within character and string literals.
Whitespace can be a space character, a tab character, a newline character, or a comment. Implementations can define additional whitespace characters.
A comment can be single-line or delimited. Although comments count as whitespace, the beginning of a comment can blend with a preceding token, so in general comments should be surrounded by genuine whitespace.
A single-line comment consists of two slash characters in a row, followed by any number of characters up to and including the first newline character or the end of the source record. For example, // This line is a kludge!
A delimited comment consists of a slash character immediately followed by a star character, any number of characters (including complete single-line and complete delimited comments), and finally a star character immediately followed by a slash character. For example, /* set x to 3 */.
A single-line comment may appear within a delimited comment; occurrences of slash-star or star-slash within the single-line comment are ignored.
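As an illustration only, the comment rules above can be sketched in Python (Python is not part of Dylan, and the names strip_comments, _end_of_line, and _skip_delimited are hypothetical). The sketch assumes the input contains no character or string literals and that every comment is set off by genuine whitespace. It replaces each comment with a single space, since comments count as whitespace.

```python
def _end_of_line(src: str, i: int) -> int:
    """Return the index just past the next newline, or len(src) if none."""
    j = src.find("\n", i)
    return len(src) if j == -1 else j + 1


def _skip_delimited(src: str, i: int) -> int:
    """Skip a delimited comment starting at src[i:i+2] == '/*'.

    Delimited comments nest. A single-line comment inside one hides any
    slash-star or star-slash on the rest of that line.
    """
    depth = 1
    i += 2
    while i < len(src) and depth > 0:
        if src.startswith("//", i):
            i = _end_of_line(src, i)
        elif src.startswith("/*", i):
            depth += 1
            i += 2
        elif src.startswith("*/", i):
            depth -= 1
            i += 2
        else:
            i += 1
    if depth > 0:
        raise ValueError("unterminated delimited comment")
    return i


def strip_comments(src: str) -> str:
    """Replace every comment in src with one space (assumes no literals)."""
    out = []
    i = 0
    while i < len(src):
        if src.startswith("//", i):
            i = _end_of_line(src, i)
            out.append(" ")
        elif src.startswith("/*", i):
            i = _skip_delimited(src, i)
            out.append(" ")
        else:
            out.append(src[i])
            i += 1
    return "".join(out)
```

For instance, under these assumptions strip_comments("x /* a /* b */ c */ y") leaves only x and y separated by whitespace, because the inner delimited comment is complete before the outer one closes.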
A token is a name, a #-word, an operator, a number, a character literal, a string literal, a symbol literal, or punctuation.
A name is one of the following four possibilities:
- An alphabetic character followed by zero or more name characters.
- A numeric character followed by two or more name characters including at least two alphabetic characters in a row.
- A graphic character followed by one or more name characters including at least one alphabetic character.
- A \ (backslash) followed by a function operator or another name.
An alphabetic character is any of the 26 letters of the Roman alphabet in upper and lower case.
A numeric character is any of the 10 digits.
A graphic character is one of the following:
!
&
*
<
=
>
|
^
$
%
@
_
A name character is an alphabetic character, a numeric character, a graphic character, or one of the following:
-
+
~
?
/
The rich set of name characters means that name and operator tokens can blend. Thus Dylan programs usually set off operators with whitespace.
Implementations can add further name characters, but programs using them will not be portable.
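The four name rules and the character classes above can be checked mechanically. The following Python sketch is one possible reading, not a definitive implementation; the function name is_name and the constant names are hypothetical. Which operators count as function operators is not defined in this section, so the sketch assumes that any operator from the operator table may follow the backslash, which may be more permissive than the language allows.

```python
import string

ALPHA = set(string.ascii_letters)          # the 26 Roman letters, both cases
NUMERIC = set(string.digits)               # the 10 digits
GRAPHIC = set("!&*<=>|^$%@_")
NAME_CHARS = ALPHA | NUMERIC | GRAPHIC | set("-+~?/")

# Assumption: every operator in the operator table is accepted after a
# backslash; the section does not say which ones are function operators.
OPERATORS = {"+", "-", "*", "/", "^", "=", "==", "<", ">", "<=", ">=",
             "~=", "~==", "&", "|", ":=", "~"}


def is_name(tok: str) -> bool:
    """Return True if tok matches one of the four name rules."""
    if not tok:
        return False
    head, rest = tok[0], tok[1:]
    if head == "\\":
        # Rule 4: backslash followed by an operator or another name.
        return rest in OPERATORS or is_name(rest)
    if not all(c in NAME_CHARS for c in rest):
        return False
    if head in ALPHA:
        # Rule 1: alphabetic character, then zero or more name characters.
        return True
    if head in NUMERIC:
        # Rule 2: needs two alphabetic characters in a row after the digit.
        return any(a in ALPHA and b in ALPHA for a, b in zip(rest, rest[1:]))
    if head in GRAPHIC:
        # Rule 3: needs at least one alphabetic character after the first.
        return any(c in ALPHA for c in rest)
    return False
```

Under this reading, is_name("a-b") is true, which illustrates why operators are usually set off with whitespace, while is_name("1e10") and is_name("<=") are false because neither contains the alphabetic characters that rules 2 and 3 require.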
A #-word is one of #t, #f, #next, #rest, #key, or #all-keys. The first two are literal constants; the others are used in parameter lists. Implementations can add implementation-defined #-words, but programmers cannot add their own #-words.
An operator is one of the following:
+ | addition |
- | subtraction and negation |
* | multiplication |
/ | division |
^ | exponentiation |
= | equality |
== | identity |
< | less than |
> | greater than |
<= | less than or equal |
>= | greater than or equal |
~= | not equal |
~== | not identical |
& | logical and |
| | logical or |
:= | assignment |
~ | logical negation |
Programmers cannot add their own operators.
A number is a decimal integer with an optional leading sign, a binary integer, an octal integer, a hexadecimal integer, a ratio of two decimal integers with an optional leading sign, or a floating-point number. The complete syntax of numbers is given in Numbers on page 418. Note that ratios are not included in the Dylan language. The ratio syntax is reserved for future expansion and for implementation-specific extensions.
A character literal is a printing character (including space, but not ' or \) or a backslash escape sequence enclosed in a pair of single-quote characters '.
A string literal is a sequence of printing characters (including space, but not " or \) and backslash escape sequences enclosed in a pair of double-quote characters ".
A backslash escape sequence is a backslash followed by a character or character code. A backslash escape sequence in character and string literals allows quoting of the special characters ', ", and \, provides names for control characters, and allows Unicode characters to be specified by their hexadecimal codes.
The control characters are:
a | alarm |
b | backspace |
e | escape |
f | form feed |
n | newline |
r | carriage return |
t | tab |
0 | null |
Unicode characters are represented by a series of hexadecimal digits enclosed in angle brackets. If Unicode does not explicitly define the character represented by the hexadecimal digits, the meaning in Dylan is undefined. If an implementation is not able to represent the entire Unicode character set, and the number specifies a character that is not representable in the implementation, an error must be signaled.
For example: "\<44>\<79>\<6c>\<61>\<6e>" = "Dylan"
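The escape rules can be sketched as a small decoder. The Python below is illustrative only; the function name decode_escapes and the table name are hypothetical, and the input is assumed to be the text between the quote characters of a literal. Raising ValueError for an unknown escape or for a code beyond the Unicode range is a choice made for this sketch, not behavior specified above.

```python
import re

# Characters that may follow a backslash, mapped to what they denote.
_SIMPLE_ESCAPES = {
    "a": "\a", "b": "\b", "e": "\x1b", "f": "\f",
    "n": "\n", "r": "\r", "t": "\t", "0": "\0",
    "'": "'", '"': '"', "\\": "\\",
}

# A backslash followed by <hex digits>, by any one character, or by nothing
# (a dangling backslash at the end of the text).
_ESCAPE = re.compile(r"\\(?:<([0-9A-Fa-f]+)>|(.)?)", re.DOTALL)


def decode_escapes(body: str) -> str:
    """Expand backslash escape sequences in the body of a literal."""
    def expand(match: re.Match) -> str:
        hex_digits, ch = match.group(1), match.group(2)
        if hex_digits is not None:
            code = int(hex_digits, 16)
            if code > 0x10FFFF:
                raise ValueError(f"code {hex_digits} is outside Unicode")
            return chr(code)
        if ch is None or ch not in _SIMPLE_ESCAPES:
            raise ValueError(f"invalid escape sequence {match.group(0)!r}")
        return _SIMPLE_ESCAPES[ch]

    return _ESCAPE.sub(expand, body)
```

Applied to the example above, decode_escapes(r"\<44>\<79>\<6c>\<61>\<6e>") returns "Dylan".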
A symbol literal is a keyword or a unique string. A keyword is a name followed immediately by a colon character :. A unique string is a sharp sign # followed immediately by a string literal.
Punctuation is one of the following:
( ) | parentheses |
[ ] | square brackets |
{ } | curly brackets |
, | comma |
. | period |
; | semicolon |
= | defaulting/initialization |
:: | type specialization |
== | singleton specialization |
=> | arrow |
#( | list/pair literal |
#[ | vector literal |
? , ?? | macro pattern variables |
... | macro ellipsis |
Note that some tokens are both punctuation and operators. This ambiguity is resolved by grammatical context.
Note also that some punctuation tokens (for example, period and equal sign) are capable of blending into some other tokens. Where this can occur, whitespace must be inserted to delimit the token boundary.