The Meta Library#
Introduction#
This is an implementation of Meta, a technique used to simplify the task of writing parsers. [Baker91] describes Meta and shows the main ideas for an implementation in Common Lisp.
If all META did was recognize regular expressions, it would not be very useful. It is a programming language, however, and the operations [], {} and $ correspond to the Common Lisp control structures AND, OR, and DO.[8] Therefore, we can utilize META to not only parse, but also to transform. In this way, META is analogous to “attributed grammars” [Aho86], but it is an order of magnitude simpler and more efficient. Thus, with the addition of the “escape” operation “!”, which allows us to incorporate arbitrary Lisp expressions into META, we can not only parse integers, but produce their integral value as a result. – [Baker91]
The macro defined here is an attempt to implement Meta (with slightly adapted syntax) for Dylan. It is functional, but not yet optimized.
Exported facilities#
The meta
-library exports the meta
module with the following
macros:
- meta-definer Macro#
Meta integrates the ability to parse from streams and strings in one facility. (The parsing of lists is not implemented yet, because it’s rather useless in Dylan. This addition would be simple to do, though.)
- Macro-call:
define meta name (variables) => (results) meta body end
- Parameters:
name – The meta-function name, which is immediately transformed into
scan-
namevariables – – token-holders used in meta body.
results – An expression returned on a successful scan. This can be omitted in which case, the return value will be
#f
.body (meta) – A sequence of Meta expressions to scan
- Discussion:
The
meta-definer
form works only with theparse-string
source-type of thewith-meta-syntax
form.The user of this form has control over the return value. Usually
#t
is sufficient (in which case the results clause may be omitted, see below); however, e.g., the values of the variables may need to be manipulated during the parse phase.- Example:
define meta public-id(s, pub) => (pub) "PUBLIC", scan-s(s), scan-pubid-literal(pub) end meta public-id;
This definition returns
pub
when it successfully scans the tokens “PUBLIC”, (some) spaces, and a literal whichpub
receives. Note that, hereafter, the meta definition is referred to asscan-public-id
outside the meta syntax block.This example, without an explicit results block, is from the
meta
library:define meta s(c) element-of($space, c), loop(element-of($space, c)) end meta s;
Scans in at least one space (
element-of
andloop
are discussed in the section on Meta expressions).
- with-meta-syntax Macro#
The guts of the
meta-definer
form; use when requiring precise control of variables or constructs- Macro Call:
with-meta-syntax source-type (source #key keys) [ variables ] meta; body end
- Parameters:
source-type – Either
parse-stream
orparse-string
.source – Either a stream or a string, depending on source-type
start (#key) – If source-type is
parse-string
, the index to start at.end (#key) – If source-type is
parse-string
, the index to finish before.pos (#key) – If source-type is
parse-string
, a name that will be bound to the current index during execution of thewith-meta-syntax
forms.meta – A Meta expression.
body – A body. Evaluated only if parsing is successful.
- Values:
results – If parsing fails
#f
, otherwise the values of body.
- Discussion:
Special programming aids:
variables (variable [ :: type ] [ = init ], ...);
Bind variables to init, which defaults to
#f
;Future versions will have further special forms.
- Example:
with-meta-syntax parse-stream (*standard-input*) body end with-meta-syntax; let query :: <string> = ask-user(); with-meta-syntax parse-string (query, start: 23, end: 42) body end with-meta-syntax; with-meta-syntax parse-string (query) ... ['\n', finish()] ... values(these, values, will, be, returned); end with-meta-syntax;
- collector-definer Macro#
General facility to collect data into sequences (by default, into strings). Initially
with-meta-syntax
had this functionality integrated. This is a more modular approach.
- with-collector Macro#
The guts of the
collector-definer
form. This macro allows collecting data into a sequence. This is similar in spirit to Common Lisp’sLOOP
clausesCOLLECT
andAPPEND
, but more flexible. If you want to extract subsequences from a string while parsing it, this is the tool to use.- Macro Call:
with-collector operation ... #key collect, append; body end
- Parameters:
operation – Specifies the mode of operation. See below.
collect – A name for a function that, called with a parameter, inserts this parameter into the sequence.
append – A name for a function that, called with a sequence, appends this parameter to the sequence.
body – A Dylan body (bnf).
- Values:
result – Normally the values of body. There is minimal form of
with-collector
, which always returns the collected sequence.
- Discussion:
Like
COLLECT
,with-collector
can put objects into a list. UnlikeCOLLECT
, it can also create vectors or write into already created vectors.Collecting into a list or vector
Two simple forms of
with-collector
areinto-list
andinto-vector
. They create a list or a vector and write into it. The sequence is available as a variable with a user-defined name:with-collector into-list name #key collect, append; body end with-collector into-vector name #key collect, append; body end
Writing into an existing vector
into-vector
, by default, creates a <stretchy-sequence>. If you don’t like this behaviour, you can specify a different vector that will be used. For instance, if you already know how long the result will be, you might want to create a string in the first place.with-collector into-vector name = init, #key collect, append; body end
Using buffers
Normally it is not known in advance how long the result will be. What is really needed is a sequence that automatically reduces its size after processing is finished.
into-buffer
implements this by returning a subsequence of the original vector.Instead of a variable holding the sequence there is now a function which creates the subsequence.
with-collector into-buffer function-name, #key collect, append; body end
But how do you find out what the maximum buffer size has to be? A safe guess is the length of the original vector you are extracting elements from. The following construct automatically creates a vector of the same class (well,
type-for-copy
) and size as big-one:with-collector into-buffer function-name like big-one, #key collect, append; body end
A minimal collection form
If you don’t need to write into vectors or use buffers, but just want to collect some stuff and return it, use this idiom:
with-collector {into-list|into-vector}, #key collect, append; body // Note: Values of body will be thrown away. end
- Example:
define function parse-finger-query (query :: <string>) with-collector into-buffer user like query, collect: collect; with-meta-syntax parse-string (query) let (whois, at, c); [loop(' '), {[{"/W", "/w"}, yes!(whois)], []}, // Whois switch? loop(' '), loop({[{'\n', '\r'}, finish()], // Newline? Quit. {['@', yes!(at), do(collect('@'))], // @? Indirect. [type(<character>, c), // Else: do(collect(c))]}})]; // Collect char values(whois, user(), at); end with-meta-syntax; end with-collector; end function parse-finger-query;
Aside from the syntactic constructors exported above, the Meta library also provides some commonly-used forms for scanning and parsing:
- scan-s Function#
Scans in at least one space.
- scan-word Function#
Scans in a token and returns that token as a
<string>
instance. A “word” is surrounded by spaces or any of the following characters: ‘<’, ‘>’, ‘{’, ‘}’, ‘[’, ‘]’, punctuation (‘,’, ‘?’, or ‘!’), or the single- or double- quotation-mark.
- scan-int Function#
Reads in digit characters and returns an
<integer>
instance.
- scan-number Function#
Although this is not an all-encompassing conversion utility (although, IMHO, it’s good enough to be part of the standard, once there is one, YMMV), it reads in just about any fixed-point number format and returns a
<real>
instance.
- string-to-number Function#
- Parameters:
- Values:
ans – An instance of
<real>
, the resulting number.
- Discussion:
This really should belong to the common-dylan spec, so that instead of rolling their own, everyone should use this function. It is therefore exported to this end.
Scanning tokens usually entails using some common character types. Meta exports the following:
- $space Constant#
- Equivalent:
Any whitespace.
- $digit Constant#
- Equivalent:
[0-9]
- $letter Constant#
- Equivalent:
[a-zA-Z]
- $num-char Constant#
- Equivalent:
$digit ++ [.eE+]
- $graphic-char Constant#
- Equivalent:
[_@#$%^&*()+=~/]
- $any-char Constant#
- Equivalent:
$letter ++ $num-char ++ $graphic-char
Meta expressions#
Meta is a small, but featureful language, so naturally it has its own syntax. This syntax is adapted to Dylan’s way of writing things, of course.
There are several basic Meta expressions implementing the core functionality. Additionally there are some pseudo-functions, syntactically function-like constructs which simplify certain tasks that would otherwise have to be written manually.
Basic Meta expressions as described by Baker#
Baker
``with-meta-syntax``
Description
fragment
fragment
try to match this
[a b c ... n]
[a, b, c, ..., n]
and/try all
{a b c ... n}
{a, b, c, ..., n}
or/first hit
@(type variable)
type(type, variable)
match any type, store result in variable
Warning
deprecated type
is most often used in seeing if a
character is one of several possibilities. Use element-of
instead.
$foo
loop(foo)
zero or more
!Lisp
(Dylan)
call the code (and check result)
The same grammar which works for streams will works for strings. When parsing strings, more than just one-character look-ahead is possible, though. You can therefore not only match against characters, but also whole substrings. This does not work when reading from a stream.
Additional pseudo-function expressions#
``with-meta-syntax`` |
Description |
Could be written as |
---|---|---|
|
call the code and continue (whatever the result is) |
|
|
finish parsing successfully |
not possible |
|
Match against a predicate. |
not possible |
|
Match against a predicate, saving the result. |
not possible |
|
Save result first, so that expression test can use it. |
|
|
Look one character ahead and store in variable if it passes test. Leave the character on the stream. |
not possible |
|
Sees if the variable (a character) is a member of the sequence, storing the result |
{ ‘a’, ‘b’, ‘c’ } (but not storing result) |
|
Set variable to |
|
|
Set variable to |
|
|
Set variable to value and continue. |
|
|
Match anything and save result. |
|
Example code#
Parsing an integer (base 10)#
Common Lisp version:
(defun parse-integer (&aux (s +1) d (n 0))
(and
(matchit
[{#\+ [#\- !(setq s -1)] []}
@(digit d) !(setq n (digit-to-integer d))
$[@(digit d) !(setq n (+ (* n 10) (digit-to-integer d)))]])
(* s n)))
Direct translation to Dylan:
define constant <digit> = one-of('0','1','2','3','4','5','6','7','8','9');
define function parse-integer (source :: <stream>);
let s = +1; // sign
let n = 0; // number
with-meta-syntax parse-stream (source)
variables(d);
[{'+', ['-', (s := -1)], []},
type(<digit>, d), (n := digit-to-integer(d)),
loop([type(<digit>, d), (n := digit-to-integer(d) + 10 * n)])];
(s * n)
end with-meta-syntax;
end function parse-integer;
Alternative version:
// this will actually return a fn named 'scan-int', not 'parse-int'
define collector int(i) => (as(<string>, str).string-to-integer)
loop([element-of("+-0123456789", i), do(collect(i))])
end collector int;
Parsing finger queries#
define function parse-finger-query (query :: <string>)
with-collector into-buffer user like query (collect: collect)
with-meta-syntax parse-string (query)
variables (whois, at, c);
[loop(' '), {[{"/W", "/w"}, yes!(whois)], []}, // Whois switch?
loop(' '), loop({[{'\n', '\r'}, finish()], // Newline? Quit.
{['@', yes!(at), do(collect('@'))], // @? Indirect.
[accept(c), do(collect(c))]}})]; // then collect char
values(whois, user(), at);
end with-meta-syntax;
end with-collector;
end function parse-finger-query;
References#
[Baker91] Baker, Henry. “Pragmatic Parsing in Common Lisp”. ACM Lisp Pointers 4, 2 (Apr-Jun 1991), 3-15.