Performance and Flexibility

This chapter covers the advanced topic of balancing performance and flexibility in a Dylan program. If you are writing a stand-alone program and are comfortable with using type constraints as you would in a static language, you do not need to read this chapter carefully. You may want to skim the chapter, so that you have an idea of what options are available to you in the future for larger or more complex projects.

We start out by describing what Dylan’s execution model is, and what we mean by an efficiency model. The efficiency model can help a programmer to choose the appropriate language features for a particular problem. We also explore advanced features of Dylan that will let the programmer negotiate with the compiler to trade away part of the flexibility of the execution model for enhanced performance.

Execution model

Dylan is a dynamic language — everything in Dylan is defined in terms of a dynamic execution model. As we saw in Method dispatch, the execution model of how a method is chosen when a generic function is called with a particular set of arguments is highly dynamic: the arguments are evaluated; the types of the arguments are determined; the applicable methods are found and sorted according to specificity; and, finally, the most specific, applicable method is called. This model implies that values and types can change, and that methods can be added right up until the generic function is called, and any of these changes still have an effect on which method is ultimately chosen. This dynamism — the model that value, number, and type of arguments; return values; applicable method; and method choice and execution are all determined at the last possible moment — is what gives the Dylan language its power.

You might think that this dynamism also means that Dylan must perform poorly, because the only way to obey its execution model is to do a lot of extra computation at run time. But not every program makes use of dynamic features. Most functions accept and return a fixed number of values (often they return only one), and those values are often of a fixed or constrained type. Even programs that do use dynamism will not require it everywhere. So, a good Dylan compiler will identify the static parts of a program, and will compile them statically (that is, in a manner that is competitive with what a compiler of any good static language would do). To do that, the compiler uses a technique called partial evaluation — operations that can be evaluated at compile time (that the compiler knows can have only one outcome), will be done at compile time. Thus, even though the programmer can continue to think and program in terms of Dylan’s dynamic execution model, the compiler will generate efficient code when it can show that it can obtain the same return value without carrying out the full process at run time.

For small projects — projects that can fit in a single library — the compiler can analyze the entire project and generate code that is competitive with any static language. If type constraints are used for all module variables, slots, parameters, and return values (as they would be in a static language), the compiler can generate code equivalent to that generated by compilers for static languages. In the remainder of this chapter, we examine how we can use type constraints, limited types, open classes, open generic functions, domain sealing, and primary classes to balance performance and flexibility in Dylan programs.

Efficiency model

Dylan is a powerful language: Many of the built-in, or primitive, language operations are high-level operations, such as the method-dispatch mechanism, the collection facility, and the exception mechanism. Because of Dylan’s powerful features, it can be hard for the programmer to develop an efficiency model — a model of the absolute or relative cost of different approaches to a problem.

In contrast, in other languages, such as C, every language construct can be explained directly in terms of a small number of machine instructions. Although it may be easy to understand the performance of a C program in terms of a simple model, programming in C is more work for the programmer — the higher-level abstractions are not provided, and must often be built from scratch.

For example, a C programmer expects that the run-time cost of calling a function is the cost of possibly saving registers on a stack, passing the arguments, executing a machine instruction for jumping to a subroutine, and then executing a return instruction at the end of the function; if it is a call through a function pointer, or a C++ virtual function, the cost of an indirect jump must be added. In Dylan, the story is more complicated, because Dylan has a more sophisticated execution model: A call to a generic function might be much more expensive in a dynamic situation, because computing the most specific method could take much longer than would execution of the method itself.

To write efficient programs in Dylan, you have to understand what constructs in the language can be expensive in time or space, and how you can reduce those costs in common cases. This understanding is based on an efficiency model — a conceptual model of how a program in Dylan runs at a low level.

One problem with developing an efficiency model is that there is no single way to implement many Dylan operations. Different compilers do things in different ways, and certain compilers have multiple techniques for compiling the same piece of code, depending on circumstances. Nonetheless, we shall try to give an intuitive feel for which features of Dylan are costly, and which features enable the compiler to make optimizations.

Type constraints

In Dylan, variables, parameters, return values, and slots can all have type constraints. Dylan’s dynamic nature means that type constraints can be looser than is typical of a static language, or can even be deferred altogether, in support of rapid prototyping or evolutionary development. Type constraints in a dynamic language serve three primary purposes:

  1. Type constraints are required for method dispatch: the methods of a generic function are distinguished by the types of their required arguments. The generic function chooses the applicable methods by sorting them according to the type constraints of their parameters.
  2. Type constraints can be used optionally to enforce program restrictions. The compiler ensures that a variable, parameter, return value, or slot will never take on a value that is incompatible with the type constraint of the parameter, return value, or slot. (If the compiler cannot prove at compile time that an incorrect type is impossible, it inserts a run-time check to enforce the type constraint.)
  3. Type constraints allow the compiler to generate better code, because they are a contract between the programmer and the compiler that the variable, parameter, return value, or slot in question will never take on a value that is incompatible with its type constraint; hence, the compiler needs only to generate code for dealing with the declared type.

Many Dylan compilers use type inferencing to determine the possible types of variables, parameters, and slots that do not have explicit type constraints. Within a library, the compiler essentially knows everything about the variables and functions that are not exported at the library interface — it can analyze all uses of variables, and all callers and callees of functions. Through this analysis, the compiler can develop a worst-case scenario of the possible types of every variable, parameter, return value, and slot. As a result, these compilers generate efficient code even if the programmer does not fully declare all types (as would be required in most static languages).

Comparison with C:

Static languages such as C have little need for type inferencing, because the type of every value must be declared, and the types can be checked easily at compile time. On the other hand, when a problem domain is ill-specified, the program is evolving through development, or a value may take on one of several types, the programmer must construct union types, and must use variant records or other bookkeeping to track the actual type of the value manually.

Dylan automatically handles this bookkeeping and uses type inferencing to minimize the associated overhead. At the same time, when the type of a variable can change at run time, Dylan also automatically tracks the changing type.

Some compilers have a facility for generating performance warnings, which inform you when type inferencing is not able to determine types sufficiently to generate optimal code. Some compilers have a facility for generating safety warnings, informing you when type inferencing is not able to determine types sufficiently to omit run-time type checking. As an example, consider these definitions (which are similar to, but not exactly the same as, the definitions on which we settled in Four Complete Libraries):

define abstract open class <sixty-unit> (<object>)
  slot total-seconds :: <integer> = 0, init-keyword: total-seconds:;
end class <sixty-unit>;

define method decode-total-seconds
    (sixty-unit :: <sixty-unit>)
 => (hours :: <integer>, minutes :: <integer>, seconds :: <integer>)
  let total-seconds = abs(sixty-unit.total-seconds);
  let (total-minutes, seconds) = truncate/(total-seconds, 60);
  let (max-unit, minutes) = truncate/(total-minutes, 60);
  values (max-unit, minutes, seconds);
end method decode-total-seconds;

Because we made the choice to store total-seconds as an integer, and because 60 is an integer constant, the compiler can infer that the truncate/ calls are for an integer divided by integer. There is no need to consider whether to use floating-point or integer division.

If we were more concerned with testing out ideas, we might have left unspecified the type of the total-seconds slot (implicitly, its type would then be <object>), or, if we wanted to keep the option of having times more accurate than just seconds, we might have specified that its type was <real>, allowing for the possibility of using floating-point numbers, which can express fractional seconds.

If we left the type of the total-seconds slot unspecified, the compiler would need to check the arguments to truncate/, on the off chance that an argument was not numeric at all. In some compilers, you would be able to get a compile-time safety warning stating that a run-time type error is possible (which, if unhandled, will result in program failure), and that the check, and the possibility of a run-time error, could be avoided if the compiler knew that total-seconds was a <real>.

What is a safe program?

Dylan is always safe in that a programming error cannot cause a corruption of the program (or of other programs). For example, an out-of-bound array access or passing an argument of incompatible type simply cannot happen. The compiler will either prove that the requested action is impossible, or will insert code to verify bounds or type at run time, and will signal an error if the bounds or type is incorrect.

When we discuss safety in this section, we are referring to whether or not such errors will be visible to the user. If we have not provided for a recovery action, signaling of an error will halt the program. See Exceptions, for an example of how run-time errors can be handled by the program.

Comparison with Java:

Java recognizes the need for safe operations, and has eliminated many of the unsafe practices of C and C++, adding such checks as array-bounds checks and type-cast checks at run time. However, Java retains the C mathematical model that trades performance for correctness. Java integers are of a fixed size, and computations that cannot be represented in that size silently overflow. In contrast, Dylan requires numeric operations to complete correctly or to signal an error. Several Dylan implementations are also expected to provide libraries for infinite-precision numerical operations.

If we specified the type of the total-seconds slot as <real>, the compiler would have to dispatch on the type of total-seconds, using either floating-point or integer division as necessary. In some compilers, we would be able to get a compile-time performance warning stating that this dispatch could be omitted if the compiler knew that total-seconds was of a more restricted type.

Note that the type of the return value of decode-total-seconds can be inferred: max-unit and minutes must be <integer> (inferred from the definition of truncate/), and seconds must have the same type as total-seconds (<integer>, in our example); thus, the compiler does not have to insert any type checks on the return values of decode-total-seconds. Dylan enforces declared return types in the same way as it enforces parameter types, by eliminating the check where type inferencing can show it is not needed, and using the enforced types to make further inferences.

From this example, you can see how the compiler can get a lot of mileage from a small number of constraints, and how it can point you to the places where further clarification will produce the most performance and safety benefits. At the same time, Dylan does not require that you have all your types thought out in advance of compiling the program; the dynamic nature of the language allows Dylan to defer considering type information until the program is actually running. In good Dylan development environments, there is support for resolving and continuing from run-time type errors during program development (rather than requiring editing of the code and recompilation).

Remember that your code is more suited to reuse when it has fewer and more general type constraints. If you have a compiler that can issue safety and performance notes, try to generalize and minimize your type constraints, being guided by your safety and performance requirements. Often, just the constraints required to specify method applicability will be sufficient for good safety and performance. Declaring the types of module variables, slots, and return values of functions is also useful and can help to document your program. Declaring types for constants and local variables can be useful for enforcing program correctness, but is unlikely to create optimization opportunities, and might actually reduce performance, because the compiler will insert type checks to enforce such constraints if they are overly restrictive.

Limited types

Some of Dylan’s built-in types are extremely general. When these types are used, the compiler’s type inferencing is thwarted, and less efficient code will be generated. The place where this situation is most obvious is in the <collection> types, where the elements of a collection are essentially like multiple slots, all with the same type constraint. For the built-in collections, elements typically have a general default type (often simply <object>), and there can be an arbitrary number of them. The limited mechanism is a way to specify that you expect to store objects of a particular type in the collection, and to specify how many elements will be in the collection.

As an example, in Vehicle containers, the generate-gates method returns a <vector>. Without further information, the compiler must assume that that vector might contain objects of any types. As a result, the following code in the build-simple-airport method from The airport-test.dylan file, will be inefficient:

let gates = generate-gates(gates-per-terminal, capacity);
...
for (gate in gates)
  gate.connected-to := taxiway-vector;
end for;

Because the compiler can infer only that gates is a <vector>, it must generate extra code to determine whether each gate has a connected-to method on it. We can use limited types to constrain gate-instances as follows:

define constant <gate-vector> = limited(<vector>, of: <gate>);

define method generate-gates
    (gates-per-terminal :: <vector>, default-gate-capacity :: <size>)
 => (gates :: <gate-vector>)
  let result = make(<gate-vector>, size: reduce1(\+,
                                                 gates-per-terminal));
  ...
  values(result);
end method generate-gates;

With the limited constraint of the return value of generate-gates, the compiler can ensure that only gate objects will ever be stored in the vector; hence, it can be sure that each gate will be a <gate> and will have a connected-to method.

Note that limited-collection types are instantiable types; that is, you can make an object of a limited type. This capability is different from similar constructs in certain other languages, in which those constructs are only an assertion about the range or type of values to be stored in the collection. Having declared the return value of generate-gates to be a <gate-vector>, it would be an error to return a <vector> instead; hence, we changed the argument to make when constructing result to be <gate-vector> instead of the original <vector>.

If <gate> and connected-to are not open (as described in Open generic functions and Open classes), the compiler can infer that connected-to is used here to set a slot in the gate instance and to further optimize the code generated. We do not delve into the exact details of what the compiler has to know to make this optimization, but it is worth noting that, if either the class or the generic function were open, the optimization could not be made.

Comparison with C++:

The Dylan limited-collection types provide a capability similar to that offered by the C++ template classes. Unlike in C++, the base type of a limited-collection type (the equivalent of a C++ class template — in the example above, <vector>) is also a valid type. Dylan’s dynamic capabilities mean that Dylan can defer determining the element type of a collection until run time, in effect adapting the class template as it goes along. By using a limited type, the compiler can generate more efficient code.

Another use of limited types is to allow compact representations. We can use limited with the built-in type <integer> to specify numbers with a limited range that can be stored more compactly than integers. It is especially useful to use a limited range in combination with a limited collection; for example,

define constant <signed-byte-vector>
  = limited(<simple-vector>,
            of: limited(<integer>, min: -128, max 127));

In the preceding example, we define a type that can be represented as a one-dimensional array of 8-bit bytes.

Comparison with C:

C provides efficient data representations, because its data types typically map directly to underlying hardware representations. A drawback of C is that its efficient data representations are often not portable: The size of a short int may vary across platforms, for instance. Dylan takes the more abstract approach of describing the requirements of a data type, and letting the compiler choose the most efficient underlying representation. A drawback of the Dylan approach is that it cannot easily be used for low-level systems programming, where data structures must map reliably to the underlying hardware. Most Dylan systems provide a foreign-function interface to allow calling out to C or some other language more suitable to these low-level tasks. Some Dylan systems augment the language with machine-level constructs that provide the level of control necessary while staying within the object model as much as possible.

Comparison with Java:

Java recognizes that portable programs need well-defined data types, rather than types that map to the particular underlying hardware differently in each implementation. However, Java retains some of C’s concreteness in simply specifying four distinct sizes of integer (in terms of how many binary digits they hold), and forcing the programmer to convert integer types to objects manually, when object-oriented operations are to be performed. In contrast, Dylan’s limited-integer types specify, at the program level, the abstract requirements of the type, giving the compiler freedom to map the program requirements as efficiently as possible to the underlying architecture.

Enumerations

Many languages provide enumeration types both to enforce program correctness and to provide more compact representation of multiple-choice values. Dylan does not have a built-in enumeration type, but you can easily construct enumerations using the type-union and singleton type constructors.

For example, consider the <latitude> and <longitude> classes, where there are only two valid values for the direction slot in each class. Rather than enforcing the restrictions programmatically, as we did in Virtual slots, we can create types that do the job for us:

define abstract class <directed-angle> (<sixty-unit>)
  slot direction :: <symbol>, required-init-keyword: direction:;
end class <directed-angle>;

define constant <latitude-direction>
  = type-union(singleton(#"north"), singleton(#"south"));

define class <latitude> (<directed-angle>)
  keyword direction:, type: <latitude-direction>;
end class <latitude>;

define constant <longitude-direction>
  = type-union(singleton(#"east"), singleton(#"west"));

define class <longitude> (<directed-angle>)
  keyword direction:, type: <longitude-direction>;
end class <longitude>;

Here, the abstract superclass specifies that the read-only slot direction must be a <symbol>, and that it must be initialized when an instance is created with the keyword direction:. The constant <latitude-direction> is a type specification that permits only the symbol #"north" or the symbol #"south". The class <latitude> specifies that, when an instance of <latitude> is made, the initial value must be of the <latitude-direction> type. We handled the longitude case similarly.

The use of type-union and singleton to create enumeration types in this fashion is common enough that the function one-of is usually available in a utility library as a shorthand:

define constant one-of
  = method (#rest objects)
      apply(type-union, map(singleton, objects))
    end method;

With this abbreviation, the direction types can be written more compactly:

define constant <latitude-direction> = one-of(#"north", #"south");

define constant <longitude-direction> = one-of(#"east", #"west");

Some Dylan compilers will recognize the idiomatic use of type-union and singleton to represent such enumerations more compactly. For instance, a compiler could represent the direction slot of a latitude or longitude as a single bit, using the getter and setter functions to translate back and forth to the appropriate symbol.

Direct methods

The definition of the one-of constant is a method called a direct method or bare method. It is the equivalent of a function in other languages. A bare method does not create an implicit generic function, and invoking a bare method does not use method-dispatch procedure, but rather calls the method directly. We choose to use a bare method here because we are sure that one-of will never need method dispatch: it performs the same operation independent of the types of its arguments. The bare method serves to document this intent. If there were some possibility of additional methods, it would be more perspicuous to use a generic function, even if there is initially only one method. Most Dylan compilers will generate equally efficient code for a bare method and for a generic function with only one method, so the choice of which to use should be based on whether or not it would ever make sense to have additional methods that discriminate on parameter types.

Tail calls

The most important construct in the Dylan execution model is the function call, because function calls are the most common operation in the language. Remember that all slot accesses and assignments, arithmetic operations, and collection accesses obey the execution model of function calls, even if the syntax for them does not look like that of function calls.

We have already discussed how Dylan compilers can optimize away run-time checking of argument types and the overhead of method dispatch, and that good compilers will generate equally efficient code for calls to single-method generic functions or direct methods.

There is one additional optimization that good Dylan compilers will make, which is enabled by a particular style of programming. If the final operation in a method is a call to another function (called a tail call) then the calling function can jump directly to the called function, rather than using a call-and-return sequence. Thus, the return from the called function returns to its caller’s caller.

As an example, consider this decode-total-seconds method:

define method decode-total-seconds
    (sixty-unit :: <sixty-unit>)
 => (hours :: <integer>, minutes :: <integer>, seconds :: <integer>)
  decode-total-seconds(sixty-unit.total-seconds);
end method decode-total-seconds;

The inner call to decode-total-seconds can be a direct jump rather than a function call, because the compiler can infer which method should be called and that the return values already have the correct constraints.

Typed generic functions

In addition to specifying the types of the parameters and return values of methods, you can specify the types of the parameters and return values of a generic function. You usually restrict the parameter types of a generic function to establish the contract of the generic function — that is, to define the domain of arguments that the generic function is intended to handle, and the domain of the values that it will return.

If we define a method without also defining a generic function, Dylan creates an implicit generic function with the most general types for each parameter and return value that are compatible with the method. For example, assume that we defined a method for next-landing-step, and did not explicitly create a generic function for it. The method is as follows:

define method next-landing-step
    (storage :: <sky>, aircraft :: <aircraft>)
 => (next-class :: false-or(<class>), duration ::
     false-or(<time-offset>))
  ...
  end if;
end method next-landing-step;

When we define a method without also defining a generic function, the compiler will generate an implicit generic function for us, which, in this case, will be as though we had defined the generic function like this:

define generic next-landing-step (o1 :: <object>, o2 :: <object>)
  => (#rest r :: <object>);

In The schedule.dylan file, where we did define a generic function, we used a simple definition, just documenting the number of arguments, and giving them mnemonic names:

define generic next-landing-step (container, vehicle);

Because we did not specify types of the arguments or return values, they default to <object>, just as they did in the preceding implicit generic function.

Although the generic function that we wrote does prevent us from defining methods with the wrong number of arguments, it does not constrain the types of those arguments or the format or type of return values in any way. A sophisticated compiler may be able to make inferences based on the methods that we define, but we could both aid the compiler and more clearly document the protocol of next-landing-step by specifying the types of the parameters and return values in the definition of the generic function:

define generic next-landing-step
  (storage :: <vehicle-storage>, aircraft :: <aircraft>)
 => (next-storage :: <vehicle-storage>, elapsed-time :: <time-offset>);

Now, the compiler can help us. If we define a method whose arguments are not a subclass of <vehicle-storage> and a subclass of <aircraft> (for example, if we provided the arguments in the wrong order), the compiler will report the error. Furthermore, the compiler can use the value declaration to detect errors in the return values (for example, if we returned only a single value or returned a value of the wrong type). Finally, the compiler can be asked to issue a warning if there is a subclass of the argument types for which no method is applicable.

In addition to establishing a contract, specifying the types of the parameters and return values of generic functions can allow the compiler to make additional inferences, as described in Type constraints with regard to truncate/. In the absence of other information, the compiler is limited in the optimizations that it can make based solely on the parameter types in the generic function, so it is generally best not to restrict artificially the types of a generic function, but rather to use the restricted types to document the generic function’s protocol.

Open generic functions

By default, generic functions are sealed. When you use define generic, that is the same as using define sealed generic. No other library can add methods to a sealed generic function — not even on new classes that they may introduce. Methods cannot be added to, or removed from, the generic function at run time. The only methods on a sealed generic function are the methods that are defined in the library where the generic function itself is defined. Because of the restrictions on a sealed generic function, the compiler, using type-inference information, can usually narrow the choice of applicable methods for any particular call to the generic function, eliminating most or all of the overhead of run-time dispatching that would normally be expected of a dynamic language.

We saw in Libraries and Modules, that we must define a generic function that is part of a shared protocol using define open generic, so that libraries sharing the protocol can implement the protocol for the classes that they define, by adding methods. If we do not define the generic function to be open, other libraries are prohibited from adding methods to the generic function, which would make it useless as a protocol. Unfortunately, a generic function that is open cannot be optimized. Even when the compiler may be able to infer the exact types of the arguments to the generic function in a particular call, because an open generic function may have methods added or removed, even at run time, the compiler must produce code to handle all these possibilities.

Because open generic functions cannot be optimized, you should use them only when necessary. You need to balance the division of your program into libraries against the need to export and open more generic functions if the program is too finely divided. This balance is illustrated by the considerations we made in designing a protocol in Protocol design. When we chose to split the time and angle libraries, we were forced to create the say protocol library and open the generic function say. In Sealed domains, we show how to regain certain optimizations when you decide that opening a generic function is required.

Note that generic functions that are defined implicitly in a library — such as those that are defined when you define only a single method, or those that are defined for slot accessors — are sealed by default. If you expect other libraries to add methods to one of these implicit generic functions, you must define the generic function explicitly to be open using define open generic.

Open classes

By default, classes are sealed. When you use define class, that is the same as using define sealed class. Other libraries cannot directly subclass a sealed class — they cannot define new classes that have your sealed class as a direct superclass. The only direct subclasses of the class are those subclasses that are defined in the library where the class itself is defined. Extensive optimization opportunities occur when the methods of a sealed generic function are specialized on sealed classes. In this case, the compiler can usually choose the correct method of the generic function to call at compile time, eliminating any run-time overhead for using a generic function.

We saw in Libraries and Modules, that we must define a class that is a shared substrate, such as <sixty-unit>, using define open class, if the libraries sharing the substrate are expected to subclass the class. If we did not define the class to be open, other libraries would be prevented from subclassing it — which might be reasonable if the substrate were not intended to be extended by subclassing.

Unlike an open generic function, an open class does not prevent all optimization. If a generic function has a method applicable to an open class, but the generic function is sealed, then the compiler might still be able to optimize method dispatch if that compiler can infer the types of the arguments to the generic function at a particular call. Sometimes, the dispatch code will be slightly less optimal, because it must allow for arbitrary subclasses, rather than a fixed set of subclasses; in general, however, opening a class is less costly than is opening a generic function.

Note that, although you cannot directly subclass a sealed class from another library, you can subclass a sealed class in the library that defines the sealed class. It may not be obvious, but a corollary of this rule of sealing is that you can define an open subclass of a sealed class in the library that defines the sealed class. Using a sealed class with an open subclass is one simple way to get both flexibility and efficiency — the classes in the sealed branch will be optimized by the compiler, while the open subclass can be exported for other libraries to build on and extend.

Sealed domains

When you define a protocol that is meant to be extended by many libraries, both the base classes and the generic functions that make up the protocol must be open. This simple exigency might make it seem that there is no hope of optimizing such a protocol — however, there is hope. You use the define sealed domain form to seal selectively subsets or branches of the protocol, permitting the compiler to make all the optimizations that would be possible if the classes and generic functions were sealed, but only for the particular subset or branch in question.

Advanced topic:

Sealed domains are one of the most difficult concepts of the Dylan language to understand fully. It is reasonable to defer careful reading of this section until you are faced with a situation similar to the example — an imported open class and generic function that will be specialized by your library.

As an example, consider the say protocol as used in the time library. Because the say generic function is defined to be open, even if the compiler can infer that the argument to say is a <time> or <time-offset>, it must insert code to choose the appropriate method to call at run time on the off chance that some other library has added or removed methods for say. The solution is to add the following definition to the time library:

// Declare the say generic function sealed, for all time classes
define sealed domain say (<time>);

This statement is essentially a guarantee to the compiler that the only methods on say that are applicable to <time> objects (and also to <time-of-day> and <time-offset> objects, because <time-of-day> and <time-offset> are subclasses of <time>) are those that are defined explicitly in the time library (and in any libraries from which that one imports). Thus, when the compiler can prove that the argument to say is a <time-offset>, it can call the correct method directly, without any run-time dispatch overhead.

Another way to get the same effect as a sealed domain, which is also self-documenting, is to use define sealed method when defining individual methods on the protocol. So, for instance, in the case of the time library, we might have defined the two methods on say as follows:

define sealed method say (time :: <time>)
  let (hours, minutes) = decode-total-seconds (time);
  format-out("%d:%s%d", hours, if (minutes < 10) "0" else " " end,
             minutes);
end method say;

define sealed method say (time :: <time-offset>) => ()
  format-out("%s ", if (time.past?) "minus" else "plus" end);
  next-method();
end method say;

Defining a sealed method is the same as defining the generic function to be sealed over the domain of the method’s specializers. In effect, this technique says that you do not intend anyone to add more specific methods in that domain, or to create classes that would change the applicability of the sealed methods.

With either the define sealed domain form or the sealed methods, the use of say on <time> objects will be as efficient as it would be were say not an open generic function after all. At the same time, other libraries that create new classes can still extend the say protocol to cover those classes.

Sealed domains impose restrictions on the ability of other libraries to create new methods, to remove new methods, and to create new classes:

You cannot add methods to an open generic function imported from another library that would fall into the sealed domain of any other library. You can avoid this restriction by ensuring that at least one of the specializers of your method is a subtype of a type defined in your library.

Comparison with C++:

A C++ compiler could optimize out the dispatching of a virtual function by analyzing the entire scope of the argument on which the virtual function dispatches, and proving that argument’s exact class. Unfortunately, that scope is often the entire program, so this optimization often can be performed only by a linker. Even a linker cannot make this optimization when a library is compiled, because the classes of a library can be subclassed by a client. The complexity is compounded for dynamic-link libraries, where there may be multiple clients at once. As a result, this optimization is rarely achieved in C++.

In Dylan, sealed classes, sealed generic functions, and sealed domains explicitly state which generic functions and classes may be extended, and, more important, which cannot. The library designer plans in advance exactly what extensibility the library will have. The Dylan compiler can then optimize dispatching on sealed generic functions and classes and within sealed domains with the assurance that no client will violate the assumptions of the optimization. The sealing restrictions against subclassing or changing method applicability are automatically enforced on each client of a Dylan library.

When you seal a domain of a generic function imported from another library, you will not cause conflicts with other libraries, as long as both of the following conditions hold:

  1. At least one of the types in the sealed domain is a subtype of a class defined in your library
  2. No additional subtypes can be defined for any of the types in the sealed domain

In the case of a type that is a class, the first condition means that you must have defined either the class or one of its superclasses in your library. The second condition means that the classes in the domain must not have any open subclasses (a degenerate case of which is a leaf class — a class with no subclasses at all).

If you need to seal a domain over a class that has open subclasses, you will need a thorough understanding of the sealing constraints detailed in The Dylan Reference Manual, but these two simple rules should handle many common cases.

In our example, we obeyed both rules of thumb: our methods for say are on classes we defined, and our sealing was over classes that will not be further subclassed. The rules of thumb not only keep you from violating sealing constraints, they make for good protocol design: a library that extends a protocol really should extend it only for classes it fully understands, which usually means classes it creates.

As an example of the restriction on subclassing open classes involved in a sealed domain, if the <time> class were an open class, we still could not add the following class in a library that used the time library:

define class <place-and-time> (<position>, <time>)
end class <place-and-time>;

As far as the compiler is concerned, it “knows” that the only say method applicable to a <time> is the one in the time library. (That is what we have told it with our sealed domain definition.) It would be valid to pass a <place-and-time> object as an argument to a function that accepted <time> objects, but within that function the compiler might have already optimized a call to say to the method for <time> objects (based on <time> being in the sealed domain of say). But there is also a method for say on <position>, and, more important, we probably will want to define a method specifically for <place-and-time>. Because of this ambiguity, the class <place-and-time> cannot be defined in a separate library, and the compiler will signal an error.

Note that the class <place-and-time> could be defined in the time library. The compiler can deal correctly with classes that may straddle a sealed domain, if they are known in the library where the sealed domain is defined. It would also be valid to subclass <time> in any way that did not change the applicability of methods in any sealed generic-function domains that include <time>. The actual rule involved depends on an analysis of the exact methods of the generic function, and the rule is complicated enough that you should just rely on your compiler to detect illegal situations.

Slot accessors

Dylan does allow you to omit definition of a generic function. As we mentioned earlier, if you define a method without also defining a generic function, Dylan implicitly creates a generic function with the most general types for the parameters and return values that are compatible with the method. The most common case of implicit generic functions is for the slot-accessor methods that are created when a new class is defined. Because these generic functions typically have only a single method and are sealed by default (see Open generic functions), the compiler can make extensive optimizations for slot accessors, ideally making slot access no more expensive than an array reference or structure-member access in other languages.

Even when a slot is inherited by subclassing, a good Dylan compiler will use a coloring algorithm to assign slots to the same offset in each subclass, keeping the cost of slot access to a minimum. You can use primary classes (see Primary classes) to guarantee efficient slot access. When a program defines explicit methods for a slot getter or setter generic function, of course, the overhead is greater.

Comparison with C++:

Dylan classes are similar to virtual base classes with virtual data members in that the offsets of their data members are not fixed, and access to the data members can be overridden. See The concept of classes in Dylan Object Model for C and C++ Programmers, for a more detailed analogy.

In the <sixty-unit> class, we specified an initial value for total-seconds; hence, there is no need to check that the slot has been initialized before it is accessed. In some situations, it may not be feasible to give a default or initial value for a slot. Dylan permits this omission and will ensure that the slot is initialized before that slot is used; of course, this check does not come for free, so it is preferable to provide initial values where possible. In fact, because we always expect to initialize the total-seconds slot when we make a new <sixty-unit>, it would be more accurate to specify <sixty-unit> as follows:

define open abstract class <sixty-unit> (<object>)
  slot total-seconds :: <integer>,
    required-init-keyword: total-seconds:,
end class <sixty-unit>;

That is, rather than giving the slot an initial value of 0 and an optional init-keyword:, we simply require that the slot be initialized when we make a <sixty-unit> object. Of course, the initial value must obey the type constraint of <integer>. The compiler can still make the inference that the slot will always be initialized and will always have an integer value.

Comparison with C:

Dylan always ensures that a slot is initialized before that slot is accessed, automatically inserting a run-time check when it cannot prove at compile time that the slot is always properly initialized. C puts this burden of safety on the programmer, and that can be the source of subtle bugs. A number of debugging and analysis tools are available as addons to C, to help the programmer with this task.

Always initializing slots, either with a default value or required init-keyword, will make slot access efficient.

Finally, in many cases, slots hold values that will not change over the lifetime of each instance (although they may be different values for each instance). In the case of the <sixty-unit> class, we never change the value of total-seconds. When adding two instances, we create a new one to hold the new value, rather than changing one of the argument instances (that way, we do not have to worry about changing an instance that may still be in use by some other part of the program). In such cases, declaring the slot to be constant both documents and enforces this intent. Furthermore, the compiler can often make additional optimizations for slots that are known never to be modified. The final definition of <sixty-unit> is as follows:

define open abstract class <sixty-unit> (<object>)
  constant slot total-seconds :: <integer>,
    required-init-keyword: total-seconds:,
end class <sixty-unit>;

(The constant declaration is simply shorthand for the slot option setter: #f, meaning that there is no way to set the slot.)

Primary classes

Classes have one additional variation that you can use to optimize performance. A class that is defined as primary allows the compiler to generate the most efficient code for accessing the slots defined in the primary class (whether the accessor is applied to the primary class or to one of that class’s subclasses). However, a primary class cannot be combined with any other primary class (unless one is a subclass of the other). This restriction implies that you should delay declaring a class to be primary until you are sure of your inheritance design. Also, because sealed classes are already highly optimized, the primary declaration is of most use for open classes.

As an example, consider the class <sixty-unit>, and its slot total-seconds, as used in this method for decode-total-seconds:

define method decode-total-seconds
    (sixty-unit :: <sixty-unit>)
 => (hours :: <integer>, minutes :: <integer>, seconds :: <integer>)
  decode-total-seconds(sixty-unit.total-seconds);
end method decode-total-seconds;

Although the generic function for the slot accessor total-seconds is sealed, and it is trivial for the compiler to infer that its argument is a <sixty-unit> in the call sixty-unit.total-seconds, because <sixty-unit> is declared open, the compiler cannot emit the most efficient code for that call. Because an open class could be mixed with any number of other classes, there is no guarantee that the slots of every object that is a <sixty-unit> will always be stored in the same order — there is no guarantee that total-seconds will always be the first slot in an object that is an indirect instance of <sixty-unit>, for instance.

Declaring a class primary is essentially making a guarantee that the compiler can always put the primary class’s slots in the same place in an instance, and that any other superclasses will have to adjust:

define abstract open primary class <sixty-unit> (<object>)
 constant slot total-seconds :: <integer>,
   required-init-keyword: total-seconds:;
end class <sixty-unit>;

By adding the primary declaration to the definition, any library that subclasses <sixty-unit> is guaranteed to put total-seconds at the same offset. Hence, the compiler can turn the call sixty-unit.total-seconds into a single machine instruction (load with constant offset), without concern over which subclass of <sixty-unit> was passed as an argument.

Comparison with C++:

A primary class is like an ordinary base class in C++. Because only one primary class is allowed as a base class, its data members can be assigned the same fixed offset for all derived classes. See The concept of classes, for a more detailed analogy.

It is permissible to make subclasses of a primary class also primary, essentially freezing the assignment of all the slots in the subclass too. What is not permissible is to multiply inherit from more than one primary class; as you can see, such behavior would lead to a conflict between the fixed slot assignments.

Because primary classes restrict extension in this way, you should use them sparingly in libraries intended to be software components. Primary classes are of most benefit in large, modular programs, where all the clients of each component are known, and the need for extensibility is bounded; typically that occurs toward the end of a project, when you are tuning for performance.

Additional efficiency information

In this section, we review additional techniques that compilers can use to generate code that obeys the Dylan execution model, but is more efficient than a straightforward implementation of that model might suggest. Knowing about these techniques can help you to evaluate different vendors’ compilers. You will have to consult the documentation of your particular implementation to discover whether or not these techniques are used.

Efficiency of generic function calls

In addition to using type inferencing and sealed domains, another way to speed up generic function calls when they must dispatch at run time is to cache the return values of previous calls. So, for example, the first time that a given generic function is called with certain classes of arguments, the full sorted sequence of applicable methods is computed; after that, however, it only to be only looked up in a table. Thus, if the generic function is called often with the same type of arguments, most calls will be fast. This technique is used in other object-oriented languages, such as Smalltalk and CLOS, and is useful for speeding up completely dynamic situations. Most good Dylan compilers will use some form of cached dispatching.

A second form of cached dispatching is called call-site caching. Although a generic function may have many calls throughout a program, often the types of arguments passed are directly related to where (that is, in what other method) the call is made. Some Dylan compilers will cache the types and methods of each call at the point of call, and will use this cache to avoid dispatch if the same types are passed as arguments in a subsequent call from the same place.

Efficiency of keyword arguments and of multiple values

Keyword arguments are a powerful and flexible, but potentially expensive, feature of Dylan. The processing of keywords and values at run time can be an expensive operation, especially if many keywords are used. A Dylan compiler can pass keyword arguments as efficiently as it can required arguments, if the called function is known at compilation time.

Returning multiple values again raises performance issues. In some implementations of Dylan, there is an extra cost for returning more than one value; in others, the cost is associated with calling a function that does not declare how many values it returns. When the compiler knows what function is being called, these costs usually can be eliminated, but certain costs may still exist — for example, certain implementations may not optimize tail calls between functions that return different numbers of arguments.

Memory usage

Dylan uses automatic storage-management; thus, programmers explicitly allocate objects, and hence memory, but deallocation is automatic and occurs after all references to an object are gone. The process of reclaiming memory when objects are no longer in use is known as garbage collection.

There are strong advantages to automatic storage-management. With manual storage-management, small program bugs, such as freeing of an object that is still in use, can cause subtle bugs that lead to crashes in parts of the program unrelated to where the real problem lies. Dylan is able to guarantee that all programs fail in disciplined ways, usually with exceptions, because the type system and memory management are safe.

But automatic storage-management may create performance concerns. Although early implementations of garbage collection were infamously slow, modern garbage collectors are usually fast enough that using one should not raise concerns for most programs. But some programs with specialized or tuned use of memory may run slower with automatic management.

Whether storage management is automatic or manual, the use of memory raises performance issues. Every allocation of memory takes time, including the time to reclaim unused memory; either the programmer must free it explicitly, or the garbage collector has to do more work.

It is obvious that calling a function such as make, vector, or pair in Dylan allocates memory, but there are operations that implicitly use memory. For example, creating a closure (see Closures) will usually cause Dylan to allocate memory for the closure.

On the other hand, sometimes the compiler is able to prove that an object is never used after the function that creates it returns. In a good compiler, such objects are allocated on the stack, and are reclaimed automatically when the function exits.

A good Dylan development environment will have tools that help you to meter and profile memory usage, so that you can adjust your program to utilize memory efficiently.

Inlining, constant folding, and partial evaluation

One optimization that is common in many computer languages is inlining. Inlining replaces a call to a known function with the body of the function. Inlining is an important optimization in Dylan, because almost all Dylan operations — slot access, array indexing, and collection iteration — involve function calls.

All good Dylan compilers, when compiling for speed, can be aggressive about inlining any computations, as long as doing so would not make a program grow too large. Constant folding (evaluating expressions involving constant values at compile time) and inlining are just two of the partial-evaluation techniques that you should expect to find in any good Dylan compiler.

Comparison with C:

A programmer familiar with the optimizations done in C compilers can think of partial evaluation as an extreme combination of inlining and constant folding. One way in which Dylan has an advantage over C for partial evaluation is that it hard for a compiler to evaluate expressions that involve dereferencing pointers. For example, in C, it is difficult to evaluate partially a call to malloc, but Dylan compilers can often evaluate a call to make at compile time.

Type inference

The quality of type inference can vary greatly among Dylan compilers. Type inference — like most forms of program analysis — works best with simple, straightforward code. Some constructs that are typically difficult for type inference are assignment and calling of block exit functions outside of the method that defines the block exit functions.

One other way in which type constraints can be helpful is that they permit the compiler to choose efficient representations for objects. Most Dylan objects contain enough information for Dylan to determine their class — this one is an important feature for the dynamic aspects of the language. But, suppose we have a 1000 x 1000 limited(<array>, of: <single-float>). There is no reason that each of the numbers in that array should also contain a reference to the <single-float> class; the one reference in the limited type is sufficient. (Note that, if we had used of: <real> or of: <float>, we would have needed more information, since multiple classes would have been possible.)

When an object is represented in such a way, often many of the operations on it can be optimized. For example, the conventional representation of <double-float> will usually require an indirect-memory-reference machine instruction to get at the actual number, so adding two such objects is one floating-point machine instruction and two load-from-memory machine instructions; if a direct representation is used, just the add machine instruction is needed. Further, if the return value is saved in a variable for which type information is not available, it may be necessary to allocate memory dynamically to store the return value.

Types that may have more efficient representations include certain integer classes, the floating-point classes, characters, and Booleans. Precise declarations about these types, especially in slots and limited collections, can lead to significant improvements in both the time and memory needed to run a program.

Summary

The most important point about performance is that it is important to pay attention to efficiency during the entire design and development cycle of a project. During the design phase, try to ensure that the algorithms chosen have the right asymptotic behavior and constant factors, and that it is possible to implement the needed operations efficiently. During the implementation phase, use the language constructs that most clearly express what the program is doing. Once the program is working correctly, it is then time to add type and sealing declarations, and to use metering and profiling tools to find and rewrite heavily used, slow parts of the program, in order to improve the performance.

One of the most important considerations when programming is not to worry about performance too soon. It is always more important that your design and implementation be clear and correct, first. There is no value in arriving at an answer with lightning speed, if it turns out to be the wrong answer.

In this chapter, we covered the following:

  • We showed how Dylan can balance performance and flexibility to support a range of programming requirements.

  • We showed how type constraints affect performance.

  • We showed how limited types can improve performance.

  • We showed how open generic functions provide modularity and flexibility.

  • We showed how open classes provide modularity and flexibility.

  • We showed how sealed generic function domains mitigate the performance penalty of open classes and generic functions.

  • We showed how primary classes permit efficient slot access.

  • We presented both an execution and efficiency model that provides a conceptual model of how a program in Dylan runs, and what the relative cost of different program elements are.

  • We examined the method constructs for flexibility and performance available in Dylan; see Methods: flexibility versus performance.

    Methods: flexibility versus performance
    Construct Effects
    direct method
    • highly optimizable
    • no method dispatch
    sealed generic function on a sealed class
    • highly optimizable
    • not extensible by other libraries
    sealed generic function on an open class
    • optimizable
    • other libraries can subclass
    open generic function on an open class in a sealed domain
    • highly optimizable
    • other libraries can add methods
    • other libraries can subclass
    open generic function on an open class
    • not optimizable
    • methods can be added at run time
    • subclasses can be created at run time
  • We discussed the constructs that can have type constraints, and the influence on performance or flexibility of using such a declaration; see Type constraint: flexibility versus performance..

    Type constraint: flexibility versus performance.
    Construct Effects
    module constants
    • enforce program correctness
    module variables
    • permit type inferencing
    required parameters
    • required for method dispatch
    • permit type inferencing
    optional parameters
    • permit type inferencing
    return values
    • enforce program correctness
    • permit type inferencing
    limited types
    • permit type inferencing
    • permit compact data representation
    slots
    • permit type inferencing