XLR: Extensible Language and Runtime

The art of turning ideas into code

Evaluating Languages by Concept Programming Standards

Prev: Limits of existing languages


Next: The Difference between Objects and Concepts

In addition to built-in limits, which are an obstacle to comfortably representing a wide variety of concepts, many languages exhibit characteristics which do not rank well by concept programming metrics. Let's consider four simple examples, by no means exhaustive:

Output parameters in C

Unlike several other languages (for instance Ada), C doesn't feature output parameters. Rather, based on the observation that many compilers of the time passed output parameters by reference, C requires programmers to make the reference explicit (as a pointer) whenever they want an output parameter.

For instance, C programmers cannot write the following:

/* Illegal C */
errorcode_t GetWindowBounds(in window_t win, out rect_t rect);

Instead, they need to write something like:

errorcode_t GetWindowBounds(window_t win, rect_t *rect);

where the rect_t * notation (a pointer to a rect_t) really means "rect is an output parameter" rather than "rect is a pointer".

This is the archetypal example of semantic noise: the notation introduces a significant semantic difference between what the programmer means (getting data out of the function) and what the programmer needs to write (passing a parameter containing the address where data can be copied). The notation also introduces a bit of syntactic noise, since the * notation doesn't represent anything in the problem space.

Why is this bad? Here is one reason: on any modern 64-bit architecture, the rect_t structure might fit in one or two 64-bit words (for 16-bit or 32-bit coordinates respectively). Passing two registers out of the function is certainly much faster than passing one register and then performing four sub-word stores (in the GetWindowBounds function) followed by four sub-word loads (in the caller, to retrieve the data). In general, exposing pointers to unknown addresses like this also reduces optimization opportunities dramatically. This is a general rule: semantic noise limits possible optimizations because the compiler gets an incomplete or incorrect view of the desired semantics.
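The contrast can be sketched as follows. This is only an illustration, written in C++ but close to plain C: the definitions of errorcode_t, rect_t and window_t, the field names, the function bodies and the name GetWindowBoundsByValue are all assumptions made for the example, not part of any real windowing API. Whether a compiler actually returns the small structure in registers depends on the platform ABI.

```cpp
#include <cstdint>

// Hypothetical types, assumed for illustration only.
typedef int errorcode_t;
struct rect_t { std::int16_t left, top, right, bottom; };
struct window_t { rect_t bounds; };

// With 16-bit coordinates, the rectangle fits in one 64-bit word
// (this holds on mainstream ABIs).
static_assert(sizeof(rect_t) == 8, "rect_t expected to be 64 bits");

// Pointer style: the caller passes an address and the function stores
// through it. The notation also admits a null address, a state that a
// true "out" parameter would not have.
errorcode_t GetWindowBounds(window_t win, rect_t *rect)
{
    if (rect == nullptr)
        return -1;
    *rect = win.bounds;
    return 0;
}

// Value style: the data simply comes out of the function, which leaves
// the compiler free to pass it back in registers where the ABI allows.
struct bounds_result_t { errorcode_t error; rect_t rect; };

bounds_result_t GetWindowBoundsByValue(window_t win)
{
    bounds_result_t result;
    result.error = 0;
    result.rect = win.bounds;
    return result;
}
```

A caller of the first form writes GetWindowBounds(w, &r) and must own a rect_t object whose address escapes; a caller of the second form writes GetWindowBoundsByValue(w).rect and never exposes an address.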

Parentheses in Lisp

Lisp has a very small number of very powerful core constructs. Among these is the notion of list. Practically anything in Lisp is a list. In particular, programs are written as lists. This makes the manipulation of program fragments by Lisp programs not only possible, but quite easy. This is a very powerful tool, and a necessary basis for language extensions.

However, Lisp lists, while very easy for computers to parse and store, are not the ideal representation for a number of very fundamental concepts. Consider something as simple as addition. In standard Lisp, to add two and two, you do not write 2 + 2 but rather (+ 2 2), which is a Lisp list containing three elements: the + symbol and two numbers. According to Lisp rules, this list evaluates to 4. This notation is called Polish prefix notation, and is close in principle to the reverse Polish notation (RPN) used in many HP calculators.
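To make the evaluation rule concrete, here is a minimal sketch of a prefix-expression evaluator. It is not a Lisp implementation: the function names eval_prefix and eval_prefix_at are hypothetical, and the sketch assumes only non-negative integer literals and the two operators + and *.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Advance pos past any whitespace.
static void skip_spaces(const std::string &s, std::size_t &pos)
{
    while (pos < s.size() && std::isspace(static_cast<unsigned char>(s[pos])))
        ++pos;
}

// Evaluate the number or parenthesized list starting at pos.
static long eval_prefix_at(const std::string &s, std::size_t &pos)
{
    skip_spaces(s, pos);
    if (pos >= s.size())
        throw std::runtime_error("unexpected end of input");

    // A bare number evaluates to itself.
    if (std::isdigit(static_cast<unsigned char>(s[pos]))) {
        long value = 0;
        while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos])))
            value = value * 10 + (s[pos++] - '0');
        return value;
    }

    // Otherwise we expect a list: '(' operator operand... ')'.
    if (s[pos] != '(')
        throw std::runtime_error("expected '(' or a number");
    ++pos;
    skip_spaces(s, pos);
    if (pos >= s.size() || (s[pos] != '+' && s[pos] != '*'))
        throw std::runtime_error("expected operator + or *");
    const char op = s[pos++];

    long result = (op == '+') ? 0 : 1;   // identity element of the operator
    for (;;) {
        skip_spaces(s, pos);
        if (pos >= s.size())
            throw std::runtime_error("missing ')'");
        if (s[pos] == ')') {
            ++pos;
            return result;
        }
        const long operand = eval_prefix_at(s, pos);
        result = (op == '+') ? result + operand : result * operand;
    }
}

// Entry point: evaluate a whole prefix expression given as text.
long eval_prefix(const std::string &text)
{
    std::size_t pos = 0;
    return eval_prefix_at(text, pos);
}
```

With this sketch, eval_prefix("(+ 2 2)") yields 4. Nesting replaces precedence entirely: the grouping in (+ 2 (* 3 5)) is spelled out by the parentheses rather than inferred from the operators.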

This notation is universal, simple to understand, and easy to implement. But it is not the natural notation we have been taught in school. This is an example of syntactic noise where the syntax being used is constrained by the tool, rather than by the choice of the programmer. The fact that some programmers would actually choose this notation if given a choice illustrates how subjective the concept programming metrics are.

Expressions in Smalltalk

In Smalltalk, you can actually write 2 + 2, and it does indeed compute 4. So far, so good.

Unfortunately, the familiar notation hides a particularly sneaky form of semantic noise, and expressions will give the familiar result only for the simplest scenario. This is because in Smalltalk, everything is an object, and expressions had to bend to that rule. So when you write 2 + 2, what this really means is: send object 2 the message + with argument 2.

Why does it matter? Consider an expression such as 2+3*5. In traditional notation, precedences are such that 3*5 is computed first, so the result is 2+15, or 17. Not so in Smalltalk, where 2 receives message + with argument 3 and the resulting object, 5, receives the message * with the argument 5. The resulting value is 25.
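The two evaluation orders can be modelled in a few lines. This is a sketch only: send_message is a hypothetical helper that stands in for a Smalltalk binary message send, and it handles just the two operators used in the example.

```cpp
#include <stdexcept>

// Hypothetical model of a binary message send: the receiver gets the
// message together with one argument and answers a new value.
long send_message(long receiver, char message, long argument)
{
    switch (message) {
    case '+': return receiver + argument;
    case '*': return receiver * argument;
    default:  throw std::invalid_argument("unsupported message");
    }
}

// Smalltalk order: binary messages are sent strictly left to right,
// so 2+3*5 is read as (2 + 3) * 5.
long smalltalk_order()
{
    return send_message(send_message(2, '+', 3), '*', 5);
}

// Conventional order: multiplication binds tighter than addition,
// so 2+3*5 is read as 2 + (3 * 5).
long conventional_order()
{
    return 2 + 3 * 5;
}
```

Here smalltalk_order() returns 25 while conventional_order() returns 17, matching the two results discussed above.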

Of course, one could argue that this is really a form of syntactic noise and that appropriate precedences could easily be restored with a minor change in the language. But the point is that the default precedences are not the ones most people expect, and this creates unnecessary risks for programmers.

The 'string' types

Originally, a string was seen as some ideal, variable-sized array of contiguous elements. Of particular interest was the notion of string of characters, which was generally used to represent text. For instance, in practically any language, something like "Hello World" is called a string of characters. In practice, most early languages implemented fixed-size arrays (as a built-in construct), but did not provide a general-purpose string construct for arbitrary types. Yet, because of their use to represent text, character strings were often implemented as a special, built-in case.

Consequently, in many languages, the term string ended up being understood as "string of characters", and the type system sometimes contained this denomination. Pascal and C++ are examples of languages where the term "string" really denotes a string of characters, in other words what anybody but programmers would call text. In an ironic twist, when C++ developers introduced what could arguably be thought of as a real "string" type, they called it vector, despite the fact that it is quite different from a mathematical vector, and reserved string for strings of characters.

XL features a general variable-sized array type, which is consequently called string and not something like vector. A particular instance of that type is called text and used to represent text in the code. The text type is actually implemented as string of character. You can, however, make use of a string of integer if this is what you need, and it will behave much like a vector<int> in C++.
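For readers more familiar with C++, the naming can be mirrored with two aliases. This is C++, not XL: the names string_of, text, make_text and first_squares are hypothetical and exist only to illustrate the relationship between the general type and its text instance.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical aliases mirroring the XL naming in C++ terms.
template <typename T>
using string_of = std::vector<T>;   // general variable-sized array
using text = string_of<char>;       // text is one particular instance

// Build a text value from a C string literal, one character at a time.
text make_text(const char *literal)
{
    text result;
    for (const char *p = literal; *p != '\0'; ++p)
        result.push_back(*p);
    return result;
}

// The same general type holds integers equally well.
string_of<int> first_squares(std::size_t count)
{
    string_of<int> result;
    for (std::size_t i = 1; i <= count; ++i)
        result.push_back(static_cast<int>(i * i));
    return result;
}
```

Under these assumptions, make_text("Hello World") has 11 elements, and first_squares(3) holds 1, 4 and 9. Both values share every operation of the underlying general type, which is the point of treating text as a special case of string rather than as a separate built-in.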


We can reasonably assert that the designers of existing programming languages were smart and did as good a job as they could. Therefore, the shortcomings illustrated by Concept Programming are not obvious. The Concept Programming methodology brings something new to the programmers' toolbox.

None of the problems above are really serious. They only serve as an illustration of how simple metrics can be used to identify otherwise non-obvious issues. There are many cases where the problem is much more difficult to pinpoint without the appropriate tools, and where its effects are much more subtle and/or more dangerous.

Copyright 2008 Christophe de Dinechin (Blog)
E-mail: XL Mailing List (polluted by spam, unfortunately)