XLR: Extensible Language and Runtime
XL, an extensible programming language, implements concept programming. If you want to know more, you should start here.
We were discussing how to simplify the implementation and extension of Write. Currently, two modules implement slightly different versions of Write:
- XL.UI.CONSOLE.Write writes to the console
- XL.TEXT_IO.Write writes to a file
The first one is implemented using the second one, writing to standard output. More are to come, like writing to a string.

Like C++, we could decide to use a stream class, that would tell where to write, whether it's a file, a string, etc. We also need either that stream or something else to store formatting preferences, e.g. number of digits. One problem with this approach is that it makes extending write rather complicated. For example, if you want to write a complex number, you need something like:
    to Write (output : stream; Z : complex) is
        Write output, "(", Z.re, ";", Z.im, ")"

This is not too complicated, but if all I actually use is writing to the console, the code above has a rather unnecessary built-in notion that "Write to stdout" is really a write to a stream. Also, when you write Z.re or Z.im, you will use the formatting for real numbers (e.g. number of digits), but how would you specify additional formatting info for complex numbers, e.g. allow you to choose between (1;3), [1, 3] or 1 + 3i? Ideally, I would like this:
    to Write(Z : complex) is
        Write ComplexFormat.OpenSeparator, Z.re, ComplexFormat.InnerSeparator, Z.im, ComplexFormat.ClosingSeparator

But then, that variant would not allow you to write to a file... or would it? This is where the idea of scope injection comes in.
It seems like we could actually make it work with a file, if we add the idea that the Write we call could have an implicit generic dependency on a StandardOutput variable. Consider that the final step in writing, writing a character, is currently written as something like:
    to Write(F : stream; C : character) is
        C.putc(C, F)

What if we used any-lookup with an extension, and replaced that tail with one that doesn't take the standard output argument, but introduces it:
    to Write(C : character) is
        Write any.StandardOutput, C

The idea is the following: if you do not redefine StandardOutput in the instantiation context, then any.StandardOutput evaluates as the StandardOutput in the current context, which is a stream, so you end up calling the Write-to-stream function. But you can override StandardOutput and instantiate complex to write into some text like this:
    to Write (in out target : text; C : character) is
        target += C

Now, here is what happens. The call to Write Z gets decomposed to its components, until the final Write C for individual characters. But at that stage, the any.StandardOutput finds a StandardOutput in the instantiation context, specifically the local parameter to the second Write we just defined.

A few things are a bit complicated to make this work:
- We need to detect implicit dependencies to generic items, e.g. the any.StandardOutput in the character Write.
- This may now resolve to a local variable, not just some global. In that case, we need to pass some implicit parameter down the instantiation stack, in that case the StandardOutput : text parameter.

It is also not clear if it is a good idea to create this implicit dependency based solely on the implementation, i.e. based on the use of any.StandardOutput inside a function that, otherwise, would not even be generic. An alternative is to add some new syntax indicating that generic dependency in the interface, something like:
    generic [name StandardOutput]
    to Write (C : character) is
        Write StandardOutput, C

This would make it much clearer which part is generic and which part is not. Any opinion is welcome...
There is a blog entry on Eric Raymond's web page that is worth pointing at, in particular because of the discussion in the comments. What strikes me in these comments is that people still contrast dynamic or scripting languages like Lisp or Python with system languages like C or C++. One commenter indicates that he has to use C++ because no other advanced language can do real-time. That is one of the reasons I chose to design XL (I started my professional life writing hard real-time code, think sub-millisecond.)

But as XL proves, even in its present unfinished state, it is possible to design a language that offers a much higher level of abstraction than C++ and is also much more efficient. For example, I remember at some point timing the code for native/TESTS/12.Library/julia.xl on Itanium, and getting almost a 70% speed boost relative to C++. Why? Because the XL construction does not require any kind of memory touch to perform complex operations, whereas C++ mandates them. Specifically, the complex class has an implicit this pointer, so that an operator like operator+ implicitly performs four loads (real and imaginary part of both arguments) and two stores (storing real and imaginary part of the result). These loads and stores can be eliminated with really advanced back-end optimizations such as register field promotion, of the kind that the HP C++ compiler only performs at O3 or above.

The XL complex arithmetic code in native/library/xl.math.complex.xl, by contrast, generates code that can be done entirely in registers, even at the lowest optimization levels. And this, despite having a level of abstraction that is at least comparable to the C++ implementation of the same. What this tells me is that it is possible to design a language that is very high level, yet at the same time retains and even enhances one of the good attributes of C++, its ability to generate high-performance code.
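To picture the loads and stores mentioned above, here is a minimal C++ sketch. The Complex struct and add_example are hypothetical names made up for this illustration; this is neither the std::complex implementation nor the code that was timed. The comments describe what an unoptimized build would typically do, which is an assumption about the compiler rather than a measurement.

```cpp
#include <cassert>

// Hypothetical minimal complex type, for illustration only.
struct Complex {
    double re, im;

    // Member operator: 'this' and 'rhs' are both addresses, so a compiler
    // that neither inlines the call nor promotes the fields to registers
    // reads re and im of each operand from memory (four loads)...
    Complex operator+(const Complex &rhs) const {
        Complex sum;
        sum.re = re + rhs.re;   // ...and writes the two fields of the
        sum.im = im + rhs.im;   // result back to memory (two stores).
        return sum;
    }
};

// Small usage check of the operator above.
inline Complex add_example() {
    Complex a{1.0, 2.0};
    Complex b{3.0, 4.0};
    Complex c = a + b;
    assert(c.re == 4.0 && c.im == 6.0);
    return c;
}
```

Whether the memory traffic survives in the generated code depends on inlining and on the optimization level, which is exactly the point made in the post.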
This is a comment on the C++ considered harmful blog entry Eric Raymond posted last month.

Eric, I agree with Jeff that writing about the problems of C++ will only be effective if you can offer something that is better. Consider your own critique of the UNIX-HATERS Handbook: the best chapters were those where the authors had something better in mind that they could use as a reference. I believe you suggest that Python or similar languages are the "better" solution in the language space, but as many have pointed out, there are many things C++ can do that Python can't.

Let me respectfully propose my own biased answer. Designing a modern language is a problem I have thought about long and hard for a very long time (I'd say 15 years). The result is the XL programming language, and a programming paradigm I called concept programming. Highly idiosyncratic, certainly, and even more little known at this point. But I think it's worth sharing with you. Your criticism of C++ will be stronger if you know of other ways to achieve the same goals as C++ than if you suggest that we should change the goals.

And to make it clear, the goal in my opinion is to be able to develop very large and complex programs that still take advantage of the machine to the maximum possible extent. In other words, I am not ready to sacrifice performance or memory usage for convenience, and I don't think we should have to. Many commenters here expressed the same feeling.
So how exactly is XL improving relative to C++? First, XL stands for "Extensible Language". The initial intuition is that the problem set is not bounded. So I wanted to create a language that made the code representation of arbitrary concepts possible, not just "functions" or "objects". Actually, it should not just be possible, it should be easy. To illustrate, it should make it just as easy to add symbolic differentiation to the language as it would be to add a class in C++. I wouldn't comment here if I didn't think I have achieved that objective. But it goes further. When you start thinking in terms of concepts and representations, you start seeing flaws in other languages and designs that are not obvious, and you start designing things in a different way. Highly idiosyncratic, I told you :-) In short, instead of a monolithic compiler like practically everybody else has done so far, XL has a tiny compiler core implementing a standard syntax, and a number of compiler plug-ins that cooperate to implement the language. Let me illustrate this with things that you or your readers seem to care about: efficiency, code density, high abstraction levels, operator overloading, preprocessing, garbage collection, rapid prototyping, quick build cycles, interaction with other languages such as C, simple syntax. Obviously, that won't be a short post...
There are other things I did not even talk about, like how XL will deal with real-time or parallelism. Can you guess I'm proud of my language? OK, it doesn't exactly have a cult following yet, and a lot of work remains to be done. But hey, I hear you are good at evangelizing stuff, so if you like it, why don't you talk about it and send good programmers my way? ;-)
DevX.com has a special report on the new C++ standard, currently referred to as C++ 0x because it was due sometime between 2000 and 2009. Well, the standard was quite late compared to early expectations, so an easy joke was that we might end up with 0xA or 0xB, the C++ hexadecimal notation for 10 or 11. Ultimately, chances are that the standard will make it for 2009, so we will probably refer to it as C++ 09... This new iteration of the language is of interest to all programmers, because it brings a number of major changes to one of the most popular programming languages today, and one that is already very complex (and therefore hard to extend). But for me in particular, it is all the more interesting to consider how various "innovations" in that new standard compare to features that play the same role in XL.
Concepts?
One of the major new features in C++ 0x is concepts. DevX has a dedicated article about concepts. In short, concepts in C++ are a way to describe categories of templates, and to help the compiler figure out what the programmer intended for a given template. This new aspect of the language makes it easier to define a real contract between the users of a template and its implementers.
C++ concepts, however, are somewhat annoying to me. One reason is that XL has been for a long time based on an approach that I dubbed concept programming. Concept programming, in the XL sense, is about the relationship between concepts that exist only in our head, and concept representations that exist in the computer. The key idea is to make sure that implementations look and feel like the concepts they represent. One key consequence of that idea is that a programming language should comfortably support arbitrary concepts, not some finite set (e.g. functions or objects), because the set of concepts we manipulate is not a-priori limited. This is the key reason so much effort was put into making XL extensible. To summarize, "concepts" in XL are only very remotely related to "concepts" in C++, although, arguably, the XL usage of the word is closer to the standard meaning.
XL generic validation = C++ concepts
Many aspects of XL are a direct consequence of the concept programming design philosophy. For example, XL has implemented, since at least 2002, the idea that one can describe how a generic type can be used. This feature is called generic validation in XL terminology. I invite the reader to compare the XL implementation of a minimum function with the C++-with-concepts implementation of the same. This should convince you that the two ideas are almost identical. So where are the differences between C++ concepts and XL generic validation? One of them is how the contract is specified. In C++, you specify the kind of operators and functions that define the concept. For example, you would write something like the following to indicate that a min function requires a less-than operator:
    concept LessThanComparable<typename T> {
        bool operator<(const T& x, const T& y);
    }

In XL, by contrast, you give an example of code that has to compile with the generic type you want to validate. For example, in XL, you would write something like:
    generic type ordered where
        A, B : ordered
        Test : boolean := A < B

Now, as you can see from this simple example, a significant difference is that XL considers the validation to be tied to a generic type, which can then be used to declare a function like Min directly. In other words, since you declared that ordered is generic, Min becomes implicitly generic. By contrast, in C++, LessThanComparable is a kind of predicate that applies to template classes, so you need one additional "connection" using a requires clause, to let the compiler relate the T in the definition of min with the T in LessThanComparable. As a result, the C++ code for that example is more verbose and more convoluted. This becomes more visible as the code becomes more complex.

Another drawback is that the C++ concept specification as written doesn't work for, say, int because the less-than operator in that case doesn't have the right signature. So you need an additional concept_map in that case, making the code even more verbose, as shown below:
concept_map LessThanComparable<int> { }
One benefit of the C++ approach, however, is that the specification of the concept makes it easier to validate early that the implementation actually doesn't require anything besides what is declared in the concept. For example, if the body of min attempts to refer to an operator that is not present in the concept specification, the compiler may detect this. Doing this with the kind of specification given in XL is much more complicated. I am considering various ways to fix this problem, which is much easier in XL since practically nobody uses it yet.
Multitasking and Threads
C++ 0x also adds standard support for threads. In my opinion, it is ironic that they managed to shoehorn in support for a thread model that is so "last century". Today, the difficult problem is not threading on an SMP system, but threading on non-uniform architectures, for example threading between a CPU and a GPU, or between the components of a Cell microprocessor, or threads that cooperate on machines with different architectures across the Internet. This kind of problem is much more complicated, and is already, to some extent, solved by other languages such as Java or Erlang.

At this point, XL has little to offer in that space, because what is needed is not coded yet. However, I am confident that XL's extensibility will make it easy to implement not one, but a multitude of tasking models. Among the top candidates are rendezvous-based mechanisms similar to Ada, message-passing protocols similar to Erlang, or data-driven parallelism similar to several functional languages. Stay tuned.
Variadic templates
C++ 0x will, at long last, implement variadic templates. This feature will make it possible to write functions that take a variable number of arguments, yet are type-safe. This is, again, something that has existed in XL since 2001 or earlier. You can see that the XL implementation of the Max function takes advantage of this feature. The C++ implementation is more complete, however, as it makes it possible to create not just variadic functions, but also variadic classes. This is something that is planned, but not currently implemented in XL.
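For readers who have not seen the C++ side of this, here is a rough sketch of a type-safe variadic maximum using the variadic template syntax as it was eventually standardized. The names max_of and max_example are made up for this illustration, and the code is not a translation of the XL Max function mentioned above.

```cpp
#include <cassert>

// Base case: the maximum of a single value is that value.
template <typename T>
T max_of(T only) { return only; }

// Recursive case: compare the first argument with the maximum of the rest.
// Every remaining argument is converted to T, the type of the first one,
// and the compiler rejects calls where that conversion does not exist.
template <typename T, typename... Rest>
T max_of(T first, Rest... rest) {
    T tail = max_of<T>(rest...);
    return first < tail ? tail : first;
}

// Small usage check with three arguments.
inline int max_example() {
    int m = max_of(3, 9, 4);
    assert(m == 9);
    return m;
}
```

The recursion is unrolled by the compiler at each call site, so the number of arguments is fixed at compile time even though the function accepts any count.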
Range-based iterations
A new range-based iteration mechanism was also added to C++ 0x. XL has a more general form of iteration, that already covers this specific case. Here is for example how for loops are declared in XL:
    iterator IntegerIterator(var It : integer; Low, High : integer) written It in Low..High is
        It := Low
        while It <= High loop
            yield
            It := It + 1

The notation It in Low..High is how you will invoke the iterator, and the yield statement in the iterator is where the body of the loop will go. The usage of the iterator is very natural:
    for I in 1..5 loop
        for J in 1..I loop
            WriteLn "I=", I, " and J=", J

The benefit of this more general approach is that you can for example define two-variable iterators:
    iterator MatrixIterator (
        var I : integer; LI, HI : integer;
        var J : integer; LJ, HJ : integer) written I,J in [LI..HI, LJ..HJ] is
        I := LI
        while I <= HI loop
            J := LJ
            while J <= HJ loop
                yield
                J := J + 1
            I := I + 1

You can also define iterators over any kind of data structure, using any syntax you need for this particular data structure.
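For comparison, here is roughly what it takes in C++ to make the I in 1..5 loop above work with the range-based for statement. IntRange and count_pairs are hypothetical names invented for this sketch; it assumes an inclusive range whose upper bound is below INT_MAX, and it is only one of several ways to write such a range.

```cpp
#include <cassert>

// Hypothetical helper: an inclusive integer range usable in a range-based
// for loop, loosely mirroring "It in Low..High" from the XL iterator.
// Assumes high < INT_MAX so that high + 1 does not overflow.
class IntRange {
public:
    class Iterator {
    public:
        explicit Iterator(int value) : value_(value) {}
        int operator*() const { return value_; }
        Iterator &operator++() { ++value_; return *this; }
        bool operator!=(const Iterator &other) const {
            return value_ != other.value_;
        }
    private:
        int value_;
    };

    // An inverted range (high < low) is treated as empty.
    IntRange(int low, int high)
        : low_(low), high_(high < low ? low - 1 : high) {}
    Iterator begin() const { return Iterator(low_); }
    Iterator end() const { return Iterator(high_ + 1); }  // one past High
private:
    int low_, high_;
};

// Counts the (I, J) pairs visited by the nested loops from the text.
inline int count_pairs(int n) {
    assert(n >= 0);
    int count = 0;
    for (int i : IntRange(1, n))
        for (int j : IntRange(1, i)) {
            (void)j;   // only the number of iterations matters here
            ++count;
        }
    return count;
}
```

The loop body stays outside the range type here, whereas the XL iterator states where the body goes with yield; a two-variable form like MatrixIterator would need a second helper type in this style.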
Constant Expressions
C++ 0x introduces the notion of generalized constant expression. This makes it possible to declare functions that the compiler will be able to evaluate at compile time. Once again, the XL approach is very different. The XL compiler has various phases, implemented as "plug-ins" for the compiler. One of them deals with constant folding (i.e. evaluation of constant expressions). Here is an example showing how to compute factorials at compile-time using that technique.

The XL pre-processor also makes it easy to implement compile-time assertions, something that is also a new feature of C++ 0x. The XL implementation, however, will automatically optimize a static assertion if it can evaluate the argument at compile time, instead of requiring a specific keyword.
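On the C++ side, the two features mentioned above look roughly like this, using the constexpr and static_assert keywords as they were eventually standardized. The function names are made up for this sketch, and it says nothing about how the XL constant-folding plug-in works.

```cpp
#include <cassert>

// Generalized constant expression: a single return statement, so the
// compiler may evaluate the call at compile time for constant arguments.
constexpr unsigned long long factorial(unsigned n) {
    return n <= 1 ? 1ULL : n * factorial(n - 1);
}

// Compile-time assertion: compilation fails here if the value is wrong.
static_assert(factorial(5) == 120, "factorial(5) should be 120");

// The same function also works at run time with non-constant arguments.
inline unsigned long long factorial_at_runtime(unsigned n) {
    unsigned long long result = factorial(n);
    assert(result >= 1);
    return result;
}
```

Note the specific keywords in both places: the programmer has to mark the function constexpr and choose static_assert over a run-time assert, which is the contrast with the XL approach drawn above.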
Conclusion
C++ is an extremely complex language, and extending it took a lot of effort. Many of the new features have already existed in XL for a while, and are much easier to implement. However, the implementation in C++ points out some weaknesses in the way things are currently done in XL, something that is fortunately still easy to change this early in the language's life.

Note: This article was originally published here.
In addition to the constructors work and various other cleanup, I've also been attempting to bring the Java tests in line with the C tests. This has proven relatively difficult, because Java is not a very practical language as a target of code generation: no pointers, no unsigned types, ...

Another problem I'm having now is how to bring back the work I did from git into Subversion. It turns out that git-svn is unable to follow the majority of branch merges done in git. This makes it unable to follow complex branch histories, and you have to "squash" the work you did on a branch into the branch you will use for Subversion commits. That is annoying, as it loses a lot of the history, which was the primary benefit I was seeing in this approach. I hope that I will not run into the problem if I avoid cross-branch merges, but this is not entirely clear. In any event, I now have a relatively large set of git history that is tricky to merge back. I'm losing a lot of history data in the process, which is unfortunate.
I had a long week of vacation, which we spent mostly around home. Living on the French Riviera has its perks, like beautiful places to visit during vacation... The only bad news is that I hurt my knee during a hike, and I've been stuck at home for a week, with another week at least of restricted movement.

Anyway, vacation was a good time for doing some XL development, and I started attacking a couple of rather tough nuts, bugs that had annoyed me for a while. Two of them made it difficult to implement complex numbers the way I wanted. The problem was, put simply, how to initialize a field to zero. How hard can that be? Well, consider that the complex type is generic and depends on a number type. That number type can be integer or real for example (although you could consider intervals and any other kind of fancy possibility). Now, if I'm writing the constructor for complex taking a single value, it's going to look something like:
    function Complex (re : complex.value) return complex written re is
        result.re := re
        // result.im := 0.0

If I write result.im := 0.0, then instantiation will fail when instantiating complex with integer. Conversely, result.im := 0 will not work for real, because XL has no implicit conversion from integer to real (although you can add one easily). So there are two solutions, neither of which worked correctly.

The first one was to leave the code as above, without an initializer, and rely on some kind of default initialization semantics inside constructors. The problem with this solution is that such initializers did not exist. The second solution was to add some new notation that would allow me to explicitly call a constructor, something like:
    function Complex (re : complex.value) return complex written re is
        result.re := re
        result.im := complex.value(0)

Now, it's an explicit call to a constructor, and there is no problem with real(0). There are cases where this second solution is the only one (e.g. the default constructor doesn't work for you). The test for this is here.

That was actually a pretty large change set. I embarked on doing that, but decided that I would do that in git to avoid corrupting the SVN database. That way, I could keep multiple work branches, etc. For an introduction on how to use git with SVN, see http://tsunanet.blogspot.com/2007/07/learning-git-svn-in-5min.html. Git works pretty well for that, and I was really happy...

... Until the point where I wanted to commit to Subversion. I used git svn dcommit as the blog says. Things started rather well, see revisions after Revision 381... The problem is that I had made a mistake initially, and added some LLVM code that was not ready for prime time. Git somehow tried to check a .svn directory into SVN, wreaking havoc. git svn dcommit stopped with an error, the SVN state was bad (would not even compile), yuck yuck yuck...

It took me a good 45 minutes to sort out the mess on both sides, more time than I had planned to use during my lunch break, but I think that I'm back on track. I lost a lot of the history on the SVN side, though, having one big massive check-in at the end with the end result instead of the individual steps. The constructor work is not entirely there, but at least it's in a stable state. There are further fixes and other git branches that I want to commit at some point, but I will first revisit everything to check that it's safe...
Added support for function pointers in Revision 370. The real problem was support for overloading, as illustrated in this test, reproduced below:
In that example, the problem is with Invoke Foo, which needs to be able to decide which Foo is to be selected (the first one in that case).
Copyright 2008 Christophe de Dinechin (Blog)
E-mail: XL Mailing List (polluted by spam, unfortunately)