XLR: Extensible Language and Runtime

The art of turning ideas into code

The XL Scanner

Up

Next: The XL Parser

The XL scanner takes a sequence of characters from a file and turns it into a sequence of tokens. It is implemented in the module xl.scanner.

XL scanning is quite simple. There are only five types of tokens:

  1. Integer or real numbers, beginning with a digit
  2. Names, beginning with a letter
  3. Strings, enclosed in single or double quotes
  4. Symbols, formed by sequences of consecutive punctuation characters
  5. Blanks and line separators
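
Since each token type is identified by how it begins, the first character is enough to choose a scanning path. The following Python sketch illustrates that dispatch; the function name `token_kind` and the returned labels are hypothetical and are not taken from xl.scanner.

```python
def token_kind(first_char: str) -> str:
    """Guess the token type from the first character of a token.

    Hypothetical helper for illustration; not the xl.scanner API.
    """
    if len(first_char) != 1:
        raise ValueError("expected exactly one character")
    if first_char.isdigit():
        return "number"   # integer or real numbers begin with a digit
    if first_char.isalpha():
        return "name"     # names begin with a letter
    if first_char in "'\"":
        return "string"   # strings begin with a single or double quote
    if first_char.isspace():
        return "blank"    # blanks and line separators
    return "symbol"       # any other character is treated as punctuation
```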

NUMBERS: Numbers can be written in any base, using the '#' notation: 16#FF. They can contain a decimal dot to specify real numbers: 5.21. They can contain single underscores to group digits: 1_980_000. They can contain an exponent introduced with the letter E: 1.31E6. The exponent can be negative, which makes the number real: 1.31E-6, 1E-3. Another '#' sign can be placed before the 'E', in particular when 'E' is itself a digit of the base: 16#FF#E20. The exponent represents a power of the base: 16#FF#E2 is 16#FF00. These notations can be combined: 16#FF_00.00_FF#E-5.
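
The number rules can be checked with a small Python sketch. It is not the xl.scanner code, and `parse_number` is a hypothetical name. It assumes that the base and the exponent are themselves written in decimal, and that a bare 'E' introduces an exponent only when 'E' cannot be a digit of the base (base 14 or less); the text above does not settle either point.

```python
def parse_number(text: str):
    """Evaluate an XL-style number literal (illustrative sketch only)."""
    text = text.replace("_", "").upper()      # underscores only group digits
    base = 10
    if "#" in text:
        base_text, text = text.split("#", 1)  # 16#FF -> base 16, digits FF
        base = int(base_text)                 # assumption: base is decimal
    if "#" in text:
        # A second '#' separates the mantissa from the exponent: FF#E2
        mantissa, exp_text = text.split("#", 1)
        if not exp_text.startswith("E"):
            raise ValueError("expected 'E' after the second '#'")
        exp_text = exp_text[1:]
    elif base <= 14 and "E" in text:
        # 'E' is not a digit of this base, so it introduces the exponent
        mantissa, exp_text = text.split("E", 1)
    else:
        mantissa, exp_text = text, "0"
    exponent = int(exp_text)                  # assumption: exponent is decimal
    whole, _, fraction = mantissa.partition(".")
    is_real = "." in mantissa or exponent < 0
    value = int(whole + fraction, base)       # all digits as one integer
    scale = exponent - len(fraction)          # the exponent is a power of the base
    if not is_real:
        return value * base ** scale
    if scale >= 0:
        return float(value * base ** scale)
    return value / base ** (-scale)
```

For instance, `parse_number("16#FF#E2")` and `parse_number("16#FF00")` both evaluate to the integer 65280, matching the equivalence stated above.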

NAMES: Names begin with a letter and are made of letters or digits: R19, Hello. Names can contain single underscores to group words: Big_Number. Names are neither case-sensitive nor underscore-sensitive: Joe_Dalton and JOEDALTON are the same name.
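
One way to picture the comparison rule is to reduce each name to a canonical form before comparing. The Python sketch below does that; `is_valid_name`, `normalize_name` and `same_name` are hypothetical names, and the validity check assumes ASCII letters only, which the text above does not specify.

```python
import re

# Assumption: ASCII letters only. A letter first, then letters or digits,
# with single underscores allowed between groups.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]*(_[A-Za-z0-9]+)*")


def is_valid_name(name: str) -> bool:
    """Check the shape of a name: no leading digit, no doubled underscore."""
    return NAME_PATTERN.fullmatch(name) is not None


def normalize_name(name: str) -> str:
    """Drop underscores and case so that equivalent names compare equal."""
    return name.replace("_", "").lower()


def same_name(a: str, b: str) -> bool:
    return normalize_name(a) == normalize_name(b)
```

With these definitions, `same_name("Joe_Dalton", "JOEDALTON")` is true, while `is_valid_name("Big__Number")` is false because the underscore is doubled.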

STRINGS: Strings begin with a single or double quote, and terminate with the same quote used to begin them. They cannot contain a line termination. A quote character can be embedded in a string by doubling it. "ABC" and 'def ghi' are examples of valid strings.

Note that the type associated with strings of characters is called text, not string.
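
The string rules (matching quotes, no line termination, doubled quotes for embedding) can be sketched in Python as follows. The function `scan_string` and its return convention are hypothetical and only illustrate the rules; they do not describe xl.scanner.

```python
def scan_string(source: str, start: int = 0):
    """Scan a quoted string starting at source[start].

    Returns (text_value, index_after_closing_quote). Illustrative sketch.
    """
    quote = source[start]
    if quote not in "'\"":
        raise ValueError("a string must begin with a single or double quote")
    chars = []
    i = start + 1
    while i < len(source):
        c = source[i]
        if c == "\n":
            break                              # strings cannot span lines
        if c == quote:
            if source[i + 1:i + 2] == quote:
                chars.append(quote)            # doubled quote: embedded quote
                i += 2
                continue
            return "".join(chars), i + 1       # closing quote found
        chars.append(c)
        i += 1
    raise ValueError("unterminated string")
```

Under these rules, the source text `"say ""hi"""` scans to the value `say "hi"`.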

SYMBOLS: Symbols are sequences of punctuation characters other than quotes, not separated by spaces. In symbols, the underscore is a significant character. Examples of valid symbols include ++, ---> and %-%. Symbols are normally made of the longest possible sequence of punctuation characters (being terminated by any space, digit, letter or quote). However, each of the six "parenthesis" characters ( ) [ ] { } always represents a complete symbol by itself.

Examples:

  ---X is the token --- followed by the token X
  --((X)) is the token -- followed by two ( tokens, the token X, and two ) tokens
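
The longest-match rule and the parenthesis exception can be reproduced with a short Python sketch. The names `scan_symbol` and `split_tokens` are hypothetical, and `split_tokens` is deliberately limited to names, blanks and symbols (it does not handle strings or the full number syntax).

```python
PARENS = "()[]{}"


def scan_symbol(source: str, start: int = 0):
    """Return (symbol, index_after_symbol) for the symbol at source[start]."""
    if source[start] in PARENS:
        return source[start], start + 1        # always a complete symbol
    i = start
    while i < len(source):
        c = source[i]
        # A symbol ends at any space, digit, letter, quote or parenthesis
        if c.isspace() or c.isalnum() or c in "'\"" or c in PARENS:
            break
        i += 1
    if i == start:
        raise ValueError("no symbol at this position")
    return source[start:i], i


def split_tokens(source: str):
    """Split text made of names, blanks and symbols (simplified sketch)."""
    tokens = []
    i = 0
    while i < len(source):
        c = source[i]
        if c.isspace():
            i += 1
        elif c.isalnum():
            j = i
            while j < len(source) and (source[j].isalnum() or source[j] == "_"):
                j += 1
            tokens.append(source[i:j])
            i = j
        else:
            symbol, i = scan_symbol(source, i)
            tokens.append(symbol)
    return tokens
```

Applied to the two examples, `split_tokens("---X")` gives `['---', 'X']` and `split_tokens("--((X))")` gives `['--', '(', '(', 'X', ')', ')']`.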

BLANKS: In XL, indentation is significant, and is represented internally by two special forms of parentheses, denoted as 'indent' and 'end'. Indentation can use spaces or tabs, but not both in the same source file.
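
A common way to turn indentation into paired tokens is to keep a stack of indentation widths. The Python sketch below follows that approach as an illustration; it is an assumption about how such a scheme can work, not a description of xl.scanner, and the function name `indentation_tokens` is hypothetical.

```python
def indentation_tokens(lines):
    """Insert 'indent' and 'end' markers around indented blocks of lines."""
    stack = [0]          # indentation widths of the currently open blocks
    indent_char = None   # first indentation character seen (space or tab)
    tokens = []
    for line in lines:
        if not line.strip():
            continue                           # blank lines carry no indentation
        body = line.lstrip(" \t")
        prefix = line[:len(line) - len(body)]
        for c in prefix:
            if indent_char is None:
                indent_char = c
            elif c != indent_char:
                raise ValueError("spaces and tabs are mixed in the same source")
        width = len(prefix)
        if width > stack[-1]:
            stack.append(width)
            tokens.append("indent")            # opens a block
        while width < stack[-1]:
            stack.pop()
            tokens.append("end")               # closes a block
        if width != stack[-1]:
            raise ValueError("inconsistent indentation")
        tokens.append(body.rstrip())
    while len(stack) > 1:                      # close blocks still open at EOF
        stack.pop()
        tokens.append("end")
    return tokens
```

For the lines `if A then`, `    B`, `    C`, `D`, this produces `if A then`, `indent`, `B`, `C`, `end`, `D`, so the indented block reads like a parenthesized group.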

COMMENTS: The scanner doesn't decide what is a comment. That decision is made by the caller (normally the parser). The caller can invoke the Comment function, which skips input until an 'end of comment' token is found. For XL, this capability is under-utilized, since an end-of-comment is always an end of line.
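
The division of labor described here, where the caller recognizes the start of a comment and the scanner only skips to a given end marker, might look like the following Python sketch. The function `skip_comment` and its signature are hypothetical stand-ins for the Comment function, whose real interface is not shown in this text.

```python
def skip_comment(source: str, start: int, end_marker: str = "\n"):
    """Skip from source[start] up to and including end_marker.

    Returns (comment_text, index_after_end_marker). The caller chooses the
    end marker; by default a comment ends at the end of the line.
    """
    end = source.find(end_marker, start)
    if end < 0:
        return source[start:], len(source)     # comment runs to end of input
    return source[start:end], end + len(end_marker)
```

A caller that has just consumed a comment-opening symbol at index 2 of the text `// note` followed by a newline and `X` would get back the comment text ` note` and the index 8, where scanning resumes at `X`. The `//` opener is only an example chosen by the caller in this sketch.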

Copyright 2008 Christophe de Dinechin (Blog)
E-mail: XL Mailing List (polluted by spam, unfortunately)