grammar book

Sunday, January 25, 2004

To get started, here's the text of a letter I wrote to M about my progress, last Wednes:

I'll just fill you in on what I've tried. I rewrote the Lexer to use different classes as tokens. I've stuck with five token types so far,
        LeftParen 
        RightParen 

        Number 
        Quote - just the ' symbol 
        Symbol - anything that's not a paren, number or quote. 
I've done this because I decided to, taking Greg's lead, treat all builtin functions and user-defined functions in the same way. So their names all get packed into a Symbol token.
The parser accepts these tokens and builds up SExpressions with them. An SExpression is an abstract class with three concrete children:
        Number 
        Symbol 
        SList (since 'List' had too many name conflicts) 
Number tokens are turned into Numbers, Symbol tokens into Symbols. SLists are built from the rest. So, a Quote is actually turned into an SList with the first element as the symbol "quote".
Evaluation of all of this is handled by a VirtualMachine. The VirtualMachine has the job of keeping track of variable bindings, scoping and execution of SExpressions and functions (more on functions in a minute). Every SExpression has an evaluate(VirtualMachine) method, and this is what the VM calls to execute an expression.
When a Number is executed it returns itself. When a Symbol is executed it calls VirtualMachine.lookup() which returns an SExpression bound to that symbol, if it exists. If an SExpression is returned then the Symbol returns it, otherwise it throws an exception about trying to evaluate an unbound variable.
When an SList is executed things are a little different. It does a VM.lookup() on the head of the list (which must be a symbol). Here's where I tell you that there are actually four children of SExpression. Function is the fourth. The SList checks that what it got back was a function, and takes this function and calls VM.evaluation(function, this.rest()). where rest() returns a new SList containing everything but the first element in the list.
Now, here's where I tell you about there being two types of Functions, Builtin and UserDefined. Since a Builtin function has unnamed parameters, whereas a user defined function has named ones, I decided to treat them as seperate cases. A builtin function is evaluated by calling function.evaluate(VirtualMachine, SExpression[]) where SExpression[] comes from the SList.rest() SList, and is treated as the arguments to the function.
Calling a UserDefined function involves the VM creating a new Scope and binding each of the UserDefined function's parameters with the arguments, executing the function, and then popping the stack of Scopes afterwards.
So, the upshot of all of this to evaluate *any* SList you have to have a Function bound to the first Symbol in the list. So the first thing the REPL does is bind the Builtin functions to their symbols. I've created a few: Addition, Car, Cdr, Define, Eval, Quote, and Set.
Addition works by just summing the evaluations of each of its arguments (must be at least 2). Quote returns its arguement. Eval returns the evaluation of its first argument. Set calls vm.bind(firstargument, secondargument). Define creates new UserDefined function and binds it to it's first arguemnt (which must be a Symbol). And so on..
It all seems to be working pretty well. And it'll make creating lambda functions a snap (just like Define except you don't bind the function to anything, and just return it).
Some things I'm not totally comfortable with are the fact that each function gets a name (as a Symbol). This might make sense for builtin functions (which have a set name) but for userdefined function it seems dumb because the function *itself* shouldn't know about it's own binding name. Especially because you can assign a function to a new variable.
For instance,
-> (define y () '(happy)) 
          y 
-> y 
          FUNCTION: y = '(happy) 
-> (set x y) 
          FUNCTION: y = '(happy) 
-> x 
          FUNCTION: y = '(happy) 
So, the define function returns the symbol that was just defined. When a UserDefined function prints itself it prints "FUNCTION: <name> = <body>", so that's why you see that when I entered "y" (since y returned the function it was bound too).
The set function returns the function that was just defined (I don't know why, it probably should return the symbol or something).
When I typed "x" I got back the function, but the function's name wasn't updated when I bound it to x. So this whole UserDefinedFunction naming bit is a little screwy.
The other thing that's a little wonky is that the parser only consumes enough tokens to create a single SExpression each time parse() is called. So if you type more than one SExpression on a line, you only get the first one evaluated. But if you enter a blank line after that, then you get the second one evaluated (because the tokenizer tokenizes the entire line). Hence,
-> 'a 'b 
          a 
-> 
          b 

# posted 7:01 PM.

A very dormant blog. I'll wake it up by sending it news on my work following along with the book, "Programming Languages: An interpreter-based approach", by Samuel N. Kamin.

# posted 6:58 PM.

grammar book

Sunday, January 25, 2004

Links

archives