grammar book

# posted 10:12 AM.

XAST

So the task at hand is to use XSL operations on the AST of a java program to expand macros in that program. In order to do this we need an XML representation of an AST. Last time I mentioned using JSE's parse tree. Here's what I came up with last night.

Consider the following program, with the macro 'unless',

unless (world.isOver()) { System.out.println("The world is not over, yet."); }

A reasonable macro expansion would be,

if (!world.isOver()) { System.out.println("The world is not over, yet."); }

To get things started, I first created an XMLReader which serializes JSE's AST. Its output for the above program looks like so,

<Program> <Identifier>unless</Identifier> <Parens> <Identifier>world</Identifier> <Dot/> <Identifier>isOver</Identifier> <Parens/> </Parens> <Braces> <Identifier>System</Identifier> <Dot/> <Identifier>out</Identifier> <Dot/> <Identifier>println</Identifier> <Parens> <String>The world is not over, yet.</String> </Parens> <Semicolon/> </Braces> </Program>

As you can imagine, the above XML leaves out much of what the AST contained. That is, I didn't bother outputting source coordinates, and some semantic information (for instance, nodes on the AST knew whether they represented the end of a clause or not). For now, I decided to just spit out enough to allow me to reconstituted the original java program.

Also notice that the tree is pretty well unstructured. Only the most fundamental bits of the syntax rules (braces, parenthesis and brackets must match) are reflected in the structure. Maybe this is the most we can hope for, since we don't know anything else about the semantics of program until we start matching macros?

Transforming

I decided to use XSL to convert the above tree into a similarly unstructured representation of the program, but with the macros expanded. That is, I decided that the XSL will output an XMLized AST that is identical to the AST of the expanded program. In other words, nothing fancy is going on..

So, by default I structured the XSL so it visits every node in order, from top to bottom, depth first, and outputs the nodes as it finds them. The template for the 'unless' macro matches at the point of reaching the unless identifier, by using the following XPath,

Identifier[. = 'unless']/*[1][Parens]

This says, "match the identifier 'unless' which is followed immediately by a pair of parenthesis". The output of the template is like this,

<Identifer>if</Identifer> <Parens> <Identifier>!</Identifier> ... expansion of the original expression ... </Parens> ... expansion of what follows the 'Parens' node ...

If I was to write this template in JSE notation it would look like this,

case #{ unless (?:expression) }: return #{ if (!?:expression) };

Actually, my template is a little more general, since it doesn't care whether the 'unless' expression is actually an expression, but only that it is followed by parentheses. This points to something larger. Using the current representation of the source code, I can't easily do much better. That is, I can't easily express, in XPath, the fact that inside the parentheses I should find an expression. Why? because expressions aren't marked as expressions in the AST, instead they're represented as unstructured goo. This came up again when I tried to write a template for a more complicated macro...

Traversing properly

The last thing I want to point out about this approach is an implementation detail. In order to get the proper traversal behaviour I created a general template like so:

<xsl:template match="*"> <xsl:element name="{name()}"> <xsl:apply-templates select="child::*[1] | text()"/> </xsl:element> <xsl:apply-templates select="following-sibling::*[1]"/> </xsl:template>

This captures the "top down, depth-first, and output each node" behaviour. But notice that it does it in a peculiar way. Each time we match an element we explicitly call apply-templates to match its next sibling. This is in opposition to calling apply-templates on all of its children.

We do this is because there is an interplay between a macro transformation and the default traversal of the XSL. We don't know ahead of time whether a child of an element needs to be expanded until we've inspected the child's previous siblings. That is, when doing a macro transformation we need to add bits into the result tree and we need to control how and where different bits of the source tree are further expanded.

For instance, without the 'unless' macro, the default behaviour is to visit the 'unless' identifier, then visit the following 'Paren', then the 'Braces', and so on. But this isn't what we want to have happen when we transform the 'unless' macro. What we want is to output an 'if' and 'not' operator, then visit the inside of the 'Paren', and then visit the following 'Braces'.

So this is what the 'unless' template turned out looking like,

<xsl:template match="Identifier[. = 'unless']/*[1][Parens]"> <Identifer>if</Identifer> <Parens> <Identifier>!</Identifier>  <xsl:apply-templates select="*[1]/*[1]"/> </Parens>  <xsl:apply-templates select="following-sibling::*[2]"/> </xsl:template>

Blah.

Questions and thoughts

So, I have to say my head is slightly spinning with all of this. I could go on and on, but instead I'll just jot down a few questions and thoughts I have.

Q: Matching on type information.. Hard to do. It's hard, because we don't have any type information in our tree, as I mentioned above. How would we write a template for the JSE macro template,

using (?:type ?:name = ?:expression) ?:statement

The best we could with an XPath seems to be something like,

Identifier[. = 'using'] [following-sibling::*[1][name() = 'Parens'] [Identifier[1] and Identifier[2] and Identifier[3][. = '='] and *[4] and not(*[5])] ]

Yuck. This doesn't work entirely because *[4] matches any single element, so it could be a set of Parens, Braces, Braces, Comma, Semicolon, Identifier, ... Yuck, again. And how do we match the following statement?

Q: Macro uses expanding to macro uses... Imagine the following macro (expressed as a JSE template),

case #{ unless (?:expression) ?statement; }: return #{ if (!?:expression) { using (...) { ... ?statement; } } };

Well, whatever, the basic idea is that the 'unless' macro transformer produces the use of a 'using' macro. How do we deal with that? In XSL-speak, the bit of the tree holding the 'unless' expansion is the result tree, and a template is not allowed to operate on the result tree. To operate on the result tree (conceptually) we could run the stylesheet again over its own output. But what if a macro use expanded to a macro use which expanded to a macro use...?

# posted 4:52 PM.

Structured change detection

yum.

# posted 12:59 AM.

Jejune Ejen

yep, Ejen does the whole pipeline of XSLT thing pretty well, and they do have a component for generating SAX events from an ANTLR parse tree, which all seems like a good idea.. but the real headscratching i'm doing seems to be over what sort of representation of the parse tree makes sense to operate on. i won't go into detail right now, but the gist of it is that since we are trying to operate on a parse tree of a program that can contain new constructs (to be expanded as macros) we need to have a principled way to represent those new constructs without knowing anything about the macro syntax itself.

JSE parses and matches successfully , so i'll have to look more into how they do it, and maybe use their AST (of course, their matching strategy might be more complex than what can be done nicely with XSL). hrm...

# posted 9:58 PM.

out there *

What else is out there on using XSL to generate code (in this case, generate code by expanding macros)? Lots. GIYF. Try searching for XSL code generation or something. And then take a look at Ejen. I quote,

[Ejen] allows the use of almost any kind of text file as input, without the need of developing lengthy and complex Java extensions. The use of the Antlr compiler compiler, when corresponding grammars ("Chomsky grammars") are provided, allows the transformation of those files into XML tree representations. Resulting XML trees may then be used in the generation process, as well as other native XML files.
Ejen is not therefore by itself a code generator. It is rather a system based on Java/XML/XSLT technologies that makes the actual creation of code generators easier and faster.

Coolness. It's always fun (though, I have to admit, slightly disappointing) to find a big chunk of the work started.

* the only reason I've taken to putting titles on my posts (something I find pretty silly) is that the hacked rss stream (look left) takes the first line as the title.

# posted 6:51 PM.

the story so far...

this hasn't been the week to blog, has it? I'm going to try to run down everything i've been up to over the past week, very quickly. i've been looking in to JSE, the Java Syntactic Extender. It's trying to do what define-syntax does for Scheme. I played with the JSE, b uilt a few silly macros, and looked into its implementation a bit. I downloaded ANTLR and created a small program to dump its parse tree to XML. Built a silly plugin to eclipse to dump it's internal AST to xml (just handled method calls so far. having no experience writing a plugin i was impressed that i could do it in the hour before i left for work one morning. eclipse + 1). Been reading more about syntax-rules implementations in scheme , as well as interpreter implementations (i'll post the links tonight) -- in particular I've been looking at how SISC (a java implementation of R5RS scheme) is put together.

Been thinking about being able to manipulate java more like how one can manipulate scheme. that is, where the program and all of the bits that make it up could be treated as first class objects. Hopefully I'll take the time tonight to post some examples.

Oh, and what's up with moon?

# posted 9:08 AM.

grammar book

Monday, March 29, 2004

Saturday, March 13, 2004

XAST

Transforming

Traversing properly

Questions and thoughts

Sunday, March 07, 2004

Structured change detection

Thursday, March 04, 2004

Jejune Ejen

Tuesday, March 02, 2004

out there *

Monday, March 01, 2004

the story so far...

Links

archives