Pseudo-Inherited Attributes in MGrammar (@November 2009 CTP)  

Saturday, December 26 2009

The M language, and MGrammar in particular, caught my attention recently after I completed Programming Languages and Translators this past semester at Columbia. I didn’t take compilers as part of my undergraduate studies at McGill, and after completing Professor Aho’s course I am really intrigued by the possibilities offered by DSLs and DSVs.

After getting set up with .NET 4, the SQL Modelling November 2009 CTP, and Intellipad, my first inclination was to see how I could implement some classic ideas in language theory with MGrammar. One such idea is “inherited attributes.” An inherited attribute is a value that is captured or calculated at a parent node in an abstract syntax tree and then passed down to child nodes for any calculations or translations that they may need to perform. A classic example of this would be the “with” keyword that is available in JavaScript:


with (myObj) {
  x += 3;
  y += 3;
}

This would be equivalent to:


myObj.x += 3;
myObj.y += 3;


This may look trivial at first, but let’s take a closer look at a possible abstract syntax for the first production:

[Figure: With Statement AST]

You can see from the AST that by the time anyone walking the AST reaches the node for the statement “x += 3” they do not have the context for that statement and cannot generate the code for the second example “myObj.x += 3.”
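To make the missing context concrete, here is a minimal JavaScript sketch of such a walk. The node shape (`type`, `context`, `statements`, `id`, `op`, `value`) and the function names are hypothetical, chosen only for illustration; they are not produced by MGrammar or any other tool. In a hand-written walker, an inherited attribute amounts to an extra argument supplied by the parent.

```javascript
// Hypothetical AST for: with (myObj) { x += 3; y += 3; }
const ast = {
  type: "With",
  context: "myObj",
  statements: [
    { type: "Statement", id: "x", op: "+=", value: "3" },
    { type: "Statement", id: "y", op: "+=", value: "3" },
  ],
};

// Visiting a statement node on its own: the enclosing object is unknown,
// so only the unqualified statement can be generated.
function emitBare(stmt) {
  return `${stmt.id} ${stmt.op} ${stmt.value};`;
}

// The same node with the context handed down by its parent
// (the inherited attribute) can be fully qualified.
function emitStatement(stmt, context) {
  return `${context}.${emitBare(stmt)}`;
}

// The parent node owns the context and passes it to every child.
function emitWith(node) {
  return node.statements.map((stmt) => emitStatement(stmt, node.context));
}

console.log(emitBare(ast.statements[0])); // x += 3;
console.log(emitWith(ast).join("\n"));    // myObj.x += 3; then myObj.y += 3;
```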

The solution to this problem is for the parent node to pass down the contextual information to its child nodes for later use. After reading the “M” Modeling Language Specification, I found a solution that I am calling a pseudo-inherited attribute. I prefix it with “pseudo” because, from what I have read in the language specification, true “inherited attributes” are not possible. Consider what Microsoft calls “Rule Parameters” in the M language (from Section 5.5):


A rule may define parameters which can be used within the body of the rule. A single rule identifier may have multiple definitions with different numbers of parameters. The following example uses List(Content,Separator) to define List(Content) with a default separator of ",".

module HelloWorld {
  language HelloWorld {
    syntax Main
      = List(Hello);
    token Hello
      = "Hello";
    syntax List(Content, Separator)
      = Content
      | List(Content,Separator) Separator Content;
    syntax List(Content) = List(Content, ",");
  }
}

Did you notice the problem? What gets passed down through a Rule Parameter is a rule definition, not the value captured by a particular instance of a syntax or token rule. Not following me? Let me illustrate with a more concrete example:


syntax Main = with:WithStatement(Id) => with;

syntax WithStatement(Context) 
  = "with" context:Context '{' statements:Statement* '}' => With { Context { context }, Statements { statements } };

syntax Statement
  = id:Id op:AssignmentOperator val:Number ";" => Statement { Id { id }, Operator { op }, Value { val } };

token Id = (Upper | Lower | '_') (Upper | Lower | Digit | '_')*;

This looks like it performs the task that I described above, but unfortunately it does not. Consider the input:


with myObj {
   x += 3;
   y += 3;
}

Which produces:


With{
  Context => "myObj",
  Statements => [
    Statement{
      Id => "x",
      Operator => "+=",
      Value => "3"
    },
    Statement{
      Id => "y",
      Operator => "+=",
      Value => "3"
    }
  ]
} 

We have successfully stored our contextual information, but we were unable to pass it down to the child node! We could modify the productions like so:


syntax WithStatement(Context) 
  = "with" context:Context '{' statements:Statement(Context)* '}' => With { Context { context }, Statements { statements } };

syntax Statement(Context)
  = id:Id op:AssignmentOperator val:Number ";" => Statement { Id { id }, Operator { op }, Value { val } };

You can see that we have added a Rule Parameter named Context to the Statement syntax rule so that we may pass our contextual information down. This is where the breakdown occurs. In the new WithStatement notice that we pass “Context” and not “context” as a parameter to the Statement rule. That is, we are passing down the definition of the rule, and not the value that was captured at our level of the tree.
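The distinction can be sketched outside of MGrammar with a few parser combinators in JavaScript. This is only an analogy under my own assumptions: the helpers `token` and `seq`, and the two statement builders, are hypothetical names written for this example and are not part of M or any library. One builder receives a rule (a parser function), which is roughly what a Rule Parameter gives you; the other receives the captured text, which is what an inherited attribute would give you.

```javascript
// A parser is a function (input, pos) => { value, pos }, or null on failure.
// token() skips leading whitespace and then matches an anchored regex.
const token = (re) => (input, pos) => {
  const rest = input.slice(pos);
  const trimmed = rest.trimStart();
  const skipped = rest.length - trimmed.length;
  const m = re.exec(trimmed);
  return m ? { value: m[0], pos: pos + skipped + m[0].length } : null;
};

// Run parsers one after another, collecting their values.
const seq = (...parsers) => (input, pos) => {
  const values = [];
  for (const p of parsers) {
    const r = p(input, pos);
    if (!r) return null;
    values.push(r.value);
    pos = r.pos;
  }
  return { value: values, pos };
};

const id = token(/^[A-Za-z_][A-Za-z0-9_]*/);
const op = token(/^\+=/);
const num = token(/^[0-9]+/);
const semi = token(/^;/);
const body = seq(id, op, num, semi);

// Rule-parameter style: `contextRule` is a parser, i.e. a rule definition.
// It says how to match a context, not which context was matched.
const statementGivenRule = (contextRule) => (input, pos) => {
  const r = body(input, pos);
  if (!r) return null;
  const [i, o, n] = r.value;
  return { value: { received: typeof contextRule, code: `${i} ${o} ${n};` }, pos: r.pos };
};

// Inherited-attribute style: `contextValue` is the captured text itself.
const statementGivenValue = (contextValue) => (input, pos) => {
  const r = body(input, pos);
  if (!r) return null;
  const [i, o, n] = r.value;
  return { value: `${contextValue}.${i} ${o} ${n};`, pos: r.pos };
};

console.log(statementGivenRule(id)("x += 3;", 0).value);
console.log(statementGivenValue("myObj")("x += 3;", 0).value);
```

The first call can only report that it was handed a function; the second has the string "myObj" in hand and can emit the qualified statement.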

This kind of parameter passing is available in ANTLR with a reasonably straightforward syntax:


withStatement 
    : 'with' ID '{' (statements += statement[$ID.text])* '}'
      -> ^(WITH ^(STATEMENTS $statements*))
    ;

statement[String with]
    : id = ID op = ASSIGNMENT_OPERATOR val = NUMBER ';'
    ;

You can see that the withStatement rule passes down the text matched by the ID token. This value can later be consumed by the statement rule without requiring it on the left-hand side of the production.
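For readers who do not use ANTLR, the same flow can be sketched as a small hand-written recursive-descent parser in JavaScript. Everything here (`tokenize`, `parseWith`, and the string output format) is a hypothetical illustration under my own assumptions, not what ANTLR generates. It accepts the brace-only form of the input used earlier, and it does not validate token kinds, for brevity.

```javascript
// Split the source into identifiers, numbers, "+=" and single punctuation.
function tokenize(src) {
  return src.match(/[A-Za-z_][A-Za-z0-9_]*|[0-9]+|\+=|[{};]/g) || [];
}

function parseWith(src) {
  const tokens = tokenize(src);
  let i = 0;

  // Consume one specific token or fail.
  const expect = (t) => {
    if (tokens[i] !== t) {
      throw new Error(`expected ${t} but found ${tokens[i]}`);
    }
    i++;
  };

  // Counterpart of statement[String with]: the context arrives as an argument.
  function statement(context) {
    const id = tokens[i++];
    const op = tokens[i++];
    const val = tokens[i++];
    expect(";");
    return `${context}.${id} ${op} ${val};`;
  }

  // Counterpart of withStatement: capture the ID text, then pass it down.
  function withStatement() {
    expect("with");
    const context = tokens[i++];
    expect("{");
    const out = [];
    while (tokens[i] !== "}") {
      out.push(statement(context));
    }
    expect("}");
    return out;
  }

  return withStatement();
}

console.log(parseWith("with myObj { x += 3; y += 3; }").join("\n"));
```

The captured identifier is an ordinary local variable in `withStatement`, so handing it to `statement` is just a function call, which is the capability the MGrammar version lacks.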

I have big hopes for the M language and MGrammar, but I hope that Microsoft can be more forthright about how to implement classic language translation patterns as well as what’s under the hood. After some searching I discovered that the MGrammar files generate a GLR parser, but how about some more refined concepts such as the one I’ve discussed here?

As always, this code is available under the MIT license on GitHub. If this interested you, check back soon. There is an upcoming series of posts comparing ANTLR and MGrammar in depth as part of the implementation of the new Celluloid programming language.


  • Posted by Charlie Robbins
