XML-Integrated Languages and Demeter Interfaces

The advent of XML has created strong interest in pattern matching for trees and in techniques for selecting a part of a tree when a match is successful. Both operations (matching and selection) can be made more efficiently by using meta information about the trees. We call meta information about the trees a schema and this leads to schema-aware matching and selection.

Example: Consider an XPath expression .//B//C where the context node is A. We write this also as A//B//C. The pattern matching question is: does a given object (document) satisfy A//B//C? If it does, the selction question is to get the subtree (or its leaves) selected by this pattern. In the presence of a schema the pattern matcher can decide faster in which parts of the document there is possibility for success.

The selection operations are typically embedded in programs. For example, XSLT uses XPath as naviagtion language. More recently, XLinq - Language Integrated Query for XML, http://www.xlinq.net/, a Microsoft project, provides integration of selector operations in C# and VB. IBM also has a project in this space: XML Enhancements for Java (XJ) are a set of extensions to Java that integrate support for XML, XML Schema and XPath into the language. See http://www.alphaworks.ibm.com/tech/xj. There are numerous other projects that integrate XML schemas and XPath-like capabilities with a programming language, e.g. XDuce and CDuce. We call such languages XML integrated languages.

A fundamental problem that all those projects face, but have not addressed yet, is the Demeter Evolution Problem (DEP). Because programs written in XML integrated languages are effectively parameterized by a schema, they may "work" for a family of schema. This feature makes XML integrated languages both attractive and dangerous. Consider a purchase order of some unknown structure that contains items having a price and a quantity and our task is to compute the total price for the purchase order. In Demeter (either DJ, DAJ or DemeterJ), you would use a traversal strategy "from PO to item" (equivalent to PO//item) and at each item object add quantity * price to a total. In XLinq we can program Demeter style as follows, using the descendant axis:

Function ComputeTotal(ByVal PO As XElement) As Double
    For Each Item As XElement In PO.Descendants("item")
      Dim Price As Double = Item.Element("price")
      Dim Quantity As Integer = Item.Element("quantity")
      Total += Quantity * Price
  End Function 
From: http://www.research.microsoft.com/%7Eemeijer/Papers/XMLRefactored.html

Function ComputeTotal is the quintessential example of a structure-shy program which does not hard-wire the schema in the program. See Adaptive Programming for many more examples.

DEP plays a role as follows: The schema for purchase orders might change in such a way that "ComputeTotal" still "works" but produces wrong results. Instead of using PO//item it might be necessary to use a refinement such as PO//Active//item.

The purpose of Demeter interfaces is to mitigate the effects of DEP. This is achieved in two ways: (1) by asking the programmer to write the XML program against a Demeter interface which defines constraints that the schema must satisfy. And (2) by providing tool support to warn the programmer of schema changes that violate the interface and by warning the programmer of changes to the structure of traversals based on transitioning from schema g to g' (even when interface constraints are satisfied).

While Demeter interfaces mitigate the danger in XML programs, they are not perfect and can never be made perfect if we don't want to give up the advantages of structure-shy programming. That is why it is important to report changes in traversal structure when we move from schema G to schema G' (even when the constraints are satisfied or there are no constraints for that particular traversal).

References: trx: Regular-tree expressions, now in Scheme by Ilya Bagrak and Olin Shivers, Workshop on Scheme and Functional Programming, 2004.

Demeter Home Page

Professor Karl J. Lieberherr
College of Computer and Information Science, Northeastern University
360 Avenue of the Arts
Boston, MA 02115-9959
Phone: (617) 373 2077 / Fax: (617) 373 5121

Written Dec. 16, 2005.