XML Schemas and UML Class Diagrams

Draft May 18, 2000

XML and UML are both important technologies, and their optimal integration requires some analysis. We discuss our point of view on the integration of XML schemas and UML class diagrams. Our view is that either

It is unfortunate that the XML schema notations cannot express what is common in UML class diagrams: names of parts (instance variables or data members). It is unfortunate that the UML class diagram notation cannot express what is common in XML schemas: the ordering of the parts. This mismatch could have been easily avoided so that there would be one common notation to express XML schemas as well as UML class diagrams.
This paper explores what such a common notation could look like.

This note is organized as follows. First we compare XML schemas and UML class diagrams and then we sketch translation algorithms in both directions.

                    defines object set   defines document set   plays grammar role
XML schema          partially***         yes                    yes
UML class diagram   yes                  no                     no
*** XML does not support naming the parts with names other than the type names. If all part types per class are unique, then the schema defines an object set where all parts can be uniquely identified by name.

Because both XML schemas and UML class diagrams define sets of objects, a natural question arises: how should an organization manage its XML schema repository and its UML class diagram repository? From the above discussion it becomes apparent that there is a danger of significant overlap between the XML schemas and the UML class diagrams that are needed for the same application.

Because XML schemas are more expressive than UML class diagrams, it would be natural to start the modeling with XML schemas. However, the XML schema notations currently have some deficiencies that keep them from being an ideal object modeling notation. Workarounds are available, and we expect those deficiencies to disappear over time (written in May 2000). Note that the XML schema notation should also be used to describe "functional" objects, such as visitor objects, that are not directly related to business concepts. The XML schema should be written with the intent that it will be used to implement the functionality of the application, not just to describe the structure of the business data.

XML schemas have the essential capabilities to model class structures. The essential capabilities are:

To weigh the advantages and disadvantages of XML schemas as an object modelling notation, consider prefix expressions. A prefix expression (Exp) is either Simple or Compound. A Simple expression is just a Number. A Compound expression consists of an Operator and two arguments that are themselves expressions (Exp). An operator (Op) is either an addition operator (Add) or a multiplication operator (Mul). In the following we use a tool called XML Authority from Extensibility Inc. to create schemas graphically and to print them in a schema notation known as DTD. The above description of prefix expressions is expressed by the following schema:

<!ELEMENT Compound  (Op , Exp , Exp )>
<!ELEMENT Simple  (Number )>
<!ELEMENT Exp  (Simple | Compound )>
<!ELEMENT Op  (Add | Mul )>
<!-- Add and Mul carry no content (assumed) -->
<!ELEMENT Add  EMPTY>
<!ELEMENT Mul  EMPTY>
<!ELEMENT Number  (#PCDATA )>

The following is an example of a document that describes the prefix expression * 3 5 (3*5 in ordinary notation).

<Compound><!-- (Op , Exp , Exp )-->
 <Op><!-- (Add | Mul )--><Mul/></Op>
 <Exp><!-- (Simple | Compound )-->
  <Simple><!-- (Number )--><Number>3</Number></Simple>
 </Exp>
 <Exp><!-- (Simple | Compound )-->
  <Simple><!-- (Number )--><Number>5</Number></Simple>
 </Exp>
</Compound>
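
To illustrate how a document that conforms to the schema can be turned into a computation, the following is a minimal Python sketch that parses a document for * 3 5 and evaluates it. It assumes that Add and Mul are empty elements nested inside Op; the function name evaluate and the constant DOCUMENT are hypothetical and are not part of the original note.

```python
import xml.etree.ElementTree as ET

# Hypothetical complete document for "* 3 5". Add and Mul are assumed
# to be empty elements inside Op.
DOCUMENT = """
<Compound>
  <Op><Mul/></Op>
  <Exp><Simple><Number>3</Number></Simple></Exp>
  <Exp><Simple><Number>5</Number></Simple></Exp>
</Compound>
"""

def evaluate(element):
    """Evaluate an Exp, Simple, or Compound element recursively."""
    if element.tag == "Exp":
        # Exp is a pure alternation: delegate to its single child.
        return evaluate(element[0])
    if element.tag == "Simple":
        return int(element.find("Number").text)
    if element.tag == "Compound":
        op, left, right = list(element)
        operator = op[0].tag  # "Add" or "Mul"
        a, b = evaluate(left), evaluate(right)
        return a + b if operator == "Add" else a * b
    raise ValueError(f"unexpected element: {element.tag}")

root = ET.fromstring(DOCUMENT)
result = evaluate(root)
print(result)
```

The evaluator follows the grammar role of the schema: each alternation (Exp, Op) becomes a dispatch on the child's tag, and each sequence (Compound) becomes positional access to the parts.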

The diagram shown above is very close to a UML class diagram for representing prefix expressions. The nodes are classes and the edges show relationships between classes. The rectilinear connections show directed associations from left to right. The connections from Op to Add and Mul and from Exp to Simple and Compound are inheritance edges. There is one detail missing (besides the missing edge from Compound to Exp): the association ends are missing in the schema. We would like to say that a Compound expression has two subexpressions called argument1 and argument2, both of type Exp. Unfortunately, we cannot express this in the schema, while it can easily be expressed in a UML class diagram. The workaround would be to introduce two extra elements, called Argument1 and Argument2, and to define each of them to contain an Exp. This introduces two extra nodes and two extra edges in the schema, which is undesirable. We call this problem the PartNaming problem of XML schemas.
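
The PartNaming problem can be made concrete with a small Python sketch. The classes below carry the part names argument1 and argument2 directly, as a UML class diagram would, while the serializer has to emit the extra Argument1 and Argument2 wrapper elements described above. The class layout and the function to_xml are illustrative assumptions, not the author's implementation.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

class Exp:
    """Abstract class: an Exp is either Simple or Compound."""

@dataclass
class Simple(Exp):
    number: int

@dataclass
class Compound(Exp):
    op: str          # "Add" or "Mul"
    argument1: Exp   # named part, as at a UML association end
    argument2: Exp

def to_xml(exp):
    """Serialize using the Argument1/Argument2 wrapper workaround."""
    if isinstance(exp, Simple):
        simple = ET.Element("Simple")
        ET.SubElement(simple, "Number").text = str(exp.number)
        return simple
    compound = ET.Element("Compound")
    ET.SubElement(ET.SubElement(compound, "Op"), exp.op)
    named_parts = (("Argument1", exp.argument1), ("Argument2", exp.argument2))
    for name, part in named_parts:
        wrapper = ET.SubElement(compound, name)  # extra node per part name
        ET.SubElement(wrapper, "Exp").append(to_xml(part))
    return compound

expression = Compound("Mul", Simple(3), Simple(5))
xml_text = ET.tostring(to_xml(expression), encoding="unicode")
print(xml_text)
```

In the class definitions the part names cost nothing, whereas in the document each name shows up as an additional element level.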

Besides the PartNaming problem, which creates systematic differences between XML schemas and UML class diagrams, there is the ObjectLinking problem, which also creates differences. Consider the following XML schema that describes a network of partners using a graph structure with labels on edges (LinkInfo) and nodes (PartnerInfo).

The above schema defines documents that define Partner structures referring to partners using the PartnerId in the PartnerLinkInfo objects. In a UML class diagram, we would like to represent the Partner structures as a linked structure, which means that PartnerId in PartnerLinkInfo should be replaced by Partner.
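
A hedged Python sketch of such object linking follows. Only the element names Partner, PartnerId, PartnerInfo, PartnerLinkInfo and LinkInfo come from the text; the root element Network, the nesting of the elements, the sample data, and the names Link and load_linked are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical document layout: the root element Network and the nesting
# are assumptions, as is the sample data.
DOCUMENT = """
<Network>
  <Partner>
    <PartnerId>p1</PartnerId><PartnerInfo>Supplier</PartnerInfo>
    <PartnerLinkInfo>
      <PartnerId>p2</PartnerId><LinkInfo>ships to</LinkInfo>
    </PartnerLinkInfo>
  </Partner>
  <Partner>
    <PartnerId>p2</PartnerId><PartnerInfo>Retailer</PartnerInfo>
  </Partner>
</Network>
"""

@dataclass
class Link:
    info: str
    partner: "Partner"   # direct object reference instead of a PartnerId

@dataclass
class Partner:
    partner_id: str
    info: str
    links: list = field(default_factory=list)

def load_linked(text):
    root = ET.fromstring(text)
    # Pass 1: create one Partner object per element, indexed by id.
    partners = {}
    for p in root.findall("Partner"):
        pid = p.findtext("PartnerId")
        partners[pid] = Partner(pid, p.findtext("PartnerInfo"))
    # Pass 2: replace each PartnerId reference by the object it names.
    for p in root.findall("Partner"):
        source = partners[p.findtext("PartnerId")]
        for link in p.findall("PartnerLinkInfo"):
            target = partners[link.findtext("PartnerId")]
            source.links.append(Link(link.findtext("LinkInfo"), target))
    return partners

partners = load_linked(DOCUMENT)
print(partners["p1"].links[0].partner.info)
```

Two passes are used so that a link may refer to a partner that appears later in the document.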

XML schemas have the essential capabilities to model class structures. They can be translated to UML class diagrams by using the following systematic process:

  1. Eliminate superfluous elements introduced because of the PartNaming problem.
  2. Introduce object linking where it is needed for efficiency reasons. The linked objects can be automatically created from the parsed XML documents by using a suitable tool.
  3. Identify the abstract classes in the XML schema and mark them in the UML class diagram.
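
Step 3 can be sketched in Python for the prefix expression schema. The sketch assumes that an element whose content model is a pure alternation corresponds to an abstract class and that any other element corresponds to a concrete class. It handles only flat content models like those in the DTD shown earlier (no nesting, no mixed content); the names ELEMENT and classify are hypothetical.

```python
import re

DTD = """
<!ELEMENT Compound  (Op , Exp , Exp )>
<!ELEMENT Simple  (Number )>
<!ELEMENT Exp  (Simple | Compound )>
<!ELEMENT Op  (Add | Mul )>
<!ELEMENT Number  (#PCDATA )>
"""

# Matches declarations with one flat, parenthesized content model.
ELEMENT = re.compile(r"<!ELEMENT\s+(\w+)\s*\(([^)]*)\)\s*>")

def classify(dtd):
    """Map each element name to (kind, members).

    kind is 'abstract' for an alternation (B1 | B2 | ...) and
    'concrete' for a sequence (B1, B2, ...).
    """
    classes = {}
    for name, model in ELEMENT.findall(dtd):
        if "|" in model:
            members = [m.strip() for m in model.split("|")]
            classes[name] = ("abstract", members)
        else:
            members = [m.strip() for m in model.split(",")]
            classes[name] = ("concrete", members)
    return classes

classes = classify(DTD)
for name, (kind, members) in classes.items():
    print(f"{name}: {kind} {members}")
```

For this schema the sketch marks Exp and Op as abstract, with their alternatives as subclasses, and treats Compound, Simple and Number as concrete.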

From UML class diagrams to XML schemas

It is useful to translate UML class diagrams to XML schemas, provided the UML class diagrams have been written with the purpose in mind that they will play a grammar role. Diagrams written with such an intent can be easily translated. What restrictions must a UML class diagram satisfy so that it is easily translated into an XML schema?

The translation algorithm roughly proceeds as follows:

  1. Flatten the UML class diagram, i.e., push all parts of abstract classes down to concrete classes.
  2. Replace each undirected association by two directed associations.
  3. Follow the ordering of parts and the ordering policy.
  4. Translate each concrete class A into a schema element of the form A (B1, B2, B3, ...).
  5. Translate each abstract class A into a schema element of the form A (B1 | B2 | B3 | ...).
  6. Represent an optional part B as B? and a repeated part B as B+ or B*.
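
The translation can be sketched in Python under several stated assumptions: the class diagram is given as a dictionary (MODEL, a hypothetical encoding with made-up classes Shape, Circle and Polygon), only abstract classes have subclasses, the ordering policy places inherited parts before a class's own parts, and associations are already directed. The sketch covers flattening and the generation of element declarations only.

```python
# Hypothetical mini-model of a class diagram: each class lists its ordered
# parts as (type, multiplicity) pairs and its direct subclasses.
MODEL = {
    "Shape": {"abstract": True, "parts": [("Label", "0..1")],
              "subclasses": ["Circle", "Polygon"]},
    "Circle": {"abstract": False, "parts": [("Point", "1"), ("Radius", "1")],
               "subclasses": []},
    "Polygon": {"abstract": False, "parts": [("Point", "1..*")],
                "subclasses": []},
}

# UML multiplicity to DTD occurrence indicator.
SUFFIX = {"1": "", "0..1": "?", "1..*": "+", "0..*": "*"}

def flatten(model):
    """Push the parts of abstract classes down to their subclasses."""
    inherited = {name: [] for name in model}

    def push(name, parts):
        inherited[name] = parts
        for sub in model[name]["subclasses"]:
            push(sub, parts + model[name]["parts"])

    subclasses = {s for cls in model.values() for s in cls["subclasses"]}
    for root in set(model) - subclasses:
        push(root, [])
    # Assumed ordering policy: inherited parts first, then own parts.
    return {name: inherited[name] + model[name]["parts"] for name in model}

def to_dtd(model):
    parts = flatten(model)
    lines = []
    for name, cls in model.items():
        if cls["abstract"]:
            content = " | ".join(cls["subclasses"])
        else:
            content = " , ".join(t + SUFFIX[m] for t, m in parts[name])
        body = f"({content})" if content else "EMPTY"
        lines.append(f"<!ELEMENT {name} {body}>")
    return "\n".join(lines)

dtd = to_dtd(MODEL)
print(dtd)
```

Note that the part Label of the abstract class Shape no longer appears in the declaration of Shape itself; after flattening it shows up as the first part of Circle and Polygon.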


Given the current state of the art of XML and UML technology, it seems useful to develop an integration of XML schemas and UML class diagrams. The combined notation should start either with an XML Schema Notation or the XMI notation (or similar notation) for class diagrams and extend it with the missing information. If we start with XML schemas, we need to add part names. If we start with UML class diagrams we need to add ordering information and we need to follow a certain style.

Because it is easier to add part names to an XML schema notation than to add more information to a UML class diagram, we prefer to take XML schemas as the starting point of a design. Starting with UML class diagrams also makes sense, however.

Professor Karl J. Lieberherr
College of Computer Science, Northeastern University
Avenue of the Arts
Cullinane Hall, Boston, MA 02115-9959
Phone: (617) 373 2077 / Fax: (617) 373 5121