Constraints (input)

What We Will Cover

Last Week: Examples as Input

Last week we took example instance documents as input, and we produced DTD and Schema descriptions as output. This is a pretty reasonable way to design XML, particularly if you need to convert or otherwise work with "legacy data" which will be converted (perhaps in "batch mode") to XML form.

This Week: Examples as Output

This week we are going to reverse the paradigm. We are going to use a tool to develop a Schema as descriptive input, and we are going to produce example instance documents as output. This is also a pretty reasonable way to design XML, particularly if you want to start out by trying various approaches to the data structure. With this approach, you can get "reality checks" along the way by generating example instance documents. You can examine examples periodically during the design phase, and test to see if they "make sense".

DTD and XML Schema

XML Schema is an XML vocabulary to describe XML documents. Using XML to describe XML is easier to work with than using the idiosyncratic notation and vocabulary of DTD. XML Schema "subsumes" (provides the same functionality as) ELEMENT and ATTLIST from DTD.

DTDs are primarily based on tag names, while XML Schema is based primarily on type definitions. This simplifies working with or adapting to existing ("legacy") systems. DTDs are blind to namespace issues, while XML Schema is very much aware of namespaces. This is how we are able to build compound documents in XML, which is (as we shall see later in the quarter) a very powerful capability to have.

Because XML Schema documents are just XML documents, we can use the same technology to parse and manipulate XML Schema documents as we use with any other XML instance document. This makes building and using tools much easier.

This week we are going to focus on XML Schema. From this point forward we will use XML Schema and not DTD to define our documents.

A Side Note

Why do I keep writing "documents" instead of simply documents? To remind you that an XML "document" is simply an input stream, and may come from anywhere you can get an input stream. An XML "document" may or may not actually be what we think of as a document in everyday speech. I will stop putting quotes around document from this point forward, assuming you will keep in mind that an "instance document" may or may not actually come from a "document", per se.

Outline

  1. XML Schema Basics
  2. Declaring Elements
  3. Simple Types
  4. Complex Types
  5. Content Models & Particles
  6. Attributes
  7. Extensibility
  8. References & Uniqueness Constraints
  9. Exercises
  10. Readings

Schema Basics

If we are using XML Schema, remember there will always be at least two documents involved:

  1. at least one XML instance document
  2. at least one XML Schema document

You can think of the relationship between an XML instance document and its XML Schema as like the relationship between an object and the class that defines it in object-oriented programming. An XML Schema describes the structure and constraints (types) of an XML instance document.

Namespaces

The Schema spec itself uses namespaces.

Open up your XML tool now, and tell it to create a new (empty) XML Schema. You should get something like this:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

</xs:schema>

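This is saying that the document is XML 1.0 encoded as UTF-8, that its root element is schema, and that the xs: prefix is bound to the XML Schema namespace, http://www.w3.org/2001/XMLSchema.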

Optional Target Namespace

You can also (optionally) declare a targetNamespace for a Schema.

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  targetNamespace="urn:schema-library-org:holdings">

</xs:schema>

This would let a corresponding XML instance document place its elements in that namespace with something like the following:

  <myLibrary xmlns="urn:schema-library-org:holdings" />

We probably will not use that option very often in this course.
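
As a rough sketch of how the two documents might line up when a targetNamespace is in play: the Schema below declares myLibrary in the target namespace, and the instance then points back at it. The file name holdings.xsd and the choice of xs:string as the type are assumptions made up for illustration only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Schema (saved as holdings.xsd, a hypothetical file name) -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  targetNamespace="urn:schema-library-org:holdings"
  elementFormDefault="qualified">
  <!-- myLibrary now belongs to the target namespace -->
  <xs:element name="myLibrary" type="xs:string"/>
</xs:schema>
```

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Instance: xsi:schemaLocation pairs the namespace with a Schema location -->
<myLibrary xmlns="urn:schema-library-org:holdings"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="urn:schema-library-org:holdings holdings.xsd"/>
```

Note that an instance with a target namespace uses xsi:schemaLocation (a namespace and location pair), while an instance without one uses xsi:noNamespaceSchemaLocation.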

Declaring Elements

Element declarations are pretty simple. Global element declarations happen within the schema element. Local element declarations happen as part of a type declaration or as sub-children of other element declarations. Taking another look at our Library problem from last week, we might see the following:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  elementFormDefault="qualified">
  <xs:element name="library">
    <xs:complexType>

...

    </xs:complexType>
  </xs:element>
</xs:schema>

library is "global" (declared directly under schema), and any elements declared within library are local to library.
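
To make the global/local distinction concrete, here is a minimal sketch. The element name branch is hypothetical, and making title global is an assumption for this illustration only (the examples elsewhere in these notes declare it locally).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  elementFormDefault="qualified">
  <!-- Global: a direct child of xs:schema, so it can be referenced by name -->
  <xs:element name="title" type="xs:string"/>

  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <!-- A reference to the global declaration above -->
        <xs:element ref="title"/>
        <!-- Local: declared inside library's type and usable only here -->
        <xs:element name="branch" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

One practical consequence: any global element (here, library or title) may serve as the root of an instance document, while a local element such as branch may not.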

Simple Types

There are a lot of very useful simple types defined by the XMLSchema namespace. With no default namespace declaration (often the case), you refer to the simple types and other things defined in Schema with a namespace prefix. In the above declaration we said the xs: prefix is associated with things defined in Schema, so we use those things with the xs: prefix. Looking back again at our Library problem from last week, here is what a grossly over-simplified Schema might look like:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  elementFormDefault="qualified">
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <xs:element maxOccurs="unbounded" name="book">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="title" type="xs:string"/>
              <xs:element name="author">
                <xs:complexType>
                  <xs:attribute name="last" type="xs:string"/>
                  <xs:attribute name="first" type="xs:string"/>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Notice how within the book element we have declared a child element called title as being of type xs:string.
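
xs:string is only one of the built-in simple types. The sketch below uses a few others; the child elements published, pages, price and available are hypothetical additions, not part of the Library problem as given.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  elementFormDefault="qualified">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <!-- A calendar date such as 2007-09-02 -->
        <xs:element name="published" type="xs:date"/>
        <!-- A whole number greater than zero -->
        <xs:element name="pages" type="xs:positiveInteger"/>
        <!-- A decimal number such as 19.95 -->
        <xs:element name="price" type="xs:decimal"/>
        <!-- true, false, 1 or 0 -->
        <xs:element name="available" type="xs:boolean"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```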

Complex Types

Complex types are even more interesting, as they can include a mix of (complex and simple) child elements and attributes. In the above example, a library consists of a sequence of book elements, which in turn contain other elements, one of which in turn contains attributes.
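
The example above uses anonymous complex types nested inside each element. One alternative, sketched here, is to give the complex types names and refer to them with the type attribute. The type names bookType and personType are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  elementFormDefault="qualified">
  <!-- A named complex type holding only attributes -->
  <xs:complexType name="personType">
    <xs:attribute name="last" type="xs:string"/>
    <xs:attribute name="first" type="xs:string"/>
  </xs:complexType>

  <!-- A named complex type mixing a simple child and a complex child -->
  <xs:complexType name="bookType">
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="author" type="personType"/>
    </xs:sequence>
  </xs:complexType>

  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="book" type="bookType" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

This should accept the same instance documents as the nested version; the difference is that the named types can be reused by other element declarations.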

Content Models & Particles

Complex types are composed of particles. A particle is just one of the following:

  1. a local element declaration
  2. a reference to a global element declaration (element)
  3. a compositor (sequence or choice or all)
  4. a named group (group)
  5. an element wildcard (any)

Usually, anywhere one of those five is allowed the others are also allowed.
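
The five kinds of particle listed above can be seen side by side in one content model. This is only a sketch: the group name creatorGroup and the element editor are hypothetical, and the numbers in the comments refer to the list above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  elementFormDefault="qualified">
  <xs:element name="title" type="xs:string"/>

  <!-- A named group that can be reused in several content models -->
  <xs:group name="creatorGroup">
    <xs:choice>
      <xs:element name="author" type="xs:string"/>
      <xs:element name="editor" type="xs:string"/>
    </xs:choice>
  </xs:group>

  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>                                   <!-- 3. compositor -->
        <xs:element name="isbn" type="xs:string"/>    <!-- 1. local element -->
        <xs:element ref="title"/>                     <!-- 2. global reference -->
        <xs:group ref="creatorGroup"/>                <!-- 4. named group -->
        <xs:any namespace="##other" processContents="lax"
          minOccurs="0"/>                             <!-- 5. element wildcard -->
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```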

Attributes

An attribute must be of a simple type. Note in the above example how the author element has two attributes: last and first, both of simple type string from the Schema (in this case, using prefix xs) namespace.

Remember, when an XML parser gets the beginning of an element it also gets the attributes associated with that element. It does not get the child elements or their attributes at that point, but the element's own attributes are available right away.
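
Attribute declarations can also say whether the attribute is required and what value to assume if it is left out. A minimal sketch follows; the role attribute and its default value are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  elementFormDefault="qualified">
  <xs:element name="author">
    <xs:complexType>
      <!-- Must be present on every author element -->
      <xs:attribute name="last" type="xs:string" use="required"/>
      <!-- Optional, which is what you get when use is not given -->
      <xs:attribute name="first" type="xs:string"/>
      <!-- Optional, with a value supplied when the attribute is absent -->
      <xs:attribute name="role" type="xs:string" default="writer"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```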

Extensibility

You may need to slightly modify an existing Schema to use it in a particular context. You can extend without modifying the original. There are several approaches to this:

Wildcards

You can use the any "content model particle declaration" to allow the insertion of an unanticipated element, and you can use at most one anyAttribute declaration per complex type to allow unanticipated attributes.
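
Here is a small sketch of both wildcards in one type. The namespace and processContents settings shown are just one reasonable choice, not the only one.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  elementFormDefault="qualified">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <!-- Any number of elements from some other namespace may follow -->
        <xs:any namespace="##other" processContents="lax"
          minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <!-- Attributes from some other namespace are also allowed -->
      <xs:anyAttribute namespace="##other" processContents="lax"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

With processContents="lax", the validator checks the extra content only when it can find a declaration for it, and otherwise lets it through.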

Extension & Restriction

We will look at these in depth this term. You can use namespace prefixes to build a document which extends the schemas it knows about (remember the namespace declarations?).
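
As a brief preview, here is a sketch of each. The names holdingType, bookType, ratingType and rating are all hypothetical, chosen only to show the shape of the declarations.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  elementFormDefault="qualified">
  <!-- Restriction: a narrower version of xs:integer, allowing only 1 to 5 -->
  <xs:simpleType name="ratingType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="1"/>
      <xs:maxInclusive value="5"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- A base type shared by every kind of holding -->
  <xs:complexType name="holdingType">
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Extension: everything in holdingType, followed by more children -->
  <xs:complexType name="bookType">
    <xs:complexContent>
      <xs:extension base="holdingType">
        <xs:sequence>
          <xs:element name="isbn" type="xs:string"/>
          <xs:element name="rating" type="ratingType" minOccurs="0"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:element name="book" type="bookType"/>
</xs:schema>
```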

References & Uniqueness Constraints

The types ID and IDREF give you a simple way to uniquely identify things inside an XML document. You can also do this with XPath expressions, and we will look into that more this term.

You might define an element to include an attribute like this:

  <xs:attribute name="myName" type="xs:ID"/>

You can say the following elsewhere to demand a reference to a specific named element:

  <xs:attribute name="myTarget" type="xs:IDREF" use="required"/>

Something of type IDREF always refers to something else with a unique ID.
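
Putting the two attribute declarations above into context, a Schema and a matching instance might look like the following sketch. Placing myName on author and myTarget on book is an assumption for illustration, as is the ID value flintstone-f.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  elementFormDefault="qualified">
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="author" maxOccurs="unbounded">
          <xs:complexType>
            <!-- Each value must be unique within the instance document -->
            <xs:attribute name="myName" type="xs:ID" use="required"/>
          </xs:complexType>
        </xs:element>
        <xs:element name="book" maxOccurs="unbounded">
          <xs:complexType>
            <!-- Must match some xs:ID value in the same document -->
            <xs:attribute name="myTarget" type="xs:IDREF" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

```xml
<?xml version="1.0" encoding="UTF-8"?>
<library>
  <author myName="flintstone-f"/>
  <!-- Points at the author above; an unknown value would fail validation -->
  <book myTarget="flintstone-f"/>
</library>
```

ID values have to be legal XML names, so they cannot start with a digit or contain spaces.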

Exercises

Refactor the Library from Last Week

Last week you started by building an example instance of a library book, then generalized it to cover all library holdings (CDs, DVDs, tapes, art, etc.). You then had the tool auto-generate a Schema. This week we are going to redesign it, starting with a Schema and using draft versions of it to generate example instance documents (reversing the paradigm).

Open your XML editor tool and use it to redesign your basic book Schema. Think again about what should be attributes and what should be elements. Think again about whether elements should be required or optional (using minOccurs), whether elements should appear only once or may appear more than once (using maxOccurs), and whether every child element in the list must be present (sequence with the default minOccurs and maxOccurs of 1) or some elements may be missing in a particular book (combining all with a minOccurs of 0).
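
As a quick reference for those occurrence settings, here is a sketch using a sequence. The child elements subtitle and keyword are hypothetical, and author is simplified to a plain string.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  elementFormDefault="qualified">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <!-- Exactly once: minOccurs and maxOccurs both default to 1 -->
        <xs:element name="title" type="xs:string"/>
        <!-- Optional: zero or one -->
        <xs:element name="subtitle" type="xs:string" minOccurs="0"/>
        <!-- Required and repeatable: one or more -->
        <xs:element name="author" type="xs:string" maxOccurs="unbounded"/>
        <!-- Optional and repeatable, up to a fixed limit -->
        <xs:element name="keyword" type="xs:string"
          minOccurs="0" maxOccurs="5"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```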

Here is an example from our over-simplified library... with a Schema that looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  elementFormDefault="qualified">
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <xs:element maxOccurs="unbounded" name="book">
          <xs:complexType>
            <xs:all>
              <xs:element name="isbn" type="xs:string"/>
              <xs:element minOccurs="0" name="title" type="xs:string"/>
              <xs:element name="author">
                <xs:complexType>
                  <xs:attribute name="last" type="xs:string"/>
                  <xs:attribute name="first" type="xs:string"/>
                </xs:complexType>
              </xs:element>
            </xs:all>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

you can have an instance document that looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<library xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="http://www.it.rit.edu/~jxs/classes/2006_Winter/770/03_week/examples/library_01.xsd">
  <book>
    <isbn>isbn0</isbn>
    <author first="Fred" last="Flintstone"/>
    <title>Somewhere over the Rubble</title>
  </book>
  <book>
    <isbn>isbn1</isbn>
    <author last="Flintstone" first="Wilma"/>
  </book>
</library>

Notice in the Schema the use of xs:all in defining the element book. Notice also how the element title is a child of book, and has a non-default minOccurs attribute value of 0 (zero). Using all means the child elements can occur in any order (unlike sequence, where they must appear in exactly that order).

Rebuild your library Schema from scratch in the XML tool. Save it to the usual exercises sub-directory structure for your account on grace. Generate an instance document containing at least five (5) books, which shows the in-any-order behavior described above, and which contains at least one book without an optional element. Yes, I know this is an artificial exercise. Think of it as practice.

Design a Phone Message Queue/Log

Look back at the phone message examples from week 1. Build a Schema that could serve to describe a queue or list of waiting-to-be-dealt-with phone messages. Include both mandatory and optional elements and attributes in your Schema. Allow entry of child elements in any order. Generate an instance document with at least five (5) messages in it, demonstrating that these specs have been met.

Design a Recipe & a Recipe Book

Think about what a recipe (say, for a chocolate cake) consists of. Design a Schema for a recipe. Now design a Schema for a whole book of recipes. Save them both to the usual exercises sub-directory structure for your account on grace. Generate example instance documents for both, and save them to the same place. Decide which makes more sense to you, a recipe Schema or a recipeBook Schema. Write up a short (3-5 paragraphs) pure-text note explaining why you made the choices you did and which Schema makes the most sense to you for a library. Post it into the usual exercises sub-directory structure for your account on grace.

Readings



Last modified: 2 Sep 2007 12:28:10 PM