1 RS-XML, WebIt!'s XML Datatype

1.1 Beginning WebIt!

The foundation of WebIt! is an implementation of the XML Infoset as an abstract datatype (ADT), named RS-XML. RS-XML combines an ADT based on structures with a set of functions for constructing XML elements and attributes.

Below is a simple illustration of the use of RS-XML (from a bibtex-like XML document):

(article
 key:
 "IU-SR:Fel:91a"
 author:
 "Matthias Felleisen"
 title:
 "On the Expressive Power of Programming Languages"
 journal:
 "Science of Computer Programming"
 year:
 "1991"
 (link
  url:
  "ftp://ftp.cs.indiana.edu/pub/scheme-repository/doc/pubs/express.ps.gz"
  format:
  "ps"))

article and link are functions that each construct an XML element. "Keywords" such as author: and title: construct XML attributes. Typically, I write functions that abbreviate common combinations of constructors. For example, the link element shown above can be abbreviated as:

(define (ps-link where) (link url: where format: "ps"))

to allow one to write

(ps-link "ftp://ftp.cs.indiana.edu/...")

The schema for the above language also generates predicate functions for testing an XML element (e.g. article? and link?) and functions for accessing attributes (e.g. &author?, &title?, etc.). For example, assuming felleisen-article is bound to the article element above, (&title? felleisen-article) returns the string "On the Expressive Power of Programming Languages". If the element had no title attribute, #f would be returned instead.

Here is another example, this time of an HTML fragment:

(h4:li
 (h4:a
  h4:href:
  "http://library.readscheme.org"
  "Bibliography of Scheme-related Research"))

In this example, the element and attribute constructors begin with "h4" (referring to HTML 4.0). Although this resembles an XML namespace prefix, the two are unrelated: the "h4" prefix serves only to give the constructors distinct Scheme function names.

To use the HTML constructors, include the HTML library supplied with WebIt!:

(require (lib "html.ss" "webit"))

WebIt! also provides libraries for the above "Bibtex-like" XML notation, XML Schema, and Scalable Vector Graphics (SVG).

1.2 Data Types of RS-XML

At a high level, the RS-XML types look like:

(define-struct xml-element (tag attributes contents))
(define-struct xml-attribute (tag value))

This allows the use of the following predicates and accessors to manipulate XML:

   xml-element? : node -> boolean
   xml-element-tag: xml-element -> sym or #f
   xml-element-attributes: xml-element -> listof xml-attribute
   xml-element-contents: xml-element -> listof (xml-element 
                                          or string or xml-comment)
   xml-attribute? : node -> boolean
   xml-attribute-tag: xml-attribute -> sym
   xml-attribute-value: xml-attribute -> string

In RS-XML, PCDATA nodes are represented as strings.
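As a rough sketch of how these accessors combine, the following uses the simplified two-struct definition above rather than the WebIt! library itself. The helper attribute-value is a hypothetical name, not part of RS-XML; it only mirrors the behaviour described earlier for &title?, returning #f when the attribute is absent.

```scheme
;; Simplified RS-XML-like structs, copied from the high-level definition above.
(define-struct xml-element (tag attributes contents))
(define-struct xml-attribute (tag value))

;; Hypothetical helper (not part of RS-XML): return the value of the
;; attribute named `name` on `elem`, or #f when no such attribute exists.
(define (attribute-value elem name)
  (let loop ((attrs (xml-element-attributes elem)))
    (cond ((null? attrs) #f)
          ((eq? (xml-attribute-tag (car attrs)) name)
           (xml-attribute-value (car attrs)))
          (else (loop (cdr attrs))))))

;; An element built by hand; PCDATA children are plain strings.
(define a-link
  (make-xml-element
   'a
   (list (make-xml-attribute 'href "http://library.readscheme.org"))
   (list "Bibliography of Scheme-related Research")))

(attribute-value a-link 'href)   ; => "http://library.readscheme.org"
(attribute-value a-link 'title)  ; => #f
```

In real use the element would come from a constructor such as h4:a rather than from make-xml-element directly; the hand-built version is shown only to make the field layout visible.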

To write XML content, two functions are provided:

   write-xml : xml-node [port] -> void
   display-xml : xml-node [port] -> void

The function write-xml is intended for producing documents for the web, or for parsing by other applications. display-xml is suited to producing XML for human readers: it inserts extra whitespace for presentation, for instance to indent child elements.

1.3 Namespaces in RS-XML

Here is a simple RS-XML fragment of an HTML document:

(h4:li
 (h4:a
  h4:href:
  "http://library.readscheme.org"
  "Bibliography of Scheme-related Research"))

In this example, the element and attribute constructors begin with "h4" (as in HTML 4.0). Although this resembles an XML namespace prefix, the two are unrelated: the "h4" prefix serves only to give the constructors distinct Scheme function names.

At a high level, the RS-XML types look like:

(define-struct xml-element (tag attributes contents))
(define-struct xml-attribute (tag value))

If we were to support only local naming, this would be enough. To support namespaces, we add further fields: the local name ("print-tag") and the URI of the namespace to which the node belongs ("target-ns"). xml-element also gains a third field, "ns-list", which is described at the end of this section.

(define-struct
 xml-element
 (tag print-tag target-ns ns-list attributes contents))
(define-struct xml-attribute (tag print-tag target-ns value))

The tag field is always used to store a symbol which includes the expanded name (the concatenation of the namespace uri and the local name, separated by a colon). This is used to support namespace-aware predicates.

In the example below, we construct two elements named link , which are part of separate namespaces:

(define a-bib-tex-link (sb:link sb:url: " ... some url ... " sb:format: "ps"))
(define an-html-link (h4:link ...))

Membership of a node in a namespace can be tested "manually" by accessing the target-ns field:

(eq?
 (xml-element-target-ns some-node)
 'http://celtic.benderweb.net/xml/sbib.xsd)

But more commonly, one can use the predicates generated when the XML schemas are "compiled". With this example of two elements whose local names are both link , we can see the utility of these predicates:

(sb:link? a-bib-tex-link) ==> #t
(sb:link? an-html-link) ==> #f
(h4:link? a-bib-tex-link) ==> #f
(h4:link? an-html-link) ==> #t

These predicate functions are aware of namespaces. The key is that the type tests are performed in terms of the expanded name, that is, the pair (<namespace uri>, <local name>), and not merely the local name (or even a qualified name using only a prefix).
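A minimal sketch of how such a predicate could work, assuming the six-field xml-element struct shown earlier in this section. The helpers expanded-name, make-empty-element and make-element-predicate are hypothetical names used for illustration, and the HTML namespace URI is a placeholder; the predicates WebIt! generates from a schema may well be implemented differently.

```scheme
;; Namespace-aware element struct, as defined earlier in this section.
(define-struct xml-element
  (tag print-tag target-ns ns-list attributes contents))

;; Hypothetical helper: build the expanded name, i.e. the namespace URI
;; and the local name joined by a colon, as a symbol.
(define (expanded-name ns-uri local-name)
  (string->symbol
   (string-append (symbol->string ns-uri) ":" (symbol->string local-name))))

;; Hypothetical helper: construct an element with no attributes or
;; contents, storing the expanded name in the tag field.
(define (make-empty-element ns-uri local-name)
  (make-xml-element (expanded-name ns-uri local-name)
                    local-name ns-uri '() '() '()))

;; Hypothetical helper: a predicate that compares expanded names only.
(define (make-element-predicate ns-uri local-name)
  (let ((expected (expanded-name ns-uri local-name)))
    (lambda (node)
      (and (xml-element? node)
           (eq? (xml-element-tag node) expected)))))

(define sb-ns 'http://celtic.benderweb.net/xml/sbib.xsd)
(define h4-ns 'http://example.org/html4)  ; placeholder URI, an assumption

(define sb-link? (make-element-predicate sb-ns 'link))
(define h4-link? (make-element-predicate h4-ns 'link))

(define bib-link (make-empty-element sb-ns 'link))
(define html-link (make-empty-element h4-ns 'link))

(sb-link? bib-link)   ; => #t
(sb-link? html-link)  ; => #f
(h4-link? html-link)  ; => #t
```

Because both elements share the local name link, a test on print-tag alone could not tell them apart; comparing the tag field does.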

In the context of SXPath, this approach has implications for the correct use of the node-typeof? predicate (called ntype?? in later SXPath versions). In an "instantiation" of SXPath for RS-XML, a test of an element tag should compare against the expanded name (stored in the tag field), not the local name (stored in the print-tag field).

Tests using node-typeof? (or ntype??) should always use the expanded name, not the local name (except, of course, for elements defined to use only local names):

(node-typeof? 'http://celtic.benderweb.net/xml/sbib.xsd:link)

not

(node-typeof? 'link)

Of course, in practice this is quite awkward with long URIs. Instead, the instantiation of SXPath for RS-XML provides quote-xml, which resolves an element type name into its expanded name:

(ntype?? (quote-xml sb:link))

So far, I have discussed only the internal structure and manipulation of XML nodes and their namespaces, and have not mentioned "prefixes": user-chosen abbreviations for namespace URIs. The reason is that RS-XML uses prefixes only at "output time", when the XML structures are serialized.

By default, RS-XML documents are printed using only unqualified (local) names. To make use of prefixes, RS-XML provides the bind-namespaces syntax, which specifies the namespace prefixes to be used in qualified names:

(bind-namespaces
 ((sb 'http://celtic.benderweb.net/xml/sbib.xsd))
 (link
  url:
  "ftp://ftp.cs.indiana.edu/pub/scheme-repository/doc/pubs/express.ps.gz"
  format:
  "ps"))

bind-namespaces provides a convenient way to fill in the ns-list field of an xml-element structure with an association list that maps prefixes to namespace URIs. Its first argument is a list of bindings. bind-namespaces can also specify a default namespace: in that binding, the symbol _ is used in place of a prefix symbol.

Using this same example, this fragment would be printed as follows:

  <sb:link sb:url="ftp://ftp.cs.indiana.edu/pub/scheme-repository/doc/pubs/express.ps.gz"
           sb:format="ps"
           xmlns:sb="http://celtic.benderweb.net/xml/sbib.xsd">
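To make the role of ns-list concrete, here is a small sketch of how a serializer might choose the printed name for a node. It is not WebIt!'s actual printing code: the function qualified-name is hypothetical, and it assumes that ns-list is an association list of (prefix . uri) pairs in which the symbol _ marks the default namespace.

```scheme
;; Hypothetical helper: pick the name to print for a node whose local
;; name is `print-tag` and whose namespace URI is `target-ns`, given an
;; association list `ns-list` of (prefix . uri) pairs.
(define (qualified-name print-tag target-ns ns-list)
  (let loop ((bindings ns-list))
    (cond
     ;; No binding for this namespace: fall back to the local name.
     ((null? bindings) (symbol->string print-tag))
     ((eq? (cdr (car bindings)) target-ns)
      (let ((prefix (car (car bindings))))
        (if (eq? prefix '_)
            ;; Default namespace: no prefix is printed.
            (symbol->string print-tag)
            (string-append (symbol->string prefix)
                           ":"
                           (symbol->string print-tag)))))
     (else (loop (cdr bindings))))))

(define sb-ns 'http://celtic.benderweb.net/xml/sbib.xsd)

(qualified-name 'link sb-ns (list (cons 'sb sb-ns)))  ; => "sb:link"
(qualified-name 'link sb-ns (list (cons '_ sb-ns)))   ; => "link"
(qualified-name 'link sb-ns '())                      ; => "link"
```

The lookup goes from namespace URI to prefix, which is why the prefix never needs to be stored in the tag or print-tag fields.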

One last example shows the handling of the odd case in which the same namespace prefix is used for elements from different namespaces:

<html:furniture xmlns:html="http://how-to-make-a-lunch.com">
  <html:p xmlns:html="http://w3c.org/HTML">
    <html:table> ... HTML table ... </html:table>
  </html:p>
  <html:table> ... Dining-table ... </html:table>
</html:furniture>

Since two schemas are involved here, I will use lunch.com:table and h4:table as function names below. In practice, libraries generated from different schemas will (or should) be given different function name prefixes; otherwise, as in this case, loading both modules would cause name conflicts.

So, this example would be written as:

(bind-namespaces
 ((html 'http://how-to-make-a-lunch.com))
 (lunch.com:furniture
  (bind-namespaces
   ((html 'http://w3c.org/HTML))
   (h4:p (h4:table ... HTML table ...)))
  (lunch.com:table ... Dining-table ...)))

The namespace prefixes have no effect on the type predicates:

(h4:table? (lunch.com:table ... Dining-table...))

will return #f, as will

(lunch.com:table? (h4:table ...HTML table ...))

Last modified: Sunday, January 30th, 2005 1:57:37pm
HTML generated using WebIt!.