(X)Querying XML in Scheme

Query languages typically feature some sort of "select-from-where" construct. In XQuery, this is the FLWOR statement (FOR-LET-WHERE-ORDER-RETURN), which is pronounced as "flower". This note illustrates a Scheme implementation of the FLWOR statement. The examples given use my WebIt! XML framework. As in XQuery, XPath is usually used in conjunction with the FLWOR statement; I use SXPath here.

1 What is FLWOR?

I will introduce the FLWOR statement "in pieces"...

The for clause has the form: (for ((var expr) ...) result-expr). Each expression in the bindings must evaluate to a nodeset. The bindings of the for clause are combined together in a cartesian cross-project to form a tuple stream. The following example illustrates this:

(for ((i '(12 45 23 8))
      (j '(40 32 10)))
     (list i j))

This will return the tuple stream in the form of a list of lists of the values in each tuple:

((12 40)
 (12 32)
 (12 10)
 (45 40)
 (45 32)
 (45 10)
 (23 40)
 (23 32)
 (23 10)
 (8 40)
 (8 32)
 (8 10))

A where clause can be used to filter this tuple stream. The following example, will only include tuples in which i+j is greater than 50:

(for ((i '(12 45 23 8))
      (j '(40 32 10)))
     (where (>= (+ i j) 50)
            (list i j)))

The result will be:

((12 40) (45 40) (45 32) (45 10) (23 40) (23 32))

To sort these, with ias the primary key and j as a secondary key, use the order clause:

(for ((i '(12 45 23 8))
      (j '(40 32 10)))
     (where (>= (+ i j) 50)
            (order (by i j)
                   (list i j))))

This will give the same set of tuples, but sorted:

((12 40) (23 32) (23 40) (45 10) (45 32) (45 40))

2 Grammar of FLWOR Statements

The full form of the FLWOR statement may be described as:

<flwor> ::= <for-clause>

<for-clause> ::= (for ((<var> <expr>) ...) <inner-clause>)

<let-clause> ::= (let ((<var> <expr>) ...) <inner-clause>)
| (let* ((<var> <expr>) ...) <inner-clause>)
| (letrec ((<var> <expr>) ...) <inner-clause>)

<inner-clause> ::= <for-clause>
| <let-clause>
| (where <test-expr> <order-clause>)
| (where <test-expr> <result-expr>)
| <result-expr>

<order-clause> ::= (order (by <key-expr> ...) [ascending | descending]?
<result-expr>)

3 Sample FLWOR Queries

I now illustrate more complex queries--and actual XML queries. The following three examples are taken directly from the XQuery 1.0 specification, in Appendix E.1 which illustrates "joins" of data from 3 different XML documents. [See http://www.w3.org/TR/xquery/#id-joins]

I first describe the input data (used in all three examples):

"parts.xml": this document includes part elements, which in turn contain a partno (part number) element and a description element.

For example (using WebIt! notation):

(top (part (partno "a34") (description "alternator"))
     (part (partno "a75") (description "distributor"))
     (part (partno "a22") (description "spark plugs (4 units)"))
     ...)

"suppliers.xml": this document includes supplier elements, each of which contains a suppno (supplier number) element and a suppname (supplier name) element.

For example:

(top
 (supplier (suppno "1") (suppname "Bright Car Light Company"))
 (supplier (suppno "2") (suppname "Johnson Electrical Car Parts"))
 (supplier (suppno "3") (suppname "Carlson Glass Company"))
 ...)

"catalog.xml": this document includes item elements, which contain a partno element, a suppno element, and a price element.

For example:

(top
 (item (partno "c6782") (suppno "4") (price "$99.99"))
 (item (partno "za1") (suppno "1") (price "$87.99"))
 (item (partno "a34") (suppno "2") (price "$350.00"))
 ...)

(Though I describe the input data using WebIt! element constructors, the queries below actually use _parsed_ XML (using the SSAX parser, adapted for WebIt!'s data structures)--not synthetic WebIt! XML documents.)

3.1 Example 1: A Descriptive Catalog

The first query produces a "descriptive catalog": it reads the catalog and replaces the partno (part number) and suppno (supplier number) in each item with the part description (description) and supplier name (suppname). The returned list of items is sorted by part description and supplier name, and is returned in a descriptive-catalog element.

To achieve this, the partno is used to extract the matching part from "parts.xml"; the description is then puled out of that part. Similarly, the suppno in each item is used to extract the matching supplier element in "suppliers.xml". We defined two "helper functions" to abstract our part of the sxpath expression used in this query:

(define (has-partno? p)
  (let ((pno (sxml:string ((sxpath '(partno)) p))))
    (lambda (x) ((sxpath `(partno (equal? ,pno))) x))))
(define (has-suppno? s)
  (let ((sno (sxml:string ((sxpath '(suppno)) s))))
    (lambda (x) ((sxpath `(suppno (equal? ,sno))) x))))

The function has-partno? takes a part element "p" (as the "key" to be matched), and returns a function. This function accepts a new element and tests where that element has a child partno element which matches that of "p". The role of has-suppno? is similar. In XPath notation, these may be given as:

$x[partno = $p/partno]
and
$x[suppno = $s/suppno]

The flwor statement for this query is:

(descriptive-catalog
  (for ((i ((sxpath '(// item)) (document "catalog.xml")))
        (p
         ((sxml:filter (has-partno? i))
          ((sxpath '(// part)) (document "parts.xml"))))
        (s
         ((sxml:filter (has-suppno? i))
          ((sxpath '(// supplier)) (document "suppliers.xml")))))
    (order (by ((sxpath '(description)) p) ((sxpath '(suppname)) s))
      (item
       ((sxpath '(description)) p)
       ((sxpath '(suppname)) s)
       ((sxpath '(price)) i)))))

[document is a function which invokes ssax to read the specified file.]

To make this query actually "work", we must also have the declarations of the item and descriptive-catalog elements:

(define-element descriptive-catalog)
(define-element item)

These special forms define the element constructors (functions) for these elements.

3.2 Example 2: Suppliers with the Parts they Supply

The second query returns a list of suppliers, with list of part descriptions that each supplies. This query result will include suppliers who do not supply any parts as well. The results are sorted by supplier name, and returned in a results element. Within each supplier element, the list of part descriptions is sorted as well.

A list of suppliers is built; for each supplier, the items delivered by that supplier are extracted from "catalog.xml" (by matching the suppno). In turn, the partno for each item is used to find the description for that part in parts.xml.

(results
  (for ((s ((sxpath '(// supplier)) (document "suppliers.xml"))))
    (order (by ((sxpath '(suppname)) s))
      (supplier
        ((sxpath '(suppname)) s)
        (for ((i
               ((sxml:filter (has-suppno? s))
                ((sxpath '(// item)) (document "catalog.xml"))))
              (p
               ((sxml:filter (has-partno? i))
                ((sxpath '(// part)) (document "parts.xml")))))
          (order (by ((sxpath '(description)) p))
            ((sxpath '(description)) p)))))))

3.3 Example 3: A Master List

The final example in this note produces a "master list": a list of suppliers with the the parts they supply (a part element which contains a part description and price) plus a list of "orphaned parts": parts which have no supplier. Because the part list and item list are consulted twice, I pull the parsing of the input files outside of the flwor statement in this example.

(let ((all-suppliers ((sxpath '(// supplier)) (document "suppliers.xml")))
      (all-parts ((sxpath '(// part)) (document "parts.xml")))
      (all-items ((sxpath '(// item)) (document "catalog.xml"))))
  (master-list
    (for ((s all-suppliers))
      (order (by ((sxpath '(suppname)) s))
        (supplier
          ((sxpath '(suppname)) s)
          (for ((i ((sxml:filter (has-suppno? s)) all-items))
                (p ((sxml:filter (has-partno? i)) all-parts)))
            (order (by ((sxpath '(description)) p))
              (part ((sxpath '(description)) p) ((sxpath '(price)) i)))))))
    (orphan-parts
      (for ((p all-parts))
        (where (null? ((sxml:filter (has-partno? p)) all-items))
          (order (by ((sxpath '(description)) p))
            ((sxpath '(description)) p)))))))

4 Finale

While it is true that a stylesheet (be it XSLT, or the Scheme-based WebIt! stylesheet or SXSLT)can serve as a query mechanism, these examples show cases where FLWOR statements, in conjunction with XPath, can be more natural and more concise.

A prototype implementation of this FLWOR statement is available on my web site:

http://celtic.benderweb.net/webit/docs/xquery-pre/query.ss, latest version: 13 July 2003.

(This file also includes implementations of the XQuery "quantified expressions" constructs: some and every.)

The query examples, input documents, and query results are also available:

http://celtic.benderweb.net/webit/docs/xquery-pre/qjoin1.ss
http://celtic.benderweb.net/webit/docs/xquery-pre/qjoin2.ss
http://celtic.benderweb.net/webit/docs/xquery-pre/qjoin3.ss
http://celtic.benderweb.net/webit/docs/xquery-pre/parts.xml
http://celtic.benderweb.net/webit/docs/xquery-pre/suppliers.xml
http://celtic.benderweb.net/webit/docs/xquery-pre/catalog.xml
http://celtic.benderweb.net/webit/docs/xquery-pre/qjoin1-results.xml
http://celtic.benderweb.net/webit/docs/xquery-pre/qjoin2-results.xml
http://celtic.benderweb.net/webit/docs/xquery-pre/qjoin3-results.xml

Jim Bender
13 July 2003

Last modified: Wednesday, February 16th, 2005 10:22:52pm
HTML generated using WebIt!.