Querying XML Data with XQuery
By Scott Mitchell
Introduction
Let's face it, one of the primary tasks we, as Web developers, are faced with is querying data from some data store and allowing users to view and/or manipulate the information via a Web interface. Typically, the data stores that we query from are traditional relational databases, like Microsoft SQL Server or Oracle. With relational databases, the de facto means for querying data is SQL. However, with the ever-continuing rise in the popularity of Web services, and the need for a platform-independent, Internet-transferable data representation format, XML data stores are becoming more and more popular. SQL was never designed for querying semi-structured data stores, and therefore is not suitable for querying XML data stores. (For more information on creating and using Web services in ASP.NET be sure to read: Creating and Consuming a Web Service. For more information on XML be sure to read the XML FAQ Category on ASPFAQs.com.)
So, how does one query an XML data store and retrieve results from such a query? Most developers currently use XSLT and XPath to accomplish this task. XPath is a syntax for selecting only those parts of an XML document that meet certain criteria; XSLT is a technology that transforms an XML document from one form to another.
While XSLT and XPath have been in use for a while now, there is a new kid on the block: XML Query, or XQuery for short. XQuery is a querying language designed specifically to work with XML data stores using a SQL-like syntax. As of July 2003, XQuery 1.0 is still under development by the W3C standards body. However, the core features and syntax of XQuery are solid, so now is as good a time as any to learn about XQuery, especially since Yukon, the next version of MS SQL Server, will have built-in XQuery support.
In this article we will examine the XQuery syntax and examine some use cases. Following this we'll examine Microsoft's XQuery classes, which are currently available. With these classes, you can start using XQuery in your ASP.NET Web applications today!
XQuery Basics
XQuery is used to query an XML document, so, first things first, we need an XML document to talk about while examining our various queries. For this article, let's use the following XML document which describes the structure of a file system:
|
The root element of this XML document is <filesystem>, which contains an arbitrary number of <drive> elements. Each <drive> element, in turn, contains an arbitrary number of <folder> elements, and each <folder> element contains an arbitrary number of <folder> elements and <file> elements.
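To make that structure concrete, here is a small sketch in Python (standard library only) that parses a stand-in for FileSystem.xml and prints its outline. Treat the data as an assumption: the drive letters, the folder names Games and Music, and the file song.mp3 are invented for illustration. Only the element names, the letter and name attributes, the Quake folder, and game1.sav come from the article.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for FileSystem.xml (contents invented for illustration).
FILESYSTEM_XML = """<filesystem>
  <drive letter="C">
    <folder name="Games">
      <folder name="Quake">
        <file>game1.sav</file>
      </folder>
    </folder>
  </drive>
  <drive letter="D">
    <folder name="Music">
      <file>song.mp3</file>
    </folder>
  </drive>
</filesystem>"""


def outline(element, depth=0):
    """Return one indented line per element, labelled by letter, name, or text."""
    label = element.get("letter") or element.get("name") or (element.text or "").strip()
    line = "  " * depth + element.tag + (f" [{label}]" if label else "")
    lines = [line]
    for child in element:  # drives contain folders; folders contain folders and files
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(FILESYSTEM_XML)
lines = outline(root)
print("\n".join(lines))
```

The nesting in the printed outline mirrors the description above: folders may sit inside other folders to any depth, while files are always leaves.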
In its simplest form, an XQuery query can simply be an XPath expression. (If you're unfamiliar with XPath, I would strongly encourage you to work through this XPath tutorial before continuing.) For example, if we wanted to get a list of all of the files in the C drive, we could use the following XPath expression as our XQuery query:
document("FileSystem.xml")/filesystem/drive[@letter="C"]//file
The document("FileSystem.xml") part indicates the XML data store: an XML file named FileSystem.xml. The output of this query, given the FileSystem.xml file above, would be:
<file>game1.sav</file>
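If you'd like to see what this query selects without installing an XQuery processor, the same path can be approximated with the limited XPath subset in Python's xml.etree.ElementTree. This is only a sketch under assumptions: the XML below is a made-up stand-in for FileSystem.xml, and because ElementTree evaluates paths relative to the root element, the leading /filesystem step is written as a dot.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for FileSystem.xml (contents invented for illustration).
FILESYSTEM_XML = """<filesystem>
  <drive letter="C">
    <folder name="Games">
      <folder name="Quake">
        <file>game1.sav</file>
      </folder>
    </folder>
  </drive>
  <drive letter="D">
    <folder name="Music">
      <file>song.mp3</file>
    </folder>
  </drive>
</filesystem>"""

root = ET.fromstring(FILESYSTEM_XML)  # root is the <filesystem> element

# XQuery: document("FileSystem.xml")/filesystem/drive[@letter="C"]//file
# "//file" means "file elements at any depth below the matched drive".
c_drive_files = root.findall("./drive[@letter='C']//file")

# Serialize each match; strip() drops the whitespace that followed it in the source.
results = [ET.tostring(f, encoding="unicode").strip() for f in c_drive_files]
for result in results:
    print(result)
```

With this sample data, the file on the D drive is filtered out by the [@letter="C"] predicate, so only the saved game is returned.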
The output of an XQuery statement is a collection of XML elements. In the above example, it's a collection of <file> elements. You can add static XML elements by simply inserting them in the XQuery query. For instance, perhaps we want all of the <file> elements from the previous query to appear within an XML root element named MyFiles. This could be accomplished with the following XQuery expression:
<MyFiles>
  {document("FileSystem.xml")/filesystem/drive[@letter="C"]//file}
</MyFiles>
With this addition, the output would be:
<MyFiles>
  <file>game1.sav</file>
</MyFiles>
Note that in our query we used braces ({ ... }) around the XPath expression within the <MyFiles> element. The braces denote that the content within the braces is an XQuery expression, and not literal content. For example, had we omitted the braces and used the query:
<MyFiles>
  document("FileSystem.xml")/filesystem/drive[@letter="C"]//file
</MyFiles>
The output would have been:
<MyFiles>
  document("FileSystem.xml")/filesystem/drive[@letter="C"]//file
</MyFiles>
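The difference between the two forms can be mimicked in Python to make it tangible. The sketch below is an analogy, not an XQuery implementation: with_braces() evaluates the path and nests the matching elements inside <MyFiles>, while without_braces() simply stores the query text as literal content. The function names and the sample XML are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for FileSystem.xml (contents invented for illustration).
FILESYSTEM_XML = """<filesystem>
  <drive letter="C">
    <folder name="Games">
      <folder name="Quake">
        <file>game1.sav</file>
      </folder>
    </folder>
  </drive>
  <drive letter="D">
    <folder name="Music">
      <file>song.mp3</file>
    </folder>
  </drive>
</filesystem>"""

QUERY_TEXT = 'document("FileSystem.xml")/filesystem/drive[@letter="C"]//file'

root = ET.fromstring(FILESYSTEM_XML)


def with_braces():
    """Like <MyFiles>{ query }</MyFiles>: the query is evaluated."""
    wrapper = ET.Element("MyFiles")
    for match in root.findall("./drive[@letter='C']//file"):
        child = ET.SubElement(wrapper, match.tag)  # copy each matching <file>
        child.text = match.text
    return ET.tostring(wrapper, encoding="unicode")


def without_braces():
    """Like <MyFiles> query </MyFiles>: the query is just literal text."""
    wrapper = ET.Element("MyFiles")
    wrapper.text = QUERY_TEXT
    return ET.tostring(wrapper, encoding="unicode")


evaluated = with_braces()
literal = without_braces()
print(evaluated)
print(literal)
```

The first result contains a <file> element; the second contains nothing but the text of the query itself, which is exactly the mistake the braces guard against.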
FLWR Expressions
While simple XPath expressions are fine and good, the real power of XQuery shines through with FLWR expressions. FLWR stands for For-Let-Where-Return, and is pronounced "flower". The FLWR expression is akin to SQL's SELECT query; it allows for XML data to be queried with conditional statements, and then returns a set of XML elements as a result.
Take a moment to think about a SQL SELECT statement. The main ingredients there are the SELECT, FROM, and WHERE clauses. The FROM clause specifies the table(s) to query over. Then, for each row of the table(s) in the FROM clause, the WHERE clause is evaluated. For those rows that pass the evaluation, the fields specified in the SELECT clause are output. FLWR expressions are strikingly similar, as we'll see in a moment.
FLWR, as the acronym implies, has four parts, or clauses, to it:
- for - the for clause specifies the XML node list to iterate over, and is akin to the SELECT statement's FROM clause. The list of XML nodes is specified via an XPath expression. For example, if we wanted to iterate over all of the <folder> elements, we'd use the XPath expression document("FileSystem.xml")//folder.
- where - the where clause contains an expression that evaluates to a Boolean, just like the WHERE clause in a SQL SELECT statement. Each XML node in the node list from the for clause is evaluated by the where clause expression; those that evaluate to true move on, and those that don't are passed over.
- return - the return clause specifies the content that is returned from the FLWR expression.
You may have noticed that I have omitted the let clause. The examples we'll be looking at in this article will not use the let clause.
Now that we have briefly examined the three essential parts of the FLWR expressions, let's see some examples!
Here's a relatively straightforward example, showing how to get all <folder> elements whose name attribute equals "Quake":
for $myNode in document("FileSystem.xml")//folder
where $myNode/@name = "Quake"
return $myNode
Notice that the for clause has the following form:
for variableName in nodeList
In XQuery, variable names are prefixed with $ (e.g., $myNode). The for clause enumerates the node list, and for each node in the list it binds the variable $myNode to that node. Then, in the where and return clauses, $myNode can be used to reference the current node being evaluated. So, the for clause iterates through all of the <folder> elements, and for each element the where clause asks, "Is this element's name attribute equal to Quake?" If it is, the return clause outputs the <folder> element.
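If you think in terms of a general-purpose language, the for, where, and return clauses line up neatly with the three parts of a list comprehension. The Python sketch below is an analogy rather than XQuery itself, and it runs against a made-up stand-in for FileSystem.xml; the comments show which clause each line corresponds to.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for FileSystem.xml (contents invented for illustration).
FILESYSTEM_XML = """<filesystem>
  <drive letter="C">
    <folder name="Games">
      <folder name="Quake">
        <file>game1.sav</file>
      </folder>
    </folder>
  </drive>
  <drive letter="D">
    <folder name="Music">
      <file>song.mp3</file>
    </folder>
  </drive>
</filesystem>"""

root = ET.fromstring(FILESYSTEM_XML)

quake_folders = [
    my_node                               # return $myNode
    for my_node in root.iter("folder")    # for $myNode in document(...)//folder
    if my_node.get("name") == "Quake"     # where $myNode/@name = "Quake"
]

for folder in quake_folders:
    files = [f.text for f in folder.findall("file")]
    print(folder.get("name"), files)
```

As in the FLWR expression, every <folder> at any depth is visited once, the condition is tested against the current node, and only the nodes that pass are returned.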
The return clause can be more involved. In fact, the return clause can return any XQuery expression. For example, we might want our output to look like the following:
|
We could accomplish this using the following XQuery expression:
|
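To give a feel for a return clause that builds new XML instead of echoing existing nodes, here is a Python sketch of the idea. Everything about the output shape is an assumption: the <folderInfo> element and its name and fileCount attributes are invented for illustration, as is the sample data. In XQuery terms it is roughly "for each folder, where the folder directly contains files, return a new summary element."

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for FileSystem.xml (contents invented for illustration).
FILESYSTEM_XML = """<filesystem>
  <drive letter="C">
    <folder name="Games">
      <folder name="Quake">
        <file>game1.sav</file>
      </folder>
    </folder>
  </drive>
  <drive letter="D">
    <folder name="Music">
      <file>song.mp3</file>
    </folder>
  </drive>
</filesystem>"""

root = ET.fromstring(FILESYSTEM_XML)

summaries = []
for folder in root.iter("folder"):        # for: every <folder>, at any depth
    files = folder.findall("file")        # files directly inside this folder
    if files:                             # where: skip folders with no files
        # return: construct a brand-new element (hypothetical shape)
        summary = ET.Element(
            "folderInfo",
            {"name": folder.get("name"), "fileCount": str(len(files))},
        )
        summaries.append(summary)

for summary in summaries:
    print(ET.tostring(summary, encoding="unicode"))
```

The Games folder is skipped here because it holds only a subfolder, not files of its own.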
Realize that FLWR expressions are just as powerful as SQL SELECT queries. FLWRs are capable of joins, subqueries, and set-based operations, just like SELECT queries.
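To illustrate the join idea, the Python sketch below pairs each <file> with a matching entry from a second document, much as a FLWR expression with two for variables and a where clause would. This is an analogy under assumptions: the second document (a <types> list keyed by file extension) and all of the sample data are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for FileSystem.xml (contents invented for illustration).
FILESYSTEM_XML = """<filesystem>
  <drive letter="C">
    <folder name="Games">
      <folder name="Quake">
        <file>game1.sav</file>
      </folder>
    </folder>
  </drive>
  <drive letter="D">
    <folder name="Music">
      <file>song.mp3</file>
    </folder>
  </drive>
</filesystem>"""

# Hypothetical second document to join against.
TYPES_XML = """<types>
  <type ext="sav">Saved game</type>
  <type ext="mp3">Audio</type>
</types>"""

files_doc = ET.fromstring(FILESYSTEM_XML)
types_doc = ET.fromstring(TYPES_XML)

joined = [
    (f.text, t.text)                          # return: one pair per match
    for f in files_doc.iter("file")           # for $f in ...//file
    for t in types_doc.iter("type")           # for $t in ...//type
    if f.text.endswith("." + t.get("ext"))    # where: the join condition
]

for file_name, description in joined:
    print(file_name, "->", description)
```

Just as with a SQL join, every combination of the two node lists is considered, and the condition keeps only the pairs that belong together.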
Now that we've quickly looked at the XQuery syntax, let's turn our attention to using XQuery in .NET!
In Part 2 we'll see how to get Microsoft's XQuery classes and how to start using them in an ASP.NET Web application!