
SAX-like Xml parsing

For those of you who don’t know or can’t remember, SAX (Simple API for XML) is a technology for reading Xml in an event-based way. Instead of loading your Xml into a great big DOM and looping through it or plucking out nodes via XPath or LINQ to XML, SAX lets the Xml parser notify your application as new nodes are encountered. SAX, like XmlReader, is forward-only, which makes it efficient for parsing large and streaming chunks of Xml.
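For contrast, here is a rough sketch of the "pull" style with a bare XmlReader, where your code drives the loop and asks for each node itself. The class name, method name and the idea of reading from a string are assumptions made for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper for illustration; not part of the project described in this post.
public static class PullExample
{
    // Collects the text of every <NAME> element by pulling nodes one at a time.
    public static List<string> ReadNames(string xml)
    {
        var names = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "NAME")
                {
                    // Reads the text and leaves the reader on the node after </NAME>.
                    names.Add(reader.ReadElementContentAsString());
                }
                else
                {
                    // Nothing consumed on this node, so advance manually.
                    reader.Read();
                }
            }
        }
        return names;
    }
}
```

Note that ReadElementContentAsString advances the reader past the closing tag by itself, which is why this loop only calls Read() when it did not just consume an element.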

Reactive Extensions (Rx) fits well with the SAX way of thinking because it is designed to “push” information to its consumers instead of making them “pull” information from it. At the same time, Reactive LINQ allows you to select on observable objects in a very natural-feeling, pull-like language. So it occurred to me: why not combine XmlReader and Reactive Extensions to build a SAX-like Xml reader that you could use via Reactive LINQ?

An observable XmlReader would allow you to subscribe to it and would iterate over your Xml document for you, notifying you as each node is read. The programmer using it could easily write Reactive LINQ expressions to select specific nodes in the Xml, resulting in code that looks and feels much like LINQ to XML but keeps the performance benefits of XmlReader.

For example, imagine you wanted to find all the values in nodes of a certain name. You could write something like this…

    XmlReader reader = XmlReader.Create("TreeOfLife.xml");      // 1
    IObservable<XmlReader> RxReader = reader.ToObservable();    // 2
    IObservable<string> NameFinder =                            // 3
        from nodeReader in RxReader                             // 4
        where nodeReader.NodeType == XmlNodeType.Element && nodeReader.Name == "NAME"  // 5
        select nodeReader.ReadElementContentAsString();         // 6
    NameFinder.Subscribe((item) => names.Add(item));            // 7

I’ll dissect:

  1. Create an XmlReader
  2. Use my newly created extension method to turn that XmlReader into an IObservable.
  3. Construct a new IObservable
  4. Select all the XmlReaders in the IObservable (one for each node)
  5. Filter for node type and element name
  6. Select their string values
  7. Subscribe, which actually starts the XmlReader iterating and notifies you each time a new name is available.

Now let’s look at the code in my extension method. It’s surprisingly simple!

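public static IObservable<XmlReader> ToObservable(this XmlReader reader)
{
    return Observable.CreateWithDisposable<XmlReader>(observer =>
    {
        try
        {
            // Push the reader, positioned on each node in turn, to the observer.
            while (reader.Read())
            {
                observer.OnNext(reader);
            }
            observer.OnCompleted();
        }
        catch (Exception e)
        {
            // Surface parse errors through the observable.
            observer.OnError(e);
        }
        // The reader doubles as the subscription's IDisposable.
        return reader;
    });
}
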
All I am doing is looping on the XmlReader and passing the XmlReader itself (in its current state) to the observer. Voilà!
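If you don’t have the Rx assemblies at hand, the same push idea can be sketched with nothing but the IObservable<T> and IObserver<T> interfaces from the base class library. This is only a minimal sketch: XmlReaderObservable and NameCollector are hypothetical names made up for this example, and the observer tracks text nodes itself rather than using a LINQ query.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical class, not part of Rx: pushes the reader to the observer once per node.
public sealed class XmlReaderObservable : IObservable<XmlReader>
{
    private readonly XmlReader reader;

    public XmlReaderObservable(XmlReader reader)
    {
        this.reader = reader;
    }

    public IDisposable Subscribe(IObserver<XmlReader> observer)
    {
        try
        {
            while (reader.Read())
            {
                observer.OnNext(reader);
            }
            observer.OnCompleted();
        }
        catch (Exception e)
        {
            observer.OnError(e);
        }
        return reader; // XmlReader implements IDisposable
    }
}

// Hypothetical observer that collects the text inside <NAME> elements.
public sealed class NameCollector : IObserver<XmlReader>
{
    public readonly List<string> Names = new List<string>();
    public bool Completed;
    public Exception Error;
    private bool inName;

    public void OnNext(XmlReader r)
    {
        if (r.NodeType == XmlNodeType.Element) inName = r.Name == "NAME";
        else if (r.NodeType == XmlNodeType.EndElement) inName = false;
        else if (r.NodeType == XmlNodeType.Text && inName) Names.Add(r.Value);
    }

    public void OnError(Exception error) { Error = error; }

    public void OnCompleted() { Completed = true; }
}
```

In this sketch Subscribe runs the whole loop before it returns, so the observer has seen every node by the time you get the IDisposable back.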

Canceling the operation

Suppose you want to cancel the operation partway through. Because I create the observable via Observable.CreateWithDisposable and return the XmlReader itself as the IDisposable, I can cancel at any time by simply calling Dispose on the subscription:

IDisposable processor = NameFinder.Subscribe((item) => names.Add(item));
processor.Dispose();

More complex selections

Suppose you want to get only the child nodes within a parent node:

IObservable<string> NameFinder =
    from r1 in RxReader
    where r1.NodeType == XmlNodeType.Element && r1.GetAttribute("ID") == "76937"
    from r2 in r1.ReadSubtree().ToObservable()
    where r2.NodeType == XmlNodeType.Element && r2.Name == "NAME"
    select r2.ReadElementContentAsString();

I simply combine reactive LINQ statements and create a new reactive XmlReader to iterate a sub node using XmlReader.ReadSubtree().ToObservable().
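To make the role of ReadSubtree() more concrete, here is a rough non-reactive sketch of the same selection. The class and method names are hypothetical, and the Xml shape (nested NODE elements with an ID attribute and a NAME child) is an assumption for illustration rather than the real tree of life schema.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper for illustration only.
public static class SubtreeExample
{
    // Returns the NAME values found inside the element whose ID attribute equals id.
    public static List<string> NamesUnder(string xml, string id)
    {
        var names = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType != XmlNodeType.Element || reader.GetAttribute("ID") != id)
                {
                    continue;
                }

                // The subtree reader only sees this element and its descendants.
                using (XmlReader sub = reader.ReadSubtree())
                {
                    while (!sub.EOF)
                    {
                        if (sub.NodeType == XmlNodeType.Element && sub.Name == "NAME")
                        {
                            names.Add(sub.ReadElementContentAsString());
                        }
                        else
                        {
                            sub.Read();
                        }
                    }
                }
                // Once the subtree reader is closed, the outer reader carries on after this element.
            }
        }
        return names;
    }
}
```

The subtree reader stops at the end of the matching element, so names that belong to its parents or siblings never reach the inner loop.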

Here’s a demo of the code above using the tree of life Xml for a Danaus butterfly genus.

Here’s the source code for the project.

