Alan Gutierrez

Alan Gutierrez blogs on software, social networks, and himself.

Feverish Notes on Serialized DOM

An ideal DOM with a small footprint would write the file flat and read it into an array.

Any DOM could be made easier if Bento could divide into blocks and there were a way to read across those blocks as one contiguous output stream.

This low-memory answer would be somewhat like tiny tree: an array of nodes, rather than a tree. The arrays are gathered into a linked list. A node wraps a reference to a list node and an index into the list node. At regular offsets in the list node, there is an indication of the depth or the index of the parent, so that it is easy to find your way back up the tree. When an array is broken, an offset is adjusted so that these things can still be found.
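A minimal sketch of that layout, in Python for brevity. It makes some assumptions that are not in the notes above: every slot stores its parent's document position (rather than a marker only at regular offsets), the block size is tiny, and the names `Block`, `Node`, and `FlatDocument` are made up for illustration. Breaking and mending arrays is left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional

BLOCK_SIZE = 4  # hypothetical block capacity, kept tiny for the example


@dataclass
class Block:
    """One array in the linked list of arrays."""
    offset: int = 0                                   # document position of slot 0
    names: List[str] = field(default_factory=list)    # element names in document order
    parents: List[int] = field(default_factory=list)  # parent's position, -1 for the root
    next: Optional["Block"] = None


@dataclass(frozen=True)
class Node:
    """A node is only a block reference plus an index into that block."""
    block: Block
    index: int

    @property
    def name(self) -> str:
        return self.block.names[self.index]


class FlatDocument:
    def __init__(self) -> None:
        self.head = Block()
        self.tail = self.head
        self.size = 0

    def append(self, name: str, parent: int = -1) -> int:
        """Add a node in document order and return its position."""
        if len(self.tail.names) == BLOCK_SIZE:
            # The current array is full, so link a new one onto the list.
            self.tail.next = Block(offset=self.size)
            self.tail = self.tail.next
        self.tail.names.append(name)
        self.tail.parents.append(parent)
        self.size += 1
        return self.size - 1

    def node(self, position: int) -> Node:
        """Walk the linked list to the block that holds a document position."""
        block = self.head
        while position >= block.offset + len(block.names):
            block = block.next
        return Node(block, position - block.offset)

    def parent(self, node: Node) -> Optional[Node]:
        position = node.block.parents[node.index]
        return None if position < 0 else self.node(position)

    def path(self, node: Node) -> str:
        """Find the way back up the tree using the stored parent positions."""
        names = []
        current: Optional[Node] = node
        while current is not None:
            names.append(current.name)
            current = self.parent(current)
        return "/" + "/".join(reversed(names))


doc = FlatDocument()
person = doc.append("person")
doc.append("first-name", person)
doc.append("last-name", person)
address = doc.append("address", person)
city = doc.append("city", address)   # fifth node, lands in a second block
print(doc.path(doc.node(city)))      # /person/address/city
```

The point of the shape is that a `Node` costs two words, and the tree structure lives entirely in the parallel arrays.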

Writing out the array mends it. It gets written out in the clean format, and the broken arrays are discarded and reread.

Indexing Fragments Versus Indexing Nodes

Fragments by K Tucker.

Memento does not index nodes. It indexes fragments based on the contents.

XPath can be used to extract the fields of an index, but XPath is not otherwise in play.

With a document, such as…


<person>
  <first-name>Alan</first-name>
  <last-name>Gutierrez</last-name>
</person>

(Uh, oh! WordPress cannot render example XML? Man this is broken!)

You can create an index using the XPath results from the query /person/last-name. I plan on writing a nice implementation that will select index participants using those XPath statements.
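Here is a rough sketch of what that could look like, in Python rather than Java to keep it short. It is not Memento's API: `FragmentIndex`, `index_keys`, and the integer fragment identifiers are hypothetical. Python's `xml.etree.ElementTree` only supports a subset of XPath and no absolute paths, so the sketch hangs the fragment under a throwaway holder element and evaluates `/person/last-name` as `./person/last-name`.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List


def index_keys(fragment_xml: str, xpath: str) -> List[str]:
    """Extract the index fields of one fragment with a simple absolute path."""
    root = ET.fromstring(fragment_xml)
    holder = ET.Element("holder")   # stand-in for the document node
    holder.append(root)
    return [(element.text or "").strip() for element in holder.findall("." + xpath)]


class FragmentIndex:
    """Maps extracted field values to fragment identifiers, not to nodes."""

    def __init__(self, xpath: str) -> None:
        self.xpath = xpath
        self.entries: Dict[str, List[int]] = defaultdict(list)

    def add(self, fragment_id: int, fragment_xml: str) -> None:
        for key in index_keys(fragment_xml, self.xpath):
            self.entries[key].append(fragment_id)

    def find(self, key: str) -> List[int]:
        return list(self.entries.get(key, []))


index = FragmentIndex("/person/last-name")
index.add(1, "<person><first-name>Alan</first-name>"
             "<last-name>Gutierrez</last-name></person>")
print(index.find("Gutierrez"))  # [1]
```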

This doesn’t mean that when you query the bin that contains the fragment, you will receive an element with the name last-name. You’ll receive a list of fragments. If you want to extract nodes from those fragments, such as the element with the name last-name, then you’ll have to navigate the fragments by another means.

In fact, the query will return fragment identifiers, so that you can run the query without deserializing the fragments. The fragments could be indexed using XOM and XPath, for example, but deserialized using JiBX into a Java “Person” object.
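To make the separation concrete, here is a small sketch, again in Python as a stand-in for the Java stack. Everything in it is assumed for illustration: the `FRAGMENTS` store, the prebuilt `LAST_NAME_INDEX`, the identifier 7, and the `Person` class. The query touches only the index; a fragment is parsed only when someone asks for the object.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Person:
    first_name: str
    last_name: str


# Hypothetical storage: serialized fragments keyed by fragment identifier.
FRAGMENTS: Dict[int, str] = {
    7: "<person><first-name>Alan</first-name>"
       "<last-name>Gutierrez</last-name></person>",
}

# Hypothetical index, assumed to have been built from /person/last-name.
LAST_NAME_INDEX: Dict[str, List[int]] = {"Gutierrez": [7]}


def query(last_name: str) -> List[int]:
    """Return fragment identifiers only; no fragment is parsed here."""
    return list(LAST_NAME_INDEX.get(last_name, []))


def load_person(fragment_id: int) -> Person:
    """Deserialize one fragment on demand, independent of how it was indexed."""
    root = ET.fromstring(FRAGMENTS[fragment_id])
    return Person(root.findtext("first-name"), root.findtext("last-name"))


for fragment_id in query("Gutierrez"):
    print(load_person(fragment_id).first_name)  # Alan
```

The indexing side and the deserializing side share nothing but the fragment identifier, which is what lets them use different libraries.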

At first, because I imagined a seamless integration with Saxon, it seemed necessary to index individual nodes.

Question: Is it possible to create an index of individual nodes by running the query used to create the fragment index against the returned fragments? Is it desirable?

Question: What happens when a fragment is duplicated in an index because a list property is indexed by individual element? Let’s say, a list of favorite colors indexes a person by both blue and red.
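The situation is easy to reproduce in a toy form. The sketch below is not an answer, only an illustration, and the index contents and function names are invented: fragment 1 is a person who likes both blue and red, so a scan across both keys returns that fragment twice unless the caller removes duplicates.

```python
from typing import Dict, Iterable, List

# Hypothetical index on favorite colors; fragment 1 appears under two keys.
COLOR_INDEX: Dict[str, List[int]] = {"blue": [1, 2], "red": [1]}


def scan(keys: Iterable[str]) -> List[int]:
    """Collect fragment identifiers for several keys, duplicates included."""
    hits: List[int] = []
    for key in keys:
        hits.extend(COLOR_INDEX.get(key, []))
    return hits


def distinct(fragment_ids: Iterable[int]) -> List[int]:
    """Drop repeated identifiers while keeping first-seen order."""
    return list(dict.fromkeys(fragment_ids))


print(scan(["blue", "red"]))            # [1, 2, 1]
print(distinct(scan(["blue", "red"])))  # [1, 2]
```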