Alan Gutierrez

Alan Gutierrez blogs on software, social networks, and himself.

Serialization Conundrums

Serialized by Silus Grok.

Strata is a B+Tree that I’ve written in Java. It implements Java serialization. Not to store the leaves of the tree, but to store the definition of the tree itself.

BentoStorage.Creator newStorage = new BentoStorage.Creator();

newStorage.setSize(8 + 8);
newStorage.setReader(new MyRecordReader());
newStorage.setWriter(new MyRecordWriter());

Strata.Creator newIndex = new Strata.Creator();

newIndex.setStorage(newStorage.create());
newIndex.setExtractor(new MyFieldExtractor());

Strata strata = newIndex.create();

Bento.Creator newBento = new Bento.Creator();

FileOutputStream outFile = new FileOutputStream(new File("index"));
ObjectOutputStream out = new ObjectOutputStream(outFile);
out.writeObject(strata);
out.close();

Here is how MyRecordReader might be defined.

public final class MyRecordReader
implements BentoStorage.Reader, Serializable
{
    private static final long serialVersionUID = 20070602L;

    public Object read(ByteBuffer bytes)
    {
        return new MyRecord(bytes.getLong(), bytes.getLong());
    }
}

What if I felt like using this from Groovy and taking advantage of closures?

BentoStorage.Creator newStorage = new BentoStorage.Creator();

newStorage.size = 8 + 8
newStorage.writer = { object, bytes ->
    bytes.putLong(object.key)
    bytes.putLong(object.version)
}
newStorage.reader = { bytes ->
    return new MyRecord(bytes.getLong(), bytes.getLong())
}

Strata.Creator newIndex = new Strata.Creator()

newIndex.storage = newStorage.create()
newIndex.listExtractor = { txn, object ->
    Person person = txn.heap.get(txn.unmarshaller, object.key)
    return [ person.lastName, person.firstName ]
}

Strata index = newIndex.create()

The problem is that Groovy closures cannot be serialized. Now I cannot serialize Strata. This struck me as a setback for Groovy support, and a problem specific to Groovy, until I stubbed my toe on anonymous inner classes.

Strata.Creator newIndex = new Strata.Creator();

newIndex.setStorage(newStorage.create());
newIndex.setExtractor(new FieldExtractor()
{
    public Comparable[] getFields(Object txn, Object object)
    {
        MyRecord record = (MyRecord) object;
        MyDatabase database = (MyDatabase) txn;
        Person person = database.getHeap()
            .get(database.getUnmarshaller(), record.getKey());
        return new Comparable[] {  person.getLastName(),
                                   person.getFirstName() };
    }
});

Strata index = newIndex.create();

Even if I make the anonymous inner class Serializable, perhaps by deriving from an interface that mixes FieldExtractor and Serializable, the class that defines the method that builds the Strata must also be serializable, because I’m serializing the hidden reference to the containing class. Unnecessary code. Pointless code. Not really part of the data structure.
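To see the hidden reference at work outside of Strata, here is a minimal sketch. SerializableExtractor, IndexBuilder and HiddenReferenceDemo are names made up for this illustration, not part of Strata. The anonymous class reads a field of its enclosing instance, so it certainly holds a reference to it, and the enclosing class is deliberately not Serializable.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical interface that mixes an extractor with Serializable.
interface SerializableExtractor extends Serializable {
    Comparable<?>[] getFields(Object object);
}

// Deliberately NOT Serializable, like a typical class that builds an index.
class IndexBuilder {
    private final String prefix = "person:";

    // Anonymous inner class: it uses a field of the enclosing instance,
    // so it carries a reference to this IndexBuilder.
    SerializableExtractor anonymousExtractor() {
        return new SerializableExtractor() {
            public Comparable<?>[] getFields(Object object) {
                return new Comparable<?>[] { prefix + object };
            }
        };
    }

    // Static nested class: no reference to an enclosing instance.
    static final class StaticExtractor implements SerializableExtractor {
        private static final long serialVersionUID = 1L;

        public Comparable<?>[] getFields(Object object) {
            return new Comparable<?>[] { object.toString() };
        }
    }
}

final class HiddenReferenceDemo {
    private HiddenReferenceDemo() {
    }

    // Returns true if Java serialization accepts the object.
    static boolean canSerialize(Object object) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(object);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling HiddenReferenceDemo.canSerialize(new IndexBuilder().anonymousExtractor()) should come back false, because the enclosing IndexBuilder gets dragged into the stream, while the static nested extractor serializes on its own.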

This must be a conundrum faced by any library that could be built using anonymous inner classes.

I wanted my creational pattern to follow the notion of creating the object through construction, using the Creator classes, once. Thereafter, the Strata is serialized and deserialized.

I do not have to get rid of this pattern, but in order to implement it, people will have to avoid the use of anonymous inner classes to define their readers, writers, and extractors. They should always create static classes that implement Serializable.

If they do want to use anonymous inner classes for readers, writers and extractors, they can simply repeat the creation of the object through construction using the Creator classes. Same goes for the use of Groovy closures.

This muddles an assumption that I had higher in my application stack. I am working on an object database that I call Depot.

The object database I’m building on Strata stores the Strata B+Tree definitions in a heap. I created a bridge interface for the FieldExtractor. The object database fishes the object out of the heap, so the bridge method has only one parameter, the object found in the heap, and it is not passed the application specific transaction context object.

public class Strata
{
    public interface FieldExtractor
    {
        Comparable[] getFields(Object txn, Object object);
    }
}

public class Depot
{
    public interface FieldExtractor
    {
        Comparable[] getFields(Object object);
    }
}
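A minimal sketch of how such a bridge might look. The two interfaces are flattened to top level copies so the example stands alone, and the heap lookup is a hypothetical Function from key to stored object, not the real Depot heap API.

```java
import java.util.function.Function;

// Stand-alone copy of Strata.FieldExtractor for this sketch.
interface StrataFieldExtractor {
    Comparable<?>[] getFields(Object txn, Object object);
}

// Stand-alone copy of Depot.FieldExtractor for this sketch.
interface DepotFieldExtractor {
    Comparable<?>[] getFields(Object object);
}

// Bridge: presents a one parameter Depot extractor as a two parameter
// Strata extractor by fetching the stored object first.
final class BridgeExtractor implements StrataFieldExtractor {
    // Hypothetical heap lookup, key in, stored object out.
    private final Function<Object, Object> heapLookup;

    private final DepotFieldExtractor delegate;

    BridgeExtractor(Function<Object, Object> heapLookup,
                    DepotFieldExtractor delegate) {
        this.heapLookup = heapLookup;
        this.delegate = delegate;
    }

    public Comparable<?>[] getFields(Object txn, Object key) {
        // The transaction context is ignored in this sketch; a real
        // bridge would likely use it to reach the heap.
        Object stored = heapLookup.apply(key);
        return delegate.getFields(stored);
    }
}
```

The application then only writes the Depot side, cracking the object it is handed, and never sees the key to object lookup.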

I’d written example code to imagine how the API would work. With Strata I’d always made Strata.FieldExtractor static, because it was always so complicated, turning a key into an object. With Depot that was taken care of, so the application only had to crack the object and return the array of Comparables upon which an alternate index would sort a set of objects.

Furthermore, I’d decided that for a desktop application, it would be nice to keep the definition, trees and heap in one file. Not thinking, I’d made it so that the initial version of Depot demands that the Depot.FieldExtractor be Serializable. My code examples of how clever I am, to define an index in a few short lines of code, did not work out.

It is not so much a conundrum. I merely need to make this one file format optional. The definition could be deserialized from the one file, or it could simply be rebuilt. It’s an application developer’s choice. If the application developer chooses to make her Depot.FieldExtractors static and serializable, then I can provide a static function that will create a file store for use as a heap, that tucks that definition into a longish header field.

I’m sure you could serialize a compound Swing view after you’ve built it, but you could also build it each time and only serialize the user preference settings, like the position of a window splitter.

If anyone out there thinks that I misunderstand the trade-off, please let me know. It caught me off guard. I was going to trouble the Groovy listserv with some questions, but this understanding came to me while I was getting around to that.

Programming Versus Learning

Too much learning programming can be too much to bear. It is less cumbersome to implement algorithms than to learn APIs.

Buy Boogaloo Stuff

A placeholder post, for notes on how to set up a web store lickety-split.

Civic Organization Sign Up Form

Registration Desk For The Previously Unregistered by Maitri Venkat-Ramani.

I’ve created a sign up form for CHAT. This sign up form is based on software that I’ve developed, primarily Stencil, for those of you who want to look behind the scenes. For those of you who don’t, consider this a general purpose nonprofit or civic organization registration program. Please help by suggesting features and reporting problems in the comments of this post.

Subversion Merging

Today I merged in a branch that I created for Strata. I’m observing some revision and branch discipline, even though I’m the only person working on my code. I do this because I’m looking for techniques to master, rules to follow so I don’t have to think so much about so many little things. The merge procedure I’ve used is described in Common Use-Cases for Merging in Version Control with Subversion.

Time Tracking With Hindsight

A program someone should write. I’ll start a timeslip, start programming, and then get a phone call. If the phone call is quick, then I go back to programming and forget about it. If the phone call takes a long time, then I’m probably providing technical support. When I hang up, I’d like to create two time slips, working backwards. I want to start with the stop. It’s the difference between, “What are you going to do?”, and “Tell me what you’ve been doing.”
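A rough sketch of the working backwards part, assuming the only things known at hang up time are when the slip started, when I stopped, and roughly how long the call took. TimeSlip and Hindsight are made up names, nothing here is an existing program.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical value class for one slip of tracked time.
final class TimeSlip {
    final String label;
    final Instant start;
    final Instant stop;

    TimeSlip(String label, Instant start, Instant stop) {
        this.label = label;
        this.start = start;
        this.stop = stop;
    }

    Duration length() {
        return Duration.between(start, stop);
    }
}

final class Hindsight {
    private Hindsight() {
    }

    // Splits the span from start to stop into two slips. The second slip
    // is measured backwards from the stop, the first gets what is left.
    static TimeSlip[] splitBackwards(String label, Instant start, Instant stop,
                                     Duration interruption,
                                     String interruptionLabel) {
        Duration total = Duration.between(start, stop);
        if (interruption.isNegative() || interruption.compareTo(total) > 0) {
            throw new IllegalArgumentException(
                "interruption does not fit between start and stop");
        }
        Instant boundary = stop.minus(interruption);
        return new TimeSlip[] {
            new TimeSlip(label, start, boundary),
            new TimeSlip(interruptionLabel, boundary, stop)
        };
    }
}
```

An hour of programming that ended in a twenty minute support call becomes forty minutes of programming followed by a twenty minute slip for the call.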

Binding XML to Maps and Beans

A photograph of a stencil by What What.

With Stencil, you start with the output XHTML page.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html
  xmlns:s="flag://stencil.agtrz.com/snippits"
  xmlns:var="flag://stencil.agtrz.com/attributes" s:snippit="hello"
  xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Hello, World!</title>
</head>
<body>
  <form>
    First Name: <input name="firstName" value="Fred" var:firstName="value"/><br/>
    Last Name: <input name="lastName" value="Flintstone" var:lastName="value"/><br/>
    Hometown: <input name="hometown" value="Bedrock" var:address.city="value"/>
  </form>
  <s:switch>
    <s:if test="neighbors">
      <table>
        <tr>
          <td>First Name</td>
          <td>Last Name</td>
        </tr>
        <s:each name="neighbors">
          <tr>
            <td><s:var name="firstName">Barney</s:var></td>
            <td><s:var name="lastName">Rubble</s:var></td>
          </tr>
        </s:each>
      </table>
    </s:if>
    <s:default>
      <p>No neighbors.</p>
    </s:default>
  </s:switch>
</body>
</html>
<!-- vim: set ts=2 sw=2 et: -->

You will note that this XHTML will not look right in a browser. You have to strip all things in the flag://stencil.agtrz.com/snippits namespace before it looks like XHTML. Stripping the Stencil elements is easy to do though. XSLT anyone?

It is easier to preview the output through an XSLT transform, or through a servlet that strips the Stencil elements, than to depend on mockups of business logic.

Binding

public interface Person {
    public String getFirstName();

    public String getLastName();

    public Iterator getNeighbors();
}

It all comes down to strings. Back in the days of Perl, HTML was a string, SQL was a string, and CGI was a string. String processing made life easy.

We don’t want to extract structure out of the Java, we want to extract strings. The structure is dictated by the Stencil. You compose an object graph that matches the Stencil, creating facade objects where necessary.

How do we get the strings out of Java? How do we bind Java and XML?

There are three ways to bind XML to Java. One is to go the JSP route, and write Java code into the template language. Another is to go the XStream or Castor route, and express the structure of a Java object graph in XML. The other is to go an OGNL route, and create a path language that will evaluate a little bit or a lot of Java, to get string values and put them in a structure.

I’m going to go with a minimal path language. Bare minimum. No swapping. No choosing your own path language.

Dirt Simple Path Language

Dirt Simple Path Language (DSPL) is a dot separated path to a final object that will be evaluated to a String using that object’s toString method.

For any Java object, you will get the class name in DSPL using:

class.name

This is for the case of obvious nested objects like:

employee.address.zip

That would save the hassle of writing a bean wrapper to get the ZIP.

There are no accommodations for indexes or map keys. No conditionals. No operators. You should feel lucky to have this much.

Variable Elements and Attributes

There is always a context object. That object is either an object or else it is a map. In the case of an Object the object is treated as a JavaBean and the variable names are bean properties. In the case of a Map the names are interpreted as hash keys.
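Here is how a DSPL path might be evaluated against a context object. This is a sketch under assumptions, not the Stencil implementation. Dspl, Employee and Address are made up names, bean properties are read only through public zero argument get methods, and a null anywhere along the path makes the whole path null.

```java
import java.lang.reflect.Method;
import java.util.Map;

// Hypothetical beans for the employee.address.zip example.
final class Address {
    private final String zip;

    Address(String zip) {
        this.zip = zip;
    }

    public String getZip() {
        return zip;
    }
}

final class Employee {
    private final Address address;

    Employee(Address address) {
        this.address = address;
    }

    public Address getAddress() {
        return address;
    }
}

final class Dspl {
    private Dspl() {
    }

    // Walks the dot separated path and returns toString of the final
    // object, or null if any step along the way is null.
    static String evaluate(Object context, String path) {
        Object current = context;
        for (String name : path.split("\\.")) {
            if (current == null) {
                return null;
            }
            current = step(current, name);
        }
        return current == null ? null : current.toString();
    }

    // A Map is read by key, anything else is treated as a JavaBean.
    private static Object step(Object target, String name) {
        if (name.isEmpty()) {
            throw new IllegalArgumentException("Empty path segment");
        }
        if (target instanceof Map) {
            return ((Map<?, ?>) target).get(name);
        }
        String getter = "get" + Character.toUpperCase(name.charAt(0))
            + name.substring(1);
        try {
            Method method = target.getClass().getMethod(getter);
            return method.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                "No readable property '" + name + "' on "
                    + target.getClass().getName(), e);
        }
    }
}
```

With this sketch, class.name falls out for free, since every object has getClass and Class has getName. A side effect of reading maps by key is that class.name on a Map context looks for a key named class rather than the Map’s own class.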

Variable Elements

Variables are indicated with the var element. The var element is replaced by calling the toString method of the object returned by evaluating the property. If the value is null, then the element is simply deleted.

Question Do I accommodate null values? It could be done by adding a null attribute.

Variable Attributes

Variable attributes are indicated using the namespace

flag://stencil.agtrz.com/attributes

An attribute in that namespace is interpreted in the following fashion.

The local name is the DSPL to apply to the context object to obtain the value of the new attribute.

The attribute value is the qualified name of the new attribute to add to the element.

If the attribute already exists in the template, it is removed.

If the value evaluates to null, the attribute is not added.

This backwards naming convention is simple to implement. It has the added bonus of allowing a placeholder value for static editing.
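The attribute rules above might be sketched like so. VariableAttributes is a made up name, attributes are modeled as plain maps of name to value rather than real XML attributes, and the resolver is a hypothetical stand-in for evaluating a DSPL path against the context object.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

final class VariableAttributes {
    private VariableAttributes() {
    }

    // templateAttributes: the ordinary attributes written in the template.
    // variableAttributes: DSPL path (the local name) mapped to the name of
    //                     the attribute to add (the attribute value).
    // resolver:           hypothetical DSPL evaluation against the context.
    static Map<String, String> apply(Map<String, String> templateAttributes,
                                     Map<String, String> variableAttributes,
                                     Function<String, Object> resolver) {
        Map<String, String> result = new LinkedHashMap<>(templateAttributes);
        for (Map.Entry<String, String> variable
                : variableAttributes.entrySet()) {
            String path = variable.getKey();
            String target = variable.getValue();
            // The placeholder attribute in the template is always removed.
            result.remove(target);
            Object value = resolver.apply(path);
            // A null value means the attribute is not added at all.
            if (value != null) {
                result.put(target, value.toString());
            }
        }
        return result;
    }
}
```

For the Last Name input in the template, var:lastName="value" removes the Flintstone placeholder and, if the context has a lastName, adds a value attribute holding it.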

Question Do I accommodate null values? It could be done by adding the null value replacement after a comma. Is there a common case where one attribute is added in lieu of another for a specific condition? What about zebra striping, how do I know which row is even or odd? Do I use ListIterator?

No Expressions

While there is logic above, there are no expressions. Truth in Stencil is somewhat like truth in Perl. In addition to boolean false, null is false, as is an empty iterator.

When an iterator is used as a condition, it is cached, so that it can be used later for iteration. Quite naturally, after an iterator has been used for iteration, it will always evaluate to false.
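A sketch of that notion of truth, with the caching. Truth and IteratorCache are made up names, the property lookup is a hypothetical Function from name to value, and treating an empty Collection as false is my assumption, by analogy with the empty iterator.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.function.Function;

final class Truth {
    private Truth() {
    }

    // Boolean false, null and an empty iterator are false.
    static boolean isTrue(Object value) {
        if (value == null) {
            return false;
        }
        if (value instanceof Boolean) {
            return (Boolean) value;
        }
        if (value instanceof Iterator) {
            return ((Iterator<?>) value).hasNext();
        }
        if (value instanceof Collection) {
            // Assumption: an empty Collection counts as false too.
            return !((Collection<?>) value).isEmpty();
        }
        return true;
    }
}

// Remembers the first iterator a property returns, so that the iterator
// tested in a condition is the same one that is later iterated.
final class IteratorCache {
    private final Map<String, Iterator<?>> cache = new HashMap<>();

    // Hypothetical property lookup against the context object.
    private final Function<String, Object> properties;

    IteratorCache(Function<String, Object> properties) {
        this.properties = properties;
    }

    Object get(String name) {
        Iterator<?> cached = cache.get(name);
        if (cached != null) {
            return cached;
        }
        Object value = properties.apply(name);
        if (value instanceof Iterator) {
            cache.put(name, (Iterator<?>) value);
        }
        return value;
    }
}
```

Since hasNext does not consume anything, testing an iterator leaves it intact for the each that follows. Once the each has drained it, the cached iterator reports false from then on.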

Control Structures

There are no expressions, only control structures that take a truth value returned from the underlying object model. Those control structures are as follows.

  • if - The contents of the if element are evaluated if the property named by the test attribute is true.
  • unless - The contents of the unless element are evaluated if the property named by the test attribute is false.
  • each - For each object in the Iterator or Collection returned by the property named by the name attribute, the context switches to the object, and the contents of the each element are evaluated.
  • with - The context switches to the object returned by the property named by the name attribute, and the contents of the with element are evaluated.
  • switch - The switch element can contain any of the above elements, the first to be evaluated will replace the switch element. The if and unless are evaluated conditionally, while each and with will always evaluate and terminate the switch.
  • default - For use in a switch statement, the default condition that will always execute and terminate the switch.
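The way a switch picks its branch can be sketched as follows. SwitchSelector, Kind and Branch are made up names, the property lookup is a hypothetical Function, and the sketch only selects a branch; it does not evaluate any contents.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

final class SwitchSelector {
    enum Kind { IF, UNLESS, EACH, WITH, DEFAULT }

    // One child element of a switch.
    static final class Branch {
        final Kind kind;
        final String property; // test or name attribute, unused by DEFAULT

        Branch(Kind kind, String property) {
            this.kind = kind;
            this.property = property;
        }
    }

    private SwitchSelector() {
    }

    // Returns the index of the branch that replaces the switch element,
    // or -1 if no branch applies.
    static int select(List<Branch> branches,
                      Function<String, Object> properties) {
        for (int i = 0; i < branches.size(); i++) {
            Branch branch = branches.get(i);
            switch (branch.kind) {
                case IF:
                    if (truthy(properties.apply(branch.property))) {
                        return i;
                    }
                    break;
                case UNLESS:
                    if (!truthy(properties.apply(branch.property))) {
                        return i;
                    }
                    break;
                default:
                    // EACH, WITH and DEFAULT always terminate the switch.
                    return i;
            }
        }
        return -1;
    }

    // Boolean false, null and an empty iterator are false.
    private static boolean truthy(Object value) {
        if (value == null) {
            return false;
        }
        if (value instanceof Boolean) {
            return (Boolean) value;
        }
        if (value instanceof Iterator) {
            return ((Iterator<?>) value).hasNext();
        }
        return true;
    }
}
```

For the template at the top of this post, a context with neighbors selects the if branch holding the table, and a context without them falls through to the default paragraph.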

Then there are two more to consider, invoke or include. It is intended that Stencil will allow creation of pages through composition, reuse of Stencils through object orientation.

The Feature Matrix Killjoy

The feature matrix. OCD in the Hood by Karen Gadbois.

One thing that I’ve come to allow myself in recent days is this: Strata is designed to support the objectives of Memento.

No longer will I tell the reader to suppose, for example, that a Strata B+Tree is used to implement multi-version concurrency control, when offering up examples.

No. This implies that there are many other imagined uses.

Rather, I will ask the reader to keep in mind that Strata was designed to implement multi-version concurrency control. I will then offer up an example from Memento.

Other applications for this B+Tree data structure will be apparent to other people when it has been released, deployed, and proven.

An Honest Question

> How does your project compare to project X’s feature Y?

It is an honest question, people are looking for a point of reference. Answer the question. You do not need to provide an answer in the form of an implementation of feature Y.

When asked honestly, answer honestly in terms of problem statements and computing concepts, rather than feature comparisons. I’m not marketing a product, but making an open source contribution. Still, if I were to consider my market, talking in terms of the code and concepts is going to appeal to the demographic of programmers who think about problems and implementations.

Someone else may come to understand the workings of Strata. They may suggest an implementation of some desirable functionality toward the goal of implementing a different application. That is open source.

More simply, someone may attempt to use Strata in their application, and come forward with a clearly defined problem, a request for a feature that they will test and deploy, if a solution is available. That is open source.

A Dishonest Question

> How does your project compare to project X’s feature Y?

It can be a dishonest question.

There are times when I’ve encountered the feature matrix killjoy. They want to engage you in a comparison to a more mature project, or one that has at least published a feature matrix.

These conversations are combative, not collaborative. Implicit is the question, what makes you think your project is better than project X?

The answer is, I don’t know about project X. I have no use for feature Y. I have generously provided the source code under an OSI approved license. You have the source. Please feel free to investigate this question for yourself.

Picture Infinity

It is a pity, a failure point, that when I’ve encountered this attitude, I’ve let it guide me.

I childishly follow every tangent. It has felt compulsive when I so follow. I childishly approach every trade-off as if there were some as of yet undiscovered algorithm that would eliminate compromise. It has felt obsessive when I so approach. It makes me worry about myself.

How refreshing to realize that pathology is not necessary. (For this I owe you, my fellow New Orleanians, for consistently perceiving weaknesses as human.)

It is my sometimes charming (though more often not) character defect, to seek universal approval. It is yet another manifestation. One of many.

Software cannot have universal approval. Software is discrete. Trade-offs are inherent.

It is such a hard truth, that even I will have to come to accept it.

Software may yet save me from myself, once again.
