Serialization Conundrums
June 2nd, 2007
Strata is a B+Tree that I’ve written in Java. It implements Java serialization. Not to store the leaves of the tree, but to store the definition of the tree itself.
BentoStorage.Creator newStorage = new BentoStorage.Creator();
newStorage.setSize(8 + 8);
newStorage.setReader(new MyRecordReader());
newStorage.setWriter(new MyRecordWriter());
Strata.Creator newIndex = new Strata.Creator();
newIndex.setStorage(newStorage.create());
newIndex.setExtractor(new MyFieldExtractor());
Strata strata = newIndex.create();
Bento.Creator newBento = new Bento.Creator();
FileOutputStream outFile = new FileOutputStream(new File("index"));
ObjectOutputStream out = new ObjectOutputStream(outFile);
out.writeObject(strata);
out.close();
Here is how MyRecordReader might be defined.
public final class MyRecordReader
implements BentoStorage.Reader, Serializable
{
private static final long serialVersionUID = 20070602L;
public Object read(ByteBuffer bytes)
{
return new MyRecord(bytes.getLong(), bytes.getLong());
}
}
What if I felt like using this from Groovy and taking advantage of closures?
BentoStorage.Creator newStorage = new BentoStorage.Creator();
newStorage.size = 8 + 8
newStorage.writer = { object, bytes ->
bytes.putLong(object.key)
bytes.putLong(object.version)
}
newStorage.reader = { bytes ->
return new MyRecord(bytes.getLong(), bytes.getLong())
}
Strata.Creator newIndex = new Strata.Creator()
newIndex.storage = newStorage.create()
newIndex.listExtractor = { txn, object ->
Person person = txn.heap.get(txn.unmarshaller, object.key)
return [ person.lastName, person.firstName ]
}
Strata index = newIndex.create()
The problem is that Groovy closures cannot be serialized. Now I cannot serialize Strata. This struck me as a setback for Groovy support, and a problem specific to Groovy, until I stubbed my toe on anonymous inner classes.
Strata.Creator newIndex = new Strata.Creator();
newIndex.setStorage(newStorage.create());
newIndex.setExtractor(new FieldExtractor()
{
public Comparable[] getFields(Object txn, Object object)
{
MyRecord record = (MyRecord) object;
MyDatabase database = (MyDatabase) txn;
Person person = database.getHeap()
.get(database.getUnmarshaller(), record.getKey());
return new Comparable[] { person.getLastName(),
person.getFirstName() };
}
});
Strata index = newIndex.create();
Even if I make the anonymous inner class Serializable, perhaps by deriving from an interface that mixes FieldExtractor and Serializable, the class that defines the method that builds the Strata must also be serializable. I’m serializing the hidden reference to the containing class. Unnecessary code. Pointless code. Not really part of the data structure.
This must be a conundrum faced by any library that could be built using anonymous inner classes.
I wanted my creational pattern to follow the notion of creating the object through construction using the Creator classes once. Thereafter, the Strata is serialized and deserialized.
I do not have to get rid of this pattern, but in order to implement it, people will have to avoid the use of anonymous inner classes to define their readers, writers, and extractors. They should always create static classes that implement Serializable.
If they do want to use anonymous inner classes for readers, writers and extractors, they can simply repeat the creation of the object through construction using the Creator classes. Same goes for the use of Groovy closures.
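The difference between the two choices can be seen in a small, self-contained sketch. It does not use Strata at all. SerializableExtractor, StaticExtractor and canSerialize are hypothetical names invented for the illustration, and the interface only stands in for one that mixes FieldExtractor and Serializable.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Note that the enclosing class is deliberately NOT Serializable.
public class ExtractorSerializationDemo {
    // Hypothetical stand-in for an interface mixing FieldExtractor and Serializable.
    public interface SerializableExtractor extends Serializable {
        Comparable<?>[] getFields(Object object);
    }

    // A static nested class carries no hidden reference to an enclosing instance.
    public static final class StaticExtractor implements SerializableExtractor {
        private static final long serialVersionUID = 20070602L;

        public Comparable<?>[] getFields(Object object) {
            return new Comparable<?>[] { object.toString() };
        }
    }

    private final String prefix = "person:";

    // The anonymous class reads a field of the enclosing instance, so it
    // holds a hidden reference to it, and serialization tries to write it.
    public SerializableExtractor newAnonymousExtractor() {
        return new SerializableExtractor() {
            public Comparable<?>[] getFields(Object object) {
                return new Comparable<?>[] { prefix + object };
            }
        };
    }

    // Attempts to serialize into memory and reports whether it worked.
    public static boolean canSerialize(Object object) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(object);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("static: " + canSerialize(new StaticExtractor()));
        System.out.println("anonymous: "
            + canSerialize(new ExtractorSerializationDemo().newAnonymousExtractor()));
    }
}
```

The static class serializes on its own. The anonymous one drags its containing object along, and fails because that object is not Serializable.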
This muddles an assumption that I had higher in my application stack. I am working on an object database that I call Depot.
The object database I’m building based on Strata stores the Strata B+Tree definitions in a heap. I created a bridge interface for the FieldExtractor, where the object database fishes the object out of the heap, and so only has one parameter, the object found in the heap, and does not pass in the application specific transaction context object.
public class Strata
{
public interface FieldExtractor
{
Comparable[] getFields(Object txn, Object object);
}
}
public class Depot
{
public interface FieldExtractor
{
Comparable[] getFields(Object object);
}
}
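A rough sketch of how such a bridge might look follows. It is not the Depot code. The Heap interface, the Bridge class, and the idea that the transaction context can be cast to a heap are all assumptions made for the illustration. The two extractor interfaces are flattened copies of the ones above.

```java
public class BridgeSketch {
    // Mirrors Strata.FieldExtractor: receives the transaction context and the key record.
    public interface StrataFieldExtractor {
        Comparable<?>[] getFields(Object txn, Object object);
    }

    // Mirrors Depot.FieldExtractor: receives only the object found in the heap.
    public interface DepotFieldExtractor {
        Comparable<?>[] getFields(Object object);
    }

    // Hypothetical heap lookup, standing in for whatever the object database uses.
    public interface Heap {
        Object get(Object key);
    }

    // The bridge fishes the object out of the heap, then hands it to the
    // one-parameter extractor, so the application never sees the context.
    public static final class Bridge implements StrataFieldExtractor {
        private final DepotFieldExtractor extractor;

        public Bridge(DepotFieldExtractor extractor) {
            this.extractor = extractor;
        }

        public Comparable<?>[] getFields(Object txn, Object key) {
            Heap heap = (Heap) txn; // assumption: the context exposes the heap
            return extractor.getFields(heap.get(key));
        }
    }
}
```

With something like this in place, the application only has to crack the object and return the fields.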
I’d written example code to imagine how the API would work. With Strata I’d always made Strata.FieldExtractor static, because it was always so complicated, turning a key into an object. With Depot that was taken care of, so the application only had to crack the object and return the array of Comparables upon which an alternate index would sort a set of objects.
Furthermore, I’d decided that for a desktop application, it would be nice to keep the definition, trees and heap in one file. Not thinking, I’d made it so that the initial version of Depot demands that the Depot.FieldExtractor be Serializable. My code examples of how clever I am, to define an index in a few short lines of code, did not work out.
It is not so much a conundrum. I merely need to make this one file format optional. The definition could be deserialized from the one file, or it could simply be rebuilt. It’s an application developer’s choice. If the application developer chooses to make her Depot.FieldExtractors static and serializable, then I can provide a static function that will create a file store for use as a heap, that tucks that definition into a longish header field.
I’m sure you could serialize a compound Swing view after you’ve built it, but you could also build it each time and only serialize the user preference settings, like the position of a window splitter.
If anyone out there thinks that I misunderstand the trade-off, please let me know. It caught me off guard. I was going to trouble the Groovy listserv with some questions, but this understanding came to me while I was getting around to that.
I’m A Happy Programmer Once I Find A Name
May 31st, 2007
I’ve resumed the Think New Orleans mailing list. I’ve sent two missives already.
I’ve postponed the mailing list for a while. It has always been very successful. People know me more through the mailing list than through the weblog or the Wiki.
My first mailing was last week. At the Bayou Boogaloo, people said, hey, I got your message. There were people who wrote me back, to tell me that they would have missed the Bayou Boogaloo if not for the missive.
Mass email is distasteful to a web developer like myself. It seems cheap.
In New Orleans however, the protocol that reigns supreme is not HTTP, but SMTP. People don’t want to Google. They already have so much information it hurts. They are not about to go searching for more. An inbox is an evil. It is however, quantifiable. There is a count. Moreover, it is evil that has your name on it. It has that much context. There is still no place that provides significant context for information about New Orleans or the recovery.
In order to resume the email list I created a Java project and called it emailer. It was dull programming. Simply reading the Java Mail API references that I’d collected, and applying them. I have an objective to grow the list, and grow it quickly, advertising its circulation growth in each missive.
The first step was to develop a simple opt out form. Then I sent out a multipart alternative message, built from the plain text and HTML exports of a Writeboard. I set up a bounces address and generate a VERP address and send it according to the article Using Apache James and JavaMail to implement Variable Envelope Return Paths.
After the Bayou Boogaloo, and the kind words about the missive, a name for the project occurred to me, whilst walking across Washington toward Mid-City.
- Missive - a written communication.
I’m enjoying this project, now. Looking forward to what it can accomplish.
Yesterday, I sent out another missive using the same revision. The same except that I changed the project name.
This evening I added embedded images, where before I was linking to images on Flickr, so my next missive will be more readily compatible with GMail and others.
I Hate Ant
May 30th, 2007
Last Meal by Shannon.
I hate Ant. Ant takes a miserable task and makes it orders of magnitude more difficult. It is a horrible piece of software. I am constantly amazed at the traction it has gained.
Ant is a dependency management system, or rather that’s the void that it fills. Make for Java. However, unlike Make, Ant cannot build a dependency tree of an entire project, mapping dependencies from one target to the next.
They defend themselves. It is explained that it is a declarative language, not an imperative language, and you need to think declaratively. That’s a behavior they’ve aped from the SQL community. However, SQL is a declarative language, based on years of database theory.
The SQL community tells you to think declaratively, so that you might open your mind to the rich expressiveness of relational calculus. The Ant community tells you to think declaratively, because they don’t have an answer for your question. Don’t go off thinking declaratively, expecting that your problems with Ant will ever be solved.
Yes, SQL is declarative, think declarative in SQL. Lisp is functional, think functional in Lisp.
Ant is broken. A real build system is a dependency management system. It should manage your dependencies. It should not ask you to change your thinking. It has nothing more to say.
Who wants to change their thinking for the sake of a build? A build system should be a cursory consideration, not a mental discipline.
Eventually, you’re going to install one of many Ant tasks that implement a loop, or embed JavaScript, or generate your Ant scripts, or write a bash program that will actually get the job done.
Ant is not a dependency management system. What is it then?
Its concept of dependencies is nothing more than procedure calls. An Ant target is a procedure with no logical branches. Ant dependencies are a series of subroutine calls that can only be invoked at the start of the subroutine.
They fire unconditionally, they perform no checking of dependencies. They will not be skipped because their dependent files are up to date.
This is because, unlike C where dependencies form a tree, Java can have cyclical dependencies. The Java compiler does its own dependency management. This led the developers to decide that system wide dependencies were unnecessary.
The result is that every task has to manage its own dependencies. Few of them do. The copy command does, the zip command does. That’s about it. Most of them are procedures with a myriad of named parameters.
A target will always evaluate. They may not execute because of a condition attached to them, but that condition will have to be set by a task that executed prior. If you really do want to check dependencies, you are going to have a chain of targets. The noise that produces on the command line makes watching an Ant build akin to reading a core dump.
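For contrast, the kind of check that Make performs between a target and its sources, and that Ant leaves to each task, is small. What follows is a minimal sketch in plain Java, not anything that Ant provides. The class name UpToDate and the method isStale are invented for the illustration.

```java
import java.io.File;
import java.util.List;

public class UpToDate {
    // A target is stale if it is missing, or if any source was modified
    // after the target was last written.
    public static boolean isStale(File target, List<File> sources) {
        if (!target.exists()) {
            return true;
        }
        long built = target.lastModified();
        for (File source : sources) {
            if (source.lastModified() > built) {
                return true;
            }
        }
        return false;
    }
}
```

A build step that is not stale can simply be skipped, without a chain of targets to set a condition.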
What does Ant give you that keeps people from running away? It doesn’t do repository management. We need Maven to cock that up.
I’ve abandoned the pointless XML files. I’m building in Groovy now, so I don’t have to pretend that Ant is teaching me how to think declaratively. I have real data structures and algorithms to make short work of what should be a short task.
I must still use AntBuilder and here’s what for. Ant wraps the Java compiler, and JUnit. Ant implements a form of file globbing. It has file system comfort functions, like conditional recursive copy and a recursive delete.
I Hate How Much I Hate Ant
Many drafts of this post. I wasted a morning looking for a photograph to accompany it. An important morning at that. I can’t find one. I don’t want to show ants, because they are ugly. I don’t want to show them eating poison, because they are ugly, even if they are going to die soon.
I hate Ant. I hate it because I spent a lot of time on it and in the end I did not use it. I spent months on the sort of non-trivial build, the sort where Ant will have you abandon your real project, and make Ant your project. I tried to generate Ant with XSLT. I tried using the includes. I spent hours trying to push the limits of this stupid batch file format.
I hate how much I hate something trivial. I hate how much I hate Ant. I can’t find a photograph, because it is such a toxic process to do so. What conveys your hatred for a tool that makes a process even more labor-intensive? What conveys your hatred for an entire community that adheres to this childish weekend hack, this manifestation of a gross misunderstanding of XML as a standard when it is so terribly easy to write a replacement?
Is it that Java is so permeated with bloatware marketing, that it has become part of the mindset of the open source Java community? Talking about a standard build tool, and that build tool is a hopeless joke, but if we talk right, it will correct itself, because it is a standard.
I am going to end this by saying, I don’t like bile. It is much easier to be creative than critical. It takes much less effort. It becomes almost effortless.
Revisiting my build system sucked me into the fear of getting sucked into build configuration and not shipping.
The approach has been toxic. It is here so I can revisit it.
Classpath Exception
May 20th, 2007
Found a project outside of GNU that is using the Classpath exception to the GPL. Read the licensing section of Restlet.
Binding XML to Maps and Beans
May 17th, 2007
With Stencil, you start with the output XHTML page.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html
xmlns:s="flag://stencil.agtrz.com/snippits"
xmlns:var="flag://stencil.agtrz.com/attributes" s:snippit="hello"
xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Hello, World!</title>
</head>
<body>
<form>
First Name: <input name="firstName" value="Fred" var:firstName="value"/><br/>
Last Name: <input name="lastName" value="Flintstone" var:lastName="value"/>
Hometown: <input name="lastName" value="Bedrock" var:address.city="value"/>
</form>
<s:switch>
<s:if test="neighbors">
<table>
<tr>
<td>First Name</td>
<td>Last Name</td>
</tr>
<s:each name="neighbors">
<tr>
<td><s:var name="firstName">Barney</s:var></td>
<td><s:var name="lastName">Rubble</s:var></td>
</tr>
</s:each>
</table>
</s:if>
<s:default>
<p>No neighbors.</p>
</s:default>
</s:switch>
</body>
</html>
<!-- vim: set ts=2 sw=2 et: -->
You will note that this XHTML will not look right in a browser. You have to strip all things in the flag://stencil.agtrz.com/snippits namespace before it looks like XHTML. Stripping the Stencil elements is easy to do though. XSLT anyone?
It is easier to preview the output through an XSLT transform, or through a servlet that strips the Stencil elements, than to depend on mockups of business logic.
Binding
public interface Person {
public String getFirstName();
public String getLastName();
public Iterator getNeighbors();
}
It all comes down to strings. Back in the days of Perl, HTML was string, SQL was a string, and CGI was a string. String processing made life easy.
We don’t want to extract structure out of the Java, we want to extract strings. The structure is dictated by the Stencil. You compose an object graph that matches the Stencil, creating facade objects where necessary.
How do we get the strings out of Java? How do we bind Java and XML?
There are three ways to bind XML to Java. One is to go the JSP route, and write Java code into the template language. Another is to go the XStream or Castor route, and express the structure of a Java object graph in XML. The other is to go an OGNL route, create a path language, that will evaluate a little bit or a lot of Java, to get string values and put them in a structure.
I’m going to go with a minimal path language. Bare minimum. No swapping. No choosing your own path language.
Dirt Simple Path Language
Dirt Simple Path Language (DSPL) is a dot separated path to a final object that will be evaluated to a String using that object’s toString method.
For any Java object, you will get the class name in DSPL using:
class.name
This is for the case of obvious nested objects like:
employee.address.zip
That would save the hassle of writing a bean wrapper to get the ZIP.
There are no accommodations for indexes or map keys. No conditionals. No operators. You should feel lucky to have this much.
Variable Elements and Attributes
There is always a context object. That object is either an object or else it is a map. In the case of an Object the object is treated as a JavaBean and the variable names are bean properties. In the case of a Map the names are interpreted as hash keys.
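A path language this small fits in a few lines of Java. The following is a minimal sketch of one way to evaluate it, assuming the JavaBeans Introspector for bean properties and a plain Map lookup for hash keys. It is not the Stencil implementation. DirtSimplePath, Employee and Address are names made up for the example, and returning null when a step evaluates to null is an assumption.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;
import java.util.Map;

public class DirtSimplePath {
    // Walks a dot separated path from the context object and returns the
    // toString of the final object, or null if any step yields null.
    public static String evaluate(Object context, String path) {
        Object current = context;
        for (String name : path.split("\\.")) {
            if (current == null) {
                return null;
            }
            current = current instanceof Map
                ? ((Map<?, ?>) current).get(name)   // names are hash keys
                : readProperty(current, name);      // names are bean properties
        }
        return current == null ? null : current.toString();
    }

    private static Object readProperty(Object bean, String name) {
        try {
            for (PropertyDescriptor property
                    : Introspector.getBeanInfo(bean.getClass()).getPropertyDescriptors()) {
                Method getter = property.getReadMethod();
                if (property.getName().equals(name) && getter != null) {
                    return getter.invoke(bean);
                }
            }
        } catch (IntrospectionException | ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot read property: " + name, e);
        }
        throw new IllegalArgumentException("No readable property: " + name);
    }

    // Hypothetical beans, only here to exercise a nested path.
    public static class Address {
        private final String zip;
        public Address(String zip) { this.zip = zip; }
        public String getZip() { return zip; }
    }

    public static class Employee {
        private final Address address;
        public Employee(Address address) { this.address = address; }
        public Address getAddress() { return address; }
    }
}
```

Given a Map context holding an Employee under the key employee, the path employee.address.zip would be resolved first as a hash key and then as two bean properties.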
Variable Elements
Variables are indicated with the var element. The var element is replaced by calling the toString method of the object returned by evaluating the property. If the value is null, then the element is simply deleted.
Question: Do I accommodate null values? It could be done by adding a null attribute.
Variable Attributes
Variable attributes are indicated using the namespace
flag://stencil.agtrz.com/attributes
An attribute in that namespace is interpreted in the following fashion.
- The attribute local name is the DSPL to apply to the context object to obtain the value of the new attribute.
- The attribute value is the qualified name of the new attribute to add to the element.
- If the attribute already exists in the template, it is removed.
- If the value evaluates to null, the attribute is not added.
This backwards naming convention is simple to implement. Added bonus of allowing a placeholder value for static editing.
Question: Do I accommodate null values? It could be done by adding the null value replacement after a comma. Is there a common case where one attribute is added in lieu of another for a specific condition? What about zebra striping, how do I know which row is even or odd? Do I use ListIterator?
No Expressions
While there is logic above, there are no expressions. Truth in Stencil is somewhat like truth in Perl. In addition to boolean false, null is false, as is an empty iterator.
When an iterator is used as a condition, it is cached, so that it can be later used for iteration. Quite naturally, after an iterator is used for iteration, it will always return false.
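These truth rules might be sketched as a single method. This is an illustration and not the Stencil source. The name StencilTruth is invented, and treating every other non-null object as true is an assumption, since only boolean false, null and the empty iterator are named as false above.

```java
import java.util.Iterator;

public class StencilTruth {
    // Boolean false, null, and an exhausted or empty iterator are false.
    // Assumption: anything else counts as true.
    public static boolean isTrue(Object value) {
        if (value == null) {
            return false;
        }
        if (value instanceof Boolean) {
            return (Boolean) value;
        }
        if (value instanceof Iterator) {
            // hasNext does not consume anything, so the same iterator
            // can still be used for iteration afterwards.
            return ((Iterator<?>) value).hasNext();
        }
        return true;
    }
}
```

Because the test only peeks with hasNext, an iterator that has already been iterated to its end tests false from then on.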
Control Structures
There are no expressions, only control structures that take a truth value returned from the underlying object model. Those control structures are as follows.
- if - The contents of the if element are evaluated if the property named by the test attribute is true.
- unless - The contents of the unless element are evaluated if the property named by the test attribute is false.
- each - For each object in the Iterator or Collection returned by the property named by the name attribute, the context switches to the object, and the contents of the each element are evaluated.
- with - The context switches to the object returned by the property named by the name attribute, and the contents of the with element are evaluated.
- switch - The switch element can contain any of the above elements, the first to be evaluated will replace the switch element. The if and unless are evaluated conditionally, while each and with will always evaluate and terminate the switch.
- default - For use in a switch statement, the default condition that will always execute and terminate the switch.
Then there are two more to consider, invoke or include. It is intended that Stencil will allow creation of pages through composition, reuse of Stencils through object orientation.
XSS
May 8th, 2007
Read up on Cross Site Scripting and be careful of it as you develop applications that allow users to inject markup into the data they enter.
Stencil by Convention
May 7th, 2007
When toying with Stencil at the outset, I dipped into the JavaBeans API. It seemed like the easiest way to get started on a dirt simple path language to extract string values from Java objects. I’d replace it when a better idea came to me.
Still waiting.
I created an interface. Given an object and a property name, an implementation would extract the property from the object. There were two distinct implementations of name and value pairing that were apparent at the outset. JavaBeans, which are Objects whose properties are obtained through the JavaBeans API, and Maps. Certainly, there would be more.
There weren’t.
The API programmer was burdened with choices. Not only did you have to provide an object model for a Stencil, you needed to provide a way to interpret that object model? Flexible? Sure. As flexible as 0 and 1.
This calls for a convention.
An object returned from a property is either an Object, Map, array, Collection, Iterator, or a ListIterator.
Objects, through JavaBeans, and Maps are interpreted as name value pairs. They give named values for a dirt simple path language.
Collection and Iterator are used by the each keyword. A Java array or ListIterator exposes an index property to the each keyword.
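The convention amounts to a short chain of instanceof tests. A sketch, with the class name StencilConvention and the Kind enum made up for the example, and with no handling of null:

```java
import java.util.Collection;
import java.util.Iterator;
import java.util.ListIterator;

public class StencilConvention {
    public enum Kind { NAMED_VALUES, SEQUENCE, INDEXED_SEQUENCE }

    // Decides how a value returned from a property is interpreted.
    public static Kind classify(Object value) {
        // ListIterator is tested before Iterator, because every
        // ListIterator is also an Iterator.
        if (value instanceof ListIterator || value.getClass().isArray()) {
            return Kind.INDEXED_SEQUENCE;   // each, with an index property
        }
        if (value instanceof Collection || value instanceof Iterator) {
            return Kind.SEQUENCE;           // each, without an index
        }
        return Kind.NAMED_VALUES;           // a Map, or an Object read as a JavaBean
    }
}
```

One wrinkle in deciding by instanceof is that some collections hand back a ListIterator from a plain iterator() call, so an index may be exposed where the caller did not ask for one.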
The Liberty of Convention
Amusing to think of how it struck me though. That word; convention. It freed me from sifting through design patterns, thinking up ever more convoluted means of delegation, dispatch, configuration, and construction.
One mental block was an avoidance of the instanceof keyword, which always feels like you’re cheating. However, this time around, I realized that I was already employing reflection, and that binding begs reflection or similar.
Once I’d decided that instanceof was the way to go, then it became a matter of deciding, instanceof what?
What Is the Difference Between a Mutator and a Snapshot in Memento? Is It Significant?
May 7th, 2007
It seems that you could create a snapshot, which would simply be a timestamp, and get rolling. However, let’s say that you begin to commit a mutation. You get a timestamp from the system. You write out an object into a Bin and then you go to sleep. Another thread creates a snapshot that is based on the timestamp only, it does not build a map of committed mutations, and encounters the object written to the Bin. It will use that value, breaking atomicity, because the mutation thread has slept before it writes the other objects in the mutation. There is no difference between a snapshot and a mutator, so we will define only the class Snapshot.