A Programmer’s Notebook on the EditGrid API
February 27th, 2008My Tchotchke Boxes by 1213 1982.
I created a series of online spreadsheets of every building, construction and demolition permit issuesd in Orleans Parish since January 2005. These spreadsheets are available at Think New Orleans.
The EditGrid API can automate the creation of reports and act as an easy use and share database for light data collection applications. I’ll write about my experience with EditGrid and why you should consider EditGrid at the back end for your next data collection task.
This is my programmer notebook on my getting to know you time with EditGrid.
I wrote my program in Java. Here are the three gotchas that I encountered.
comment
Sheaf
December 15th, 2007Sheaf Toss by Ian Langworth.
Last Saturday morning, on my way to Fair Grinds for my morning coffee, something occurred to me about persistent object storage. I’m less certain as to why. I’ve not thought about that component of my application development in a long time. I implemented persistent storage using the basics of the malloc algorithm in a project that I called Bento.
Pages of Blocks with Identifiers
When I got to Fair Grinds I began to exchange instant messages with Brian Kerr or Ann Arbor, Michigan. I described the idea as follows.
A fixed page length, 8k or 16k. The address is a file position plus a short. It is a big address, I know. The blocks are stored in a page with a number to identify the blocks followed by the object length followed by the block data. To find a block, to the file position, load the page into memory, skip through the blocks until you find the one with the matching block number.
Within a page the block is sought so that within a page blocks can be relocated within the page. If a block is deleted, the contents following the block are shifted up. The moved blocks can still be found because we are skipping through the page form an offset.
Then Brian started asking questions and sorting out details. It was good to have someone think about the problem.
If you have a block larger than a single page, you break it up among pages. You use however many empty pages necessary to hold the block and put the remainder in a page that contains many blocks.
The file format supports journaling, so there is a concept of isolation. Isolation is modeled with a Mutator object. Changes to the file are called a mutation, rather than a transaction, since transaction implies a lot more hocus-pocus.
While the file format can handle concurrent requests, it cannot handle concurrent requests on the same block. It is up to the application developer to ensure that no two threads mutate the same block at the same time. It is up to the application developer to ensure that no thread reads a mutating block. The only manipulation that is permitted concurrently is read. Block allocations are going to be visible only to the thread that allocated them. Writes and frees are going to require synchronization on the part of the application.
In memory, a sorted map of lists of blocks with free space remaining is kept sorted on the amount remaining rounded down to an alignment. When a block is allocated it is rounded up to the next alignment and we search for the block with the smallest amount remaining that can fit the block. That block is used.
Separately, there is a list of pages that have been entirely emptied through deallocations.
The file format uses ByteBuffer objects kept in memory as buffers that represent the contents on disk. The ByteBuffer objects indicate what the disk will look like when the contents of the page are journaled. They can be out of sync with the file, but that is OK. This discrepancy between our in memory representation of the file and the on disk representation of the file allow us to implement journaling.
Commits to this file format are journaled. There are two journals available, an external file journal and an internal journal that uses pages in the file. For the internal journals the file format keeps a block of pointers in the header of the file. The default number of pointers is 64. To create a journal, one of the pointers is set to the address of the first page in a linked list of pages. This is set in the ByteBuffer that is in memory, however and not on disk. The address of the start of the linked list of journal pages is now written out until the journal commits. It is written out last and the file channel is forced to flush. Thus, on disk the pointers will only ever point to a complete journal.
In addition to a linked list of journal pages of journal actions, like allocate, write and free, the journal needs to allocate temporary blocks where the mutation can write changes without disturbing the actual block in memory or on disk. The temporary block contains the applications intent. It is copied over to the actual block during commit.
Delete Makes the File Grow
I set out to implement the file format. I’d hoped to have tests running on Monday. I begged off engagements. I sat before my computer and worked through each next logical task. I was not finished on Monday, so I forced myself to stop and resume my civic pursuits, while fiddling with the code and the concepts in my free time.
By Friday, the idea of the block identifier has run it’s course, before the code was complete.
An external journal meant that this file format would create a nice and compact file format. An internal journal would fill the file with empty journal pages. What really tripped me up was when I realized deletion could not be implemented as simply shifting the blocks following a deleted block up if I wanted the deletion to be journaled. The destination and the source of the copy were on the same page. If the page were to be corrupted their would be no way to recreate it. I would have to allocate copies of blocks following the deleted block in the journal. They would be safely on disk as part of the journal commit. The delete would then copy the blocks over from the journal.
This meant that a delete would actually cause the file to grow and grow large. If the first block on a page is deleted, a temporary block has to be allocated for all the other pages in the block. I couldn’t imagine myself very happy with a larger file.
Can the file ever shrink? Sure. If the pages at the end of the file empty, then we can truncate the file. We can always be on the lookout for empty pages at the end of a mutation and rewind the file.
But then, imagine a case where an application deletes a very large collection of objects. These deletions consume all the free pages for the journal. Then the deletions begin to create new journal pages at the end of the file. If another thread allocates a new object, the other thread couldn’t make use of all the free space created in isolation so it would have to allocate a block from a new page at the end of the file. When the thread with the deletions completes, it will free all of the journal pages, but the file cannot be rewound because there is a user page following all the freed journal pages.
The file would have a huge swath of empty pages just before the last page in the file. They would be reused eventually, but how counter-intuitive to delete objects and have the database grow in size on disk. That would be so hard to explain to people. It feels wrong.
I started to think about indirection, and a name for a new project.
Enter Mix
July 18th, 2007I’ve been thinking about what a real build system for Java will be.
It will be dependency management system. One that does not execute a task unless necessary. One that maintains it’s own tree of dependencies. It knows how to check the expiration of file dependencies and networked dependencies.
Targets are not names of procedures, but artifacts that are constructed. This is how it is done in dependency management systems such as Make. Targets are represented by actual artifacts, such as object files produced by a C compiler, or an executable produced by a linker.
There will be imaginary targets, as well. Imaginary targets will force a build. Programmatic targets are imaginary targets that expire based on application specific logic.
Targets are built using rules. There are specific rules and generic rules. Specific rules might create a specific jar from a collection of class files. A generic rule might describe how to turn a C file into an object file.
It will be Java, all Java. You will invoke it using Java.
java -jar mix.jar compile
A Mix project is expressed in the Java programming language. No XML. No external languages. No property files, except those managed by Mix itself.
Mix will depend on a JDK 1.4 solution at every turn, forgoing the en vogue library, the shinny objects of the moment, in order to reduce the footprint of the build system. (You won’t have to download Commons Logging.)
Mix will be a client/server application. It will create a process that will listen on a port for commands. Commands are sent as serialized objects. Startup times are greatly reduced.
Reporting will be done with standards compliant XHTML that follows a strict set of formatting rules, for the application of the CSS stylesheet of your choice. No XML data dump. No transforms. Direct to something that you can read. Each build will create a website with a dashboard index and reports from each task.
Dependency management extends to external dependencies; libraries. Your build can use a local cache of any remote repository. To add a resource, you fire off Mix at the command line with a name and URL. A checksum will be generated and you will be prompted to add the resource and checksum pair to your build.
The resource now is available to your Mix project as an InputStream that can be read and written to file anywhere you like or you can reference the file in the local cache.
When you deploy the application and someone else builds your project on their machine, the resource manager will pull the resource and perform the checksum automatically.
As noted above, the tasks are written in Java. You can use a default task as the action to take for a target. You can use a chain of tasks that will be performed one after another. You can build write your own task in Java, it will get compiled an executed the next time you invoke Mix.
Thus, Mix is a dependency system for your command line Java applications.
Snappy Missive Mail Shot
July 18th, 2007I sent today’s Think New Orleans newsletter directly from my server and it went quick. It took me some time to set it up to run from the server. I’d attempted to do this before, but getting the environment right was a build configuration task. I don’t like build configuration tasks. The messages queued so quickly, I was astounded. Last mail shot was from my laptop using SMTP and TLS. Each message took three seconds. This time it was ten messages a second. So rarely is programming time immediately repaid.
Serialization Conundrums
June 2nd, 2007Strata is a B+Tree that I’ve written in Java. It implements Java serialization. Not to store the leaves of the tree, but to store the definition of the tree itself.
BentoStorage.Creator newStorage = new BentoStorage.Creator();
newStorage.setSize(8 + 8);
newStorage.setReader(new MyRecordReader());
newStorage.setWriter(new MyRecordWriter());
Strata.Creator newIndex = new Strata.Creator();
newIndex.setStorage(newStorage.create());
newIndex.setExtractor(new MyFieldExtractor());
Strata strata = newStrata.create();
Bento.Creator newBento = new Bento.Creator();
FileOutputStream outFile = new FileOutputStream(new File("index"));
ObjectOutputStream out = new ObjectOutputStream(outFile);
out.writeObject(strata);
out.close();
Here is how MyReader might be defined.
public final class MyReader
implements BentoStorage.Reader, Serializable
{
private final static int serialVersionID = 20070602L;
public Object read(ByteBuffer bytes)
{
return new MyRecord(bytes.getLong(), bytes.getLong());
}
}
What if I felt like using this from Groovy and taking advantage of closures?
BentoStorage.Creator newStorage = new BentoStorage.Creator();
newStorage.size = 8 + 8
newStorage.writer = { object, bytes ->
bytes.putLong(object.key)
bytes.putLong(object.version)
}
newStorage.reader = { bytes ->
return new MyRecord(bytes.getLong(), bytes.getLong())
}
Strata.Creator newIndex = new Strata.Creator()
newIndex.storage = newStorage.create()
newIndex.listExtractor = { txn, object ->
Person person = txn.heap.get(txn.unmarshaller, object.key)
return [ person.lastName, person.firstName ]
}
Strata index = newIndex.create()
The problem is that Groovy closures cannot be serialized. Now I cannot serialize Strata. This struck me as a setback for Groovy support, and a problem specific to Groovy, until I stubbed my toe on annonymous inner classes.
Strata.Creator newIndex = new Strata.Creator();
newIndex.setStorage(newStorage.create());
newIndex.setExtractor(new FieldExtractor()
{
public Comparable[] getFields(Object txn, Object object)
{
MyRecord record = (MyRecord) object;
MyDatabase database = (MyDatabase) txn;
Person person = database.getHeap()
.get(database.getUnmarshaller(), record.getKey());
return new Comparable[] { person.getLastName(),
person.getFirstName() };
}
});
Strata index = newIndex.create()
Even if I make the anonymous inner class Serializable, perhaps by deriving from an interface that mixes FieldExtractor and Serializable the class that defines the method that builds the Strata must also be serializable. I’m serializing the hidden reference to the containing class. Unnecessary code. Pointless code. Not really part of the data structure.
This must be a conundrum faced any library that could be built using anonymous inner classes.
I wanted my creational pattern to follow the notion of creatation of the object through construction using the Creator classes once. Thereafter, the Strata is serialized and deserialized.
I do not have to get rid of this pattern, but in order to implement it, people will have to avoid the use of anonymous inner classes to define their readers, writers, and extactors. The should always create static classes that implement Serializable.
If they do want to use anonyomous inner classes for readers, writers and extractors, they can simply repeat the creation of the object through construction using the Creator classes. Same goes for the use of Groovy closures.
This muddles an assumption that I had higher in my application stack. I am working on an object database that I call Depot.
The object database I’m building based on Strata stores the Strata B+Tree definitions in a heap. I created a bridge interface for the FieldExtractor, where the object database fishes the object out of the heap, and so only has one parameter, the object found in the heap, and does not pass in the application specific transaction context object.
public class Strata
{
public interface FieldExtractor
{
Comparable[] getFields(Object txn, Object object);
}
}
public class Depot
{
public interface FieldExtractor
{
Comparable[] getFields(Object object);
}
}
I’d written example code to imagine how the API would work. With Strata I’d always made Strata.FieldExtractor static, because it was always so complicated, turning a key into an object. With Depot that was taken care of, so the application only had to crack the object and return the array of @Comparable@S upon which an alterate index would sort a set of objects.
Furthermore, I’d decided that for a desktop application, it would be nice to keep the definition, trees and heap in one file. Not thinking, I’d made so that the initial version of Depot demands that the Depot.FieldExtractor be Serializable. My code examples of how clever I am, to define an index in a few short lines of code, did not work out.
It is not so much a conundrum. I merely need to make this one file format optional. The definition could be deserialized from the one file, or it could simply be rebuilt. It’s an application developers choice. If the application developer choses to make her @Depot.FieldExtractor@S static and serializable, then I can provide a static function that will create a file store for use as a heap, that tucks that definition into a longish header field.
I’m sure you could serialize a compound Swing view after you’ve built it, but you could also build it each time and only serialize the user preference settings, like the position of a window splitter.
If anyone out thinks that I misunderstand the trade off, please let me know. It caught me off gaurd. I was going to trouble the Groovy listserv with some questions, but this understanding came to me while I was getting around to that.
I’m A Happy Programmer Once I Find A Name
May 31st, 2007I’ve resumed the Think New Orleans mailing list. I’ve sent two missives already.
I’ve postponed the mailing list for a while. It has always been very successful. People know me more through the mailing list than through the weblog or the Wiki.
My first mailing was last week. At the Bayou Boogaloo, people said, hey, I got your message. There were people who wrote me back, to tell me that they would have missed the Bayou Boogaloo if not for the missive.
Mass email is distasteful to a web developer like myself. It seems cheap.
In New Orleans however, the protocol that reigns supreme is not HTTP, but SMTP. People don’t want to Google. They already have so much information it hurts. They are not about to go searching for more. An inbox is an evil. It is however, quantifiable. There is a count. Moreover, it is evil that has your name on it. It has that much context. There is still no place that provides significant context for information about New Orleans or the recovery.
In order to resume the email list I created an Java project and called it emailer. It was dull programming. Simply, reading the Java Mail API references that I’d collected, and applying them. I have an objective to grow the list, and grow it quickly, advertising it’s circulation growth in each missive.
The first step was to develop a simple opt out form. Then I sent out a multipart alternative message, built from the plain text and HTML exports of a Writeboard. I set up a bounces address and generate a VERP address and send it according to the article Using Apache James and JavaMail to implement Variable Envelope Return Paths.
After the Bayou Boogaloo, and the kind words about the missive, a name for the project occurred to me, whilst walking across Washington toward Mid-City.
- Missive - a written communication.
I’m enjoying this project, now. Looking forward to what it can accomplish.
Yesterday, I sent out another missive using the same revision. The same except that I changed the project name.
This evening I added embedded images, where before I was linking to images on Flickr, so my next missive will be more readily compatible with GMail and others.
I Hate Ant
May 30th, 2007Last Meal by Shannon.
I hate Ant. Ant takes a miserable task and makes orders of magnitude more difficult. It is a horrible piece of software. I am constantly amazed at the traction it has gained.
Ant is a dependency management system, or rather that’s the void that it fills. Make for Java. However, unlike Make, Ant cannot build a dependency tree of an entire project, mapping dependencies from one target to the next.
They defend themselves. It is explained that it is a declarative language, not an imperative language, and you need to think declaratively. That’s a behavior they’ve aped from the SQL community. However, SQL is a declarative language, based on years of database theory.
The SQL community tells you think declaratively, so that you might open your mind to the rich expressiveness of relational calculus. The Ant community tells you to think declaratively, because they don’t have an answer for your question. Don’t go off thinking declaratively, expecting that your problems with Ant will ever be solved.
Yes, SQL is declarative, think declarative in SQL. Lisp is functional, think functional in Lisp.
Ant is broken. A real build system is a dependency management system. It should manage your dependencies. It should not ask you to change your thinking. It has nothing more to say.
Who wants to change their thinking for the sake of a build? A build system should be cursory consideration, not mental discipline.
Eventually, you’re going to install one of many Ant tasks that implement a loop, or embed JavaScript, or generate your Ant scripts, or write a bash program that will actually get the job done.
Ant is not a dependency management system. What is it then?
It’s concept of dependencies are nothing more than procedure calls. An Ant target is a procedure with no logical branches. Ant dependencies are a series of subroutine calls that can only be invoked at the start of the subroutine.
They fire unconditionally, they perform no checking of dependencies. They will not be skipped because their dependent files are up to date.
This is because, unlike C where dependencies form a tree, Java can have cyclical dependencies. The Java compiler does it’s own dependency management. This led the developers to decide that system wide dependencies were unnecessary.
The result is that every task has to manage it’s own dependencies. Few of them do. The copy command does, the zip command does. That’s about it. Most of them are procedures with a myriad of named parameters.
A target will always evaluate. They may not execute because of a condition attached to them, but that condition will have to be set by a task that executed prior. If you really do want to check dependencies, you are going to have a chain of targets. The noise that produces on the command line makes watching an Ant build akin to reading a core dump.
What does Ant give you that keeps people from running away? It doesn’t do repository management. We need Maven to cock that up.
I’ve abandoned the pointless XML files. I’m building in Groovy now, so I don’t have to pretend that Ant is teaching me how to think declaratively. I have real data structures and algorithms to make short work of what should be a short task.
I must still use AntBuilder and here’s what for. Ant wraps the Java compiler, and JUnit. Ant implements a form of file globbing. It has file system comfort functions, like conditional recursive copy and a recursive delete.
I Hate How Much I Hate Ant
Many drafts of this post. I wasted a morning looking for a photograph to accompany it. An important morning at that. I can’t find one. I don’t want to show ants, because they are ugly. I don’t want to show them eating poison, because they are ugly, even if they are going to die soon.
I hate Ant. I hate it because I spent a lot of time on it and in the end I did not use it. I spent months on the sort of non-trivial build. That sort that Ant will have you abandon your real project, and make Ant your project. I tried to generate Ant with with XSLT. I tried using the includes. I spent hours trying to push the limits of a this stupid batch file format.
I hate how much I hate something trivial. I hate how much I hate Ant. I can’t find a photograph, because it is such toxic process to do so. What conveys your hatred for a tool that makes a process even more labor-intensive? What conveys your hatred for an entire community that adheres to this childish weekend hack, this manifestation of a gross misunderstading of XML as a standard when it is so terribly easy to write a replacement?
Is it that Java is so permiated with bloatware marketing, that it has become part of the mindset of the open source Java community? Talking about a standard build tool, and that build tool is a hopeless joke, but if we talk right, it will correct itself, because it is a standard.
I am going to end this by saying, I don’t like bile. It is much easier to be creative than critical. It takes much less effort. It becomes almost effortless.
Revisiting my build system sucked me into the fear of getting sucked into build configuration and not shipping.
The approach has been toxic. It is here so I can revisit it.
Classpath Exception
May 20th, 2007Found a project outside of GNU that is using the Classpath exception to the GPL. Read the licensing section of Restlet.
| « Previous Entries |







