Alan Gutierrez

Alan Gutierrez blogs on software, social networks, and himself.

Subscrive Via RSS Feed

How To Use EditGrid to Create a Database With A Slick User Interface With Zero Lines of Code

Morphing Grid

Morphing Grid by Dan Allison.

Ever find yourself with a data collection, and you can see the steps in your head, create a form, create a table? It’s a dry task that gets dryer by the minute once you find yourself once again creating that paged results table for the contact us or event registration form you’ve created a thousand times before.

I can’t tell you how much slicker it is to send that data directly to an EditGrid workbook. It is a quick and easy database that is intuitive for your users. Instead of a paged html table, their information is available as rich full-featured spreadsheet.

(more…)

Cyberduck

Cyberduck File Browser

Picture of the Cyberduck file browser by Cyberduck.

With all the catch up reading that I was able to do over the holidays, I was able to read about my Macintosh and came across Cyberduck. Cyberduck is an FTP and SFTP (file transfer via SSH otherwise known as SCP) utility for OS X released under the GNU Public License. It now replaces Fugu as my GUI SCP tool, which I rarely used because it felt slow. It even replaces the command line SCP which I always use because I’m familiar with the shell.

Sheaf

Sheaf Toss

Sheaf Toss by Ian Langworth.

Last Saturday morning, on my way to Fair Grinds for my morning coffee, something occurred to me about persistent object storage. I’m less certain as to why. I’ve not thought about that component of my application development in a long time. I implemented persistent storage using the basics of the malloc algorithm in a project that I called Bento.

Pages of Blocks with Identifiers

When I got to Fair Grinds I began to exchange instant messages with Brian Kerr or Ann Arbor, Michigan. I described the idea as follows.

A fixed page length, 8k or 16k. The address is a file position plus a short. It is a big address, I know. The blocks are stored in a page with a number to identify the blocks followed by the object length followed by the block data. To find a block, to the file position, load the page into memory, skip through the blocks until you find the one with the matching block number.

Within a page the block is sought so that within a page blocks can be relocated within the page. If a block is deleted, the contents following the block are shifted up. The moved blocks can still be found because we are skipping through the page form an offset.

Then Brian started asking questions and sorting out details. It was good to have someone think about the problem.

If you have a block larger than a single page, you break it up among pages. You use however many empty pages necessary to hold the block and put the remainder in a page that contains many blocks.

The file format supports journaling, so there is a concept of isolation. Isolation is modeled with a Mutator object. Changes to the file are called a mutation, rather than a transaction, since transaction implies a lot more hocus-pocus.

While the file format can handle concurrent requests, it cannot handle concurrent requests on the same block. It is up to the application developer to ensure that no two threads mutate the same block at the same time. It is up to the application developer to ensure that no thread reads a mutating block. The only manipulation that is permitted concurrently is read. Block allocations are going to be visible only to the thread that allocated them. Writes and frees are going to require synchronization on the part of the application.

In memory, a sorted map of lists of blocks with free space remaining is kept sorted on the amount remaining rounded down to an alignment. When a block is allocated it is rounded up to the next alignment and we search for the block with the smallest amount remaining that can fit the block. That block is used.

Separately, there is a list of pages that have been entirely emptied through deallocations.

The file format uses ByteBuffer objects kept in memory as buffers that represent the contents on disk. The ByteBuffer objects indicate what the disk will look like when the contents of the page are journaled. They can be out of sync with the file, but that is OK. This discrepancy between our in memory representation of the file and the on disk representation of the file allow us to implement journaling.

Commits to this file format are journaled. There are two journals available, an external file journal and an internal journal that uses pages in the file. For the internal journals the file format keeps a block of pointers in the header of the file. The default number of pointers is 64. To create a journal, one of the pointers is set to the address of the first page in a linked list of pages. This is set in the ByteBuffer that is in memory, however and not on disk. The address of the start of the linked list of journal pages is now written out until the journal commits. It is written out last and the file channel is forced to flush. Thus, on disk the pointers will only ever point to a complete journal.

In addition to a linked list of journal pages of journal actions, like allocate, write and free, the journal needs to allocate temporary blocks where the mutation can write changes without disturbing the actual block in memory or on disk. The temporary block contains the applications intent. It is copied over to the actual block during commit.

Delete Makes the File Grow

I set out to implement the file format. I’d hoped to have tests running on Monday. I begged off engagements. I sat before my computer and worked through each next logical task. I was not finished on Monday, so I forced myself to stop and resume my civic pursuits, while fiddling with the code and the concepts in my free time.

By Friday, the idea of the block identifier has run it’s course, before the code was complete.

An external journal meant that this file format would create a nice and compact file format. An internal journal would fill the file with empty journal pages. What really tripped me up was when I realized deletion could not be implemented as simply shifting the blocks following a deleted block up if I wanted the deletion to be journaled. The destination and the source of the copy were on the same page. If the page were to be corrupted their would be no way to recreate it. I would have to allocate copies of blocks following the deleted block in the journal. They would be safely on disk as part of the journal commit. The delete would then copy the blocks over from the journal.

This meant that a delete would actually cause the file to grow and grow large. If the first block on a page is deleted, a temporary block has to be allocated for all the other pages in the block. I couldn’t imagine myself very happy with a larger file.

Can the file ever shrink? Sure. If the pages at the end of the file empty, then we can truncate the file. We can always be on the lookout for empty pages at the end of a mutation and rewind the file.

But then, imagine a case where an application deletes a very large collection of objects. These deletions consume all the free pages for the journal. Then the deletions begin to create new journal pages at the end of the file. If another thread allocates a new object, the other thread couldn’t make use of all the free space created in isolation so it would have to allocate a block from a new page at the end of the file. When the thread with the deletions completes, it will free all of the journal pages, but the file cannot be rewound because there is a user page following all the freed journal pages.

The file would have a huge swath of empty pages just before the last page in the file. They would be reused eventually, but how counter-intuitive to delete objects and have the database grow in size on disk. That would be so hard to explain to people. It feels wrong.

I started to think about indirection, and a name for a new project.

A Third Space In Ann Arbor

The Detroit newsletter slash website Metromode reports that SRT opens Ann Arbor office, more than triples its employees. Although, it looks as though it was a reprinted press release, the notion of third spaces is one that I’m pursuing with coworking meetups at our lovely New Orleans coffee houses.

Coshirking

Coshirking: A coffee break lasting more than one hour where local industry gossip is exchanged over open laptops. Coined by William Tozer.

Preach to the Choir, They Are the Only One Listening

On the Internet you only ever preach to the choir, or you pander to the trolls. That’s your choice. You either speak to the like minded who find your message through the magic of the long tail or the hateful spoilers who find your message through that exact same magic. I can feel it when I write. I can feel the trolls. I know they will arrive to leave silly little troll droppings, and I preempt them in my writing. They see it and spew. I’ve got to feel the choir. To write directly for the choir. Kick the trolls. Invite the opposing views.

Hooray for Hollygrove

Back yard of a home on Edinburgh St, Christopher Johnston in background by Alan Guterrez

I have a full-time professional services contract with the New Orleans Housing Resource Center. I work for Paul Baricos, and with Meghan Finn on Hollygrove related recovery issues. They are assembling a block captains program, a newsletter for Hollygrove, as well helping to organize discussions on land use and development.

I continue to work on all aspects of Think New Orleans. There is no “but”. There is only “and”. I’m working on Think New Orleans, and the Road Home Unconference, and I have an office at the Trinity Christian Community Center, and I am affiliated with a neighborhood.

I’m in the middle of a lot of projects. The one that has most of my attention is the Road Home Unconference. I’m going to lend whatever support I can to Maitri Venkat-Ramani in her work on Rising Tide 2.

For Hollygrove, however, I’m focused on finding ways to get digital media into a neighborhood that has very few computers. The route we’re perusing is an older digital format that has become quite common in most American households. It’s called compact disc. We’re going to produce PodCasts for the block captains, but rather than distribute iPods, we’re going to burn CDs.

I’m going to interview an engineer with the Sewage and Water Board. Hollygrove is 9 feet below sea level. Water drains out of Broadmoor and Carrollton and into Hollygrove, where it is pumped into the Metaire 17th Canal and flows to Lake Ponchartrain through Jefferson Parish. It floods. It has flooded 11 times since the big one. (I’m going to post this and post corrections later.)

Last weekend Christopher Johnston and I went photowalking through Hollygrove. We chose a block and took a photograph of every house on that block. It’s on me to create a map that has all these photographs, so people can see the state of the neighborhood.

Expect to learn more about Hollygrove, as I learn more about Hollygrove.

Serialization Conundrums

Serialized by Silus Grok.

Strata is a B+Tree that I’ve written in Java. It implements Java serialization. Not to store the leaves of the tree, but to store the definition of the tree itself.

BentoStorage.Creator newStorage = new BentoStorage.Creator();

newStorage.setSize(8 + 8);
newStorage.setReader(new MyRecordReader());
newStorage.setWriter(new MyRecordWriter());

Strata.Creator newIndex = new Strata.Creator();

newIndex.setStorage(newStorage.create());
newIndex.setExtractor(new MyFieldExtractor());

Strata strata = newStrata.create();

Bento.Creator newBento = new Bento.Creator();

FileOutputStream outFile = new FileOutputStream(new File("index"));
ObjectOutputStream out = new ObjectOutputStream(outFile);
out.writeObject(strata);
out.close();

Here is how MyReader might be defined.

public final class MyReader
implements BentoStorage.Reader, Serializable
{
    private final static int serialVersionID = 20070602L;

    public Object read(ByteBuffer bytes)
    {
        return new MyRecord(bytes.getLong(), bytes.getLong());
    }
}

What if I felt like using this from Groovy and taking advantage of closures?

BentoStorage.Creator newStorage = new BentoStorage.Creator();

newStorage.size = 8 + 8
newStorage.writer = { object, bytes ->
    bytes.putLong(object.key)
    bytes.putLong(object.version)
}
newStorage.reader = { bytes ->
    return new MyRecord(bytes.getLong(), bytes.getLong())
}

Strata.Creator newIndex = new Strata.Creator()

newIndex.storage = newStorage.create()
newIndex.listExtractor = { txn, object ->
    Person person = txn.heap.get(txn.unmarshaller, object.key)
    return [ person.lastName, person.firstName ]
}

Strata index = newIndex.create()

The problem is that Groovy closures cannot be serialized. Now I cannot serialize Strata. This struck me as a setback for Groovy support, and a problem specific to Groovy, until I stubbed my toe on annonymous inner classes.

Strata.Creator newIndex = new Strata.Creator();

newIndex.setStorage(newStorage.create());
newIndex.setExtractor(new FieldExtractor()
{
    public Comparable[] getFields(Object txn, Object object)
    {
        MyRecord record = (MyRecord) object;
        MyDatabase database = (MyDatabase) txn;
        Person person = database.getHeap()
            .get(database.getUnmarshaller(), record.getKey());
        return new Comparable[] {  person.getLastName(),
                                   person.getFirstName() };
    }
});

Strata index = newIndex.create()

Even if I make the anonymous inner class Serializable, perhaps by deriving from an interface that mixes FieldExtractor and Serializable the class that defines the method that builds the Strata must also be serializable. I’m serializing the hidden reference to the containing class. Unnecessary code. Pointless code. Not really part of the data structure.

This must be a conundrum faced any library that could be built using anonymous inner classes.

I wanted my creational pattern to follow the notion of creatation of the object through construction using the Creator classes once. Thereafter, the Strata is serialized and deserialized.

I do not have to get rid of this pattern, but in order to implement it, people will have to avoid the use of anonymous inner classes to define their readers, writers, and extactors. The should always create static classes that implement Serializable.

If they do want to use anonyomous inner classes for readers, writers and extractors, they can simply repeat the creation of the object through construction using the Creator classes. Same goes for the use of Groovy closures.

This muddles an assumption that I had higher in my application stack. I am working on an object database that I call Depot.

The object database I’m building based on Strata stores the Strata B+Tree definitions in a heap. I created a bridge interface for the FieldExtractor, where the object database fishes the object out of the heap, and so only has one parameter, the object found in the heap, and does not pass in the application specific transaction context object.

public class Strata
{
    public interface FieldExtractor
    {
        Comparable[] getFields(Object txn, Object object);
    }
}

public class Depot
{
    public interface FieldExtractor
    {
        Comparable[] getFields(Object object);
    }
}

I’d written example code to imagine how the API would work. With Strata I’d always made Strata.FieldExtractor static, because it was always so complicated, turning a key into an object. With Depot that was taken care of, so the application only had to crack the object and return the array of @Comparable@S upon which an alterate index would sort a set of objects.

Furthermore, I’d decided that for a desktop application, it would be nice to keep the definition, trees and heap in one file. Not thinking, I’d made so that the initial version of Depot demands that the Depot.FieldExtractor be Serializable. My code examples of how clever I am, to define an index in a few short lines of code, did not work out.

It is not so much a conundrum. I merely need to make this one file format optional. The definition could be deserialized from the one file, or it could simply be rebuilt. It’s an application developers choice. If the application developer choses to make her @Depot.FieldExtractor@S static and serializable, then I can provide a static function that will create a file store for use as a heap, that tucks that definition into a longish header field.

I’m sure you could serialize a compound Swing view after you’ve built it, but you could also build it each time and only serialize the user preference settings, like the position of a window splitter.

If anyone out thinks that I misunderstand the trade off, please let me know. It caught me off gaurd. I was going to trouble the Groovy listserv with some questions, but this understanding came to me while I was getting around to that.

« Previous Entries