Alan Gutierrez

Alan Gutierrez blogs on software, social networks, and himself.

Subscrive Via RSS Feed

Sizing for Photographs

This is a photograph that I’m going to use as a header image for the new Kubrick based theme for my blog. Karen annotated it at Flickr to show how it exemplifies multiple land use.

New theme for less gloom and more content. A lighter, brighter theme. Familiar Kubrick look bores your attention towards the pretty words and images.

EMail Database

I get a lot of email messages that are public information. Easier than blogging them, would be to bounce them to an email database. The email database would keep the email in tact, keep the headers in tact, so that people could see that the message indeed came from an official of the city or the state. It might be necessary to obfuscate the sender’s email address, or hide recipient’s email addresses, but the email-ness of the message should be conveyed. Then you’d have something to which you can link.

Atomicity in Memento

Doors closing in “28th Stop” by Mo Riza.

Initially, I had a plan to keep structure in memory that contained the history of a specific object. The history would be cached using soft references, and every thread would reference the same history. Each history object would have a mutex. Before the history object could be changed, it would be locked exclusively, in fact, the plan was to lock them all exclusively, in the order of their keys.

But, what good is this? Now I can iterate through all these histories, knowing that my thread is the only one that can mutate the history of these objects. I can check to see if there is a newer version of the object. If there is, then I throw an exception, because Memento uses opportunistic locking.

I could go through all of them, lock them all exclusively, then go through them all again, check that there are no changes, finally, I could write out the changes to the B+Tree.

This would have to be implemented for both Bin and Join, while Index could rely on the lock on the associated Bin.

However, it doesn’t make sense. Once you start committing changes to the B+Tree, then they are visible to the other threads, so that, even though other mutators are locked out by the mutex on the history object, other snapshots are going to interpret the newly added version of a specific object as fair game.

Let’s say that when we commit, we create a time stamp at 05:00:01 and then we sleep while a snapshot is created with a time stamp of 05:00:02. Thereafter, the snapshot sleeps, we resume and begin our commit. We lock the mutex attached to the history object for each Bin or Join record that we wish to change. We begin to change write new versions, and add to the history object. Then, in the midst of it all, the 05:00:01 commit takes a snooze and the 05:00:02 snapshot resumes, visiting on of the just saved objects. 05:00:01 is before 05:00:02 so that the version of the object that we just saved is used. The snapshot then reads the object that the commit will store next, when it wakes up. Now we are no longer atomic.

When commit begins, there needs to be a way to tell the other threads to ignore that particular version.

This could be a hash table they keep and look for committed versions.

I cannot think of any way to do it with timestamps alone. Already, I’m trusting vacuum to elapsed time. What could I do, commit with a timestamp in the future?

Even the most clever in memory structure is going to need to be backed up by the ability to write to file that the commit succeeded. This is probably going to be yet anther B+Tree, to get the ordering. Not a bad way to go, the only lock on the leaf being written.

The results are mirrored in a hash table for each snapshot.

If I can think of a way to keep them from stomping on each other, besides a global scorecard, I will use it. It would be a lot of very hard thinking.

It does mean, that, except for locking that B+Tree of transaction history, there is no need to lock the B+Trees otherwise, I mean, that there is no need to lock a Bin exclusively.

To commit, one iterates though the isolated B+Tree. For each record, one searches for the record in the persistance B+Tree. If their exists a record with a later version number, and that version number is associated with a transaction that did not rollback, then the commit fails, the lock was not opportune, an exception is thrown.

If it is the case that, when a commit fails, which is the only time that a persistence B+Tree is written, we delete all the records written, then we can fail whenever we see a subsequent record.

We can fail whenever we see a subsequent record regardless of it’s success. This is easier to implement.

The drawback is that a mutator could fail for a contention that did not in fact exist, due to a rollback. On closer inspection, all of the above have the same drawback. At the time that thread A inspects an object revised by tread B, thread A will carp. Thread B might then inspect an object revised by thread C, and carp. If thread A had slept and waited, and provided that thread B clean up it’s mess, thread A would have committed. Chaotic.

Unless one devises a complicated scheme by which, thread A waits to see what thread B does, and does so in a way that eliminates the possibility that thread B might then wait to see what thread A does, the chances that opportunistic locking will detect conflicts that would not have been detected had the thread scheduler been more auspicious.

Of course, one could simply go in alphabetical order by the name of the Bin, followed by the Index of the Bin assuming a list of Index, followed by the ordered name of each Join associated with the Bin. When you detect a subsequent version that has not committed, you sleep on some fancy concurrency structure that will wake you after any other tread commits, and try your luck again.

The search to see if the version has committed in the B+Tree of mutations is not expensive, considering that the next thing to happen, if the results are negative, is an exception condition that will require that the entire mutation be repeated.

For now, let’s not be particular. If there is a subsequent version, mutating, committed, or rolled back, the lock is not opportune.

It could be that an operation would time out. If it were the case that two operations failed, because they were running in opposite directions, then when they restarted they might repeat the folly. The application might have a retry limit, and the application would probably panic if it was reached.

All of this checking on the state of the persistence B+Tree means that we’re going to have to have a better understanding, of the still not very well documented, and not at all implemented, intricacies of Strata concurrency. One would have to lock the leaf that contains the record sought exclusively.

In fact, one would have to see that all the records were on the same page, so that the version should not be part of the key. Strata will place duplicates on the same leaf page, so if the version is not part of the key, we can lock the leaf exclusively, read it to check for later versions, and then insert the latest version, knowing that no one else is doing same.

Event though the version is not part of the key, it will be in order, because when there is a later version then the one we want to insert, we raise and exception. If not, if we were to leave rolled back records in place, and check that the mutator rolled back, the last inserted and committed is always the newest.

I thought I had the plan, though, with in-memory histories. Bummer.

Histories In Memory

Do they need to be kept in memory at all times? Or do they only need to be kept in memory in order to lock them?

Nonprofit Post-Mortem

The money from the Blue Moon Fund is gone now. The way it was structured was that, I’d work for another nonprofit, as an employee, that nonprofit would pay me for what I’m doing. It didn’t make sense, but I was told that it was the fastest way to get funds.

Thus, I began to pour my efforts, behind the brand name that I created, into a nonprofit that could only pay me by virtue of my efforts. It was a horrible overhead.

It was billed as a favor to me, to provide a fiscal agent for Think New Orleans. It was a lot of language that I didn’t understand. I assumed that it was not the strange language of a new form of business, the nonprofit business. In the end, it turned out to be the strange language of someone who is making things up as they go along.

Eventually, I found myself before an employment contract made it clear that any “intellectual property” would belong to this fiscal agent. At that point, I was ready to break, but others were involved.

There was much that did not make sense, but the part that made the least sense of all, was that I was supposed to stop pursuing income outside of this fledgling nonprofit. It created a personal financial disaster.

Eventually, I realized that, as an employee, I could simply quit the nonprofit. I did. I was told that many horrible things would happen if I were to do this, but none of these horrible things befell me.

This experience was isolating.

Strata Storage Strategies

Same Box, Different Cat by Jennifer Lamb

At the outset, Strata was developed for use with in memory and file backed implementations of tiers.

When in memory, branches and the objects can be stored in arrays. The default implementation uses ArrayList classes to store objects.

When file backed, their are two types of objects that can be stored, variable length objects and fixed length objects. Objects can be stored within the tiers or by reference.

When file backed, tiers are softly or weakly referenced, so that they can be collected. Thus, they can act as caches of the objects they reference.

Storing by refernece means storing an address. Whether or not the object is stored in the tier, or by reference outside the tier, the object needs to be in memory for comparisons, for traversal at both the leaf and the inner tier.

By reference storage can keep the object within the in memory tier, or it can perform a look up of the referenced object. The object is an simple object, like a string, it might be easier to keep it in the memory tier, as part of an address, object pair. If it is a complicated object, it can load the object through a caching mechanism.

Free on Rollback

Remember to free the blocks allocated on rollback.

Local View of Data

One could do this, add a flag to the perminant index that says whether it is inserted or deleted, and has a version number, and then, when building the histories, one determines if, that has changed.

Also, their is a new thought, where those things do not change until absolutely necessary, which means, that, you’d create a view of the database, by building a mutation specific index, changing that mutation specific index, and then merging with the common to see what’s what.

The final commit would be as it is now in Memento, lock everything, make sure it has not changed. One could have a difference between row level locking and table level locking.

Table level, because we may need to lock the entire table, when things change significantly. If a join is rejiggered or some such.

The mutator’s view could be stored in standard Java containers, or in the future, it could be logged, or kept in a lazy index, one that writes out only when memory is needed.

« Previous Entries Next Entries »