Alan Gutierrez

Alan Gutierrez blogs on software, social networks, and himself.

Subscrive Via RSS Feed

What Is the Difference Between a Mutator and a Snapshot in Memento? Is It Significant?

It seems that you could create a snapshot, which would simply be a timestamp, and get rolling. However, let’s say that you begin to commit a mutation. You get a timestamp from the system. You write out an object into a Bin and then you go to sleep. Another thread creates a snapshot that is based on the timestamp only, it does not build map of committed mutations, and encounters the object written to the Bin. It will use that value, breaking atomicity, because the mutation thread has slept before it write the other objects in the mutation. There is no difference between an snapshot and a mutator, so we will define only the class Snapshot.

Atomicity in Memento

Doors closing in “28th Stop” by Mo Riza.

Initially, I had a plan to keep structure in memory that contained the history of a specific object. The history would be cached using soft references, and every thread would reference the same history. Each history object would have a mutex. Before the history object could be changed, it would be locked exclusively, in fact, the plan was to lock them all exclusively, in the order of their keys.

But, what good is this? Now I can iterate through all these histories, knowing that my thread is the only one that can mutate the history of these objects. I can check to see if there is a newer version of the object. If there is, then I throw an exception, because Memento uses opportunistic locking.

I could go through all of them, lock them all exclusively, then go through them all again, check that there are no changes, finally, I could write out the changes to the B+Tree.

This would have to be implemented for both Bin and Join, while Index could rely on the lock on the associated Bin.

However, it doesn’t make sense. Once you start committing changes to the B+Tree, then they are visible to the other threads, so that, even though other mutators are locked out by the mutex on the history object, other snapshots are going to interpret the newly added version of a specific object as fair game.

Let’s say that when we commit, we create a time stamp at 05:00:01 and then we sleep while a snapshot is created with a time stamp of 05:00:02. Thereafter, the snapshot sleeps, we resume and begin our commit. We lock the mutex attached to the history object for each Bin or Join record that we wish to change. We begin to change write new versions, and add to the history object. Then, in the midst of it all, the 05:00:01 commit takes a snooze and the 05:00:02 snapshot resumes, visiting on of the just saved objects. 05:00:01 is before 05:00:02 so that the version of the object that we just saved is used. The snapshot then reads the object that the commit will store next, when it wakes up. Now we are no longer atomic.

When commit begins, there needs to be a way to tell the other threads to ignore that particular version.

This could be a hash table they keep and look for committed versions.

I cannot think of any way to do it with timestamps alone. Already, I’m trusting vacuum to elapsed time. What could I do, commit with a timestamp in the future?

Even the most clever in memory structure is going to need to be backed up by the ability to write to file that the commit succeeded. This is probably going to be yet anther B+Tree, to get the ordering. Not a bad way to go, the only lock on the leaf being written.

The results are mirrored in a hash table for each snapshot.

If I can think of a way to keep them from stomping on each other, besides a global scorecard, I will use it. It would be a lot of very hard thinking.

It does mean, that, except for locking that B+Tree of transaction history, there is no need to lock the B+Trees otherwise, I mean, that there is no need to lock a Bin exclusively.

To commit, one iterates though the isolated B+Tree. For each record, one searches for the record in the persistance B+Tree. If their exists a record with a later version number, and that version number is associated with a transaction that did not rollback, then the commit fails, the lock was not opportune, an exception is thrown.

If it is the case that, when a commit fails, which is the only time that a persistence B+Tree is written, we delete all the records written, then we can fail whenever we see a subsequent record.

We can fail whenever we see a subsequent record regardless of it’s success. This is easier to implement.

The drawback is that a mutator could fail for a contention that did not in fact exist, due to a rollback. On closer inspection, all of the above have the same drawback. At the time that thread A inspects an object revised by tread B, thread A will carp. Thread B might then inspect an object revised by thread C, and carp. If thread A had slept and waited, and provided that thread B clean up it’s mess, thread A would have committed. Chaotic.

Unless one devises a complicated scheme by which, thread A waits to see what thread B does, and does so in a way that eliminates the possibility that thread B might then wait to see what thread A does, the chances that opportunistic locking will detect conflicts that would not have been detected had the thread scheduler been more auspicious.

Of course, one could simply go in alphabetical order by the name of the Bin, followed by the Index of the Bin assuming a list of Index, followed by the ordered name of each Join associated with the Bin. When you detect a subsequent version that has not committed, you sleep on some fancy concurrency structure that will wake you after any other tread commits, and try your luck again.

The search to see if the version has committed in the B+Tree of mutations is not expensive, considering that the next thing to happen, if the results are negative, is an exception condition that will require that the entire mutation be repeated.

For now, let’s not be particular. If there is a subsequent version, mutating, committed, or rolled back, the lock is not opportune.

It could be that an operation would time out. If it were the case that two operations failed, because they were running in opposite directions, then when they restarted they might repeat the folly. The application might have a retry limit, and the application would probably panic if it was reached.

All of this checking on the state of the persistence B+Tree means that we’re going to have to have a better understanding, of the still not very well documented, and not at all implemented, intricacies of Strata concurrency. One would have to lock the leaf that contains the record sought exclusively.

In fact, one would have to see that all the records were on the same page, so that the version should not be part of the key. Strata will place duplicates on the same leaf page, so if the version is not part of the key, we can lock the leaf exclusively, read it to check for later versions, and then insert the latest version, knowing that no one else is doing same.

Event though the version is not part of the key, it will be in order, because when there is a later version then the one we want to insert, we raise and exception. If not, if we were to leave rolled back records in place, and check that the mutator rolled back, the last inserted and committed is always the newest.

I thought I had the plan, though, with in-memory histories. Bummer.

Histories In Memory

Do they need to be kept in memory at all times? Or do they only need to be kept in memory in order to lock them?

Free on Rollback

Remember to free the blocks allocated on rollback.

Local View of Data

One could do this, add a flag to the perminant index that says whether it is inserted or deleted, and has a version number, and then, when building the histories, one determines if, that has changed.

Also, their is a new thought, where those things do not change until absolutely necessary, which means, that, you’d create a view of the database, by building a mutation specific index, changing that mutation specific index, and then merging with the common to see what’s what.

The final commit would be as it is now in Memento, lock everything, make sure it has not changed. One could have a difference between row level locking and table level locking.

Table level, because we may need to lock the entire table, when things change significantly. If a join is rejiggered or some such.

The mutator’s view could be stored in standard Java containers, or in the future, it could be logged, or kept in a lazy index, one that writes out only when memory is needed.

Changes of Visibility

Atomic, Concurrent, Isolated, and Durable.

When updating the database, do you keep things in memeory, or write them out?

Consider joins, which, more or less, need to get written to file, in order to be read. If you add joins, do you want them to be visisble to a query that is made during the same mutation?

If so, then do you add a field to the index, one that asks whether the write is valid? Can you update a record in Strata? Not now.

Joins are binary. They are either on or off. The link exists or it doesn’t. How do you mark one as being invalid? How do you roll it back? Do you include a start and stop?

If there are starts and stops, then there is a chance that, the stop fails, there is rollback, so you’ll need a list of rolled back, a set of rolled back transactions. You can ignore anything that has been rolled back, in fact, you can make a point of erasing it.

You can be updating the indices all along, and then writing out only the failed transactions.

On a join, you can have a record that has the action, which is join or unjoin, and committed.

On a version you can have a committed flag.

Then you can write these out as they occur, and commit the changes, if you perform a query on a bag, you can flush the bag changes.

You’ll have to add a means to update a record in Strata, to update the flag, or insert one that has the committed flag set.

The join is on or off, so the record will have to be added that says that the join is off as of this transaction.

There could be a set of versions to ignore, no, concurrency problem.

An alternate index can reference a version number, and if the item is not found, because it has not been committed, then it is ignored.

Defining Indices

Close up of Index Cards by Aya Walraven Otake.
  • An interface that compares an array, or partial array of comprables, for Strata.
  • Then crack objects or XML documents to obtain the comparables.
  • This new interface is external to Strata.
  • Internal to Strata there are different interfaces for
    • Comparing objects.
    • Determining the equality of objects.
    • Reading objects from the tier.
    • Writing objects to the tier.
    • Obtaining the object to write.

Now I’d like to not have to keep the object in memory at all, but extract the fields. The object to write is still a key, but it can be resolved into just the field values, kept in an array, to compare the array of values.

Why is this not the default implementation?

Feverish Notes on Serialized DOM

An ideal DOM, with a small footprint would write the file flat, and read it into an array.

Any DOM could be made easier if Bento could divide into blocks and there was a way to read across blocks, as one contiguous output stream.

This low memory answer would be somewhat like tiny tree. An array of nodes, rather than a tree. The arrays are gathered into a linked list. A node wraps a reference to a list node an index into the list node. At regular offsets in the list node, one has an indication of the depth or the index of the parent, so that it is easy to find your way back up the tree. When an array is broken, an offset is adjust so that these things can still be found.

Writing out the array mends it. It gets written out in the clean format, the broken arrays discarded and reread.

« Previous Entries