Alan Gutierrez

Alan Gutierrez blogs on software, social networks, and himself.

XSS

Read up on Cross Site Scripting and be careful of it as you develop applications that allow users to inject markup into the data they enter.

Stencil by Convention

Changing the tire of a Ford Model T in 1959.

When toying with Stencil at the outset, I dipped into the JavaBeans API. It seemed like the easiest way to get started on a dirt simple path language to extract string values from Java objects. I’d replace it when a better idea came to me.

Still waiting.

I created an interface. Given an object and a property name, an implementation would extract the property from the object. Two distinct implementations of name and value pairing were apparent at the outset: JavaBeans, which are Objects whose properties are obtained through the JavaBeans API, and Maps. Certainly, there would be more.
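As a rough sketch of what such an interface might look like: the names PropertyExtractor, MapExtractor, and BeanExtractor are made up for illustration, since the post does not give the real ones. The bean version leans on java.beans.Introspector, which is one plausible reading of "the JavaBeans API".

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.Map;

// Hypothetical interface: given an object and a property name, extract the value.
interface PropertyExtractor {
    Object extract(Object source, String name);
}

// Treats the source as a Map and looks the name up as a key.
class MapExtractor implements PropertyExtractor {
    public Object extract(Object source, String name) {
        return ((Map<?, ?>) source).get(name);
    }
}

// Treats the source as a JavaBean and calls the matching getter, if any.
class BeanExtractor implements PropertyExtractor {
    public Object extract(Object source, String name) {
        try {
            BeanInfo info = Introspector.getBeanInfo(source.getClass());
            for (PropertyDescriptor property : info.getPropertyDescriptors()) {
                if (property.getName().equals(name) && property.getReadMethod() != null) {
                    return property.getReadMethod().invoke(source);
                }
            }
            return null; // no readable property with that name
        } catch (IntrospectionException | ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot read property " + name, e);
        }
    }
}
```

With this in place, new BeanExtractor().extract(new java.util.Date(42L), "time") goes through Date.getTime(), while new MapExtractor() simply calls get on the Map. The caller still has to pick one, which is exactly the burden described next.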

There weren’t.

The API programmer was burdened with choices. Not only did you have to provide an object model for a Stencil, you needed to provide a way to interpret that object model. Flexible? Sure. As flexible as 0 and 1.

This calls for a convention.

An object returned from a property is either an Object, Map, array, Collection, Iterator, or a ListIterator.

Objects, through JavaBeans, and Maps are interpreted as name-value pairs. They give named values for a dirt simple path language.

Collection and Iterator are used by the each keyword. A Java array or ListIterator exposes an index property to the each keyword.

The Liberty of Convention

Amusing to think of how it struck me, though. That word: convention. It freed me from sifting through design patterns, thinking up ever more convoluted means of delegation, dispatch, configuration, and construction.

One mental block was an avoidance of the instanceof keyword, which always feels like you’re cheating. However, this time around, I realized that I was already employing reflection, and that binding begs reflection or similar.

Once I’d decided that instanceof was the way to go, then it became a matter of deciding, instanceof what?
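The convention listed earlier more or less answers that question. Here is a minimal sketch of the dispatch, assuming a hypothetical Convention class and Kind enum (neither name comes from Stencil itself). The one subtlety is order: a ListIterator is also an Iterator, so the indexed case has to be tested first.

```java
import java.util.Collection;
import java.util.Iterator;
import java.util.ListIterator;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper: decides how a value returned from a property is interpreted.
final class Convention {
    enum Kind { MAP, INDEXED_SEQUENCE, SEQUENCE, BEAN }

    static Kind classify(Object value) {
        Objects.requireNonNull(value, "value");
        if (value instanceof Map) {
            return Kind.MAP; // name-value pairs by key
        }
        // Arrays and ListIterators can expose an index to the each keyword.
        if (value.getClass().isArray() || value instanceof ListIterator) {
            return Kind.INDEXED_SEQUENCE;
        }
        if (value instanceof Collection || value instanceof Iterator) {
            return Kind.SEQUENCE; // iterable by each, but no index
        }
        return Kind.BEAN; // anything else: name-value pairs through JavaBeans
    }
}
```

No configuration, no pluggable interpreter: the type of the object is the configuration.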

What Is the Difference Between a Mutator and a Snapshot in Memento? Is It Significant?

It seems that you could create a snapshot, which would simply be a timestamp, and get rolling. However, let’s say that you begin to commit a mutation. You get a timestamp from the system. You write out an object into a Bin and then you go to sleep. Another thread creates a snapshot that is based on the timestamp only (it does not build a map of committed mutations) and encounters the object written to the Bin. It will use that value, breaking atomicity, because the mutation thread has slept before it writes the other objects in the mutation. A snapshot therefore has to track committed mutations too. There is no difference between a snapshot and a mutator, so we will define only the class Snapshot.

PHP

I’m certain that PHP’s effect on the brain is similar to ether. I despise this language. It is a joke gone too far.

Enter Stencil

Stencil on wall south of Zocalo, Oax, MX

Two weeks back, I began work on an XML template engine. The name came to me shortly after. I’d call it Stencil. I like the name. I’m surprised that it does not already describe a template engine of sorts.

Today, I found another reason to like the name. It is easy to find pictures to accompany posts.

Motivations

When last I approached web applications programming, I was using XSLT and developing an XML pipeline engine called Relay. This engine is still about. I’m a fan of Saxon and XSLT.

XSLT, The T is for Transform

The problem with XSLT is that it is a sight too complicated for XML generation. XSLT is a transform language. In order to generate a document from XSLT you need to feed it a bogus document. Then you match the root of the document, but copy none of it. Instead, you write out your new document.

Using this method, dynamic content must be passed in as parameters. A complicated document, with a lot of different dynamic sections, would require a lot of parameters passed to the Template. The invocation of the template looks and feels messy.
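To make that concrete, here is a small sketch using the JDK's own javax.xml.transform API rather than Saxon or Relay. The class name BogusInput, the page element, and the user parameter are all invented for the example. The stylesheet matches the root, ignores it, and writes a new document whose only dynamic content arrives as a parameter.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example: generating a document from a bogus input document.
final class BogusInput {
    // Matches the root of whatever comes in, copies none of it.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:param name='user'/>"
        + "<xsl:template match='/'>"
        + "<page><p>Hello, <xsl:value-of select='$user'/>!</p></page>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String generate(String user) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            // Every dynamic section needs its own parameter.
            transformer.setParameter("user", user);
            StringWriter out = new StringWriter();
            // The input exists only because a transform must have one.
            transformer.transform(new StreamSource(new StringReader("<bogus/>")),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transform failed", e);
        }
    }
}
```

One parameter is tolerable. A page with a dozen dynamic sections means a dozen setParameter calls before every transform, which is the mess in question.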

Feeding XSLT, With XML Serialization

There are ways to generate a document to get the ball rolling, like using XStream to serialize an object graph, which works rather well. Building a map with the objects in it, and then serializing it, gave a nice big XML document to roll from.

However, it does not always work, of course. Your object graph has to contain the data. You’ll have a JDBC cursor, for example, that you need to convert into a List, before you can put it in a Map that would also include the user login information.

The output can be very complicated, include a lot of extra data that is undesired, and is not terribly semantic.

The major drawback is that, if all I was doing was laying out a simple signup form, I couldn’t do that layout separately from the development of the form. I needed to start with the object model, serialize it, develop a transform. I’d have to create a pipeline to see what was going on at all.

A lot of complexity for something that, once upon a time, was very simple.

JSP, Perpetual Context Switching

JSP hurts my eyes and my brain. I don’t like to see wayward Java injected into that twisty, angle-bracket maze. I don’t like it because it’s not valid XML. I don’t like it because logic has bled into the presentation.

On the client side, I manipulate the DOM with JavaScript, rather than place JavaScript directly in the document. Certainly, in client side JavaScript, there is no concept of a while loop that emits XML. (Okay, there is, but didn’t you get the memo? You’re not supposed to use document.write anymore.)

Two worlds collide.

Display Only XHTML

My ideal template engine would allow me to work on the XHTML separately. I’d create the page in XHTML, then add variable placeholders, and even then, using a utility that strips the placeholders, view the static XHTML without yet having to implement the object model or controller.

I’d like to get back to the whipitupitude of the HTML and Perl days of yore.

Binding XML

The problem with templating languages is binding. Perl would treat markup as a string. That is an honest interpretation of markup.

Ruby wants to turn markup into objects. PHP wants to turn markup into logic.

Java straddles objects and logic. There are two ways to embed. Path languages or embedded logic.

Attempt to make something that is declarative?

Can you represent all of the iterative constructs with a declaration? Or a call to an implementation in Java? That is a worthy goal. Let’s see if it works. Novel and interesting enough to put out there for people’s consideration.

EMail Database

I get a lot of email messages that are public information. Easier than blogging them would be to bounce them to an email database. The email database would keep the email intact, keep the headers intact, so that people could see that the message indeed came from an official of the city or the state. It might be necessary to obfuscate the sender’s email address, or hide recipients’ email addresses, but the email-ness of the message should be conveyed. Then you’d have something to which you can link.

Atomicity in Memento

Doors closing in “28th Stop” by Mo Riza.

Initially, I had a plan to keep a structure in memory that contained the history of a specific object. The history would be cached using soft references, and every thread would reference the same history. Each history object would have a mutex. Before the history object could be changed, it would be locked exclusively. In fact, the plan was to lock them all exclusively, in the order of their keys.

But, what good is this? Now I can iterate through all these histories, knowing that my thread is the only one that can mutate the history of these objects. I can check to see if there is a newer version of the object. If there is, then I throw an exception, because Memento uses opportunistic locking.

I could go through all of them, lock them all exclusively, then go through them all again, check that there are no changes, and finally write out the changes to the B+Tree.

This would have to be implemented for both Bin and Join, while Index could rely on the lock on the associated Bin.

However, it doesn’t make sense. Once you start committing changes to the B+Tree, then they are visible to the other threads, so that, even though other mutators are locked out by the mutex on the history object, other snapshots are going to interpret the newly added version of a specific object as fair game.

Let’s say that when we commit, we create a time stamp at 05:00:01 and then we sleep while a snapshot is created with a time stamp of 05:00:02. Thereafter, the snapshot sleeps, we resume and begin our commit. We lock the mutex attached to the history object for each Bin or Join record that we wish to change. We begin to write new versions, and add to the history object. Then, in the midst of it all, the 05:00:01 commit takes a snooze and the 05:00:02 snapshot resumes, visiting one of the just saved objects. 05:00:01 is before 05:00:02 so that the version of the object that we just saved is used. The snapshot then reads the object that the commit will store next, when it wakes up. Now we are no longer atomic.

When commit begins, there needs to be a way to tell the other threads to ignore that particular version.

This could be a hash table that they keep and consult for committed versions.
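A minimal sketch of that idea, under loud assumptions: the class name SnapshotView is invented, a version is identified by the timestamp of the mutation that wrote it, and the history of one object is modeled as a sorted map from version to value. None of this is Memento's actual API. The snapshot copies the set of committed versions when it is created and skips any version that is not in the set, however early its timestamp.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;

// Hypothetical snapshot: a timestamp plus the versions known to be committed.
final class SnapshotView {
    private final long timestamp;
    private final Set<Long> committed;

    SnapshotView(long timestamp, Set<Long> committedAtCreation) {
        this.timestamp = timestamp;
        // Copy, so commits that finish later do not leak into this snapshot.
        this.committed = new HashSet<>(committedAtCreation);
    }

    // versions maps a mutation's timestamp to the value it wrote for one object.
    <T> T read(NavigableMap<Long, T> versions) {
        // Walk backwards from the newest version at or before the snapshot.
        for (Map.Entry<Long, T> entry
                : versions.headMap(timestamp, true).descendingMap().entrySet()) {
            if (committed.contains(entry.getKey())) {
                return entry.getValue();
            }
            // Otherwise the version belongs to a mutation still in flight: ignore it.
        }
        return null; // nothing visible to this snapshot
    }
}
```

In the 05:00:01 and 05:00:02 scenario, the half-written 05:00:01 version is not in the snapshot's set, so the snapshot falls back to the previous committed version for every object the mutation touches, written yet or not.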

I cannot think of any way to do it with timestamps alone. Already, I’m trusting vacuum to elapsed time. What could I do, commit with a timestamp in the future?

Even the most clever in-memory structure is going to need to be backed up by the ability to write to a file that the commit succeeded. This is probably going to be yet another B+Tree, to get the ordering. Not a bad way to go, the only lock on the leaf being written.

The results are mirrored in a hash table for each snapshot.

If I can think of a way to keep them from stomping on each other, besides a global scorecard, I will use it. It would be a lot of very hard thinking.

It does mean that, except for locking that B+Tree of transaction history, there is no need to lock the B+Trees otherwise. I mean that there is no need to lock a Bin exclusively.

To commit, one iterates through the isolated B+Tree. For each record, one searches for the record in the persistence B+Tree. If there exists a record with a later version number, and that version number is associated with a transaction that did not roll back, then the commit fails, the lock was not opportune, an exception is thrown.

If, when a commit fails (a commit being the only time that a persistence B+Tree is written), we delete all the records written, then we can fail whenever we see a subsequent record.

We can fail whenever we see a subsequent record regardless of its success. This is easier to implement.

The drawback is that a mutator could fail for a contention that did not in fact exist, due to a rollback. On closer inspection, all of the above have the same drawback. At the time that thread A inspects an object revised by thread B, thread A will carp. Thread B might then inspect an object revised by thread C, and carp. If thread A had slept and waited, and provided that thread B cleaned up its mess, thread A would have committed. Chaotic.

Unless one devises a complicated scheme by which thread A waits to see what thread B does, and does so in a way that eliminates the possibility that thread B might then wait to see what thread A does, there is a chance that opportunistic locking will detect conflicts that would not have been detected had the thread scheduler been more auspicious.

Of course, one could simply go in alphabetical order by the name of the Bin, followed by the Index of the Bin assuming a list of Index, followed by the ordered name of each Join associated with the Bin. When you detect a subsequent version that has not committed, you sleep on some fancy concurrency structure that will wake you after any other thread commits, and try your luck again.

The search to see if the version has committed in the B+Tree of mutations is not expensive, considering that the next thing to happen, if the results are negative, is an exception condition that will require that the entire mutation be repeated.

For now, let’s not be particular. If there is a subsequent version, mutating, committed, or rolled back, the lock is not opportune.
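Here is roughly what that unfussy rule could look like, with plain in-memory maps standing in for the B+Trees. Everything named here is hypothetical: PersistenceTree, NotOpportuneException, and the choice to give each mutation a single version number that it compares against and writes with. The synchronized keyword is a crude stand-in for whatever leaf locking Strata ends up providing.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical exception for a commit that found a subsequent version.
class NotOpportuneException extends RuntimeException {
    NotOpportuneException(String key) {
        super("Subsequent version exists for " + key);
    }
}

// Hypothetical stand-in for the persistence B+Tree: key -> (version -> value).
final class PersistenceTree {
    private final Map<String, NavigableMap<Long, String>> records = new HashMap<>();

    // isolated holds the mutation's pending writes, like the isolated B+Tree.
    synchronized void commit(long version, Map<String, String> isolated) {
        // First pass: any later version at all means the lock is not opportune.
        for (String key : isolated.keySet()) {
            NavigableMap<Long, String> versions = records.get(key);
            if (versions != null && versions.higherKey(version) != null) {
                throw new NotOpportuneException(key);
            }
        }
        // Second pass: nothing subsequent was found, so write the new versions.
        for (Map.Entry<String, String> entry : isolated.entrySet()) {
            records.computeIfAbsent(entry.getKey(), k -> new TreeMap<>())
                   .put(version, entry.getValue());
        }
    }

    // Newest stored value for a key, or null if the key was never written.
    synchronized String latest(String key) {
        NavigableMap<Long, String> versions = records.get(key);
        return versions == null ? null : versions.lastEntry().getValue();
    }
}
```

Because the check runs over every record before anything is written, a failed commit in this sketch leaves nothing behind to delete. A real B+Tree version would not get that for free.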

It could be that an operation would time out. If it were the case that two operations failed, because they were running in opposite directions, then when they restarted they might repeat the folly. The application might have a retry limit, and the application would probably panic if it was reached.
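The retry limit is an application concern, but it is easy to sketch. The Retry helper below is hypothetical and not part of Memento. It treats any RuntimeException thrown by the mutation as a failed attempt, which is broader than a real implementation would want.

```java
import java.util.function.Supplier;

// Hypothetical helper: repeat an entire mutation until it commits or the limit is hit.
final class Retry {
    static <T> T withLimit(int limit, Supplier<T> mutation) {
        RuntimeException last = null;
        for (int attempt = 0; attempt < limit; attempt++) {
            try {
                return mutation.get(); // the whole mutation, start to commit
            } catch (RuntimeException e) {
                last = e; // assumed to mean "the lock was not opportune"
            }
        }
        // The panic: the application gives up and reports the last failure.
        throw new IllegalStateException("Gave up after " + limit + " attempts", last);
    }
}
```

Two mutations that keep colliding would both burn through their attempts here. A randomized pause between attempts is the usual remedy, left out to keep the sketch short.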

All of this checking on the state of the persistence B+Tree means that we’re going to have to have a better understanding of the still not very well documented, and not at all implemented, intricacies of Strata concurrency. One would have to lock the leaf that contains the record sought exclusively.

In fact, one would have to see that all the records were on the same page, so that the version should not be part of the key. Strata will place duplicates on the same leaf page, so if the version is not part of the key, we can lock the leaf exclusively, read it to check for later versions, and then insert the latest version, knowing that no one else is doing the same.

Even though the version is not part of the key, it will be in order, because when there is a later version than the one we want to insert, we raise an exception. If not, if we were to leave rolled back records in place, and check that the mutator rolled back, the last inserted and committed is always the newest.

I thought I had the plan, though, with in-memory histories. Bummer.