Bento
February 9th, 2007
Authentic c-store Japanese bento by Ramil Sagum.
Malloc
The idea behind Bento it to take a file and divide it up using the malloc algorithm as described by Doug Lea.
Bento is written in Java for JDK 1.4.2 using NIO.
You can read more about Bento on my Wiki and you can paruse the source in my Subversion repsitory and you can use it under LGPL.
Design Goals
Bento is a file format for writing out discrete blocks of data. It is designed for the storage of objects. Bento is not supposed to be a database in itself.
Bento is a robust file format that provides
- storage of blocks of arbitrary size,
- random access to stored blocks,
- atomic writes of multiple blocks,
- thread-safe allocation and deallocation of blocks,
- and thread-safe reads and writes.
Funamentally, Bento strives to be a simple data structure that you can use to persist objects in your applications. You can use Bento as the basis for more complicated data structures or databases.
Usage
Bento can be used to serialize objects using Java serialization, to write out objects as Java primitives, or to store Java primitives.
A single Bento file can be organized into a database of different Java types. For desktop applications, this means that a single file can be the container for many different objects of many different classes.
Multiple Bento files can be organized into a database, spread across different file system, to reduce disk contention.
You can design resuable Bento based data structures and mix them into a single Bento file.
Quick Start
An example program that creates a Bento file with a static header and a single allocated block.
Bento.Creator creator = new Bento.Creator();
creator.addStaticBlock(URI.create("flag://agtrz.com/header"), 512);
Bento bento = creator.create(new File("data.bento"));
Bento.Mutator mutator = bento.mutate();
try {
Bento.Block block = mutator.allocate(512);
ByteBuffer bytes = block.toByteBuffer();
for (int i = 0; i < 512 / 4; i++) {
bytes.putInt(i);
}
block.write();
Bento.Block header = mutator.load(URI.create("flag://agtrz.com/header"));
ByteBuffer bytes = header.toByteBuffer();
block.getAddress().write(bytes);
header.write();
mutator.commit();
}
finally {
mutator.rollback();
}
bento.close();
Bento.Opener opener = new Bento.Opener();
bento = opener.open(new File("bento.data"));
ByteBuffer header = bento.read(URI.create("http://agtrz.com/header"));
ByteBuffer bytes = bento.read(new Bento.Address(header));
for (int i = 0; i < bytes.capacity() / 4; i++) {
if (i != bytes.getInt()) {
throw new IllegalStateException();
}
}
bento.close();
Mutation
A Bento file is written using a Bento.Mutator. All of the changes performed by a mutator are atomic. The changes are not made perminent until the commit method of mutator is called. The changes can be discarded by calling the rollback method of the mutator.
After calling commit or rollback the mutator can be reused. Calling commit or rollback when there are no uncommited changes held by the mutator does nothing. Thus, we use a try/finally block to recover the state of a Bento file when a failure occurs.
public void storeObjects(Bento bento, Bento.Address address) {
Bento.Mutator mutator = bento.mutate();
try {
Bento.Block block = mutator.load(address);
block.toByteBuffer().putInt(this.magicNumber);
block.write();
mutator.commit();
}
finally {
mutator.rollback();
}
}
Concurrency
Bento synchronizes the list of free blocks, but does not synchronize the blocks themselves. Blocks can be allocated and freed in a thread-safe manner without further consideration by the application developer.
If you read and write blocks without some syncrhonization strategy, you’ll turn your Bento file into garbage in short order.
Multiple threads can read the same blocks, but blocks that are being written should not be written or read by other threads. Reads can be shared. Writes are exclusive.
A simple strategy is to use a read/write lock from the concurrency library to gaurd the file during updates.
Bento strives for liveness, however. It places as few locks as possible on the management of list of free blocks.
A more sophisticated strategy would place locks on the blocks themselves. A composite object, such as a tree, could lock a tree root element. Different threads can then operate on different trees in the same file. They would not run afoul of each other.
Atomic Writes
A Bento file is edited using a mutator the object. The mutator object keeps all block writes, allocations, and deallocations are kept in memory until the user commits or rolls back the changes made by the mutator.
Journaling
All reads, writes, allocations and frees are performed on buffers in memory. Only when a mutation is committed or rollback, is the underlying file written.
Journaling is implemented by writing out the list of operations into a journal prior to applying the changes to the file.
The journal can be implemented using blocks within the file, or it can be implemented using external temporary files. The former is good for desktop applications, where the user may move a corrupted Bento file out of reach of the external journals, or unwittingly delete the external journals. The latter is good for a server application, where multiple Bento files and journals can exist in a single directory, watched over by an administrator.
comments
B-Tree
September 8th, 2006I’m implementing a B-Tree in Java. The project is called Strata. You can view the source, is is in the strata subdirectory of my subversion repository.
I’ve yet to choose a license.
This tree is the first indexing option in Momento, which is a large file XML database that I wrote last year, when I was in Ann Arbor.
Getting things done in Java is a challenge. There are many ways to get destracted.
Strata will index the normalized text of a specific node, and do so without concern about range queires. The only query supported at the outset is equality.
Then I can use Lucene to do full text indexing. In order for Lucene to be effective, you see, it must be a simple matter to retrieve the document. Indexing unique atom identifers, would give a key to fetch the document resulting from a Lucene query.
The research that I’m doing on B-Tree implemetnation can be tracked with my del.icio.us account, under the tag strata.
Thus, the objectives for Strata are as follows.
- Leaf nodes only to store references to file positions, not store the data itself.
- Branches also store only references to file positions, and hold a list of objects read from those file positions to use as keys.
- Thread-safe.
- File backed, paged.
- Supports efficent insertion and deletion, stays balanced.
- Equality comparison query.
- Compares byte buffer ranges, not objects.
At some point, it may be better to keep partial keys in the tree itself.
Update: I’ve written a description of the tree structure at my user page on the New Orleans Wiki, under a Software Development page.


