The WebFunds ODB

Intro

This document is a design and use manual for the WebFunds Lucaya (4G) Object Database.

The WebFunds ODB was written to support a multi-threaded version of WebFunds and allow live, real time access in both directions (client to server to client). Within the open source Java distribution, the ODB is located in package webfunds.orb.

Status

The status of this document is that it is a Working-Document-In-Progress. That is, it is always likely to be live and interspersed with questions and shortfalls.

Such shortfalls or questions to be addressed are interspersed thusly.

Credits

ODB was designed and implemented by Jeroen van Gelderen, presumably with help from Edwin Woudt. This document was written by Jeroen van Gelderen with editing by Ian Grigg.

ODB is currently in use with WebFunds Lucaya.

Programming Model

Running a Transaction

A transaction is run by wrapping the work in an ODBProcedure subclass and handing an instance of it to ODB.execute():

    public WfThing updateThing(WfThingId thing, WfValue value)
    {
        /*
         *  The outside-ODB part of the Thing update method.
         *  Create a special class that is ODBable, and then
         *  ask ODB to execute it.
         */
        UpdateThing updater = new UpdateThing(thing, value);
        try {
            ODB.execute(updater);
        } catch (Exception ex) {
            _logger.debug("ODB - transaction failed ", ex);
            return null;
        }
        return updater.getThing();
    }
    
    private class UpdateThing  
        extends ODBProcedure
    {
        private final WfValue   _value;
        private final WfThingId _thing;
                           
        private WfThing _newThing = null;  // result handed back to the caller
                
        WfThing     getThing()         { return _newThing; }

        private UpdateThing(WfThingId thing, WfValue value)
        {
            _thing = thing;
            _value = value;
        }
  
        protected void execute()     // in ODB transaction now!!! 
        {
             ODBRef thingRef = ODB.refFor(_thing);
             WfThing updateable = thingRef.resolveWrite();
             updateable.setValue(_value);
             // _thing is final, so the copy goes into _newThing
             _newThing = updateable.copy();    // needs to be implemented?
             // transaction terminates now.
        }
    }

A non-transaction method should have a private class available to it that extends ODBProcedure, an instance of which can then be passed to ODB to be executed. Communication back and forth is possible by means of constructor arguments in (at the beginning) and method calls out (at the end).

It seems that transaction code should be limited to read & recover & store objects when they have changed, and make no decisions. Yet, if "pretty much everything is in a Swing transaction ..." how do we get out of it to do some work?

I.e., what is the canonical pattern here...

The Transaction

The code in the execute() method of an ODBProcedure subclass (here, UpdateThing.execute()) is executed as a transaction. This means that it either completes, and all objects are updated, or it fails, and no objects are updated.

The objects that are updated at the end are the ones which are resolved for writing:

    WfThing updateable = thingRef.resolveWrite();

In essence, the code should acquire the objects from the ODB database (resolve for write), and update them. Then, if an exception is thrown, these objects are thrown away and no harm is done. In contrast, if the code gets to the end, all are written to the ODB, and thus they are all updated.

This means, for example:

Fresh copies have to be acquired at the beginning.
All updates should be idempotent, as ODB has the ability to decide on its own merits to restart the transaction from the beginning, after an arbitrary number of steps.
If the transaction launches any side-effects (such as threads), they should also be idempotent and non-essential, as there is no guarantee as to how many times (0,1,n) the side-effects are launched.
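The all-or-nothing behaviour and the restart rule can be illustrated with a toy model. The sketch below is not the ODB implementation: ToyStore and ToyRestart are hypothetical names, the "database" is a plain map, and a restart is signalled by an exception that the body throws itself. It only assumes what is stated above, namely that a body works on fresh copies, that the copies are installed only if the body completes, and that the body may be re-run from the beginning.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical signal that the body must be re-run from the start. */
class ToyRestart extends RuntimeException {}

/** Toy stand-in for a transactional store; NOT the ODB API. */
class ToyStore {
    private Map<String, Integer> committed = new HashMap<>();
    int attempts = 0;   // how many times a body was started (illustration only)

    /** Runs body on a fresh working copy; installs the copy only on success. */
    void execute(Consumer<Map<String, Integer>> body) {
        while (true) {
            attempts++;
            Map<String, Integer> working = new HashMap<>(committed); // fresh copy
            try {
                body.accept(working);
                committed = working;   // commit: all updates become visible at once
                return;
            } catch (ToyRestart restart) {
                // discard the working copy and re-run the body from the beginning
            }
            // any other exception propagates and leaves 'committed' untouched
        }
    }

    int get(String key) { return committed.getOrDefault(key, 0); }
}
```

Because every attempt starts from a fresh copy, an update such as "add 5 to the balance" is applied exactly once even if the body runs twice. Anything the body does outside the working copy (counters, threads, messages) happens once per attempt, which is why such side-effects need to be idempotent and non-essential.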

Basic Rules

Any code can run a transaction, but transactions cannot be nested.
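One way a "no nesting" rule could be enforced is with a per-thread flag. The following is a minimal sketch under that assumption; TxGuard and its methods are hypothetical names and say nothing about how ODB itself detects nesting.

```java
/** Toy nesting guard; class and method names are hypothetical, not ODB API. */
final class TxGuard {
    // One flag per thread: a newly started thread always begins outside a transaction.
    private static final ThreadLocal<Boolean> IN_TX = ThreadLocal.withInitial(() -> false);

    static boolean inTransaction() { return IN_TX.get(); }

    static void execute(Runnable body) {
        if (IN_TX.get()) {
            throw new IllegalStateException("transactions cannot be nested");
        }
        IN_TX.set(true);
        try {
            body.run();
        } finally {
            IN_TX.set(false);   // leave the transaction on success and on failure
        }
    }
}
```

With such a guard, a call to TxGuard.execute() from inside a running body fails immediately instead of silently starting a second transaction.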

From static space, outside objects?

Can transactions be chained? That is, can a transaction cause the setting off of another transaction, if it terminates (successful? unsuccessful?) ?

"Nothing would prevent this from being implemented."

Crossing the Transaction Border

XXX:

Communicating between the transaction space and the non-transaction space.

From Inside to Outside

When in a transaction, actions of a non-transaction nature must be started by:

Starting a thread. All transactions are thread-based, so every new thread starts out as outside a transaction. Thread creation carries the danger of creating side effects, which will not be rolled back on transaction failure.
```
It is possible to override an afterCommit method if you
want something to happen after the transaction is committed.
    
```
What is an afterCommit method? Who can call it?
Having the caller start the action when the transaction terminates.
Registering an EventListener. A non-transaction object can register itself as a listener in the transaction object. When the ODB-controlled object is committed at the end of the transaction, a new transaction is started, and each listener is called with an ODBRef to the object.

Each registered listener gets a chance to query the new state of the now updated object. Note that if the listener changes the object, this will trigger a new call to the listener, ad infinitum.
@see ODBRef.addRefListener(ODBRefListener)
@see ODBRefListener.refChanged(ODBRef)

Data can be passed out of the transaction only by copying the data into non-transaction objects. These are objects that have no ODBRef context.

In particular, this means that there is no way to drive an event model directly from an ODB transaction to, for example, the Swing objects. Instead, an object must change itself, and the Swing object's listener must query the object to see what the event is. To implement a message-passing pattern, create a transaction object that just holds the queued messages. The writer is another transaction object, and the reader is the listener. The reader must also write to the queue object in order to take the message off it, and will therefore receive a further, 'empty' listener wakeup!
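The message-passing pattern can be sketched with plain Java objects. ToyMailbox and ToyReader are hypothetical stand-ins: the toy notifies listeners synchronously on every change, whereas ODB is described above as calling listeners in a new transaction after commit. The point of the sketch is the extra wakeup the reader gets for its own removal of the message.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy stand-in for an ODB-managed object that only holds messages. */
class ToyMailbox {
    interface Listener { void changed(ToyMailbox box); }

    private final Deque<String> queue = new ArrayDeque<>();
    private final List<Listener> listeners = new ArrayList<>();

    void addListener(Listener l) { listeners.add(l); }

    /** Writer side: every change to the mailbox notifies the listeners. */
    void post(String message) { queue.addLast(message); fireChanged(); }

    /** Reader side: taking a message is itself a change, so listeners fire again. */
    String take() {
        String m = queue.pollFirst();
        if (m != null) fireChanged();
        return m;
    }

    boolean isEmpty() { return queue.isEmpty(); }

    private void fireChanged() {
        for (Listener l : new ArrayList<>(listeners)) l.changed(this);
    }
}

/** Reader that takes one message per wakeup and tolerates the 'empty' wakeup. */
class ToyReader implements ToyMailbox.Listener {
    final List<String> received = new ArrayList<>();
    int emptyWakeups = 0;

    public void changed(ToyMailbox box) {
        if (box.isEmpty()) { emptyWakeups++; return; }  // nothing to do: stop here
        received.add(box.take());
    }
}
```

The isEmpty() check is what breaks the "ad infinitum" loop: a listener that changed the object on every wakeup, including the empty one, would keep re-triggering itself.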

From Outside to Inside

Objects and data can happily be copied in via arguments to the ODBProcedure constructor. However, if any such objects are placed within the ODB context, by use of the ODB.makePersistent() method, they would now be captured within transaction space.

Does this mean that the transaction code should make a copy of the object?

Coordination Between Transactions

ODB enforces locking on multiple access to an object, when the object already has a writer. The model is one writer only, or multiple concurrent readers.

The strategies for implementing locking are undefined, and may be subject to configuration. The techniques relate to deadlocks, and may include:

Blocking for a short period of time (where "short" is undefined).
Throwing an internal resolveWriteLockException, and restarting the transaction transparently at a later time.
Limiting all transactions to be single-threaded, so only one transaction ever enters looking for locks. (This is a good debugging option.)
Restarting the transaction with an ODB-wide lock.

There is no direct effect on the caller, as no catchable exception is thrown on lock or deadlock detection.

These can be tweaked somewhat in the ODB implementation without affecting the ODB API contract.
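The "one writer or many readers" model, combined with the first strategy (block briefly, then give up), can be sketched with the standard java.util.concurrent locks. ToyObjectLock is a hypothetical wrapper and the wait time is a made-up parameter; ODB's own locking code may look quite different.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy per-object lock: one writer, or many readers. */
class ToyObjectLock {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** Readers share the lock; this fails only while another thread is writing. */
    boolean tryRead() { return lock.readLock().tryLock(); }
    void endRead() { lock.readLock().unlock(); }

    /**
     * Blocks for at most waitMillis. A false result marks the point where a
     * store could restart the transaction later instead of deadlocking.
     */
    boolean tryWrite(long waitMillis) {
        try {
            return lock.writeLock().tryLock(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
    void endWrite() { lock.writeLock().unlock(); }

    int readers() { return lock.getReadLockCount(); }
}
```

A write attempt made while read locks are outstanding times out and returns false rather than throwing, which matches the idea that lock conflicts are handled internally and never surface to the caller as a catchable exception.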

Coordination Outside Transactions

Swing

When ODB is used in a Swing application such as WebFunds, Swing should not be blocked and thus any transactions executed from Swing should not block either.

Inside Swing is by definition inside an ODB transaction. Every Swing event is intercepted by ODBEventQueue and an ODB transaction is placed around it, so everything in Swing can expect an ODB context.

Which means that code needs to know that it is being fired from a Swing transaction, so as to avoid re-invoking a transaction!?

Code that is isolated from Swing by being run in a separate independent thread will not be affected by this, but should be careful not to block with writable resources that other threads need to access.

Currently, ODB is configured to sequentialise all transactions, so blocking could block all users of ODB. This is (will become) a configuration issue.

Java's `synchronized` Regime

When in an ODB transaction, there is in general no need to synchronize. And, in practice, methods and objects in an ODB framework should not be synchronized, as there may be side-effects.

Is this the case?? To be written in more depth!

Programmer Rules

(1) Acquire fresh ODB-managed objects at the start of a transaction, and do not retain them after transaction completion.
(2) If you break (1), make sure the objects you do pass across the transaction border are not mutable. In general it is better to not break rule (1).
(3) If rules (1) and (2) are broken, then do not rely on equality.
(4) resolveRead and resolveWrite will never return null. They will panic if the object under consideration has been removed from the database. In the future we plan to introduce the likes of resolveReadOrNull and resolveWriteOrNull.
(5) ODBProcedure instances can be used to pass parameters into a transaction and to pass results out of a transaction. This makes it easy to spot locations that need to be audited for compliance with (1), (2), (3).
(6) The programmer must not cause observable side-effects which cannot be undone during transaction rollback. In the case of message sending you would queue the message until the transaction is complete and committed. Then you'd send out the messages in the queue.
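The last rule (queue messages until the transaction has committed) can be sketched as a small outbox. ToyOutbox is a hypothetical helper that does by hand what a post-commit hook would do; it does not use ODB, and the "network" is just a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Toy outbox: messages are only released once the transaction body succeeds. */
class ToyOutbox {
    final List<String> sent = new ArrayList<>();   // stands in for the real network

    /** Runs body; messages it queues are sent on success and dropped on failure. */
    void execute(Consumer<List<String>> body) {
        List<String> pending = new ArrayList<>();
        body.accept(pending);      // an exception here skips the send below
        sent.addAll(pending);      // after commit: now the side-effect may happen
    }
}
```

A body that queues a message and then fails leaves the sent list unchanged, so no receiver ever sees a message belonging to a transaction that was rolled back.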

how does a routine know if it is in a transaction?
i.e., is it possible to say:
assert(!ODB.inTransaction());

Design

Some notes on the design of WebFunds ODB. The design space for ODB was large and many design decisions interact in such a way that it is difficult to pick a point to start from. Here is one attempt...

Constraints

(1) ODB must be written in Pure Java.
(2) Support large (multiple GB) databases efficiently.
(3) Fast enough for use in an interactive application, in particular with Swing executing ODB transactions in response to high-frequency events.
(4) Scale up to server performance by increasing throughput with bounded latency.
(5) Support concurrent transactions.
(6) Full support for ACID properties when combined with the right store. (Level of Durability depends on the store but ODB always enforces Atomicity, Consistency and Isolation.)
(7) Support a variety of object encodings, at least Serialization and the WebFunds-specific WireObject framework.
(8) WODB must be completely independent of the rest of WebFunds.

Desiderata

Strong support for debugging. The less testable an invariant is, the more we need debugging support.
Transparent store encryption with AES (not for all ODBStore implementations).
Transparent dual or multiple storage for mirroring and replication over "conveniently located storage devices" such as dual drives (not for all ODBStore implementations).
Backup strategy: export, import paths.

Analysis

Memory Management

WODB cannot rely on the stored data always being completely in-memory because we wish to support stores much larger than available memory. It also implies that we cannot scan over the whole database on a regular basis.

We cannot track when and where the programmer stores ODBRefs because we cannot in practice intercept this information.

Non-ODB objects (non-persistent objects) must be able to hold ODBRefs.

THUS: we cannot offer garbage collection of no-longer-referenced ODB objects and thus the programmer must manually delete objects from ODB. (This is a potential space leak that can be attacked fairly successfully.)

THUS: We have to use immutable ODBRefs because we cannot update existing ones. (We don't know where they are and the store is assumed to be too big to efficiently do this if we could.)

Syntax

Since we are restricted to Pure Java we must write every bit of WODB in Java itself. This in turn requires the use of manual object resolution because we cannot intercept the JVM's object references.

THUS: the programmer explicitly resolves references to objects.

THUS: we need a programmer-friendly, lightweight WODB syntax (little language)

Given the overhead such manual resolution imposes, both in the number of characters to be typed by the programmer and the computational overhead, we do not want to do this for every individual Java object.

Access and Performance

Loading an object from the database can be made fast (through disk layout, caching, prediction, hardware improvements and programmer assistance, in that order of preference) but the cost of a WODB access cannot be made negligible. Even if we could hold each individual Java object in the store, we would still prefer to amortize the database access cost over a (smallish) graph of objects.

THUS: we store object graphs instead of individual Java objects. Note that it is perfectly fine to store a graph consisting of a single object if you want to.

THUS: This is why these pesky ODBRef<>s exist. They allow the programmer to delineate object graphs.
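How a reference delineates an object graph can be shown with ordinary Java Serialization, one of the encodings named in the constraints. All class names below are hypothetical, and ToyRef merely imitates the role of an ODBRef by holding nothing but an identifier. Serializing the account pulls in its entries, but stops at the reference.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reference: only an identifier, so serialization stops here. */
final class ToyRef implements Serializable {
    final long id;
    ToyRef(long id) { this.id = id; }
}

class ToyEntry implements Serializable {
    final String memo;
    ToyEntry(String memo) { this.memo = memo; }
}

/** One stored graph: the account plus its entries, but NOT the issuer it refers to. */
class ToyAccount implements Serializable {
    final List<ToyEntry> entries = new ArrayList<>(); // inside this graph
    final ToyRef issuer;                              // boundary to another graph
    ToyAccount(ToyRef issuer) { this.issuer = issuer; }
}

class ToyGraphs {
    /** Serializes a whole graph in one go, as a single database record would be. */
    static byte[] encode(Serializable root) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Adding entries makes the account's record grow, while the issuer, however large its own graph is, contributes only the few bytes of its identifier.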

PERF: This means we read and write larger objects to the database. On average you won't notice a performance degradation because for small reads and writes the fixed disk overhead (disk seek + user/kernel transition) dominates. All reads/writes smaller than a few pages take the same amount of time. In all cases the fsync time dominates. (Thus it pays to batch transactions and amortize the cost of a single fsync over as many transactions as allowed by your latency budget. This is a server opt.)
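Batching can be sketched with java.nio, where FileChannel.force() plays the role of fsync. ToyGroupCommit is a hypothetical illustration, not one of the ODB stores: it appends all records of a batch and then forces the file to disk once.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

/** Toy group commit: many records, a single force() per batch. */
class ToyGroupCommit {
    int syncs = 0;   // counts force() calls, for illustration only

    /** Creates an empty scratch log file. */
    static Path tempLog() {
        try {
            return Files.createTempFile("toy-odb", ".log");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Appends every record of the batch, then syncs once for the whole batch. */
    void commitBatch(Path log, List<String> records) {
        try (FileChannel ch = FileChannel.open(log, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            for (String r : records) {
                ByteBuffer buf = ByteBuffer.wrap((r + "\n").getBytes(StandardCharsets.UTF_8));
                while (buf.hasRemaining()) ch.write(buf);
            }
            ch.force(true);   // single durability point for all records above
            syncs++;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The trade-off is latency: no transaction in the batch is durable until the one force() call returns, so the batch size is bounded by the latency budget mentioned above.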

PERF: Presently, if an object is in the memory cache, a load can be satisfied with just two hashtable lookups.

We wish to support multiple backend implementations of the WODB store so that different performance requirements can be met. We envision a compacting filesystem-based log-structured store for client-side use (StopAndCopyStore) and a store based on direct disk access for high-performance in servers.

THUS: we have an ODBBackend interface and a variety of backend store implementations. We currently have a log-structured file-based and an in-memory store.

Equality and Rollbacks

When an exception causes a transaction to terminate without completing (and therefore to roll back implicitly), the objects within the transaction are left in a partial state. This leads to difficulties if any other agent, outside this now terminated transaction, holds a reference to the object.

We cannot roll back the state of a Java object unless the object explicitly implements support for rollback. Such a rollback interface is not defined in ODB as it is difficult to do, thus, when we use the term here, the rollback is an implicit one at the object level, and does not refer to actual changes to the data in that object. (These difficulties derive from Java's language safety which does not allow such things as constructors being applied twice or even simply overwriting of an object's [private/protected] state.)

In particular, a rolled-back object will not satisfy instance comparison with an equality operator or method. I do not consider this to be a big problem as the use of == is nearly always ill-advised. But, it is important to know that ODB will break code that relies on identity comparisons of ODB-managed objects. (We probably do need some debugging support for this.)

Explanation

Let's say I have an ODBRef rA which points to an object A that is stored in the ODB. A transaction T is executing and it performs:

   A myA = rA.resolveWrite();
   myA.updateVariableOfA();
   myA.updateAnotherVariableOfA();
   throw new Exception("failed");  // some error condition detected

We are now in a position where we need to undo the effect of the two update calls because the transaction needs to abort, and we need to do so without cooperation of A. The desired pre-modification state of A exists as a sequence of bytes on-disk but there is no[1] way we can take those bytes and put them in A! All we can do is create a new object A' from those bytes which gives us the object state we desire. But now (A == A') is false, even though logically we are speaking of a single object. Nobody should use A because it is stale.
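The effect is easy to reproduce with plain Java Serialization standing in for the on-disk bytes. ToyA and ToyDisk are hypothetical names; the sketch only demonstrates why a restored object is necessarily a different instance.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Toy stand-in for the object A of the text. */
class ToyA implements Serializable {
    int variable;
    ToyA(int variable) { this.variable = variable; }
}

class ToyDisk {
    /** The "sequence of bytes on disk": A's state at the last commit. */
    static byte[] save(ToyA a) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(a);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    /** Rollback can only build a NEW instance A' from those bytes. */
    static ToyA load(byte[] saved) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(saved))) {
            return (ToyA) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After saving A, modifying it, and loading the saved bytes, the loaded object carries the pre-modification state, the original instance still carries the modified state, and an == comparison between the two is false.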

Observe that, in theory, this would not apply to immutable objects: immutable objects never need to be rolled back to a previous state and thus they could always retain the same instance, A.

Now fortunately, rollbacks never happen inside a transaction, only in-between transactions. Since only the rollback causes (A != A'), this condition is never observable by code inside a transaction context. And because the rule is to never let ODB-managed objects (such as A) escape outside a transaction context, you are never in a position to do a (==) comparison between A and A'.

(A development/debugging option which changes ODB to always return newly instantiated objects is desirable.)

[1] Various 'solutions' can be imagined: using the ... permission, or implementing a BackRollable interface. None of these seem very satisfactory. The former requires changing JDK configuration files, the latter precludes us from storing ODB-unaware objects in ODB.

Transaction Begins and Ends

WODB cannot infer when transactions are to begin and when they end.

THUS: the programmer must indicate this manually (see below).

In order to support rollback and transaction isolation we must be in a position to catch all exceptions that are thrown during transaction execution. We must be able to unambiguously match transaction start with transaction end. Since this is critical for ODB correctness we can NOT leave this to the programmer.

THUS: the programmer must explicitly call ODB.execute with an ODBProcedure to apply a transaction to the database. This indicates precisely where the transaction is to begin (before ODBProcedure.execute is called) and when it is to end, either by being committed (after ODBProcedure.execute successfully completes) or by being rolled back (when ODBProcedure.execute throws an exception). Since Java has rather limited support for anonymous closures we cannot do this more elegantly.
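The shape of this contract can be sketched with a toy executor. MiniOdb and MiniProcedure are hypothetical stand-ins that merely log "begin", "commit" and "rollback"; unlike ODB.execute as used earlier in this document, the toy reports failure through a boolean instead of an exception. The anonymous subclass in MiniDemo is the closest thing Java offers to an anonymous closure: a block of code, together with the local variables it captures, handed over to be run later.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy counterpart of a procedure class; hypothetical, not the ODB class. */
abstract class MiniProcedure {
    protected abstract void execute() throws Exception;
}

/** Toy executor that owns both ends of the transaction. */
class MiniOdb {
    static final List<String> LOG = new ArrayList<>();

    /** Returns true on commit, false on rollback; begin and end always pair up. */
    static boolean execute(MiniProcedure p) {
        LOG.add("begin");
        try {
            p.execute();
            LOG.add("commit");
            return true;
        } catch (Exception ex) {
            LOG.add("rollback");
            return false;
        }
    }
}

class MiniDemo {
    static boolean deposit(final int[] balance, final int amount) {
        // Anonymous subclass: a closure over 'balance' and 'amount'.
        return MiniOdb.execute(new MiniProcedure() {
            protected void execute() throws Exception {
                if (amount <= 0) throw new Exception("failed");
                balance[0] += amount;
            }
        });
    }
}
```

Because the executor wraps the call in its own try/catch, the programmer has no way to begin a transaction without ending it, which is the correctness argument made above.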

What is an anonymous closure? What is a closure?

Additional Issues

Outline - things to document


ODBRef

Stable reference to an object graph in the ODB store.

ODB

Main interface for initialization and transaction execution.

ODBProcedure

Parent class of all closures that are executed as ODB transactions.

ODBBackend

Common interface for all backend stores.

Musings on Dependencies - Design Points 7,8.

Wire

At the current state of development, there is a bit of a battle for supremacy going on between ODB and Wire. Both are trying to be independent, and both are currently drawn to be dependent on each other.

ODB ==> (depends on) Wire because it uses the format, as the (preferred?) way to save objects. This is not necessary but may be inevitable, as all on-the-wire SOX packets have this as their (future) standard. A method of independence from the vagaries of the JVM/language is needed for SOX, and highly advised for every internal persistent object.

Wire ==> ODB when a Wire object decides to save ODBRefs, or when it calls ODB.DECLARE() methods (which require the object to be in the ODB database already).

This only gets manifested when examples are created and tested, but these latter are the province of Wire testing. In this case, it seems that the classes being tested for Wire capability are also totally dependent on ODB. C.f., WfNameManager which makes calls and thus cannot exist outside an ODB framework.

Question: how much of Lucaya has a life outside the ODB framework? WfNameManager does not. Is this a policy or a bug?

It seems that ODB ==> Wire is the natural order of things. That is, it is more sustainable to have Wire independent as all of SOX uses it.

Question: Would it be possible for WireObject interface to be totally independent of the current Wire methods, and for an implementation of a WireObject to choose for example Serialisation internally or Wire? So, thus, WireObject becomes the only ODB interface, and Wire becomes the SOX way, with Serialization as an alternate and possibly others?