What is JEMM?

Put simply, JEMM is 'persistent shared memory without the pain'. With JEMM (Java Extended Memory Model), persistent data is modelled using plain old Java objects (POJOs). You create an object with new and leave it to the garbage collector to dispose of. There are no mapping files and no weird rules. Sharing data between applications behaves like multi-threading, and the same rules apply.

An entity can be referenced from more than one JVM. Synchronized methods on entities are synchronized not only within the local JVM but across all JVMs with access to that object.

Objects outlive the JVM that created them as long as they are reachable from a root, either directly or through a chain of other objects.

The liveness of entities (the persistent objects) is defined in the same way as for normal objects: if an entity is reachable, it is alive; once it is no longer reachable, it will be garbage collected.

Key Concepts

Entities

What is an entity?

Entities are your POJOs: the Person, Account or whatever else you are modelling in your application. You create them with new, and they are garbage collected when no longer referenced. Unlike normal POJOs, however, these objects are transparently enhanced.

Writing entities

Writing an entity is the same as writing a normal POJO, with a couple of extra restrictions. The main one is that an entity cannot hold references to non-persistent objects (such as a file handle) unless the field holding them is transient. This is similar to serialization: to serialize successfully, an object may only contain references to other serializable objects. As with a serializable object, methods on an entity may still accept non-serializable objects as parameters and return them.
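As a rough illustration of these rules, here is what an entity might look like. It is plain Java with no JEMM-specific code, since the text says entities are ordinary POJOs; the Person class, its fields and its methods are hypothetical examples, not part of JEMM. The assumption that a transient field is simply not stored with the entity mirrors the serialization analogy above.

```java
import java.io.PrintWriter;

// Hypothetical entity: an ordinary class with no JEMM-specific code.
class Person {
    private String name;
    private int age;
    private Person manager;          // references to other entities are fine

    // An open stream is not persistent state, so it must be transient.
    // Assumption: as with serialization, it is not stored with the entity
    // and is null wherever it has not been set locally.
    private transient PrintWriter auditLog;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    synchronized String getName() { return name; }
    synchronized Person getManager() { return manager; }
    synchronized void setManager(Person manager) { this.manager = manager; }

    synchronized void attachAuditLog(PrintWriter log) { this.auditLog = log; }

    synchronized void setAge(int age) {
        this.age = age;
        if (auditLog != null) {
            auditLog.println(name + " age -> " + age);
        }
    }

    // A non-persistent object passed in as a parameter is fine: it is only
    // used for the duration of the call and never stored in a field.
    synchronized void writeSummary(PrintWriter out) {
        out.print(name + " (" + age + ")");
    }
}
```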

As with all multi-threaded code, and especially with JEMM, you must take care with synchronization. Within a JVM, Java synchronization performs two tasks: mutual exclusion (only one thread at a time may execute a block synchronized on a given object) and visibility (changes made by one thread become visible to another thread that later synchronizes on the same object).

JEMM extends the synchronization behaviour of entities so that both locking and visibility work across multiple JVMs. If you update a field outside a synchronized block in one JVM, the change may not be visible to a thread in a different JVM.
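The pattern the text recommends is to touch an entity's state only inside synchronized methods or blocks. The sketch below shows it with a hypothetical Account class and runs in a single, ordinary JVM; according to the description above, the same code under JEMM would also coordinate threads in other JVMs, but that part is not demonstrated here.

```java
// Hypothetical entity: its state is read and written only inside synchronized
// methods, so both mutual exclusion and visibility apply to every access.
class Account {
    private long balance;

    synchronized void deposit(long amount) {
        if (amount <= 0) throw new IllegalArgumentException("amount must be positive");
        balance += amount;
    }

    // The check and the update happen under the same lock, so two threads
    // cannot both pass the check and overdraw the account.
    synchronized boolean withdraw(long amount) {
        if (amount <= 0) throw new IllegalArgumentException("amount must be positive");
        if (balance < amount) return false;
        balance -= amount;
        return true;
    }

    synchronized long getBalance() {
        return balance;
    }
}

class AccountDemo {
    // Several threads deposit concurrently; because every update goes through
    // a synchronized method, no deposit is lost.
    static long runDeposits(int threads, int depositsPerThread) {
        Account account = new Account();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < depositsPerThread; j++) {
                    account.deposit(1);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return account.getBalance();
    }
}
```

Reading balance without synchronization, or updating it from an unsynchronized method, would lose these guarantees even within one JVM.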

It is still possible to get into a deadlock with JEMM. Unlike a DBMS, the server does not break a deadlock by making one of the competing lock requests fail after a timeout.
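Because the server will not break a deadlock for you, the usual Java techniques for avoiding one apply. A common one is to always acquire locks in a consistent global order. The sketch below assumes each entity carries a unique, immutable id (the Wallet class and its id field are hypothetical) and always locks the lower id first, so two transfers in opposite directions can never each hold one lock while waiting for the other.

```java
// Hypothetical entity with a stable id used only to order lock acquisition.
class Wallet {
    final long id;              // assumed unique and immutable
    private long balance;

    Wallet(long id, long balance) {
        this.id = id;
        this.balance = balance;
    }

    synchronized long getBalance() {
        return balance;
    }

    // Locks both wallets in ascending id order, whatever the transfer direction.
    static boolean transfer(Wallet from, Wallet to, long amount) {
        if (from == to) throw new IllegalArgumentException("same wallet");
        if (amount <= 0) throw new IllegalArgumentException("amount must be positive");
        Wallet first = from.id < to.id ? from : to;
        Wallet second = (first == from) ? to : from;
        synchronized (first) {
            synchronized (second) {
                if (from.balance < amount) return false;
                from.balance -= amount;
                to.balance += amount;
                return true;
            }
        }
    }
}
```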

For more information on thread synchronization, Wikipedia gives a quick overview, or, if you are feeling brave, read the Java Language Specification.

When is an entity garbage collected?

Entities are handled in exactly the same way as normal objects: liveness is determined by reachability. In the JVM, an object is considered live if it is referenced from a local variable (i.e. on the stack) or a static variable, or from a field of an object that is itself live.

JEMM adds one more form of reachability: roots (discussed in more detail below). Roots are like persistent static variables. When a JVM stops, all the other forms of reachability end with it, but roots remain valid even when no clients are running. With a persistent store, a root exists until it is deleted.
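To make the reachability rule concrete, the sketch below computes which objects are live when only roots remain, as after every JVM has stopped: a plain mark phase that walks references outward from each root. The Node and Reachability classes are hypothetical; this models the rule described above, not JEMM's actual collector.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for an entity: a label plus outgoing references.
class Node {
    final String label;
    final List<Node> refs = new ArrayList<>();

    Node(String label) {
        this.label = label;
    }
}

class Reachability {
    // Returns every node reachable from the named roots. Anything not in the
    // result would be eligible for garbage collection.
    static Set<Node> live(Map<String, Node> roots) {
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> work = new ArrayDeque<>();
        for (Node root : roots.values()) {
            if (root != null && seen.add(root)) {
                work.push(root);
            }
        }
        while (!work.isEmpty()) {
            Node n = work.pop();
            for (Node child : n.refs) {
                // The seen-set check also makes cycles safe to traverse.
                if (child != null && seen.add(child)) {
                    work.push(child);
                }
            }
        }
        return seen;
    }
}
```

Note that an object pointing at a root does not make that object live; only references leading away from a root count.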

Roots

Roots are the access point for an application to store data. When no JVMs are running, only data that is reachable from a root is kept.

Roots are accessed through the session and three methods are provided:

  • Object getRoot(String name) - Retrieve the current root value.
  • void setRoot(String name, Object value) - Set a new root reference.
  • Object setRootIfNull(String name, Object value) - Set a new root reference if the root is currently unset; otherwise, return the current root value.
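The semantics of the three methods can be modelled with a concurrent map. The RootTable class below is a hypothetical in-memory stand-in, not the JEMM session. Two behaviours are assumptions, because the list above does not specify them: setRoot with null clears the root, and setRootIfNull returns whichever value the root holds after the call. Under those assumptions, setRootIfNull is a natural fit for initialising a shared root once, since concurrent callers all end up with the same instance.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory model of the three root methods; in JEMM they are
// accessed through the session and backed by the configured store.
final class RootTable {
    private final ConcurrentMap<String, Object> roots = new ConcurrentHashMap<>();

    // Returns the current root value, or null if the root is unset.
    Object getRoot(String name) {
        return roots.get(name);
    }

    // Replaces the root unconditionally. Assumption: null clears the root.
    void setRoot(String name, Object value) {
        if (value == null) {
            roots.remove(name);
        } else {
            roots.put(name, value);
        }
    }

    // Atomically sets the root only if it is unset. Assumption: returns the
    // value the root holds after the call (the new value if it was set here,
    // otherwise the existing one).
    Object setRootIfNull(String name, Object value) {
        Objects.requireNonNull(value, "value");
        Object existing = roots.putIfAbsent(name, value);
        return existing == null ? value : existing;
    }
}
```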

Stores

What is a Store?

Stores are the backend infrastructure that do the work of managing your persistent data. A store must be defined at application startup and cannot be reset once the application is running.
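The "set once at startup" rule can be pictured as a holder that accepts exactly one store and rejects every later attempt. The StoreHolder class below is a hypothetical sketch of that rule using AtomicReference.compareAndSet; it does not show how JEMM itself enforces it.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration of set-once configuration; not JEMM's Session.
final class StoreHolder<S> {
    private final AtomicReference<S> store = new AtomicReference<>();

    // Succeeds only for the first call; later calls fail instead of
    // silently swapping the store under running code.
    void setStore(S s) {
        if (s == null) throw new IllegalArgumentException("store must not be null");
        if (!store.compareAndSet(null, s)) {
            throw new IllegalStateException("store already configured");
        }
    }

    S getStore() {
        S s = store.get();
        if (s == null) throw new IllegalStateException("store not configured yet");
        return s;
    }
}
```

In practice this means the store should be chosen in the application's startup code, before any entity is created or read.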

Available Stores

There are three Stores currently available that serve different purposes.

Memory
        Session.setStore(new MemoryStore());

The MemoryStore is a local, single-JVM store that never writes data to disk; all entities stored in it are lost when the JVM shuts down. It is mainly useful for unit testing, but it is also faster than a persistent store and could be useful for caches on servers with lots of memory.

Persistent
        Session.setStore(new PersistentStore(dataDir));

The PersistentStore is a single-JVM store that persists data into the given data directory. It is useful for applications that want to persist data easily, but it provides no cross-JVM support. Never run two processes against the same PersistentStore data directory at the same time: the directory is not locked, and concurrent writes would corrupt the data.
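Since the PersistentStore does not lock its directory, an application that might be started twice can protect itself by taking an operating-system file lock before configuring the store, and exiting if the lock is already held. The sketch below uses java.nio's FileChannel.tryLock on a separate lock file; the DataDirLock class and the choice of lock file are assumptions, not a JEMM feature. Whether file locks are advisory or mandatory depends on the platform, so this only guards against processes that use the same lock.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical guard: holds an OS-level lock on a dedicated lock file for as
// long as this process uses the data directory. Not part of JEMM.
final class DataDirLock implements AutoCloseable {
    // Lock files held by this JVM. Checked first so we never open a second
    // channel to a locked file (on some platforms, closing any channel to a
    // file releases all of the JVM's locks on it).
    private static final Set<Path> HELD = ConcurrentHashMap.newKeySet();

    private final Path key;
    private final FileChannel channel;
    private final FileLock lock;

    private DataDirLock(Path key, FileChannel channel, FileLock lock) {
        this.key = key;
        this.channel = channel;
        this.lock = lock;
    }

    // Returns a held lock, or null if it is already held by another
    // process or elsewhere in this JVM.
    static DataDirLock tryAcquire(Path lockFile) throws IOException {
        Path key = lockFile.toAbsolutePath().normalize();
        if (!HELD.add(key)) {
            return null;                      // already held within this JVM
        }
        FileChannel ch = null;
        boolean acquired = false;
        try {
            ch = FileChannel.open(key, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            FileLock l = ch.tryLock();
            if (l == null) {
                return null;                  // held by another process
            }
            acquired = true;
            return new DataDirLock(key, ch, l);
        } catch (OverlappingFileLockException e) {
            return null;                      // locked through another channel in this JVM
        } finally {
            if (!acquired) {
                HELD.remove(key);
                if (ch != null) ch.close();   // do not leak the channel on failure
            }
        }
    }

    @Override
    public void close() throws IOException {
        try {
            lock.release();
            channel.close();
        } finally {
            HELD.remove(key);
        }
    }
}
```

A launcher could call DataDirLock.tryAcquire on a file next to dataDir, exit if it returns null, and keep the lock open for the life of the process, calling Session.setStore(new PersistentStore(dataDir)) only after the lock is held.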

Remote
        Session.setStore(new RemoteStore(hostname,port));

The RemoteStore is the main JEMM store; in conjunction with the JemmServer, it provides the full JEMM experience. Objects stored using the RemoteStore and the JemmServer are both persistent and shared.

How is JEMM different from Hibernate/JPA?

ORM (object-relational mapping) tools such as Hibernate and iBatis suffer from what is known as the 'impedance mismatch'. On the Java platform we work with classes, objects and synchronized blocks; in the database world you work with tables, transactions and selects. ORM frameworks do a lot of work to hide the complexity of mapping classes and objects to tables, but they suffer from various abstraction leaks that affect the design of the domain model.

ORM frameworks generally do not support the same style of locking as Java synchronization, and lifecycle management (deleting unused objects) is left as an exercise for the developer. Cascading deletes are not as easy to use as simply dropping the last reference to an object in Java.

Scaling can also be a problem with databases. There are ways to scale beyond a single database instance, such as caching and sharding, but both have their own gotchas and complexities and again affect the design of the model: yet more abstraction leaks. See below for how JEMM approaches this problem.

Whilst ORM is a step up from raw SQL, we see its underlying approach as flawed. We built JEMM because we believe there is a fundamentally better way of handling persistent data. We start from the premise that persistent objects should be just like normal objects, which have a simple lifecycle that developers understand well. Coordinating code across two or more processes should be exactly the same as coordinating two threads within one process, with as little noise as possible.

How does JEMM work?

JEMM uses a Java agent (javaagent) to rewrite the bytecode of your entity classes as they are loaded. The rewritten classes capture object structure and intercept object construction, method entry and exit, and synchronized blocks, which gives JEMM very fine-grained control over each object.
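The javaagent hook itself is standard Java (the java.lang.instrument package). The skeleton below shows where a bytecode rewriter plugs in: premain registers a ClassFileTransformer, which the JVM calls for each class as it is loaded. The EntityAgent class and the com/example/model/ package filter are hypothetical, and the transformer makes no changes (returning null tells the JVM to keep the original bytes); actually inserting the interception hooks would need a bytecode library and is out of scope here.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical agent skeleton showing where entity bytecode could be rewritten.
final class EntityAgent {
    // Hypothetical package holding the application's entity classes
    // (class names arrive in internal form, with '/' separators).
    static final String ENTITY_PACKAGE = "com/example/model/";

    // Called by the JVM before main when started with -javaagent.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new EntityTransformer());
    }

    static final class EntityTransformer implements ClassFileTransformer {
        // Records which classes would have been rewritten, for illustration.
        final List<String> seen = new CopyOnWriteArrayList<>();

        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            if (className == null || !className.startsWith(ENTITY_PACKAGE)) {
                return null;  // not an entity: leave the class unchanged
            }
            seen.add(className);
            // A real rewriter would return modified bytes here, adding hooks
            // around construction, method entry/exit and monitor enter/exit.
            return null;
        }
    }
}
```

The agent is packaged in a jar whose manifest names the class in its Premain-Class attribute, and is enabled with -javaagent:path/to/agent.jar on the java command line.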

JEMM ensures that while a user thread is inside an object (executing one of its methods), all of its fields are resolved and can be used as with any normal class. When all threads have exited an object, it can be dehydrated: its references to other objects are released and held internally as simple IDs. If a user thread re-enters the object, the fields are rehydrated before the method executes.
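The dehydrate/rehydrate cycle can be sketched for a single reference field. The hypothetical ManagedRef below counts how many entries into the owning object are active; the first entry resolves the target from its ID through a loader function, and when the last one exits, the object reference is dropped and only the ID is kept. This illustrates the idea described above rather than JEMM's internal representation; ManagedRef and the loader are assumptions.

```java
import java.util.function.LongFunction;

// Hypothetical handle for one reference field of a managed object.
final class ManagedRef<T> {
    private final long targetId;           // persistent identity of the target
    private final LongFunction<T> loader;  // hypothetical: resolves an ID to an object
    private T resolved;                    // set only while at least one entry is active
    private int activeEntries;

    ManagedRef(long targetId, LongFunction<T> loader) {
        this.targetId = targetId;
        this.loader = loader;
    }

    // Called when a thread enters a method of the owning object.
    synchronized void enter() {
        if (activeEntries == 0) {
            resolved = loader.apply(targetId);   // rehydrate
        }
        activeEntries++;                         // counted only after a successful load
    }

    synchronized T get() {
        if (activeEntries == 0) throw new IllegalStateException("object is dehydrated");
        return resolved;
    }

    // Called when a thread leaves a method of the owning object.
    synchronized void exit() {
        if (activeEntries == 0) throw new IllegalStateException("exit without matching enter");
        if (--activeEntries == 0) {
            resolved = null;                     // dehydrate: keep only targetId
        }
    }

    synchronized boolean isHydrated() {
        return resolved != null;
    }
}
```

Once the reference is cleared, the target object is no longer held by this handle and can be collected locally, which is what lets the persistent graph exceed the heap.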

This way, the only persistent objects held in JVM memory are those actively in use. The persistent object graph can be much larger than the JVM heap because it is never all loaded at the same time.

When a synchronized block is entered, the store ensures that the object state is consistent, with all changes from other threads and JVMs applied, before the block starts. When the synchronized block exits, the store updates the persistent data with any changes (deltas) made inside it.
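The enter/exit behaviour can be sketched with an entity's fields modelled as a map. In the hypothetical code below, enter() refreshes a local copy from a shared store, and exit() compares the working state with what was read on entry and sends only the changed fields. The cross-JVM lock that a real store would hold between enter and exit is left out, and SharedStore and LocalCopy are illustrative names, not JEMM classes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical shared store: object ID -> (field name -> value).
final class SharedStore {
    private final Map<Long, Map<String, Object>> data = new HashMap<>();
    int deltaWrites;   // number of individual field updates received

    synchronized Map<String, Object> snapshot(long id) {
        return new HashMap<>(data.getOrDefault(id, Collections.emptyMap()));
    }

    synchronized void applyDelta(long id, Map<String, Object> delta) {
        data.computeIfAbsent(id, k -> new HashMap<>()).putAll(delta);
        deltaWrites += delta.size();
    }
}

// Hypothetical per-JVM view of one object.
final class LocalCopy {
    private final long id;
    private final SharedStore store;
    private Map<String, Object> base;    // state as read at block entry
    private Map<String, Object> fields;  // working state inside the block

    LocalCopy(long id, SharedStore store) {
        this.id = id;
        this.store = store;
    }

    // Block entry: refresh so changes made elsewhere are visible.
    void enter() {
        base = store.snapshot(id);
        fields = new HashMap<>(base);
    }

    Object get(String field) { return fields.get(field); }

    void set(String field, Object value) { fields.put(field, value); }

    // Block exit: send only the fields whose values changed.
    void exit() {
        if (fields == null) throw new IllegalStateException("not entered");
        Map<String, Object> delta = new HashMap<>();
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            if (!Objects.equals(e.getValue(), base.get(e.getKey()))) {
                delta.put(e.getKey(), e.getValue());
            }
        }
        store.applyDelta(id, delta);
        base = null;
        fields = null;
    }
}
```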

JEMM introduces a small but growing set of classes that scale beyond the standard Java implementations. The licence for Sun's code forbids replacing the core collections, so JEMM provides a series of replacement implementations instead. These work with the store to load only what is needed, allowing collections with many millions of entries that would otherwise overflow memory.
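The load-only-what-is-needed idea can be illustrated with a read-only list that fetches elements a page at a time and keeps only a bounded number of pages in memory. The PagedList class, its page loader and the least-recently-used eviction policy are all assumptions for illustration; the text does not name JEMM's collection classes or describe how they cache.

```java
import java.util.AbstractList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical read-only list backed by a store that is read a page at a time.
// Not thread-safe; a real implementation would coordinate with the store.
final class PagedList<E> extends AbstractList<E> {
    private final int size;
    private final int pageSize;
    private final IntFunction<List<E>> pageLoader;  // page number -> that page's elements
    private final Map<Integer, List<E>> cache;
    int pageLoads;                                  // pages fetched so far, for illustration

    PagedList(int size, int pageSize, int maxCachedPages, IntFunction<List<E>> pageLoader) {
        if (pageSize <= 0 || maxCachedPages <= 0) throw new IllegalArgumentException();
        this.size = size;
        this.pageSize = pageSize;
        this.pageLoader = pageLoader;
        // Access-ordered LinkedHashMap: evicts the least recently used page.
        this.cache = new LinkedHashMap<Integer, List<E>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, List<E>> eldest) {
                return size() > maxCachedPages;
            }
        };
    }

    @Override
    public E get(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException("index " + index);
        int page = index / pageSize;
        List<E> elements = cache.get(page);
        if (elements == null) {
            elements = pageLoader.apply(page);  // fetch only the page that is needed
            pageLoads++;
            cache.put(page, elements);
        }
        return elements.get(index % pageSize);
    }

    @Override
    public int size() {
        return size;
    }
}
```

With a page size of 100 and two cached pages, a list of a million entries holds at most 200 elements in memory at any time.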