2014/02/06

jBPM 6 - store your process variables anywhere

Most of jBPM users is aware of how jBPM stores process variable but let's recap it here again just for completeness.

NOTE: this article covers jBPM that uses persistence as without persistence process variables are kept in memory only.

 jBPM puts single requirement on the objects that are used as process variables:
  • object must be serializable (simply must implement java.io.Serializable interface)
with that jBPM engine is capable to store all process variables as part of process instance using marshaling mechanism that is backed by Google Protocol Buffers. That means actual instances are marshaled into bytes and stored in data base. This is not always desired especially in case of objects that are actually not owned by the process instance. For example:
  • JPA entities of another system
  • documents stored in document/content management system 
  • etc
Luckily, jBPM has a solution to that as well called pluggable Variable Persistence Strategy. Out of the box jBPM provides two strategies:
  • serialization based, mentioned above that actually works on all object types as long as they are serializable (org.drools.core.marshalling.impl.SerializablePlaceholderResolverStrategy)
  • JPA based that works on objects that are entities (org.drools.persistence.jpa.marshaller.JPAPlaceholderResolverStrategy)
Let's spend some time on the JPA based strategy as it might become rather useful in many cases where jBPM is used in embedded mode. Consider following scenario where our business process uses entities as process variables. The same entities might be altered from outside of the process and we would like to keep them up to date within the process as well. To do so, we need to use JPA based strategy for variable persistence that is capable of storing entities in data base and then retrieving them back.
To configure variable persistence strategy you need to place it into the environment that is the used when creating knowledge sessions. Note that the order of the strategies is important as they will be evaluated which one will be used in the order they are given. best practice is to always set the serialization based strategy to be the last one. 
An example how you can use it with RuntimeManager:


// create entity manager factory
EntityManagerFactory emf = Persistence.createEntityManagerFactory("org.jbpm.sample");

RuntimeEnvironment environment = 
     RuntimeEnvironmentBuilder.Factory.get().newDefaultBuilder()
     .entityManagerFactory(emf) 
     .addEnvironmentEntry(EnvironmentName.OBJECT_MARSHALLING_STRATEGIES, 
          new ObjectMarshallingStrategy[]{
// set the entity manager factory for jpa strategy so it 
// know how to store and read entities     
               new JPAPlaceholderResolverStrategy(emf),
// set the serialization based strategy as last one to
// deal with non entities classes
               new SerializablePlaceholderResolverStrategy( 
                          ClassObjectMarshallingStrategyAcceptor.DEFAULT  )
         })  
     .addAsset(ResourceFactory.newClassPathResource("cmis-store.bpmn"), 
               ResourceType.BPMN2)
     .get();
// create the runtime manager and start using entities as part of your process  RuntimeManager manager = 
     RuntimeManagerFactory.Factory.get().newSingletonRuntimeManager(environment);

Once we know how to configure it, let's take some time to understand how it actually works. First of all, every process variable on the time when it's going to be persisted will be evaluated on the strategy and it's up to the strategy to accept or reject given variable, if accepted only that strategy will be used to persist the variable, if rejected other strategies will be consulted.

Note: make sure that you add your entity classes into persistence.xml that will be used by the jpa strategy

JPA will accept only classes that declares a field with @Id annotation (javax.persistence.Id) that allows us to ensure we will have an unique id to be used when retrieving the variable.
Serialization based one simply accepts all variables by default and thus it should be the last strategy inline. Although this default behavior can be altered by providing other acceptor implementation.

Once the strategy accepts the variable it performs marshaling operation to store the variable and unmarshaling to retrieve the variable from the back end store (of the type it supports).

In case of JPA, marshaling will check if entity is already stored entity - has id set - and:

  • if not, it will persist the entity using entity manager factory that was assigned to it
  • if yes, it will merge it with the persistence context to make sure up to date information is stored
when unmarshaling it will use the unique id of the entity to load it from the database and provide as process variable. It's that simple :)

With that, we quickly covered the default (serialization based) strategy and JPA based strategy. But the title of this article says we can store variables anywhere, so how's that possible?
It's possible because of the nature of variable persistence strategies - they are pluggable. We can create our own and simply add it to the environment and process variables that meets the acceptance criteria of the strategy will be persisted by that given strategy. To not leave you with empty hands let's look at another implementation I created for purpose of this article (although when working on it I believe it will become more than just example for this article).

Implementing variable persistence strategy is actually very simple, it's a matter of implementing single interface: org.kie.api.marshalling.ObjectMarshallingStrategy

public interface ObjectMarshallingStrategy {
    
    public boolean accept(Object object);

    public void write(ObjectOutputStream os,
                      Object object) throws IOException;
    
    public Object read(ObjectInputStream os) throws IOException, ClassNotFoundException;
    

    public byte[] marshal( Context context,
                           ObjectOutputStream os,
                           Object object ) throws IOException;
    
    public Object unmarshal( Context context,
                             ObjectInputStream is,
                             byte[] object,
                             ClassLoader classloader ) throws IOException, ClassNotFoundException;

    public Context createContext();
}

the most important methods for us are:

  • accept - decides if this strategy will be responsible for persistence of given object
  • marshal - performs operation to store process variable
  • unmarshal - performs operation to retrieve process variable
the other remaining are for backward compatibility reasons with old marshaling framework prior to protobuf, so it's not mandatory to be implemented but it's worth to put the logic there too as most likely it will be same as for marshal (write) and unmarshal (read).

So the mentioned example implementation is for storing and retrieving process variables as document from Content/Document management systems that support access to the repository using CMIS. I used Apache Chemistry as the integration component that can easily talk to CMIS enabled systems like for example Alfresco.


So first bit of requirements:

  • process variables must be of certain type to be stored in the content repository
  • documents (process variables stored in cms) can be:
    • created
    • updated (with versioning)
    • read
  • process variables must be kept up to date
so all these sounds simple and of course that's the point to keep it simple at this point. CMS can be used for much more but we wanted to get started and then enhance it if needed. So the implementation of strategy org.jbpm.integration.cmis.impl.OpenCMISPlaceholderResolverStrategy supports following:
  • when marshaling
    • create new documents if it does not have object id assigned yet
    • update document if it has already object id assigned
      • by overriding existing content
      • by creating new major version of the document 
      • by creating new minor version of the document
  • when unmarshaling
    • load the content of the document based on given object id
So you can actually use this strategy for:
  • creating new documents from the process based on custom content
  • update existing documents with custom content
  • load existing documents into process variable based on object id only
These are very high level details but let's look at the actual code that does that "magic", let's start with marshal logic - note that is bit simplified for readability here and complete code can be found in github.


public byte[] marshal(Context context, ObjectOutputStream os, Object object) throws IOException {
 Document document = (Document) object;
 // connect to repository
 Session session = getRepositorySession(user, password, url, repository);
 try {
  if (document.getDocumentContent() != null) {
   // no object id yet, let's create the document
   if (document.getObjectId() == null) {
    Folder parent = ... // find folder by path
    if (parent == null) {
     parent = .. // create folder
    }
    // now we are ready to create the document in CMS
   } else {
      // object id exists so time to update     
   }
  }
 // now nee need to store some info as part of the process instance
 // so we can later on look up, in this case is the object id and class
 // that we use as process variable so we can recreate the instance on read
     ByteArrayOutputStream buff = new ByteArrayOutputStream();
     ObjectOutputStream oos = new ObjectOutputStream( buff );
     oos.writeUTF(document.getObjectId());
     oos.writeUTF(object.getClass().getCanonicalName());
     oos.close();
     return buff.toByteArray();
 } finally {
  // let's clear the session in the end
  session.clear();
 }
}

so as you can see, it first deals with the actual storage (in this case CMIS based repository) and then save some small details to be able to recreate the actual object instance on reading. It stores objectId and fully qualified class name of the process variable. And that's it. Process variable of type Document will be stored inside content repository.

Then let's look at the unmarshal method:


public Object unmarshal(Context context, ObjectInputStream ois, byte[] object, ClassLoader classloader) throws IOException, ClassNotFoundException {
 DroolsObjectInputStream is = new DroolsObjectInputStream( new ByteArrayInputStream( object ), classloader );
 // first we read out the object id and class name we stored during marshaling
 String objectId = is.readUTF();
 String canonicalName = is.readUTF();
 // connect to repository
 Session session = getRepositorySession(user, password, url, repository);
 try {
  // get the document from repository and create new instance ot the variable class
  CmisObject doc = .....
  Document document = (Document) Class.forName(canonicalName).newInstance();
  // populate process variable with meta data and content
  document.setObjectId(objectId);
  document.setDocumentName(doc.getName());   
  document.setFolderName(getFolderName(doc.getParents()));
  document.setFolderPath(getPathAsString(doc.getPaths()));
  if (doc.getContentStream() != null) {
   ContentStream stream = doc.getContentStream();
   document.setDocumentContent(IOUtils.toByteArray(stream.getStream()));
   document.setUpdated(false);
   document.setDocumentType(stream.getMimeType());
  }
  return document;
 } catch(Exception e) {
  throw new RuntimeException("Cannot read document from CMIS", e);
 } finally {
  // do some clean up...
  is.close();
  session.clear();
 }
}

nothing more that the logic to get ids and class name so the instance can be recreated and load the document from cms repository and we're done :)

Last but not least, the accept method.


public boolean accept(Object object) {
    if (object instanceof Document) {
 return true;
    }
    return false;
}

and that is all that is needed to actually implement your own variable persistence strategy. The only thing left is to register the strategy on the environment so it will be evaluated when storing/retrieving variables. It's done the same way as described for JPA based on.

Complete source code with some tests showing complete usage case from process can be found here. Enjoy and feel free to provide feedback, maybe it's worth to start producing repository of such strategies so we can have rather rich set of strategies to be used...