Cloud runtime architectures for jBPM

In days when more and more software is moving to the cloud, I'd like to take a moment (or two) to describe the various runtime architectures that jBPM can be deployed with.

This article mainly covers version 7 onwards, though some of the aspects also apply to version 6.


  • Admin Console - workbench (or its lighter version that includes only runtime views) with an embedded controller
  • Controller - the KIE Controller used with KIE Servers that run in managed mode
  • Smart Router - an optional component that acts as a kind of intelligent load balancer, as it can both route requests to individual KIE Servers and aggregate data from different KIE Servers
  • Managed KIE Server - a KIE Server that is connected to the Controller and takes its configuration from it, overriding anything it has locally (even what is included in the image)
  • Unmanaged KIE Server - a KIE Server that runs completely standalone and does not require any other component to be fully functional
  • Managed Smart Router - a router that is connected to the Controller, though it owns the right to dynamically update the server template it represents

Architecture 1: Immutable unmanaged KIE Servers with Smart Router and Admin Console

This architecture is as cloud native as possible. It promotes the immutable execution server paradigm, meaning both the execution server itself and all KJARs (with all their dependencies) are colocated and included in the image. That way any instance of that service, regardless of when it starts, will always be identical.
It then uses the Managed Smart Router to benefit from routing and aggregation. The Smart Router dynamically updates the Controller with new containers as they come in, so the Admin Console can properly set up clients to interact with them.

In this architecture, end users always interact with KIE Servers via the Smart Router, either by using the Admin Console and its runtime views or from another application.

Individual KIE Servers can come and go at any time and register/unregister with the Smart Router. These KIE Servers might share the same id, meaning they represent the same image, or come from different images. In the latter case (since images are immutable) they represent a different set of kjars.

As an example, let's look at the diagram above:

  • There are three KIE Servers behind the router, each representing an independent image (meaning each has different kjars included)
    • KIE Server - ABC
    • KIE Server - DEF
    • KIE Server - GHI
  • There could be multiple instances of a given KIE Server image (multiple PODs in OpenShift terminology)
  • KIE Servers start completely independently of each other and of the Smart Router (though each will keep attempting to register with the Smart Router in case it was not up when the KIE Server started)
  • The KIE Containers to be started are included in the image

Architecture 2: Immutable managed KIE Servers with Admin Console and optionally Smart Router

Immutable managed architecture is a slight variation of the first architecture, though all KIE Servers are managed by the controller. The Smart Router therefore becomes optional, as users can access individual KIE Servers directly since they are managed. The Smart Router is needed when end users should be able to look at all KIE Servers at once instead of grouped by server template.

The images are still immutable, meaning they include the KJARs that should be running, but the controller has the final word on which set of KJARs (included in the image) is started. This ensures that both the Admin Console and the KIE Server have the same set of KIE Containers defined.

In practice, this architecture requires an additional step in the deployment pipeline: creating an immutable server template in the controller that matches the kjars included in the image. The main reason for this is to protect the Admin Console from being affected by a wrong image connecting with a template id it should not use. The Admin Console thus relies completely on the controller configuration rather than on the runtime.
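The pipeline step can include a guard that refuses to roll out an image whose kjars differ from its server template. The sketch below is a hypothetical helper, not part of jBPM: it assumes both sides are already available as sets of release ids (groupId:artifactId:version) and leaves out how they are obtained (image manifest, controller REST call).

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical pipeline guard: compares kjar release ids baked into an image with
// the containers defined in the controller's server template before rollout.
final class TemplateGuard {

    private TemplateGuard() {}

    /** Returns ids present in only one of the two sets; an empty result means they match. */
    static Set<String> mismatches(Set<String> imageKjars, Set<String> templateContainers) {
        Set<String> onlyInOne = new TreeSet<>(imageKjars);
        onlyInOne.addAll(templateContainers);
        Set<String> inBoth = new TreeSet<>(imageKjars);
        inBoth.retainAll(templateContainers);
        onlyInOne.removeAll(inBoth);
        return onlyInOne;
    }

    public static void main(String[] args) {
        Set<String> image = Set.of("org.example:abc:1.0.0", "org.example:def:1.0.0");
        Set<String> template = Set.of("org.example:abc:1.0.0");
        Set<String> diff = mismatches(image, template);
        if (!diff.isEmpty()) {
            // a real pipeline would fail here instead of letting the image register
            System.err.println("Server template does not match image: " + diff);
        }
    }
}
```

Keeping the check symmetric catches both a kjar missing from the template and a template container that the image cannot serve.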

In this architecture, users can use either the router or individual KIE Servers to interact with their capabilities. The same rules apply as in the first architecture when it comes to KIE Server images and their instances. Whenever a new instance of a KIE Server starts, it registers itself with the controller and optionally with the Smart Router.

Architecture 3: "Empty" managed KIE Servers with Admin Console and optionally Smart Router

This architecture moves in a different direction: instead of making the images immutable, it promotes the dynamic behaviour of KIE Server. An "Empty" KIE Server means the image does not include any KJARs; it is a pure KIE Server runtime that has nothing deployed to it when started.

With managed capabilities, the controller can dynamically instruct KIE Servers what needs to be deployed, so the additional deployment step (as in architecture 2) can be used. By making the server template immutable, an architecture similar to 2 can be achieved, though it might be affected by differences in downloaded artefacts; this might be especially visible when snapshots are used.

The benefit is that a single KIE Server image is used (per release version) which, when started, is given a list of parameters that define its behaviour:

  • KIE Server ID that refers to server template in controller
  • switches to turn off capabilities
  • url to controller
  • optionally url to smart router
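To make the parameter list concrete, here is a minimal sketch that assembles the JVM system properties for one instance of the shared image. org.kie.server.id and org.kie.server.controller are the properties named in these posts; org.kie.server.router and the "<extension>.disabled=true" switch pattern are assumptions to verify against the KIE Server documentation for your version, and the controller host is a placeholder.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: JVM system properties for one "empty" managed KIE Server instance.
// The router property and the *.disabled switch pattern are assumptions.
final class KieServerArgs {

    private KieServerArgs() {}

    static List<String> build(String serverId, String controllerUrl,
                              String routerUrl, List<String> disabledExtensions) {
        List<String> args = new ArrayList<>();
        args.add("-Dorg.kie.server.id=" + serverId);              // server template id in the controller
        args.add("-Dorg.kie.server.controller=" + controllerUrl); // where the configuration comes from
        if (routerUrl != null) {                                  // Smart Router is optional
            args.add("-Dorg.kie.server.router=" + routerUrl);
        }
        for (String extension : disabledExtensions) {             // capability switches
            args.add("-D" + extension + ".disabled=true");
        }
        return args;
    }

    public static void main(String[] args) {
        // controller.example is a placeholder host
        build("ABC", "http://controller.example:8080/controller", null,
              List.of("org.optaplanner.server.ext"))
            .forEach(System.out::println);
    }
}
```

The same image is then started with "DEF" or "GHI" as the server id to become a different server; only these parameters change between deployments.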

The main difference here is that the controller should have server templates defined (with kie containers) for all the KIE Servers in the diagram:
  • ABC
  • DEF
  • GHI
If the server template does not exist, it will be created when a KIE Server connects, though it will be completely empty, meaning there is nothing to execute on it.
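The template resolution described above can be modelled in a few lines. This is a simplified, hypothetical model, not the controller's actual code: templates are plain maps from template id to container ids, and an unknown server id yields a new, empty template.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Simplified model of how a controller resolves the template for a connecting
// managed KIE Server (identified by its org.kie.server.id).
final class TemplateRegistry {

    private final Map<String, Set<String>> templates = new ConcurrentHashMap<>();

    /** Deployment step: define the template and its kie containers up front. */
    void define(String templateId, Set<String> containers) {
        templates.put(templateId, Set.copyOf(containers));
    }

    /** On connection: unknown ids get a new, empty template, so nothing gets deployed. */
    Set<String> onServerConnected(String serverId) {
        return templates.computeIfAbsent(serverId, id -> Set.of());
    }

    public static void main(String[] args) {
        TemplateRegistry controller = new TemplateRegistry();
        controller.define("ABC", Set.of("abc-container"));
        System.out.println(controller.onServerConnected("ABC")); // [abc-container]
        System.out.println(controller.onServerConnected("XYZ")); // []
    }
}
```

This is why defining the templates first matters: a server that connects with a mistyped id still starts, but runs nothing.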

Another flavour of this architecture is to leave the server template mutable, so that kie containers can be added or removed at any point in time. This makes the environment completely dynamic, which might actually be a good fit for development environments.

Interaction with KIE Servers is exactly the same as in architecture 2: it can be done directly with individual servers or via the Smart Router.

Architecture 4: Immutable unmanaged KIE Servers with Smart Router

This is another take on immutable images, simplified as there is no need for a controller and thus the Admin Console is not used. Users still have a KIE Server image per set of KJARs to ensure immutability, but there is no "managed" client for it.

This architecture mainly targets setups where other components (applications/services) interact with KIE Servers via the Smart Router.

KIE Servers behave exactly the same as in architecture 1 and allow new instances or images to be added at any time. The Smart Router constantly updates its routing table to make sure it provides access to all available servers with efficient balancing.
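The idea of a routing table that follows instances as they come and go can be sketched as below. This is a conceptual illustration, not the Smart Router's implementation: it maps a container id to the URLs of live KIE Server instances and picks one round-robin, with a single shared counter for simplicity.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Conceptual routing table: container id -> live KIE Server URLs, round-robin.
final class RoutingTable {

    private final Map<String, CopyOnWriteArrayList<String>> routes = new ConcurrentHashMap<>();
    private final AtomicInteger counter = new AtomicInteger();

    /** A KIE Server instance registers a container it runs. */
    void register(String containerId, String serverUrl) {
        routes.computeIfAbsent(containerId, id -> new CopyOnWriteArrayList<>())
              .addIfAbsent(serverUrl);
    }

    /** A KIE Server instance goes away (scaled down, crashed, redeployed). */
    void unregister(String containerId, String serverUrl) {
        List<String> urls = routes.get(containerId);
        if (urls != null) {
            urls.remove(serverUrl);
        }
    }

    /** Next server for the container, or null when no instance currently serves it. */
    String route(String containerId) {
        List<String> urls = routes.get(containerId);
        if (urls == null) {
            return null;
        }
        List<String> snapshot = List.copyOf(urls); // stable view while instances change
        if (snapshot.isEmpty()) {
            return null;
        }
        return snapshot.get(Math.floorMod(counter.getAndIncrement(), snapshot.size()));
    }
}
```

Taking a snapshot before indexing avoids an out-of-bounds read if an instance unregisters between the size check and the lookup.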

Architecture 5: "Empty" unmanaged KIE Servers with Smart Router

This architecture is pretty much the same as the one above (4), though it provides the dynamic deployment feature. Instead of immutable KIE Server images with all KJARs included, it starts from an "empty" KIE Server image that can be configured manually (since it is unmanaged), deploying/undeploying KJARs at any point in time.

This gives the most flexibility, but at the same time it requires the most manual configuration of the runtime environment, so it is most likely suitable for simple environments. Still, it might be a good fit for some use cases.


Of course, the final selection of the architecture depends on a number of factors, but the overall recommendation is to follow the order of the architectures in this article.

The most cloud "friendly" seems to be the first one, as it nicely fits into the continuous delivery approach with the fewest additional steps.

The second one adds the ability to control individual servers from the controller and to use the Admin Console for selected server templates in isolation.

The third provides a really flexible (though not immutable) environment with a small number of KIE Server images to manage. It might be an option for certain use cases where the business logic needs to behave dynamically without requiring a complete image redeployment.

The fourth removes the controller and Admin Console from the landscape, so it might be a good fit for lighter setups where business automation is used with another UI and managed completely by the cloud infrastructure (immutable images).

The fifth is most likely only for environments where deployment is managed by an external system that keeps track of all the KIE Servers that may be active.

I hope this gives a nice overview and helps with selecting the right runtime architecture based on your requirements.


Elasticsearch empowers jBPM

As a follow-up to the article that introduced experimental NoSQL support for jBPM, this article illustrates one potential integration to enhance search capabilities, and potentially routing support, for larger environments.

Elasticsearch will be used as an additional data store where both process instances and tasks are indexed. Please keep in mind that at this point it is a rather basic integration, though it has already proven to be extremely valuable. Before jumping into details, let's look at the use cases this integration brings:
  • ability to collect process instances and tasks from different sources - e.g. different execution servers connected to different dbs
  • ability to search for process instances and tasks using full text search - indexed values etc
  • ability to search for process instances and tasks by their variables, multiple variables (both name and value) in single request
  • ability to retrieve variables with search results in single request
  • and all other things that Elasticsearch provides :)


The actual implementation that integrates Elasticsearch with jBPM, based on the PersistenceEventManager hooks, is simple: it consists of a single class that implements the EventEmitter interface, ElasticSearchEventEmitter.

It utilises the Elasticsearch REST API, to be precise its _bulk endpoint, and pushes all events in a single HTTP call. The events cover both types of instances:
  • ProcessInstanceView
  • TaskInstanceView
All views are serialised as JSON documents. This integration uses:
  • http://localhost:9200 as the location where Elasticsearch server is
  • jbpm as the name of the index
  • processes as the type for ProcessInstanceView documents
  • tasks as the type for TaskInstanceView documents

The location of the Elasticsearch server and the name of the index are configurable via system properties:
  • org.jbpm.integration.elasticsearch.url
  • org.jbpm.integration.elasticsearch.index
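For readers unfamiliar with _bulk, a minimal sketch of the request this setup implies: the two system properties with the defaults listed above, and a newline-delimited body with one action line and one document line per instance. It is not the ElasticSearchEventEmitter source; it assumes documents are already serialised to JSON, assumes ids need no JSON escaping, and uses "_id" as an illustrative choice. The "_type" field only exists in older Elasticsearch versions.

```java
import java.util.List;

// Sketch of a _bulk request for index "jbpm" with types "processes"/"tasks".
final class BulkBody {

    static final class Doc {
        final String type; // "processes" or "tasks"
        final String id;
        final String json; // serialised ProcessInstanceView / TaskInstanceView

        Doc(String type, String id, String json) {
            this.type = type;
            this.id = id;
            this.json = json;
        }
    }

    static String endpoint() {
        String url = System.getProperty("org.jbpm.integration.elasticsearch.url", "http://localhost:9200");
        return url + "/_bulk";
    }

    static String index() {
        return System.getProperty("org.jbpm.integration.elasticsearch.index", "jbpm");
    }

    /** One action line plus one source line per document, each ending with a newline. */
    static String build(String index, List<Doc> docs) {
        StringBuilder body = new StringBuilder();
        for (Doc doc : docs) {
            body.append("{\"index\":{\"_index\":\"").append(index)
                .append("\",\"_type\":\"").append(doc.type)
                .append("\",\"_id\":\"").append(doc.id).append("\"}}\n");
            body.append(doc.json).append('\n'); // _bulk requires the trailing newline
        }
        return body.toString();
    }

    public static void main(String[] args) {
        String body = build(index(), List.of(new Doc("processes", "1", "{\"processId\":\"evaluation\"}")));
        System.out.println("POST " + endpoint());
        System.out.print(body);
    }
}
```

Batching all views of a transaction into one body is what keeps it to a single HTTP call.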

There is one more file in the project: the ServiceLoader services file that declares the emitter implementation so it can be discovered at runtime.

ElasticSearchEventEmitter delivers events asynchronously so as not to hold back the thread that executed the process, which keeps the performance impact on the process engine minimal. Moreover, thanks to the default PersistenceEventManager implementation, this emitter is only invoked when the transaction completes successfully, meaning that if a process instance is rolled back, that information won't end up in Elasticsearch.
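The combination of "asynchronous" and "only after a successful commit" can be illustrated with a small, hypothetical class; it is not the emitter's source. The engine thread only enqueues work on commit, rolled-back events are discarded, and "sender" stands in for the HTTP call to Elasticsearch.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Illustration: async delivery after commit, nothing sent after rollback.
final class AsyncAfterCommitEmitter implements AutoCloseable {

    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final Consumer<List<String>> sender; // hypothetical stand-in for the REST call

    AsyncAfterCommitEmitter(Consumer<List<String>> sender) {
        this.sender = sender;
    }

    /** Transaction committed: hand the events to a background thread and return at once. */
    void apply(List<String> events) {
        List<String> snapshot = List.copyOf(events);
        executor.execute(() -> sender.accept(snapshot));
    }

    /** Transaction rolled back: discard, so the external store never sees these changes. */
    void drop(List<String> events) {
        // intentionally nothing to do
    }

    /** Lets already queued deliveries finish before shutting down. */
    @Override
    public void close() {
        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Copying the list before handing it off means later changes on the engine side cannot affect what gets sent.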


For those who would like to try this out, first install Elasticsearch on your box (or wherever you prefer, as you can point to any server via the system property).
Next, build the elastisearch-jbpm project locally (it's not yet included in the regular jBPM builds), drop it into the KIE Server web app (inside WEB-INF/lib), and that's it!

Now when you execute any process, its data will be in Elasticsearch as well, so you can query it in very advanced ways.

In action

Let's now look at a short screencast that shows this in action. This demo still uses a rather small data set (around 12 000 process instances and 12 000 task instances) to be queried. What it shows is:
  • speed of execution
  • querying in a way that neither JPA nor jBPM advanced queries allow without additional setup
  • data retrieved directly from the query

In details:

  • first, a search for all active process instances is done in workbench; this uses data sets / advanced queries, though it is slightly slower because it also collects execution errors, which affects performance and is under investigation
  • then the same query is done over the KIE Server REST API, which uses JPA underneath
  • finally, the same query is done over Elasticsearch
  • next, it shows some more advanced queries by multiple variables, people assignment, etc.

The screencast illustrates the benefits on a small scale; more will be seen when there are several independent execution servers, so you can search across them.

The main difference is that Elasticsearch returns process instance variables directly. The same goes for user tasks, where it provides even more information: both task inputs and outputs plus people assignments, e.g. potential owners, business admins and excluded owners.

Expect more integration with other NoSQL data stores to come... so stay tuned.

NoSQL enters jBPM ... as an experiment ... so far

Quite frequently there are questions about whether jBPM can use NoSQL as a data store for a persistent setup. From the very beginning, persistence in KIE projects (drools and jBPM) was designed to be pluggable. In versions prior to 7, though, the integration was rather tight, which meant dependencies on JPA were still needed. With version 7 the persistence layer was refactored (thanks to Mariano De Maio, who did the majority of the work), enabling a much cleaner integration with persistence stores other than the default one.

That opened the door for more research on how to utilise NoSQL data stores to benefit the overall projects. With that in mind, we started to think about which options are valuable, and the initial set is as follows:

  • complete replacement of the JPA-based persistence layer with another data store (e.g. NoSQL)
  • enhancing the persistence layer with an additional data store tailored to its capabilities

Replacement of default persistence layer with NoSQL - MapDB

The first approach is rather self-explanatory: it completely replaces the entire persistence layer, freeing it from any JPA-based mechanism. This follows Mariano's work on providing a persistence mechanism based on MapDB. You can find that work here; it provides a rather complete replacement of JPA and covers:
  • drools use cases - persistence of KieSession
  • jBPM use cases - persistence of 
    • KieSession, 
    • WorkItem, 
    • ProcessInstance, 
    • Task
  • jBPM runtime manager use cases - mainly around PerProcessInstance and PerCase strategies
  • jBPM services use cases - additional implementation of RuntimeDataService and DeploymentService to take advantage of MapDB store - does not persist all audit log data so some of the methods from RuntimeDataService (like node instances or variables related) won't work
  • KIE Server use cases - an alternative implementation of jBPM KIE Server extension that uses MapDB as backend store instead of RDBMS - though it does have limited capabilities - only operations on process instances and tasks are supported, no async execution (jBPM executor)
The good thing with MapDB is that it's a transactional store so it fits nicely with jBPM infrastructure. 

It didn't prove (with basic load tests*) to be faster than the RDBMS-based store, though. Quite the opposite: it was 2-3 times slower on a single box. But that does not mean there is no value in it.

Personally, I think the biggest value of this experiment was to illustrate that a complete replacement of the persistence layer is possible (up to KIE Server), although it requires quite significant work and there might be some edge cases that could limit or change the available features.

Nevertheless it's an option in case some environments can't use RDBMS for whatever reason.

* basic load tests consist of two types of requests: 1) just start a process with a human task, 2) start a process with a human task and complete it.

Enhance persistence layer with additional data store

The alternative approach (which in my opinion brings much more value for less work) is to enhance the persistence layer with an additional data store. This means the default data store, used by the internal services, is still JPA and thus requires an RDBMS, though certain use cases can be offloaded to another data store that might be much better suited for them.

Some of the use cases we are exploring are:
  • aggregation of data from various execution servers (different dbs)
  • aggregation of business data and process data
  • analytics e.g. BAM, stream processing, etc
  • advanced search capabilities like full text search
  • replication across data centres for searchability 
  • routing across data centres that run individual process engines
  • and more... in case you have any ideas feel free to comment

This was sort of possible already in jBPM by utilising event listeners (ProcessEventListener, TaskLifeCycleEventListener), though they were slightly too fine grained and required a bit of plumbing code to deal with how the engine behaves, mainly around transactions.

To ease this work, jBPM provides a few hooks that allow easier integration and let developers focus only on the actual integration code for the external data store, instead of knowing all the details of the process engine.

The two main hook points are:
  • PersistenceEventManager - responsible for receiving information from the engine when instances (ProcessInstance, Task) are updated in any way: created, updated or deleted. Its other responsibility is to collect all those events and at some point push them to the event emitter implementation for actual delivery to the external data store.
  • EventEmitter - the interface that must be implemented to activate the PersistenceEventManager; if no emitter is found, PersistenceEventManager acts as a no-op. The event emitter has two main responsibilities:
    • provides the EventCollection implementation that decides how to deal with events that are added (new instance), updated (updated instance) or removed (deleted instance). Different EventCollection implementations can decide on individual events, e.g. if a single instance is added and removed in the same scope (transaction), the collection can drop it and deal only with still active instances.
    • integrates with the external data store, encapsulating the client API of the external system

Implementations that come out of the box


There is a default PersistenceEventManager provided that integrates with transactions, so there is no need (in most cases) to implement a new PersistenceEventManager. The default implementation collects events from a single transaction and delivers them to the emitter at:
  • beforeCompletion of the transaction, manager will invoke deliver method of the emitter - this is mainly to give a complete list of events in case emitter wants to send these events in transactional way - for example JMS transactional delivery
  • afterCompletion of the transaction, this will again deliver same list of events as on beforeCompletion and is more for emitters that can't send events in transactional way e.g. REST/HTTP call. Manager will invoke:
    • apply method of the emitter in case transaction was successfully committed
    • drop method of the emitter in case transaction was rolled back
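The delivery sequence above can be summarised as a small state machine. The types below are hypothetical and simplified (events are plain strings); jBPM's real PersistenceEventManager and EventEmitter interfaces have different names and signatures. The sketch only shows which emitter method is called at which point of the transaction.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified emitter mirroring the three calls described above.
interface Emitter {
    void deliver(List<String> events); // before completion, e.g. transactional JMS send
    void apply(List<String> events);   // after a successful commit, e.g. REST call
    void drop(List<String> events);    // after a rollback
}

// Collects events for one transaction and drives the emitter at completion.
final class TxEventManager {

    private final Emitter emitter;
    private final List<String> collected = new ArrayList<>();

    TxEventManager(Emitter emitter) {
        this.emitter = emitter;
    }

    void record(String event) {
        collected.add(event);
    }

    void beforeCompletion() {
        emitter.deliver(List.copyOf(collected));
    }

    void afterCompletion(boolean committed) {
        List<String> events = List.copyOf(collected); // same list as on beforeCompletion
        if (committed) {
            emitter.apply(events);
        } else {
            emitter.drop(events);
        }
        collected.clear(); // the scope is a single transaction
    }
}
```

An emitter typically does real work in only one of deliver or apply, depending on whether its target can take part in the transaction.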


There is also a default EventCollection implementation, BaseEventCollection, that collects all events (instances, regardless of their event type: create, update, delete) but eliminates duplicates, meaning it keeps only the last state of each instance.
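The "last state wins" behaviour can be sketched with an insertion-ordered map keyed by instance id. This is written in the spirit of BaseEventCollection, not copied from it; InstanceState is a hypothetical stand-in for an InstanceView.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: every created, updated or deleted instance is keyed by id,
// so only its last state is kept for delivery.
final class LatestStateCollection {

    /** Hypothetical stand-in for an InstanceView. */
    static final class InstanceState {
        final String id;
        final String state;

        InstanceState(String id, String state) {
            this.id = id;
            this.state = state;
        }

        @Override
        public String toString() {
            return id + "=" + state;
        }
    }

    // insertion-ordered; putting an existing id replaces the value in place
    private final Map<String, InstanceState> byId = new LinkedHashMap<>();

    void created(InstanceState s) { byId.put(s.id, s); }
    void updated(InstanceState s) { byId.put(s.id, s); }
    void removed(InstanceState s) { byId.put(s.id, s); }

    List<InstanceState> events() {
        return new ArrayList<>(byId.values());
    }
}
```

A collection that also drops instances created and removed in the same transaction, as mentioned earlier, would delete the key in removed() when the id was first seen in this scope.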


Now let's take a look at what an event is. This is maybe a bit of an overused term, but it fits well in this scenario: it is fired when things happen in the engine, and these events mainly represent instances that the process engine is managing:
  • ProcessInstance
  • Task
Currently only these two types are managed, but the hooks within the engine allow more to be plugged in, for example async jobs.

As soon as an instance is updated (created, updated, deleted), it is wrapped in an InstanceView type and delivered to the PersistenceEventManager through the dedicated method for that type of event: create, update or remove.

InstanceView has dedicated implementations that provide access to individual instance details, though every implementation always provides a link to the actual source of the view. Why is there a need for the *View types? Mainly to simplify their consumption: the InstanceView type is designed to be serialisable, for example to JSON or XML, without too much hassle.

Out of the box there are two implementations of the InstanceView:
  • ProcessInstanceView
  • TaskInstanceView
An InstanceView may decide when its data should be copied from the source, though at the latest this is triggered by the PersistenceEventManager before it calls the deliver method. So if an InstanceView implementation copies data earlier, it should mark itself as copied to avoid a double copy.
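The copy-once rule can be shown with a hypothetical view type; it is not jBPM's InstanceView API. Data may be copied early, and the copied flag turns the manager's later call into a no-op, so the view keeps the state it captured first.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical view illustrating "copy at most once, mark as copied".
final class ProcessView {

    private final Map<String, Object> source;      // stands in for the live ProcessInstance
    private final Map<String, Object> variables = new HashMap<>();
    private boolean copied;
    private int copyCount;                          // only here to make the rule observable

    ProcessView(Map<String, Object> source) {
        this.source = source;
    }

    /** Safe to call more than once; copies from the source only the first time. */
    void copyFromSource() {
        if (copied) {
            return;
        }
        variables.putAll(source);
        copied = true;
        copyCount++;
    }

    Map<String, Object> variables() { return variables; }

    int copyCount() { return copyCount; }
}
```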

That concludes the introduction into how jBPM looks into support for NoSQL. Following article will show some of the implementations of the second approach to empower jBPM with additional capabilities.

I'd like to encourage everyone to share their opinion on how NoSQL could provide value for jBPM, or which use cases you see as a good fit for NoSQL that jBPM should therefore support better.


Maven plugins for KIE Server

Since version 7 of jBPM, KIE Server is the only execution server available by default, so it's getting more and more traction. With that in mind, there is a need to align it more with CI/CD pipelines to allow simple integration with runtime environments.

To help with that, two Maven plugins were built:
  • KIE Server Deploy Maven Plugin
  • KIE Server Controller Deploy Maven Plugin

The main purpose of these plugins is to enable simple deployment (and not only deployment) of kjars into KIE Servers.
The first one is dedicated to unmanaged KIE Servers, as it interacts directly with the KIE Server REST API, while the second one targets managed KIE Servers, as it interacts with the KIE Controller (either the one in workbench/business central or a standalone controller).

These Maven plugins can be used to deploy a kjar to an execution server directly from within a build pipeline.

Both plugins have comprehensive documentation (see links above) but just for completeness I'd like to list their capabilities in this article:

KIE Server Deploy Maven Plugin

  • deploy - deploy kjar to runtime environment
  • dispose - dispose running kjar (kie container) in runtime environment
  • update - update version of running kjar (kie container) in runtime environment

KIE Server Controller Deploy Maven Plugin

  • get-template - retrieves existing server templates from controller
  • create-template - creates new server templates with set of containers 
  • delete-template - removes server template
  • get-containers - retrieves containers in given server template
  • get-container - retrieves given container from server template
  • create-container - create new container in given server template
  • delete-container - delete container from given server template
  • start-container - starts container in given server template
  • stop-container - stops container in given server template
  • deploy-container - creates and starts container in given server template
  • dispose-container - stops and removes container from given server template 

Contribution - a win-win situation!

And now the most important part: these Maven plugins were contributed to the KIE projects by Fabio Massimo. I'd like to thank Fabio for his outstanding work and this excellent addition to the projects.

This clearly shows how valuable contributions are! With that, I'd like to encourage others to follow Fabio's lead and share with other community members the great stuff you have done or plan to do!


Managed KIE Server gets ready for the cloud

As described in this article, KIE Server can run in two modes:

  • managed, with a controller that is responsible for providing the kie containers to be deployed
  • unmanaged, a self-contained server that allows kie containers to be deployed manually
In this article, I'd like to focus on managed mode and show some improvements in that area that make managed KIE Server ready for the cloud.


With the default configuration of a managed KIE Server, both the controller and the KIE Server need to know how to communicate with each other. By default this is REST-based communication and thus requires credentials to be provided when sending requests:
  • user and password for BASIC authentication
  • token for BEARER authentication
These should be given as system properties on each side:

  • org.kie.server.user and org.kie.server.password are set on the controller JVM to instruct it which credentials to use when connecting to KIE Server(s)
  • org.kie.server.controller.user and org.kie.server.controller.password are set on the KIE Server JVM to instruct it which credentials to use when connecting to the controller
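On the KIE Server side, these properties end up as an HTTP Authorization header. A minimal sketch using the property names above: BASIC is the standard Base64 encoding of "user:password", and passing the bearer token as a method argument is an assumption of this sketch, since the post only says a token can be used.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch: Authorization header a KIE Server would send to the controller.
final class ControllerAuth {

    private ControllerAuth() {}

    static String authorizationHeader(String bearerToken) {
        if (bearerToken != null && !bearerToken.isEmpty()) {
            return "Bearer " + bearerToken;
        }
        String user = System.getProperty("org.kie.server.controller.user");
        String password = System.getProperty("org.kie.server.controller.password");
        if (user == null || password == null) {
            throw new IllegalStateException("controller credentials are not configured");
        }
        byte[] credentials = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(credentials);
    }
}
```

The controller side is the mirror image with org.kie.server.user and org.kie.server.password, which is exactly why a single global pair is used for every KIE Server.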

This configuration fits nicely in unrestricted environments where the controller and KIE Server(s) can talk to each other without any limitations. It does, however, require that the controller uses the same user name and password for every KIE Server, since they are set globally via system properties and used whenever the controller talks to any KIE Server instance.

This setup can become problematic if there are any restrictions between the two. In some cases the controller might be hidden behind a firewall, which prevents it from reaching the KIE Server(s) when needed. Similarly, it becomes an issue in an OpenShift environment where the controller and KIE Server(s) are in different namespaces, as they won't see each other's internal IPs.

Here we touch upon another aspect of managed KIE Servers: their location. A KIE Server running in managed mode requires the following configuration parameters (given as system properties on the JVM that runs the KIE Server):
  • org.kie.server.id - an id that points to a server template id defined in the controller
  • org.kie.server.controller - a URL of the controller to connect to upon start
  • org.kie.server.location - a URL of this instance of the KIE Server where it will be accessible over HTTP/REST
The location of the KIE Server is expected to be unique, since this is the URL where the actual instance is accessible. This becomes an issue when running KIE Servers behind a load balancer or in cloud-based environments.

It puts us in a situation where we either give the load balancer URL and thereby lose the ability to receive updates from the controller (as only one of the instances will get updates, based on the load balancer's selection), or we bypass the load balancer and thereby lose its capabilities for runtime operations. Keep in mind that the location the KIE Server provides when connecting to the controller is then used by the (so-called) runtime views in workbench: process instances, tasks, etc.

In an OpenShift environment it is pretty much the same issue: either the public IP is provided, which completely hides the individual PODs, or the internal IP of the POD. This has the same consequences as a load balancer, with one addition: the internal IP won't work at all across namespaces.

Websockets to the rescue...

To resolve all the issues mentioned above, an alternative (and soon to be default) way of communication between KIE Server and Controller was introduced. It is based on Websockets, which are now available in pretty much any JEE container (including servlet containers), and it solves pretty much all the issues identified both on premise and in the cloud.

As illustrated on the diagram above, the KIE Server is the one that initiates the communication and keeps it active as long as it is alive. That removes any need for the KIE Controller to know how to communicate with (and thereby connect to) KIE Server instances. So there is no longer any need to configure a user name or password on the controller JVM to talk to KIE Servers; it simply reuses the open channel to the connected KIE Servers.

The KIE Server is solely responsible for the connection. That means it needs to know where the controller is, how to authenticate when opening the connection, and how to handle a lost connection (e.g. when the controller goes down).

The first two are configured exactly as before, as system properties on the JVM that runs the KIE Server:
  • using either BASIC or BEARER authentication
  • org.kie.server.controller - a URL of the controller to connect to upon start
Lost connections are handled by a retry mechanism: as soon as the KIE Server gets notified that the connection is closed, it starts a background thread that attempts to connect to the controller every 10 seconds. Once it has reconnected, that thread is terminated. It will reconnect only if the KIE Server itself is not the one that closed the connection.
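The retry rule can be illustrated as follows; this is not KIE Server's implementation. After an unexpected close, a single background loop tries to reconnect every period (10 seconds by default, per the description) until one attempt succeeds, and nothing is retried once the server closed the channel itself. The "connect" supplier is hypothetical, and the period is a parameter only so the behaviour can be tested quickly.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Illustration of the reconnect rule for the websocket channel.
final class ReconnectPolicy {

    static final long DEFAULT_PERIOD_MILLIS = 10_000; // every 10 seconds

    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final AtomicBoolean retrying = new AtomicBoolean();
    private final BooleanSupplier connect; // hypothetical: true once the channel is open again
    private final long periodMillis;
    private volatile boolean closedByServer;

    ReconnectPolicy(BooleanSupplier connect, long periodMillis) {
        this.connect = connect;
        this.periodMillis = periodMillis;
    }

    /** Notification that the connection to the controller was closed. */
    void onConnectionClosed() {
        if (!closedByServer && retrying.compareAndSet(false, true)) {
            scheduleAttempt(); // start exactly one background retry loop
        }
    }

    /** The KIE Server closes the channel itself (e.g. on shutdown): never reconnect. */
    void shutdown() {
        closedByServer = true;
        scheduler.shutdownNow();
    }

    private void scheduleAttempt() {
        scheduler.schedule(this::attempt, periodMillis, TimeUnit.MILLISECONDS);
    }

    private void attempt() {
        if (closedByServer) {
            return;
        }
        if (connect.getAsBoolean()) {
            retrying.set(false); // reconnected: the retry loop ends
        } else {
            scheduleAttempt();   // still unreachable: try again after the period
        }
    }
}
```

Rescheduling one-shot tasks, rather than using a fixed-rate schedule, makes stopping the loop after a successful reconnect trivial.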

Since the connection between KIE Servers and the KIE Controller is kept open, the location given when a KIE Server connects no longer has to be unique. That solves the issue of running behind a load balancer or in OpenShift with different namespaces. The system property that provides the location (org.kie.server.location) should now be set to the load balancer URL or the public IP in OpenShift.

NOTE: If you don't run behind a load balancer in an on-premise setup (not OpenShift), keep the location of each KIE Server unique regardless of whether websockets are used. A similar rule applies when a load balancer or public IP is used: it should front a single server template only.

No extra configuration is needed to enable websocket-based communication; it is selected based only on the actual URL given as the controller URL in the org.kie.server.controller system property.


Depending on where your controller is, you might need to change:
  • localhost - to the actual host/IP of the server where the controller is deployed
  • 8080 - to the actual port number of the server where the controller is deployed
  • kie-wb - to the actual context path of the controller web app
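Since the protocol follows the controller URL, selecting a transport amounts to looking at the URL scheme. The sketch below assumes ws/wss means Websocket and http/https means REST; the example URL's "/websocket/controller" path is an assumption to check against the documentation for your installation, while localhost, 8080 and kie-wb are the parts the list above says you may need to change.

```java
import java.net.URI;
import java.util.Locale;

// Sketch: choosing the controller channel from the org.kie.server.controller URL.
final class ControllerTransport {

    enum Kind { WEBSOCKET, REST }

    private ControllerTransport() {}

    static Kind fromUrl(String controllerUrl) {
        String scheme = URI.create(controllerUrl).getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("No scheme in " + controllerUrl);
        }
        switch (scheme.toLowerCase(Locale.ROOT)) {
            case "ws":
            case "wss":
                return Kind.WEBSOCKET;
            case "http":
            case "https":
                return Kind.REST;
            default:
                throw new IllegalArgumentException("Unsupported scheme: " + scheme);
        }
    }

    public static void main(String[] args) {
        // path after the context root is an assumption; adjust host, port and context path
        System.out.println(fromUrl("ws://localhost:8080/kie-wb/websocket/controller"));
    }
}
```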

Both protocols, HTTP/REST and Websocket, are active by default and either of them can be used. One rule must be kept, though: use a single protocol for all KIE Servers of a given server template.
It is recommended to keep a single protocol across all KIE Servers connected to a single controller.

Workbench, which provides the UI for process-related operations (Process Instance, Process Definitions, Tasks perspectives), will use the websocket channel only for administration operations, that is:
  • controller-based operations to manage KIE Servers
  • data set query registration required by runtime views
All other operations, like getting user tasks or getting process definitions or instances, will use regular REST-based communication, as they call endpoints on behalf of the logged-in user to enforce security.

With this enhancement, managed KIE Server is a much nicer option for running in the cloud and behind a load balancer than ever before :)

Stay tuned for more to come!