2017/06/30

Execution error - how to deal with unexpected in jBPM 7.1

jBPM technical error handling is based on transactionality and going back to the last (stable) state. That means any error (of any kind) that is not handled by the process results in a rollback of the entire transaction, leaving the process instance in the previous wait state. Any trace of this is only visible in the logs and is usually displayed to the caller (who sent the request to the process engine).

In some cases that might not be enough, so additional error handling is required to provide:
  • Better traceability
  • Visibility in case of critical processes
  • Reporting and analytics - based on error situations 
  • External system error handling and compensation

Overview

Configurable error handling is introduced in version 7.1. It is responsible for catching any technical errors thrown during process engine execution (including the task service). A technical error is anything that:
  • Extends java.lang.Throwable
  • Was not handled before - for example by process level error handling
Several components make up the error handling mechanism and allow a pluggable approach to extending its capabilities.

The entry point from the process engine point of view is ExecutionErrorManager, which is integrated with RuntimeManager; RuntimeManager is then responsible for providing it to the underlying components - KieSession and TaskService. From the API point of view, ExecutionErrorManager gives access to:

  • ExecutionErrorHandler - the heart of the error handling mechanism
  • ExecutionErrorStorage - pluggable storage for execution error information
ExecutionErrorHandler is bound to the life cycle of RuntimeEngine, meaning it is created when a new runtime engine is created and destroyed when the RuntimeEngine is disposed. A single instance of ExecutionErrorHandler is used within a given execution context (transaction). Both KieSession and TaskService use that instance to inform the error handling about processed nodes/tasks. ExecutionErrorHandler can be informed about:
  • Starting processing of a given node instance
  • Completion of processing of a given node instance
  • Starting processing of a given task instance
  • Completion of processing of a given task instance

Such information is mainly used for errors of unknown type - in other words, errors that do not provide information about the process context. For example, a data base exception at commit time carries no process information, which on its own would make the error information really poor and pretty much useless.
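To make the idea concrete, here is a minimal sketch of how tracking "processing started / processing completed" notifications lets a handler attach context to an exception that carries none. TrackingErrorHandler and its method names are hypothetical stand-ins written for this illustration; they are not the jBPM ExecutionErrorHandler API.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified stand-in for the idea behind ExecutionErrorHandler.
// It is NOT the jBPM interface; names and signatures are assumptions.
class TrackingErrorHandler {

    // Nodes/tasks that have started but not yet completed in this execution context.
    private final Deque<String> inProgress = new ArrayDeque<>();

    // Called when processing of a node or task starts.
    void processing(String nodeOrTask) {
        inProgress.push(nodeOrTask);
    }

    // Called when processing of a node or task completes.
    void processed(String nodeOrTask) {
        inProgress.remove(nodeOrTask);
    }

    // Describes an error, using the most recently started unfinished node/task
    // as context when the error itself says nothing about the process.
    String describe(Throwable error) {
        String where = inProgress.isEmpty() ? "unknown location" : inProgress.peek();
        return error.getClass().getSimpleName() + " while processing " + where;
    }
}
```

With this in place, an exception raised at commit time can still be reported together with the last node or task that was being processed.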

ExecutionErrorStorage is a pluggable strategy that allows various ways of persisting information about execution errors. The store is used directly by the handler, which gets an instance of the store upon creation (at the time RuntimeEngine is created). The default store implementation is based on a data base table. Every error is stored in that table with all the information available for it. Not all errors have all the details, as these depend on the type of error and on whether the information can be extracted from it.


Error types and filters

Since error handling attempts to catch and handle any kind of error, it needs a way to categorize errors so that it can properly extract information from them. It also needs to be pluggable, as users might throw their own special types of errors and want them handled differently than what is provided out of the box.
Error categorization and filtering is based on so-called ExecutionErrorFilters. ExecutionErrorFilter is a simple interface that is solely responsible for building the instance of ExecutionError that is later stored via ExecutionErrorStorage. It has the following methods:
  • accept to indicate if given error can be handled by the filter
  • filter where the actual filtering/handling etc happens
  • getPriority indicates the priority which is used when calling filters
Filters provide a priority because only one filter can process a given error - this is mainly to avoid multiple filters returning alternative “views” of the same error. Priority lets more specialized filters check first whether they can accept the error and, if so, deal with it; otherwise the error is left to be handled by another filter.

ExecutionErrorFilter implementations can be provided using the ServiceLoader mechanism, which is simple and proven, so extending the error handling capability is straightforward.

Out of the box ExecutionErrorFilters:

Class name                                                               Type     Priority
org.jbpm.runtime.manager.impl.error.filters.ProcessExecutionErrorFilter  Process  100
org.jbpm.runtime.manager.impl.error.filters.TaskExecutionErrorFilter     Task     80
org.jbpm.runtime.manager.impl.error.filters.DBExecutionErrorFilter       DB       200
org.jbpm.executor.impl.error.JobExecutionErrorFilter                     Job      100

The lower the priority value, the earlier the filter is invoked. For the table above, filters are invoked in the following order:
  • Task
  • Process
  • Job
  • DB
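The "lowest priority value first, first accepting filter wins" behaviour can be sketched in plain Java as below. SketchErrorFilter only mirrors the three methods named above and returns a type label instead of an ExecutionError; it is a hypothetical simplification, not the jBPM interface, and the exception classes each filter accepts are chosen arbitrarily for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical mirror of the accept/filter/getPriority contract described above.
interface SketchErrorFilter {
    boolean accept(Throwable error);
    String filter(Throwable error); // the real filter builds an ExecutionError; here just a type label
    int getPriority();
}

class SketchFilterChain {

    private final List<SketchErrorFilter> filters = new ArrayList<>();

    SketchFilterChain(List<SketchErrorFilter> filters) {
        this.filters.addAll(filters);
        // Lower priority value means the filter is consulted earlier.
        this.filters.sort(Comparator.comparingInt(SketchErrorFilter::getPriority));
    }

    // Only the first filter that accepts the error gets to process it.
    Optional<String> handle(Throwable error) {
        for (SketchErrorFilter f : filters) {
            if (f.accept(error)) {
                return Optional.of(f.filter(error));
            }
        }
        return Optional.empty();
    }

    // Convenience factory for the example: accepts errors of a given class.
    static SketchErrorFilter of(String type, int priority, Class<? extends Throwable> handled) {
        return new SketchErrorFilter() {
            public boolean accept(Throwable e) { return handled.isInstance(e); }
            public String filter(Throwable e) { return type; }
            public int getPriority() { return priority; }
        };
    }
}
```

If a "Task" filter with priority 80 and a "DB" filter with priority 200 both accept the same exception, only the "Task" filter processes it, which is exactly why more specialized filters should get lower priority values.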

Error acknowledgment

By definition every error that is caught and stored is unacknowledged, meaning it is still to be handled by someone or something (in case of automatic error recovery). This is the basis for filtering existing errors by whether they have already been taken care of or not. Acknowledging an error stores the user who acknowledged it and a time stamp, for traceability purposes.

Since the ExecutionErrorFilter is responsible for creating the ExecutionError instance, different implementations might decide that the acknowledgement is set to true immediately when the error is handled - maybe because there is a notification sent to some issue tracking system or an email to administrator. Again, that is up to concrete implementation of the filters or even storage.

Auto acknowledgement of execution errors

By default, execution errors are created unacknowledged and thus require a manual action; otherwise they will always be seen as information that requires attention. With bigger volumes, manual actions can be time consuming and are not suitable in some situations. To help with that, auto acknowledgement of errors has been provided. It is based on scheduled jobs (via jbpm executor) and there are three types of jobs available:
  • org.jbpm.executor.commands.error.JobAutoAckErrorCommand
    • Job responsible for finding jobs that previously failed but are now either cancelled, completed or rescheduled for another execution. This job will only acknowledge execution errors of type “Job”
  • org.jbpm.executor.commands.error.TaskAutoAckErrorCommand
    • Job responsible for auto acknowledgment of user task execution errors for tasks that previously failed but are now in one of the exit states (completed, failed, exited, obsolete). This job will only acknowledge execution errors of type “Task”
  • org.jbpm.executor.commands.error.ProcessAutoAckErrorCommand
    • Job responsible for auto acknowledgment of errors attached to process instances. It will acknowledge errors when the process instance is already finished (completed or aborted) or the task that the error originated from is already finished - based on the init_activity_id value. This job will acknowledge any type of error that matches the above criteria.
All three jobs can be registered on KIE Server to automatically acknowledge errors. They are recurring jobs, meaning that unless explicitly marked as SingleRun they will run once a day by default. They can be configured to run at any time interval by providing NextRun as a time expression, e.g. 2h, 5d etc.

The last parameter that these jobs support is EmfName, which provides a custom name of the entity manager factory that should be used when searching for jobs to acknowledge. All of these parameters are optional.
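To illustrate what a time expression such as 2h or 5d stands for, here is a deliberately small parser. It is only a sketch under the assumption that an expression is a single number followed by one unit letter (d, h, m or s); the parser jBPM actually uses is not shown in this post and may accept more forms. SketchTimeExpression is a hypothetical name.

```java
import java.time.Duration;

// Illustrative only: assumes "<number><unit>" with unit d, h, m or s.
// This is not the jBPM implementation.
class SketchTimeExpression {

    static Duration parse(String expression) {
        String s = expression.trim();
        if (s.length() < 2) {
            throw new IllegalArgumentException("Not a time expression: " + expression);
        }
        // Everything except the last character is the amount, the last character is the unit.
        long amount = Long.parseLong(s.substring(0, s.length() - 1));
        char unit = s.charAt(s.length() - 1);
        switch (unit) {
            case 'd': return Duration.ofDays(amount);
            case 'h': return Duration.ofHours(amount);
            case 'm': return Duration.ofMinutes(amount);
            case 's': return Duration.ofSeconds(amount);
            default:
                throw new IllegalArgumentException("Unsupported unit in: " + expression);
        }
    }
}
```

So a NextRun of 2h asks for the next execution two hours later, and 5d five days later.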

There is a base class that is extended by the individual jobs and can be seen as the starting point for additional implementations of auto acknowledge options:
org.jbpm.executor.commands.error.AutoAckErrorCommand

Once extended, there are two methods to be implemented:
  • protected abstract List<ExecutionErrorInfo> findErrorsToAck(EntityManager em);
  • protected abstract String getAckRule();
The first is the most important, as it abstracts the way individual jobs find the errors to be acknowledged. The second provides the rule based on which the errors were found. It is only for logging purposes, to indicate what led to the auto acknowledgement.
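The division of work between the base class and its subclasses follows the template method pattern, which can be sketched without any jBPM dependency as below. Everything here is a hypothetical simplification: SketchError stands in for ExecutionErrorInfo, a plain list stands in for the EntityManager, and the "SYSTEM" user is an assumption made for the example.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for a stored execution error; field names are illustrative.
class SketchError {
    final String id;
    final String type;
    final boolean sourceFinished; // e.g. the task reached an exit state
    boolean acknowledged;
    String acknowledgedBy;
    Instant acknowledgedAt;

    SketchError(String id, String type, boolean sourceFinished) {
        this.id = id;
        this.type = type;
        this.sourceFinished = sourceFinished;
    }
}

// Base class: owns the acknowledge loop, subclasses decide what to acknowledge.
abstract class SketchAutoAck {

    protected abstract List<SketchError> findErrorsToAck(List<SketchError> store);

    protected abstract String getAckRule();

    int execute(List<SketchError> store) {
        List<SketchError> found = findErrorsToAck(store);
        for (SketchError error : found) {
            error.acknowledged = true;
            error.acknowledgedBy = "SYSTEM"; // assumed user name for automatic acknowledgement
            error.acknowledgedAt = Instant.now();
            // The rule is used for logging only.
            System.out.println("Acknowledged " + error.id + " because: " + getAckRule());
        }
        return found.size();
    }
}

// Subclass: only unacknowledged "Task" errors whose task is already finished.
class SketchTaskAutoAck extends SketchAutoAck {

    @Override
    protected List<SketchError> findErrorsToAck(List<SketchError> store) {
        List<SketchError> result = new ArrayList<>();
        for (SketchError error : store) {
            if (!error.acknowledged && "Task".equals(error.type) && error.sourceFinished) {
                result.add(error);
            }
        }
        return result;
    }

    @Override
    protected String getAckRule() {
        return "task is in an exit state";
    }
}
```

A custom auto acknowledge job would follow the same shape: put the selection logic in findErrorsToAck and a short human readable reason in getAckRule.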

Services and access to error information

Access to error information (for the out of the box storage) is through jbpm services. The two admin facing services provide basic access to the error information and allow errors to be acknowledged:

  • ProcessInstanceAdminService
    • allows finding execution errors of any type, mainly focusing on search capabilities around process instances
  • UserTaskAdminService
    • allows finding Task type errors and focuses on searches around task details like name or id
Since the ways of looking for errors are pretty much unlimited, the above services provide basic access only. For more advanced/tailored searches, advanced queries should be used. There is an out of the box query mapper available to directly produce ExecutionError instances out of the data set.

Similar access and capabilities are exposed over the KIE Server remote API and its client library.

Clean up mechanism

To keep the ExecutionErrorInfo table in good health, it needs to be cleaned up from time to time. Since errors can stay there for quite some time, depending on the life cycle of the processes, there is no direct API to clean it up. Instead, there is a jBPM executor command that can be scheduled for recurring execution to periodically clean up errors. The clean up command supports several options:
  • DateFormat 
    • date format for further date related params - if not given yyyy-MM-dd is used (pattern of SimpleDateFormat class)
  • EmfName 
    • name of entity manager factory to be used for queries (valid persistence unit name)
  • SingleRun 
    • indicates if execution should be single run only (true|false)
  • NextRun 
    • provides next execution time (valid time expression e.g. 1d, 5h, etc)
  • OlderThan 
    • indicates what errors should be deleted - older than given date
  • OlderThanPeriod 
    • indicates what errors should be deleted - older than given time expression (valid time expression e.g. 1d, 5h, etc)
  • ForProcess 
    • indicates errors to be deleted only for given process definition
  • ForProcessInstance 
    • indicates errors to be deleted only for given process instance
  • ForDeployment 
    • indicates errors to be deleted that are from given deployment id
An important note: the command always (regardless of the parameters given) restricts deletion to errors of already completed/aborted process instances. If anything else is needed, the command should be extended or a custom command provided.
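The interplay of OlderThan, DateFormat and the "finished process instances only" restriction can be sketched as a simple selection over in-memory rows. SketchCleanup and Row are hypothetical names for this illustration; the real command works against the data base and supports the other parameters listed above as well.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

// Illustrative selection logic only; not the jBPM clean up command.
class SketchCleanup {

    // Stand-in for a stored error plus the state of its process instance.
    static class Row {
        final String id;
        final Date errorDate;
        final boolean instanceFinished; // completed or aborted

        Row(String id, Date errorDate, boolean instanceFinished) {
            this.id = id;
            this.errorDate = errorDate;
            this.instanceFinished = instanceFinished;
        }
    }

    // Parses a date using the given pattern, or yyyy-MM-dd when none is given.
    static Date parse(String value, String dateFormat) {
        String pattern = (dateFormat == null) ? "yyyy-MM-dd" : dateFormat;
        try {
            return new SimpleDateFormat(pattern).parse(value);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse '" + value + "' with pattern " + pattern, e);
        }
    }

    // Returns ids of errors older than the cut-off whose process instance is finished.
    static List<String> select(List<Row> rows, String olderThan, String dateFormat) {
        Date cutoff = parse(olderThan, dateFormat);
        List<String> ids = new ArrayList<>();
        for (Row row : rows) {
            if (row.instanceFinished && row.errorDate.before(cutoff)) {
                ids.add(row.id);
            }
        }
        return ids;
    }
}
```

Note how an old error that belongs to a still active process instance is kept, no matter what the date parameters say.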

Time to see this in action

The screen cast below shows this error handling in action. Moreover, it shows the excellent UI support for it, for which I would like to give credit to the team that worked on it - Cristiano, Neus and Rafael.

In the screen cast you'll see a simple process that, based on a variable, either continues as expected or throws an exception. This exception is then handled as an execution error and is available to users/administrators to deal with. In addition, it illustrates the use of auto acknowledge jobs to acknowledge the errors based on various conditions. Please be patient as there are some waiting times in the screen cast while waiting for a job to execute :)

Enjoy and stay tuned for more!!!

2017/06/26

KIE Server welcomes Narayana

KIE Server (with BPM capabilities) requires a data base for persistence. That is a well known fact, though to have properly managed persistence there is also a need for a transaction manager that ensures consistency of the data jBPM persists.

Since version 7, KIE Server is the only execution server provided out of the box (there is no execution server in workbench), so it got some additional attention to make sure it performs in the best possible way.

KIE Server supports following runtime environments:

  • WildFly 10.x
  • EAP 7.x
  • WebSphere 9
  • WebLogic 12.3
  • Tomcat 8.x

Since all of the above are supported for jBPM usage they all must provide transaction manager capability. For JEE servers (WildFly, EAP, WebSphere, WebLogic) KIE server relies on what the application server provides. Though for Tomcat the story is slightly different...

Tomcat does not have transaction manager capabilities, so to use jBPM/KIE Server on it, an external transaction manager has to be configured. Until now it was recommended to use bitronix, as the jBPM test suite was running on it and it provides integration with Tomcat (plus it covers db connection pooling and a JNDI provider for data source look ups). But this has now changed ...

Starting from jBPM 7.1 KIE Server on Tomcat runs with Narayana, the state of the art transaction manager that nicely integrates with Tomcat and makes the configuration much easier than what was needed with bitronix - and is more native to Tomcat users.

Before I jump into details on how to configure it on Tomcat, I'd like to take the opportunity and give special thanks to:

Tom Jenkinson and Gytis Trikleris

for their tremendous help and excellent support while working on this change.

Installation notes - with BPM capabilities

Let's see what is actually needed to configure KIE Server on Tomcat with Narayana:
  • (1) Copy the following libraries into TOMCAT_HOME/lib
    • javax.security.jacc:javax.security.jacc-api
    • org.kie:kie-tomcat-integration
    • org.slf4j:slf4j-api
    • org.slf4j:slf4j-jdk14
  • (2) Configure users and roles in tomcat-users.xml (or different user repository if applicable)
  • (3) Configure JACC Valve for security integration: edit TOMCAT_HOME/conf/server.xml and add the following in the Host section after the last Valve declaration
         <Valve className="org.kie.integration.tomcat.JACCValve" />
  • (4) Create setenv.sh|bat in TOMCAT_HOME/bin with following content
    CATALINA_OPTS="
    -Djbpm.tsr.jndi.lookup=java:comp/env/TransactionSynchronizationRegistry 
    -Dorg.kie.server.persistence.ds=java:comp/env/jdbc/jbpm 
    -Djbpm.tm.jndi.lookup=java:comp/env/TransactionManager 
    -Dorg.kie.server.persistence.tm=JBossTS 
    -Dhibernate.connection.release_mode=after_transaction 
    -Dorg.kie.server.id=tomcat-kieserver 
    -Dorg.kie.server.location=http://localhost:8080/kie-server/services/rest/server 
    -Dorg.kie.server.controller=http://localhost:8080/kie-wb/rest/controller
    "
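       The first five items (jbpm.tsr.jndi.lookup, org.kie.server.persistence.ds, jbpm.tm.jndi.lookup, org.kie.server.persistence.tm and hibernate.connection.release_mode) are related to persistence and transaction.
       The last three items (org.kie.server.id, org.kie.server.location and org.kie.server.controller) are general KIE Server parameters needed when running in managed mode.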
  • (5) Copy JDBC driver jar into TOMCAT_HOME/lib depending on the data base of your choice
  • (6) Configure data source for jBPM extension of KIE Server 
           Edit TOMCAT_HOME/conf/context.xml and add following within Context tags of the file:
     <Resource 
           name="sharedDataSource" 
           auth="Container" 
           type="org.h2.jdbcx.JdbcDataSource" 
           user="sa" 
           password="sa"
           url="jdbc:h2:mem:testdb;DB_CLOSE_DELAY=-1;MVCC=TRUE" 
           description="H2 Data Source" 
           loginTimeout="0" 
           testOnBorrow="false"
           factory="org.h2.jdbcx.JdbcDataSourceFactory"/>
           This is only an example that uses H2 as the data base; for other data bases look at
           Tomcat's configuration docs.

           One important note: please keep the name of the data source as sharedDataSource

  • (7) Last but not least is to configure XA recovery 
  • Create an XA recovery file (xa-recovery-properties.xml, the name referenced in CATALINA_OPTS below) next to context.xml, with the data base configuration as follows: 
    <?xml version="1.0" encoding="UTF-8"?> 
    <!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd"> 
    <properties> 
      <entry key="DB_1_DatabaseUser">sa</entry> 
      <entry key="DB_1_DatabasePassword">sa</entry> 
      <entry key="DB_1_DatabaseDynamicClass"></entry> 
      <entry key="DB_1_DatabaseURL">java:comp/env/h2DataSource</entry> 
    </properties> 

    Append the following to CATALINA_OPTS in the setenv.sh|bat file: 
    -Dcom.arjuna.ats.jta.recovery.XAResourceRecovery1= \
    com.arjuna.ats.internal.jdbc.recovery.BasicXARecovery\;
    abs://$CATALINA_HOME/conf/xa-recovery-properties.xml\ \;1
    BasicXARecovery supports following parameters: 
    • path to the properties file 
    • the number of connections defined in the properties file


Installation notes - without BPM capabilities

In case you want to use KIE Server without BPM capabilities - for instance for Rules or Planning - you can completely skip the steps after step 4 and, in step 4, use only the general KIE Server parameters (org.kie.server.id, org.kie.server.location and org.kie.server.controller), and still run KIE Server on Tomcat.

With that, I'd like to say welcome to Narayana in KIE Server - well done!