Thursday, 20 August 2015

Shift gears with jBPM executor

Since version 6.0, jBPM has shipped with a component called the jBPM executor, which is responsible for carrying out background (asynchronous) tasks. Users started relying on it more with the 6.2 release, and even more so with the upcoming 6.3, where a number of enhancements are based on that component:

  • async continuation 
  • async throw signals
  • async start process instance
By default, the jBPM executor uses a polling mechanism backed by a database that stores the jobs to be executed. There are a couple of reasons to use that mechanism:
  • it is supported on any runtime environment (application server, servlet container, standalone)
  • it decouples requesting a job from executing it
  • it allows a configurable retry mechanism for failed jobs
  • it provides a search API to look through available jobs
  • it allows jobs to be scheduled instead of being executed immediately
The following diagram illustrates the sequence of events in the default (polling-based) mechanism of the jBPM executor (credit for creating this diagram goes to Chris Shumaker).
The executor runs in a kind of event loop: one or more threads constantly poll the database (at a defined interval) to see whether there are any jobs to execute. If there is one, the thread picks it up and delegates it for execution. How the delegation works depends on the runtime environment:
  • an environment that supports EJB delegates to an EJB asynchronous method for execution
  • an environment that does not support EJB executes the job in the same thread that polls the db
This in turn drives the configuration options, which look roughly like this:
  • in an EJB environment, a single thread is usually enough, because it only triggers the poll rather than performing it; keep the number of threads to a minimum and use the interval to fine-tune how quickly async jobs are processed
  • in a non-EJB environment, the number of threads should be increased to improve processing power, as each thread actually does the work
In both cases users must take the actual execution needs into account, because more threads and more frequent polls put a higher load on the underlying database (regardless of whether there are jobs to execute). Keep that in mind when fine-tuning the executor settings.
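To make the loop described above concrete, here is a minimal sketch in plain Java. It is not the jBPM implementation: the class name `PollingExecutorSketch`, its methods and the in-memory map standing in for the database table are all made up for illustration. It shows the non-EJB case, where each polling thread claims a job and runs it itself.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative polling loop; the map is a hypothetical stand-in for the executor's job table. */
final class PollingExecutorSketch {

    enum Status { QUEUED, RUNNING, DONE }

    private final Map<Long, Status> jobs = new ConcurrentHashMap<>();

    /** Counts every poll, including the ones that find nothing to do. */
    final AtomicInteger polls = new AtomicInteger();

    void submit(long jobId) {
        jobs.put(jobId, Status.QUEUED);
    }

    Status statusOf(long jobId) {
        return jobs.get(jobId);
    }

    /** One poll: claim at most one queued job and run it in the calling (polling) thread. */
    boolean pollOnce() {
        polls.incrementAndGet();
        for (Long jobId : jobs.keySet()) {
            // replace(key, old, new) is atomic, so only one polling thread can claim a given job
            if (jobs.replace(jobId, Status.QUEUED, Status.RUNNING)) {
                // the actual work of the job would happen here
                jobs.put(jobId, Status.DONE);
                return true;
            }
        }
        return false; // nothing to execute, but the store was still queried
    }

    /** Starts the given number of polling threads, each polling at the given interval. */
    ScheduledExecutorService start(int threads, long interval, TimeUnit unit) {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.scheduleAtFixedRate(this::pollOnce, 0, interval, unit);
        }
        return pool;
    }
}
```

Note that `polls` grows on every call to `pollOnce()`, whether or not a job was found. That is the cost the paragraph above warns about: more threads and shorter intervals mean more queries against the store even when it is empty.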

While this fits a certain set of use cases, it does not scale well for systems that require high throughput in a distributed environment. A huge number of jobs that must be executed as soon as possible requires a more robust solution that can cope with the load in reasonable time without putting too heavy a load on the underlying database.
This led us to an enhancement that allows much faster execution (immediate, compared to polling) while still providing the same capabilities as polling:
  • jobs are searchable 
  • jobs can be retried
  • jobs can be scheduled
The solution chosen for this is based on a JMS destination that receives triggers to perform the operations. That eliminates the need to poll for available jobs, as the JMS provider invokes the executor to process the job. Even better, the JMS message carries only the job request id, so the executor fetches the job from the db by id - the most efficient retrieval method, instead of running a query by date.
JMS provides clustering support and allows fine-tuning of JMS receiver sessions to improve concurrency, all in the standard Java EE way.
The executor detects JMS support and uses it if available (which is the case on all supported application servers); otherwise it falls back to the default polling mechanism.
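The idea of a trigger that carries only the job id can be sketched as follows. This is an illustration, not jBPM code: `JmsTriggerSketch` and its methods are hypothetical names, a `BlockingQueue` stands in for the JMS destination, and a map stands in for the database.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

/** Illustrative only: a queue plays the JMS destination, a map plays the job table. */
final class JmsTriggerSketch {

    static final class Job {
        final long id;
        final String command;
        volatile boolean done;

        Job(long id, String command) {
            this.id = id;
            this.command = command;
        }
    }

    private final Map<Long, Job> store = new ConcurrentHashMap<>();
    private final BlockingQueue<Long> triggers = new LinkedBlockingQueue<>();

    /** Immediate request: persist the job, then send a trigger that carries only its id. */
    void requestImmediate(long id, String command) {
        store.put(id, new Job(id, command));
        triggers.add(id);
    }

    /** Scheduled request: persist only; no trigger is sent, so it is left to the polling thread. */
    void requestScheduled(long id, String command) {
        store.put(id, new Job(id, command));
    }

    /** What a message listener would do per delivered trigger: look the job up by id and run it. */
    boolean onMessage(long jobId) {
        Job job = store.get(jobId); // lookup by key instead of a query by date
        if (job == null || job.done) {
            return false;
        }
        // a real executor would lock the row before running the job
        job.done = true;
        return true;
    }

    /** Delivers all pending triggers; returns how many jobs were processed. */
    int drain() {
        int processed = 0;
        Long id;
        while ((id = triggers.poll()) != null) {
            if (onMessage(id)) {
                processed++;
            }
        }
        return processed;
    }

    boolean isDone(long id) {
        Job job = store.get(id);
        return job != null && job.done;
    }
}
```

In this sketch only the immediate request is processed by `drain()`; the scheduled one stays untouched until something else (the polling thread, in the executor's case) picks it up.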

NOTE: JMS is only supported for immediate job requests, not for scheduled ones.

The polling mechanism is still there, as its responsibility remains significant:
  • deals with retries
  • deals with scheduled jobs
That said, polling no longer needs to deliver high throughput. Users running with JMS should therefore consider raising the poll interval, for example to once a minute instead of every 3 seconds. That reduces the load on the db while still providing a very performant execution environment.
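As a back-of-the-envelope illustration of what the longer interval saves, the number of polls hitting the database can be estimated from the thread count and the interval. The helper below is a hypothetical calculation written for this article, not part of jBPM, and it assumes each thread issues exactly one poll per interval.

```java
/** Illustrative arithmetic only: estimates how many polls reach the database per hour. */
final class PollLoadSketch {

    static long pollsPerHour(int threads, long intervalSeconds) {
        if (threads < 0 || intervalSeconds <= 0) {
            throw new IllegalArgumentException("threads must be >= 0 and intervalSeconds > 0");
        }
        // each thread polls once per interval; 3600 seconds in an hour
        return threads * 3600L / intervalSeconds;
    }
}
```

With a single thread, `pollsPerHour(1, 3)` gives 1200 and `pollsPerHour(1, 60)` gives 60, so moving from a 3 second interval to one minute cuts the polling queries by a factor of 20.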

The next article will illustrate the performance improvements of the JMS-based executor compared with the default polling-based one. Stay tuned; comments, as usual, are more than welcome.


13 comments:

  1. Hi Maciej,

    Thanks for the excellent article. I am using JmsAvailableJobsExecutor in our Spring-embedded application; I have defined it as an MDB in ejb-jar.xml and configured maxSession etc. accordingly. Is this the correct approach? Please confirm.

    Thanks,
    Anindya

    Replies
    1. Depending on your application server, you might not need an MDB and could rely purely on Spring JMS handling. Either way, this is a correct approach.

  2. Thanks Maciej. I found it easier to use Spring JMS, so I changed the implementation. But I am still facing an issue: the JMSAvailableJobsExecutor has no queryService unless I set it. Do I need to write another version of the buildRunable() method for JMSAvailableJobsExecutor? Is there an easier way around this?

    Please suggest.

    Replies
    1. Since you use Spring only, you must create the JMSAvailableJobExecutor in Spring, set all its fields, and then bind it to the given queue. Once that is done you can easily make use of it.

  3. Okay, done. I created the JMSAvailableJobsExecutor by following the same approach used to create AvailableJobsExecutor in the buildRunable() method, and then loaded it as a Spring bean. It is then registered with a Spring queue listener container bean with a thread pool executor. This works great!!

    Thanks for the suggestion :)

  4. Hi Maciej,

    Could you please advise on the two issues below regarding executors?

    In our Spring application we run large jobs, and sometimes when a request fails it goes to the RETRYING state. After that, those records are picked up ONLY by the single executor thread that runs once every minute, which ends up processing only 1 request per minute, even if we use the asynchronous method execution mode. I have not been able to find any support in JMS for retrying these requests. Is there an alternative?

    The other issue is concurrency. It looks like the JMS executor and the polling thread pick up the same request and execute it at the same time. If I increase the interval for the polling thread, this happens less often, but it still occurs randomly. Judging from the code it should not happen, as the record is locked before being updated. Do you have any idea why this could happen?

    Please advise on these, and let me know if you want me to submit a JIRA for this.

    Thanks,
    Anindya

  5. JMS-based processing is only for immediate jobs, so jobs that failed and are queued for retry will be performed by the polling thread only; they are not scheduled over JMS any more. This was done on purpose to allow fast and efficient processing over JMS, while polling just picks up the failed jobs or those scheduled for the future.

    When it comes to concurrency, make sure your db is running with row-level locking, as the executor uses a pessimistic lock mechanism to ensure that only one executor takes the job.

  6. Thanks Maciej. That helps a lot. I also need one clarification. We are using Spring's transaction manager (jtaTransactionManager), which by default chooses the transaction manager based on the underlying container (for us it's WebSphereUoWTransactionManager).

    Now jBPM manual recommends org.jbpm.persistence.jta.ContainerManagedTransactionManager.

    So should I use Spring's one or jBPM's one? Is there any advantage of using jBPM's transaction manager?

    Please suggest.

  7. OK. I was able to figure that out. I got confused by the name. I think we should be fine with Spring's JTATransactionManager for both.

    When using the JMS listener and setting the JTATransactionManager on the listener container in Spring, I get an OptimisticLock exception. When I remove the acknowledgement-mode so that it defaults to AUTO_ACKNOWLEDGE, it works fine. Do you see any concerns here, given that the JMS messages will be auto acknowledged even if the listener gets an exception while executing?

    Please advise.

    Thanks,
    Anindya

    Replies
    1. You must use auto acknowledge, as the message will then participate in the transaction and be committed only when it has been processed successfully. When an exception happens, the transaction should be rolled back, and thus the message will be put back into the queue and be available for reprocessing.

  8. Thanks Maciej for the clarification on this. I have set it to auto acknowledge as you advised, and it's now working perfectly fine.

  9. Hi Maciej!

    I have a question regarding async jobs. I made a CustomWorkItemHandler. It's just a Java class that invokes a service. I checked the async item in the model (is this correct? I didn't extend AsyncWorkItemHandler and I didn't implement a Command).

    When the service doesn't respond within five minutes, I see this in the log.

    16:12:31,192 INFO [stdout] (Thread-1 (HornetQ-client-global-threads-1353134396)) ***Execution starts!***
    ...
    16:17:31,176 WARN [com.arjuna.ats.arjuna] (Transaction Reaper) ARJUNA012117: TransactionReaper::check timeout for TX 0:ffff0a402a10:-623b57e5:57aa2a5e:3f7 in state RUN
    16:17:31,176 WARN [com.arjuna.ats.arjuna] (Transaction Reaper Worker 0) ARJUNA012095: Abort of action id 0:ffff0a402a10:-623b57e5:57aa2a5e:3f7 invoked while multiple threads active within it.
    16:17:31,176 WARN [com.arjuna.ats.arjuna] (Transaction Reaper Worker 0) ARJUNA012108: CheckedAction::check - atomic action 0:ffff0a402a10:-623b57e5:57aa2a5e:3f7 aborting with 1 threads active!
    16:17:31,177 WARN [org.hibernate.engine.transaction.synchronization.internal.SynchronizationCallbackCoordinatorTrackingImpl] (Transaction Reaper Worker 0) HHH000451: Transaction afterCompletion called by a background thread; delaying afterCompletion processing until the original thread can handle it. [status=4]
    16:17:31,177 WARN [org.hibernate.engine.transaction.synchronization.internal.SynchronizationCallbackCoordinatorTrackingImpl] (Transaction Reaper Worker 0) HHH000451: Transaction afterCompletion called by a background thread; delaying afterCompletion processing until the original thread can handle it. [status=4]
    16:17:31,197 WARN [org.hibernate.engine.transaction.synchronization.internal.SynchronizationCallbackCoordinatorTrackingImpl] (Transaction Reaper Worker 0) HHH000451: Transaction afterCompletion called by a background thread; delaying afterCompletion processing until the original thread can handle it. [status=4]
    16:17:31,207 WARN [com.arjuna.ats.arjuna] (Transaction Reaper Worker 0) ARJUNA012121: TransactionReaper::doCancellations worker Thread[Transaction Reaper Worker 0,5,main] successfully canceled TX 0:ffff0a402a10:-623b57e5:57aa2a5e:3f7

    In this situation, what is the status of the instance, the work item and the job?

    Is this 5 minute timeout (or something like that) configurable?

    Thanks in advance!

    Replies
    1. This means your transaction timed out and was rolled back. As a result, all the operations done in that transaction are rolled back as if they had never happened. The process instance stays in the async node and needs to be retriggered.

      The transaction timeout is configurable at the application server level, but I would not recommend changing it: long-running transactions are in general bad practice, as they might lead to resource locks. If you call a service that does not respond or is very slow, use a jbpm executor command, which is executed outside of the transaction and allows for a retry mechanism.

      P.S.
      I recommend moving this sort of discussion to the jbpm usage mailing list - see jbpm mailing list for detailed addresses
