Friday, April 15, 2016

KIE Server clustering and scalability

This article is another in the KIE Server series, and this time we'll focus on clustering and scalability.

KIE Server is by nature a lightweight and easily scalable component. Compared to the execution environment included in the KIE workbench, it can be summarized as follows:

  • allows partitioning based on deployed containers (kjars)
    • in the workbench all containers are deployed to the same runtime
  • allows individual instances to be scaled independently of each other
    • in the workbench, scaling means scaling all kjars at once
  • can easily be distributed across the network and managed by a controller (the workbench by default)
    • the workbench is both management and execution, which makes it a single point of failure
  • clustering KIE Server does not add any additional components to the infrastructure
    • the workbench requires ZooKeeper and Helix for a clustered Git repository
So what does it mean to scale KIE Server?
First of all, it allows administrators to partition knowledge between different KIE Server instances. For example, HR department processes and rules can run on one set of KIE Server instances, while the Finance department has its own set. Each department's administrator can then scale based on need without affecting the others. That gives us a unique opportunity to really focus on the components that do require additional processing power and simply add more instances - either on the same server or on different servers distributed across your network.

Let's look at the most common runtime architecture for a scalable KIE Server environment.


As described above, the basic runtime architecture consists of multiple independent sets of KIE Servers, where the number of actual server instances can vary. In the diagram above each set has three instances, but in reality they can have as many (or as few) as needed.

The controller, in turn, has three server templates - HR, Finance and IT. Each server template is defined with an identifier, which KIE Server instances reference via the system property org.kie.server.id.

In the screenshot above, server templates are defined in the controller (workbench), which becomes the single point of configuration and management of our KIE Servers. Administrators can add or remove, start or stop containers, and the controller is responsible for notifying all KIE Server instances (that belong to a given server template) of the performed operations. Moreover, when new KIE Server instances are added to the set, they directly receive all containers that should be started, thereby increasing processing power.

As mentioned, this is the basic setup, meaning the server instances are used by calling each individual KIE Server instance directly. This is a bit troublesome, as users/callers have to deal with instances that are down, etc. To solve this, we can put a load balancer in front of the KIE Servers and let it do the heavy lifting for us. Users then call a single URL, which is configured to work with all instances in the back end. One choice of load balancer is Apache HTTP Server with the mod_cluster module, an efficient and highly configurable load balancer.
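As a sketch, such a front end could also be built with Apache's simpler mod_proxy_balancer module (host names below are hypothetical; mod_cluster offers the same idea with dynamic back-end discovery):

```apache
# Load balance two KIE Server instances behind a single URL
# (requires mod_proxy, mod_proxy_http and mod_proxy_balancer)
<Proxy "balancer://kieservers">
    BalancerMember "http://kie-server-1:8080/kie-server"
    BalancerMember "http://kie-server-2:8080/kie-server"
</Proxy>
ProxyPass        "/kie-server" "balancer://kieservers"
ProxyPassReverse "/kie-server" "balancer://kieservers"
```

Callers then use the load balancer host for all requests, and the balancer spreads them across the members that are up.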


In version 7, the KIE Server client will come with a pluggable load balancer implementation, so when using the KIE Server client, users can skip the additional load balancer as an infrastructure component. Though it will provide load balancing and failure-discovery support, it is a client-side load balancer that has no knowledge of the underlying back-end servers, and thus won't be as efficient as mod_cluster can be.
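Conceptually, a client-side balancer just rotates through the known endpoints and fails over when one does not respond. The sketch below illustrates that idea only - it is not the actual KIE Server client API:

```python
import itertools

class RoundRobinBalancer:
    """Illustrative client-side load balancer: rotate through the known
    base URLs and skip endpoints that fail. This mirrors the concept of
    the KIE Server client's pluggable balancer, not its real API."""

    def __init__(self, base_urls):
        self.base_urls = list(base_urls)
        self._cycle = itertools.cycle(self.base_urls)

    def call(self, send):
        """Try each endpoint at most once per call; `send` is a function
        that takes a base URL and either returns a response or raises
        ConnectionError when the endpoint is unreachable."""
        last_error = None
        for _ in range(len(self.base_urls)):
            url = next(self._cycle)
            try:
                return send(url)
            except ConnectionError as err:
                last_error = err  # endpoint down, fail over to the next one
        raise RuntimeError("all KIE Server endpoints failed") from last_error
```

The key trade-off is visible here: the client only learns that a back end is down by trying it, whereas a server-side balancer like mod_cluster tracks back-end health continuously.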

So this covers scalability of KIE Server instances: they can easily be multiplied to provide more execution power, with distribution both on the network level and on the knowledge (container) level. But looking at the diagram, a single point of failure remains: the controller. Remember that in managed mode (where KIE Server instances depend on a controller), they are limited when the controller is down. Let's recap how KIE Server interacts with the controller:

  • when KIE Server starts, it attempts to connect to any of the defined controllers (if any)
  • it connects to only one controller, stopping as soon as a connection is successful
  • the controller then provides the list of containers to deploy and the configuration
  • based on this information, KIE Server deploys the containers and starts serving requests
But what happens when none of the controllers can be reached when KIE Server starts? KIE Server will be pretty much useless, as it does not know what containers it should deploy. It will keep checking (at predefined intervals) whether a controller is available. Until a controller becomes available, KIE Server has no containers deployed and thus won't process any requests - the most likely response you'll get from KIE Server when trying to use it is "no container found".
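The bootstrap behavior described above can be sketched as a retry loop. The loop shape and interval below are illustrative (the real server reads its settings from system properties and handles this internally):

```python
import time

def connect_to_controller(controllers, fetch_config, interval=10, max_attempts=None):
    """Keep polling the configured controllers until one answers, the way a
    managed KIE Server does at startup. `fetch_config` takes a controller
    URL and returns the list of containers to deploy, or raises
    ConnectionError if that controller is unreachable."""
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        attempt += 1
        for url in controllers:
            try:
                return fetch_config(url)  # first reachable controller wins
            except ConnectionError:
                continue  # try the next controller in the list
        time.sleep(interval)  # no controller reachable; wait and retry
    raise RuntimeError("no controller reachable; no containers to deploy")
```

Until `fetch_config` succeeds, no containers exist, which is exactly why requests to such an instance fail with "no container found".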

Note: this affects only KIE Servers that start after the controller went down; those already running are not affected at all.

To solve this problem, the workbench (and with it the controller) should be scaled. Here the default configuration of a KIE workbench cluster applies, meaning Apache ZooKeeper and Apache Helix become part of the infrastructure.


In this diagram, we scale the workbench by using Apache ZooKeeper and Helix to cluster the Git repository. This gives us replication between the server instances that run the workbench, and thereby provides several (synchronized) controller endpoints, ensuring KIE Server instances can reach a controller and collect their configuration and the containers to be deployed.

As with the KIE Servers, the controllers can either be reached directly as independent endpoints or again fronted with a load balancer. KIE Server accepts a list of controllers, so a load balancer is not strictly required, though it is recommended, as the workbench is also (or even primarily) used by end users, who benefit from a load-balanced environment as well.

That concludes the description of clustering and scalability of KIE Server. To get the most out of it, let's now take a quick look at what's important to know when configuring such a setup.

Configuration

Workbench
We start with the configuration of the workbench - the controller. The most important part for the controller is authentication, so that connecting KIE Server instances are authorized. By default, KIE Server on startup will send requests with Basic authentication using the following credentials:
  • username: kieserver
  • password: kieserver1!
so to allow KIE Server to connect, make sure such a user exists in the application realm of your application server.

NOTE: the username and password can be changed on the KIE Server side by setting the following system properties:
  • org.kie.server.controller.user
  • org.kie.server.controller.pwd

This is the only thing needed on the application server that hosts the KIE workbench.

KIE Server
On the KIE Server side, there are several properties that must be set on each KIE Server instance. Some of these properties must be the same for all instances representing the same server template defined in the controller.
  • org.kie.server.id - identifier of the KIE Server that corresponds to the server template id. This must be exactly the same for all KIE Server instances that represent a given server template
  • org.kie.server.controller - comma-separated list of absolute URLs to the controller(s). This must be the same for all KIE Server instances that represent a given server template
  • org.kie.server.location - absolute URL where this KIE Server instance can be reached. This must be unique for each KIE Server instance, as it is used by the controller to notify the instance of requested changes (e.g. start/stop container).
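For example, two instances of the HR server template could be started on WildFly like this (host names, template id and controller context path are hypothetical - adjust them to your environment):

```shell
# instance 1 of the HR template
./standalone.sh \
  -Dorg.kie.server.id=hr-server \
  -Dorg.kie.server.controller=http://controller-host:8080/kie-wb/rest/controller \
  -Dorg.kie.server.location=http://kie-host-1:8080/kie-server/services/rest/server

# instance 2 of the same template: same id and controller list, unique location
./standalone.sh \
  -Dorg.kie.server.id=hr-server \
  -Dorg.kie.server.controller=http://controller-host:8080/kie-wb/rest/controller \
  -Dorg.kie.server.location=http://kie-host-2:8080/kie-server/services/rest/server
```

Note how only org.kie.server.location differs between the two commands; that is what lets the controller address each instance individually.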
Similar to how the workbench authenticates requests, KIE Server does the same, so to allow the controller to connect to a KIE Server instance (at the URL given as org.kie.server.location), the application realm of the server where the KIE Server instances run must have a user configured. By default, the workbench (controller) will use the following credentials:
  • username: kieserver
  • password: kieserver1!
so it must exist in the application realm. In addition, it must be a member of the kie-server role, so KIE Server will authorize it to use its REST API.
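On WildFly, such a user can be created with the bundled add-user script (the credentials below are the defaults mentioned above; change them if you override the corresponding system properties):

```shell
# create an application-realm user with the kie-server role
# on the server hosting the KIE Server instances
$JBOSS_HOME/bin/add-user.sh -a -u kieserver -p 'kieserver1!' -g kie-server
```

Here -a targets the application realm, and -g assigns the kie-server group/role required for the REST API.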

NOTE: the username and password can be changed on the KIE workbench side by setting the following system properties:
  • org.kie.server.user
  • org.kie.server.pwd
There are other system properties that can be set (and most likely will be needed, depending on the desired KIE Server configuration). For those, see the documentation.

This configuration applies no matter how you run KIE Server - standalone WildFly, WildFly domain mode, Tomcat, WAS or WebLogic. It does not really matter; as long as you set the properties above, you'll be ready to go with clustered and scalable KIE Server instances tailored to your domain.

That's all for today; as usual, comments are more than welcome.

32 comments:

  1. Hi Maciej, thanks for your great blog, very helpful indeed!

    One question regarding this approach: how would you set up inter-process communication across partitions? For example messaging: if a process on the HR partition is to send a message to the IT partition, how would you accomplish this? (One can ask the same question about signalling and events, for example)

    One solution I can think of is implementing a send handler that communicates with the other partition through, for example, REST (the KIE Server client). Would you consider this the optimal solution? (Having one large database is not an option)

    1. you can either use REST calls via the KIE Server client or the default REST work item handler to send the information to other KIE Servers. Or use JMS directly to make sure it's transactional. You could take advantage of this http://mswiderski.blogspot.com/2015/09/improved-signaling-in-jbpm-63.html to make it more transparent. Note that you have to enable this on the KIE Server side (the MDB for processing) as it's not on by default.

  2. Hi Maciej. I managed to connect the kie-servers to the same database as kie-wb. All process instances show up in the database, but I'm having trouble finding them on the kie-wb process instances page. Am I doing something wrong, or is it impossible to use the workbench in such a way?

    1. in general they are not designed to share a db, as they use different concepts for execution. So it would be best to avoid this scenario. Although if you really need to do so, make sure you deploy the same kjars into both KIE Server and the workbench, and name the container after the GAV of the kjar; the workbench should then be capable of seeing them.

    2. Ok, thanks for the tip, I will check it out.

      But still I have some questions regarding correct way to do this.

      1. When you use the infrastructure presented in this article, assuming that processes will be executed only on kie-servers behind a load balancer, and we would like some sort of tool to monitor execution state and history, should we build some kind of extension using this API?
      2. Is it possible to make all the servers in one group (for example the IT department from the diagram above) share the same database?
      3. Do we need any extra work to be able to start a process instance on one server and finish it on another (for example a process with a human task in the middle), assuming that we don't know where the load balancer will send the next request?

      I hope you will be able to answer my questions. If there are any other articles like this that you can point me to, I would be very thankful.

      Thanks in advance

    3. 1. The history is saved in the db, so you can rely on that already. If you need more, then yes, you would have to build an extension on top of KIE Server.
      2. Yes, that's actually how you design it to be clustered
      3. No, as long as they share a single db you have it covered

    4. Hi Maciek

      I'm grateful for all the answers that you provided.
      Could you tell me why we should avoid the scenario where kie-server and kie-wb share the same database?
      What kinds of problems can we encounter?

    5. see this jira for some details https://issues.jboss.org/browse/JBPM-5164

  3. Hi Maciej, how are you? Is there an out-of-the-box way to encrypt the org.kie.server.controller.pwd property? Thanks in advance!

    1. Ariel,

      unfortunately there is no support for encrypted passwords for the controller/KIE Server. In 6.4 there is token-based support, but that still requires the token to be given as a system property, and the token then needs to be long-lived too.

  4. Hi Maciej, how are you? In the second picture (3 kie-server templates with multiple instances each, and a load balancer in front of them), how does the load balancer know which kie-server template (eventually, a kie-server instance of HR, Finance or IT) will serve the request? Is it based on the URL?

    1. Yes, it is based on the URL, as it includes the container id, which is unique per setup

  5. Hi Maciej
    1. I installed a workbench cluster using ZooKeeper and Helix and the system is operating normally, but it synchronizes the Git resources and not the artifact repository or the containers. Do I need to deploy and configure containers one by one across the workbench cluster?
    2. I have a kieserver and I use a comma-separated list of absolute URLs to the controller(s). I know the two controllers need the same configuration; does that mean the kie-server polls both controllers?
    Thanks

  6. Hi, I passed the test plan, but unfortunately I have encountered a new problem:
    1. When I start the scanner, I rebuild and deploy the project, but the container does not update. I have a settings.xml file on the kie-server machine pointing to one of the clustered workbench machines; the container starts successfully, but the DRL content does not update.
    2. May I point the kie-server settings.xml to another Nexus where I deployed the project? Because I do not want all kie-server machines to scan only one of the workbench machines

  7. Hi Maciej, we are trying to automate container deployment in our jbpm 6.4 environment. We tried to do it with the REST API and it worked ok. After doing it, all we see is that our "template-name".xml in our bin directory is updated with a new node.

    The question is: if we change this xml by hand (for example, adding a new container referencing a valid GAV in our repository), do we get the same effect as using the REST API? Or is there other magic behind the scenes?

    Thanks in advance!

    1. in general, there is no difference in features, but using the REST API the change is visible directly, while modifying the xml file requires a server restart. Moreover, I'd recommend always stopping the server before making such modifications and then starting it again.

  9. Dear Mr Maciej, hello: I have met a problem in my study of Drools and want to ask you about it. The problem is this: I have set up two kie-servers on Tomcat, with a stateful KieSession configured in the workbench, behind a load balancer. How can I make the two kie-server services share the KieSession? The kie-servers are deployed on Tomcat, and the load balancer in use is nginx

    1. it's not possible to have a ksession shared across JVMs

    2. This comment has been removed by the author.

    3. that is not implemented to be shared. So the only option for a stateful session is to stay on the same JVM

    4. Thank you very much, Mr Maciej

  10. Maciej, sir, hello; sorry to disturb you again. Two questions to ask you:
    Question 1: with WildFly I successfully deployed the workbench cluster and it synchronizes data (zookeeper + helix), but with the related properties Tomcat fails to synchronize data and fails to start. In setenv.sh I added the configuration
    -Dorg.uberfire.cluster.id
    -Dorg.uberfire.cluster.zk
    -Dorg.uberfire.cluster.local.id
    which led to the workbench failing to start properly. May I ask how you configure the workbench in a Tomcat cluster?


    Question 2: in the IDE I created a Maven project A1.
    A1 has a static method; the method initializes the Spring context and calls an interface implementation class.
    In project A1 I created a rules file that imports and calls the static method, and the static method initializes the Spring container successfully.
    If I mvn install project A1 into a jar and upload it to the workbench, then create the same rules in the workbench as in A1, the Spring container fails to launch in test scenarios.
    May I ask how to initialize the Spring container in the workbench.

  11. zookeeper and helix were never tested in a Tomcat environment, so it is unknown whether that will work properly.

    when it comes to initializing Spring in the container, that all depends on whether you provided all the dependencies

  12. Mr. Maciej, hello. In my study of Drools there is a particularly urgent question to ask you.
    The problem is this: using workbench + kie-server as a service, I added a property to a data object: java.util.List list = new java.util.ArrayList();

    But through the kie-server client the JSON request cannot be deserialized. How can this be solved?

  13. The problem has been found; I revised the get/set methods of the data object in the workbench. I am sorry to bother you.
    Yesterday I saw your reply that Tomcat with zookeeper + helix has not been tested.
    Will two workbenches deployed on two Tomcat servers synchronize data?

  14. Hi Mr. Maciej,

    I'm trying to cluster kie-server. How can I share the jars generated by the workbench with my kie-server instances?

    Thanks

    1. you need to properly configure Maven via settings.xml and its repositories so the KIE Server machine can see the workbench Maven repo.

  15. A question: studying workbench + kie-server I found a memory consumption problem. I set the ksession to a StatelessKieSession, but it seems the KieSession used by kie-server is not released.
    I do not know how this can be solved;
    in the API I did not find a way to release the KieSession.

    Log: WARNING: Found more than one default StatelessKieSession: disabling all. StatelessKieSessions will be accessible only by name

  16. When starting up a system consisting of more than one KIE Server, it appears that QueryServiceImpl and the registering of queries occurs concurrently. The result is that queries will be registered in one of the instances but not the other instance.

    The instance that pulls the shortest straw and isn't able to register a query will result in something like the following when that query is attempted on that instance:
    "Caused by: org.jbpm.services.api.query.QueryNotFoundException: Query jbpmProcessInstances not found".

    Is this a known issue? Is there a practice to starting up multiple KIE Servers to avoid this? Some possibilities are 1) populating the database with the queries prior to starting the instances, or 2) staggering the startup such that one instance starts and creates the queries prior to starting the other instances.

    1. Mark, this means that your KIE Server instances identify themselves with the exact same location (given via the org.kie.server.location system property), so the controller sees them as only one instance. This is a limitation when running behind a load balancer. For that limitation an alternative, WebSocket-based communication has been introduced in 7.2; see this article for details: http://mswiderski.blogspot.com/2017/08/managed-kie-server-gets-ready-for-cloud.html

      for earlier versions you would have to give each KIE Server a unique location, so the controller will call all of them based on the uniqueness of their locations.

    2. 7.2.0.Final seems to resolve the issue I was having. I didn't need to use WebSockets as described on that page; I may transition to it next. I did continue to use discrete 'org.kie.server.location' values, each pointing to its own instance in the pool. The full set of queries appears to be registered via QueryServiceImpl in each of the KIE Server instances - this wasn't the case with 7.1.0.Final.

      Thanks!
