Apache Geode: How Pymma Uses it as an Efficient Alternative to Kafka-Storm-Spark
Further to our customers request, we had to implement a fine monitoring system for the OpenESB business process engine. The largest OpenESB configurations on production runs concurrently dozens of engine instances, that daily, generate up to 10 billion of events. These events must be aggregated and analysed to generate data for monitoring.
A classical solution would be to store the events first and then process them in a batch mode. Simply, it would require too much storage and CPU capacities to process this number of events and a too long delay to provide monitoring information on time.
So, our architect decided to process the message coming from our engines as a stream of events. This processing involves three types of application:
- The buffer: It is used to store the events coming from the event providers (here OpenESB). The buffer must have a very low latency to capture a huge number of events without slowing down the event producer. At the same time, the buffer must capture data from multiple producer concurrently. This involves a powerful support of distributed and concurrent processes. Apache Kafka, Cassandra are some good examples of buffer.
- The engine: the engine implements a kind of step by step process with intermediate states. Each step executes a part of the event aggregation and analysis process, and generate intermediate state useful for the next steps. For obvious efficiency reasons, the intermediate states must not be stored outside the engine. One of the engine key feature is to process concurrently numerous events on many machines. Apache Spark and especially Apache Storm are typical examples of this type of software.
- The persistence system: when the event aggregation or analysis process is complete, the process results are sent to a persistence system which implements a query language to provide an easy access to the results. MongoDB, Crate, PostgreSQL, Greenplum can be used as a persistence system.
So, event aggregation or analysis process requires knowledge and installation of three or more different software. And this said, companies are reluctant to dedicate such budget and time to deploy an event aggregation or analysis process chain.
In our presentation, we demonstrate how, at our profit, GemFire or Geode can act as a buffer, an engine and a persistence system and avoid the multiplication of software and deployment. We explain how the asynchronous Event Queue and Event Handler work together to act as an engine which support step by step aggregation and analysis process, and take advantage of the distributed cache to work as a buffer and a result store. We also detail how GemFire partitioned region internal design, provides a great scalability and provides very good results for events aggregation and data analysis.
We hope that thanks to this presentation, the delegate will get a different point of view on GemFire or Geode and would like to use it as an event processing system.