- Rate of complex rules per second -- number of rules that can be processed per second
- Rate of instructions per second -- since each complex rule may consist of more primitive instructions, knowing the rate of instruction execution per second may be useful.
- Publishing rate per second - peak rate at which events can be published to the engine
- Consumption rate per second -- peak rate at which events can be consumed by event listeners a.k.a sinks.
- Event processing latency (ms)-- time it takes for event to be processes after it is published
- Event delivery latency (ms) -- time it takes to deliver event after it is processed by the event processor or cain of event processors.
- Outstanding event queue size -- number of events that waiting to be processed. An important measure that tell the user how many events are in the queue to be processed.
The sum of the processing and delivery latency produces the total latency to be expected by the end user. This latency can then be compared to the required quality of service or SLA for given process to determine if the processes can yield useful results as specified by the SLA.
What interests me is not only the metrics, but also the behavior in the situations when rate of production exceeds the rate of consumption for a significant period of time. In this case, the influx of incoming events would have to buffered somewhere to be processed by the engine. This of course can not go on without a significant performance degradation as well as the increase the overall processing latency.
There are several strategies that can be used separately or in combination:
- Buffering -- simplest technique where events are buffered for both consumers and producers to accommodate for peaks. Eventually the buffers get exhausted and production and consumption rates must equalize by either reducing rate of production or increasing the rate of consumption
- Increasing number of consumers -- this can drive the consumption rate up. However this technique suffers from the plateau effect -- meaning after a certain number the rate of consumption stalls and starts to decrease.
- Dynamic throttle -- this where rate of production and consumption are throttled. The easiest place to throttle is at the event production phase, where events are actually dropped or event production is decreased via deliberate and controlled action. In this situation the latency is passed on to event producers.