Tuesday, October 9, 2007

What are the key performance attributes of CEP engines

CEP engines are typical implementations of the classic producer-consumer paradigm and therefore can be measured by their ability to produce and consume events. So what are some of the metrics we can use:
  • Rate of complex rules per second -- number of complex rules that can be processed per second
  • Rate of instructions per second -- since each complex rule may consist of more primitive instructions, knowing the rate of instruction execution per second may be useful.
  • Publishing rate per second -- peak rate at which events can be published to the engine
  • Consumption rate per second -- peak rate at which events can be consumed by event listeners, a.k.a. sinks.
  • Event processing latency (ms) -- time it takes for an event to be processed after it is published
  • Event delivery latency (ms) -- time it takes to deliver an event after it is processed by the event processor or chain of event processors.
  • Outstanding event queue size -- number of events waiting to be processed. An important measure that tells the user how deep the backlog is.
The sum of the processing and delivery latencies is the total latency seen by the end user. This total can then be compared to the required quality of service or SLA for a given process to determine whether the process can yield useful results as specified by the SLA.
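As a small worked illustration (the figures and names below are made up, not taken from any particular engine), the total latency check against an SLA is just an addition and a comparison:

// Illustrative sketch only: combine processing and delivery latency
// and compare the total against an SLA threshold.
public class LatencyCheck {
    public static void main(String[] args) {
        long processingLatencyMs = 12; // publish -> processed
        long deliveryLatencyMs   = 8;  // processed -> delivered to sinks
        long slaMs               = 25; // maximum total latency allowed by the SLA

        long totalLatencyMs = processingLatencyMs + deliveryLatencyMs;
        if (totalLatencyMs > slaMs) {
            System.out.println("SLA violated: " + totalLatencyMs + " ms > " + slaMs + " ms");
        } else {
            System.out.println("Within SLA: " + totalLatencyMs + " ms <= " + slaMs + " ms");
        }
    }
}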

What interests me is not only the metrics, but also the behavior in situations where the rate of production exceeds the rate of consumption for a significant period of time. In this case, the influx of incoming events has to be buffered somewhere before it can be processed by the engine. This, of course, cannot go on indefinitely without significant performance degradation as well as an increase in the overall processing latency.

There are several strategies that can be used separately or in combination:
  • Buffering -- the simplest technique, where events are buffered for both consumers and producers to accommodate peaks. Eventually the buffers are exhausted, and production and consumption rates must equalize by either reducing the rate of production or increasing the rate of consumption.
  • Increasing the number of consumers -- this can drive the consumption rate up. However, this technique suffers from a plateau effect -- after a certain number of consumers the consumption rate stalls and starts to decrease.
  • Dynamic throttling -- this is where the rates of production and consumption are throttled. The easiest place to throttle is at the event production phase, where events are actually dropped or event production is slowed via deliberate and controlled action. In this case the latency is passed on to the event producers. (A rough sketch of these strategies follows this list.)
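As a rough sketch of how buffering, multiple consumers and throttling can be combined (illustrative only, not tied to any particular CEP product; all names are my own), a bounded queue can block or drop events when the buffer fills up:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustrative sketch: a bounded event buffer shared by producers and consumers.
// When the buffer fills up, the producer is throttled (offer() with a timeout)
// and events are dropped as a last resort.
public class ThrottledEventBuffer {
    private final BlockingQueue<String> buffer = new ArrayBlockingQueue<String>(10000);

    // Producer side: wait briefly for space, otherwise drop the event.
    public boolean publish(String event) throws InterruptedException {
        return buffer.offer(event, 50, TimeUnit.MILLISECONDS);
    }

    // Consumer side: typically run by several consumer threads to raise
    // the consumption rate (subject to the plateau effect noted above).
    public void consumeLoop() throws InterruptedException {
        while (!Thread.currentThread().isInterrupted()) {
            String event = buffer.take();
            process(event);
        }
    }

    private void process(String event) {
        // rule evaluation would happen here
    }

    // Outstanding event queue size -- one of the metrics listed earlier.
    public int backlog() {
        return buffer.size();
    }
}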

Friday, September 7, 2007

The Great Battle of the Networks

"The Great Battle of the Networks" is the what is happening in today IT environment. Businesses that have best most efficient networks, better intercommunications and integration among various subsystems will prevail. Todays IT networks can be compared to biologicals nervous systems.

So what is happening today: organizations are building more complex, more efficient networks. Technologies such as virtualization, application integration, grid computing, and network and performance monitoring are making these networks faster and more agile.

While businesses are investing in their IT infrastructure to improve their business, cut costs and maintain competitiveness, I wonder whether there is a more hidden by-product of this growth -- a steady evolution of the intelligent network -- a web of self-organizing, self-healing, agile networks.

As with biological organisms, growth in neural complexity led to the evolution of intelligence. While every neuron is a fairly simple cell, their collective behavior produces astounding results -- a higher order of function.

The world wide web already exhibits some features of intelligent self-organizing systems. You may say that "we the humans" are the ones doing the organizing, but in my view the "who or what" does not matter; what matters is the final outcome.

Just like bio-organisms, networks allow businesses to adapt to an ever-changing environment. We may be looking at the early rise of the "intelligent net". What might be interesting is that such intelligence may be beyond our senses, just as each and every neuron is unaware of the higher organization it is part of.

Saturday, September 1, 2007

Virtualizing Monitoring Infrastructure: Virtual CEP

Virtualization offers clear advantages when it comes to storage, server and desktop virtualization. Today we can run Mac OS, Windows, Linux and other operating systems on a single piece of hardware, all at the same time, with seamless integration and switching from one to another. Servers and storage can be consolidated easily, reducing energy use, the costs associated with new hardware and storage, and management overhead. The benefits are clear, especially for those managing complex data centers.

Virtualization is also an interesting concept when applied to application performance monitoring, Business Activity Monitoring and the like. When we apply the concepts of CEP (Complex Event Processing), it would be nice to achieve the following:
  • Linear scalability with increased load -- meaning it takes the same effort to go from 1 million rules/sec to 2 million as from 10 million to 20 million
  • Installation, deployment and reconfiguration within minutes
  • Virtually unlimited processing capacity (limited only by the physical boundaries of the servers) -- meaning the number and rate of events that can be processed per unit of time.
Virtualizing CEP capabilities delivers these benefits -- a Virtual CEP Environment (VCE). A VCE is a virtual collection of individual CEP engines pooled together as a single virtual processing environment. In this model, the processing capacity of each instance adds to the overall VCE processing capacity.
A VCE can be implemented on top of virtual machines such as VMware, Xen or Parallels -- meaning virtual machines on a single hardware box or on separate boxes can be pooled together to deliver processing capability.

The diagram above depicts the VCE concept, where 3 physical servers are aggregated into a single virtual CEP environment capable of processing 5.5 million rules/sec. It is easy to add more capacity by simply instantiating CEP instances either on an existing box/VM or on additional hardware. Instances can also be taken offline with little or no disruption.
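A minimal sketch of the pooling idea (class and method names are hypothetical, not an actual product API; the per-server capacity split is only an example) could aggregate the rated capacity of individual CEP instances into a single virtual capacity figure:

import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a VCE as a pool of CEP instances whose rated
// capacities (rules/sec) add up to a single virtual processing capacity.
public class VirtualCepEnvironment {

    public static class CepInstance {
        final String host;
        final double rulesPerSec;

        public CepInstance(String host, double rulesPerSec) {
            this.host = host;
            this.rulesPerSec = rulesPerSec;
        }
    }

    private final List<CepInstance> instances = new ArrayList<CepInstance>();

    public void addInstance(CepInstance instance)    { instances.add(instance); }
    public void removeInstance(CepInstance instance) { instances.remove(instance); } // take an instance offline

    // Aggregate capacity of the pool, e.g. three servers rated at
    // 2.0 + 2.0 + 1.5 million rules/sec = 5.5 million rules/sec.
    public double totalRulesPerSec() {
        double total = 0;
        for (CepInstance i : instances) {
            total += i.rulesPerSec;
        }
        return total;
    }
}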

Friday, August 3, 2007

Achieving Proactive Business Activity Monitoring (BAM) solution by combining Dashboard with CEP

Some people think of BAM as a dashboard that displays Key Business Performance Indicators (KBPIs). Mostly, these indicators are obtained from underlying technologies, databases and business applications and presented on the dashboard using nice-looking graphs and gauges. While this typical view is what some BAM vendors provide, it falls far short of the actual business requirements:
  • Proactive correlation of complex interactions of key business systems
  • Detection and prevention of negative trends
  • Notification and proactive actions based on conditions and trends
  • Measuring impact of IT on key business services and the bottom line

So dashboards provide the visualization part of the total BAM solution. The big questions are, first, how do you measure KBPIs reliably, and second, how do you slice the IT environment and measure its impact on those KBPIs. The latter is a daunting task for most IT organizations -- trying to figure out how to obtain, measure and correlate thousands of different pieces of information to create a coherent picture of the impact on the business.

This is where I believe a solid CEP (Complex Event Processing) system might come to the rescue. Coupled with a solid data collection mechanism, CEP can deliver what dashboards need in order to deliver a complete "Proactive BAM" solution.

So the case can be structured as follows: use simple data collectors to obtain the critical data required to make decisions (these could be tied to middleware, app servers, applications, business apps or existing tools), correlate the information in a CEP-like engine to create KBPIs, and integrate the CEP/KBPIs with your favorite dashboard.
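A rough sketch of that collector-to-dashboard flow might look like the following (all class and method names here are illustrative, not any vendor's API), computing a hypothetical KBPI such as average order processing time from collected events:

import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the collector -> CEP -> KBPI -> dashboard flow.
public class OrderLatencyKbpi {

    public static class OrderEvent {
        final String orderId;
        final long processingTimeMs;

        public OrderEvent(String orderId, long processingTimeMs) {
            this.orderId = orderId;
            this.processingTimeMs = processingTimeMs;
        }
    }

    private final List<OrderEvent> window = new ArrayList<OrderEvent>();

    // Data collector feeds events in; the "CEP" step here is a trivial
    // correlation over a sliding window of recent orders.
    public void onEvent(OrderEvent event) {
        window.add(event);
        if (window.size() > 1000) {
            window.remove(0);
        }
    }

    // The KBPI exposed to the dashboard: average order processing time.
    public double averageProcessingTimeMs() {
        if (window.isEmpty()) return 0;
        long sum = 0;
        for (OrderEvent e : window) {
            sum += e.processingTimeMs;
        }
        return (double) sum / window.size();
    }
}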

This process, of course, will only be successful if the business and IT sides are on the same page -- meaning the business requirements are well defined and the process is well managed.

Therefore I believe the key to any BAM implementation is to combine dashboard technologies with CEP capabilities. The union of the two provides not just visualization, but causality, proactivity and drill-down to the elements that impact your bottom line.

Java Runtime.exec() pitfalls re-examined

While there is plenty written about the proper usage of Java Runtime.exec(), there still seems to be a problem with the way most developers use it. First, the streams provided via the Process object must be drained to prevent process hangs and even deadlock; second, upon process termination the process streams must be closed (OutputStream, InputStream and ErrorStream). However, even this clean-up may not prevent your JVM from running out of file descriptors.

Apparently, as tested with JRE 1.4.2_13 (Linux and AIX), the JVM leaves open handles dangling upon process termination even if all streams are explicitly closed. Interestingly, these handles are cleaned up when System.gc() is called explicitly -- so it cannot be a leak in the application code.

As a result, repeated executions of Runtime.exec() may cause descriptor exhaustion and subsequent failures when opening sockets or files and launching programs from within your JVM. The common error is "too many open files".

After lots of testing, calling Process.destroy() after the process ends solves the handle leak.
However, you must be very careful when calling this method: if you still have running threads that are reading or writing the Process streams, then destroy() will make them fail on the next I/O. So thread synchronization must be put in place to make sure that destroy() is called only after all stream threads have terminated (see the sketch below).
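Here is a sketch of the pattern described above: drain stdout and stderr on separate threads, wait for both the process and the reader threads to finish, close the streams, and only then call destroy(). This is an illustration of the clean-up order, not a guarantee for every JVM and platform:

import java.io.IOException;
import java.io.InputStream;

// Sketch of careful Runtime.exec() usage: drain the output streams on
// separate threads, join them, close all three streams, then destroy().
public class ExecHelper {

    public static int run(String[] command) throws IOException, InterruptedException {
        Process process = Runtime.getRuntime().exec(command);

        Thread outDrainer = drain(process.getInputStream());
        Thread errDrainer = drain(process.getErrorStream());

        int exitCode = process.waitFor();

        // Make sure no thread is still reading before touching the streams.
        outDrainer.join();
        errDrainer.join();

        process.getOutputStream().close();
        process.getInputStream().close();
        process.getErrorStream().close();

        // Only now is it safe to destroy the (already terminated) process.
        process.destroy();
        return exitCode;
    }

    private static Thread drain(final InputStream in) {
        Thread t = new Thread(new Runnable() {
            public void run() {
                byte[] buf = new byte[4096];
                try {
                    while (in.read(buf) != -1) {
                        // discard or log the output as needed
                    }
                } catch (IOException ignored) {
                    // stream closed or process terminated
                }
            }
        });
        t.start();
        return t;
    }
}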

I am not sure whether this problem exists on all platforms and JVMs, but the lesson is that Runtime.exec() is a lot more complicated than it seems and requires careful handling in your code.