Friday, May 16, 2014

Sharing logging context across application boundaries

How do you share logging context across application boundaries? Let me try to illustrate the challenge.

Example: 2 applications (sender/receiver) exchanging requests and replies. How does the sender flag a specific request to be traced or tracked across both applications at runtime?

Well, today Java developers who use log4j or any other logging framework would enable debug on both applications (you also need to know which categories to enable debug for) and get all kinds of messages in the log. Then you have to analyze the log and pick out only the entries related to the specific request/reply pairs of interest.

TNT4J provides a facility for sharing context across applications called shared conditional logging. The idea is to establish a shared pool of tokens (key/value pairs) available to all applications at runtime. These tokens can be added, removed and updated on the fly, so logging context or any other context can be communicated to all applications at runtime.

This simple model lets a sender application set a token/value pair and pass it along (out of band) to the receiver. Both apps can check the trace level of a specific token to determine whether logging is needed. The result is that only specific request/reply pairs are tracked across two or more applications.
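
To make the idea concrete, here is a minimal sketch in plain Java. It is not the TNT4J API: the class `SharedTokenPool`, its `Level` enum and the token name `order-42` are hypothetical names made up for illustration, and the in-memory map stands in for whatever shared store all applications would actually read from.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a shared token pool; NOT the TNT4J API. */
class SharedTokenPool {
    /** Assumed trace levels, ordered from least to most verbose. */
    enum Level { OFF, DEBUG, TRACE }

    // Assumption: in a real deployment this map would be backed by a store
    // that every participating application can see, not a single JVM heap.
    private final Map<String, Level> tokens = new ConcurrentHashMap<>();

    /** Adds or updates a token on the fly. */
    void set(String token, Level level) {
        tokens.put(token, level);
    }

    /** Removes a token, which switches logging for it off again. */
    void remove(String token) {
        tokens.remove(token);
    }

    /** True when the token is present and set to at least the wanted level. */
    boolean isEnabled(String token, Level wanted) {
        Level current = tokens.getOrDefault(token, Level.OFF);
        return current != Level.OFF && current.compareTo(wanted) >= 0;
    }
}
```

The sender would call something like `pool.set("order-42", SharedTokenPool.Level.DEBUG)` and pass the token name along with the request. Both sides then guard their log statements with `pool.isEnabled(token, SharedTokenPool.Level.DEBUG)`, so only the flagged exchange produces output.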

This approach saves developers a ton of time: it reduces the overhead of enabling debug mode for everything, logs only what is needed, cuts down on manual analysis and simplifies the diagnostics phase.

I am using this framework in my own project and so far with great results.

Tuesday, May 13, 2014

Launched TNT4J -- Java Open Source Project for Tracking, Tracing Application Behavior

I am restarting my blog after a few years of silence with the launch of a new Java open source project (TNT4J), available on GitHub. The mission of the project is to deliver a production-quality logging framework that significantly outperforms existing simple logging frameworks such as log4j, syslog, etc.

Simple logging of a severity/message combo is just not enough when it comes to truly distributed, concurrent applications. Frankly, I was tired of going through log files and trying to figure out why apps behave the way they do. I created this framework to deal with 3 basic logging problems:

  • How do I log only what is needed, across applications and runtimes? isDebugEnabled() is simply not enough and produces too much unrelated data across concurrent apps.
  • How do I correlate log entries, within and across logs, that belong to a logical activity such as an order process or a request, and that span multiple threads, applications and runtimes? Logs are simply a big mess.
  • How do I record important metrics and the state of the app or business process? Many times I ask myself what else was going on when this error occurred. What was the GC doing, what was memory usage, what were my app's internal variables, etc.? Much of this info is simply hidden and not available.

Of course one could say, "well, use profilers and such, problem solved". The answer is simple: when you develop and deliver apps to users and they have problems, what do you tell your end users? Go use a profiler, buy an application diagnostic tool, go debug the application YOU developed (mobile or otherwise)? With many copies of your apps running across a variety of devices, servers and desktops, how do you troubleshoot your application running elsewhere? Most developers request logs, and the log analysis nightmare begins.

Crash logs have too much system and runtime data and lack the application-specific data required to understand application logic. To troubleshoot application behavior or misbehavior, one needs to know how the application behaves and log that in addition to what the runtime is doing (stack traces, VM info, memory, etc).

Most logs are simply useless and a mess -- either too much data, too much text, not enough context, not enough relations. TNT4J addresses this problem head on. I created this project to help myself with this problem. 

I think developers may find TNT4J very useful. Let me know what you think. I welcome feedback, collaborators and adopters. All Welcome.

Monday, March 8, 2010

Business Transaction Management vs. Business Transaction Performance

BTM or Business Transaction Management vs. Business Transaction Performance -- two terms that aim to describe the current state of affairs in what Gartner calls Transaction Profiling. Ever since I came across the term BTM I have questioned whether it actually reflects what vendors do in this space. The word "management" implies a bi-directional relationship between the manager and the entity being managed. In the world of Application Performance Management the term management implies "measure, monitor, administer, control, plan, automate, improve". If anything, BTM should be redefined as Business Transaction Performance Management or BTPM. Transaction Profiling (Gartner's definition), while more accurate, implies a specific implementation of how performance measurement is actually accomplished -- "profiling". One can envision measuring transaction performance without actually doing any profiling. It seems that profiling is an implementation construct and as such should be avoided when naming a broad discipline such as this. In fact BTM, as defined, is really a derivative of Business Process Management rather than an Application Performance Management discipline.

The term BTM actually confuses the marketplace. What part of "management" is actually being done by the vendors in the space? Most if not all vendors in this space measure performance and report. Any proactive involvement in the transaction lifecycle itself is minimal or not practical in most cases. How practical is it to define application and business logic within a transaction "management" tool? And even if it were feasible, wouldn't it be better to do this in the BPM orchestration layer? Managing the transaction lifecycle is already defined by the Business Process Management discipline and as such belongs in the BPM space. Today's transactions are orchestrated and therefore managed by widely known BPM tools from IBM, Microsoft, Oracle and others. So either BTM is part of BPM rather than APM (and if this is true, do we really need another term to describe the same thing?), or BTM is simply all about performance and therefore "management" should be dropped from the acronym.

No matter what we call things, it is important to understand what these things actually are in reality. BTM, no matter what vendors say, focuses on performance and measurement. Any active involvement in the transaction lifecycle, while possible, is in many cases impractical and in most not desirable for many reasons. So BTM is really about performance, and in my view BTP (Business Transaction Performance) or BTPM (Business Transaction Performance Management) are more appropriate. Keeping terms honest is important and benefits the end user. Why? Because we are already awash in so many terms, abbreviations, acronyms, technologies, products and vendors with "super-natural abilities". What we need is simplicity and clarity rather than ambiguity and complexity.

Thursday, October 29, 2009

Technology Overload

I am completely convinced that just like we've produced too many cars, too many houses, too much credit and sadly too many dollars, we have also produced too much technology, software products, packages and solutions. The result is that organizations are not only confused but unable to absorb the technology and products that they already own. Over the past decade enterprises acquired too many products, a large portion of which have become shelfware. So what is the response of the corporate CIO? Vendor/product committees, tool consolidation, vendor consolidation and other tactics to keep new vendors and technologies away and make do with what they already own.

Enterprise solutions are so complex and vendor messaging so confusing and ambiguous that oftentimes you need Gartner or some other research agency to decode what is what. The number of new terms and abbreviations is just staggering. The best way to deal with complexity is ... simplicity. I like the KISS approach: Keep It Simple and Stupid, or Stupid and Simple. But unfortunately that is not what is happening.

Monday, April 21, 2008

On events, non-events and metrics

I would like to talk about events, non-events and metrics (aka facts). Facts are elements of truth, usually expressed as a name=value pair. Some examples of factual information: current_temperature=30F or CPU usage=30%. Of course, this assumes that the measurement instrument being used is accurate. When monitoring applications, systems or business services, facts are the key performance indicators that reflect the state, availability and/or performance of a given service, system or subsystem.

So what are events and how are they different from facts? An event is a change in the state of one or more facts. A “High CPU usage” event simply means that CPU usage has exceeded a certain threshold defined by the observer. So events are just the vehicles by which changes in facts are carried from the source to the observer. Therefore most events, if not all, have the following common attributes: {source, timestamp, variable1, variable2...., cause=other_event_list}. The timestamp is simply the time associated with the change of a fact's state or attribute. Example: temperature changed from 20 to 30F. One can design an event generator that creates add, remove and change events every time a fact is added, removed or changed. These events in turn can feed into a CEP or EP engine for processing.
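
Such an event generator can be sketched in a few lines of Java. This is only an illustration under stated assumptions: `FactStore`, `FactEvent` and `Kind` are hypothetical names, observers are plain `Consumer` callbacks rather than a real CEP engine, and the `cause` attribute is left out for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Consumer;

/** Hypothetical fact store that emits an event whenever a fact changes. */
class FactStore {
    enum Kind { ADDED, CHANGED, REMOVED }

    /** Carries a change in a fact from the source to the observer. */
    static final class FactEvent {
        final String source;
        final long timestamp;
        final Kind kind;
        final String name;
        final Object oldValue;
        final Object newValue;

        FactEvent(String source, long timestamp, Kind kind,
                  String name, Object oldValue, Object newValue) {
            this.source = source;
            this.timestamp = timestamp;
            this.kind = kind;
            this.name = name;
            this.oldValue = oldValue;
            this.newValue = newValue;
        }
    }

    private final String source;
    private final Map<String, Object> facts = new HashMap<>();
    private final List<Consumer<FactEvent>> observers = new ArrayList<>();

    FactStore(String source) {
        this.source = source;
    }

    void subscribe(Consumer<FactEvent> observer) {
        observers.add(observer);
    }

    /** Adds or updates a fact; emits ADDED or CHANGED, nothing if the value is equal. */
    void put(String name, Object value) {
        Objects.requireNonNull(value, "fact value");
        Object old = facts.put(name, value);
        if (old == null) {
            emit(Kind.ADDED, name, null, value);
        } else if (!old.equals(value)) {
            emit(Kind.CHANGED, name, old, value);
        }
    }

    /** Removes a fact; emits REMOVED only if the fact existed. */
    void remove(String name) {
        Object old = facts.remove(name);
        if (old != null) {
            emit(Kind.REMOVED, name, old, null);
        }
    }

    private void emit(Kind kind, String name, Object oldValue, Object newValue) {
        FactEvent event = new FactEvent(source, System.currentTimeMillis(),
                kind, name, oldValue, newValue);
        for (Consumer<FactEvent> observer : observers) {
            observer.accept(event);
        }
    }
}
```

With this sketch, setting temperature to 20 and then to 30 yields one ADDED and one CHANGED event, while setting it to 30 a second time yields nothing, because the fact did not change.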

It is also worth noting that detecting non-events should always be done in the context of time (for example, non-occurrence within the last 5 minutes or 24 hours). When the time interval expires, it is easy to check for the occurrence of certain events and evaluate the remaining CEP expression.
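
A time-bounded non-event check could look roughly like the following Java sketch. The class `NonEventDetector` and its methods are hypothetical, timestamps are passed in explicitly as milliseconds, and a real CEP engine would evaluate this as part of a larger expression rather than as a standalone lookup.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical detector for events that did NOT occur within a time window. */
class NonEventDetector {
    private final long windowMillis;
    // Most recent occurrence time per event name.
    private final Map<String, Long> lastSeen = new HashMap<>();

    NonEventDetector(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Records an occurrence, keeping the latest timestamp for each event name. */
    void record(String eventName, long timestampMillis) {
        lastSeen.merge(eventName, timestampMillis, Long::max);
    }

    /**
     * True when the event never occurred, or when its latest occurrence
     * is at least one full window older than the given time.
     */
    boolean isMissing(String eventName, long nowMillis) {
        Long last = lastSeen.get(eventName);
        return last == null || nowMillis - last >= windowMillis;
    }
}
```

For a 5-minute window, the check is meant to run when the interval expires: a "heartbeat" event last seen more than 5 minutes ago (or never seen) counts as a non-event.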