Continuing on my exporation of the new features in the System.Diagnostics namespace I thought I might stop and take some time to explain a little bit about things thing called the CorrelationManager – but first, a history lesson.

Request Tracing in the Enterprise Instrumentation Framework

If you have ever used EIF (Enterprise Instrumentation Framework) you will probably know a little bit about a feature called Request Tracing.

The idea behind Request Tracing was that you could establish a context in which code is executed, and all diagnostic events ther were raised within that context would be associated with it via a series of flags. When diagnostic output was analysed this information could be used to group the output and individual execution paths could be determined. This was exceptionally useful for multi-threaded server applications which could hundreds and thousands of concurrent users.

Request Tracing worked by stuffing GUIDs in this thing called CallContext. Think of CallContext as a bucket of memory associated with each thread where you can store “useful stuff”. The beauty of CallContext is that if you have information required to support infrastructure (like logging) then you can put it in there and you don’t need to pass things like handles around as arguments to methods.

An ancillary benefit of the CallContext was that the .NET Remoting infrastructure was aware of it and could propogate some information (marked with the ILogicalThreadAffinitive interface) across into a LogicalCallContext. The LogicalCallContext would be transferred as part of the remoting message to another host process (potentially on another machine) and associated with the handling thread there. If that machine also happened to be using EIF, events raised on that servicing thread would also emit the request tracing GUID which meant that if the logs were merged a full distributed analysis could take place.

More of the Same: Correlation Manager

Enter the .NET 2.0 BCL CorrelationManager class, attached prominantly as a static property on the Trace class. The Correlation Manager has two methods, StartLogicalOperation and StopLogicalOperation. The idea is that you can use these two simple methods to associate tracing information with a logical business operation.

Imagine the following program which contains a number of sub-routines that are called in sequence depending on what features are being used.

ProgramStructure

Lets say this program represents the behaviours of user group participants when the pizza arrives. Each person is probably going to be doing one of two things, drinking coke, or eating pizza – in a multi-threaded user group meeting the diagnostic output might look like this.

OriginalTraceOutput

The thing about this output is that its hard to tell which “open mouth” call is associated with which action – what if it recorded critical diagnostic information but you couldn’t associate it to the particular function it was invoked from? Lets divide the program into two functional areas.

OverlappingFunctions

The Correlation Manager can help you draw this picture in your log files by associating some contextual information with each trace entry. With this contextual information you trace output would look something like this.

CategorizedTraceOutput

Gotta love those nice colours eh? This basically allows us to scan our output and understand the logical sequence of events that look place for an individual thread of execution. Now lets take this understanding and apply it to some code.

StartLogicalOperation

This is a recursive factorize method which finds the prime factors of the number that is put in. Notice that at the beginning and the end of the function I call StartLogicalOperation and StopLogicalOperation respectively. Also notice that on the StartLogicalOperation call I am passing in the input integer – in this program this serves as important diagnostic information about the factorization process.

In the mainline of the program (download the sample code here) I wire-up a ConsoleTraceListener and I set the new TraceOutputOptions property to also include the “logical operation stack” in the output.

ConfigureListenerToLos

When the program runs I get the following output on the console.

ConsoleStackOutput

Not only does it display the trace message, it also includes a thing called the “LogicalOperationStack”. This is a nested list of logical operations that have occured, and by setting the TraceOutputOptions on the trace listener we have told it to emit this information – the stack essentially looks like this.

LogicalOperationStack

In the case of the ConsoleTraceListener the stack is displayed in reverse order but you can clearly see the input numbers used as it recursively called the Factorize(…) factorize method.

Final Thoughts

Using the Correlation Manager it is possible to make your diagnostic output more useful and if I was a betting man I’d say that the fact that it also uses CallContext to manage the LogicalOperationStack means that its a fairly safe bet that this operation stack will be visible across distributed applications. Finally – be careful about using the XmlWriterTraceListener with this particular program – since the output for some numbers can be enourmous.