Efficient structured event logging

Share

Event tracing for windows

Event Tracing for Windows (ETW) represents specialized and highly optimized structured, strongly typed logging mechanism for recording useful information for debugging purposes, state monitoring, profiling, tracing or data mining.

ETW was introduced with Windows 2000, but number of registered ETW providers is rising with every new version of Microsoft OS. Windows 7 has 600+ registered event providers, Windows 10 has 900+. Event providers can be enabled or disabled on demand.

Efficient Structured Event Logging

ETW also supports custom implementation of event providers by implementing EventSource (starting from .NET 4.5).

ETW design

ETW has been designed to have minimal performance impact and to be very lightweight. ETW also provides controlling mechanism for turning on and off some event providers and analyzing logged events for period while event collecting was turned on.

.

Efficient Structured Event Logging

There are already existing event providers that are exposing great deal of OS information (IO, memory, thread switching, .NET GC, Finalizers, JIT…). Next to these, new event provider can be created.

Event Providers can be enabled or disabled using Controllers. One of tools used for controlling and trace analyzing is PerfView (here). Controlling of event providers also can be done through the event viewer (if event provider is registered), programmatically through the code, periodically or exception-based.

An event tracing session need to set up to listen to events from one or more providers. Consumers are logging modules for writing events to a trace file (.etl), into database, or exposing live stream of logged data.

SLAB and Tx (LINQ to Logs and Traces) can be used as event consumers.

SLAB (Semantic Logging Application Block) can be used to create an observable event listener for an EventSource, and then define one or more event sinks. There are many sinks available out of the box including ConsoleSink, RollingFlatFileSink and SQLDatabaseSink. SLAB can be used in-process or out-of-process.

Show me some code

Here is one simple example where we have a wrapper around a queue of integers that exposes events during interaction with the queue. Events of interest are exposed through API defined in IQueueEvent interface. Idea is to expose next information:

–    When a queue gets created
–    Number of items in the queue when starts enqueuing
–    Number of items in the queue when enqueuing gets completed
–    Duration of enqueue procedure

public interface IQueueEvent
{
  void QueueCreated(string queueName);
  void StartEnqueuing(int count, int item, string queueName);
  void Enqueued(int count, int item, string queueName);
}

public class TestQueue
{
  private readonly ConcurrentQueue<int> instance = new ConcurrentQueue<int>();
  private readonly string instanceName;
  private readonly IQueueEvent etw;

  public TestQueue(IQueueEvent etw, string instanceName)
  {
    this.etw = etw;
    this.instanceName = instanceName;

    etw.QueueCreated(instanceName);
  }

  public void Enqueue(int item)
  {
    etw.StartEnqueuing(instance.Count, item, instanceName);
    instance.Enqueue(item);
    Thread.Sleep(300); // Some processing...
    etw.Enqueued(instance.Count, item, instanceName);
  }
}

Implementation of IQueueEvent interface could be any logging or tracing framework. In this case implementation is based on ETW.

Every custom event source must implement EventSource class and must have a unique name. Event attribute on event methods is necessary in case of interface implementation.

In this example StartEnqueuing and Enqueued events have two numeric and one string parameter. WriteEvent can be used for writing the event. It will use method overload with params object[] that is relatively insufficient because of array allocation and boxing-unboxing. It should only be used for relatively rare events (e.g. less than 100 / sec). If rates are faster than that, WriteEventCoreshould be used to create fast helpers for particular method signature. In this example, there is unsafe method – OptimalWriteEvent created for this case of arguments. This method is also marked as NonEvent indicating that it is not event generator but helper method.

[EventSource(Name ="Test-Queue-ETW")]
public class QueueEventETW : EventSource, IQueueEvent
{
  public static IQueueEvent Log = new QueueEventETW();

  [Event(1)]
  public void QueueCreated(string queueName)
  {
    if (IsEnabled()) WriteEvent(1, queueName);
  }

  [Event(2, Opcode = EventOpcode.Start)]
  public void StartEnqueuing(int count, int item, string queueName)
  {
    if (IsEnabled()) OptimalWriteEvent(2, count, item, queueName);
  }

  [Event(3, Opcode = EventOpcode.Stop)]
  public void Enqueued(int count, int item, string queueName)
  {
    if (IsEnabled()) OptimalWriteEvent(3, count, item, queueName);
  }

  [NonEvent]
  private unsafe void OptimalWriteEvent(int eventId, int count, int item, string name)
  {
    fixed (char* nameBytes = name)
    {
      var descrs = stackalloc EventSource.EventData[3];
      descrs[0].DataPointer = (IntPtr)(&count);
      descrs[0].Size = 4;

      descrs[1].DataPointer = (IntPtr)(&item);
      descrs[1].Size = 4;

      descrs[2].DataPointer = (IntPtr)nameBytes;
      descrs[2].Size = (name.Length + 1) * 2;
      WriteEventCore(eventId, 3, descrs);
    }
  }
}

Test client is very simple. It creates two instances of TestQueue with different queue names and enqueues few elements.

static void Main()
{
  TestQueue firstQueue = new TestQueue(QueueEventETW.Log, "FirstQueue");
  TestQueue secondQueue = new TestQueue(QueueEventETW.Log, "SecondQueue");
  firstQueue.Enqueue(1);
  secondQueue.Enqueue(2);

  firstQueue.Enqueue(3);
  secondQueue.Enqueue(4);

  firstQueue.Enqueue(5);
  secondQueue.Enqueue(6);
}

To enable event providers and collect events, we can use PerfView. After starting the application, there is collection option.

Efficient Structured Event Logging

Name of the custom event must be specified as additional provider:

Efficient Structured Event Logging

In case of previously defined event source, name was specified through class attribute – Test-Queue-ETW (name can also be dynamically defined through EventSource constructor).

PerfView syntax requires a * before event source name.

After clicking on “Start Collection” event provider is enabled and all events generated after this moment will be captured and stored into Events log. Click on “Stop Collection” shows collected data in the next form.

Efficient Structured Event Logging

After opening the Events form, events can be filtered by name and columns of interest can specified. All events are time ordered and ready for analyzing.

Efficient Structured Event Logging

Queue name has been passed so it is easy to distinguish which entry is related to which queue. State before and after enqueuing is distinguished by Start and Stop event names.

Stop event also has DURATION_MSEC column populated, it is duration between Start and Stop events. This duration is calculated on processing of collected data, so it is completely independent of event generation. To support this, events must have Start and Stop op-codes, or their naming must have Start and Stop prefix, and they must be defined one after the other.

ActivityID represents correlation between events. ActivityIDs can be nested depending on use case and asynchronous logic. Events with the same root are considered related. ActivityID works fine with Tasks(link is external), it tracks correlation between events generated in different threads and based on it execution duration can be measured.

ETW filtering

ETW has two mechanisms for controlling which events will be logged and they can be applied independently depending on the need.

Each event can be tagged with two quantities:

  1. A small integer (enumeration) called a ‘Level’ which defines how verbose the event is. Levels go from 0 to 5: 0 (Always) 1 (Critical) 2 (Error) 3 (Warning) 4 (Information) 5 (Verbose).

If nothing specified, the default is 4 (Informational). When the provider is turned on and level is specified, only those events that are less than or equal to the specified level will log messages.

  1. A 64 bit, bit-vector called the events ‘Keywords’ which define 64 groups that an event can belong to. When the provider is turned on a set of keywords is passed in. Only those events whose keywords overlap with the passed in keywords will log messages.

Next to it, every event can be “logically” grouped. For example, all event traces that are related to enqueuing can be part of “Enqueuing” logical group. ‘Task‘ property is used for that.

In order to do this event method must have Event attribute where Level, Keyword, Task properties can be specified.

[Event(1,
   Keywords = Keywords.Diagnostic,
   Task = Tasks.Commanding,
   Level = EventLevel.Warning)]
public void LogCommand(string commandName)
{ if (IsEnabled()) WriteEvent(1, commandName); }

Definition of Tasks and Keywords must be defined like nested classes within event source class with bit exclusive values. Tasks and Keywords cannot be specified in runtime.

public class Tasks
{
  public const EventTask Commanding = (EventTask)1;
  public const EventTask Analyzing = (EventTask)2;
}

public class Keywords
{
  public const EventKeywords Diagnostic = (EventKeywords)1;
  public const EventKeywords Perf = (EventKeywords)2;
  public const EventKeywords Internal = (EventKeywords)4;
}

PerfView has next syntax for defining provider with a keyword or log level filtering:

*{EventSourceName}:{Keyword value}:{Log level}

For example:

*Test-Queue-ETW:*:Verbose – for collecting all events, regardless of keyword with all log levels

*Test-Queue-ETW:2 – for collecting all events with Keyword with value 2 (Perf) with default log level (Info)

When should use Log levels and Keywords?

General guideline is to NOT overdo Tasks, Keywords and OpCodes. “Missing an event is far more serious than having extra events”. And keep it simple from users point of view.
Again in some cases there is good reason to specify and use keyword filtering.

  • Events that fire less than 100 times a second on average are ‘too small to care about’. Don’t bother making special keywords for them, but make a catch all ‘Default’ keyword.
  • Events that fire more than 1K times a second on average need keywordsto turn them off if they are not needed.
  • Events in between 100 and 1K are a judgment call
  • Define keywords from a Users’ (or scenario) point of view. Users don’t like to set more than one keyword (too error prone) so define the keywords that all common scenarios can be done by setting a single keyword. Most users will use simply use the default (turn everything on).
  • Levels less than Informational are meant for relatively rare warnings or errors. When in doubt stick with the default of Informational, and use verbose for events that can happen more than 1K events / second. Typically, users use Keywords much more than Level, so don’t worry about it too   much.

Tasks on the other hand can be very handful in case of big amount of logically related events.

ETW performances

ETW was implemented to be super-fast and very lightweight. But what does “super-fast” mean? What is performance impact when event provider is disabled and enabled? How does input property type affects performances of the provider?

Kathleen Dollard has few interesting courses regarding ETW on pluralsight.  In “What’s New in .NET 4.5” she did some benchmark where she showed performance measurements.

Here are my measurements on standard Virtual Machine on i3 with 24GB of RAM and .NET 4.6.2 while event providers are enabled:

Efficient Structured Event Logging

Additional performance notes:

– If data provider is disabled, regardless of previous test case, average execution was 1.200.000.000 calls / seconds. Performance impact is minimal if data provider is disable.

– To keep previous statement true, IsEnabled must be checked before each WriteEvent.

– Logging complex type is most expansive, but next to it, it is necessary to initialize base EventSource with “EventSourceSettings.EtwSelfDescribingEventFormat” parameter (MSDN). Just the fact that custom event source class has an event method with complex type, slows down performances of all other event methods that work with primitive types. Performance drop is significant ~ 30%. Generally, complex types should be avoided in on paths with frequent execution. Although sometimes it is nice to have it supported, in this case recommendation would be to separate event sources with complex event logs with primitive types.

– Note that event logs with few parameters can be very expensive. To optimize it, there is unsafe WriteEventCore method that can be used. By MSDN it is supposed to be used in case of high rate (high rate is considered more than 1000 events / sec).

Summary

This post gave brief introduction to the structured logging using ETW. Demonstrative example showed that it is easy to create an Event provider and how to collect events using PerfView tool. Performances measurements proofed that event generation is highly optimized if it is used correctly.

 

Share

Sign in to get new blogs and news first:

Leave a Reply

Željko Todorović

Software Architect @Schneider Electric DMS NS
mm

Željko Todorović is Software Architect at Schneider Electric DMS NS. Part of the “Realtime Engine” of the team that is responsible for the development of mission critical SCADA platform. During his professional career he worked with several international teams and became a Full Stack .NET developer, and later the Technical Lead – Architect. In order to achieve better performance and higher quality products and code, Željko constantly teaches and promotes best practices within the design team. Free time spend with his daughter and in the hall for boxing.

Sign in to get new blogs and news first.

Categories