
Using CEP to monitor replication heartbeat

March 25, 2012

Posted by Mich Talebzadeh in Complex Event Processing.

Some “front office” trading systems use bidirectional or peer-to-peer replication so that users access a local hub in each location, say London, NY and Tokyo. The database in each hub is replicated to the other hubs and kept in sync via Sybase Replication Server. These trading systems are often characterised by moderate throughput; however, they have little tolerance for data latency. The health of such systems, and of the underlying technology (in this case Sybase Replication Server), is therefore paramount.

Objectives: Highlight the benefits of deploying Aleri CEP to continuously monitor the replicated delivery feed and raise alarms when alert conditions are met.

Aimed at: Enterprises that are considering Aleri CEP as part of a strategy of moving towards an Event Driven Architecture and the benefits it brings.

The Problem: When replicated delivery falls unexpectedly, Replication Server may have an issue.

The Solution: Continuously monitor the replicated delivery feed, calculate averages over specific time windows (one hour), compare them to a threshold, generate alerts, and send those alerts to subscribers such as ASE/RS monitoring tools, the DBA dashboard, Application Support, etc.

Why use Aleri CEP: This is a proof of concept, so it may not be the most cost-efficient solution. However, it has certain advantages in throughput and reduced latency, and there is essentially no data-storage overhead in CEP itself. Complex Event Processing (CEP) is *event driven and memory based*, so there is no requirement for either physical or logical IO. I worked on this as a way of familiarising DBAs with CEP. I will show you a diagram in Aleri Studio, which helps to construct the streams and operators that process events as they come in.

So an event-driven processing system has the following components:

Event Generators –> Event Channel –> Event Processing –> Downstream Event-Driven Activity

Event Generators: In this case, the Replication Server heartbeat.

Event Channel: This could be a direct read from the replicate database, a message queue, or a replication adapter for CEP.

Event Processing: This breaks down into the following components:

ReplicatedTicksInput: This is the source stream for the Replication Server heartbeat delivered to the replicate ASE. We periodically insert heartbeat rows into an ASE table; Replication Server picks them up and, using function strings, delivers them to a replicate database on the replicate ASE. The frequency of heartbeat inserts can vary throughout the day. During trading hours there could be one tick (insert into the primary table) every 15 seconds; during the batch window this could be relaxed to one row (tuple) every 30 seconds. Either way, this is a light load for the Aleri engine, which can handle far larger volumes per second. The data lands in the replicate table and is streamed into Aleri via a message bus.
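As a rough sizing aid, the tick rates quoted above translate directly into the hourly counts that the later aggregates and threshold work with. The plain Python sketch below does that arithmetic, assuming ticks are inserted at a perfectly regular interval; the helper name `expected_ticks_per_hour` is hypothetical and not part of Aleri or Replication Server:

```python
# Hypothetical sizing helper, not part of Aleri or Replication Server.
# Assumes heartbeat rows are inserted at a fixed, regular interval.
SECONDS_PER_HOUR = 3600


def expected_ticks_per_hour(interval_seconds: int) -> int:
    """Number of heartbeat rows inserted per hour at a fixed interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return SECONDS_PER_HOUR // interval_seconds


trading_hours = expected_ticks_per_hour(15)  # one tick every 15 seconds
batch_window = expected_ticks_per_hour(30)   # one tick every 30 seconds
print(trading_hours, batch_window)  # 240 120
```

So a healthy feed should show roughly 240 ticks per hour in trading hours and 120 in the batch window.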

LocalTime: This is a ComputeStream operator. It adds a single field, “LocalTime”, containing the current time. This time comes from the CEP engine itself, not from the source stream.
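Outside Aleri, the LocalTime step can be pictured as a per-tuple function that copies the incoming tick and stamps it with the engine's own clock. This is a minimal Python sketch, not Aleri's ComputeStream syntax; the `Tick` fields, `add_local_time`, and the use of UTC are all assumptions for illustration, and the clock is injectable so the behaviour can be checked deterministically:

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass(frozen=True)
class Tick:
    feed_name: str                         # hypothetical feed identifier, e.g. "LDN->NY"
    source_time: datetime                  # timestamp carried in from the source stream
    local_time: Optional[datetime] = None  # filled in by the LocalTime step


def utc_now() -> datetime:
    """Default clock: the current time on the CEP host (UTC is an assumption)."""
    return datetime.now(timezone.utc)


def add_local_time(tick: Tick, clock: Callable[[], datetime] = utc_now) -> Tick:
    """Return a copy of `tick` with local_time taken from the CEP host's clock.

    source_time is left untouched and plays no part in the new value."""
    return replace(tick, local_time=clock())


tick = Tick("LDN->NY", datetime(2012, 3, 26, 9, 59, 58, tzinfo=timezone.utc))
stamped = add_local_time(tick)
```

Because the timestamp comes from the engine rather than the feed, the time slices built downstream depend only on the CEP host's clock.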

TicksPerHour: This AggregateStream operator opens a one-hour window for each feed to determine the tuple count for that feed. The application could be built with a single 24-hour aggregate, but that would require the Aleri application to hold 24 hours of data in memory, which could be large and impractical. This aggregate reduces that requirement to a single hour of data.
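The TicksPerHour logic amounts to counting tuples per feed in one-hour time slices. Below is a batch-style Python sketch over a finite list of events, not Aleri's aggregate syntax; it assumes each event arrives as a hypothetical `(feed_name, local_time)` pair and that slices start on the hour. Only counts are accumulated, never the raw ticks:

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, Tuple


def hour_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its one-hour time slice."""
    return ts.replace(minute=0, second=0, microsecond=0)


def ticks_per_hour(events: Iterable[Tuple[str, datetime]]) -> Counter:
    """Count ticks per (feed_name, start_of_hour) key.

    Only the counts are kept; individual ticks are not retained."""
    counts: Counter = Counter()
    for feed_name, local_time in events:
        counts[(feed_name, hour_bucket(local_time))] += 1
    return counts


# Illustrative input: one hour at 15 s intervals, then one hour at 30 s intervals.
start = datetime(2012, 3, 26, 10, 0)
events = [("LDN", start + timedelta(seconds=15 * i)) for i in range(240)]
events += [("LDN", start + timedelta(hours=1, seconds=30 * i)) for i in range(120)]
hourly = ticks_per_hour(events)
print(hourly[("LDN", datetime(2012, 3, 26, 10, 0))])  # 240
print(hourly[("LDN", datetime(2012, 3, 26, 11, 0))])  # 120
```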

MeanPer24Hour: This aggregate operator opens a sliding window per feed to generate statistics (i.e. mean and standard deviation) over 24 time slices, one per hour, produced by the previous aggregate. The assumption is that, Monday to Friday, the volume of ticks on a given day should be comparable with the next day. Weekends are obviously an exception. For the sake of this demo, weekends are ignored, although Aleri CEP can handle them as well (assuming a mean is built for weekends).
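MeanPer24Hour keeps, per feed, a sliding window of the most recent 24 hourly counts and reports their mean and standard deviation. The Python sketch below assumes population standard deviation (`statistics.pstdev`), which may differ from what Aleri's own aggregate function computes; `RollingHourlyStats` is a hypothetical name:

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, Dict, Tuple


class RollingHourlyStats:
    """Per-feed sliding window over the most recent `window` hourly counts."""

    def __init__(self, window: int = 24) -> None:
        self.window = window
        self._counts: Dict[str, Deque[int]] = {}

    def add(self, feed_name: str, hourly_count: int) -> Tuple[float, float]:
        """Record one completed hour for a feed and return (mean, stdev)
        over at most the last `window` hours for that feed."""
        counts = self._counts.setdefault(feed_name, deque(maxlen=self.window))
        counts.append(hourly_count)  # the deque drops the oldest hour once full
        return float(mean(counts)), pstdev(counts)


stats = RollingHourlyStats(window=24)
for _ in range(23):
    stats.add("LDN", 240)          # 23 healthy hours
avg, sd = stats.add("LDN", 180)    # one weak hour
print(avg)  # 237.5
```

Bounding the deque with `maxlen` is what keeps memory at 24 numbers per feed regardless of how long the engine runs.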

SetThreshold: This operator sets the threshold, relative to MeanPer24Hour, below which an hourly tick count raises an alert (see below).

TickFallOffFilter: This filter stream lets a tuple pass through whenever the tuple count for an hour falls below 90% of the average (i.e. the replication heartbeat falls below 90% of its hourly average). This makes sense in an otherwise high-volume environment (say OLTP during trading hours), where the count per hour is large, but it will generate more alerts in a low-volume scenario. Tuples that do not match the filter are dropped (although this can be changed by adding an output port for non-matching tuples).
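SetThreshold and TickFallOffFilter together reduce to one comparison: flag an hour when its count is below 0.9 × the average. With an average of 240 ticks per hour (one every 15 seconds) the cut-off is 216, so a drop of 25 ticks raises an alert; with an average of 10 ticks the cut-off is 9, so losing just two ticks does the same, which is why low-volume periods generate more alerts. A minimal Python sketch; `HourlyStat`, its fields, and `tick_fall_off` are hypothetical names:

```python
from typing import Iterable, Iterator, NamedTuple

THRESHOLD_FRACTION = 0.9  # alert when an hour falls below 90% of the average


class HourlyStat(NamedTuple):
    feed_name: str
    ticks_last_hour: int
    avg_ticks_per_hour: float


def tick_fall_off(stats: Iterable[HourlyStat],
                  fraction: float = THRESHOLD_FRACTION) -> Iterator[HourlyStat]:
    """Yield only the hours whose tick count is below fraction * average.

    Non-matching tuples are simply dropped, like the filter without an
    extra output port."""
    for stat in stats:
        if stat.ticks_last_hour < fraction * stat.avg_ticks_per_hour:
            yield stat


# Illustrative values only.
hours = [
    HourlyStat("LDN->NY", 215, 240.0),  # below 216: alert
    HourlyStat("LDN->NY", 239, 240.0),  # healthy
    HourlyStat("NY->TKY", 8, 10.0),     # below 9: alert in a low-volume feed
    HourlyStat("NY->TKY", 10, 10.0),    # healthy
]
alerts = list(tick_fall_off(hours))
```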

TickFallOffAlert: An alert from this application implies that the RS feed is stale and there is an issue with Replication Server. Any reporting on the replicate database should therefore be suspended until feed health is restored. The action is to resolve the usual replication issues (DBA territory).

ReplicatedTicksStats: This stream provides statistics on the feeds that a client application could use to visualise feed health. It could also write to a database if needed.
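If ReplicatedTicksStats is written to a database, each output tuple maps naturally to one row. The sketch below uses Python's built-in `sqlite3` purely as a stand-in (nothing here is specific to ASE or Sybase IQ); the columns follow the field names given for this stream (FeedName, StartOfTimeSlice, AvgTicksPerHour, StdevTicksPerHour, LastTicksPerHour), while the table name and the sample values are hypothetical:

```python
import sqlite3
from dataclasses import astuple, dataclass
from typing import Iterable


@dataclass(frozen=True)
class ReplicatedTicksStat:
    feed_name: str               # FeedName
    start_of_time_slice: str     # StartOfTimeSlice, ISO-8601 text for simplicity
    avg_ticks_per_hour: float    # AvgTicksPerHour
    stdev_ticks_per_hour: float  # StdevTicksPerHour
    last_ticks_per_hour: int     # LastTicksPerHour


def store_stats(conn: sqlite3.Connection, rows: Iterable[ReplicatedTicksStat]) -> None:
    """Create the (hypothetical) history table if needed and append rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS replicated_ticks_stats ("
        " feed_name TEXT, start_of_time_slice TEXT,"
        " avg_ticks_per_hour REAL, stdev_ticks_per_hour REAL,"
        " last_ticks_per_hour INTEGER)"
    )
    # Parameter placeholders avoid building SQL strings by hand.
    conn.executemany(
        "INSERT INTO replicated_ticks_stats VALUES (?, ?, ?, ?, ?)",
        [astuple(row) for row in rows],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
store_stats(conn, [ReplicatedTicksStat("LDN->NY", "2012-03-26T10:00:00", 237.5, 12.0, 180)])
```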

Downstream Event-Driven Activity:

The main alerting comes from the output of TickFallOffAlert, which will have subscribers (for example SMS, automated ticket raising, DBA dashboard, etc.). The other output, ReplicatedTicksStats, provides statistics such as FeedName, StartOfTimeSlice, AvgTicksPerHour, StdevTicksPerHour and LastTicksPerHour. This data can be stored in Sybase IQ or similar for historical analysis.
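The fan-out from TickFallOffAlert to its subscribers is a publish/subscribe pattern: every subscriber receives every alert. A minimal in-process Python sketch, with hypothetical names (`AlertChannel`, the alert keys) and plain lists standing in for real SMS, ticketing or dashboard integrations:

```python
from typing import Callable, Dict, List

Alert = Dict[str, object]
Handler = Callable[[Alert], None]


class AlertChannel:
    """In-process publish/subscribe: every subscriber receives every
    published alert, in subscription order."""

    def __init__(self) -> None:
        self._subscribers: List[Handler] = []

    def subscribe(self, handler: Handler) -> None:
        self._subscribers.append(handler)

    def publish(self, alert: Alert) -> int:
        """Deliver the alert to all subscribers; return how many received it."""
        for handler in self._subscribers:
            handler(alert)
        return len(self._subscribers)


channel = AlertChannel()
sms_outbox: List[str] = []   # stand-in for an SMS gateway
dashboard: List[Alert] = []  # stand-in for a DBA dashboard feed
channel.subscribe(lambda a: sms_outbox.append(f"RS feed {a['feed_name']} stale"))
channel.subscribe(dashboard.append)
delivered = channel.publish(
    {"feed_name": "LDN->NY", "ticks_last_hour": 150, "avg_ticks_per_hour": 240.0}
)
```

Adding another consumer, such as automated ticket raising, is just one more `subscribe` call; the alerting logic itself does not change.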


