
Using CEP to monitor replication heartbeat March 25, 2012

Posted by Mich Talebzadeh in Complex Event Processing.

There are “front office trading systems” that use bi-directional or peer-to-peer replication to access a local hub in different locations, say London, NY and Tokyo. The database in each hub is replicated to the other hubs and kept in sync via Sybase Replication Server. These trading systems are often characterised by moderate throughput; however, they have little tolerance for data latency. The health of such systems and the underlying technology (in this case Sybase Replication Server) is paramount.

Objective: Highlight the benefits of deploying Aleri CEP to continuously monitor the replicated delivery feed and raise alarms when alert conditions are met

Aimed at: Those enterprises that are contemplating using Aleri CEP as part of their strategy of moving towards an Event Driven Architecture and its obvious benefits

The Problem: When replicated delivery falls off unexpectedly, Replication Server may have an issue

The Solution: Continuously monitor the replicated delivery feed, calculate averages over specific time windows (one hour), compare them to a threshold, and generate alerts that are sent to subscribers such as ASE/RS monitoring tools, a DBA dashboard, Application Support, etc.

Why use Aleri CEP: This is a proof of concept, so it may not be the most cost-efficient solution. However, it has certain advantages in terms of throughput, reduced latency, and the fact that there is really no overhead of storing data in CEP itself: technically there is no physical or logical IO. I worked on this as a way of familiarising DBAs with it. Complex Event Processing (CEP) is *event driven and memory based*, with no requirement for either physical or logical IO. I will show a diagram of Aleri Studio, which helps to construct the streams and operators that process events as they come in.

So we have the following components in an event-driven processing pipeline:

Event Generators –> Event Channel –> Event Processing –> Downstream Event-Driven Activity

Event Generators: In this case this is the Replication Server heartbeat

Event Channel: This could be a direct read from the replicate database, a message queue, or a replication adapter for CEP

Event Processing: with the following breakdown of components:

ReplicatedTicksInput: This is the source stream for Replication Server heartbeat delivery in the replicate ASE. We periodically insert data into an ASE table; Replication Server takes that and, using function strings, delivers the data to a replicate database on the replicate ASE. The frequency of heartbeat inserts can vary throughout the day. During trading hours, the rate of ticks (inserts to the primary table) could be one every 15 seconds; during batch windows this could be relaxed to one row (tuple) every 30 seconds. In any case the Aleri engine can handle a large volume of data per second. So data gets into the replicate table and is streamed into Aleri via a message bus.
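
To make this concrete, here is a minimal Python sketch of what the heartbeat tuples could look like as they arrive on the source stream. The schema (FeedName, HeartbeatTime) and the feed name are my own illustrative assumptions, not the actual Aleri stream definition:

import datetime

# Simulate heartbeat tuples arriving on the ReplicatedTicksInput stream.
# FeedName and HeartbeatTime are assumed field names for illustration only.
def replicated_ticks_input(n=10, interval_seconds=15):
    t = datetime.datetime(2012, 3, 25, 9, 0, 0)
    for _ in range(n):
        yield {"FeedName": "LDN_HUB", "HeartbeatTime": t}
        t += datetime.timedelta(seconds=interval_seconds)  # one tick per 15s in trading hours

for tick in replicated_ticks_input(3):
    print(tick)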

LocalTime: This is a ComputeStream operator. It adds a single field, “LocalTime”, containing the current time. This time comes from CEP itself, not from the source stream.
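
In Python terms the operator is no more than this (a sketch, reusing the assumed tuple shape from above):

import datetime

# Append the engine's own clock time to each tuple; the source timestamps
# are left untouched.
def local_time_stream(tuples):
    for t in tuples:
        out = dict(t)
        out["LocalTime"] = datetime.datetime.now()
        yield out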

TicksPerHour: This is an AggregateStream operator that opens a one-hour window for each feed to determine a tuple count for that feed. The application could be done with a single 24-hour aggregate, but that would require the Aleri application to hold 24 hours of data in memory, which could be large and impractical. This aggregate reduces that need to a single hour of data.
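
A rough Python equivalent of this tumbling one-hour count, again using my assumed field names rather than the real stream schema:

from collections import defaultdict

# Count tuples per feed within one-hour tumbling windows, emitting a summary
# tuple whenever an hour boundary is crossed. Only the current hour is held
# in memory.
def ticks_per_hour(tuples):
    counts = defaultdict(int)
    current_slice = None
    for t in tuples:
        slice_start = t["LocalTime"].replace(minute=0, second=0, microsecond=0)
        if current_slice is None:
            current_slice = slice_start
        if slice_start != current_slice:
            for feed, n in counts.items():
                yield {"FeedName": feed, "StartOfTimeSlice": current_slice, "TicksPerHour": n}
            counts.clear()
            current_slice = slice_start
        counts[t["FeedName"]] += 1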

MeanPer24Hour: This aggregate operator opens a sliding window per feed to generate statistics (i.e., mean and standard deviation) over 24 time slices (from the previous aggregate). The assumption is that, on a Monday-to-Friday basis, the volume of ticks on a given day should be comparable with the next. Weekends are obviously an exception; for the sake of this demo they are ignored, although Aleri CEP can handle them as well (assuming the mean is built for weekends).
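
The sliding statistics can be sketched the same way; the output field names mirror the ones listed under the downstream activity below:

import statistics
from collections import defaultdict, deque

# Keep the last 24 hourly counts per feed and emit mean and standard
# deviation alongside the latest count.
def mean_per_24_hours(hourly_tuples):
    windows = defaultdict(lambda: deque(maxlen=24))
    for t in hourly_tuples:
        w = windows[t["FeedName"]]
        w.append(t["TicksPerHour"])
        yield {
            "FeedName": t["FeedName"],
            "StartOfTimeSlice": t["StartOfTimeSlice"],
            "LastTicksPerHour": t["TicksPerHour"],
            "AvgTicksPerHour": statistics.mean(w),
            "StdevTicksPerHour": statistics.stdev(w) if len(w) > 1 else 0.0,
        }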

SetThreshold: Using this operator we set a threshold alert for when ticks fall below the MeanPer24Hour figure (see below)

TickFallOffFilter: This filter stream allows tuples to pass through whenever the tuple count for an hour falls below 90% of the average (i.e., the replication heartbeat falls below 90% of the hourly average). This makes sense in an otherwise high-volume environment (say OLTP during trading hours), where there is a large count per hour, but it will generate more alerts in a low-volume scenario. Non-matching tuples are dropped (although this can be changed by adding an output port for them).
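
Both SetThreshold and this filter boil down to a one-line condition; a sketch, with the 90% figure as described above:

# Pass a tuple through only when the latest hourly count drops below 90%
# of the 24-hour average; everything else is silently dropped.
def tick_fall_off_filter(stats_tuples, threshold=0.9):
    for t in stats_tuples:
        if t["LastTicksPerHour"] < threshold * t["AvgTicksPerHour"]:
            yield t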

TickFallOffAlert: If an alert occurs in this application, it implies that the RS feed is stale and there is an issue with Replication Server. Therefore, any reporting on the replicated database should be suspended until the feed's health is restored. The action would be to resolve the usual replication issues (DBA territory).

ReplicatedTicksStats: This stream provides statistics on the feeds that could be used by a client application to visualise feed health. It could also write to a database if needed.

Downstream Event-Driven Activity:

The main alerting comes from the output of TickFallOffAlert, which will have subscribers (examples: SMS, automated ticket raising, DBA dashboard, etc.). Additionally, the other output, ReplicatedTicksStats, provides stats such as FeedName, StartOfTimeSlice, AvgTicksPerHour, StdevTicksPerHour, LastTicksPerHour, etc. This data can be stored for historical analysis in Sybase IQ.
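
A sketch of how such a fan-out to subscribers might look; the subscriber functions here are placeholders standing in for real SMS, ticketing or dashboard integrations:

# Fan alert tuples out to every registered subscriber. In a real deployment
# these callbacks would be replaced by SMS gateways, ticketing APIs, etc.
def sms_subscriber(alert):
    print(f"SMS: feed {alert['FeedName']} stale at {alert['StartOfTimeSlice']}")

def dashboard_subscriber(alert):
    print(f"DASHBOARD: {alert}")

def publish_alerts(alert_tuples, subscribers=(sms_subscriber, dashboard_subscriber)):
    for alert in alert_tuples:
        for subscriber in subscribers:
            subscriber(alert)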

 

CEP and IMDB Contrasts March 25, 2012

Posted by Mich Talebzadeh in Complex Event Processing.

Fundamentally, products like classic databases (DBMS), in-memory databases (IMDB) and Complex Event Processing (CEP) engines all work towards *maximising throughput and minimising latency*.

That is probably easier said than done.

A DBMS, including an IMDB, is charged with three basic tasks: it must be able to *store data*, keep that data, and take the data out and work with it. The keyword is storage of data, which involves a structure requiring physical and logical IO. In the case of an IMDB things are simplified (less complex indexes, as in Oracle TimesTen, or, in the case of ASE IMDB, simply a physical IO cost of zero for table access). However, the nature of the beast has not changed: data has to be stored, whether on disk or in memory, and has to be persistent. Access to data will require logical IO at a minimum.

In contrast, Complex Event Processing engines like Sybase Aleri, Oracle CEP or StreamBase CEP are *event driven and memory based*. There is no requirement for either physical or logical IO. In such scenarios data from various sources is streamed in, like a river so to speak, and you build logic to alert you when an event (as described before) is happening. Think of it as fishing for salmon: you may see an alert for salmon that turns out to be carp and choose to ignore it. CEP engines do not have a persistent database; they are not meant to store data (hence event-driven, on-the-spot decisions). They can have adapters to flush memory data out into something like Sybase IQ, as Sybase Aleri can do. As an analogy, in the UK most anglers have a bucket that keeps fish in it for a short while: they catch the fish, keep it in the bucket, take a few pictures and release the fish back to the stream (carp, normally; you do not get salmon in most British rivers except in Scotland). So the bucket could be the database, but it is not part of the process and you do not need it.

In trading, when you have streams of data coming into CEP with rules defined, as in algorithmic trading, the quotes or ticks are events in themselves. Think of the correlation of two instruments like Exxon and Shell: both are oil related and their share quotes/ratios are strongly correlated. Any significant divergence in the ratio, say outside the Bollinger bands (beyond the simple moving average (SMA) ± 2σ), may trigger a buy or sell condition for one or the other. In these scenarios one acts on alerts and may pull historical data from IQ to help with situation assessment.
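
As a rough illustration of that band check, here is a Python sketch; the period and band width are arbitrary choices for the example, not a recommendation:

import statistics
from collections import deque

# Flag any tick where the price ratio strays outside SMA ± 2 standard
# deviations, i.e. outside the Bollinger bands.
def bollinger_alerts(ratios, period=20, k=2.0):
    window = deque(maxlen=period)
    for ratio in ratios:
        window.append(ratio)
        if len(window) == period:
            sma = statistics.mean(window)
            sigma = statistics.stdev(window)
            if abs(ratio - sma) > k * sigma:
                yield ratio, sma, sigma  # divergence: candidate buy/sell signal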

What is Complex Event Processing (CEP) March 25, 2012

Posted by Mich Talebzadeh in Complex Event Processing.

Those who follow the general trends in the industry may recall the advent of Complex Event Processing (CEP). In a nutshell, CEP involves the continuous processing and analysis of high-volume, high-speed data streams from inside and outside an enterprise to detect business-critical issues as they happen, in real time.

Contrast this with the traditional processes involving database systems, which provide delayed analysis. An example of CEP would be real-time financial market data analysis and decision making, allowing traders or anyone else to make a decision on the spot based on real-time data. A prime example is a Forex system where, on the basis of certain indicators (say the moving average over the past 14 periods), you make a decision to buy or sell.
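
For instance, the 14-period moving average can be computed on the fly as quotes stream in; this is only a toy Python sketch of the idea, not a trading strategy:

from collections import deque

# Emit a naive BUY/SELL signal whenever the latest quote sits above or
# below its 14-period simple moving average.
def sma_signals(quotes, period=14):
    window = deque(maxlen=period)
    for quote in quotes:
        window.append(quote)
        if len(window) == period:
            sma = sum(window) / period
            if quote > sma:
                yield quote, sma, "BUY"
            elif quote < sma:
                yield quote, sma, "SELL"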

As an example, Sybase got into CEP through acquisition, rolling up two of the leading independent CEP providers: Aleri and Coral8. As such Sybase now has two CEP products: the Sybase Aleri Streaming Platform and Sybase CEP, the latter based, I believe, on the Coral8 product. Later on these two products may be integrated into one. Other vendors like Oracle have their own pet CEPs, much as with IMDBs.

CEP software offers two major components: a high-level language that lets programmers easily describe how to process the streams/messages, and an infrastructure engine for processing and analysing high-volume data streams. Although CEP software performs different functions, the component structure is somewhat analogous to database software, where there is a language (SQL) and an engine (the database server). The objective of CEP is to buy the product and save on a development cycle traditionally done by in-house developers.

I believe that in the next year or so CEP will be a product considered by many shops, and hence there will be inevitable infrastructure/DBA involvement. Both Sybase CEP products integrate with ASE, IQ and RAP, but they can be deployed on their own. They do not require the use of a database: both have their own purpose-built in-memory data manager as well as a disk-based storage manager for (optional) data recovery. But when integrated with an external DB (from Sybase or otherwise), the DB can be a source of data into the CEP engine, can be integrated “in line” with the CEP engine calling out to the DB to look up information, and can be a repository for output.

An interesting book on the subject is “The Power of Events: An Introduction to Complex Event Processing in Distributed Enterprise Systems” by David Luckham (ISBN 978-0201727890).