jump to navigation

Technical Architect and the challenge of Big Data Architecture March 23, 2015

Posted by Mich Talebzadeh in Big Data.

In the previous articles I gave an introduction to Big Data and Big Data Architecture. What makes Big Data Architecture challenging is the fact that there are multiple sources of data (Big Data Sources) and one storage pool, namely the Apache Hadoop Distributed File System (HDFS) as part of Data conversion and storage layer as shown in diagram 1 below.


This makes the job of Technical Architect very challenging. I believe Technical Architect is an appropriate combined term for someone who deals with Data Architecture and Technology Architecture simultaneously. I combined these two domains together because I personally believe that in almost most cases there is a strong correlation between these two domains! TOGAF defines Data Architecture as “the structure of an organization’s logical and physical data assets and data management resources”. Further it defines Technology Architecture as “the software and hardware capabilities that are required to support the deployment of business, data and application services, including IT infrastructure, middleware, networks, communications, processing, and standards”. Putting semantics aside, I have never come across a Data Architect who is ambivalent to the underlying technology and neither to a Technologist who does not care about the type of data and the network bandwidth that is required to deliver that data. Sure plenty of people who seem to spend most of their time creating nice looking diagrams and quoting a phrase from George Orwell’s Animal Farm as soon as these diagrams were ready they were burnt in the furnace.

Anyway back to the subject, the sources of data have been around for a long time. What is the new kid in class is Hadoop. To be sure Hadoop has not been around for long. There are who of the opinion that Big Data is still more hype than reality, but agree that Hadoop might be a keeper.  It would only be the 2nd example in the IT industry, though, of a standard promoted by a giant industry player becoming universal, the 1st being Java of course as Sun Microsystems was a giant in its time, amazing though that now seems.

Big Data Architecture is all about developing reliable and scalable data pipelines with almost complete automation. That is no small feat as the Technology Architect requires deep knowledge of every layer in the stack, beginning with identifying the sources and quality of data, Data ingestion and conversion processes, the volume of data and the requirement for designing the Hadoop cluster to cater for that data. Think about it as coming up for each data input what needs to be done to factorize the raw data into relevant information, the information which in all probability has to confirm to Master Data Model, has to be tailored to individual consumption layer, with all access and security needed for it.

It would be appropriate here to define broadly what Hadoop is all about.


Think about a relational database management system (RDBMS) first as it is more familiar. Examples are Oracle or SAP Sybase ASE. RDBMS manages resources much like an operating system. It is charged with three basic tasks:

  • It must be able to put data in
  • keep that data
  • Take the data out and work with it.

Unlike an RDBMS, Hadoop is not a database. For example, it has basic APIs to put data in. It certainly keeps the data but again has very basic APIs to take that data out. So it is an infra-structure, it is also called a framework or Ecosystem. Its native API commands are similar to Linux commands. Case in point, in Linux you get the list of root directories by issuing an ls / command. In HDFS (see below) you get the same list by issuing hdfs dfs -ls / command.

Hadoop consists of two major components. These are:

  1. A massively scalable Hadoop Distributed File System (HDFS) that can support large amount of data on a cluster of commodity hardware (JBOD)
  2. A massively Parallel Processing System called MapReduce Engine, based on MapReduce programming model for computing results using a parallel, distributed algorithm on a cluster.

In short Hadoop allows applications based on MapReduce to run on large clusters of commodity hardware scaled horizontally to speed up computations and reduce latency. Compare this architecture to a typical RDBMS which is scaled vertically using a Symmetric Multi processing (SMP) architecture in a centralised environment.

Understanding Hadoop Distributed File System (HDFS)

HDFS is basically a resilient and clustered approach to managing files in a big data environment. Since data is written once and read many times after, compared to the transactional databases with constant read writes, HDFS lend itself to supporting big data analysis.


As shown in figure 1 there is a NameNode and multiple DataNodes running on a commodity hardware cluster. The NameNode holds the metadata for the entire storage including details of where the blocks for a specific file are kept, the whereabouts of multiple copies of data blocks and access to files among other things.

It is akin to dynamic data dictionary views in Oracle and SYS/MDA tables in SAP ASE. It is essential that the metadata is available pretty fast.  NameNode behaves like an in-memory database IMDB and uses a disk file system called the FsImage to load the metadata as startup. This is the only place that I see value for Solid State Disk to make this initial load faster. For the remaining period until HDFS shutdown or otherwise NameNode will use the in memory cache to access metadata. Like an IMDB, if the amount of metadata in NameNode exceeds the allocated memory, then it will cease to function. Hence this is where large amount of RAM helps on the host that NameNode is running. With regard to sizing of NameNode to store metadata, one can use the following rules of thumb (heuristics):

  • NameNode consumes roughly 1 GB for every 1 million blokes (source Hadoop Operations, Eric Sammer, ISBN: 978-1-499-3205-7). So if you have 128 MB block size, you can store 128 * 1E6 / (3 *1024) = 41,666 GB of data for every 1 GB. Number 3 comes from the fact that the block is replicated three times (see below). In other words just under 42TB of data. So if you have 10 GB of NameNode cache, you can have up to 420 TB of data on your DataNodes


Data nodes provide the work horse for HDFS. Within the HDFS cluster data blocks are replicated across multiple data nodes (default three, two on the same cluster, one on another) and the access is managed by name nodes. Before going any further let us look at data node block sizes.

HDFS Block size

Block size in HDFS can be specified per file by the client application. A client writing to HDFS can pass the block size to the NameNode during its file creation request, and it (the client) may explicitly change the block size for the request, bypassing any cluster-set defaults.

The default block size of a client, which is used when a user does not explicitly specify a block size in their API call can be defaulted to the value specified by the local configuration parameter fs.block.size (in bytes). By default this value is set to 128MB (it can be 64MB, 256MB and I have seen 512MB). This block size is only applicable to DataNodes and has no bearing on NameNode.

It would be interesting to make a passing comparison to a typical block size in RDBMS. Standard RDBMS (Oracle, SAP ASE, MSSQL) have 8K per block or page. I am ignoring multi-block size or larger pool sizes for now. In simplest operation every single physical IO will fetch 8K worth of data. If we are talking about Linux, the OS block size is 4K. In other words in every single physical IO, we will fetch two OS blocks from the underlying disk into memory. In contrast, with HDFS a single physical IO will fetch 128*1024K/4K = 32,768 OS blocks. This shows the extreme coarse granularity of HDFS blocks that is suited for batch reads. How about the use of SSD here? Can creating HDFS directories on SSD are going to improve performance substantially. With regard to data nodes, we are talking about few writes (say bulk-insert) and many more reads. Let us compare SSD response compared to magnetic spinning hard disks (HDD) for bulk inserts. Although not on HDFS, I did tests using Oracle and SAP ASE RDBMS for bulk inserts and full table scans respectively as described below

 SSD versus HDD for Bulk Inserts

 SSD faired reasonably better compared to HDD but not that much faster. The initial seek time for SSD was faster but the rest of the writes were sequential extent after extent which would not much different from what HDD does.

In summary:

  • First seek time for positioning the disk on the first available space will be much faster for SSD than HDD
  • The rest of the writes will have marginal improvements with SSD over HDD
  • The performance gains I got was twice faster for SSD compared to HDD for bulk inserts

SSD versus HDD for full table scans

Solid state devices are typically twice here. When the query results in table scans, the gain is not that much. The reason being that HDD is reading sequentially from blocks and the seek time impact is considerably less compared to random reads.

In summary Solid State Disks are best suited for random access via index scan as seek time disappears and accessing any part of the disks is just as fast. When the query results in table scans, the gain is not that much. In HDFS practically all block reads are going to be sequential so having Solid State Disks for data nodes may not pay.

How does HDFS keep track of its data

Let us take an example of HDFS configured with 128MB of block size. So for every 1GB worth of data we consume eight blocks to store data. However, we will need redundancy. In RDBMS redundancy comes from mirroring using SAN disks and having DR resiliency in case of data centre failure. In HDFS the same comes from replicating every block that makes up the file, throughout the cluster in case of single point of failure by a node. So now that blocks are replicated how do we keep track of them? Recall that I mentioned NameNode and metadata. That is where all this info comes. In short, NameNode records:

  • The entity life history of the files, when they were created, modified, accessed, deleted etc
  • The map of all blocks of the files
  • The access and security of the files
  • The number of files, the overall, used and remaining capacity of HDFS
  • The cluster, the number of nodes, functioning and failed
  • The location of logs and the transaction log for the cluster

Unlike NameNode, DataNodes are very much like the workers. However, they do provide certain functions as below:

  • DataNode stores and retrieves the blocks in the local file system of the node
  • Stores the metadata of the block in the local file system, based on the metadata info from the NameNode
  • Does periodic validation of file checksums
  • Sends heartbeat and availability of blocks to NameNode
  • Directly can exchange metadata and data with the client application programs
  • Can forward data to the other DataNodes

Hadoop MapReduce

The second feature of Hadoop is its MapReduce engine. Why it is called engine, it is because it is an implementation of MapReduce algorithm. It is designed to processing large data sets in parallel across a Hadoop cluster. MapReduce as name sounds is a two step process, namely map and reduce. The job configuration supplies map and reduce analysis functions and the Hadoop engine provides the scheduling, distribution, and parallelization services. During the map phase, the input data is divided into input splits for analysis by map tasks running in parallel across the Hadoop cluster. That is the key point in Massive Parallel Processing (MPP). By default, the MapReduce engine gets input data from HDFS. The reduce phase uses results from map tasks as input to a set of parallel reduce tasks. The reduce tasks consolidate the data into final results. By default, the MapReduce engine stores results in HDFS.  Although the reduce phase depends on output from the map phase, map and reduce processing is not necessarily sequential. That is, reduce tasks can begin as soon as any map task completes. It is not necessary for all map tasks to complete before any reduce task can begin. Let us see this in action.

In below example I have a table tdash in HDFS that I imported it from Oracle database via Sqoop and Hive (will cover them in detail later). Sqoop (SQL to Hadoop) uses JDBC drivers to access relational database like Oracle and SAP ASE and stores them in HDFS as text delimited files. Hive software facilitates creating, querying and managing large datasets in relational table format residing in HDFS. So with Sqoop and Hive combined, you can get a table and its data from RDBMS and create and populate it in HDFS in relational format. Hive has a SQL like language similar to Transact SQL in SAP ASE called HiveQL. Now if you run a HQL query on Hive, you will see that it will use MapReduce to get the results out. You can actually create an index in Hive as well but it won’t be a B-tree index though (and at current state it will not be used by the Optimizer).  Diagram two below shows the steps needed to import an RDBMS table into HDFS. I will explain this diagram in later sections.


This table has 4 million rows and the stats for this table in Hive can be obtained believe it or not from update statistics command (BTW very similar to old Oracle one)

hive> analyze table tdash compute statistics;
Hadoop job information for Stage-0: number of mappers: 121; number of reducers: 0
2015-03-24 21:10:05,834 Stage-0 map = 0%,  reduce = 0%
2015-03-24 21:10:15,366 Stage-0 map = 2%,  reduce = 0%, Cumulative CPU 6.68 sec
2015-03-24 21:10:18,522 Stage-0 map = 6%,  reduce = 0%, Cumulative CPU 24.72 sec
2015-03-24 21:10:28,896 Stage-0 map = 7%,  reduce = 0%, Cumulative CPU 34.02 sec
2015-03-24 21:19:04,265 Stage-0 map = 100%,  reduce = 0%, Cumulative CPU 326.5 sec
MapReduce Total cumulative CPU time: 5 minutes 26 seconds 500 msec
Ended Job = job_1426887201015_0016
Table oraclehadoop.tdash stats: [numFiles=4, numRows=4000000, totalSize=32568651117, rawDataSize=32564651117]
MapReduce Jobs Launched:
Stage-Stage-0: Map: 121   Cumulative CPU: 326.5 sec   HDFS Read: 32570197317 HDFS Write: 10164 SUCCESS
Total MapReduce CPU Time Spent: 5 minutes 26 seconds 500 msec

So it tells us that this table has 4 million rows (numRows=4000000) and is split into 4 files (numFiles=4). The total size is 30GB has shown below. Total size is quoted in bytes

0: jdbc:hive2://rhes564:10010/default> select (32568651117/1024/1024/1024) AS GB;


|         gb         |


| 30.33192001003772  |


FYI this table in Oracle is 61GB without compression. However, it is good that we have compression in HDFS. If I work this one out this table should table around 30*8 HDFS blocks on the assumption that each block is 128MB and 1 GB data takes around 8 blocks

Now let us run a simple query and see MapReduce in action

0: jdbc:hive2://rhes564:10010/default> select count(1) from tdash
. . . . . . . . . . . . . . . . . . .> where owner = 'PUBLIC'
. . . . . . . . . . . . . . . . . . .> and object_name = 'abc'
. . . . . . . . . . . . . . . . . . .> and object_type = 'SYNONYM'
. . . . . . . . . . . . . . . . . . .> and created = '2009-08-15 00:26:23.0';
INFO  : Hadoop job information for Stage-1: number of mappers: 121; number of reducers: 1
INFO  : 2015-03-24 21:38:40,015 Stage-1 map = 0%,  reduce = 0%
INFO  : 2015-03-24 21:39:01,523 Stage-1 map = 1%,  reduce = 0%, Cumulative CPU 18.53 sec
INFO  : 2015-03-24 21:39:03,575 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 18.74 sec
NFO  : 2015-03-24 21:40:31,761 Stage-1 map = 18%,  reduce = 0%, Cumulative CPU 79.15 sec
INFO  : 2015-03-24 21:40:37,948 Stage-1 map = 19%,  reduce = 6%, Cumulative CPU 83.21 sec
INFO  : 2015-03-24 21:40:44,129 Stage-1 map = 20%,  reduce = 6%, Cumulative CPU 88.91 sec
INFO  : 2015-03-24 21:40:47,197 Stage-1 map = 20%,  reduce = 7%, Cumulative CPU 90.13 sec
INFO  : 2015-03-24 21:47:23,923 Stage-1 map = 97%,  reduce = 32%, Cumulative CPU 401.5 sec
INFO  : 2015-03-24 21:47:31,207 Stage-1 map = 98%,  reduce = 32%, Cumulative CPU 405.54 sec
INFO  : 2015-03-24 21:47:32,237 Stage-1 map = 98%,  reduce = 33%, Cumulative CPU 405.54 sec
INFO  : 2015-03-24 21:47:37,356 Stage-1 map = 99%,  reduce = 33%, Cumulative CPU 407.87 sec
INFO  : 2015-03-24 21:47:41,452 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 409.14 sec
INFO  : 2015-03-24 21:47:42,474 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 409.88 se
INFO  : MapReduce Total cumulative CPU time: 6 minutes 49 seconds 880 msec
INFO  : Ended Job = job_1426887201015_0017

| _c0  |
| 0    |

The query started with 121 mappers and one reducer. I expected zero rows back. However, you will notice that after doing 19% mapping, the reduce process started at 6% and finally both map and reduce processes finished at 100% after 6 minutes and 49 seconds. This shows that reduce process does not need for mappers to complete before it starts.

Planning for Hadoop clustering

The philosophy behind big data is horizontal scaling so this embraces dynamic addition of nodes to a cluster depending on the famous three Vs; namely Volume, Velocity and Variety of data. Planning Hadoop cluster is not much different from planning for Oracle RAC or SAP Shared Disk Cluster. In contrast to these two products where the databases are in one place, HDFS distributes blocks of data among Data Nodes. The same resiliency and DR issues apply to Hadoop as much as Oracle and SAP ASE classic databases.

Hadoop runs on commodity hardware. That does not necessarily mean that you can deploy any hardware. Commodity hardware is a relative term. Sure you will not need to invest in SAN technology as you can get away with using JBODs, so there is relative savings to be made in here. For Hadoop to function properly, you will need to consider the hardware for NameNodes (AKA masters) and DataNodes (AKA workers). The master nodes require very good resiliency as a failure of a master node can disrupts the processing. In contrast, worker nodes that hold data can fail as there is in-built redundancy of data across workers.

Hardware for NameNodes

NameNodes hold metadata for HDFS data plus job scheduler, job tracker, worker nodes heartbeat monitor and secondary NameNode. So it is essential that redundancy is built into NameNode hardware. Bear in mind that the metadata is cached in memory. To this end the NameNode host must have enough RAM and pretty fast disks for start-up load. In above I recommended having SSD disks for NameNodes which I think would be beneficial to performance. NameNode will be memory hungry and low on disk storage. The OS disk needs to be mirrored internally (a default configuration) in all cases. The metadata will include filename, distribution of blocks and the location of replicas for each block, permissions, owner and group data. Bear in mind that longer filenames will occupy more bytes in memory. In general the more files the NameNode needs to track the more metadata it needs to maintain and more memory will be consumed. Depending on the size of the cluster if we are talking about 10-20 DataNodes, and assuming that 1GB of NameNode can store up to 42 TB of metadata, then we are looking at 420 to 840 TB of data storage so any host with 24 GB of RAM with quad-core, 1Gbit Ethernet connection and two JBOD drives will do, in addition to two disks that will be used for Operating System and mirror. Larger clusters will require NameNode with 48, 96 up to 128 GB of RAM or more.

Data Ingestion

Once we have pinpointed the source of data, Technical Architect needs to classify the characteristics of the data that must be processed according to each application. Data regardless of its source will have common characteristics:

  • Data origin – front office, middle office, back office, external providers, machine generated, email, intranet, wiki, spreadsheets, logs, audio, legacy, ftp
  • Data format – structured, unstructured
  • Data frequency – end of day, intra-day (on demand), continuous, time series
  • Data type – company data, customer reference data, historical, transactional
  • Data size – the volume of data expected from the silos and their daily growth. Your friendly DBAs should be able to provide this information plus various other administrators. You will need this information to assess the size of Big Data repository and its growth
  • Data delivery – How the data needs to be processed at the consumption layer

There is a terminology used for manipulating data. It is referred to as Data Ingestion. Data ingestion is the process of getting and processing data for later use or storage. The key point is processing these disparate data. Specifically Data Ingestion includes the following:

  • Data Identification – identifying different data formats for structured data and default formats for unstructured data.
  • Data filtration – only interested in data relevant to master data management (MDM) needs
  • Data validation – continuous validation of data against MDM
  • Data pruning – removing unwanted data
  • Data transformation – migrating from local silo schema to MDM
  • Data Integration – integrating the transformed data into Big Data

Real time data Ingestion

Real time reporting is becoming increasingly important. This has been around as a continuous real time log based transactional data capture. One of the challenges would be to implement it to deliver real time data to Hadoop. I am currently actively involved in getting this tested and production ready for a solution and I hope to cover it later.

Batch data Ingestion

The most straight forward case is getting data from structured data sources. Think about RDBMS data as source. In all probability you are familiar from getting data from RDBMS to another RDBMS. For example you can text delimited files to get data out of say SAP ASE and import it into Oracle (BCP data out from ASE, put data in using SQLLDR).

The only difference here is that the destination is Hadoop. Currently the most widely used tool is called Sqoop that uses JDBC to get data out of RDBMS and import it into HDFS in different file formats. I will discuss this later. Sqoop can also do incremental import the so called incremental append by checking the last unique identifier of rows and importing the new ones. It will effectively use SQL like SELECT * FROM <TABLE> WHERE IDENTIFIER > SOME_VALUE. However, this will not address the deletes and updates unless the full data is overwritten which will be time consuming if the underlying table is very large.

To be continued …

Big Data Architecture February 19, 2015

Posted by Mich Talebzadeh in Big Data.
1 comment so far

Data sources

I mentioned before that there are sources and consumers of big data and a big bridge in between. Let us take a heuristic approach and see what we mean by big data sources and the consumption layer. This is depicted in the diagram below:


This layer provides the foundation and data sources necessary for a data driven enterprise. From the figure, let us have a look at the sources of data. Clearly we can see that we have varieties. We have structured data, mainframe data (ISAM, VSAM all that) in the form of legacy systems, we have text, images, spreadsheets and documents, intranet, XML and message bus data (TIBCO, MQ etc). While looking at more data is valuable and complex, looking at more data sources will likely add value to business.

The job of analysing the whole of this disparate data is complex. However, companies believe that it will open new challenges and revenues. In short they want to convert this enormous heap to information. Remember data on its own is practically worthless until it can be analysed and turned into information. So we must keep in mind that the ultimate purpose of data is to support business needs.

We are all familiar with the structured data that we keep in relational databases in one shape and form. Unstructured data such as logs, emails, blogs, conference recordings, webinars etc are everywhere often hidden below layers of heap, babble, misspelled names etc that make it difficult to store and turn it into information.

Can a relational database be made to deal with unstructured data? Well not very likely. Recall that we still have capacity and latency issues with image and text fields stored in relational databases. Those who are familiar with relational databases know full well that replication of text and image columns (clob and blob) create challenges for replication server in Sybase and materialized views in Oracle, notably the added latency and backlog to push the data. Simply put, relational databases were not built to process large amount of unstructured data in the form of texts, images etc.

How about Data warehouses based on columnar models of relational databases? Again they lack the build to provide meaningful and efficient analysis of un-structured data.

Data conversion and storage layer

Now that we have identified the source of big data, we will need to identify the characteristics of the data that must be processed according to each application. Data regardless of its source will have common characteristics:

Data origin – front office, middle office, back office, external providers, machine generated, email, intranet, wiki, spreadsheets, logs, audio, legacy, ftp
Data format – structured, unstructured
Data frequency – end of day, intra-day (on demand), continuous, time series
Data type – company data, customer reference data, historical, transactional
Data size – the volume of data expected from the silos and their daily growth. Your friendly DBAs should be able to provide this information plus various other administrators. You will need this information to assess the size of Big Data repository and its growth
Data delivery – How the data needs to be processed at the consumption layer

There is a terminology used for manipulating data. It is referred to as Data Ingestion. Data ingestion is the process of getting and processing data for later use or storage. The key point is processing these disparate data. Specifically Data Ingestion includes the following:

Data Identification – identifying different data formats for structured data and default formats for unstructured data.
Data filtration – only interested in data relevant to master data management (MDM) needs, see below
Data validation – continuous validation of data against MDM
Data pruning – removing unwanted data
Data transformation – migrating from local silo schema to MDM
Data Integration – integrating the transformed data into Big Data

These are shown in diagram 2 below. The golden delivery is data according to Master Data Management (MDM). MDM is defined as a comprehensive method of enabling an organisation to link all of its critical data to one file, called a master file that provides a common point of reference. It is supposed to streamline data sharing among architectures, platforms and applications. For me and you it is basically the enterprise wide data model and what data conversion is ultimately trying to achieve is to organize data of different structures into a common schema. Using this approach, the system can relate myriad of data including structured data such as trading and risk data and unstructured data such as product documents, wiki, graphs, power point presentations and conference recordings.

So in summary what are the objectives of an enterprise wide data model? Well to run any business, you manipulate data at hand to make it do something specific for you; to change the way you do business, you take data as-is and determine what new things it can do for you. These two approaches are more powerful combined than on their own.

The challenge is to bring the two schools of thoughts together. So, it does make sense to create a common architecture model to leverage all types of data in the organisation to open new opportunities.


Within the storage layer, you will see we have Hadoop Distributed File System (HDFS) which can be used by a variety of NoSQL databases, plus Enterprise data warehouse.

For majority of technologists, I thought it would be useful to mention few words on HDFS and how it can be used. Hadoop is a framework for distributed data processing and storage. It is not a database, so it lacks functionality that is necessary in many analytics scenarios. Fortunately, there are many options available for extending Hadoop to support complex analytics, including real-time predictive models. So what are these options?

These are collectively called NoSQL databases. No does not mean that some cannot use SQL. It means Not Only SQL databases. Why is that? It is because SQL is by far the most widely used language to get data out of databases. Generations of developers know and use SQL. So it makes sense for tools that deal with data stored in Hadoop to use a dialect of SQL.

In general, a NoSQL database relaxes the rigid requirements of a traditional RDBMS (meaning ACID properties) by following CAP theorem (Consistency, Availability, Partition) by compromising Consistency in favour of Availability and Partition and can implement new usage models that require lower latency and higher levels of scalability. NoSQL databases come in four major types:

Key/Value Object Store – is the simplest type of database and can store any kind of digital object. Each object is assigned a key and is retrieved using the same key. Since you can only retrieve an object using its associated key, there is no way to search or analyze data until you have first retrieved it. Data storage and retrieval are extremely fast and scalable, but analytic capabilities are limited. Examples of key/value object stores include MemcacheDB, Redi and Riak.

Document Stores– These are a type of key/value store in which documents are stored in recognizable formats and are accompanied by metadata. Because of the consistent formats and the metadata, you can perform search and analysis without first retrieving the documents. Examples of document stores include MarkLogic, Couchbase and MongoDB.

Columnar Databases – These provide varying degrees of row and column structure, but without the full constraints of a traditional RDBMS. You can use a columnar database to perform more advanced queries on big data. Examples of columnar databases include Cassandra and HBase.

Graph Databases – These store networks of objects that are linked using relationship attributes. For example, the objects could be people in a social network who are related as friends, colleagues or strangers. You can use a graph database to map and quickly analyze very complex networks. Examples of graph databases include InfiniteGraph, Allegro and Neo4J.

Analysis layer

The analysis layer is responsible for transforming the data from Big Data to the Consumption layer. In other words this is the place where we need to make business sense of collected data. Analysis layer will have to adopt different approaches to take advantages of Big Data. The approach will have to include both business as usual analytics used in data warehouses plus innovative solutions for unstructured data. The fundamental difference between something like data warehouse and Hadoop is that a typical data warehouse is scaled vertically using a Symmetric Multi processing (SMP) architecture in a centralised environment, whereas Hadoop based tools use Hadoop distributed file system (HDFS) to physically distribute unstructured data based on Massive Parallel Processing (MPP) through MapReduce Engine.

Within the Analysis layer we need to identify what usage we are going to make of data collected in HDFS and enterprise data warehouse. The bulk of analysis tools should be able to answer the following:

What happened -This is the domain of traditional data warehouses. The information can be augmented by using query engines against HDFS for reduced query time

Why did it happen – What can we learn? Traditional BI engines and Decision Management tools can help.

What will happen in short, medium and long term – Pretty self explanatory. Event scoring and Recommendation Engines based on Predictive and Statistical Models can help.

Added value – How can we make it happen to benefit the business? Decision Management engines and Recommendation Engines can be deployed

In addition we need to process real time notifications and event driven requests which are typically stream based and may require Complex Event Processing (CEP) engines such as SAP Event Stream Processor or Oracle CEP.

Modelling can deploy pricing/quant’s models together with statistical models. These are shown in diagram 3 below.


Finally the whole of Big Data Architecture is shown in diagram 4 below.


Introduction to Big Data February 19, 2015

Posted by Mich Talebzadeh in Big Data.
add a comment

Big Data is nowadays a vogue of the Month or the year. I recall around 10 years ago Linux was in the same category as well. Anyway we have got used to these trends. Bottom line Big Data is a trendy term these days and if you don’t know about it and you are in I.T. then you are missing the buzz.

Anyway let us move on and separate the signal from the pedestal so to speak

According to Wikipedia, Big Data is defined as and I quote:

“Big Data is a broad term for data sets so large or complex that they are difficult to process using traditional data processing applications. Challenges include analysis, capture, curation, search, sharing, storage, transfer, visualization, and information privacy.”

Right, what does so large and complex mean in this context? Broadly it refers to the following characteristics:

Extremely large Volume of data:

– Data coming from message buses (TIBCO, MQ-Series etc)
– Data coming from transactional databases (Oracle, Sybase, Microsoft and others)
– Data coming from legacy sources (ISAM, VSAM etc)
– Data coming from unstructured sources such as emails, word and excel documents etc
– Data coming from algorithmic trading through Complex Event Processing
– Data coming from Internet (HTML, WML etc)
– Data coming from a variety of applications

OK I used the term unstructured above. So what is that really? First for me structured data means anything kept in a relational database and accessed via Structured Query Language (SQL). So there we go the word structured popped up. SQL is very structured. Get it wrong or misspell or misplace anything and it won’t work!

Now what does the term unstructured data mean in this context? It basically means that its structure is not predictable or it does not follow a specified format. In other words it does not fit into relational context. I am sure some puritans will disagree with me on that point.

Extremely High Velocity:

Data is streaming in at unprecedented speed and must be dealt with in a timely manner. This could be batch or real time

Extremely wide variety of data:

OK, if you look at points under extremely large Volume of data, you will see the variety of structured and unstructured data that Big Data deals with.

Traditionally over the past 20-30 years the bulk of data used in many industries including Financial sector is stored in relational databases in structured format. Yes the one with tables and columns, primary keys, indexes, third normal form and stuff that originated by Dr Codd in 1980. Before that they were hierarchical and network databases but we won’t go into that. The current estimate is that around 20-30% of available data is stored in structured format and the rest are unstructured. Unstructured refers to piece of data that either does not have a pre-defined data model or is not organized in a pre-defined manner. You can pick them up from the above list.

So if you are coming from relational database management background like myself, you may be correctly asking yourself, why we need all this when we have reliable, time tested relational databases (RDBMS)? The answer is something that probably you already know. In short not every data fits into a structured mold and the unstructured data accounts for 70% of the data!

You may argue that we have data warehouses these days. However, recall that data warehouses like SAP Sybase IQ is a columnar implementation of relation model. In contrast, databases like Oracle and SAP Sybase ASE are row based implementation of relational model. Both Oracle and SAP Sybase ASE nowadays are offering an in-memory database as well. Oracle with version 12c about to be offering a columnar in-memory implementation of database but it will still use the same row based optimizer that has been in the heart of Oracle for many years.

Bottom line the data that enterprises hold are in different silos/databases and in different form and shapes, even the relational schemas are often local and reflect the legacy data model and they do not adhere to a common enterprise wide data model.

Let me give you an example. Imagine a Bank wants to know its total exposure to certain country today as part of its risk analysis. The position is not clear and is kept in different applications within the bank. Often these applications feed each other one on one. In other words, moving numbers from A to B and back again. A rather simplistic but surprisingly accurate summary!

That does not help to solve the problem. May be the bank has lent directly to this country. It may also have dealings with other banks and institutions that have large exposure to the said country. So where is the aggregate risk position as of now for the bank? It cannot go on shifting spreadsheets from one silo to another. Even if it did that it would take ages and will never be real time.

Banks are further under regulatory pressure to abide by many accounting and regulatory standards and will need IT transformation. One of these would be abiding by IFRS 9 Financial Instruments (regulation) under Risk and Legal Entity. These will all require (eventually) a golden source of data to get the required information.

Big Data is not a single technology but incorporates many new and exitsting technologies. It is not just just about having something like Hadoop Framework in place where you can use it as a big massive kitchen sink of all data. It is a myriad of technologies all working together. Ignoring the complexity of middle layers, we should think about the Big Data sources at one end and the so called Consumption layer at the other end.

In-Memory Data Grid (IMDG), why do we need it and how different it is from an In-Memory Database January 16, 2015

Posted by Mich Talebzadeh in Data Grid.
add a comment


Relational databases have been the backbone of data storage for the majority of data stored in computers since 1980s. They are extensively used in many domains including financial sector.

My experience is heavily biased towards Banking. So I can only bore you from what I know. Throughout my long exposure from Front Office trading systems to back office and reconciliation, I have come to the conclusion that banking is all about moving numbers from A to B and back again. A rather simplistic but surprisingly accurate summary! Think of it as coping data from one silo to another and in between make sure that extract, transform and load (call it ETL if you like) will work fine.

For years this has been around. One database, two tier or three tier application in a client-server environment. If you had bandwidth and latency issues you would have used some form of replication. Replication is the process of sharing data at multiple databases and keeping them in sync. Products like Oracle Golden Gate, Sybase Replication server or simply Materialized Views are still commonly used to replicate data among multiple geographical locations or among heterogeneous databases. So these were the original data distributed set-ups. Replication is primarily used to address the bandwidth and latency issues. However, the access path with associated cost such as Physical IO, Logical IO, LRU-MRU chain etc would still be there.

The fundamental concern with the traditional data fetch from disk resident “classic” databases is that a significant portion of time is consumed by what is referred to as the data access overhead, where a service or application spends the majority of its effort retrieving the relevant data and often transforming the data to internal data structures. I call this bucketing. Think of it as the fruit of a developer who writes a report to extract end of day or intra-day reports, gets the data out of the database, stores it in C# or Java objects (i.e. mapping from a relational model to an object model) and finally pushes that data or rather cooked data (raw data with added bells and whistles) into another database, yet ready for another report. I am sure most of you are familiar with this process. This process may involve literally hundreds of reports/extracts running against the primary or reporting databases, multiple polling and generally a gradual impact on the throughput and performance of the systems.

Another caveat is the fact that often individual applications have their own specific schema or data model that may have to go through a fair bit of transformation to be of value to the target system. For example, you may find dates that are cryptic and need to change to standard format.

The Approach

Delivering the real-time responsiveness required of modern multi-tier applications requires a new set of data services. These data services must be delivered through a complementary architecture that leverages the data within relational databases, but crucially optimizes its delivery for real-time application requirements. Remember we still need a persistent data store for permanent storage (the so called golden source) and audit requirements so the database is not going to go away. It may move away from relational to NoSQL databases but it still has to store data one way or other.

We need to consider the following challenges:

  1.  Mapping from relational to object model
  2. Create a schema or model for wider needs
  3. Eliminate data access path overhead of typical of databases
  4. Move from the traditional client/server model to Service Oriented Architecture
  5. Use some form of in-memory caching. By caching data in memory across many servers instead of within a single database server, distributed caching eliminates performance bottlenecks and minimizes access times.
  6. Resiliency is provided by having in-memory data on multiple servers kept in sync
  7. In-memory Data grid automatically scales its capacity and handles network or server failures. It also provides APIs for coordinating data access and asynchronously notifying clients. To this data grid adds powerful tools, such as parallel query and map/reduce, to give applications the insights they need for fast, in-depth analysis.
  8. We still have to content with legacy/local databases. Remember that most applications are legacy and they will have to use legacy/local databases. The cost to rewrite these types of applications may be prohibitive unless they are decommissioned or replaced.

Index Join in Oracle and the equivalent Index Intersection in Sybase ASE September 1, 2012

Posted by Mich Talebzadeh in Oracle and Sybase.
1 comment so far

Both Oracle and Sybase ASE provide multipole access paths to the optimizer. Many people have this perception that for a given table only one index can be used by the optimizer. Although this is a widely expected outcome and it holds true for most cases, under certain conditions where the optimizer decides that it does not need to access the base table, more than one index on the same table can be used to satisfy the query.

Consider a simple case of a select list returning the values for a column or columns based on columns used in a where caluse.  In olther words the select list is entirely satisfied by the columns in the where clause. Let us look at a simple example in each case

Oracle (11gR2)

First create the table:

CREATE TABLE index_joins AS
 (ROWNUM + trunc(dbms_random.value(0,1000))) AS c2,
 LPAD('A',255,'A') AS c3
 LEVEL <=10000;
CREATE UNIQUE INDEX ind_c1 ON index_joins(c1);
CREATE INDEX ind_c2 on index_joins(c2);

Let us query this table to get columns c1 and c2

alter system flush buffer_cache;
alter system flush shared_pool;
set timing on
set autotrace on
SELECT c1, c2
FROM index_joins
where   c1 between 100 and 1000
and     c2 between 50 and 100;

The select list contains column c1 and c2. The index ind_c1 covers c1 column and the index ind_c2 is on c2. These two indexes on index_joins table satisfy the query. Hence there is no need for the optimizer to visit the base table. Let us look at the plan:

| Id  | Operation          | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT   |                  |     4 |    36 |     6  (17)| 00:00:01 |
|*  1 |  VIEW              | index$_join$_001 |     4 |    36 |     6  (17)| 00:00:01 |
|*  2 |   HASH JOIN        |                  |       |       |            |          |
|*  3 |    INDEX RANGE SCAN| IND_C2           |     4 |    36 |     3  (34)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN| IND_C1           |     4 |    36 |     4  (25)| 00:00:01 |
Predicate Information (identified by operation id):
 1 - filter("C2"<=100 AND "C1"<=1000 AND "C1">=100 AND "C2">=50)
 2 - access(ROWID=ROWID)
 3 - access("C2">=50 AND "C2"<=100)
 4 - access("C1">=100 AND "C1"<=1000)

Let us look at the plan outpout. The optimizer does a index range scan of the two indexes extracting the rowid and c1 from index ind_c1 and the rowid and c2 from ind_c2. Then the optimizer performs a hash join between threse two results set on the rowid values. It is doing this because anytime the two rowsources have a rowid in common, that is the interestion that we want and both c1 and c2 values will be sent back as the result of the query.

Sybase ASE (ASE 15.7 ESD 2)

First we create the equivalent table in Sybase

create table index_joins
c1 int not null
, c2 int not null
, c3 varchar(255)
lock datarows
CREATE UNIQUE INDEX ind_c1 ON index_joins(c1)
CREATE INDEX ind_c2 on index_joins(c2)
declare @i int
declare @rows int
set @rows = 100
set @i = 1
set nocount on
while @i <=@rows
 insert into #generator values (@i)
 set @i = @i + 1
set nocount off
insert into
 , rs.id + abs(rs.random % 1000)
 , REPLICATE('A',255)
from    ( select
 @rows * ( g1.id - 1 ) + g2.id AS id
 #generator g1,
 #generator g2
 ) rs
update index statistics index_joins

So that is the equivalent table in Sybase.

Unlike Oracle, the optimizer in Sybase will not consider using two indexes on the same table. The only way one could use two different indexes for the same table in a query is to have an  index intersection plan. Index Intersection is a technique to enable ASE optimizer to use more than one index on a table to satisfy a given query. This should work similar to Oracle “index join” whereby each index satisfies the predicate on the query. The hash join between two indexes results in common Row IDs (RID) to be selected on intersection and base table rows accessed by the intersection RowIDs. In ASE this index interesection has to be enabled explicitly by the following set command:

set index_intersection on

In addition, you need to supply multiscan abstract plan operator m_scan. The query will look like the following code:

set switch on 3604
declare @dbname varchar(30)
set @dbname=db_name()
dbcc cachedataremove(@dbname)
set statement_cache off
set showplan on
set statistics time,io,plancost on
set index_intersection on
SELECT c1, c2
FROM index_joins
where   c1 between 100 and 1000
and     c2 between 50 and 100
plan '(m_scan index_joins)'

The output of the Lava tree operator shows:

==================== Lava Operator Tree ====================
                     (VA = 6)
                     r:0 er:1
                     cpu: 0
              (VA = 5)
              r:0 er:1
              l:4 el:7
              p:0 ep:0
     /                                              \
    HashJoin                                     Restrict
    Inner Join                                   (0)(0)(0)(13)(0)
    (VA = 2)                                     (VA = 4)
    r:0 er:1                                     r:0 er:1
    l:4 el:5
    p:0 ep:0
    bufct: 128
   /                      \                        /
IndexScan               IndexScan               TableScan
ind_c2                  ind_c1                  index_joins
(VA = 0)                (VA = 1)                (VA = 3)
r:0 er:1                r:0 er:900              r:0 er:1
l:2 el:2                l:0 el:3                l:0 el:934
p:2 ep:2                p:0 ep:3                p:0 ep:5
============================================================<br />

The execution plan steps are described as follows:

  1. index scan of ind_c2 to get the RIDs of the qualifying rows for “index_joins.c2 between 50 and 100″
  2. index scan of ind_c1 to get the RIDs of the qualifying rows for “t.index_joins.c1 between 100 and 1000″
  3. does HashJoin between those two sets of RIDs to return the common RID
  4. The stream of unique RIDs is passed to the RID JOIN operator. This operator creates a worktable (aka Dynamic Index) and fills it with a single-column row with each RID.
  5. does a RID Scan on the table index_joins using those RIDs from the worktable and gets back the rows.

Note that there is a subtle difference in behaviour between Oracle and ASE. Oracle does not require to visit the base table. However, ASE will visit the base table to feth the result set for RIDs from the worktable created. This method will also work for cases where the result set includes other columns from the table. for example a select * from the base table will also use this access path in ASE.

Oracle 10g, instance and database diagrams July 23, 2012

Posted by Mich Talebzadeh in Oracle.
add a comment

Use this link for Oracle10gR2 Architecture diagram Use this link for Oracle10gR2 Instance diagram Use this link for  Oracle10gR2 database diagram

Understanding AWR Report June 13, 2012

Posted by Mich Talebzadeh in Oracle.

Updated December 2014

In the post Automatic Workload Repository  (AWR) Performance Monitoring Tool Basics , I described the basic set up and report generation for AWR. In this post we will try to understand the AWR report itself.

Before going further I must emphasise that this report was generated by running a PL/SQL block immediately after the instance was rebooted. The code was  used to simulate a typical OLTP workload with frequent insert/update/deletes and commits. The sample code:

  • Performs checkpoints immediately before and after PL/SQL block
  • Manually takes AWR snapshots before and after running PL/SQL block

The code is shown below

  type ObjIdArray is table of tdash.object_id%TYPE index by binary_integer;
  l_ids objIdArray;
  CURSOR c IS SELECT object_id FROM tdash;
  OPEN c;

      FORALL rs in 1 .. l_ids.COUNT
        UPDATE testwrites
          SET PADDING1 =  RPAD('y',4000,'y')
        WHERE object_id = l_ids(rs);
      FORALL rs in 1 .. l_ids.COUNT
        DELETE FROM testwrites
        WHERE object_id = l_ids(rs);
      FORALL rs in 1 .. l_ids.COUNT
       INSERT INTO testwrites
       SELECT * FROM tdash t WHERE t.object_id = l_ids(rs);
        DBMS_OUTPUT.PUT_LINE('Transaction failed');
  CLOSE c;

The output from the AWR report is shown below.

The snapshot details

This is normally shown at the beginning of the report.

        Snap Id      Snap Time   Sessions Curs/Sess
      --------- -------------- --------   -----
Begin Snap:4413 17-May-12 09:22:34   28   3.0
  End Snap:4414 17-May-12 09:43:35   29   3.3
   Elapsed:          21.02 (mins)
   DB Time:          21.78 (mins)

Note that unlike the hourly usual sample, this sampling took 21 minutes to complete. The snapshots in this case were taken immediately before and after running the PL/SQL block.

Elapsed (time) is just the interval of time between the start and end snapshots. Another important quantity is database DB Time which is the sum of working sessions’ time.  So DB Time =  sum of database CPU time + waits. In systems with multiple concurrent active sessions DB Time can be larger than the elapsed time. This is because DB Time is a sum over all active sessions that are using CPU(s) or waiting for an event. Note that Background processes are not included in that.

Load Profile

Next we will look at the Load Profile in AWR report

Load Profile      Per Second    Per Transaction Per Exec Per Call
~~~~~~~~~~~~   --------------- --------------- --------- -------
DB Time(s):            1.0                0.0       0.02    0.39
DB CPU(s):             0.1                0.0       0.00    0.04
Redo size:    27,284,637.6          916,465.9
Logical reads:    46,156.0            1,550.3
Block changes:    31,214.4            1,048.5
Physical reads:    2,446.5               82.2
Physical writes:   5,627.1              189.0
User calls:            2.7                0.1
Parses:               50.8                1.7
Hard parses:           0.8                0.0
W/A MB processed:      0.1                0.0
Logons:                0.1                0.0
Executes:             62.0                2.1
Rollbacks:             0.0                0.0
Transactions:         29.8

Per Second here means that this measure correlated to 1 second. It is worth looking at Per Second statistics to ensure that the amount of work the instance is doing is reasonable for the tasks it has to perform. For example is doing nearly 30 transactions per second good enough?

To understand what Per Transaction means, we need to look at another statistic further down the report from section under Instance Activity Stats

Statistic                     Total     per Second     per Trans
----------------------- ----------- -------------- -------------
user commits                 37,550           29.8           1.0

Per transaction is just per commits. So here we had 37,550 commits. To work out the total Redo size generated, we can take the Redo size throughput per transaction at 916,465.9 bytes and multiply that with the number of commits at 37,550. That will give us the total of 914465.9 * 37550/1024/1024 = 32747.4 MB of redo generated.

Other interesting statistics are Logical reads, Physical reads and Physical writes. These are quoted in blocks. so as before the total values would be 58,213,765 Logical block reads, 3,086,610 Physical block reads and 7,096,950 Physical block writes. Now a useful statistic would be the total amount of writes occurred in MB. Since the server is 8K block then the total amount of writes is (7096950 * 8192)/(1024 * 1024)  = 55,445 MB.

Instance Efficiency Percentages

This part is not that useful. Let us have a look at it

Instance Efficiency Percentages (Target 100%)
            Buffer Nowait %:  100.00       Redo NoWait %:  100.00
            Buffer  Hit   %:   94.70    In-memory Sort %:  100.00
            Library Hit   %:   95.49        Soft Parse %:   98.41
          Execute to Parse %:   18.06         Latch Hit %:  100.00
Parse CPU to Parse Elapsd %:   59.75     % Non-Parse CPU:   98.91

OK we have 100% targets in few places. On its own this section does not tell us much. Buffet Hit is 94.70% but this is expected as we booted the instance before the test. Then we have Library Hit at 95.49% for the same reason.


Think of Parsing as basically a two step process, that of a Syntax Check to check the validity of the statement and that of a Semantic Check; to ensure the statement can execute properly. The next step in the parse operation is to see if the statement being parsed has already being  processed by another session.  If that is the case then we can skip the next two steps in the process, that of optimization and row source generation.  If we can skip these next two steps in the process, we have done what is known as a Soft Parse, a shorter process to getting our query going.  If we cannot, if we must do all of the steps, we are performing as what is known as a Hard Parse; we must parse, optimize and generate the plan for the query.  Needless to say that Hard Parsing is more CPU intensive than Soft Parsing. Note that we have 59.75% Parse CPU to Parse Elapsed. We are losing a fair bit of time while parsing. However, that should not matter as after reboot shared pool did not contain any useful compiled query and since parsing is such a small fraction of the work being done.

Top 5 Timed Foreground Events

This is probably one of the most important section of AWR report. In a nutshell it is all about wait events. It records the number of waits encountered in the period and the total time spent waiting together with the average time waited for each event. The section is ordered by the percentage  of the total database time that each Event is responsible for.

Top 5 Timed Foreground Events
                                         wait   % DB
Event                   Waits   Time(s)  (ms)   time Wait Class
---------------------------- ----------- ------ ------ ----------
db file sequential read 2,817,676    810   0   62.0 User I/O
DB CPU                               130        9.9
free buffer waits           5,505     70  13    5.3 Configurat
log file sync                 142     46 324    3.5 Commit
log buffer space              371    36   97    2.7 Configurat

Dependent on what is seen in this section, other report sections may need to be referenced in order to quantify or check the findings. For example, the wait count for a particular event needs to be assessed based upon the duration of the reporting period and also the number of active sessions at the time. In other words do stats look OK compared to the reported workload?

In above 62% of database time was spent waiting for db file sequential read, which in this case was about single block index reads. However, the average wait time was zero. Another 9.9% of the time was spent waiting for or using CPU time. A tenth of database time is not bad at all. In general High CPU usage is often a symptom of poorly tuned SQL (excessive logical I/O due to sub-optimal search path or excessive physical I/O). In this case CPU is not busy at all. However, in case CPU is busy say above 80%, you can investigate under  section SQL ordered by CPU Time and find the SQL Id that makes CPU DB busy. When you find that SQL Id that is taking high CPU resources, that sql query can be tuned for better performance.

Further down we have 5.3% of database time spent on free buffer waits. Free buffer wait occurs when a user session reads a block from the disk and cannot find a free block in the buffer cache to place it in. One frequently attributed reason is that the DBWR process is not clearing the buffer cache fast enough. However, there can be many reasons and one generic answer will not help. You will need to investiggate the cause or causes of this problem.

The log file sync wait event reports 3.5% of database time. It is triggered when a user session issues a commit (or a rollback). The user session will signal or post the LGWR to write the log buffer to the redo log file. If you look at the sql code above we are indeed issuing frequent commits after processing every 100 rows. Finally, we have Log Buffer Space wait event taking 2.7% of DB time. This event occurs when server processes write data into the log buffer faster than the LGWR process can write it out. Refer to Oracle documentation if you see high readings for this wait event.

The time model statistics

Time model statistics tell you the processing time spent on various matrics during the snapshot interval. You should not expect the % of DB Time to add up to 100% because there is overlap among statistics. For example sql execute elapsed time requires CPU time for sql execution. Note that some statistics such as background elapsed time and background cpu time are shown as well but these are not part of  % of DB Time.

Time Model Statistics                    DB/Inst: MYDB/mydb  Snaps: 4413-4414
-&amp;amp;amp;amp;amp;gt; Total time in database user-calls (DB Time): 1306.8s
-&amp;amp;amp;amp;amp;gt; Statistics including the word &amp;amp;amp;amp;amp;quot;background&amp;amp;amp;amp;amp;quot; measure background process
time, and so do not contribute to the DB time statistic
-&amp;amp;amp;amp;amp;gt; Ordered by % or DB time desc, Statistic name
Statistic Name                          Time (s) % of DB Time
------------------------------------------ ------------------ ------------
sql execute elapsed time               1,247.7         95.5
DB CPU                                   129.9          9.9
connection management call elapsed time    5.5           .4
parse time elapsed                         4.2           .3
hard parse elapsed time                    3.7           .3
PL/SQL execution elapsed time              1.2           .1
PL/SQL compilation elapsed time            0.4           .0
hard parse (sharing criteria) elapsed time 0.1           .0
repeated bind elapsed time                 0.0           .0
hard parse (bind mismatch) elapsed time    0.0           .0
sequence load elapsed time                 0.0           .0
failed parse elapsed time                  0.0           .0
DB time                                1,306.8
background elapsed time                3,395.9
background cpu time                       19.6

We can go through these figures now. 129.9 seconds CPU time was used for all user sessions. This was just under 10% of database resources. In total there was 1,306.8 seconds database time used. The total wait event time can be calculated as DB time – DB CPU, i.e. 1,306.8 – 129.9 = 1,176.9 seconds. The lion share of database time (95.5%) was spent on executing sql which is a good sign. The toal parse time was 5.5 seconds of which 4.2 seconds was hard parsing. The rest of statistics is tiny in this example.

Foreground Wait Class

We are familiar with Oracle wait events. In AWR report under Wait Class, wait events (to be precise user generated wait events AKA Foreground wait events) are grouped into classes. These include User I/O, DB CPU, Configuration, Commit, Other, System I/O, Application, Network and Concurrency. For example, exclusive TX locks are generally an application level issue and HW locks are generally a configuration issue. The following list includes common examples of the waits in some of the classes:

  • User I/O: Waits for user IO (db file sequential read, db file scattered read, direct path read, direct path write etc)
  • DB CPU: See above for the definition of this class
  • Configuration: Waits caused by inadequate configuration of database or instance resources (for example, undersized log file sizes, shared pool size etc)
  • Commit: waits for redo log write confirmation after a commit
  • Other: Waits which should not typically occur on a system (for example, ‘wait for EMON to spawn’)
  • System I/O: Waits for background process IO
  • Application: Waits resulting from user application code (for example, lock waits caused by row level locking or explicit lock commands)
  • Network: waits for data to be sent over the network
  • Concurrency: Waits for internal database resources (for example, latches)

For other classes  not covered above,  you can get information from Oracle manuals.

Let us look at this section in our AWR report:

                             %Time   Total Wait  wait
Wait Class      Waits        -outs   Time (s)    ms)  %DB time
----------- ---------------- ----- ---------------- -------- -----
User I/O          2,866,247     0     838        0      64.2
DB CPU                                130                9.9
Configuration         5,930     0     110       18       8.4
Commit                  142     0      46      324       3.5
Other                 1,971    96      20       10       1.6
System I/O              417     0       0        0       0.0
Application              88     0       0        0       0.0
Network               2,653     0       0        0       0.0
Concurrency               5     0       0        0       0.0

User I/O taking 64.2% of database time was explained before. The CPU usage was just under 9.9%. Configuration contributed 8.4% of wait time. These are attributed to database settings. We will see the events contributing to this class below.

Foreground Wait Events

-&amp;amp;amp;amp;amp;gt; s  - second, ms - millisecond -    1000th of a second
-&amp;amp;amp;amp;amp;gt; Only events with Total Wait Time (s) &amp;amp;amp;amp;amp;gt;= .001 are shown
-&amp;amp;amp;amp;amp;gt; ordered by wait time desc, waits desc (idle events last)
-&amp;amp;amp;amp;amp;gt; %Timeouts: value of 0 indicates value was &amp;amp;amp;amp;amp;lt; .5%.  Value of null is truly 0
                              %Time Total Wait  wait  Waits   % DB
Event                Waits    -outs   Time (s)  (ms)   /txn   time
-----------------  ------------ ---------- ------- -------- ------
db file sequential r 2,817,676    0   810       0     75.0   62.0
free buffer waits        5,505    0    70      13      0.1    5.3
log file sync              142    0    46     324      0.0    3.5
log buffer space           371    0    36      97      0.0    2.7
db file scattered r     48,458    0    28       1      1.3    2.1
rdbms ipc reply             10   90    19    1949      0.0    1.5
log file switch             48    0     4      81      0.0     .3

For the sake of brevity I have only included those wait events that contribute to % DB time. db file sequential read and db file scattered read belong to user I/O wait class. log file sync is part of Commit wait class. free buffer waits, log buffer space, rdbms ipc reply and log file switch wait events belong to Configuration wait class.

Background Wait Events

This section of AWR report provides statistics on wait events for Oracle background processes.

Background Wait Events
-&amp;amp;amp;amp;amp;gt; ordered by wait time desc, waits desc (idle events last)
-&amp;amp;amp;amp;amp;gt; Only events with Total Wait Time (s) &amp;amp;amp;amp;amp;gt;= .001 are shown
-&amp;amp;amp;amp;amp;gt; %Timeouts: value of 0 indicates value was &amp;amp;amp;amp;amp;lt; .5%.  Value of null is truly 0
                              %Time Total Wait  wait  Waits   % bg
Event                Waits    -outs   Time (s)  (ms)   /txn   time
-----------------  ------------ ---------- ------- -------- ------
log file parallel write 10,319    0   845      82      0.3   24.9
db file parallel write  43,425    0   207       5      1.2    6.1
direct path write           55    0     2      40      0.0     .1
control file seq re     2,501     0     2       1      0.1     .1

The log file parallel write shows LGWR is waiting for blocks to be written to all online redo log members in one group. LGWR will wait until all blocks have been written to all members.   So here we had 24.9% of backhround total time spent on log file parallel write. The db file parallel write wait event belongs to DBWR process since it is the only process that writes dirty blocks from the SGA to datafiles.  DBWR process compiles a set of dirty blocks, hands the batch over to the OS, and waits on the db file parallel write event for the I/O to complete. The parameter of interest here is Avg wait (ms). In our case it is 5ms which is a perfectably respetable figure. Obviously larger average wait times point to slower I/O subsystem or poor I/O configurations.

Service Level Statistics

The service statistics section gives information about how particular services configured in the database are performing.

    • A service is a grouping of processes
    • Users may be grouped in SYS$USERS
    • A single user (application login) may be grouped with that user name
Service Statistics
-&amp;amp;amp;amp;amp;gt; ordered by DB Time
                                            Physical      Logical
 Service Name   DB Time (s)   DB CPU (s)    Reads (K)    Reads (K)
------------- ------------ ------------ ------------ ------------
SYS$USERS            1,281          129        3,079       58,172
mydb.mich.local         25            1            2           34
SYS$BACKGROUND           0            0            5           14
mydbXDB                  0            0            0            0

Note that SYS$USERS service is a default service for all applications. You can configure different services for different production applications. This will enable DBA to identify the heavy loading services on the system. by grouping services according to the production need. For example services created for batch applications etc.

Service Wait Class Stats
-&amp;amp;amp;amp;amp;gt; Wait Class info for services in the Service Statistics section.
-&amp;amp;amp;amp;amp;gt; Total Waits and Time Waited displayed for the following wait
   classes:  User I/O, Concurrency, Administrative, Network
-&amp;amp;amp;amp;amp;gt; Time Waited (Wt Time) in seconds
Service Name
User I/O  User I/O  Concurcy Concurcy Admin  Admin Network Network
Tot Wts Wt Time   Tot Wts Wt Time Total Wts Wt Time Tot Wts Wt Time
--------- ------- -------- -------- --------- --------- ------- -------
2864754     838      5         0         0         0     796     0
1493          0      0         0         0         0    1857     0
761          4      48         1         0         0       0     0

SQL Section

Any SQL statement appears in the top 5 statements in two or more areas below, then it is a prime candidate for tuning. The sections are:

  • Total Elapsed Time
  • Total CPU Time
  • Total Buffer Gets
  • Total Disk Reads
  • Total Executions
  • Total Parse Calls
  • Total Sharable Memory
  • Total Version Count

Let us try to see what these mean.

Total Elapsed Time

Total Elapsed Time = CPU Time + Wait Time. If a SQL statement appears in the total elapsed time area of the report this means its CPU time plus any other wait times made it pop to the top of the pile. Excessive Elapsed Time could be due to excessive CPU usage or excessive wait times.

This is the area that you need to examine and probably the one that will be reported by the users or application support. From a consumer perspective, the finer details don’t matter. The application is slow. Full stop.

In conjunction with excessive Elapsed time check to see if this piece of SQL is also a high consumer under Total CPU Time. It is normally the case. Otherwise check the wait times and Total Disk Reads. They can either indicate issues with wait times (slow disks, latch gets etc) or too much Physical IO associated with tables scans or sub-optimal indexes.  This section is a gate opener and often you will need to examine other sections.

Total CPU Time

When a statement appears in the Total CPU Time area this indicates it used excessive CPU cycles during its processing. Excessive CPU processing time can be caused by sorting, excessive function usage or long parse times. Indicators that you should be looking at this section for SQL tuning candidates include high CPU percentages in the service section for the service associated with this SQL (a hint, if the SQL is uppercase it probably comes from a user or application; if it is lowercase it usually comes from the internal or background processes). To reduce total CPU time, reduce sorting by using composite indexes that can cover sorting and use bind variables to reduce parse times.

Total Buffer Gets

Total buffer gets mean a SQL statement is reading a lot of data from the db block buffers. Generally speaking buffer gets (AKA logical IO or LIO) are OK, except when they become excessive. The old saying that you reduce the logical IO, because then the physical IO (disk read) will take care of itself holds true. LIO may  have incurred a PIO in order to get the block into the buffer in the first place. Reducing buffer gets is very important and should not be underestimated. To get a block from db block buffers, we have to latch it (i.e. in order to prevent someone from modifying the data structures we are currently reading from the buffer). Although latches are less persistent than locks, a latch is still a serialization device. Serialization devices inhibit scalability, the more you use them, the less concurrency you get. Therefore in most cases optimal buffer gets can result in improved performance. Also note that by lowering buffer gets you will require less CPU usage and less latching. |Thus to reduce excessive buffer gets, optimise SQL to use appropriate indexes and reduce full table scans. You can also look at improving the indexing strategy and consider deploying partitioning (licensed).

Total Disk Reads

High total disk reads mean a SQL statement is reading a lot of data from disks rather than being able to access that data from the db block buffers. High physical reads after a server reboot are expected as the cache is cold and data is fetched from the disk. However, disk reads (or physical reads) are undesirable in an OLTP system, especially when they become excessive. Excessive disk reads do cause performance issues. The usual norm is to increase the db buffer cache to allow more buffers and reduce ageing . Total disk reads are typified by high physical reads, a low buffer cache hit ratio, with high IO wait times. Higher wait times for Disk IO can be associated with a variety of reasons (busy or over saturated SAN, slower underlying storage, low capacity in HBC and other hardware causes). Statistics on IO section in AWR, plus the Oerating System diagnostic tools as simple as iostatcan help in identifying these issues. To reduce excessive disk reads, consider partitioning, use indexes and look at optimizing SQL to avoid excessive full table scans.

Total Executions

High total executions need to be reviewed to see if they are genuine executions or loops in SQL code. I have also seen situations where autosys jobs fire duplicate codes erroneously. In general statements with high numbers of executions usually are being properly reused. However, there is always a chance of unnecessary loop in PL/SQL, Java or C#. Statements with high number of executions, high number of logical and or physical reads are candidates for review to be sure they are not being executed multiple times when a single execution would serve. If the database has excessive physical and logical reads or excessive IO wait times, then look at the SQL statements that show excessive executions and show high physical and logical reads.

Parse Calls

Whenever a statement is issued by a user or process, regardless of whether it is in the SQL pool it undergoes a parse.  As explained under Parsing, the parse can be a hard parse or a soft parse. Excessive parse calls usually go with excessive executions. If the statement is using what are known as unsafe bind variables then the statement will be reparsed each time. If the header parse ratios are low look here and in the version count areas.

Sharable Memory

Sharable Memory refers to Shared Pool memory area in SGA , hence this perticular section in AWR Report states about the SQL STATEMENT CURSORS which consumed the maximum amount of the Shared Pool for their execution.

In general high values for Sharable Memory doesn’t necessary imply there is an issue It simply means that

  1. These SQL statements are big or complex and Oracle has to keep lots of information about these statements OR
  2. big number of child cursors exist for those parent cursors
  3. combination of 1 & 2

In case of point 2, it may be due to poor coding such as bind variables mismatch, security mismatch  or overly large SQL statements that join many tables. In a DSS or  DW environment large complex statements are normal. In an OLTP database large or complex statements are usually the result of over-normalization of the database design, attempts to use an OLTP system as a DW or simply poor coding techniques. Usually large statements will result in excessive parsing, recursion, and large CPU usage.

Version Count

High version counts are usually due to multiple identical-schema databases, unsafe bind variables, or Oracle bugs.

Instance Activity Statistics

This section compares the statistic values of instance activity between the two snapshot sets. For each statistic, the value of the statistic is shown along with the differentials measured by DB time, elapsed time, and per transaction. You have a comprehensive set of statistics here. Use these stats with the other stats

Tablespace IO Stats

This is a useful section in that it shows the tablespace IO activities. Remember the stats here are at the tablespace level. A good place to correlate tablespace IO stats is with section Statistic depicting Physical read and writes total IO requests as shown below:

Statistic Total per Second per Trans
-------------------------------- ------------------ -------------- -------------
physical read total IO requests 2,873,185 2,278.0 76.5
physical write total IO requests 5,636,945 4,469.3 150.1

Back to Tablespace IO Stats we see:

Tablespace IO Stats
-&amp;amp;amp;amp;amp;gt; ordered by IOs (Reads + Writes) desc

Tablespace ------------------------------
               Av       Av     Av               Av     Buf  Av Buf
    Reads     Reads/s  Rd(ms) Blks/Rd  Writes Writes/s Waits Wt(ms)
  ---------- ------- ------- -------  --------- ---- -------------
APP1 2,860,666  2,268   0.3     1.1  3,564,644   2,826   0    0.0
UNDOTBS1    90      0   9.8     1.0  1,990,759   1,578   0    0.0
SYSTEM   7,190      6   0.2     1.3        204       0   0    0.0
SYSAUX   1,938      2   0.3     2.1      1,047       1   0    0.0
APP1_HDD   105      0   0.1     1.0        105       0   0    0.0

Let us look at the physical read total IO requests. We have 2,873,185  in total. If we add up Reads from Tablespaces we see the lion share comes from App1 tablespace with a total of 2,860,666 Physical IO reads (Not block reads). The Average Physical IO Reads per second (Av Reads/s) was 2,268.  On Average every Physical IO reads took 0.3 ms and fetched 1.1 blocks.

Now switching back to writes, we see activity in both APP1 and UNDOTBS1 tablespaces. APP1 being the user tablespace and UNDOTBS1 gathering undo data!. Again we we add up the total Physical IO writes 3,564,644+1,990,759, we arrive top the figure of 5,555,403 not far off from 5,636,945 under physical write total IO requests. The Average physical write total IO for APP1 tablespace was 2,826. Obviously the throughput will tell if there has been any sub-optimal performance compared to previous records. If that is the case then you will need to examine the underlying disk sub-systems for the tablespace.

Next we look at the section under File IO Stats. This section describes the underlying files for tablespaces.  So with that in mind we will look at the readings for each file.

Tablespace Filename
------------------------ ----------------------------------------------------
 Av Av Av Av Buffer Av Buf
 Reads Reads/s Rd(ms) Blks/Rd Writes Writes/s Waits Wt(ms)
-------------- ------- ------- ------- ------------ -------- ---------- -------
APP1 /ssddata4/oracle/mydb/APP02.dbf
 695,636 552 0.4 1.1 1,602,099 1,270 0 0.0
APP1 /ssddata4/oracle/mydb/APP03.dbf
 702,494 557 0.2 1.0 849,159 673 0 0.0
APP1 /ssddata6/oracle/mydb/app01.dbf
 1,462,536 1,160 0.3 1.1 1,113,386 883 0 0.0

The main interest would be those file systems that belong to tablespace App1 above, namely APP02.dbf, APP03.dbf and app01.dbf. These are all built on SSD devices so should be pretty fast. Let us look at some of the readings here, for example there were 2,860,666 physical IO read for tablespace APP1. Breaking down we can see that there were 1,462,536 attributed to app01.dbf, 695,636 to APP02.dbf and the rest of 702,494 were from APP03.dbf. Now taking the Average Physical IO Reads per second (Av Reads/s) was 1,160 for app01.dbf, compared to 552 for APP02.dbf and 557 for APP03. So does this imply faster readings for app01.dbf compared to two other ones?  To investigate that we need to look at figures for Average Physical IO Reads/ms (Av Rd(ms)). We see figures of 0.3, 04 and 0.2. So the throughput is pretty consistent. The only noticeable point being that most of data for tablespace APP1 comes from the first file system app01.dbf and hence we see higher AV Reads/s for that file system in the report.

Is there a silver bullet IT Strategy June 4, 2012

Posted by Mich Talebzadeh in Uncategorized.
add a comment

This is about IT strategy and most of the guys that I know have had many years of IT experience. This industry is sharply divided between doers and watchers. The leaders and strategists are the ones that make head waves and usually they know what they want. However, the big obstacles seem to be organizations that we work in.

As a consultant I have been  to many sites that the IT managers and directors have little or no inclination to move IT in a pragmatic way. For most of them the status quo is just fine. In other words they toe the line and they speak the official language. However, as a result the enterprise suffers. I believe there is a culture that values loyalty more than capability. This in turn results in a rigid environment where changes are seldom accepted. Moreover, the IT strategists background is always favored towards a certain product, since, presumably, the manager feels more comfortable with a given product. That is hardly any strategy for progress.

It is also interesting to know that the average employee loyalty to a given firm is two to three years and more often these days people change jobs than before. Any long term vision or strategy is nonexistent chiefly because the IT director has a different agenda for himself/herself, ending up in a lame duck strategy.

I have seen too often that organizations bring in someone (say a new director) from outside (usually the director will tell you that he/she was headhunted for the job!). The pattern is always similar. Let us cut cost, streamline the processes (meaning get rid of few people), bringing in the outsourcing (usually they now few firms in South Asia who they have worked with before) and we can reduce the projects’ lifecycle drastically. There is usually an announcement of some town hall meeting or soap box meeting with video conferencing and large participation by the staff of all type. Mind you the staff were told to attend. Then we have the director giving his/her vision of things to come and IT strategy for the next two years. The figures shown on their favorite Power point projection are impressive. That is until such time that you start asking questions about these figures. The graphs do not add up or skewed too much. The IT director will have difficulty in explaining these figures. Otherwise he/she will tell you that a particular graph is not normalized. You shake your head with disbelief knowing that most of the stuff are downloadable/downloaded from the web. The director has difficulty explaining because he does not believe those figures either. Let us face it, how long it takes to formulate a project strategy. It does not happen overnight, a week, a month or a quarter. I am at loss to understand how these so called IT gurus are going to make it happen? BTW, they are usually pretty good at making cliché type statements. Gone few months you hear that the IT director has left the firm without delivering anything at great cost to the company without any value add. I am at loss to see why this industry is following this trend. Is that because the IT industry unlike some disciplines like civil engineering is still very much evolving? As strategists what are the great unknowns in IT? In my view there is no revolutionary solution. It is only evolutionary. A company’s IT strategy can only evolve through sound leadership and decision making. More importantly the IT strategy has got to be fit for the underlying business.

Use Case Comparison of ORACLE 11gR2 and Sybase ASE 15.7 on Solid State Disks June 2, 2012

Posted by Mich Talebzadeh in Oracle and Sybase.
add a comment

Database performance is ultimately constrained by the need to read and write data from a persistent storage. For almost the entire history of Oracle and Sybase, this persistent storage has been provided by the magnetic spinning hard disks (HDD). The relative performance of magnetic disk has been so poor compared to the rest of the hardware, that database performance tuning has focused primarily on minimizing the disk IO. Only now, with the advent of Solid State Disk (SSD) technology and in-memory databases, we are getting to the point of an increase in IO performance to match the improvements we have come to take for granted in CPU and memory access speeds.

Although replacing all the storage component of Oracle and Sybase with SSD is a rather expensive option for many shops, Both databases can benefit from an intelligent use of solid state disks. However, it is not always obvious how best to leverage limited quantities of SSD in order to get the best performance return from the investment.

A client of mine requested me recently to undertake “Use Case Comparison of ORACLE 11G and Sybase ASE 15 “on Solid State Disks. They wanted to improve their system response and both Oracle and Sybase databases were feeding their trading systems. I undertook this study separately for both Oracle 11gR2 and Sybase ASE 15.7 and the outcomes were very interesting. To simplify the matters we separated  read intensive and write intensive activities for both Oracle and Sybase. The results of the studies for read intensive activitiesfor Oracle have now been published in Oracle SELECT journal. SELECT Journal is a quarterly journal of peer-reviewed technical articles and news published by the Independent Oracle Users Group (IOUG). This article was published under title “Use Case Comparison of ORACLE 11g on Solid State Disks (Part One)” . The result of studies for write intensive activities for Oracle 11g will be published shortly

Likewise, the result of studies for read intensive activities for Sybase ASE 15 was published in International Sybase User Group (ISUG) journal in April 2012 edition titled “Use Case Comparison of ASE 15.7 on Solid State Disks [Part 1]“. The result of studies for write intensive activitiesfor Sybase ASE 15 have now been published in June 2012 edition of ISUG journal titled “Use Case Comparison of ASE 15.7 on Solid State Disks [Part 2]“.

A user defined function for Sybase ASE to emulate Oracle’s dbms_random.string() function May 31, 2012

Posted by Mich Talebzadeh in Oracle and Sybase.
add a comment

In Oracle, dbms_random package allows one to generate random data. A very widely used function within this package is dbms_random.string(opt, len) that returns a random string whose length is determined by the len argument.

The returned string’s characteristics are determined by the opt argument. opt must be a char. The following five options are supported:
‘u’ or ‘U’: Upper case alpha characters
‘l’ or ‘L’: Lower case alpha characters
‘a’ or ‘A’: Upper and lower case alpha characters
‘x’ or ‘X’: Upper alpha and numeric characters
‘p’ or ‘P’: Any printable character

An example calling the above function with different options would be

col u format a10
col l format a10
col a format a10
col x format a10
col p format a10

2    dbms_random.string('u', 10) as u,
3    dbms_random.string('l', 10) as l,
4    dbms_random.string('a', 10) as a,
5    dbms_random.string('x', 10) as x,
6    dbms_random.string('p', 10) as p
7  from
8    dual;
U          L          A          X          P
---------- ---------- ---------- ---------- ----------
PSXPJUOMMT byebvyevlb JPRjgoAxOb QM4XU14IPX [`i06!e8m0

In Sybase ASE this random string function does not exist. The problem is that even if you write your own user defined function (UDF) it won’t work. There is currently an issue with calling non deterministic functions (getdate(),rand, rand2, newid(), etc) directly from within an ASE UDF. Basically if I try to do the following:

create function sf_test(@dummy int)
returns int
  declare @v int
  select @v = rand2()  -- calling rand function here
  return @v

I will get the following error:

Msg 11798, Level 15, State 65:
Illegal BUILTIN FUNCTION statement within a scalar SQL function.

The work around is to define these functions in a view and call them in UDF referencing the view (thus avoiding direct references to these functions in UDF).

For example create the view in sybsystemprocs first

create view vudf
, CONVERT(char(1),ABS(CAST(NEWID() AS VARBINARY) %9)) as vrandnum
, CHAR(ASCII('!')+ABS(CAST(NEWID() AS VARBINARY) %95))as vprintable
, FLOOR(RAND2()*100) as vrand
, GETDATE() as vgetdate

This is not ideal but as a work around it can be expanded to provide all the non deterministic functions required. The overhead is maintaining this view in sybsystemprocs. An example calling getdate in udf:

create function sf_getdate(@datein datetime)
returns int
  declare @diff int
  select @diff = datediff(day,@datein,vgetdate)from sybsystemprocs.dbo.vudf
  return @diff
-- run this
select dbo.sf_getdate('Apr 15 2012')

Similar to above and using the view generated we can write a random string function in ASE:

use sybsystemprocs
if exists(select 1 from sysobjects where type = 'V' and name = 'vudf')
drop view vudf
create view vudf
, CONVERT(char(1),ABS(CAST(NEWID() AS VARBINARY) %9)) as vrandnum
, CHAR(ASCII('!')+ABS(CAST(NEWID() AS VARBINARY) %95))as vprintable
, FLOOR(RAND2()*100) as vrand
, GETDATE() as vgetdate
if exists (select 1 from sysobjects where type = 'SF' and name = 'sf__randalpha')
drop function sf__randalpha
create function sf__randalpha (@opt char(1),@len int)
returns varchar(255)
** Name        : sf__randalpha
** Created By  : Mich Talebzadeh
** Date        : 30/05/2012
** Purpose     : The function string returns a string whose length is determined by the len argument.
**               The returned string's characteristics are determined by the opt
**               argument. opt must be a char. The following five options are possible:
**               'u' or 'U': Upper case alpha characters
**               'l' or 'L': Lower case alpha characters
**               'a' or 'A': Upper and lower case alpha characters
**               'x' or 'X': Upper alpha and numeric characters
**               'p' or 'P': Any printable character
**               Examples.
**               declare @c varchar(30)
**               select @c = sybsystemprocs.dbo.sf__randalpha('a',30)
  if UPPER(@opt) NOT IN ('U','L','A','X','P') return null
  if (@len <= 0) return null
  declare @randalpha varchar(255)
  declare @counter int
  declare @valpha char(1)
  declare @i int
  set @randalpha = null
  set @counter = 1
  while (@counter <= @len)
    if UPPER(@opt)='U' select @valpha = valphaU from sybsystemprocs.dbo.vudf
    if UPPER(@opt)='L' select @valpha = valphaL from sybsystemprocs.dbo.vudf
    if UPPER(@opt)='P' select @valpha = vprintable from sybsystemprocs.dbo.vudf
    if UPPER(@opt) = 'A'
      select @i = vrand from sybsystemprocs.dbo.vudf
      if @i < 50 begin select @valpha = valphaU from sybsystemprocs.dbo.vudf end else begin select @valpha = valphaL from sybsystemprocs.dbo.vudf end
    if UPPER(@opt) = 'X'
     select @i = vrand from sybsystemprocs.dbo.vudf
     if @i < 50 begin select @valpha = valphaU from sybsystemprocs.dbo.vudf end else begin select @valpha = vrandnum from sybsystemprocs.dbo.vudf end
    select @randalpha = @randalpha || @valpha
    select @counter = @counter + 1
  return substring(@randalpha,1,@len)
grant exec on dbo.sf__randalpha to public
sp_help sf__randalpha

Now if I use the above function I can create all the random strings as per Oracle

1> select
2>   substring(sybsystemprocs.dbo.sf__randalpha('u',10),1,10) as u,
3>   substring(sybsystemprocs.dbo.sf__randalpha('l',10),1,10) as l,
4>   substring(sybsystemprocs.dbo.sf__randalpha('a',10),1,10) as a,
5>   substring(sybsystemprocs.dbo.sf__randalpha('x',10),1,10) as x,
6>   substring(sybsystemprocs.dbo.sf__randalpha('p',10),1,10) as p
7> go
u          l          a          x          p
---------- ---------- ---------- ---------- ----------
GEMIGRGZHV sxttmcwujw nNxJBYefqa W5K2NTHG67 oi'8e&8{GV

Automatic Workload Repository (AWR) Performance Monitoring Tool Basics May 5, 2012

Posted by Mich Talebzadeh in Oracle.
add a comment

What is AWR Starting from Oracle 10g statspack utility has evolved into the Automatic Workload Repository (AWR). AWR is a comprehensive and powerful tool for monitoring Oracle performance. Unlike tracing, AWR is not enabled for the session. It is for the complete database. It also requires diagnostic licence. The AWR collects and stores database statistics relating to problem detection and tuning. AWR is an evolvement for the statspack utility which helps gather database performance statistics. AWR generates snapshots of key matrices such as system and session statistics, cpu, waits, redo, writes and other time-model statistics. AWR stores data in SYSAUX tablespace. The background process MMON is responsible for collecting the statistics in the SGA and saving them to the SYSAUX tablespaces. AWR provides statistics in two formats

  • temporary – in memory collection of statistics in the SGA, accessible via the V$ views
  • persistent – type of performance data in the form of regular AWR snapshots which are accessible via the DBA_ views

It will collect data on the following matrices

  • Base statistics that are also part of the v$SYSSTAT and V$SESSTAT views
  • SQL statistics
  • Database object-usage statistics
  • Time-model statistics
  • Wait statistics
  • ASH (active session history) statistics
  • Operating system statistics

AWR uses the following V$ views to collect statistics

  • v$sys_time_model time model stats (db time, java execution time, pl/sql execution time and others)
  • v$osstat Operating system stats (avg_busy_ticks, avg_idle_ticks, etc)
  • v$service_stats wait statistics ( db cpu, app wait time, user commits, etc)
  • v$sysstat system stats
  • v$sesstat session stats

Database performance stats fall into one of three categories:

  • Cumulative values – collect stats over a period of time from the v$sysstat, etc
  • Metrics – use the collected stats to make some sort of sense.
  • Sampled data – the ASH sampler is used to collect these stats.

Setting up AWR Statistics in Oracle deped on the system parameter statistics_level. By default this parameter is set to typical which collects standard level of statistics. There is more comprehensive level called all that collects execution plans and timing info. Beware that setting this value in production can result in a typical four times overhead. It will show all roll over timings as well (say in the output of autotrace or tkprof) but at a cost of running the query much slower! AWR can be disabled by setting statistics_level to basic. Beware that this disables a lot more than just AWR. To display the current setting for statistics_level do

show parameter statistics_level;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
statistics_level                     string      TYPICAL
-- To set up statistics level to typical do
alter system set statistics_level = typical;
-- To set up statistics level to full do
alter system set statistics_level = all;
-- to diable AWR and many other things
alter system set statistics_level = basic;

Snapshotting Most monitor tools work with intervals and snapshots and AWR is no exception. Oracle provides dbms_workload_repository package to define AWR snapshots. Alternatively you can use Enterprise Manager To display a list of snapshots do

select * from (select snap_id, snap_level, to_char(begin_interval_time, 'dd/mm/yy hh24:mi:ss') AS begin from dba_hist_snapshot order by 1) where rownum <=4;
SNAP_ID SNAP_LEVEL BEGIN ---------- ---------- -----------------       3908          1 28/04/12 08:26:10       3909          1 28/04/12 08:36:28       3910          1 28/04/12 10:00:00       3911          1 28/04/12 11:00:49

To see snapshot interval and retention period do

select  snap_interval
,       retention
from    dba_hist_wr_control;
------------------------------ ------------------------------
+00000 01:00:00.0              +00008 00:00:00.0

You can use modify_snapshot_settings procedurte in dbms_workload_repository package to change the snapshot settings. For example to change the interval to 30minutes you can do

exec dbms_workload_repository.modify_snapshot_settings (interval => 30) ;

To change the retention period to two weeks you can do

exec dbms_workload_repository.modify_snapshot_settings (retention => 14*24*60)

To manually create a snalpshot you can do:

exec dbms_workload_repository.create_snapshot();

To delete snapshots

exec dbms_workload_repository.drop_snapshot_range (low_snap_id => 1077, high_snap_id => 1078);

To create a baseline

exec dbms_workload_repository.create_baseline (start_snap_id => 1110, end_snap_id => 1120, baseline_name => 'Batch baseline');

To delete a basline

exec dbms_workload_repository.drop_baseline (baseline_name => 'Batch baseline', cascade => FALSE);

Generating AWR Reports DBAs can run AWR reports. The most common method of generating AWR report is to run the following piece of sql


For example as user system I run this report


Current Instance ~~~~~~~~~~~~~~~~

DB Id    DB Name      Inst Num Instance ----------- ------------ -------- ------------  2678249386 MYDB                1 mydb

Specify the Report Type ~~~~~~~~~~~~~~~~~~~~~~~ Would you like an HTML report, or a plain text report? Enter 'html' for an HTML report, or 'text' for plain text Defaults to 'html' Enter value for report_type: text

Type Specified:  text

Instances in this Workload Repository schema ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

DB Id     Inst Num DB Name      Instance     Host ------------ -------- ------------ ------------ ------------ * 2678249386        1 MYDB         mydb         rhes564

Using 2678249386 for database Id Using          1 for instance number

Specify the number of days of snapshots to choose from ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Entering the number of days (n) will result in the most recent (n) days of snapshots being listed.  Pressing  without specifying a number lists all completed snapshots.

Enter value for num_days: 1

Listing the last day's Completed Snapshots

Snap Instance     DB Name        Snap Id    Snap Started    Level ------------ ------------ --------- ------------------ ----- mydb         MYDB              4121 06 May 2012 00:00      1                                4122 06 May 2012 01:00      1                                4123 06 May 2012 02:00      1                                4124 06 May 2012 03:00      1                                4125 06 May 2012 04:00      1                                4126 06 May 2012 05:00      1                                4127 06 May 2012 06:00      1                                4128 06 May 2012 07:00      1                                4129 06 May 2012 08:00      1                                4130 06 May 2012 09:00      1                                4131 06 May 2012 10:00      1                                4132 06 May 2012 11:00      1                                4133 06 May 2012 12:00      1                                4134 06 May 2012 13:00      1
Specify the Begin and End Snapshot Ids ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Enter value for begin_snap: 4128 Begin Snapshot Id specified: 4128

Enter value for end_snap: 4129 End   Snapshot Id specified: 4129

Specify the Report Name ~~~~~~~~~~~~~~~~~~~~~~~ The default report file name is awrrpt_1_4128_4129.txt.  To use this name, press  to continue, otherwise enter an alternative.

Enter value for report_name:

The report will be generated and in the current working directory.


Get every new post delivered to your Inbox.