
Index Join in Oracle and the equivalent Index Intersection in Sybase ASE September 1, 2012

Posted by Mich Talebzadeh in Oracle and Sybase.

Both Oracle and Sybase ASE provide multiple access paths to the optimizer. Many people have the perception that for a given table only one index can be used by the optimizer. Although this holds true in most cases, under certain conditions, where the optimizer decides that it does not need to access the base table, more than one index on the same table can be used to satisfy the query.

Consider a simple case of a select list returning the values for a column or columns based on columns used in a where clause. In other words, the select list is entirely satisfied by the indexed columns used in the where clause. Let us look at a simple example in each case.

Oracle (11gR2)

First create the table:

CREATE TABLE index_joins AS
SELECT
 ROWNUM AS c1,
 (ROWNUM + trunc(dbms_random.value(0,1000))) AS c2,
 LPAD('A',255,'A') AS c3
FROM dual
CONNECT BY LEVEL <= 10000;
CREATE UNIQUE INDEX ind_c1 ON index_joins(c1);
CREATE INDEX ind_c2 on index_joins(c2);

Let us query this table to get columns c1 and c2

alter system flush buffer_cache;
alter system flush shared_pool;
set timing on
set autotrace on
SELECT c1, c2
FROM index_joins
where   c1 between 100 and 1000
and     c2 between 50 and 100;

The select list contains columns c1 and c2. The index ind_c1 covers column c1 and the index ind_c2 covers column c2. Between them, these two indexes satisfy the query, so there is no need for the optimizer to visit the base table. Let us look at the plan:

| Id  | Operation          | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT   |                  |     4 |    36 |     6  (17)| 00:00:01 |
|*  1 |  VIEW              | index$_join$_001 |     4 |    36 |     6  (17)| 00:00:01 |
|*  2 |   HASH JOIN        |                  |       |       |            |          |
|*  3 |    INDEX RANGE SCAN| IND_C2           |     4 |    36 |     3  (34)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN| IND_C1           |     4 |    36 |     4  (25)| 00:00:01 |
Predicate Information (identified by operation id):
 1 - filter("C2"<=100 AND "C1"<=1000 AND "C1">=100 AND "C2">=50)
 2 - access(ROWID=ROWID)
 3 - access("C2">=50 AND "C2"<=100)
 4 - access("C1">=100 AND "C1"<=1000)

Let us look at the plan output. The optimizer does an index range scan of the two indexes, extracting the rowid and c1 from index ind_c1 and the rowid and c2 from ind_c2. The optimizer then performs a hash join between these two result sets on the rowid values. Whenever the two rowsources have a rowid in common, that is the intersection we want, and both the c1 and c2 values are sent back as the result of the query.
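The mechanics can be sketched in a few lines of Python. The rowids and column values below are made up purely for illustration; this shows the shape of the operation, not Oracle internals:

```python
# Two rowsources as produced by the index range scans: each maps
# rowid -> column value for rows satisfying that index's predicate.
ind_c1 = {1: 100, 2: 500, 3: 1000}   # rowid -> c1, for c1 between 100 and 1000
ind_c2 = {2: 75, 3: 50, 4: 60}       # rowid -> c2, for c2 between 50 and 100

# Hash join on rowid: build a hash table on one rowsource (ind_c1 here),
# probe it with the other; a match means the row satisfies both predicates.
result = [(ind_c1[rid], c2) for rid, c2 in ind_c2.items() if rid in ind_c1]

print(result)
```

Because both select-list columns come out of the indexes themselves, the base table is never touched, which is what the index$_join$_001 view in the plan represents.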

Sybase ASE (ASE 15.7 ESD 2)

First we create the equivalent table in Sybase

create table index_joins
( c1 int not null
, c2 int not null
, c3 varchar(255)
)
lock datarows
go
CREATE UNIQUE INDEX ind_c1 ON index_joins(c1)
go
CREATE INDEX ind_c2 on index_joins(c2)
go
-- helper table for row generation (the original listing omitted its creation)
create table #generator (id int not null)
go
declare @i int
declare @rows int
set @rows = 100
set @i = 1
set nocount on
while @i <= @rows
begin
 insert into #generator values (@i)
 set @i = @i + 1
end
set nocount off
insert into index_joins
select
 rs.id
 , rs.id + abs(rs.random % 1000)
 , REPLICATE('A',255)
from    ( select
 @rows * ( g1.id - 1 ) + g2.id AS id
 -- rand2() is re-evaluated per row; this column is reconstructed
 , convert(int, rand2() * 1000000) AS random
 from
 #generator g1,
 #generator g2
 ) rs
go
update index statistics index_joins
go

So that is the equivalent table in Sybase.

Unlike Oracle, the optimizer in Sybase will not by default consider using two indexes on the same table. The only way to use two different indexes for the same table in a query is an index intersection plan. Index intersection is a technique that enables the ASE optimizer to use more than one index on a table to satisfy a given query. It works much like Oracle's index join, whereby each index satisfies a predicate of the query. A hash join between the two indexes selects the common Row IDs (RIDs) at the intersection, and the base table rows are then accessed via those intersecting RIDs. In ASE, index intersection has to be enabled explicitly with the following set command:

set index_intersection on

In addition, you need to supply the multiscan abstract plan operator m_scan. The query will look like the following:

set switch on 3604
declare @dbname varchar(30)
set @dbname=db_name()
dbcc cachedataremove(@dbname)
set statement_cache off
set showplan on
set statistics time,io,plancost on
set index_intersection on
SELECT c1, c2
FROM index_joins
where   c1 between 100 and 1000
and     c2 between 50 and 100
plan '(m_scan index_joins)'

The output of the Lava tree operator shows:

==================== Lava Operator Tree ====================
                     (VA = 6)
                     r:0 er:1
                     cpu: 0
              (VA = 5)
              r:0 er:1
              l:4 el:7
              p:0 ep:0
     /                                              \
    HashJoin                                     Restrict
    Inner Join                                   (0)(0)(0)(13)(0)
    (VA = 2)                                     (VA = 4)
    r:0 er:1                                     r:0 er:1
    l:4 el:5
    p:0 ep:0
    bufct: 128
   /                      \                        /
IndexScan               IndexScan               TableScan
ind_c2                  ind_c1                  index_joins
(VA = 0)                (VA = 1)                (VA = 3)
r:0 er:1                r:0 er:900              r:0 er:1
l:2 el:2                l:0 el:3                l:0 el:934
p:2 ep:2                p:0 ep:3                p:0 ep:5
============================================================

The execution plan steps are described as follows:

  1. Index scan of ind_c2 to get the RIDs of the qualifying rows for "index_joins.c2 between 50 and 100"
  2. Index scan of ind_c1 to get the RIDs of the qualifying rows for "index_joins.c1 between 100 and 1000"
  3. HashJoin between those two sets of RIDs to return the common RIDs
  4. The stream of unique RIDs is passed to the RID JOIN operator. This operator creates a worktable (aka Dynamic Index) and fills it with a single-column row for each RID.
  5. A RID scan on the table index_joins using those RIDs from the worktable gets back the rows.

Note that there is a subtle difference in behaviour between Oracle and ASE. Oracle does not need to visit the base table. ASE, however, will visit the base table to fetch the result set for the RIDs from the worktable it created. This method therefore also works for cases where the result set includes other columns from the table; for example, a select * from the base table will also use this access path in ASE.
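The ASE flavour of the technique — intersecting RID sets and then visiting the base table — can be sketched in Python (made-up data, purely illustrative):

```python
# Base table rows keyed by RID (hypothetical data): (c1, c2, c3).
index_joins = {
    1: (100, 200, 'A' * 5),
    2: (500, 75, 'A' * 5),
    3: (1000, 50, 'A' * 5),
    4: (2000, 60, 'A' * 5),
}

# Each index scan yields the RIDs of rows qualifying under its predicate.
rids_c1 = {rid for rid, (c1, c2, c3) in index_joins.items() if 100 <= c1 <= 1000}
rids_c2 = {rid for rid, (c1, c2, c3) in index_joins.items() if 50 <= c2 <= 100}

# Hash join of the two RID streams, then a RID scan of the base table --
# the extra step ASE performs, which is why even "select *" can use this path.
common = rids_c1 & rids_c2
rows = [index_joins[rid] for rid in sorted(common)]
print(rows)
```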

Oracle 10g, instance and database diagrams July 23, 2012

Posted by Mich Talebzadeh in Oracle.

Use this link for Oracle10gR2 Architecture diagram

Use this link for Oracle10gR2 Instance diagram

Use this link for  Oracle10gR2 database diagram

Understanding AWR Report June 13, 2012

Posted by Mich Talebzadeh in Oracle.

In the post Automatic Workload Repository (AWR) Performance Monitoring Tool Basics, I described the basic set-up and report generation for AWR. In this post we will try to understand the AWR report itself.

Before going further, I must emphasise that this report was generated by running a PL/SQL block immediately after the instance was rebooted. The code was used to simulate a typical OLTP workload with frequent inserts/updates/deletes and commits. The sample code:

  • Performs checkpoints immediately before and after PL/SQL block
  • Manually takes AWR snapshots before and after running PL/SQL block

The code is shown below

DECLARE
  type ObjIdArray is table of tdash.object_id%TYPE index by binary_integer;
  l_ids ObjIdArray;
  CURSOR c IS SELECT object_id FROM tdash;
BEGIN
  OPEN c;
  LOOP
    -- process 100 rows at a time, committing after each batch
    FETCH c BULK COLLECT INTO l_ids LIMIT 100;
    EXIT WHEN l_ids.COUNT = 0;
    BEGIN
      FORALL rs in 1 .. l_ids.COUNT
        UPDATE testwrites
          SET PADDING1 = RPAD('y',4000,'y')
        WHERE object_id = l_ids(rs);
      FORALL rs in 1 .. l_ids.COUNT
        DELETE FROM testwrites
        WHERE object_id = l_ids(rs);
      FORALL rs in 1 .. l_ids.COUNT
        INSERT INTO testwrites
        SELECT * FROM tdash t WHERE t.object_id = l_ids(rs);
      COMMIT;
    EXCEPTION
      WHEN OTHERS THEN
        DBMS_OUTPUT.PUT_LINE('Transaction failed');
        ROLLBACK;
    END;
  END LOOP;
  CLOSE c;
END;
/

The output from the AWR report is shown below.

The snapshot details

This is normally shown at the beginning of the report.

               Snap Id      Snap Time       Sessions  Curs/Sess
             --------- -------------------  --------  ---------
Begin Snap:       4413 17-May-12 09:22:34         28        3.0
  End Snap:       4414 17-May-12 09:43:35         29        3.3
   Elapsed:               21.02 (mins)
   DB Time:               21.78 (mins)

Note that unlike the usual hourly sample, this snapshot interval was about 21 minutes. The snapshots in this case were taken immediately before and after running the PL/SQL block.

Elapsed (time) is just the interval of time between the start and end snapshots. Another important quantity is DB Time, which is the sum of the working sessions' time. So DB Time = database CPU time + waits. In systems with multiple concurrent active sessions, DB Time can be larger than the elapsed time, because DB Time is a sum over all active sessions that are using CPU or waiting for an event. Note that background processes are not included in it.
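Using the figures from the snapshot header above, the ratio of DB Time to elapsed time gives the average number of active sessions over the interval (a quick Python check):

```python
elapsed_min = 21.02
db_time_min = 21.78

# DB Time sums over all active foreground sessions, so with concurrent
# sessions it can exceed the wall-clock elapsed time.
avg_active_sessions = round(db_time_min / elapsed_min, 2)
print(avg_active_sessions)
```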

Load Profile

Next we will look at the Load Profile in AWR report

Load Profile      Per Second    Per Transaction Per Exec Per Call
~~~~~~~~~~~~   --------------- --------------- --------- -------
DB Time(s):            1.0                0.0       0.02    0.39
DB CPU(s):             0.1                0.0       0.00    0.04
Redo size:    27,284,637.6          916,465.9
Logical reads:    46,156.0            1,550.3
Block changes:    31,214.4            1,048.5
Physical reads:    2,446.5               82.2
Physical writes:   5,627.1              189.0
User calls:            2.7                0.1
Parses:               50.8                1.7
Hard parses:           0.8                0.0
W/A MB processed:      0.1                0.0
Logons:                0.1                0.0
Executes:             62.0                2.1
Rollbacks:             0.0                0.0
Transactions:         29.8

Per Second here means the measure is normalized to one second of elapsed time. It is worth looking at the Per Second statistics to ensure that the amount of work the instance is doing is reasonable for the tasks it has to perform. For example, is doing nearly 30 transactions per second good enough?

To understand what Per Transaction means, we need to look at another statistic further down the report.

Statistic                     Total     per Second     per Trans
----------------------- ----------- -------------- -------------
user commits                 37,550           29.8           1.0

Per Transaction is simply per commit. So here we had 37,550 commits. To work out the total redo generated, we can take the Redo size per transaction of 916,465.9 bytes and multiply it by the number of commits, 37,550. That gives a total of 916,465.9 * 37,550 / 1024 / 1024 ≈ 32,819 MB of redo generated.
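That arithmetic is easy to verify, for example in Python, with the figures taken from the Load Profile and the user commits statistic above:

```python
redo_per_txn_bytes = 916_465.9   # Redo size, Per Transaction
user_commits = 37_550            # one transaction per commit

total_redo_mb = redo_per_txn_bytes * user_commits / 1024 / 1024
print(round(total_redo_mb, 1))
```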

Other interesting statistics are Logical reads, Physical reads and Physical writes. These are quoted in blocks. So, as before, the totals are 58,213,765 logical block reads, 3,086,610 physical block reads and 7,096,950 physical block writes. A useful derived statistic is the total amount of writes in MB. Since the database block size is 8K, the total amount written is (7,096,950 * 8192)/(1024 * 1024) = 55,445 MB.
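The block-to-MB conversion for the writes can be checked the same way, using the per-transaction figure and commit count from the report above:

```python
block_size_bytes = 8192          # 8K database block size
writes_per_txn = 189.0           # Physical writes, Per Transaction
user_commits = 37_550

total_write_blocks = writes_per_txn * user_commits      # 7,096,950 blocks
total_write_mb = total_write_blocks * block_size_bytes / 1024 / 1024
print(round(total_write_mb))
```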

Instance Efficiency Percentages

This part is not that useful, but let us have a look at it:

Instance Efficiency Percentages (Target 100%)
            Buffer Nowait %:  100.00       Redo NoWait %:  100.00
            Buffer  Hit   %:   94.70    In-memory Sort %:  100.00
            Library Hit   %:   95.49        Soft Parse %:   98.41
          Execute to Parse %:   18.06         Latch Hit %:  100.00
Parse CPU to Parse Elapsd %:   59.75     % Non-Parse CPU:   98.91

OK, we have 100% targets in a few places. On its own this section does not tell us much. Buffer Hit is 94.70%, but this is expected as we rebooted the instance before the test. Likewise, Library Hit is at 95.49% for the same reason.


Think of parsing as basically a two-step process: a syntax check to validate the statement, and a semantic check to ensure the statement can execute properly. The next step in the parse operation is to see whether the statement being parsed has already been processed by another session. If so, we can skip the next two steps in the process: optimization and row source generation. If we can skip these two steps, we have done what is known as a Soft Parse, a shorter route to getting our query going. If we cannot, and must perform all of the steps, we are performing what is known as a Hard Parse: we must parse, optimize and generate the plan for the query. Needless to say, hard parsing is more CPU intensive than soft parsing. Note that we have 59.75% Parse CPU to Parse Elapsed, so we are losing a fair bit of time while parsing. However, that should not matter here: after the reboot the shared pool did not contain any useful compiled queries, and parsing is only a small fraction of the work being done.
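The soft versus hard parse decision behaves like a cache lookup. A toy Python model of the idea (an illustration of the concept, not Oracle's actual shared pool implementation):

```python
shared_pool = {}   # cache of already-optimized statements, keyed by SQL text

def parse(sql_text):
    if sql_text in shared_pool:
        # soft parse: skip optimization and row source generation
        return shared_pool[sql_text], 'soft'
    plan = 'plan for: ' + sql_text   # hard parse: the expensive steps happen here
    shared_pool[sql_text] = plan
    return plan, 'hard'

print(parse('SELECT * FROM t WHERE id = :b1')[1])   # first time: hard
print(parse('SELECT * FROM t WHERE id = :b1')[1])   # identical text: soft
```

This is also why bind variables matter: with literals, every statement text is different and every parse is a hard parse.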

Top 5 Timed Foreground Events

This is probably one of the most important sections of the AWR report. In a nutshell, it is all about wait events. It records the number of waits encountered in the period and the total time spent waiting, together with the average time waited for each event. The section is ordered by the percentage of the total database time that each event is responsible for.

Top 5 Timed Foreground Events
                                         wait   % DB
Event                   Waits   Time(s)  (ms)   time Wait Class
---------------------------- ----------- ------ ------ ----------
db file sequential read 2,817,676    810   0   62.0 User I/O
DB CPU                               130        9.9
free buffer waits           5,505     70  13    5.3 Configurat
log file sync                 142     46 324    3.5 Commit
log buffer space              371    36   97    2.7 Configurat

Depending on what is seen in this section, other report sections may need to be referenced in order to quantify or check the findings. For example, the wait count for a particular event needs to be assessed against the duration of the reporting period and the number of active sessions at the time. In other words, do the stats look OK compared to the reported workload?

In the above, 62% of database time was spent waiting for db file sequential read, which in this case was mostly single-block index reads. However, the average wait time rounded to 0 ms. Another 9.9% of the time was spent on CPU. A tenth of database time is not bad at all. In general, high CPU usage is often a symptom of poorly tuned SQL (excessive logical I/O due to a suboptimal search path, or excessive physical I/O).
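The % DB time column is just each event's time divided by the total DB time of 1,306.8 seconds (21.78 minutes, shown in the Time Model section further down). A quick Python check:

```python
db_time_s = 1306.8   # total DB time for the interval

seq_read_pct = round(100 * 810 / db_time_s, 1)   # db file sequential read
db_cpu_pct = round(100 * 130 / db_time_s, 1)     # DB CPU
print(seq_read_pct, db_cpu_pct)
```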

Further down we have 5.3% of database time spent on free buffer waits. A free buffer wait occurs when a user session reads a block from disk and cannot find a free block in the buffer cache to place it in. One frequently attributed reason is that the DBWR process is not clearing the buffer cache fast enough. However, there can be many reasons and one generic answer will not help. You will need to investigate the cause or causes of this problem.

The log file sync wait event reports 3.5% of database time. It is triggered when a user session issues a commit (or a rollback). The user session will signal or post the LGWR to write the log buffer to the redo log file. If you look at the sql code above we are issuing frequent commits after processing every 100 rows. Finally, we have Log Buffer Space wait event taking 2.7% of DB time. This event occurs when server processes write data into the log buffer faster than the LGWR process can write it out. Refer to Oracle documentation if you see high readings for this wait event.

The time model statistics

Time model statistics tell you the processing time spent on various operations during the snapshot interval. You should not expect the % of DB Time column to add up to 100% because there is overlap among the statistics. For example, sql execute elapsed time includes CPU time for SQL execution. Note that some statistics, such as background elapsed time and background cpu time, are shown as well, but these are not part of % of DB Time.

Time Model Statistics                    DB/Inst: MYDB/mydb  Snaps: 4413-4414
-> Total time in database user-calls (DB Time): 1306.8s
-> Statistics including the word "background" measure background process
time, and so do not contribute to the DB time statistic
-> Ordered by % or DB time desc, Statistic name
Statistic Name                          Time (s) % of DB Time
------------------------------------------ ------------------ ------------
sql execute elapsed time               1,247.7         95.5
DB CPU                                   129.9          9.9
connection management call elapsed time    5.5           .4
parse time elapsed                         4.2           .3
hard parse elapsed time                    3.7           .3
PL/SQL execution elapsed time              1.2           .1
PL/SQL compilation elapsed time            0.4           .0
hard parse (sharing criteria) elapsed time 0.1           .0
repeated bind elapsed time                 0.0           .0
hard parse (bind mismatch) elapsed time    0.0           .0
sequence load elapsed time                 0.0           .0
failed parse elapsed time                  0.0           .0
DB time                                1,306.8
background elapsed time                3,395.9
background cpu time                       19.6

We can go through these figures now. 129.9 seconds of CPU time was used by all user sessions, just under 10% of database time. In total 1,306.8 seconds of database time was used, so the total wait event time can be calculated as 1,306.8 – 129.9 = 1,176.9 seconds. The lion's share of database time (95.5%) was spent executing SQL, which is a good sign. The total parse time was 4.2 seconds, of which 3.7 seconds was hard parsing. The rest of the statistics are tiny in this case.
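These derivations are simple enough to check in Python:

```python
db_time_s = 1306.8    # DB time
db_cpu_s = 129.9      # DB CPU

total_wait_s = db_time_s - db_cpu_s                 # what is not CPU is wait
sql_exec_pct = round(100 * 1247.7 / db_time_s, 1)   # sql execute elapsed time
print(round(total_wait_s, 1), sql_exec_pct)
```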

Foreground Wait Class

We are familiar with Oracle wait events. In the AWR report, under Wait Class, wait events (to be precise, user-generated wait events, AKA foreground wait events) are grouped into classes. These include User I/O, DB CPU, Configuration, Commit, Other, System I/O, Application, Network and Concurrency. For example, exclusive TX locks are generally an application-level issue and HW locks are generally a configuration issue. The following list includes common examples of the waits in some of the classes:

  • User I/O: Waits for user IO (db file sequential read, db file scattered read, direct path read, direct path write etc)
  • DB CPU: See above for the definition of this class
  • Configuration: Waits caused by inadequate configuration of database or instance resources (for example, undersized log file sizes, shared pool size etc)
  • Commit: waits for redo log write confirmation after a commit
  • Other: Waits which should not typically occur on a system (for example, ‘wait for EMON to spawn’)
  • System I/O: Waits for background process IO
  • Application: Waits resulting from user application code (for example, lock waits caused by row level locking or explicit lock commands)
  • Network: waits for data to be sent over the network
  • Concurrency: Waits for internal database resources (for example, latches)

For other classes  not covered above,  you can get info from Oracle manuals.

Let us look at this section in our AWR report:

                                    %Time   Total Wait   wait
Wait Class             Waits        -outs     Time (s)   (ms)  %DB time
----------- ---------------- ----- ---------------- -------- -----
User I/O          2,866,247     0     838        0      64.2
DB CPU                                130                9.9
Configuration         5,930     0     110       18       8.4
Commit                  142     0      46      324       3.5
Other                 1,971    96      20       10       1.6
System I/O              417     0       0        0       0.0
Application              88     0       0        0       0.0
Network               2,653     0       0        0       0.0
Concurrency               5     0       0        0       0.0

User I/O, taking 64.2% of database time, was explained before. The CPU usage was 9.9%. Configuration contributed 8.4% of wait time; these waits are attributed to database settings. We will see the events contributing to this class below.

Foreground Wait Events

-> s  - second, ms - millisecond -    1000th of a second
-> Only events with Total Wait Time (s) >= .001 are shown
-> ordered by wait time desc, waits desc (idle events last)
-> %Timeouts: value of 0 indicates value was < .5%.  Value of null is truly 0
                              %Time Total Wait  wait  Waits   % DB
Event                Waits    -outs   Time (s)  (ms)   /txn   time
-----------------  ------------ ---------- ------- -------- ------
db file sequential r 2,817,676    0   810       0     75.0   62.0
free buffer waits        5,505    0    70      13      0.1    5.3
log file sync              142    0    46     324      0.0    3.5
log buffer space           371    0    36      97      0.0    2.7
db file scattered r     48,458    0    28       1      1.3    2.1
rdbms ipc reply             10   90    19    1949      0.0    1.5
log file switch             48    0     4      81      0.0     .3

For the sake of brevity I have only included those wait events that contribute to % DB time. db file sequential read and db file scattered read belong to user I/O wait class. log file sync is part of Commit wait class. free buffer waits, log buffer space, rdbms ipc reply and log file switch wait events belong to Configuration wait class.

Background Wait Events

This section of AWR report provides statistics on wait events for Oracle background processes.

Background Wait Events
-> ordered by wait time desc, waits desc (idle events last)
-> Only events with Total Wait Time (s) >= .001 are shown
-> %Timeouts: value of 0 indicates value was < .5%.  Value of null is truly 0
                              %Time Total Wait  wait  Waits   % bg
Event                Waits    -outs   Time (s)  (ms)   /txn   time
-----------------  ------------ ---------- ------- -------- ------
log file parallel write 10,319    0   845      82      0.3   24.9
db file parallel write  43,425    0   207       5      1.2    6.1
direct path write           55    0     2      40      0.0     .1
control file seq re     2,501     0     2       1      0.1     .1

The log file parallel write event shows LGWR waiting for blocks to be written to all online redo log members in a group; LGWR waits until all blocks have been written to all members. Here 24.9% of total background time was spent on log file parallel write. The db file parallel write wait event belongs to the DBWR process, since it is the only process that writes dirty blocks from the SGA to datafiles. The DBWR process compiles a set of dirty blocks, hands the batch over to the OS, and waits on the db file parallel write event for the I/O to complete. The parameter of interest here is Avg wait (ms). In our case it is 5 ms, which is a perfectly respectable figure. Obviously, larger average wait times point to a slower I/O subsystem or poor I/O configuration.

Service Level Statistics

The service statistics section gives information about how particular services configured in the database are performing.

    • A service is a grouping of processes
    • Users may be grouped in SYS$USERS
    • A single user (application login) may be grouped with that user name
Service Statistics
-> ordered by DB Time
                                            Physical      Logical
 Service Name   DB Time (s)   DB CPU (s)    Reads (K)    Reads (K)
------------- ------------ ------------ ------------ ------------
SYS$USERS            1,281          129        3,079       58,172
mydb.mich.local         25            1            2           34
SYS$BACKGROUND           0            0            5           14
mydbXDB                  0            0            0            0

Note that SYS$USERS is the default service for all applications. You can configure different services for different production applications. This will enable the DBA to identify the services placing the heaviest load on the system, by grouping services according to production needs (for example, services created for batch applications).

Service Wait Class Stats
-> Wait Class info for services in the Service Statistics section.
-> Total Waits and Time Waited displayed for the following wait
   classes:  User I/O, Concurrency, Administrative, Network
-> Time Waited (Wt Time) in seconds
Service Name
User I/O  User I/O  Concurcy Concurcy Admin  Admin Network Network
Tot Wts Wt Time   Tot Wts Wt Time Total Wts Wt Time Tot Wts Wt Time
--------- ------- -------- -------- --------- --------- ------- -------
2864754     838      5         0         0         0     796     0
1493          0      0         0         0         0    1857     0
761          4      48         1         0         0       0     0

SQL Section

If a SQL statement appears in the top 5 statements in two or more of the areas below, then it is a prime candidate for tuning. The sections are:

  • Total Elapsed Time
  • Total CPU Time
  • Total Buffer Gets
  • Total Disk Reads
  • Total Executions
  • Total Parse Calls
  • Total Sharable Memory
  • Total Version Count

Let us try to see what these mean.

Total Elapsed Time

Total Elapsed Time = CPU Time + Wait Time. If a SQL statement appears in the total elapsed time area of the report this means its CPU time plus any other wait times made it pop to the top of the pile. Excessive Elapsed Time could be due to excessive CPU usage or excessive wait times.

This is the area that you need to examine and probably the one that will be reported by the users or application support. From a consumer perspective, the finer details don’t matter. The application is slow. Full stop.

In conjunction with excessive elapsed time, check whether this piece of SQL is also a high consumer under Total CPU Time; it normally is. Otherwise check the wait times and Total Disk Reads. They can indicate either issues with wait times (slow disks, latch gets etc.) or too much physical IO associated with table scans or sub-optimal indexes. This section is a gate opener, and often you will need to examine other sections.

Total CPU Time

When a statement appears in the Total CPU Time area this indicates it used excessive CPU cycles during its processing. Excessive CPU processing time can be caused by sorting, excessive function usage or long parse times. Indicators that you should be looking at this section for SQL tuning candidates include high CPU percentages in the service section for the service associated with this SQL (a hint, if the SQL is uppercase it probably comes from a user or application; if it is lowercase it usually comes from the internal or background processes). To reduce total CPU time, reduce sorting by using composite indexes that can cover sorting and use bind variables to reduce parse times.

Total Buffer Gets

High total buffer gets mean a SQL statement is reading a lot of data from the db block buffers. Generally speaking, buffer gets (AKA logical IO or LIO) are OK, except when they become excessive. The old saying holds true: reduce the logical IO and the physical IO (disk reads) will take care of itself, since each LIO may have incurred a PIO to get the block into the buffer in the first place. Reducing buffer gets is very important and should not be underestimated. To get a block from the db block buffers, we have to latch it (in order to prevent someone from modifying the data structures we are currently reading from the buffer). Although latches are less persistent than locks, a latch is still a serialization device. Serialization devices inhibit scalability: the more you use them, the less concurrency you get. Therefore, in most cases optimal buffer gets result in improved performance. Also note that by lowering buffer gets you will require less CPU usage and less latching. Thus, to reduce excessive buffer gets, optimise SQL to use appropriate indexes and reduce full table scans. You can also look at improving the indexing strategy and consider deploying partitioning (a licensed option).

Total Disk Reads

High total disk reads mean a SQL statement is reading a lot of data from disks rather than being able to access that data from the db block buffers. High physical reads after a server reboot are expected, as the cache is cold and data is fetched from disk. However, disk reads (or physical reads) are undesirable in an OLTP system, especially when they become excessive. Excessive disk reads cause performance issues. The usual norm is to increase the db buffer cache to allow more buffers and reduce ageing. Total disk reads are typified by high physical reads, a low buffer cache hit ratio, and high IO wait times. Higher wait times for disk IO can be associated with a variety of causes (a busy or over-saturated SAN, slower underlying storage, low HBA capacity and other hardware causes). The IO statistics sections of AWR, plus Operating System diagnostic tools as simple as iostat, can help in identifying these issues. To reduce excessive disk reads, consider partitioning, use indexes and look at optimizing SQL to avoid excessive full table scans.

Total Executions

High total executions need to be reviewed to see whether they are genuine executions or loops in SQL code. I have also seen situations where autosys jobs erroneously fire duplicate code. In general, statements with high numbers of executions are usually being properly reused. However, there is always the chance of an unnecessary loop in PL/SQL, Java or C#. Statements with a high number of executions and a high number of logical and/or physical reads are candidates for review, to be sure they are not being executed multiple times when a single execution would serve. If the database has excessive physical and logical reads or excessive IO wait times, look at the SQL statements that show excessive executions together with high physical and logical reads.

Parse Calls

Whenever a statement is issued by a user or process, regardless of whether it is in the SQL pool, it undergoes a parse. As explained under Parsing, the parse can be a hard parse or a soft parse. Excessive parse calls usually go with excessive executions. If the statement is using what are known as unsafe bind variables, then the statement will be reparsed each time. If the parse ratios are low, look here and in the version count areas.

Sharable Memory

Sharable Memory refers to the Shared Pool memory area in the SGA, hence this particular section of the AWR report shows the SQL statement cursors that consumed the most Shared Pool memory for their execution.

In general, high values for Sharable Memory do not necessarily imply there is an issue. It simply means that:

  1. These SQL statements are big or complex and Oracle has to keep lots of information about them, OR
  2. a large number of child cursors exist for those parent cursors, OR
  3. a combination of 1 and 2.

In the case of point 2, it may be due to poor coding, such as bind variable mismatches, security mismatches or overly large SQL statements that join many tables. In a DSS or DW environment, large complex statements are normal. In an OLTP database, large or complex statements are usually the result of over-normalization of the database design, attempts to use an OLTP system as a DW, or simply poor coding techniques. Usually large statements will result in excessive parsing, recursion, and large CPU usage.

Version Count

High version counts are usually due to multiple identical-schema databases, unsafe bind variables, or Oracle bugs.

Instance Activity Statistics

This section compares the instance activity statistic values between the two snapshot sets. For each statistic the total value is shown, along with rates per second and per transaction. You have a comprehensive set of statistics here; use them in conjunction with the other sections of the report.

Tablespace IO Stats

This is a useful section in that it shows the tablespace IO activities. Remember the stats here are at the tablespace level.

Tablespace IO Stats
 -> ordered by IOs (Reads + Writes) desc

Tablespace      Reads Av Reads/s Av Rd(ms) Av Blks/Rd     Writes Av Writes/s Buf Waits Av Buf Wt(ms)
---------- ---------- ---------- --------- ---------- ---------- ----------- --------- -------------
APP1        2,860,666      2,268       0.3        1.1  3,564,644       2,826         0           0.0
UNDOTBS1           90          0       9.8        1.0  1,990,759       1,578         0           0.0
SYSTEM          7,190          6       0.2        1.3        204           0         0           0.0
SYSAUX          1,938          2       0.3        2.1      1,047           1         0           0.0
APP1_HDD          105          0       0.1        1.0        105           0         0           0.0

These stats are very useful in identifying the throughput for each tablespace. For example, for tablespace APP1 we see 2,860,666 read requests in total. The read throughput is 2,268 reads per second, and each read on average fetched 1.1 blocks. There were 3,564,644 writes in total, with a throughput of 2,826 writes per second. Obviously these throughputs will tell you whether there has been any sub-optimal performance compared to previous records. If that is the case then you will need to examine the underlying disk sub-systems for the tablespace.
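As a rough sketch of how the per-second and per-read averages in this section relate to the raw totals, the arithmetic can be reproduced outside the database. The snapshot length and the total-blocks figure below are hypothetical values chosen to reproduce the APP1 numbers above; AWR derives the real averages from the actual snapshot interval.

```python
# Derive the per-second and per-read averages for tablespace APP1 from the
# raw totals in the report. Snapshot length and blocks_read are hypothetical.
reads = 2_860_666         # total read requests in the snapshot
writes = 3_564_644        # total write requests
blocks_read = 3_146_733   # hypothetical total blocks read (gives ~1.1 blks/rd)
snapshot_seconds = 1_261  # hypothetical snapshot length in seconds

av_reads_per_sec = reads // snapshot_seconds      # -> 2268, as reported
av_writes_per_sec = writes // snapshot_seconds    # -> 2826, as reported
av_blks_per_read = round(blocks_read / reads, 1)  # -> 1.1, as reported
```

The point of the sketch is simply that the "Av" columns are derived from the totals and the snapshot interval, so a change in the averages between reports means either the workload or the interval changed.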

Is there a silver bullet IT Strategy June 4, 2012

Posted by Mich Talebzadeh in Uncategorized.
add a comment

This is about IT strategy, and most of the guys that I know have had many years of IT experience. This industry is sharply divided between doers and watchers. The leaders and strategists are the ones that make headway, and usually they know what they want. However, the big obstacle seems to be the organizations that we work in.

As a consultant I have been to many sites where the IT managers and directors have little or no inclination to move IT forward in a pragmatic way. For most of them the status quo is just fine. In other words they toe the line and speak the official language. However, as a result the enterprise suffers. I believe there is a culture that values loyalty more than capability. This in turn results in a rigid environment where changes are seldom accepted. Moreover, the IT strategist's background always favours a certain product, presumably because the manager feels more comfortable with that product. That is hardly a strategy for progress.

It is also interesting to note that the average employee's loyalty to a given firm is two to three years, and people change jobs more often these days than before. Any long term vision or strategy is nonexistent, chiefly because the IT director has a different agenda for himself/herself, ending up in a lame duck strategy.

I have seen too often that organizations bring in someone (say a new director) from outside (usually the director will tell you that he/she was headhunted for the job!). The pattern is always similar: let us cut costs, streamline the processes (meaning get rid of a few people), bring in outsourcing (usually they know a few firms in South Asia they have worked with before), and we can reduce the projects' lifecycles drastically. There is usually an announcement of some town hall or soap box meeting with video conferencing and large participation by staff of all types. Mind you, the staff were told to attend. Then we have the director giving his/her vision of things to come and the IT strategy for the next two years. The figures shown in their favourite PowerPoint presentation are impressive. That is, until such time as you start asking questions about these figures. The graphs do not add up or are skewed too much. The IT director will have difficulty explaining these figures; otherwise he/she will tell you that a particular graph is not normalized. You shake your head in disbelief, knowing that most of the stuff is downloadable/downloaded from the web. The director has difficulty explaining because he does not believe those figures either. Let us face it, how long does it take to formulate a project strategy? It does not happen overnight, or in a week, a month or a quarter. I am at a loss to understand how these so-called IT gurus are going to make it happen. BTW, they are usually pretty good at making cliché-type statements. A few months on you hear that the IT director has left the firm without delivering anything, at great cost to the company and without any value added. I am at a loss to see why this industry follows this trend. Is it because the IT industry, unlike some disciplines such as civil engineering, is still very much evolving? As strategists, what are the great unknowns in IT? In my view there is no revolutionary solution. It is only evolutionary.
A company's IT strategy can only evolve through sound leadership and decision making. More importantly, the IT strategy has to fit the underlying business.

Use Case Comparison of ORACLE 11gR2 and Sybase ASE 15.7 on Solid State Disks June 2, 2012

Posted by Mich Talebzadeh in Oracle and Sybase.
add a comment

Database performance is ultimately constrained by the need to read and write data from persistent storage. For almost the entire history of Oracle and Sybase, this persistent storage has been provided by spinning magnetic hard disks (HDDs). The relative performance of magnetic disk has been so poor compared to the rest of the hardware that database performance tuning has focused primarily on minimizing disk IO. Only now, with the advent of Solid State Disk (SSD) technology and in-memory databases, are we getting to the point where the increase in IO performance matches the improvements we have come to take for granted in CPU and memory access speeds.

Although replacing all the storage components of Oracle and Sybase with SSD is a rather expensive option for many shops, both databases can benefit from intelligent use of solid state disks. However, it is not always obvious how best to leverage limited quantities of SSD in order to get the best performance return from the investment.

A client of mine recently asked me to undertake a “Use Case Comparison of ORACLE 11G and Sybase ASE 15” on Solid State Disks. They wanted to improve their system response, and both Oracle and Sybase databases were feeding their trading systems. I undertook this study separately for Oracle 11gR2 and Sybase ASE 15.7, and the outcomes were very interesting. To simplify matters we separated read intensive and write intensive activities for both Oracle and Sybase. The results of the study of read intensive activities for Oracle have now been published in Oracle SELECT Journal, a quarterly journal of peer-reviewed technical articles and news published by the Independent Oracle Users Group (IOUG). The article was published under the title “Use Case Comparison of ORACLE 11g on Solid State Disks (Part One)”. The results of the study of write intensive activities for Oracle 11g will be published shortly.

Likewise, the results of the study of read intensive activities for Sybase ASE 15 were published in the International Sybase User Group (ISUG) journal in the April 2012 edition, titled “Use Case Comparison of ASE 15.7 on Solid State Disks [Part 1]”. The results of the study of write intensive activities for Sybase ASE 15 have now been published in the June 2012 edition of the ISUG journal, titled “Use Case Comparison of ASE 15.7 on Solid State Disks [Part 2]”.

A user defined function for Sybase ASE to emulate Oracle’s dbms_random.string() function May 31, 2012

Posted by Mich Talebzadeh in Oracle and Sybase.
add a comment

In Oracle, the dbms_random package allows one to generate random data. A very widely used function within this package is dbms_random.string(opt, len), which returns a random string whose length is determined by the len argument.

The returned string’s characteristics are determined by the opt argument. opt must be a char. The following five options are supported:
‘u’ or ‘U’: Upper case alpha characters
‘l’ or ‘L’: Lower case alpha characters
‘a’ or ‘A’: Upper and lower case alpha characters
‘x’ or ‘X’: Upper alpha and numeric characters
‘p’ or ‘P’: Any printable character

An example calling the above function with different options would be

col u format a10
col l format a10
col a format a10
col x format a10
col p format a10

select
  dbms_random.string('u', 10) as u,
  dbms_random.string('l', 10) as l,
  dbms_random.string('a', 10) as a,
  dbms_random.string('x', 10) as x,
  dbms_random.string('p', 10) as p
from dual;
U          L          A          X          P
---------- ---------- ---------- ---------- ----------
PSXPJUOMMT byebvyevlb JPRjgoAxOb QM4XU14IPX [`i06!e8m0

In Sybase ASE this random string function does not exist. The problem is that even if you write your own user defined function (UDF) it won't work. There is currently an issue with calling non-deterministic builtin functions (getdate(), rand(), rand2(), newid(), etc.) directly from within an ASE UDF. Basically, if I try the following:

create function sf_test(@dummy int)
returns int
as
begin
  declare @v int
  select @v = rand2()  -- calling rand function here
  return @v
end

I will get the following error:

Msg 11798, Level 15, State 65:
Illegal BUILTIN FUNCTION statement within a scalar SQL function.

The workaround is to define these functions in a view and reference the view in the UDF (thus avoiding direct references to these builtins in the UDF).

For example, create the view in sybsystemprocs first:

create view vudf
as
select
  CHAR(ASCII('A') + ABS(CAST(NEWID() AS VARBINARY) % 26)) as valphaU -- random upper case letter; reconstructed, the original listing was truncated
, CHAR(ASCII('a') + ABS(CAST(NEWID() AS VARBINARY) % 26)) as valphaL -- random lower case letter; reconstructed
, CONVERT(char(1),ABS(CAST(NEWID() AS VARBINARY) %9)) as vrandnum
, CHAR(ASCII('!')+ABS(CAST(NEWID() AS VARBINARY) %95)) as vprintable
, FLOOR(RAND2()*100) as vrand
, GETDATE() as vgetdate

This is not ideal, but as a workaround it can be expanded to provide all the non-deterministic functions required. The overhead is maintaining this view in sybsystemprocs. An example calling getdate() in a UDF:

create function sf_getdate(@datein datetime)
returns int
as
begin
  declare @diff int
  select @diff = datediff(day, @datein, vgetdate) from sybsystemprocs.dbo.vudf
  return @diff
end
-- run this
select dbo.sf_getdate('Apr 15 2012')

Similarly, using the view created above, we can write a random string function in ASE:

use sybsystemprocs
go
if exists (select 1 from sysobjects where type = 'V' and name = 'vudf')
    drop view vudf
go
create view vudf
as
select
  CHAR(ASCII('A') + ABS(CAST(NEWID() AS VARBINARY) % 26)) as valphaU -- random upper case letter; reconstructed, the original listing was truncated
, CHAR(ASCII('a') + ABS(CAST(NEWID() AS VARBINARY) % 26)) as valphaL -- random lower case letter; reconstructed
, CONVERT(char(1),ABS(CAST(NEWID() AS VARBINARY) %9)) as vrandnum
, CHAR(ASCII('!')+ABS(CAST(NEWID() AS VARBINARY) %95)) as vprintable
, FLOOR(RAND2()*100) as vrand
, GETDATE() as vgetdate
go
if exists (select 1 from sysobjects where type = 'SF' and name = 'sf__randalpha')
    drop function sf__randalpha
go
create function sf__randalpha (@opt char(1), @len int)
returns varchar(255)
as
/*
** Name        : sf__randalpha
** Created By  : Mich Talebzadeh
** Date        : 30/05/2012
** Purpose     : The function returns a string whose length is determined by the len argument.
**               The returned string's characteristics are determined by the opt
**               argument. opt must be a char. The following five options are possible:
**               'u' or 'U': Upper case alpha characters
**               'l' or 'L': Lower case alpha characters
**               'a' or 'A': Upper and lower case alpha characters
**               'x' or 'X': Upper alpha and numeric characters
**               'p' or 'P': Any printable character
**               Examples.
**               declare @c varchar(30)
**               select @c = sybsystemprocs.dbo.sf__randalpha('a',30)
*/
begin
  if UPPER(@opt) NOT IN ('U','L','A','X','P') return null
  if (@len <= 0) return null
  declare @randalpha varchar(255)
  declare @counter int
  declare @valpha char(1)
  declare @i int
  set @randalpha = null
  set @counter = 1
  while (@counter <= @len)
  begin
    if UPPER(@opt) = 'U' select @valpha = valphaU from sybsystemprocs.dbo.vudf
    if UPPER(@opt) = 'L' select @valpha = valphaL from sybsystemprocs.dbo.vudf
    if UPPER(@opt) = 'P' select @valpha = vprintable from sybsystemprocs.dbo.vudf
    if UPPER(@opt) = 'A'
    begin
      select @i = vrand from sybsystemprocs.dbo.vudf
      if @i < 50 begin select @valpha = valphaU from sybsystemprocs.dbo.vudf end
      else begin select @valpha = valphaL from sybsystemprocs.dbo.vudf end
    end
    if UPPER(@opt) = 'X'
    begin
      select @i = vrand from sybsystemprocs.dbo.vudf
      if @i < 50 begin select @valpha = valphaU from sybsystemprocs.dbo.vudf end
      else begin select @valpha = vrandnum from sybsystemprocs.dbo.vudf end
    end
    select @randalpha = @randalpha || @valpha
    select @counter = @counter + 1
  end
  return substring(@randalpha, 1, @len)
end
go
grant exec on dbo.sf__randalpha to public
go
sp_help sf__randalpha
go

Now if I use the above function I can create all the random strings as in Oracle:

1> select
2>   substring(sybsystemprocs.dbo.sf__randalpha('u',10),1,10) as u,
3>   substring(sybsystemprocs.dbo.sf__randalpha('l',10),1,10) as l,
4>   substring(sybsystemprocs.dbo.sf__randalpha('a',10),1,10) as a,
5>   substring(sybsystemprocs.dbo.sf__randalpha('x',10),1,10) as x,
6>   substring(sybsystemprocs.dbo.sf__randalpha('p',10),1,10) as p
7> go
u          l          a          x          p
---------- ---------- ---------- ---------- ----------
GEMIGRGZHV sxttmcwujw nNxJBYefqa W5K2NTHG67 oi'8e&8{GV
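For readers without an Oracle or ASE instance to hand, the five option behaviours described above can be sketched in Python. This is an illustrative emulation of the documented behaviour, not Oracle's implementation, and the exact character pools (particularly for 'p') are my own approximations.

```python
import random
import string

def random_string(opt: str, length: int):
    """Emulate the five dbms_random.string() options described above."""
    pools = {
        'U': string.ascii_uppercase,                   # 'u'/'U': upper case alpha
        'L': string.ascii_lowercase,                   # 'l'/'L': lower case alpha
        'A': string.ascii_letters,                     # 'a'/'A': mixed case alpha
        'X': string.ascii_uppercase + string.digits,   # 'x'/'X': upper alpha + digits
        'P': ''.join(chr(c) for c in range(33, 127)),  # 'p'/'P': printable (assumed pool)
    }
    pool = pools.get(opt.upper())
    if pool is None or length <= 0:
        return None  # unknown option or non-positive length
    return ''.join(random.choice(pool) for _ in range(length))

print(random_string('u', 10))  # e.g. a string like 'PSXPJUOMMT'
```

This mirrors the structure of the ASE sf__randalpha function above: pick a pool from the option, then draw one random character per position.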

Automatic Workload Repository (AWR) Performance Monitoring Tool Basics May 5, 2012

Posted by Mich Talebzadeh in Oracle.
add a comment

What is AWR

Starting with Oracle 10g, the statspack utility has evolved into the Automatic Workload Repository (AWR). AWR is a comprehensive and powerful tool for monitoring Oracle performance. Unlike tracing, AWR is not enabled per session; it covers the complete database. It also requires the Diagnostics Pack licence. AWR collects and stores database statistics relating to problem detection and tuning, generating snapshots of key metrics such as system and session statistics, CPU, waits, redo, writes and other time-model statistics. AWR stores its data in the SYSAUX tablespace. The background process MMON is responsible for collecting the statistics from the SGA and saving them to the SYSAUX tablespace.

AWR provides statistics in two formats

  • temporary – in memory collection of statistics in the SGA, accessible via the V$ views
  • persistent – type of performance data in the form of regular AWR snapshots which are accessible via the DBA_ views

It will collect data on the following metrics:

  • Base statistics that are also part of the V$SYSSTAT and V$SESSTAT views
  • SQL statistics
  • Database object-usage statistics
  • Time-model statistics
  • Wait statistics
  • ASH (active session history) statistics
  • Operating system statistics

AWR uses the following V$ views to collect statistics

  • v$sys_time_model time model stats (db time, java execution time, pl/sql execution time and others)
  • v$osstat Operating system stats (avg_busy_ticks, avg_idle_ticks, etc)
  • v$service_stats wait statistics ( db cpu, app wait time, user commits, etc)
  • v$sysstat system stats
  • v$sesstat session stats

Database performance stats fall into one of three categories:

  • Cumulative values – collect stats over a period of time from the v$sysstat, etc
  • Metrics – derived values that give meaning to the raw cumulative statistics, such as rates of change.
  • Sampled data – the ASH sampler is used to collect these stats.

Setting up AWR

Statistics collection in Oracle depends on the system parameter statistics_level. By default this parameter is set to typical, which collects a standard level of statistics. There is a more comprehensive level called all that additionally collects execution plan and timing information. Beware that setting this value in production typically results in a four-fold overhead. It will show row source timings as well (say in the output of autotrace or tkprof), but at the cost of running the query much more slowly! AWR can be disabled by setting statistics_level to basic. Beware that this disables a lot more than just AWR.

To display the current setting for statistics_level do

show parameter statistics_level;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
statistics_level                     string      TYPICAL
-- To set up statistics level to typical do
alter system set statistics_level = typical;
-- To set up statistics level to all do
alter system set statistics_level = all;
-- to disable AWR and many other things
alter system set statistics_level = basic;


Most monitoring tools work with intervals and snapshots, and AWR is no exception. Oracle provides the dbms_workload_repository package to manage AWR snapshots. Alternatively you can use Enterprise Manager.

To display a list of snapshots do

select * from (select snap_id, snap_level, to_char(begin_interval_time, 'dd/mm/yy hh24:mi:ss') AS begin from dba_hist_snapshot order by 1) where rownum <=4;
   SNAP_ID SNAP_LEVEL BEGIN
---------- ---------- -----------------
      3908          1 28/04/12 08:26:10
      3909          1 28/04/12 08:36:28
      3910          1 28/04/12 10:00:00
      3911          1 28/04/12 11:00:49

To see snapshot interval and retention period do

select  snap_interval
,       retention
from    dba_hist_wr_control;
SNAP_INTERVAL                  RETENTION
------------------------------ ------------------------------
+00000 01:00:00.0              +00008 00:00:00.0

You can use the modify_snapshot_settings procedure in the dbms_workload_repository package to change the snapshot settings. For example, to change the interval to 30 minutes you can do

exec dbms_workload_repository.modify_snapshot_settings (interval => 30) ;

To change the retention period to two weeks you can do

exec dbms_workload_repository.modify_snapshot_settings (retention => 14*24*60)
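The retention argument is specified in minutes, so the 14*24*60 above is just a days-to-minutes conversion. A trivial sketch (the helper name is mine, not part of any Oracle API):

```python
def retention_minutes(days: int) -> int:
    """Convert an AWR retention period in days to the minutes value
    that modify_snapshot_settings expects for its retention argument."""
    return days * 24 * 60

print(retention_minutes(14))  # two weeks -> 20160 minutes
```

The default retention of 8 days seen in the +00008 00:00:00.0 output above corresponds to 11,520 minutes by the same arithmetic.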

To manually create a snapshot you can do:

exec dbms_workload_repository.create_snapshot();

To delete snapshots

exec dbms_workload_repository.drop_snapshot_range (low_snap_id => 1077, high_snap_id => 1078);

To create a baseline

exec dbms_workload_repository.create_baseline (start_snap_id => 1110, end_snap_id => 1120, baseline_name => 'Batch baseline');

To delete a baseline

exec dbms_workload_repository.drop_baseline (baseline_name => 'Batch baseline', cascade => FALSE);

Generating AWR Reports

DBAs can run AWR reports. The most common method of generating an AWR report is to run the awrrpt.sql script shipped with the database:

@?/rdbms/admin/awrrpt.sql
For example as user system I run this report


Current Instance
~~~~~~~~~~~~~~~~

   DB Id    DB Name      Inst Num Instance
----------- ------------ -------- ------------
 2678249386 MYDB                1 mydb

Specify the Report Type
~~~~~~~~~~~~~~~~~~~~~~~
Would you like an HTML report, or a plain text report?
Enter 'html' for an HTML report, or 'text' for plain text
Defaults to 'html'
Enter value for report_type: text

Type Specified:  text

Instances in this Workload Repository schema
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

   DB Id     Inst Num DB Name      Instance     Host
------------ -------- ------------ ------------ ------------
* 2678249386        1 MYDB         mydb         rhes564

Using 2678249386 for database Id
Using          1 for instance number

Specify the number of days of snapshots to choose from
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Entering the number of days (n) will result in the most recent
(n) days of snapshots being listed.  Pressing <return> without
specifying a number lists all completed snapshots.

Enter value for num_days: 1

Listing the last day's Completed Snapshots

Instance     DB Name        Snap Id    Snap Started    Level
------------ ------------ --------- ------------------ -----
mydb         MYDB              4121 06 May 2012 00:00      1
                               4122 06 May 2012 01:00      1
                               4123 06 May 2012 02:00      1
                               4124 06 May 2012 03:00      1
                               4125 06 May 2012 04:00      1
                               4126 06 May 2012 05:00      1
                               4127 06 May 2012 06:00      1
                               4128 06 May 2012 07:00      1
                               4129 06 May 2012 08:00      1
                               4130 06 May 2012 09:00      1
                               4131 06 May 2012 10:00      1
                               4132 06 May 2012 11:00      1
                               4133 06 May 2012 12:00      1
                               4134 06 May 2012 13:00      1

Specify the Begin and End Snapshot Ids
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Enter value for begin_snap: 4128
Begin Snapshot Id specified: 4128

Enter value for end_snap: 4129
End   Snapshot Id specified: 4129

Specify the Report Name
~~~~~~~~~~~~~~~~~~~~~~~
The default report file name is awrrpt_1_4128_4129.txt.  To use this name,
press <return> to continue, otherwise enter an alternative.

Enter value for report_name:

The report will be generated in the current working directory.

SAP ERP and SAP HANA April 19, 2012

Posted by Mich Talebzadeh in SAP.
add a comment

SAP ERP – SAP Enterprise Resource Planning (which started as an invoicing/ledger system) comes in different flavours, including one for Small to Medium Enterprises (SME). It has been in operation for many years. For the backend it can use its own proprietary database, MaxDB, which SAP ships. However, customers may prefer to use another database such as Oracle or Sybase ASE. SAP will be phasing out MaxDB and replacing it with Sybase ASE.

As I understand it, the way it currently achieves this so-called database agnosticism is by sticking to the portable aspects of SQL. SAP ERP uses a high level programming language called ABAP http://en.wikipedia.org/wiki/ABAP . There is a layer in NetWeaver (the engine that executes ABAP) which translates the data access portion of ABAP into pure SQL and executes this against whatever database is serving as the data store. That translation could be to Transact-SQL, Oracle SQL or PL/SQL. To achieve this decoupling, the database itself does not use stored procedures or anything proprietary, which makes it easier for SAP to install whichever server/database the customer wants (Oracle/ASE etc.) as part of the SAP ERP package. The traditional design is based on a 2-tier architecture, but this could of course be 3-tier (with the database running on another host).

SAP HANA (High Performance Analytic Appliance) is a combined hardware and software kit. SAP HANA enables users to access, model and analyze all of their transactional and analytical data *in memory*, from multiple sources, on this appliance/kit.

To achieve this SAP uses hardware from HP and probably others, and uses the large amount of memory available to create an in-memory analytics data store. The technology deploys a columnar database (similar to Sybase IQ), compression and the usual parallel processing. This in effect allows SAP to provide up-to-date reports to users in near real time by eliminating disk usage. For hardware it uses HP ProLiant DL580 G7 and DL980 G7 servers, which provide up to 2TB of memory in a single server. To handle the workload HANA deploys an in-memory computing engine and a data calculation engine.

As I mentioned above, HANA will act as the in-memory repository for many SAP products, including feeds from SAP ERP, SAP Business Information Warehouse (BW) and potentially from any other source. If you look at the HANA technical white papers from https://www.experiencesaphana.com/community/blogs/blog/2012/04/12/sap-hana-technical-overview-an-entry-point-to-our-revolutionary-chapter , you will see that for a real-time replication service, say from SAP ERP databases (log based ones), it will use Sybase Replication Server. However, IMO Sybase Replication Server needs to go faster in order to deliver data to the HANA IMDB. For example, it needs to eliminate or replace all the disk based stable devices. HANA IMDB is also tuned for SAP BusinessObjects BI solutions. Now where does Sybase IQ fit here? Well, if needed, data can be flushed out from the IMDB to Sybase IQ for persistent storage as an option. That could be on the roadmap.

HANA can handle both row and columnar data, and it is indeed very interesting to have an optimizer that can handle a row based implementation of the relational model (row store) plus a column based implementation of the relational model (column store).
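To illustrate the row store versus column store distinction in a language-neutral way, here is a toy sketch of the same three rows in both layouts (the data values are invented):

```python
# The same small table stored two ways.
rows = [(1, "IBM", 195.0), (2, "SAP", 54.2), (3, "ORCL", 29.3)]

# Row store: each row's values are kept together. Good for OLTP-style
# "fetch the whole record" access.
row_store = list(rows)

# Column store: each column's values are kept together. Good for analytic
# scans and compression, as in Sybase IQ or HANA's column engine.
column_store = {
    "id":    [r[0] for r in rows],
    "sym":   [r[1] for r in rows],
    "price": [r[2] for r in rows],
}

# An aggregate over one column only touches a single contiguous list,
# instead of skipping through every row:
avg_price = sum(column_store["price"]) / len(column_store["price"])
```

An optimizer that understands both layouts can route whole-record lookups to the row store and column scans like the average above to the column store.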

At this time I don’t have much of a view on HANA as I am still very much in learning mode regarding what it really is and what it really can do. In many ways it seems similar to Oracle Exalytics, though there are also significant differences. For example, HANA is currently only offered as an appliance, while Oracle TimesTen IMDB (and EssBase) can be purchased and used on regular hardware (though the TimesTen Exalytics specific features can only be used on Oracle Exalytics). Also, at least at the moment, it seems that SAP are mostly positioning HANA to work with other SAP products not as a more general purpose thing. Oracle Exalytics on the other hand can inter-operate with many different data sources from different vendors and Oracle TimesTen is also pretty open in that regard.

Looking at Oracle’s performance with SQL Trace and TKPROF April 9, 2012

Posted by Mich Talebzadeh in Oracle.
1 comment so far

The SQL Trace facility and TKPROF (ToolKit Profiler) are two comprehensive and powerful performance diagnostic and summarising tools that can be used to identify the performance issues with Oracle code. Unlike AUTOTRACE that was covered here, SQL Trace covers PL/SQL blocks and can be used for all sorts of performance investigation.

There are a number of methods to trace Oracle sessions, so I am not going to cover them all here. Rather, we will focus on using SQL Trace within a session. For the examples here I will use the DBMS_MONITOR package.


  • Once the DBMS_MONITOR package is installed, execute permission on it is granted to the DBA role.
  • A DBA account can grant execute on the DBMS_MONITOR package to other users. For example: grant execute on DBMS_MONITOR to sh;
  • A DBA account can enable tracing for other sessions, for example: EXECUTE DBMS_MONITOR.SESSION_TRACE_ENABLE(session_id => 51, serial_num => 60, waits => TRUE);

Tagging and identifying trace files

Trace output is written to the database’s trace directory that can be identified by

SHOW PARAMETERS user_dump_dest

The default name for a trace file is in the format INSTANCE_ora_PID_TRACEID.trc where:

  • INSTANCE is the name of the Oracle instance
  • PID is the operating system process ID from V$PROCESS.SPID
  • TRACEID is a character string of your choosing. This can be set via ALTER SESSION SET TRACEFILE_IDENTIFIER = 'some identifier'
  • trc is the file extension
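Putting those pieces together, the expected trace file name can be assembled as in this sketch. The helper function is mine, not part of Oracle; it simply reproduces the naming pattern seen in the example trace file later in this post.

```python
def trace_file_name(instance: str, pid: int, traceid: str = "") -> str:
    """Build the default Oracle trace file name:
    INSTANCE_ora_PID[_TRACEID].trc (TRACEID part only when set)."""
    name = f"{instance}_ora_{pid}"
    if traceid:
        name += f"_{traceid}"
    return name + ".trc"

# Matches the file produced in the worked example below:
print(trace_file_name("mydb", 18420, "test_t1_with_trace"))
```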

SQL Trace for a simple query

Let us try the query that I used to illustrate SQL tracing with AUTOTRACE here. The code for creating table t1 and populating it is shown below

-- reconstructed from the description below; the original listing was truncated
CREATE TABLE t1 AS
SELECT LEVEL AS k1,
       CASE WHEN LEVEL = 1 THEN 1 ELSE 0 END AS c,
       LPAD('A',255,'A') c3
FROM   dual
CONNECT BY LEVEL <= 1000000;
CREATE UNIQUE INDEX ind_t1_k1 ON t1(k1);
CREATE INDEX ind_t1_c ON t1(c);

Note that column k1 is a monotonically increasing number. Column c depends on k1: when k1 = 1 it is 1, otherwise it is zero. Column c3 is just padding. In this case for column c the first entry is 1 and the rest are zero, which creates a highly skewed column.  We will then use the DBMS_MONITOR.SESSION_TRACE_ENABLE procedure to diagnose the query (note that waits => true means we want wait information to be collected), and we tag the trace file as test_t1_with_trace:

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'test_t1_with_trace';
EXECUTE DBMS_MONITOR.SESSION_TRACE_ENABLE(waits => TRUE);
SELECT k1, c FROM t1 WHERE c <> 0;
EXECUTE DBMS_MONITOR.SESSION_TRACE_DISABLE();

Once the code is run the trace file will look something like below

-rw-r----- 1 oracle dba 13450 Apr  9 18:38 mydb_ora_18420_test_t1_with_trace.trc

It is a detailed and large file. One can use the tkprof utility (lowercase in UNIX/Linux, located in $ORACLE_HOME/bin) to summarize the trace output. A simple output in this case, excluding recursive SQL issued as user SYS, will be sufficient:

tkprof  <TRACE_FILE> <OUTPUT_FILE> sys=no
tkprof mydb_ora_18420_test_t1_with_trace.trc mydb_ora_18420_test_t1_with_trace.txt sys=no

And it returned

K1          C
---------- ----------
1          1

You can use tkprof with sort option that sorts traced SQL statements in descending order of specified sort options. These are:

PRSCNT  Number of times parsed
PRSCPU  CPU time spent parsing
PRSELA  Elapsed time spent parsing
PRSDSK  Number of physical reads from disk during parse
PRSQRY  Number of consistent mode block reads during parse
PRSCU   Number of current mode block reads during parse
PRSMIS  Number of library cache misses during parse
EXECNT  Number of executes
EXECPU  CPU time spent executing
EXEELA  Elapsed time spent executing
EXEDSK  Number of physical reads from disk during execute
EXEQRY  Number of consistent mode block reads during execute
EXECU   Number of current mode block reads during execute
EXEROW  Number of rows processed during execute
EXEMIS  Number of library cache misses during execute
FCHCNT  Number of fetches
FCHCPU  CPU time spent fetching
FCHELA  Elapsed time spent fetching
FCHDSK  Number of physical reads from disk during fetch
FCHQRY  Number of consistent mode block reads during fetch
FCHCU   Number of current mode block reads during fetch
FCHROW  Number of rows fetched

A simple example with sort would be

tkprof mydb_ora_18420_test_t1_with_trace.trc mydb_ora_18420_test_t1_with_trace.txt sys=no sort = "(PRSDSK, EXEDSK, FCHDSK)"
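When several sort keys are given, tkprof orders the statements by the descending sum of the specified values. A small sketch of that ordering rule, with invented per-statement stats:

```python
# Hypothetical per-statement disk-read counters (prsdsk, exedsk, fchdsk).
stmts = [
    {"sql": "small lookup", "prsdsk": 2, "exedsk": 0, "fchdsk": 10},
    {"sql": "big scan",     "prsdsk": 0, "exedsk": 0, "fchdsk": 3911},
]

# Sort descending by the sum of the requested keys, as tkprof does for
# sort=(PRSDSK, EXEDSK, FCHDSK).
keys = ("prsdsk", "exedsk", "fchdsk")
ranked = sorted(stmts, key=lambda s: sum(s[k] for k in keys), reverse=True)

print(ranked[0]["sql"])  # the statement with the most disk reads comes first
```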

The interest would be in getting the various metrics for the query

SELECT k1, c
FROM t1 WHERE c <> 0
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.08          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.17       0.35       3911       3930          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.18       0.44       3911       3930          0           1
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 97

Rows     Row Source Operation
-------  ---------------------------------------------------
1  VIEW  index$_join$_001 (cr=3930 pr=3911 pw=0 time=0 us cost=6388 size=8 card=1)
1   HASH JOIN  (cr=3930 pr=3911 pw=0 time=0 us)
1    INDEX FAST FULL SCAN IND_T1_C (cr=1826 pr=1818 pw=0 time=0 us cost=2276 size=8 card=1)(object id 87301)
1000000    INDEX FAST FULL SCAN IND_T1_K1 (cr=2104 pr=2093 pw=0 time=622503 us cost=2621 size=8 card=1)(object id 87300)

Misses in library cache during parse: 1 indicates that this was the first time we ran this query (after a reboot). The Row Source Operations section shows the number of rows processed by each operation, plus additional row source information. There is an index fast full scan of IND_T1_C returning one row that satisfies the filter c != 0; this small result set is used as the build input of the hash join. It is followed by an index fast full scan of the unique index IND_T1_K1 returning a 1,000,000-row result set, which is used as the probe input. The index fast full scan of IND_T1_K1 cost pr=2,093 physical block reads, taking time=622,503 microseconds. The consistent reads were cr=2,104 (logical block reads). The optimizer estimated the cost of this operation at cost=2,621 units, returning size=8 bytes of data and a cardinality of card=1 rows.  The scan of IND_T1_C cost 1,818 physical block reads taking 0 microseconds. The consistent reads were 1,826 logical block reads. The optimizer costing estimate was 2,276 units, returning 8 bytes of data and 1 row.

It is important to note that the execution plan has a hierarchical structure: a parent's I/O, cost, timing etc. is the sum of the corresponding metrics of all its children plus the metrics of the operation itself. In the above, the hash join is the parent operation and the two index fast full scans are its children, whose statistics are rolled up into it. Turning to the hash join operation we therefore see a total of cr = 2104 + 1826 = 3930 and pr = 2093 + 1818 = 3911. There were no physical block writes to disk (pw=0), as expected. Note, however, that with the default statistics_level=TYPICAL the row source/plan execution statistics are disabled (I will cover this later), so time=0 does not mean that no time was spent on the hash join operation.

That one row was returned by one execution of the SQL statement. Retrieving it cost 0.18 sec of CPU time and 0.44 sec of elapsed time, with 3,930 queries (logical block I/Os) that included 3,911 physical block reads. Note the difference between CPU time and elapsed time.

Further, we need to investigate the cause of this elapsed time. The Oracle Wait Event section in the TKPROF report shows:

Elapsed times include waiting on following events:

Event waited on                             Times   Max. Wait  Total Waited
----------------------------------------   Waited  ----------  ------------
SQL*Net message to client                       2        0.00          0.00
Disk file operations I/O                        3        0.00          0.00
db file sequential read                         2        0.00          0.00
db file scattered read                         62        0.00          0.12
SQL*Net message from client                     2        0.00          0.00

To retrieve those records there were in total 62 db file scattered read physical I/Os taking 0.12 sec (note: physical I/Os, not blocks).

db file scattered read happens when multiple blocks are read in one I/O and scattered into the buffer cache; the buffers they land in may or may not be contiguous. This can be the result of a table scan or of a fast full scan of an index (as in this case). Oracle records the time using the wait event facility and gives the system call the standard name db file scattered read. There were no appreciable waits for other events.

Before moving to the second example, it is worthwhile to mention wait events in Oracle.

Oracle Wait events

When Oracle executes an SQL statement, it is not constantly executing. Sometimes it has to wait for a specific event to happen before it can proceed. For example, if Oracle (or the SQL statement) wants to read data, and the corresponding database block is not currently in the database buffer cache, Oracle waits for this block to be read from the underlying storage.

When a server process asks the operating system for data blocks (block I/O) and places them into the buffer cache, Oracle records the time using the wait event facility and gives the system call a standard name depending on the nature of the block I/O. Three wait events, namely db file sequential read, db file scattered read and direct path read, are the prominent ones dealing with block I/O, so we will explain them here.

db file sequential read

db file sequential read is usually a single-block I/O; it is an indexed read rather than the result of a full scan, although it is possible to see sequential reads covering more than one block. Single-block reads are inserted at the midpoint of the buffer cache LRU list.

db file scattered read

db file scattered read happens when multiple blocks are read in one I/O and scattered into the buffer cache; the buffers they land in may or may not be contiguous. This can be the result of a table scan or of a fast full scan of an index. Oracle records the time using the wait event facility and gives the system call the standard name db file scattered read.

Direct path read

To read large tables Oracle deploys another strategy: have the server process read the blocks and process them in its own PGA memory, as opposed to placing them into the buffer cache. This is deemed quicker for that one process because buffer cache management overheads, like finding a free buffer and latching it, do not occur. There is also another dimension to this: in a multi-user environment you ideally do not want to fill the buffer cache with blocks from large tables. For these reasons Oracle can deploy a direct path read to satisfy the needs of a given session.

The execution engine decides whether a scattered read (multi-block IO call with the results stored in Oracle’s buffer cache) or a direct path read (multi-block IO call with the results processed in the server process’s PGA memory) is performed.
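Under the assumption that you have access to the dynamic performance views, the waits a session has accrued on these events can be inspected directly; a sketch:

```sql
-- Block-I/O wait events accrued by the current session
-- (time_waited_micro is in microseconds)
SELECT event, total_waits, time_waited_micro
FROM   v$session_event
WHERE  sid = SYS_CONTEXT('USERENV', 'SID')
AND    event IN ('db file sequential read',
                 'db file scattered read',
                 'direct path read');
```

Running this before and after a query and differencing the numbers shows which read strategy the execution engine actually chose.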

SQL Trace for a PL/SQL block

In the example below we will use SQL Trace and TKPROF output for diagnosing the performance of a PL/SQL block. What we are trying to establish here are the waits caused by reading data from the underlying disks.

We start with a table of random data. We will call this table tdash and we ensure that it has enough rows to provide meaningful test values. The structure of this table is based on the all_objects table:


This table is populated with 1,729,204 rows. It is created in a database with an 8K block size, and in order to ensure that each record in the table is larger than a block, two columns PADDING1 VARCHAR2(4000) and PADDING2 VARCHAR2(4000) were added and populated.
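The DDL itself is not reproduced here; a hypothetical sketch consistent with the description above (based on all_objects, with the two padding columns) would be:

```sql
-- Hypothetical reconstruction: the original script is not shown
CREATE TABLE tdash AS
SELECT a.*,
       RPAD('*', 4000, '*') AS padding1,
       RPAD('*', 4000, '*') AS padding2
FROM   all_objects a;
-- Further inserts (keeping object_id unique) would then grow
-- the table to the quoted 1,729,204 rows
```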

For the first part of the test, we create a primary key on the object_id column of tdash:

ALTER TABLE tdash ADD CONSTRAINT tdash_pk PRIMARY KEY (object_id);

To progress this further we need a mechanism to read rows of the tdash table into memory, and this is best done via an in-memory construct. In other words, we need to loop over the index key values of table tdash, match them and read the rows into the buffer cache. An associative array fits this well. So the approach is to use an associative array to load rows from the all_objects table into memory (to be precise, into the PGA) and then go through every row of tdash, reading each row into the buffer cache:

DECLARE
  TYPE array IS TABLE OF tdash%ROWTYPE INDEX BY BINARY_INTEGER;
  l_data array;
  l_rec  tdash%ROWTYPE;
BEGIN
  -- load rows from all_objects (plus padding to match tdash%ROWTYPE) into the PGA
  SELECT a.*
        ,RPAD('*',4000,'*') AS PADDING1
        ,RPAD('*',4000,'*') AS PADDING2
  BULK COLLECT INTO l_data
  FROM all_objects a;
  -- read each matching row of tdash into the buffer cache via the index
  FOR rs IN 1 .. l_data.COUNT
  LOOP
    SELECT * INTO l_rec FROM tdash WHERE object_id = l_data(rs).object_id;
  END LOOP;
END;
/

Note the enabling of trace and wait events:
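The call itself is not reproduced above, but given the PLAN_STAT parameter discussed next it would have been along these lines (a sketch; parameter names as per dbms_monitor):

```sql
BEGIN
  -- Trace the current session including wait events, writing
  -- row source statistics for every execution
  DBMS_MONITOR.SESSION_TRACE_ENABLE(
    waits     => TRUE,
    binds     => FALSE,
    plan_stat => 'ALL_EXECUTIONS');
END;
/
```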


Take note of PLAN_STAT=>'ALL_EXECUTIONS' in the above line. As of 11g, the procedures in dbms_monitor accept an additional parameter, PLAN_STAT. This parameter specifies the frequency at which the row source statistics (i.e. information about execution plans) are written to trace files. The accepted values are the following (the default value is NULL):

  • NEVER – no information about the execution plan is written in trace files.
  • FIRST_EXECUTION (equivalent to NULL) – information about the execution plan is written just after the first execution.
  • ALL_EXECUTIONS – information about the execution plan is written for every execution.

So for every execution in the loop the detailed statistics will be written to the trace file.

Back to the code: this is a random read via an index scan on the object_id column of tdash. The stats for the table and its primary key are as follows:

OBJECT       OBJECT_ID       SIZE/MB   Blocks     ROWS/BLOCK
---------    ----------      -------   ---------  ----------
TDASH             84459      15,240    1,943,824        0.9
TDASH_PK          84460          31        3,968      431.0

TKPROF will be used to summarize the TRACE output. A simple report excluding recursive system calls will be sufficient in this case.
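A typical invocation (the trace file name here is hypothetical) would be:

```shell
# sys=no excludes recursive SYS statements; sort=fchela orders
# statements by elapsed fetch time
tkprof mydb_ora_12345.trc tdash_trace.txt sys=no sort=fchela
```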


The interest is in the various metrics for the loop in the PL/SQL block. These are shown below:


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.00          0         62          0           0
Execute  72186      0.31       1.04          0         40          0           0
Fetch    72186      0.67     131.70      81364     368663          0       71726
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   144373      0.99     132.74      81364     368765          0       71726

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 93     (recursive depth: 1)
Rows     Row Source Operation
-------  ---------------------------------------------------
1  TABLE ACCESS BY INDEX ROWID TDASH (cr=5 pr=11 pw=0 time=0 us cost=3 size=4155 card=1)
1   INDEX UNIQUE SCAN TDASH_PK (cr=3 pr=6 pw=0 time=0 us cost=2 size=0 card=1)(object id 84460)

Misses in library cache during parse: 1 indicates that this was the first time we ran this query (after a restart). The Row Source Operation output shows that the typical cost of retrieving one record from table tdash via a unique index scan was 11 physical block reads, of which 6 came from the unique index on the OBJECT_ID column. It is important to note that the execution plan has a hierarchical structure: a parent's I/O, cost, timing etc. is the sum of the corresponding metrics of all its children plus the metrics of the operation itself. In the above, TABLE ACCESS BY INDEX ROWID is the parent operation and INDEX UNIQUE SCAN is its child; the child's statistics are rolled up into the parent's.

Those 71,726 rows were returned by 72,186 executions of the SQL statement. Retrieving them cost 0.99 sec of CPU time and 132.74 sec of elapsed time, with 368,765 queries (logical block I/Os) that included 81,364 physical block reads. Note the difference between CPU time and elapsed time.

Further, we need to investigate the cause of this elapsed time. The Oracle Wait Event section in the TKPROF report shows:

Elapsed times include waiting on following events:

Event waited on                             Times   Max. Wait  Total Waited
----------------------------------------   Waited  ----------  ------------
Disk file operations I/O                        3        0.00          0.00
db file sequential read                     12471        0.15         72.45
db file scattered read                      14587        0.11         56.53

So to retrieve those records there were in total 12,471 db file sequential read physical I/Os taking 72.45 sec (note: physical I/Os, not blocks). Additionally, there were 14,587 db file scattered read physical I/Os taking 56.53 sec in total.

Filtering the TRACE file on the primary key waits (object_id = 84460), we get 38 physical I/Os, of which 21 were db file scattered read and the remainder db file sequential read. For the table itself (object_id = 84459), the total waits were 27,022, of which 14,566 were db file scattered read and 12,456 were db file sequential read. Note the much smaller number of physical I/Os attributed to the primary key.
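Such filtering needs nothing more than ordinary text tools on the raw trace file; a sketch with a hypothetical trace file name:

```shell
# Count waits attributed to the table (obj#=84459)
grep 'db file scattered read'  mydb_ora_12345.trc | grep -c 'obj#=84459'
grep 'db file sequential read' mydb_ora_12345.trc | grep -c 'obj#=84459'
```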

To make this clearer, let us look at a typical db file scattered read for the table in the TRACE file:

WAIT #13: nam='db file scattered read' ela= 8033 file#=11 block#=1386920 blocks=3 obj#=84459 tim=1328711960280846

What this is telling us is that it took ela=8033 microseconds of elapsed time to fetch blocks=3 blocks in one physical I/O from the table.

Looking at Oracle’s performance with SQL*Plus and AUTOTRACE April 6, 2012

Posted by Mich Talebzadeh in Oracle.

Understanding the Cost Based Optimizer's behaviour and query performance is crucial for delivering quality applications with acceptable response times. Like it or not, in most cases performance is treated as a deployment issue, which is perhaps why so much effort is usually spent in production getting performance under control.

All database engines provide means to measure the SQL performance. SQL performance is usually one of the most crucial factors in determining the quality of applications that use databases as their backend.

We will therefore look at the methods that we can deploy in Oracle to understand the response of the query.

Ignoring other tools at our disposal, we will first take a look at AUTOTRACE. My test bed is Oracle 11gR2.


AUTOTRACE is a utility switch within SQL*Plus that shows the execution plan of the queries being run and the resources being used. First, the AUTOTRACE utility needs to be set up.

A typical approach would be:

cd $ORACLE_HOME/rdbms/admin
log into SQL*Plus as SYSTEM
Run @utlxplan

By making the PLAN_TABLE table PUBLIC, you let anyone trace using SQL*Plus (which is fine in my view), and it avoids the overhead of each user having to install his or her own plan table. The alternative is to run @utlxplan in every schema that will use AUTOTRACE.

The next step is creating and granting the PLUSTRACE role:

cd $ORACLE_HOME/sqlplus/admin
Run @plustrce
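Put together, the setup amounts to something like the following, run as SYSTEM (the public synonym and grants are one common way of sharing PLAN_TABLE, not the only one):

```sql
-- Create the plan table and make it available to all users
@?/rdbms/admin/utlxplan
CREATE PUBLIC SYNONYM plan_table FOR system.plan_table;
GRANT ALL ON plan_table TO PUBLIC;

-- Create the PLUSTRACE role and grant it
@?/sqlplus/admin/plustrce
GRANT plustrace TO PUBLIC;
```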

So far so good. In a nutshell, what does AUTOTRACE provide? It gives a report on the execution path used by the SQL optimizer and the statement execution statistics. The report is generated after successful SQL query (SELECT) and SQL DML (INSERT, UPDATE, DELETE and MERGE) statements, and is useful for monitoring and tuning the performance of typical queries in Oracle.

We can control the depth of the report by setting AUTOTRACE with different switches. The examples below show:

SET AUTOTRACE OFF -- No AUTOTRACE report is generated. This is the default.

SET AUTOTRACE ON EXPLAIN -- The AUTOTRACE report shows only the optimizer execution path.

SET AUTOTRACE ON STATISTICS -- The AUTOTRACE report shows only the SQL statement execution statistics.

SET AUTOTRACE ON -- The AUTOTRACE report includes both the optimizer execution path and the SQL statement execution statistics.

SET AUTOTRACE TRACEONLY -- Similar to SET AUTOTRACE ON, but it suppresses the printing of the user’s query output, if any.

It is important to note that AUTOTRACE shows the execution plan for a single SQL statement. A PL/SQL block does not have an execution plan.  If it contains one or more SQL statements, then each of those will have an execution plan.
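A typical session might combine these switches as follows (my_table and the predicate are placeholders):

```sql
SET TIMING ON
SET AUTOTRACE TRACEONLY  -- plan and statistics, suppress row output
SELECT * FROM my_table WHERE id = 42;
SET AUTOTRACE OFF
```

TRACEONLY is particularly handy when the query returns many rows that you do not want echoed to the screen.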

Understanding AUTOTRACE Output applied to the case of Oracle use of index with NOT EQUALS

To understand how we can utilise AUTOTRACE, we start with a simple table comprising three columns and test whether Oracle uses an index with NOT EQUALS. We will call this table t1 and we ensure that it has enough rows to provide meaningful test values. The table is created as follows:

CREATE TABLE t1 AS
SELECT ROWNUM AS k1,
       DECODE(ROWNUM, 1, 1, 0) AS c,
       LPAD('A',255,'A') AS c3
FROM   dual
CONNECT BY LEVEL <= 1000000;
-- Create a unique index on column k1
CREATE UNIQUE INDEX ind_t1_k1 ON t1(k1);
-- Create an index on column c
CREATE INDEX ind_t1_c ON t1(c);
-- Now that the table is ready with a million rows, update stats
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'T1', cascade => TRUE);

Note that column k1 is a monotonically increasing number. Column c depends on k1: when k1 = 1, c is 1; otherwise it is zero. Column c3 is just padding. In this case only the first row has c = 1 and the rest are zero, which makes c a highly skewed column.
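For the optimizer to recognise this skew it needs column statistics; a histogram on c would typically be gathered along these lines (a sketch using dbms_stats):

```sql
BEGIN
  -- Gather table and index stats with a histogram on the skewed column c
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => USER,
    tabname    => 'T1',
    method_opt => 'FOR COLUMNS c SIZE 254',
    cascade    => TRUE);
END;
/
```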

Let us try to run a simple query as follows:

SELECT k1, c FROM t1 WHERE c <> 0;

The execution plan from AUTOTRACE is shown below

K1          C
---------- ----------
1          1
Elapsed: 00:00:00.29
Execution Plan
Plan hash value: 523648936
| Id  | Operation              | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT       |                  |     1 |     8 |  6388   (1)| 00:01:17 |
|   1 |  VIEW                  | index$_join$_001 |     1 |     8 |  6388   (1)| 00:01:17 |
|*  2 |   HASH JOIN            |                  |       |       |            |          |
|*  3 |    INDEX FAST FULL SCAN| IND_T1_C         |     1 |     8 |  2276   (1)| 00:00:28 |
|   4 |    INDEX FAST FULL SCAN| IND_T1_K1        |     1 |     8 |  2621   (1)| 00:00:32 |
Predicate Information (identified by operation id):
2 - access(ROWID=ROWID)
3 - filter("C"<>0)
Statistics
----------------------------------------------------------
1  recursive calls
0  db block gets
3930  consistent gets
3911  physical reads
0  redo size
586  bytes sent via SQL*Net to client
523  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
1  rows processed

The Explain Plan output should be interpreted as a tree, using the indentation to identify parent/child relationships. It thus has a hierarchical structure, and a parent's Bytes, Cost, Time etc. are the sum of the corresponding metrics of all its children plus the metrics of the operation itself. At each tree level, it shows the following:

  • Id is the operation identifier
  • Operation is the type of operation performed by the execution engine
  • Name is the name of the object (table, index or view) the operation acts on
  • Rows is the estimated number of rows processed
  • Bytes is the estimated amount of data (in bytes) returned by the operation
  • Cost is the estimated cost of the operation, a unitless number used by the optimizer; %CPU is the percentage share of the Cost attributed to CPU
  • Time is the estimated elapsed time for the operation, in hh:mm:ss

The Predicate Information section is pretty self-explanatory.

Statistics are as follows:

  • recursive calls is the number of SQL calls generated at user and system level on behalf of the main SQL; mostly these are SQL calls generated internally by Oracle
  • db block gets shows blocks read in current mode (as-of-now data). Oracle seems to use current gets mainly for fetching internal information, such as reading segment header information for a table in a FULL TABLE SCAN
  • consistent gets relate to read consistency in Oracle. Oracle must provide a *read consistent* set of data to the user: unless an update is committed, any user who queries the table's data sees only the original data values, not the uncommitted updates. Where required, Oracle takes the original values of changed data from the UNDO segment and the unchanged (un-updated) rows from the buffer cache to generate the full set of output
  • physical reads is the number of physical block reads of buffers from disk; not to be confused with physical I/Os
  • redo size is the size in bytes of the redo log generated
  • bytes sent via SQL*Net to client is the total amount of data in bytes sent to the client over the network
  • bytes received via SQL*Net from client is the total amount of data in bytes received from the client over the network
  • SQL*Net roundtrips to/from client is the total number of messages sent to and received from the client over the network
  • sorts (memory) is the number of sorts that did not write to disk
  • sorts (disk) is the number of sorts that wrote at least once to disk
  • rows processed is the number of rows processed

SET TIMING ON shows the elapsed time taken by the query. It displays elapsed time as Hours:Minutes:Seconds.Milliseconds.

Now back to the example above: the query took 290 milliseconds to run. Looking at the execution plan we see an INDEX FAST FULL SCAN on each of the two indexes, followed by a hash join. One should note that the optimizer can use an index with not equals, but in general it will not, and in general it would not make sense to do so unless the data is massively skewed (as is the case with column c). Perhaps it is easier to point out that the optimizer will not use an index range scan for a not equals; in all not-equals cases, if the optimizer decides to use an index, it will do an index fast full scan. With an index fast full scan, the execution engine reads the index as if it were a table, using multiblock I/O, and does not read it in sorted order; it just reads the blocks as it hits them.

In our example, the execution engine scans the two indexes, extracting the row IDs (ROWIDs) from both. The leaf level of an index stores the index key value and the ROWID, and in this case the ROWIDs from both indexes will be the same. It can then get the result by hash joining these two result sets on the ROWID values, because any time two rows have a ROWID in common, that ROWID points to the same data row; the base table blocks thus need not be touched. The important point to note is that this requires the optimizer to establish that every column in the query can be found in indexes on the table, and that the cost of the index join is cheaper than using the leading index and fetching data from the data blocks.
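Where the optimizer does not choose this path by itself, it can be requested explicitly with the INDEX_JOIN hint; a sketch using the names above:

```sql
-- Ask the optimizer to answer the query from the two indexes alone
SELECT /*+ INDEX_JOIN(t1 ind_t1_k1 ind_t1_c) */ k1, c
FROM   t1
WHERE  c <> 0;
```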

As an example, if we look at the operation identified by Id = 4, an INDEX FAST FULL SCAN, we can see that it was performed on index IND_T1_K1 and estimated to return 1 row of 8 bytes. The optimizer costing was 2621 units, of which 1% was attributed to CPU usage, with an estimated elapsed time of 32 seconds (00:00:32) for this operation.

In-memory databases: Oracle TimesTen vs. SAP/Sybase ASE March 30, 2012

Posted by Mich Talebzadeh in In-Memory databases.

This is a two-part article of mine published on the TechTarget website http://searchoracle.techtarget.com/. Part one covers Oracle TimesTen and part two covers ASE IMDB.



You need to be registered to log in to the TechTarget website.

