Tuesday, July 8, 2014

DB File Sequential Read Wait Event Is High on an Oracle Database - Performance Engineering to Find Out Where the Bottleneck Is

I found an interesting problem during performance testing and wanted to share it here.

I observed a drop in server throughput and correlated the LoadRunner Throughput graph with the Average Response Time graph:

From the graph above, for the first 25 minutes the average response time (ART) was very high while throughput was low; after that, both graphs are stable.

So what happened during the first 25 minutes of the load test? Let's do some analysis on the servers.

We should check the servers (web/app/DB) to find out where the problem is. Throughput-related issues most often come from the DB side, so let's start investigating the database server first.

Checking the database first, I observed high user I/O activity during the period of high response time (low throughput); the OEM console confirmed the high user I/O.
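
If OEM is not handy, a similar wait-class breakdown can be pulled straight from the data dictionary. The snippet below is only a minimal sketch (my addition, not part of the original analysis); it assumes SQL*Plus access as a suitably privileged user and SELECT privilege on v$system_event.

sqlplus -s "/ as sysdba" <<'EOF'
SET LINESIZE 120 PAGESIZE 50
-- Cumulative non-idle wait time per wait class since instance startup
SELECT wait_class,
       ROUND(SUM(time_waited_micro) / 1e6) AS seconds_waited
FROM   v$system_event
WHERE  wait_class <> 'Idle'
GROUP  BY wait_class
ORDER  BY seconds_waited DESC;
EOF

A large share of time under the User I/O class points at the same problem OEM is showing.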

From the OEM console we can also see the problematic SQL that is causing the high user I/O.




Click on the SQL ID above to see which wait event is causing this high user I/O (so by now we at least know which SQL is the culprit).



The graph above shows that the db file sequential read event dominates, so generate an AWR or ADDM report, or talk to the DBA, to find out why it is so high.
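
For reference, both reports can be generated from SQL*Plus using the standard scripts shipped under $ORACLE_HOME/rdbms/admin. This is a general note rather than anything specific to this test; both scripts prompt interactively for the snapshot range and a report file name, and need DBA access.

# Run as a DBA user; answer the prompts for snapshot IDs and report name.
sqlplus "/ as sysdba"
#   SQL> @?/rdbms/admin/awrrpt.sql     (AWR report)
#   SQL> @?/rdbms/admin/addmrpt.sql    (ADDM report)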

When I generated an ADDM report for Oracle's recommendations, below is what I got:

The performance of some data and temp files was significantly worse than others. If striping all files using the SAME methodology is not possible, consider striping these files over multiple disks.

For file +DATA_XXXXXXXX/datafile/XXXX_tab_04.dbf, the average response time for single block reads was 184 milliseconds, and the total excess I/O wait was 4156 seconds.
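
To cross-check a finding like this outside of ADDM, the average single-block read time per datafile can also be read from the instance statistics. The query below is just a sketch (my assumption, not taken from the original post); it assumes SELECT access to v$filestat and v$datafile, and that the time columns are in centiseconds, which is why they are multiplied by 10 to get milliseconds.

sqlplus -s "/ as sysdba" <<'EOF'
SET LINESIZE 200 PAGESIZE 100
-- Average single-block read latency (ms) per datafile since instance startup
SELECT d.name,
       f.singleblkrds,
       ROUND(f.singleblkrdtim * 10 / NULLIF(f.singleblkrds, 0), 1) AS avg_single_block_ms
FROM   v$filestat f,
       v$datafile d
WHERE  f.file# = d.file#
ORDER  BY avg_single_block_ms DESC NULLS LAST;
EOF

Files whose average sits far above the rest (like the 184 ms file reported by ADDM) are the ones worth raising with the storage team.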

Below are the mitigations we can follow to overcome this issue:

The db file sequential read wait event occurs when we try to access data via an index and Oracle is waiting for the read of an index block from disk into the buffer cache to complete. A sequential read is a single-block read, and single-block I/Os are usually the result of using indexes. Rarely, full table scan calls can be truncated to a single-block call due to extent boundaries or buffers already present in the buffer cache. The db file sequential read wait event may also appear when undo blocks are read from disk to provide a consistent get (this is rare).

The actual object being waited on can be determined from the P1, P2, P3 information in v$session_wait, where P1 is the absolute file number, P2 is the block being read, and P3 is the number of blocks (usually 1). A sequential read is normally a single-block read, although it is possible to see sequential reads of more than one block (check P3). This wait may also be seen for reads from datafile headers (P2 indicates a file header read).
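
As a quick way to act on this, the P1/P2/P3 values can be pulled for the sessions currently on this wait and then mapped to the owning segment. This is only a sketch (my addition, not from the original post): it assumes SQL*Plus access with privileges on v$session_wait and dba_extents, and the file/block numbers used in the second (commented) query are placeholders for whatever the first query returns.

sqlplus -s "/ as sysdba" <<'EOF'
SET LINESIZE 200 PAGESIZE 100
-- File#, block# and block count for sessions currently on this wait
SELECT sid, p1 AS file_nbr, p2 AS block_nbr, p3 AS blocks
FROM   v$session_wait
WHERE  event = 'db file sequential read';

-- Then map one file#/block# pair (values from the query above) to its segment, e.g.:
--   SELECT owner, segment_name, segment_type
--   FROM   dba_extents
--   WHERE  file_id = <file_nbr>
--   AND    <block_nbr> BETWEEN block_id AND block_id + blocks - 1;
EOF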

Block reads are fairly inevitable, so the aim should be to minimise unnecessary I/O. This is best achieved by good application design and efficient execution plans; changes to execution plans can yield order-of-magnitude changes in performance. Hence, to reduce this wait event, follow the points below.

1.) Tune Oracle - tuning SQL statements to reduce unnecessary I/O requests is the only guaranteed way to reduce "db file sequential read" wait time (see the sketch after this list for shortlisting candidate SQL).
2.) Tune physical devices - distribute (stripe) the data over different disks to reduce the I/O per device. Logical distribution is useless; "physical" I/O performance is governed only by the independence of the devices.
3.) Faster disks - buy faster disks to reduce the time each unavoidable I/O request takes.
4.) Increase db_block_buffers - a larger buffer cache can (not will, "might") help.
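
For point 1, the SQL worth tuning first is usually the SQL doing the most physical reads; in this test OEM already pointed at the culprit, but the same shortlist can be pulled with a query. The snippet below is a minimal sketch (my addition, not part of the original post) that assumes SQL*Plus access and SELECT privilege on v$sql.

sqlplus -s "/ as sysdba" <<'EOF'
SET LINESIZE 200 PAGESIZE 50
-- Top 10 SQL statements by physical (disk) reads since they were loaded
SELECT *
FROM  (SELECT sql_id,
              executions,
              disk_reads,
              buffer_gets,
              ROUND(disk_reads / NULLIF(executions, 0)) AS reads_per_exec
       FROM   v$sql
       ORDER  BY disk_reads DESC)
WHERE  ROWNUM <= 10;
EOF

Once a SQL_ID is identified, its execution plan and indexing are the first things to review with the development team or DBA.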


Hope this helps :)



Get CPU utilization of a Linux-based machine using vmstat.


Generally, a timestamp is not available by default for the vmstat command in Linux (check whether vmstat -t works on your system).

Use the command below to monitor the CPU usage of your server with a timestamp using vmstat.

vmstat 60 60 | awk '{now=strftime("%Y-%m-%d %T "); print now,100-$15}' >CPUUsage.txt

60 60 -> a sample is taken every 60 seconds, for a total of 60 readings
now -> the awk variable in which we store the timestamp
print now -> the timestamp is printed along with the vmstat output
$15 -> the 15th column of the vmstat output (the "id" value, i.e. idle time; 100 - idle time gives the non-idle CPU time)


If you want to copy the output to a separate text file, redirect it (> CPUUsage.txt, or any file name you like); the output will then be available in the CPUUsage.txt text file.

Output:

$ vmstat 60 60 | awk '{now=strftime("%Y-%m-%d %T "); print now,100-$15}'
2014-06-18 11:43:10  13
2014-06-18 11:43:10  10
2014-06-18 11:43:10  8
2014-06-18 11:44:10  12

Here the first column is the timestamp and the second column is the CPU utilization. Take these values and draw a graph in Excel :) Happy learning.
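
One small tweak worth considering (my addition, not in the original command): vmstat prints two header lines plus a first sample that covers the time since boot, and awk runs the arithmetic on those lines too. The variant below skips them and flushes each line so the file fills in real time; it assumes GNU awk, which provides strftime and fflush just as the original command already relies on.

vmstat 60 60 | awk 'NR>3 {now=strftime("%Y-%m-%d %T"); print now, 100-$15; fflush()}' > CPUUsage.txt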

How to monitor Linux server memory usage using vmstat?


During performance test monitoring, we are supposed to check how memory is being utilized on the Linux server, but how?
Plain vmstat (have a look at the attached screenshot) may not give you an accurate picture of used memory. vmstat -s serves this need, but we have to run it repeatedly to capture memory usage over the entire duration of the performance test.


The small shell script below will print the used memory along with a timestamp.

#!/bin/bash
# Print the "used memory" line from vmstat -s, with the current date appended,
# every 30 seconds, 120 times.
for i in {1..120}
do
  vmstat -s | sed "s/$/$(date)/" | awk '/used memory/'
  sleep 30
done


Note: this takes a reading every 30 seconds, 120 times in total.
If you want to print the output into a file, use the command below.
./sampleloop.sh > UsedMemory1.txt
where sampleloop.sh is your shell script name (copy-paste the code above into a file named sampleloop.sh and make it executable with chmod +x sampleloop.sh) and UsedMemory1.txt is the text file where you want the output.

Sample Output:
      1674660  used memoryWed Jun 18 09:52:46 UTC 2014
      1504104  used memoryWed Jun 18 09:53:16 UTC 2014
      1623140  used memoryWed Jun 18 09:53:46 UTC 2014
      1675956  used memoryWed Jun 18 09:54:16 UTC 2014
      1675624  used memoryWed Jun 18 09:54:46 UTC 2014
      1591800  used memoryWed Jun 18 09:55:16 UTC 2014
      1676364  used memoryWed Jun 18 09:55:46 UTC 2014
      1685188  used memoryWed Jun 18 09:56:16 UTC 2014
      1558236  used memoryWed Jun 18 09:56:46 UTC 2014
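
If you only need the timestamp and the used-memory number (which is easier to chart in Excel), the variant below does the same sampling with a tidier line format. This is my own tweak, not part of the original script; it assumes the used-memory figure is the first field of the matching vmstat -s line (reported in KiB).

for i in {1..120}
do
  # Print "YYYY-MM-DD HH:MM:SS <used memory in KiB>"
  vmstat -s | awk -v ts="$(date '+%Y-%m-%d %T')" '/used memory/ {print ts, $1}'
  sleep 30
done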