Jagjeet's Oracle Blog!

May 13, 2012

Sqlplus Dashboard For RAC

I have been passionate to work with CLI due to flexibility, lightweight and easy access. Long back, I had an idea to write a sqlplus script to provide a dashboard kind of utility but could never figure out the auto refresh functionality until unless Tanel & Adrian provided moats.

I got the Idea and solution of auto refresh functionality in sqlplus. Big thanks to great Tanel & Adrian. Majority of my time I spent doing bench-marking on Exadata, so I decided to write a sqlplus dashboard kind of utility for RAC, idea was to get some critical metrics for performance benchmarks on one screen and have a automatic refresh capability ( similar EM “TOP Activity” page )

It took almost 3 weeks for me to write this utility from scratch, It tells you info about Exadata smart scan/storage index saving/Flash cache/ CPU usage/TOP sqls/waits/Cluster interconnect traffic/Top segments by GC* waits and a lot more info. which can be useful to monitor system performance. Based on current formatting, It can support up to 8 node cluster information easily.
I tried my best to make this utility lightweight by reading data from gv$ views once and store in plsql/collections for subsequent requests, Could not find session in EM top activity page running this utility. As of today, it will only work with 11g RAC instances or single instance with gv$* views [catclust.sql].

here is video link… video quality isn’t that great. Must be seen in HD mode.

I have tried to explain about metrics below.


############################################
-- Below section shows metrics instance wise
############################################


+Inst-------+CPUIDL%--IO%-USR%--SYS%+--Tprse/s--+Hprse/s+--PhyWIO/s-+-PhyWMB/s-+--PhyRIO/s--+-PhyRMB/s-+-SessLIO/s--+---Exec/s-+RedoMB/s+Commit/s+-ExSSMB/s-+-ExSIMB/s+-ExFCRh/s+
|AIA1       | 67.8     0  31.2    .7|       2714|      0|       2623|        21|        1183|         9|      370001|      8471|      11|    2001|         0|        0|      980|
|AIA2       | 68.2     0  31.1    .6|       2709|      0|       2491|        22|         928|         7|      340040|      8609|      11|    2004|         0|        0|      703|
+-----------+-----------------------+-----------+-------+-----------+----------+------------+----------+------------+----------+--------+--------+----------+---------+---------+


+Inst-------+ <-- Instance name     +CPUIDL%--IO%-USR%--SYS%+ 
|AIA1       |                       | 67.8     0  31.2    .7| -- This sections shows CPU usage IDLE% IO%, USER% SYS% 
|AIA2       |                       | 68.2     0  31.1    .6| -- For Exadata, we may not see IO Usage as Grid disks exposed to ASM only 
+-----------+                       +-----------------------+




+--Tprse/s--+Hprse/s+--PhyWIO/s-+-PhyWMB/s-+--PhyRIO/s--+-PhyRMB/s-+-SessLIO/s--+
|       2714|      0|       2623|        21|        1183|         9|      370001|
|       2709|      0|       2491|        22|         928|         7|      340040|
+-----------+-------+-----------+----------+------------+----------+------------+

--  Tprse/s   : Total parse per second
--  Hprse/s   : Hard parse per second
--  PhyWIO/s  : Physical Write IO per second
--  PhyWMB/s  : Physical Write MB per second                                                         
--  PhyRIO/s  : Physical Read IO per second								
--  PhyRMB/s  : Physical Read MB per second								
--  SessLIO/s : Session Logical IO per second								
		


+---Exec/s-+RedoMB/s+Commit/s+
|      8471|      11|    2001|
|      8609|      11|    2004|
+----------+--------+--------+

-- Exec/s   : Total Executions for the Instance per second
-- RedoMB/s : Redo Generation MB per second 
-- Commit/s : User commit per second






-- Exadata Related Stats
+-ExSSMB/s-+-ExSIMB/s+-ExFCRh/s+
|         0|        0|      980|
|         0|        0|      703|
+----------+---------+---------+

-- ExSSMB/s  : cell physical IO interconnect bytes returned by smart scan MB per second 
-- ExSIMB/s  : cell physical IO bytes saved by storage index MB per second
-- ExFCRh/s  : cell flash cache read hits per second 

--     Note : Above snapshot was taken for OLTP kind of workload, hence no 
--            smartscan/storage index was in picture, only Flash Cache

 
#########################################
-- Below section shows Cluster Level info
#########################################

+IMPACT%-+--TOP WAIT EVENTS-----------------------+-WAIT CLASS--------+ 
|  51.2% | transaction                            | Other             | 
|  28.2% | enq: TX - index contention             | Concurrency       | 
|   8.7% | gc buffer busy acquire                 | Cluster           | 
|     5% | cell single block physical read        | User I/O          | 
|   2.2% | gc current grant busy                  | Cluster           | 
+--------+----------------------------------------+-------------------+ 

 Above sections shows TOP 5 wait events for the last sample [ default 6 second ] 	

-- IMPACT%    :  Impact was calulcated based on AWR formula, Time spent on individual wait event / Total Wait Time * 100
-- WAIT EVENT :  Name of the Wait Event 
-- WAIT CLASS :  Name of the Wait Class

  
 * * Cluster Level info * *
+IMPACT%-+ TOP SQLS -(child)-+-TOP SESSIONS-------INST:SID----------------------------------------+
|  26.5% | bk2mk4dh179y7(2)  | 1:1043, 1:1174, 1:1239, 1:1304, ......                             |
|  17.6% | bk2mk4dh179y7(0)  | 1:1045, 1:1109, 1:1110, 1:1174, .......                            |
|  11.8% | 7jjp97nb9h2up(1)  | 2:1048, 2:1109, 2:1112, 2:1174,                                    |
|   7.5% | 0r5xv5d42p3p6(2)  | 1:653, 1:979, 2:1047, 2:1501, 2:588, 2:68  .....                   |
|   7.3% | apgu0yan9pb6m(5)  | 1:653, 1:979, 2:1047, 2:1501, 2:588, 2:68  ......                  |
+--------+-------------------+--------------------------------------------------------------------+

 Above sections shows TOP 5 SQLs for the last sample [ default 6 second ] 	

-- IMPACT%    :  Impact was calculated based on AWR formula, Time spent on individual sql / Total Wait Time * 100
-- TOP SQL - (child) : Sql id along with child info
-- INST:SID   :  Instance ID and SID for the session running sql, due to space limitation, only few sessions will shown






   * * Cluster Level info * *

           -------+                                 ACTIVE SESSIONS GRAPH                              +----
     Active   375 |                                                                                    | 375
   Sessions   349 |                                                                                    | 349
              322 |                            #                    #                                  | 322
              295 |                            #   # ##       #   # #                                  | 295
              268 |                            #   # ##       #   # #                  #          #_288| 268
              242 |                            #   # ###  #   #   # #                 ### #       #    | 242
              215 |                            #   # ###  #   #   # #                 #####       #    | 215
              188 |                            #   # ###  #   #   # #                 #####  #    #    | 188
              161 |                           ##################  #########     ##    #####  ######    | 161
              134 |                           #############################     #### ##############    | 134
              109 |                           ##############################    ###################    | 109
     	       81 |                           ##############################  #####################    | 81
               24 |             -MAX CPUs [24]-####################################################    | 24
                1 |                           #####################################################    | 1
                0 +------------------------------------------------------------------------------------+ 0
		  ^ 01:42:22                     01:49:02 ^                             01:55:42 ^



-- Above sections shows total active sessions across the cluster, time window calculated based on sample time [6 second default ]
-- total 80 samples are printed in graph, one line represents one sample time. Active Session range will change automatically
-- based on active sessions and graph will adjust accordingly. Idea was to have some trend about the active session history.
-- MAX CPU will also be shown to have some idea about Active sessions Vs CPUs available.

-- Note : if value falls in between 2 range buckets then it will be shown in front of lower value, for example - in above screenshot 
--        current active sessions were 288 and it was displayed in front of 268, same applies to MAX CPU info as well
 

   * * Instance level info * *

    |  Global  |  Global  | Estd.     | 
    |  Cache   |  Cache   | Intercnt  | 
Inst|  Blocks  |  Blocks  | Traffic   | 
  ID|  Sent/s  |  Rcvd/s  | MB/s      | 
+---+----------+----------+-----------+ 
  1 |     10831|     10188|      173.3| 
  2 |     10650|     10586|        175| 
                                        

-- Above sections shows Global Cache info instance wise

-- InstID : Instance ID no.
-- GC Blocks Sents/s :  Global cache blocks sent by instance per second
-- GC Blocks Rcvd/s  :  Global cache blocks received by instance per second
-- Estd. Interconnect Traffic MB :  Estimated traffic MB for interconnect by instance





 * * Cluster Level info * *
                             
 | TOP Segments by GC*     |    
 | Waits                   |    
 | IMPACT% [Type:Segment]  |   
 +-------------------------+    
 | 40.8% IND:REFERENCE_INSTANCE 
 | 23.3% IND:MEDIATOR_INSTANCE  
 | 17.4% IND:REFERENCE_INSTANCE 
 |  5.8% IND:S_DOCK_TXN_LOG_P1  
    4.2% IND:COMPOSITE_INSTANCE                                                                         

-- TOP Segments based on all Global Cache* events, Impact is calculated based on GC* waits on individual
-- Segment / Total GC* waits * 100





  * * Cluster level info * * 

+--SqlID--------+--SqlText----------------------------------------------------------------------------+-LongstDur-+-InstCnt-+-Cnt-+--CPU%--+--CONC%--+--CLUS%--+-IO%-+---ReadMb-+
| 0r5xv5d42p3p6 | INSERT INTO REFERENCE_INSTANCE (ID, BINDING_TYPE, REFERENCE_NAME, UPDATED_TIME, PRO |           |    2    |  12 |      0 |      0  |      0  |    0|         0|
+---------------+-------------------------------------------------------------------------------------+-----------+---------+-----+--------+---------+---------+-----+----------+
| bk2mk4dh179y7 | select VALUE from XREF_DATA  where XREF_COLUMN_NAME = :1  and XREF_TABLE_NAME = :2  | 00:00:09  |    2    |  46 |    59% |     3%  |      0  |   4%|       227|
+---------------+-------------------------------------------------------------------------------------+-----------+---------+-----+--------+---------+---------+-----+----------+
| 7jjp97nb9h2up | INSERT INTO COMPOSITE_INSTANCE (ID, COMPOSITE_DN, CONVERSATION_ID, UPDATED_BY, PARE |           |    2    |  14 |      0 |      0  |      0  |    0|         0|
+---------------+-------------------------------------------------------------------------------------|-----------+---------+-----+--------+---------+---------+-----+----------+
| apgu0yan9pb6m | insert into  XREF_DATA  ( XREF_TABLE_NAME, XREF_COLUMN_NAME, ROW_NUMBER, VALUE, IS  |           |    2    |  14 |      0 |      0  |      0  |    0|         0|
+---------------+-------------------------------------------------------------------------------------|-----------+---------+-----+--------+---------+---------+-----+----------+

-- Above sections shows cummulative info for the TOP Sqls 

-- LongstDur : if multiple sessions are running same sql, this column would tell timing for the first query, longest duration.
-- InstCnt   : Total no. of Instance running same sql at time of sample [ including parallel and non-parallel query ]
-- Cnt       : Total sessions running same query across all the instances [ including parallel and non-parallel ] 

-- CPU%  : %time spent on CPU
-- CONC% : %time spent on Concurrency
-- CLU%  : %time spent on Cluster waits
-- IO%   : %time spent on IO

--ReadMB : Total physical read mb by the query, cummulative for all running queries across the cluster

-- Note: Most of the columns are fetched from gV$sql_Monitor, will only be populated if query running more than 6 seconds on CPU/IO


Code can be downloaded here.

Please do not forget to share your suggestions/feedback. Thanks –

Note: This utility uses ASH and v$sql_monitor which requires diagnostics and tuning pack !

December 12, 2011

Library cache: mutex X

Filed under: Oracle 11g — Tags: , , — Jagjeet Singh @ 5:24 pm

Recently, I was engaged in Exadata POV for benchmarking for Ebiz R12 on Exa* boxes. We had quarter rack running 11.2.0 db with 2 node RAC Cluster (24 CPU) and all other components/Middle Teir including Form, Web, CM were configured on Exalogic Server. Objective was to achieve best possible scalability numbers on Exa* boxes for few Ebiz modules.

Both boxes were connected with Infiniband network, Test scenarios were given for Order2Cash and Self-services modules. Idea was to utilized Load runner to generate the load using few hundred users. After struggling a while we could streamlined our testing scripts and now were ready to focus on performance.

We had several hiccups while running performance tests but issues were eroded by following generic guidelines on performance tuning, we were doing good progress until we stuck with “library cache: mutex X” issues which was kind of road block for us. Each time we touch 600 concurrent users for Order 2 Cash, we had high waits on mutex issue, CPU utilization were constantly touching 100%.  Have a look to below screenshot, most of the sessions were waiting on concurrency.

One of my colleague immediately pointed out about known bug in Oracle 11g for mutex issues, We could find several hits for the bug but apply a patch was not an option due to conflict with

Exadata Bundle Patch. Latest BP did not solve the issue and we were left out to do something by own ( if we can ).

Plan B – as always, Google is the only option for people like me ( may not be the good solution for production like instances ), I was lucky to came across with http://andreynikolaev.wordpress.com/2011/05/01/divide-and-conquer-the-true-mutex-contention/    Andrey S. Nikolaev has a great post sharing his experience to handle mutex issues. I was excited to see if out database is also suffering with same issues.

I used script from his post “library_cache_objects_heavily_accessed.sql” and below was output from our DB.

Select * from (
Select case when (kglhdadr = kglhdpar) then 'Parent' else 'Child '||kglobt09 end cursor,
kglhdadr ADDRESS,substr(kglnaobj,1,20) name, kglnahsh hash_value,kglobtyd type,kglobt23 LOCKED_TOTAL,kglobt24 PINNED_TOTAL,kglhdexc EXECUTIONS,kglhdnsp NAMESPACE
from x$kglob
order by kglobt24 desc)
where rownum <= 20
/



ADDRESS          NAME                 HASH_VALUE TYPE    Locked    Pinned  Executed  Nsp
---------------- -------------------- ---------- ------- ------ --------- --------- ----
0000000547522910 IGI_GEN_VERT          729966110 PACKAGE ######  10501405         0    1
0000000557C98F28 MO_GLOBAL            3732637996 PACKAGE ######   7413354     30388    1
000000054FF4ABA8 IGI_GEN_VERT         1541933742 PACKAGE   2617   5250401         0    2
                                                  BODY

000000054F798C58 MO_GLOBAL            3743427754 PACKAGE  13501   5069333  31299942    2
                                                  BODY

000000054B903E78 OE_MSG_PUB            247466199 PACKAGE ######   1338812      2692    1
000000054BBBFD70 OE_SYS_PARAMETERS    2653012139 PACKAGE ######   1132706     71580    1
000000054F381F10 OE_OE_FORM_LINE      3689477926 PACKAGE ######   1015546         0    1
000000054BE04E50 DBMS_UTILITY         1023687597 PACKAGE  12098   1011637         0    1
000000055773BBA0 SELECT PROCESS_ENABL 1617001630 CURSOR     326   1007630   1007478    0
000000055B047128 DBMS_UTILITY         1295727999 PACKAGE   3748   1003292  45896957    2
                                                  BODY

000000054B904C58 OE_DEBUG_PUB         2263617911 PACKAGE ######    958636         0    1
000000054B904A18 OE_SYS_PARAMETERS_PV 2314508377 PACKAGE    953    950874         0    1
00000005472CDE20 OE_SYS_PARAMETERS    2476839672 PACKAGE    953    950874   8680985    2
                                                  BODY

00000005472CD068 OE_SYS_PARAMETERS_PV 2626471498 PACKAGE    953    950874   8862791    2
                                                  BODY

00000005472C2DC8 OE_MSG_PUB           1210870744 PACKAGE    635    937875   3267991    2
                                                  BODY

I was interested to *PINNED* column, it tells high number of calls for IGI_GEN_VERT and MO_GLOBAL packages.By default, Oracle creates multiple copies of object based on CPU Core. But still this could lead to contention if you have highly concurrent users using same package. I came to know about undocumented dbms_shared_pool.markhot() procedure Markhot procedure creates multiple copies of object in shared pool which can eliminate the contention.

Since, it was not an production instance, I decided to give it a try by marking both packages as hot.

exec dbms_shared_pool.markhot('APPS','MO_GLOBAL',1);
exec dbms_shared_pool.markhot('APPS','MO_GLOBAL',2);
exec dbms_shared_pool.markhot(hash=>3732637996,NAMESPACE=>0);

exec dbms_shared_pool.markhot('APPS','MO_GLOBAL',1);
exec dbms_shared_pool.markhot('APPS','MO_GLOBAL',2);
exec dbms_shared_pool.markhot(hash=>3743427754,NAMESPACE=>0);

exec dbms_shared_pool.markhot('APPS','IGI_GEN_VERT',1);
exec dbms_shared_pool.markhot('APPS','IGI_GEN_VERT',2);
exec dbms_shared_pool.markhot(hash=>729966110,NAMESPACE=>0);

exec dbms_shared_pool.markhot('APPS','IGI_GEN_VERT',1);
exec dbms_shared_pool.markhot('APPS','IGI_GEN_VERT',2);
exec dbms_shared_pool.markhot(hash=>1541933742,NAMESPACE=>0);

Markhot procedure needs to be called for Package and Package body individually, and after that

Wolla .. mutex X waits were reduced to 3.3%..


Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                           Avg
                                                          wait   % DB
Event                                 Waits     Time(s)   (ms)   time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
DB CPU                                           33,055          84.1
library cache: mutex X              209,094       1,307      6    3.3 Concurrenc
SQL*Net more data from client     5,516,878       1,220      0    3.1 Network
enq: UL - contention                     97         521   5367    1.3 Applicatio
log file sync                        72,433         514      7    1.3 Commit

For me- that was a lifesaver and it reduced the contetntion, Thanks again to Andrey S. Nikolaev

October 3, 2009

How to identify tracefile in 11g

Filed under: Oracle 11g — Tags: , — Jagjeet Singh @ 11:36 am


It’s quite often that DBA needs to enable tracing for a particular session to diagnose the performance or any other issue, which create trace file(s) in user_dump_directory. Usually this directory contains hundreds of files on a medium/large user base system, DBA needs to identify the OS process id to identify the trace file.


Naming Convention of user trace file is <<instance_name>>_ora_<<process id>>.trc under user_dump_directory. Oracle also provide option to use tracefile_identifire which can be utilized to make file name unique to identify easily.


Tracefile identifire will be appended in the name of file.In 11g, It has become very easy to identify the trace file directly from v$process view. Column named “Tracefile” has been added to this view.


SQL> select tracefile from v$process join v$session on (addr=paddr)
2 where sid = 76;

TRACEFILE
——————————————————————-
/u01/app/oracle/diag/rdbms/test/test1/trace/test1_ora_27817.trc

SID 76 is my session id.

After setting Tracefile Identifier.

SQL> alter session set tracefile_identifier='JS';

Session altered.

SQL> select tracefile from v$process join v$session on (addr=paddr)
2 where sys_context(‘userenv’,’sessionid’) = audsid
3 /

TRACEFILE
——————————————————————-
/u01/app/oracle/diag/rdbms/test/test1/trace/test1_ora_27817_JS.trc


Note – I have removed SID and using sys_context… function to identify the current session. SID is useful to identify end-user’s sessions.


This column will always be populated with name of trace file even there has been no file generated.

SQL> ! ls -l /u01/app/oracle/diag/rdbms/test/test1/trace/test1_ora_27817_JS.trc
ls: /u01/app/oracle/diag/rdbms/test/test1/trace/test1_ora_27817_JS.trc: No such file or directory


After enabling trace.

SQL> alter session set sql_trace=true;

Session altered.

SQL> !ls -l /u01/app/oracle/diag/rdbms/test/test1/trace/test1_ora_27817_JS.trc
-rw-r----- 1 oracle dba 1101 Oct 3 07:10 /u01/app/oracle/diag/rdbms/test/test1/trace/test1_ora_27817_JS.trc


before 11g, below query can be utilized to identify the trace file.


SQL> select u_dump.value || '/' || instance.value || '_ora_' || v$process.spid
|| nvl2(v$process.traceid, '_' || v$process.traceid, null ) || '.trc'"Trace File"
from V$PARAMETER u_dump
cross join V$PARAMETER instance
cross join V$PROCESS
join V$SESSION on v$process.addr = V$SESSION.paddr
where u_dump.name = 'user_dump_dest'
and instance.name = 'instance_name'
and V$SESSION.audsid=sys_context('userenv','sessionid');

Create a free website or blog at WordPress.com.