Jagjeet's Oracle Blog!

May 13, 2012

Sqlplus Dashboard For RAC

I have been passionate to work with CLI due to flexibility, lightweight and easy access. Long back, I had an idea to write a sqlplus script to provide a dashboard kind of utility but could never figure out the auto refresh functionality until unless Tanel & Adrian provided moats.

I got the Idea and solution of auto refresh functionality in sqlplus. Big thanks to great Tanel & Adrian. Majority of my time I spent doing bench-marking on Exadata, so I decided to write a sqlplus dashboard kind of utility for RAC, idea was to get some critical metrics for performance benchmarks on one screen and have a automatic refresh capability ( similar EM “TOP Activity” page )

It took almost 3 weeks for me to write this utility from scratch, It tells you info about Exadata smart scan/storage index saving/Flash cache/ CPU usage/TOP sqls/waits/Cluster interconnect traffic/Top segments by GC* waits and a lot more info. which can be useful to monitor system performance. Based on current formatting, It can support up to 8 node cluster information easily.
I tried my best to make this utility lightweight by reading data from gv$ views once and store in plsql/collections for subsequent requests, Could not find session in EM top activity page running this utility. As of today, it will only work with 11g RAC instances or single instance with gv$* views [catclust.sql].

here is video link… video quality isn’t that great. Must be seen in HD mode.

I have tried to explain about metrics below.


############################################
-- Below section shows metrics instance wise
############################################


+Inst-------+CPUIDL%--IO%-USR%--SYS%+--Tprse/s--+Hprse/s+--PhyWIO/s-+-PhyWMB/s-+--PhyRIO/s--+-PhyRMB/s-+-SessLIO/s--+---Exec/s-+RedoMB/s+Commit/s+-ExSSMB/s-+-ExSIMB/s+-ExFCRh/s+
|AIA1       | 67.8     0  31.2    .7|       2714|      0|       2623|        21|        1183|         9|      370001|      8471|      11|    2001|         0|        0|      980|
|AIA2       | 68.2     0  31.1    .6|       2709|      0|       2491|        22|         928|         7|      340040|      8609|      11|    2004|         0|        0|      703|
+-----------+-----------------------+-----------+-------+-----------+----------+------------+----------+------------+----------+--------+--------+----------+---------+---------+


+Inst-------+ <-- Instance name     +CPUIDL%--IO%-USR%--SYS%+ 
|AIA1       |                       | 67.8     0  31.2    .7| -- This sections shows CPU usage IDLE% IO%, USER% SYS% 
|AIA2       |                       | 68.2     0  31.1    .6| -- For Exadata, we may not see IO Usage as Grid disks exposed to ASM only 
+-----------+                       +-----------------------+




+--Tprse/s--+Hprse/s+--PhyWIO/s-+-PhyWMB/s-+--PhyRIO/s--+-PhyRMB/s-+-SessLIO/s--+
|       2714|      0|       2623|        21|        1183|         9|      370001|
|       2709|      0|       2491|        22|         928|         7|      340040|
+-----------+-------+-----------+----------+------------+----------+------------+

--  Tprse/s   : Total parse per second
--  Hprse/s   : Hard parse per second
--  PhyWIO/s  : Physical Write IO per second
--  PhyWMB/s  : Physical Write MB per second                                                         
--  PhyRIO/s  : Physical Read IO per second								
--  PhyRMB/s  : Physical Read MB per second								
--  SessLIO/s : Session Logical IO per second								
		


+---Exec/s-+RedoMB/s+Commit/s+
|      8471|      11|    2001|
|      8609|      11|    2004|
+----------+--------+--------+

-- Exec/s   : Total Executions for the Instance per second
-- RedoMB/s : Redo Generation MB per second 
-- Commit/s : User commit per second






-- Exadata Related Stats
+-ExSSMB/s-+-ExSIMB/s+-ExFCRh/s+
|         0|        0|      980|
|         0|        0|      703|
+----------+---------+---------+

-- ExSSMB/s  : cell physical IO interconnect bytes returned by smart scan MB per second 
-- ExSIMB/s  : cell physical IO bytes saved by storage index MB per second
-- ExFCRh/s  : cell flash cache read hits per second 

--     Note : Above snapshot was taken for OLTP kind of workload, hence no 
--            smartscan/storage index was in picture, only Flash Cache

 
#########################################
-- Below section shows Cluster Level info
#########################################

+IMPACT%-+--TOP WAIT EVENTS-----------------------+-WAIT CLASS--------+ 
|  51.2% | transaction                            | Other             | 
|  28.2% | enq: TX - index contention             | Concurrency       | 
|   8.7% | gc buffer busy acquire                 | Cluster           | 
|     5% | cell single block physical read        | User I/O          | 
|   2.2% | gc current grant busy                  | Cluster           | 
+--------+----------------------------------------+-------------------+ 

 Above sections shows TOP 5 wait events for the last sample [ default 6 second ] 	

-- IMPACT%    :  Impact was calulcated based on AWR formula, Time spent on individual wait event / Total Wait Time * 100
-- WAIT EVENT :  Name of the Wait Event 
-- WAIT CLASS :  Name of the Wait Class

  
 * * Cluster Level info * *
+IMPACT%-+ TOP SQLS -(child)-+-TOP SESSIONS-------INST:SID----------------------------------------+
|  26.5% | bk2mk4dh179y7(2)  | 1:1043, 1:1174, 1:1239, 1:1304, ......                             |
|  17.6% | bk2mk4dh179y7(0)  | 1:1045, 1:1109, 1:1110, 1:1174, .......                            |
|  11.8% | 7jjp97nb9h2up(1)  | 2:1048, 2:1109, 2:1112, 2:1174,                                    |
|   7.5% | 0r5xv5d42p3p6(2)  | 1:653, 1:979, 2:1047, 2:1501, 2:588, 2:68  .....                   |
|   7.3% | apgu0yan9pb6m(5)  | 1:653, 1:979, 2:1047, 2:1501, 2:588, 2:68  ......                  |
+--------+-------------------+--------------------------------------------------------------------+

 Above sections shows TOP 5 SQLs for the last sample [ default 6 second ] 	

-- IMPACT%    :  Impact was calculated based on AWR formula, Time spent on individual sql / Total Wait Time * 100
-- TOP SQL - (child) : Sql id along with child info
-- INST:SID   :  Instance ID and SID for the session running sql, due to space limitation, only few sessions will shown






   * * Cluster Level info * *

           -------+                                 ACTIVE SESSIONS GRAPH                              +----
     Active   375 |                                                                                    | 375
   Sessions   349 |                                                                                    | 349
              322 |                            #                    #                                  | 322
              295 |                            #   # ##       #   # #                                  | 295
              268 |                            #   # ##       #   # #                  #          #_288| 268
              242 |                            #   # ###  #   #   # #                 ### #       #    | 242
              215 |                            #   # ###  #   #   # #                 #####       #    | 215
              188 |                            #   # ###  #   #   # #                 #####  #    #    | 188
              161 |                           ##################  #########     ##    #####  ######    | 161
              134 |                           #############################     #### ##############    | 134
              109 |                           ##############################    ###################    | 109
     	       81 |                           ##############################  #####################    | 81
               24 |             -MAX CPUs [24]-####################################################    | 24
                1 |                           #####################################################    | 1
                0 +------------------------------------------------------------------------------------+ 0
		  ^ 01:42:22                     01:49:02 ^                             01:55:42 ^



-- Above sections shows total active sessions across the cluster, time window calculated based on sample time [6 second default ]
-- total 80 samples are printed in graph, one line represents one sample time. Active Session range will change automatically
-- based on active sessions and graph will adjust accordingly. Idea was to have some trend about the active session history.
-- MAX CPU will also be shown to have some idea about Active sessions Vs CPUs available.

-- Note : if value falls in between 2 range buckets then it will be shown in front of lower value, for example - in above screenshot 
--        current active sessions were 288 and it was displayed in front of 268, same applies to MAX CPU info as well
 

   * * Instance level info * *

    |  Global  |  Global  | Estd.     | 
    |  Cache   |  Cache   | Intercnt  | 
Inst|  Blocks  |  Blocks  | Traffic   | 
  ID|  Sent/s  |  Rcvd/s  | MB/s      | 
+---+----------+----------+-----------+ 
  1 |     10831|     10188|      173.3| 
  2 |     10650|     10586|        175| 
                                        

-- Above sections shows Global Cache info instance wise

-- InstID : Instance ID no.
-- GC Blocks Sents/s :  Global cache blocks sent by instance per second
-- GC Blocks Rcvd/s  :  Global cache blocks received by instance per second
-- Estd. Interconnect Traffic MB :  Estimated traffic MB for interconnect by instance





 * * Cluster Level info * *
                             
 | TOP Segments by GC*     |    
 | Waits                   |    
 | IMPACT% [Type:Segment]  |   
 +-------------------------+    
 | 40.8% IND:REFERENCE_INSTANCE 
 | 23.3% IND:MEDIATOR_INSTANCE  
 | 17.4% IND:REFERENCE_INSTANCE 
 |  5.8% IND:S_DOCK_TXN_LOG_P1  
    4.2% IND:COMPOSITE_INSTANCE                                                                         

-- TOP Segments based on all Global Cache* events, Impact is calculated based on GC* waits on individual
-- Segment / Total GC* waits * 100





  * * Cluster level info * * 

+--SqlID--------+--SqlText----------------------------------------------------------------------------+-LongstDur-+-InstCnt-+-Cnt-+--CPU%--+--CONC%--+--CLUS%--+-IO%-+---ReadMb-+
| 0r5xv5d42p3p6 | INSERT INTO REFERENCE_INSTANCE (ID, BINDING_TYPE, REFERENCE_NAME, UPDATED_TIME, PRO |           |    2    |  12 |      0 |      0  |      0  |    0|         0|
+---------------+-------------------------------------------------------------------------------------+-----------+---------+-----+--------+---------+---------+-----+----------+
| bk2mk4dh179y7 | select VALUE from XREF_DATA  where XREF_COLUMN_NAME = :1  and XREF_TABLE_NAME = :2  | 00:00:09  |    2    |  46 |    59% |     3%  |      0  |   4%|       227|
+---------------+-------------------------------------------------------------------------------------+-----------+---------+-----+--------+---------+---------+-----+----------+
| 7jjp97nb9h2up | INSERT INTO COMPOSITE_INSTANCE (ID, COMPOSITE_DN, CONVERSATION_ID, UPDATED_BY, PARE |           |    2    |  14 |      0 |      0  |      0  |    0|         0|
+---------------+-------------------------------------------------------------------------------------|-----------+---------+-----+--------+---------+---------+-----+----------+
| apgu0yan9pb6m | insert into  XREF_DATA  ( XREF_TABLE_NAME, XREF_COLUMN_NAME, ROW_NUMBER, VALUE, IS  |           |    2    |  14 |      0 |      0  |      0  |    0|         0|
+---------------+-------------------------------------------------------------------------------------|-----------+---------+-----+--------+---------+---------+-----+----------+

-- Above sections shows cummulative info for the TOP Sqls 

-- LongstDur : if multiple sessions are running same sql, this column would tell timing for the first query, longest duration.
-- InstCnt   : Total no. of Instance running same sql at time of sample [ including parallel and non-parallel query ]
-- Cnt       : Total sessions running same query across all the instances [ including parallel and non-parallel ] 

-- CPU%  : %time spent on CPU
-- CONC% : %time spent on Concurrency
-- CLU%  : %time spent on Cluster waits
-- IO%   : %time spent on IO

--ReadMB : Total physical read mb by the query, cummulative for all running queries across the cluster

-- Note: Most of the columns are fetched from gV$sql_Monitor, will only be populated if query running more than 6 seconds on CPU/IO


Code can be downloaded here.

Please do not forget to share your suggestions/feedback. Thanks –

Note: This utility uses ASH and v$sql_monitor which requires diagnostics and tuning pack !

Advertisements

December 13, 2009

How to convert RAC Enabled binaries for Non-RAC usage

Filed under: Uncategorized — Tags: , , — Jagjeet Singh @ 7:46 am


I have been using Oracle on Virtualized Environment, hence it’s very frequent for me to build a new virtual machine to test out new features.


I had to copy Oracle 10g binaries from one of my RAC instance to a Non-RAC Server. Normal practice I follow is to copy  the Oracle binaries and relink again with “relink all” option. if required, we can attach copied oracle home with central  inventory using “./runInstaller -silent -attachHome” option.

After copying and relinking, when I tried to start my instance I got below error.

SQL*Plus: Release 10.2.0.1.0 – Production on Sat Oct 17 16:30:46 2009

Copyright (c) 1982, 2005, Oracle. All rights reserved.

Enter user-name: / as sysdba
Connected to an idle instance.

SQL> startup
ORA-29702: error occurred in Cluster Group Service operation

SQL> ho oerr ora 29702

29702, 00000, “error occurred in Cluster Group Service operation”
// *Cause: An unexpected error occurred while performing a CGS operation.
// *Action: Verify that the LMON process is still active. Also,
// check the Oracle LMON trace files for errors.


This error message indicates that it can’t start the instance since CRS is not available. I realized this is because of RAC-Enabled binaries.

Below option can be used to check if software binaries are RAC-Enabled or not. Output as 1 indicate RAC-Enabled binaries.

[oracle@js lib]$ nm -r $ORACLE_HOME/rdbms/lib/libknlopt.a | grep -c kcsm.o
1

To turn off RAC


$ cd $ORACLE_HOME/rdbms/lib
$ make -f ins_rdbms.mk rac_off
# Relink Oracle
$ make -f ins_rdbms.mk ioracle

[oracle@js lib]$ make -f ins_rdbms.mk rac_off

rm -f /u01/app/oracle/product/10g/lib/libskgxp10.so
cp /u01/app/oracle/product/10g/lib//libskgxpd.so /u01/app/oracle/product/10g/lib/libskgxp10.so
rm -f /u01/app/oracle/product/10g/lib/libskgxn2.so
cp /u01/app/oracle/product/10g/lib//libskgxns.so \
/u01/app/oracle/product/10g/lib/libskgxn2.so
/usr/bin/ar d /u01/app/oracle/product/10g/rdbms/lib/libknlopt.a kcsm.o
/usr/bin/ar cr /u01/app/oracle/product/10g/rdbms/lib/libknlopt.a /u01/app/oracle/product/10g/rdbms/lib/ksnkcs.o


[oracle@js lib]$ make -f ins_rdbms.mk ioracle
chmod 755 /u01/app/oracle/product/10g/bin
– Linking Oracle
………


Now it’s returning 0 which indicates it’s not RAC-Enabled binaries.

oracle@js lib]$ nm -r libknlopt.a | grep -c kcsm.o
0


There could be another situation where because of network problem ( or any other .. ) you are not able to start-up the CRS (this does happen !!).  Your first priority is to make database up & available for critical applications irrespective of it’s mode (RAC or Non-RAC).

This option can be used to make database up quickly. you also need to set Cluster_Database=false in parameter file.

To turn On RAC

$ cd $ORACLE_HOME/rdbms/lib
$ make -f ins_rdbms.mk rac_on
# Relink Oracle
$ make -f ins_rdbms.mk ioracle

Create a free website or blog at WordPress.com.