Pages

Monday, July 7, 2014

B Tree Index with heavy DML

================
General
================
Indexes for columns that are effected by DML operations, might be badly effected in two main ways:

A. The order of data in Index leafs is out of sequence with order of data in Table blocks. 
   As a result, the Index Clustering Factor is becoming large, and Index range scans are more costly.
   This is related only for Non-Unique indexes.

B. In case the Index is on a date, sequence, etc, the Inserts are only on right side of the Index. 
   In case there are Delete or Update operations on the underlying table, the table would remain roughly in the same size, but the Index would continue to grow, as the space from the deleted/updated leafs is never reused.  

The post includes:
- Short description of related technical terms.
- SQLs to identify problematic Indexes.
- Real case scenarios
- Solution, i.e. running REBUILD INDEX as a scheduled Job.


Short description of related technical terms
  CF - Clustering Factor.
  PCTFREE and PCTUSED in Indexes.
 DML operations effect Index storage.


================
CF - Clustering Factor.
================
   A number that describes how table data stored on disk with correlation to index data.
   Index values are stored in order on disk, so Index block would have entries such as 1,2,3,4,5,6, etc...

Option Table data could be correlated to Index data.    Option A - Table values are stored in order on disk, so Table block would have entries such as 1,2,3,4,5,6, etc... This could be achieved if a sorted data was loaded.

   Option B - Table values are stored in disorder on disk, so Table block would have entries such as 1, 43444, 4565565645, 32, etc...

Number of I/O to read the whole Table:
In Option A - Read one Index block would fetch a range of Table ROWID, that are stored on one Table block. - One I/O operation to fetch Table data is enough.
In Option B - Read one Index block would fetch a range of Table ROWID, that are stored in several Table blocks. - Many I/O operations to fetch Table data are needed.

In Option A - the max value of I/O operations is the number of Table blocks.
In Option B - the max value of I/O operations is the number of Table rows.

Of course, Oracle would not choose option B - and the access to Index is skipped.

Clustering Factor Value
In Option A - The Clustering Factor is Low - It value would be close to the number of Index blocks.
In Option B - The Clustering Factor is High - it value would be close to the number of rows in the table.

Clustering Factor is stored in DBA_INDEXES.clustering_factor.

It correct value is depends upon statistics gathering, via DBMS_STATS.GATHER_TABLE_STATS.

DBMS_STATS.GATHER_TABLE_STATS usage:
EXEC DBMS_STATS.GATHER_TABLE_STATS(MY_USER, 'MY_TABLE', estimate_percent => 100, method_opt=> 'for all indexed columns size 254');

Clustering Factor and Performance.
As clustering factor is getting higher, Oracle should not prefer using index range scan, and instead use full table scan.



Top Kyte has a very nice explanation on Clustering Factor:
On Clustering Factor and Validating Keys

Another nice reference:
Clustering Factor demystified


==========================
PCTFREE and PCTUSED in Indexes.
==========================
The meaning of pctfree and pctused:
PCTFREE - The percent of the block, that is reserved free during Insert.
PCTUSED - The percent of the block, that set the limit, below which the block is considered to be free. (i.e. Goes into freelist).
In Indexes, these parameters are quite meaningless.
Per Oracle documentation:

PCTFREE for Indexes.
Indexes infrequently require the use of free space for updates to index data. 
Therefore, the PCTFREE value for index segment data blocks is normally very low (for example, 5 or less).

PCTUSED for Indexes.
PCTUSED is meaningless for Indexes. Data is inserted as needed, and if there is no space, the block is split. 
The default value for PCTUSED is 0.

Query existing values:
For Table: USER_TALES.pct_free and USER_TABLES.pct_used.
For Index: USER_INDEXES.pct_free.

More on PCTFREE and PCTUSED:
Managing Space in Data Blocks

===========================
DML operations effect Index storage.
===========================
Consider following scenario:
There is a TRANSACTIONS table, with index on create_tx_date field.
New transactions are constantly being added, and the index is growing on the right side of B Tree.
Transaction could be deleted from the table.
In this case, the Index blocks are not being re-used, as Index is growing only on the right side of B Tree.
So, the Table stays relatively constant in size, as blocks are deleted and then reused.
The Index is growing bigger and bigger.

The solution in these case - perform ongoing REBUILD INDEX operation.

ALTER INDEX MY_INDEX COALESCE;
ALTER INDEX MY_INDEX REBUILD ONLINE;

Oracle documentation on rebuilding indexes:
Costs and Benefits of Coalescing or Rebuilding Indexes


SQLs to identify problematic Indexes.
How to identify indexes that should be rebuild?
There is no clear cut answer.
Possible logic could be:

Option A - Find indexes that use more than X percent of the underlying table.
                        In the SQL below, a threshold of 50% is used.

Option B - Find indexes that have a large value for Clustering Factor.

Option C - based on table ALL_TAB_MODIFICATIONS, find the tables that have heavy DML activity, and then check the Indexes on these tables.

=====================================================
SQL Query to list Index Info.
=====================================================

SELECT USER_IND_COLUMNS.table_name,
       USER_IND_COLUMNS.index_name,
       USER_IND_COLUMNS.column_name,
       USER_IND_COLUMNS.column_position,
       USER_TAB_COLS.data_type,
       USER_TAB_COLS.last_analyzed,
       USER_TAB_COLS.sample_size,
       USER_TAB_COLS.num_buckets,
       USER_TAB_COLS.histogram,
       USER_TAB_COLS.nullable
 FROM  USER_IND_COLUMNS, USER_TAB_COLS
 WHERE USER_IND_COLUMNS.table_name = USER_TAB_COLS.table_name
   AND USER_IND_COLUMNS.column_name = USER_TAB_COLS.column_name
   AND USER_IND_COLUMNS.table_name = 'DATA_HISTORY'   
ORDER BY USER_IND_COLUMNS.table_name, 
         USER_IND_COLUMNS.index_name,  
         USER_IND_COLUMNS.column_position  


TABLE_NAME    INDEX_NAME           COLUMN_NAME    COL_POSITION DATA_TYPE  LAST_ANALYZED
------------- -------------------- -------------- ------------ ---------- -------------
DATA_HISTORY  DATA_HISTORY_IX1     GATE_ID                   1 VARCHAR2   20/05/2007 10
DATA_HISTORY  DATA_HISTORY_IX1     DATE_OF_CALL              2 VARCHAR2   20/05/2007 10
DATA_HISTORY  DATA_HISTORY_IX1     INTERVAL_ID               3 NUMBER     20/05/2007 10
DATA_HISTORY  DATA_HISTORY_IX2     DATE_OF_CALL              1 VARCHAR2   20/05/2007 10
DATA_HISTORY  DATA_HISTORY_IX2     INTERVAL_ID               2 NUMBER     20/05/2007 10

=====================================================
SQL Query to find Indexes that use more space than the Table they Index.
======================================================
Two possible situation for this scenario:
- The table has very few columns, and all these columns are part of the Index.
- The table undergoes frequent DML operations.

 SELECT TABLE_SEGMENTS.segment_name   AS table_name,
       TABLE_SEGMENTS.partition_name AS tab_partition,
       TABLE_SEGMENTS.table_size_mb AS tab_Mb,
       INDEX_SEGMENTS.segment_name AS index_name,
       INDEX_SEGMENTS.partition_name AS ind_partition,
       INDEX_SEGMENTS.index_size_mb AS ind_Mb,
       INDEX_SEGMENTS.num_rows,
       ROUND((INDEX_SEGMENTS.index_size_mb/TABLE_SEGMENTS.table_size_mb)*100) pct
FROM
(
SELECT USER_INDEXES.table_name            AS master_table,
       USER_SEGMENTS.segment_name         AS segment_name,
       NVL(USER_SEGMENTS.partition_name,'NONE') AS partition_name,
       USER_SEGMENTS.segment_type         AS segment_type,
       USER_INDEXES.num_rows              AS num_rows,
       SUM(USER_SEGMENTS.bytes)/1024/1024 AS table_size_mb  
FROM USER_INDEXES,
     USER_SEGMENTS
WHERE USER_SEGMENTS.segment_name = USER_INDEXES.table_name
  AND USER_SEGMENTS.segment_type = 'TABLE'
  AND USER_INDEXES.num_rows > 10000
GROUP BY USER_INDEXES.table_name, USER_INDEXES.table_name, USER_SEGMENTS.partition_name, USER_SEGMENTS.segment_name, USER_SEGMENTS.segment_type, USER_INDEXES.num_rows
)TABLE_SEGMENTS,    
(
SELECT USER_INDEXES.table_name             AS master_table,
       USER_SEGMENTS.segment_name          AS segment_name,    
       NVL(USER_SEGMENTS.partition_name,'NONE')  AS partition_name,    
       USER_SEGMENTS.segment_type          AS segment_type,
       USER_INDEXES.num_rows               AS num_rows,
       SUM(bytes)/1024/1024                AS index_size_mb
FROM USER_INDEXES,
     USER_SEGMENTS
WHERE USER_SEGMENTS.segment_name = USER_INDEXES.index_name
  AND USER_SEGMENTS.segment_type IN ('INDEX', 'INDEX PARTITION')
  AND USER_INDEXES.num_rows > 10000
GROUP BY USER_INDEXES.table_name, USER_SEGMENTS.segment_name, USER_SEGMENTS.partition_name,  USER_SEGMENTS.segment_type, USER_INDEXES.num_rows
)INDEX_SEGMENTS    
WHERE TABLE_SEGMENTS.master_table = INDEX_SEGMENTS.master_table
  AND ROUND((INDEX_SEGMENTS.index_size_mb/TABLE_SEGMENTS.table_size_mb)*100) > 49
ORDER BY table_name, index_name;

 TABLE_NAME       TAB_PARTITION   TAB_MB INDEX_NAME IND_PARTITION IND_MB NUM_ROWS PCT
----------------- ------------- -------- ---------- ------------- ------ -------- ----
DIST_SUBS         NONE                37 DIST_PK    NONE              48  1830878  130
EXT_TRACE         NONE                50 EXT_IDX    NONE              33   312831   66
TRAN_TYPE1        NONE               0.9 TRAN1_IX1  NONE          0.9375    17990  100
IRE_FILES         NONE                 2 IRE_PK     NONE               2    15408  100
TRAN_RECIEVED     NONE                45 TRNREC_IX1 NONE              57  1618488  127
SCH_JOB_LOGGING   NONE                 8 EL_PK      NONE               4   125197   50
SERVICE_PROVIDER  NONE                 6 SRVPROV_PK NONE               4    74371   67
TYPE2_TRANS       NONE                72 TYPE2_IX1  NONE               39 1223880   54
TYPE2_TRANS       NONE                72 TYPE2_IX2  NONE               39 1223880   54
TYPE2_TRANS       NONE                72 TYPE2_IX3  NONE               80 1229510  111
TYPE3_TRANS       NONE               800 TYPE3_PK   NONE             1603  619779  200

11 rows selected

=====================================================
SQL Query to find Indexes with high Clustering Factor.
=====================================================
SELECT INDEXES.table_name, 
       TABLES.num_rows as table_rows,        
--     TABLE_SEGMENTS.blocks as table_blocks,
       INDEXES.index_name, 
       INDEXES.tablespace_name AS index_tablespace,
       TABLE_SEGMENTS.BYTES/1024/1024 TABLE_SIZE_MB,   
       INDEX_SEGMENTS.bytes/1024/1024 INDEX_SIZE_MB,
--     INDEX_SEGMENTS.blocks as index_blocks,
--     INDEXES.avg_data_blocks_per_key, 
--     INDEXES.avg_leaf_blocks_per_key,
       ROUND((INDEX_SEGMENTS.blocks/TABLE_SEGMENTS.blocks)*100) AS ind_size_pct,
       INDEXES.clustering_factor
       --ROUND(INDEXES.clustering_factor / TABLES.num_rows) * 100 AS index_grade,
  /*
       CASE when nvl(INDEXES.clustering_factor,0) = 0 then '0-No Stats' 
         WHEN NVL(TABLES.num_rows,0) = 0 then '0-No Stats'
         WHEN (ROUND(INDEXES.clustering_factor / TABLES.num_rows) * 100) < 10 then '1-Very Good'
         WHEN (ROUND(INDEXES.clustering_factor / TABLES.num_rows) * 100) between 10 and 30 then '2-Good'
         ELSE '3-Poor' 
       END Index_Quality
  */
FROM USER_INDEXES INDEXES, 
     USER_SEGMENTS INDEX_SEGMENTS, 
     USER_SEGMENTS TABLE_SEGMENTS,
     USER_TABLES TABLES
WHERE 1=1 AND
       rownum < 101 and
--     INDEXES.index_name like upper ('%CDR%') and
--     INDEXES.owner=TABLES.owner and
     INDEXES.table_name = TABLES.table_name and 
--     INDEXES.owner=SEGMENTS.owner and
     INDEX_SEGMENTS.segment_name = INDEXES.index_name and
     INDEX_SEGMENTS.segment_type = 'INDEX' and
     INDEX_SEGMENTS.partition_name IS NULL and
     TABLE_SEGMENTS.segment_name = TABLES.table_name and     
     TABLE_SEGMENTS.segment_type = 'TABLE' and
     TABLE_SEGMENTS.partition_name IS NULL and        
     TABLES.num_rows > 1000000 and
ROUND((INDEX_SEGMENTS.blocks/TABLE_SEGMENTS.blocks)*100) > 50
--     ROUND(INDEXES.clustering_factor / TABLES.num_rows) * 100 > 1
ORDER BY TABLE_NAME, INDEX_NAME;



TABLE_NAME         TABLE_ROWS INDEX_NAME         INDEX_TABLESPACE TAB_SIZE_MB IND_SIZE_MB CLSTR_FACTOR
------------------ ---------- ------------------ ---------------- ----------- ----------- ----------
DIST_SUBS          1869166 DIST_SUBS_PK          COLLECT_TABLE             37          48       9083
EXT_TAB_TRACE      7373633 EXT_TAB_TRACE_PK      COLLECT_TABLE            506         128      71672
TRNS_RECIEVED      1663523 OSMS_RECEVED_A1       COLLECT_TABLE             45          57    1616777
TYPE1_TRNSACTIONS  1610955 TYPE1_TRAN_IX1        COLLECT_INDEX             312         45      38565
SUBSCRIBER         1892134 SUBSCRIBER_PK         COLLECT_TABLE             496         48    1920935
SUBSCRIBER         1892134 SUB_IX_1              COLLECT_TABLE             496         30     223116
SUBSCRIBER         1892134 SUB_IX_2              COLLECT_TABLE             496         49     204876
TYPE2_TRNSACTIONS  1214240 TYPE2_IX1             COLLECT_INDEX              72         39      18710
TYPE2_TRNSACTIONS  1214240 TYPE2_IX2             COLLECT_INDEX              72         34     558378
TYPE2_TRNSACTIONS  1214240 TYPE2_IX3             COLLECT_INDEX              72         80     568291
TYPE3_TRNSACTIONS  1098599 TYPE2_IX1             COLLECT_INDEX             400         34         61
TYPE3_TRNSACTIONS  1098599 TYPE2_IX2             COLLECT_INDEX             400         10         61
TYPE3_TRNSACTIONS  1098599 TYPE2_IX3             COLLECT_INDEX             400         51     138566

13 rows selected


===================================
Identify Tables, that have heavy DML operations.
===================================
SELECT table_name, partition_name, inserts, updates, deletes 
  FROM ALL_TAB_MODIFICATIONS 
 WHERE table_owner <> 'SYS'
   AND (deletes > 100000 OR updates > 100000);

TABLE_NAME              PARTITION_NAME  INSERTS    UPDATES    DELETES
----------------------- --------------- ---------- ---------- ----------
MY_TABLE_A                                  660011   59694632          0
MY_TABLE_B                                 7909445   68541766          0
MY_TABLE_B                                36742403   71139884     552220
MY_TABLE_B              SYS_P21            4581729    8868736      68962
MY_TABLE_B              SYS_P22            4603103    8905426      69738
MY_TABLE_B              SYS_P23            4592981    8895739      67836
MY_TABLE_B              SYS_P24            4610398    8919283      68524


=========
Real case I.
=========
SQL> SELECT segment_name, blocks 
       FROM USER_SEGMENTS 
      WHERE segment_name IN ('MY_TABLE_A_IX1', 'MY_TABLE_A_IX2', 'MY_TABLE_A_IX3', 'SSU_PSMS_MSISDN_IDX', 'MY_TABLE_A');

SEGMENT_NAME                BLOCKS
----------------------- ----------
MY_TABLE_A                  406016
MY_TABLE_A_IX1               25728
MY_TABLE_A_IX2              180224
MY_TABLE_A_IX3               20608
MY_TABLE_A_IX4               22528

As we can see, the Index MY_TABLE_A_IX2 is quite big, which is surprising.
The index MY_TABLE_A_IX2 is on a date column in table MY_TABLE_A.


SQL> ALTER INDEX MY_TABLE_A_IX2 COALESCE;

SQL> SELECT segment_name, blocks 
       FROM USER_SEGMENTS 
      WHERE segment_name IN ('MY_TABLE_A_IX1', 'MY_TABLE_A_IX2', 'MY_TABLE_A_IX3', 'SSU_PSMS_MSISDN_IDX', 'MY_TABLE_A');

SEGMENT_NAME            BLOCKS
----------------------- ----------
MY_TABLE_A                  406016
MY_TABLE_A_IX1               25728
MY_TABLE_A_IX2              180224
MY_TABLE_A_IX3               20608
MY_TABLE_A_IX4               22528

No change in disk usage of index MY_TABLE_A_IX2.

SQL> ALTER INDEX SSU_PSMS_LAST_LU_IDX REBUILD ONLINE;
Index altered

SQL> SELECT segment_name, blocks 
       FROM USER_SEGMENTS 
      WHERE segment_name IN ('MY_TABLE_A_IX1', 'MY_TABLE_A_IX2', 'MY_TABLE_A_IX3', 'SSU_PSMS_MSISDN_IDX', 'MY_TABLE_A');

SEGMENT_NAME            BLOCKS
----------------------- ----------
MY_TABLE_A                  406016
MY_TABLE_A_IX1               25728
MY_TABLE_A_IX2               14336
MY_TABLE_A_IX3               20608
MY_TABLE_A_IX4               22528



The size from Index MY_TABLE_A_IX2 decreased from 180224 locks to 14336, a 92% decrease!



=========
Real case II.
=========

Table MY_SUBSCRIBER got an index on a column, last_contact_date, defined as DATETIME.
New entries were constantly added, to the table.
Each time user made an inquire, the last_contact_date field was updated to current date.
As a result, the Index on  last_contact_date was growing constantly on the right side, reaching 8Gb of memory, and filling up all index tablespace, whereas the table was pretty much at constant size of 1500Mb.

Here is a sample of data from that table.
As one can see, every few seconds, a few dozen records are created.

SELECT * 
FROM 
  (SELECT TO_CHAR(last_contact_date,'YYYYMMDD hh:mm:ss') last_contact_date,
  count(*)
  FROM MY_SUBSCRIBER 
  GROUP BY last_contact_date
  ORDER BY COUNT(*) DESC)
WHERE ROWNUM < 21;

last_contact_date    COUNT(*)
-------------------- ----------
19700101 12:01:00           106
20140707 04:07:17            69
20140707 04:07:24            64
20140707 04:07:18            63
20140707 04:07:09            59
20140707 04:07:19            59
20140707 04:07:22            58
20140707 07:07:48            58
20140707 04:07:16            58
20140707 04:07:08            58
20140707 04:07:02            57
20140707 07:07:12            56
20140707 07:07:14            56
20140707 07:07:27            56
20140707 07:07:34            55
20140707 04:07:06            55
20140707 04:07:54            55
20140707 04:07:23            55
20140707 07:07:00            54
20140707 04:07:11            53
20 rows selected

The solution was to create a scheduled job, that did 
ALTER INDEX last_contact_date_IX REBUILD ONLINE on a daily basis.

=========
Appendix.
=========
How to monitor Index creation/rebuild
V$SESSION_LONGOPS

Performance Issues
Rebuild Index is a heavy operation. By default journal table is created on the same tablespace, where the original Index exists.
1. Need to verify that there is enough physical disk space.
2. Need to verify that there is enough space in tablespace.
3. To speed up the rebuild process, consider run in parallel.
ALTER INDEX MY_INDEX REBUILD PARALLEL 8 ONLINE

===============================
ALTER INDEX REBUILD ONLINE
===============================

Perform ALTER INDEX REBUILD ONLINE from PL/SQL Procedure.

 This Procedure might be called from a scheduled Job.

PROCEDURE rebuildIndexes IS
  v_exception_code NUMBER := -20101;
  v_sql_str VARCHAR2(4000);
  exp_msg VARCHAR2(4000);
    
--FIRST - Need explicit privilege GRANT CREATE TABLE TO <user>;
  sqlStr VARCHAR2(2000):='';
  sqlStr1 VARCHAR2(20):='ALTER INDEX';
  sqlStr2 VARCHAR2(20):='REBUILD ONLINE';   
    
  CURSOR indexListCur IS 
  SELECT 'MY_INDEX_1_IDX' AS index_name FROM DUAL
  UNION ALL
  SELECT 'MY_INDEX_2_IDX' AS index_name FROM DUAL;
BEGIN
  FOR indexListRec IN indexListCur LOOP
    sqlStr := sqlStr1||' '||indexListRec.index_name||' '||sqlStr2;
    EXECUTE IMMEDIATE sqlStr;
  END LOOP;
    
 EXCEPTION
  WHEN OTHERS THEN
     exp_msg := 'Unexpected Error in rebuildIndexes.'||CHR(10)||'Error Details: '||SQLERRM||CHR(10);
       RAISE_APPLICATION_ERROR(v_exception_code, exp_msg); 
END;

Note - Permission issue when doing REBUILD ONLINE.
   When issuing ALTER INDEX REBUILD ONLINE, Oracle is creating journal table to handle the changes that occur during index rebuild phase. 
   As stated above, the index would be rebuild in a new Segment.

   Normally that would not be an issue, but when the ALTER INDEX REBUILD ONLINE code is places inside PL/SQL, it is needed to grant explicit permissions to the PL/SQL code owner. Permission granted via Roles are not effective here. 
   In this case, the correct permission is 'GRANT CREATE TABLE TO <USER>'   as to allow the creation of journal table. ( NOT index related privileges as one might expect)

Locking issues during ALTER INDEX REBUILD ONLINE
Here is a link to a very good post.
Index Create and Rebuild Locking Improvements in 11g
Index Locking by Example

Oracle requires a table lock on the index base table in two point in time:
- At the start of the CREATE or REBUILD process, to guarantee DD information.
 - At the end of the process, to merge index changes made during the rebuild into the final index structure.

Oracle 11G has made some improvements in the locking implications regarding creating or rebuilding indexes online.

In Oracle 11G, if there’s an active transaction on the base table at the time one of these locks is required, the indexing process will still hang as its done previously until all these prior active transactions have completed. No change so far.

However, if the indexing process has been locked out and subsequent transactions relating to the base table start afterwards, these transactions will no longer in turn be locked out by the indexing table locks and are able to complete successfully. 
The indexing process no longer impacts other concurrent transactions on the base table, as it was in versions prior to Oracle 11G.

So during period of transactions activity, the REBUILD ONLINE process would probably take longer, at it would spend some time waiting for requested lock. The good part it will not "stuck" subsequent transactions on the base table.

================================
Rebuild Index for Partitioned Tables.
================================
Scenario
Drop Partition for a table with Global Indexes.
If index was a range-partitioned global index, then all partitions of the index would require rebuilding. 
It is not possible to rebuild all partitions of an index in one statement.
Need to write a separate 
REBUILD statement for each partition in the index. 


Option I
- DROP PARTITION
SELECT index_name FROM USER_TABLES WHERE table_name = 'MY_TABLE';
SELECT index_name, partition_name FROM USER_IND_PARTITIONS WHERE index_name = 'MY_INDEX';
ALTER INDEX <index_name> REBUILD PARTITION <partition_name> ONLINE;
 Repeat this step for ALL partitions.

Option II.
First Delete all records, so that the index statistics would be updated in global index.
Only then drop the partition.

DELETE FROM my_table WHERE start_date LIKE '2008%'
ALTER TABLE my_table DROP PARTITION P_2008;

Useful Links
Oracle Metalink Note: Rebuild schema indexes (Doc ID 555284.1)

No comments:

Post a Comment