Pages

Sunday, June 4, 2023

buffer busy waits vs free buffer waits

buffer busy waits vs free buffer waits

buffer busy waits - waites to application process to commit/rollback on block
free buffer waits - waites to DBWR process to wtite dirty blocks

=================
Buffer busy waits (%)
=================
This wait happens when a session wants to access a database block in the buffer cache but it cannot because the buffer is busy. 

Another session is modifying the block and the contents of the block are in flux during the modification. 
The session modifying the block marks the block header with a flag letting other users know a change is taking place and to wait until the complete change is applied.
While the block is being changed, the block is marked as unreadable by others.

The problem is when there is a hot block, such as the first block on the free list of a table, with high concurrent inserts.

The two main cases where this wait can occur are:
Another session is reading the block into the buffer.
Another session holds the buffer in an incompatible mode to our request.

Solution:
1. Tune the UNDO tablespace
2. Check I/O

===================
Estimating for UNDO size
===================
1. SQL 1
  
V$UNDOSTAT
SELECT 'Required Undo Tablespace Size Using Statistics: '
 || DBMS_UNDO_ADV.required_undo_size(3600)||'Mb' required_undo_size FROM DUAL;

Required Undo Tablespace Size Using Statistics: 36936 MB

2. SQL 2
SELECT ROUND(d.undo_size/(1024*1024)) "ACTUAL UNDO SIZE [MByte]",
      SUBSTR(e.value,1,25) "UNDO RETENTION [Sec]",
      ROUND((TO_NUMBER(e.value) * TO_NUMBER(f.value) * g.undo_block_per_sec) / (1024*1024))  "NEEDED UNDO SIZE [MByte]"
 FROM (
   SELECT SUM(a.bytes) undo_size
     FROM v$datafile a,
          v$tablespace b,
          dba_tablespaces c
    WHERE c.contents = 'UNDO'
      AND c.status = 'ONLINE'
      AND b.name = c.tablespace_name
      AND a.ts# = b.ts#
  ) d,
 v$parameter e,
 v$parameter f,
 (
 SELECT MAX(undoblks/((end_time-begin_time)*3600*24)) undo_block_per_sec FROM v$undostat
 ) g
 WHERE e.name = 'undo_retention'
 AND f.name = 'db_block_size';

 
ACTUAL UNDO SIZE [Mb] UNDO RETENTION [Sec]     NEEDED UNDO SIZE [MB]
--------------------- ------------------------ ---------------------
                32768                   3600                 32562


Actual size of UNDO tablespace
SELECT maxbytes/1024/1024 AS max_mb  
  FROM DBA_DATA_FILES 
 WHERE tablespace_name LIKE '%UNDO%';

max_mb
------
32767

Check UNDO sergments status
SELECT status,
       ROUND (sum_bytes / (1024*1024), 0) as MB,
       
ROUND ((sum_bytes / undo_size) * 100, 0) as PERC
  FROM
(
  
SELECT status, sum(bytes) sum_bytes
    
FROM DBA_UNDO_EXTENTS
  group by status
),
(
  
SELECT sum(a.bytes) undo_size
    FROM DBA_TABLESPACES c
    join v$tablespace b on b.name = c.tablespace_name
    join v$datafile a on a.ts# = b.ts#
   WHERE c.contents = 'UNDO'
     AND c.status = 'ONLINE'
);

STATUS            MB       PERC
--------- ---------- ----------
ACTIVE            63          0
EXPIRED         6837         21
UNEXPIRED      20246         62

ACTIVE extents
- the system is using 
UNEXPIRED extents - are used for read consistency 
EXPIRED extents - can be reused.

SELECT status,
       ROUND(sum_bytes / (1024*1024), 0) as MB,
       ROUND((sum_bytes / undo_size) * 100, 0) as PERC,
       DECODE(status, 
       'UNEXPIRED', ROUND((sum_bytes / undo_size * factor) * 100, 0),
       'EXPIRED',   0,
       ROUND((sum_bytes / undo_size) * 100, 0)) FULL
FROM
(
 
SELECT status, sum(bytes) sum_bytes
   FROM DBA_UNDO_EXTENTS
 GROUP BY status
),
(
 select sum(a.bytes) undo_size
 from dba_tablespaces c
 join v$tablespace b on b.name = c.tablespace_name
 join v$datafile a on a.ts# = b.ts#
 where c.contents = 'UNDO'
 and c.status = 'ONLINE'
),
(
 select tuned_undoretention, u.value, u.value/tuned_undoretention factor
 from v$undostat us
 join (select max(end_time) end_time from v$undostat) usm
    on usm.end_time = us.end_time
 join (select name, value from v$parameter) u
    on u.name = 'undo_retention'
);

STATUS            MB       PERC       FULL
--------- ---------- ---------- ----------
UNEXPIRED      19997         61         61
EXPIRED         7214         22          0
ACTIVE           102          0          0


SELECT tuned_undoretention, u.value, 
       u.value/tuned_undoretention factor
FROM V$UNDOSTAT US
JOIN (select max(end_time) end_time FROM V$UNDOSTAT) usm
on usm.end_time = us.end_time
JOIN (select name, value FROM V$PARAMETER) u
on u.name = 'undo_retention';

tuned_undoretention      value     factor
------------------- ---------- ---------- 
               3600       3600          1

--Longest running query
SELECT max(maxquerylen)logest_query_sec 
  FROM v$undostat;

logest_query_sec
----------------
            1939

   
--Estimate UNDO based on logest_query_sec - estimate to avoid ORA-01555
SELECT
  ROUND(MAX(undoblks/600)*8192*max(maxquerylen)/(1024*1024)) 
  AS "UNDO in MB"
FROM V$UNDOSTAT;

UNDO in MB
----------
     17538
 

Find SQL responsible for buffer busy waits
 
SELECT SESS.sql_hash_value, 
       SW.p1 file#, 
       SW.p2 block#, 
       SW.p3 reason
  FROM V$SESSION_WAIT SW,
       V$SESSION SESS
 WHERE 
SW.event = 'buffer busy waits'
   AND 
SW.sid = SESS.sid;


===============================
By example
===============================

Top 10 Foreground Events by Total Wait Time
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                           Total Wait       Avg   % DB Wait
Event                                Waits Time (sec)      Wait   time Class
------------------------------ ----------- ---------- --------- ------ --------
DB CPU                                         3764.6             63.3
db file sequential read          1,020,186     1066.9    1.05ms   17.9 User I/O
log file sync                       42,985      826.1   19.22ms   13.9 Commit
log buffer space                       383      155.6  406.23ms    2.6 Configur
buffer busy waits                  200,170      113.6  567.55us    1.9 Concurre
enq: TX - index contention          15,792       52.3    3.31ms     .9 Concurre
SQL*Net more data from client    3,418,373       48.5   14.18us     .8 Network
direct path read                    13,588       40.7    3.00ms     .7 User I/O
SQL*Net message from dblink          2,580       34.9   13.53ms     .6 Network
cursor: mutex S                     31,054         34    1.10ms     .6 Concurre



=============
log buffer space
=============
The log buffer space wait event occurs when server processes write data into the log buffer faster than the LGWR process can write it out
As the LGWR process writes entries to disk, user processes can reuse the space in the log buffer for new entries. 
If the Log Buffer is too small, user processes wait for Log Buffer Space until the LGWR flushes the redo information in memory to disk.
The Log Buffer Space wait event could be an indication of slow disks and/or slow log file switches.

SELECT name, value FROM V$PARAMETER 
 WHERE name LIKE 'log_buffer%' 
 ORDER BY by name;

name        value
----------- --------------
log_buffer  111370240

=============
Log File Sync
=============
"log file sync" event is the time it takes for the log writer (LGWR) to write to the redo log file. 
The Oracle "log file sync" wait event is triggered when a user session issues a commit (or a rollback). 
The user session will signal or post the LGWR to write the log buffer to the redo log file. 
When the LGWR has finished writing, it will post the user session. 
The wait is entirely dependent on LGWR to write out the necessary redo blocks and send confirmation of its completion back to the user session. 
The wait time is sometimes called "commit latency".

The P1 parameter in V$SESSION_WAIT is defined as follows for the log file sync wait event:

P1 = buffer#
All changes up to this buffer number (in the log buffer) must be flushed to disk and the writes confirmed.
The wait is for LGWR to flush up to this buffer#.

=================
Reducing Log File Sync
=================
What to check?

Average time for this event should be examined. 
If the average wait time is low, but the number of waits is high, then the application might be committing after every row, rather than batching COMMITs. 

Reducing the overall number of commits by batching transactions can be very beneficial.

If the SQL statement is a SELECT statement, review the Oracle Auditing settings. 
If Auditing is enabled for SELECT statements, Oracle could be spending time writing and commit data to the AUDIT$ table.

SELECT * FROM V$SESSION_WAIT 
 WHERE event LIKE '%sync%';

If the average wait time is high, then examine the other log related waits for the session, to see where the session is spending most of its time. 

If a session continues to wait on the same buffer# then the SEQ# column of V$SESSION_WAIT should increment every second. 

If not then the local session has a problem with wait event timeouts. 

If the SEQ# column is incrementing then the blocking process is the LGWR process. 
Check to see what LGWR is waiting on as it may be stuck.

Try to reduce resource contention. 
Check the number of transactions (commits + rollbacks) each second, from V$SYSSTAT.
SELECT * FROM V$SYSSTAT WHERE name LIKE '%transac%' AND value > 0

Increase the size for  log files, up to several Gb, say 8Gb, can reduce the time and occurrence of wait events and improve performance.
Oracle recommends 3 to 4 switches per hour. 
Higher frequency can increase log file sync waits.

Is the application committing to often?
If the application commits to often it can cause high waits on log file sync since each commit flushes redo data from the redo buffer to the redo logs. 
Oracle has a recommendation that user calls per (commits+rollbacks) should not be lower than 30, if it is the application is committing to frequent. 

For example:

Key Instance Activity Stats               DB/Inst: IGT/igt  Snaps: 31891-31892
-> Ordered by statistic name
Statistic                                     Total     per Second     per Trans
-------------------------------- ------------------ -------------- -------------
db block changes                         73,448,532       20,359.6         977.8
execute count                             9,482,394        2,628.5         126.2
logons cumulative                             2,521            0.7           0.0
opened cursors cumulative                   320,466           88.8           4.3
parse count (total)                       2,151,065          596.3          28.6
parse time elapsed                           13,731            3.8           0.2
physical reads                            3,944,428        1,093.4          52.5
physical writes                           8,680,011        2,406.1         115.6
redo size                            39,938,539,788   11,070,768.7     531,663.2
session cursor cache hits                   338,288           93.8           4.5
session logical reads                   165,008,626       45,739.6       2,196.6
user calls                               15,048,551        4,171.4         200.3
user commits                                 75,116           20.8           1.0
user rollbacks                                    4            0.0           0.0
workarea executions - onepass                     0            0.0           0.0
workarea executions - optimal               390,480          108.2           5.2

15,048,551/(75,116+4)=200
avg user calls per commit.

No comments:

Post a Comment