Oracle DBA by Example. Code samples, Scripts, Reference.: buffer busy waits vs free buffer waits

buffer busy waits vs free buffer waits

buffer busy waits - waites to application process to commit/rollback on block
free buffer waits - waites to DBWR process to wtite dirty blocks

=================
Buffer busy waits (%)

=================
This wait happens when a session wants to access a database block in the buffer cache but it cannot because the buffer is busy.

Another session is modifying the block and the contents of the block are in flux during the modification.
The session modifying the block marks the block header with a flag letting other users know a change is taking place and to wait until the complete change is applied.

While the block is being changed, the block is marked as unreadable by others.

The problem is when there is a hot block, such as the first block on the free list of a table, with high concurrent inserts.

The two main cases where this wait can occur are:
Another session is reading the block into the buffer.
Another session holds the buffer in an incompatible mode to our request.

Solution:
1. Tune the UNDO tablespace
2. Check I/O

===================
Estimating for UNDO size

===================
1. SQL 1
  V$UNDOSTAT
SELECT 'Required Undo Tablespace Size Using Statistics: '
|| DBMS_UNDO_ADV.required_undo_size(3600)||'Mb' required_undo_size FROM DUAL;
Required Undo Tablespace Size Using Statistics: 36936 MB

2. SQL 2
SELECT ROUND(d.undo_size/(1024*1024)) "ACTUAL UNDO SIZE [MByte]",
SUBSTR(e.value,1,25) "UNDO RETENTION [Sec]",
ROUND((TO_NUMBER(e.value) * TO_NUMBER(f.value) * g.undo_block_per_sec) / (1024*1024)) "NEEDED UNDO SIZE [MByte]"
FROM (
SELECT SUM(a.bytes) undo_size
FROM v$datafile a,
v$tablespace b,
dba_tablespaces c
WHERE c.contents = 'UNDO'
AND c.status = 'ONLINE'
AND b.name = c.tablespace_name
AND a.ts# = b.ts#
) d,
v$parameter e,
v$parameter f,
(
SELECT MAX(undoblks/((end_time-begin_time)*3600*24)) undo_block_per_sec FROM v$undostat
) g
WHERE e.name = 'undo_retention'
AND f.name = 'db_block_size';

ACTUAL UNDO SIZE [Mb] UNDO RETENTION [Sec] NEEDED UNDO SIZE [MB]
--------------------- ------------------------ ---------------------
32768 3600   32562

Actual size of UNDO tablespace
SELECT maxbytes/1024/1024 AS max_mb

FROM DBA_DATA_FILES

WHERE tablespace_name LIKE '%UNDO%';

max_mb
------
32767

Check UNDO sergments status
SELECT status,
ROUND (sum_bytes / (1024*1024), 0) as MB,
ROUND ((sum_bytes / undo_size) * 100, 0) as PERC
FROM
(
  SELECT status, sum(bytes) sum_bytes
  FROM DBA_UNDO_EXTENTS
group by status
),
(
  SELECT sum(a.bytes) undo_size
FROM DBA_TABLESPACES c
join v$tablespace b on b.name = c.tablespace_name
join v$datafile a on a.ts# = b.ts#
WHERE c.contents = 'UNDO'
AND c.status = 'ONLINE'
);

STATUS MB PERC
--------- ---------- ----------
ACTIVE 63 0
EXPIRED 6837 21
UNEXPIRED 20246 62

ACTIVE extents - the system is using
UNEXPIRED extents - are used for read consistency
EXPIRED extents - can be reused.

SELECT status,
ROUND(sum_bytes / (1024*1024), 0) as MB,
ROUND((sum_bytes / undo_size) * 100, 0) as PERC,
DECODE(status,
'UNEXPIRED', ROUND((sum_bytes / undo_size * factor) * 100, 0),
'EXPIRED', 0,
ROUND((sum_bytes / undo_size) * 100, 0)) FULL
FROM
(
SELECT status, sum(bytes) sum_bytes
FROM DBA_UNDO_EXTENTS
GROUP BY status
),
(
select sum(a.bytes) undo_size
from dba_tablespaces c
join v$tablespace b on b.name = c.tablespace_name
join v$datafile a on a.ts# = b.ts#
where c.contents = 'UNDO'
and c.status = 'ONLINE'
),
(
select tuned_undoretention, u.value, u.value/tuned_undoretention factor
from v$undostat us
join (select max(end_time) end_time from v$undostat) usm
on usm.end_time = us.end_time
join (select name, value from v$parameter) u
on u.name = 'undo_retention'
);

STATUS MB PERC FULL
--------- ---------- ---------- ----------
UNEXPIRED 19997 61 61
EXPIRED 7214 22 0
ACTIVE 102 0 0

SELECT tuned_undoretention, u.value,

u.value/tuned_undoretention factor
FROM V$UNDOSTAT US
JOIN (select max(end_time) end_time FROM V$UNDOSTAT) usm
on usm.end_time = us.end_time
JOIN (select name, value FROM V$PARAMETER) u
on u.name = 'undo_retention';

tuned_undoretention value factor
------------------- ---------- ----------
3600 3600 1

--Longest running query
SELECT max(maxquerylen)logest_query_sec

FROM v$undostat;

logest_query_sec
----------------
1939

--Estimate UNDO based on logest_query_sec - estimate to avoid ORA-01555
SELECT
ROUND(MAX(undoblks/600)*8192*max(maxquerylen)/(1024*1024))

AS "UNDO in MB"
FROM V$UNDOSTAT;

UNDO in MB
----------
17538

Find SQL responsible for buffer busy waits

SELECT SESS.sql_hash_value,

SW.p1 file#,

SW.p2 block#,

SW.p3 reason
FROM V$SESSION_WAIT SW,

V$SESSION SESS
WHERE SW.event = 'buffer busy waits'
AND SW.sid = SESS.sid;

===============================
By example
===============================

Top 10 Foreground Events by Total Wait Time
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Total Wait Avg % DB Wait
Event Waits Time (sec) Wait time Class
------------------------------ ----------- ---------- --------- ------ --------
DB CPU 3764.6 63.3
db file sequential read 1,020,186 1066.9 1.05ms 17.9 User I/O
log file sync 42,985 826.1 19.22ms 13.9 Commit
log buffer space 383 155.6 406.23ms 2.6 Configur
buffer busy waits 200,170 113.6 567.55us 1.9 Concurre
enq: TX - index contention 15,792 52.3 3.31ms .9 Concurre
SQL*Net more data from client 3,418,373 48.5 14.18us .8 Network
direct path read 13,588 40.7 3.00ms .7 User I/O
SQL*Net message from dblink 2,580 34.9 13.53ms .6 Network
cursor: mutex S 31,054 34 1.10ms .6 Concurre

=============
log buffer space

=============
The log buffer space wait event occurs when server processes write data into the log buffer faster than the LGWR process can write it out
As the LGWR process writes entries to disk, user processes can reuse the space in the log buffer for new entries.
If the Log Buffer is too small, user processes wait for Log Buffer Space until the LGWR flushes the redo information in memory to disk.
The Log Buffer Space wait event could be an indication of slow disks and/or slow log file switches.

SELECT name, value FROM V$PARAMETER

WHERE name LIKE 'log_buffer%'

ORDER BY by name;

name value
----------- --------------
log_buffer 111370240

=============

Log File Sync
=============

"log file sync" event is the time it takes for the log writer (LGWR) to write to the redo log file.

The Oracle "log file sync" wait event is triggered when a user session issues a commit (or a rollback).
The user session will signal or post the LGWR to write the log buffer to the redo log file.
When the LGWR has finished writing, it will post the user session.
The wait is entirely dependent on LGWR to write out the necessary redo blocks and send confirmation of its completion back to the user session.
The wait time is sometimes called "commit latency".

The P1 parameter in V$SESSION_WAIT is defined as follows for the log file sync wait event:
P1 = buffer#
All changes up to this buffer number (in the log buffer) must be flushed to disk and the writes confirmed.

The wait is for LGWR to flush up to this buffer#.

=================
Reducing Log File Sync

=================

What to check?

Average time for this event should be examined.
If the average wait time is low, but the number of waits is high, then the application might be committing after every row, rather than batching COMMITs.

Reducing the overall number of commits by batching transactions can be very beneficial.

If the SQL statement is a SELECT statement, review the Oracle Auditing settings.
If Auditing is enabled for SELECT statements, Oracle could be spending time writing and commit data to the AUDIT$ table.

SELECT * FROM V$SESSION_WAIT

WHERE event LIKE '%sync%';

If the average wait time is high, then examine the other log related waits for the session, to see where the session is spending most of its time.

If a session continues to wait on the same buffer# then the SEQ# column of V$SESSION_WAIT should increment every second.

If not then the local session has a problem with wait event timeouts.
If the SEQ# column is incrementing then the blocking process is the LGWR process.
Check to see what LGWR is waiting on as it may be stuck.

Try to reduce resource contention.

Check the number of transactions (commits + rollbacks) each second, from V$SYSSTAT.
SELECT * FROM V$SYSSTAT WHERE name LIKE '%transac%' AND value > 0

Increase the size for log files, up to several Gb, say 8Gb, can reduce the time and occurrence of wait events and improve performance.
Oracle recommends 3 to 4 switches per hour.

Higher frequency can increase log file sync waits.

Is the application committing to often?
If the application commits to often it can cause high waits on log file sync since each commit flushes redo data from the redo buffer to the redo logs.
Oracle has a recommendation that user calls per (commits+rollbacks) should not be lower than 30, if it is the application is committing to frequent.

For example:

Key Instance Activity Stats DB/Inst: IGT/igt Snaps: 31891-31892
-> Ordered by statistic name
Statistic Total per Second per Trans
-------------------------------- ------------------ -------------- -------------
db block changes 73,448,532 20,359.6 977.8
execute count 9,482,394 2,628.5 126.2
logons cumulative 2,521 0.7 0.0
opened cursors cumulative 320,466 88.8 4.3
parse count (total) 2,151,065 596.3 28.6
parse time elapsed 13,731 3.8 0.2
physical reads 3,944,428 1,093.4 52.5
physical writes 8,680,011 2,406.1 115.6
redo size 39,938,539,788 11,070,768.7 531,663.2
session cursor cache hits 338,288 93.8 4.5
session logical reads 165,008,626 45,739.6 2,196.6
user calls 15,048,551 4,171.4 200.3
user commits 75,116 20.8 1.0
user rollbacks 4 0.0 0.0
workarea executions - onepass 0 0.0 0.0
workarea executions - optimal 390,480 108.2 5.2

15,048,551/(75,116+4)=200 avg user calls per commit.

Oracle DBA by Example. Code samples, Scripts, Reference.

Pages

Sunday, June 4, 2023

buffer busy waits vs free buffer waits

No comments:

Post a Comment