Pages

Monday, January 18, 2021

Memory Usage Monitoring script. For SWAP using processes, Golden Gate, Top 20 oracle processes.

crontab

 0,15,30,45 * * * * /software/oracle/oracle/scripts/memory_usage/gg_mem_usage_wrapper.sh

gg_mem_usage_wrapper.sh
#!/bin/bash
WORK_DIR=/software/oracle/oracle/scripts/memory_usage
LOG_FILE=gg_memory.log
RUN_DATE=`date +"%Y%m%d"_"%H%M%S"`

cd ${WORK_DIR}
touch ${LOG_FILE}
echo " " >> ${LOG_FILE}
echo "=================" >> ${LOG_FILE}
echo ${RUN_DATE} >> ${LOG_FILE}
./gg_mem_usage.sh >> ${LOG_FILE}
echo " " >> ${LOG_FILE}
free -m >> ${LOG_FILE}
echo " " >> ${LOG_FILE}

echo "Top 20 Processes" >> ${LOG_FILE}
ps -eo rss,pid,euser,lstart,args:100 --sort %mem | grep -v grep | sort -n | tail -20 | awk '{printf $1/1024 "MB"; $1=""; print }' >> ${LOG_FILE}
echo " " >> ${LOG_FILE}

echo "Top SWAP Using Processes" >> ${LOG_FILE}
echo " " >> ${LOG_FILE}
echo "SWAP Usage from free -m" >> ${LOG_FILE}
free -m | grep Swap >> ${LOG_FILE}
echo " " >> ${LOG_FILE}
echo "SWAP Usage from /proc" >> ${LOG_FILE}
find /proc -maxdepth 2 -path "/proc/[0-9]*/status" -readable -exec awk -v FS=":" '{process[$1]=$2;sub(/^[ \t]+/,"",process[$1]);} END {if(process["VmSwap"] && process["VmSwap"] != "0 kB") printf "%10s %-30s %20s\n",process["Pid"],process["Name"],process["VmSwap"]}' '{}' \; | awk '{print $(NF-1),$0}' | sort -hr | head | cut -d " " -f2- >> ${LOG_FILE}
echo " " >> ${LOG_FILE}

exit

gg_mem_usage.sh
#!/bin/bash
###############################
# determine the OS type
###############################
OSNAME=`uname`
case "$OSNAME" in
  "SunOS")
    echo "OSNAME = $OSNAME"
    ;;
  "Linux")
    echo "OSNAME = $OSNAME"
    ;;
  "*")
    echo "This script has not been verified on $OSNAME"
    exit 1
    ;;
esac
###############################
# set the temp file
###############################
TMPFILE=/tmp/pmem.tmp
if [ -f $TMPFILE ]
then
  rm -f $TMPFILE
fi
################################
# loop over the gg process types
################################
PROCESSES="extract replicat"
for PROCESS in $PROCESSES
do
  FLAG=""
  FLAG=`ps -ef | grep $PROCESS`
  if [ -z "FLAG" ]
  then
    echo "No $PROCESS processes found"
  else
    echo
    echo "#####################################"
    echo "# Individual $PROCESS Process Usage #"
    echo "#####################################"
    case "$OSNAME" in
      "Linux")
        ps -C $PROCESS -O rss > $TMPFILE
        cat $TMPFILE | grep $PROCESS | awk '{print $2/1024, "MB", $12}' | sort -k 2
        ;;
      "SunOS")
        ps -efo vsz,uid,pid,ppid,pcpu,args | grep -v grep | grep $PROCESS > $TMPFILE
        cat $TMPFILE | grep $PROCESS | awk '{print $1/1024, "MB", $8}' | sort -k 2
        ;;
      "*")
        echo "This script has not been verified on $OSNAME"
        exit 1
        ;;
    esac
    rm -f $TMPFILE
    echo
    echo "#####################################"
    echo "#   Total $PROCESS Process Usage    #"
    echo "#####################################"
    case "$OSNAME" in
      "Linux")
        ps -C $PROCESS -O rss > $TMPFILE
        cat $TMPFILE | grep $PROCESS | awk '{count ++; sum=sum+$2; } END \
          { print "Number of processes      =",count; \
          print "AVG Memory usage/process =",sum/1024/count, "MB"; \
          print "Total memory usage       =", sum/1024,  " MB"}'
        ;;
      "SunOS")
        ps -efo vsz,uid,pid,ppid,pcpu,comm | grep -v grep | grep $PROCESS > $TMPFILE
        cat $TMPFILE | awk '{count ++; sum=sum+$1; } END \
          { print "Number of processes      =",count; \
          print "AVG Memory usage/process =",sum/1024/count, "MB"; \
          print "Total memory usage       =", sum/1024,  " MB"}'
        ;;
      "*")
        echo "This script has not been verified on $OSNAME"
        exit 1
        ;;
    esac
    rm -f $TMPFILE
  fi
done
exit

=================
Sample output
=================
20210118_181502
OSNAME = Linux

#####################################
# Individual extract Process Usage #
#####################################
38.7109 MB DPM_I_01
42.125 MB DPM_P_01
41.8672 MB DPM_S_01
70.8867 MB EXT_I_01
78.9297 MB EXT_P_01
361.016 MB EXT_S_01

#####################################
#   Total extract Process Usage    #
#####################################
Number of processes      = 6
AVG Memory usage/process = 105.589 MB
Total memory usage       = 633.535  MB

#####################################
# Individual replicat Process Usage #
#####################################
32.4219 MB REP_I_01
37.2383 MB REP_P_01
34.8789 MB REP_S_01

#####################################
#   Total replicat Process Usage    #
#####################################
Number of processes      = 3
AVG Memory usage/process = 34.8464 MB
Total memory usage       = 104.539  MB

             total       used       free     shared    buffers     cached
Mem:         64175      62568       1607       9373        384      57889
-/+ buffers/cache:       4294      59880
Swap:        10239        122      10117

Top 20 Processes
361.016MB 30432 oracle Sat Jan 16 20:46:19 2021 /software/ogg/1212/extract PARAMFILE /software/ogg/1212/dirprm/ext_s_01.prm REPORTFILE /software/ogg/1212/dirrpt/EXT_S_01.rpt PROCESSID EXT_S_01 USESUBDIRS
371.129MB 30853 oracle Sat Jan 16 20:46:24 2021 ora_ms03_igt
392.086MB 30851 oracle Sat Jan 16 20:46:24 2021 ora_ms02_igt
488.648MB 28170 oracle Sat Jan 16 20:46:05 2021 ora_smon_igt
512.262MB 28454 oracle Sat Jan 16 20:46:10 2021 oracleigt (LOCAL=NO)
607.949MB 28162 oracle Sat Jan 16 20:46:05 2021 ora_mman_igt
682.445MB 28392 oracle Sat Jan 16 20:46:10 2021 oracleigt (LOCAL=NO)
724.746MB 29669 oracle Sat Jan 16 20:46:13 2021 oracleigt (LOCAL=NO)
972.762MB 30606 oracle Sat Jan 16 20:46:19 2021 oracleigt (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
1046.21MB 29663 oracle Sat Jan 16 20:46:13 2021 oracleigt (LOCAL=NO)
1173.61MB 32712 oracle Sat Jan 16 20:46:47 2021 oracleigt (LOCAL=NO)
1208.2MB 20575 oracle Sun Jan 17 09:59:59 2021 oracleigt (LOCAL=NO)
1278.24MB 28830 oracle Sat Jan 16 20:46:11 2021 oracleigt (LOCAL=NO)
1934.22MB 30150 oracle Sat Jan 16 20:46:16 2021 oracleigt (LOCAL=NO)
2697.98MB 28386 oracle Sat Jan 16 20:46:10 2021 oracleigt (LOCAL=NO)
3556.99MB 28390 oracle Sat Jan 16 20:46:10 2021 oracleigt (LOCAL=NO)
5853.8MB 28388 oracle Sat Jan 16 20:46:10 2021 oracleigt (LOCAL=NO)
5938.82MB 28443 oracle Sat Jan 16 20:46:10 2021 oracleigt (LOCAL=NO)
6263.12MB 28298 oracle Sat Jan 16 20:46:10 2021 oracleigt (LOCAL=NO)
6995.96MB 28164 oracle Sat Jan 16 20:46:05 2021 ora_dbw0_igt

Top SWAP Using Processes

SWAP Usage from free -m
Swap:        10239        122      10117

SWAP Usage from /proc
     30649 splunkd                                    49264 kB
      8260 vxconfigd                                  24152 kB


Thursday, January 14, 2021

Golden Gate Extract Error: ERROR OGG-01091 Unable to open file

=========================
General
=========================
 Golden Gate Pump Extract Error
2021-01-14 14:29:10  ERROR   OGG-01091  Unable to open file "/software/ogg/191/dirdat/01/out/ep000000065" (error 2, No such file or directory)
Indeed, there is no such file. There is only file ep000000066.
Need to advance extract to read from existing file.
=========================
Solution
=========================
GGSCI (qanfv-2-dbs-01) 4> INFO ALL
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING
EXTRACT     ABENDED     DPM_I_01    00:00:00      00:01:38
EXTRACT     STOPPED     DPM_P_01    00:00:00      00:00:19
> ALTER EXTRACT DPM_P_01, EXTSEQNO 66
EXTRACT altered.
> START DPM_P_01
Sending START request to MANAGER ...
EXTRACT DPM_P_01 starting
> INFO DPM_P_01
EXTRACT    DPM_P_01  Last Started 2021-01-14 14:34   Status RUNNING
Checkpoint Lag       00:00:00 (updated 00:00:07 ago)
Process ID           16129
Log Read Checkpoint  File /software/ogg/191/dirdat/01/out/ep000000066
                     2020-12-17 13:55:12.649300  RBA 294731942

Datapump
DBLOGIN USERID ogg PASSWORD pass
STOP DPM_I_01
ALTER EXTRACT DPM_I_01, EXTSEQNO 9789, EXTRBA 0
ALTER EXTRACT DPM_I_01 ETROLLOVER
START DPM_I_01

Sunday, January 10, 2021

ORA-00354: corrupt redo log block header

=================
General
=================
There are large trace files with same error:

-rw-r----- 1 oracle dba 658132992 Jan  8 18:57 ./diag/rdbms/igt/igt/trace/igt_arc1_30193.trc
-rw-r----- 1 oracle dba 649527296 Jan  8 18:57 ./diag/rdbms/igt/igt/trace/igt_arc0_30185.trc
-rw-r----- 1 oracle dba 537001984 Jan 11 00:46 ./diag/rdbms/igt/igt/trace/igt_arc0_30446.trc

=================
Error in trace files
=================

============================================================
/software/oracle/diag/rdbms/igt/igt/trace/igt_arc1_30193.trc
============================================================

Trace file /software/oracle/diag/rdbms/igt/igt/trace/igt_arc1_30193.trc

krsv_proc_add: Request to add process to V$MANAGED_STANDBY [krss.c:2851]
*** 2021-01-08T17:30:31.928588+05:45
Corrupt redo block 691858 detected: BAD BLOCK HEADER
 header size = 16, block size = 512
Seq: 0x00000000 Block: 0x00000000 Time: 134217728 Beg: 0x0 Cks: 0x0
Dump of memory from 0x00007FBE4B0FE400 to 0x00007FBE4B0FE600
7FBE4B0FE400 00000000 00000000 08000000 00000000  [................]

Corrupt redo block 691858 detected: BAD BLOCK HEADER
 header size = 16, block size = 512
Seq: 0x00000000 Block: 0x00000000 Time: 134217728 Beg: 0x0 Cks: 0x0

DDE: Problem Key 'ORA 312' was flood controlled (0x1) (no incident)
ORA-00312: online log 3 thread 1: '/oracle_db/db1/db_igt/ora_redo_03_a.rdo'

2021-01-08T17:30:32.068886+05:45
Incident 128315 created, dump file: /software/oracle/diag/rdbms/igt/igt/incident/incdir_128315/igt_arc1_30193_i128315.trc

ORA-00353: log corruption near block 691858 change 38994289 time 01/08/2021 17:00:25
ORA-00312: online log 3 thread 1: '/oracle_db/db1/db_igt/ora_redo_03_a.rdo'

<error barrier> at 0x7fff9511adc8 placed krse.c@7386
ORA-00354: corrupt redo log block header
ORA-00353: log corruption near block 691858 change 38994289 time 01/08/2021 17:00:25
ORA-00312: online log 3 thread 1: '/oracle_db/db1/db_igt/ora_redo_03_a.rdo'
*** 2021-01-08 17:30:33.201232 [krsh.c:6348]
Error 354  while archiving
DDE: Problem Key 'ORA 312' was flood controlled (0x1) (no incident)
ORA-00312: online log 3 thread 1: '/oracle_db/db1/db_igt/ora_redo_03_a.rdo'
<error barrier> at 0x7fff9511f820 placed krse.c@1876
ORA-16038: log 3 sequence# 4695 cannot be archived
ORA-00354: corrupt redo log block header
ORA-00312: online log 3 thread 1: '/oracle_db/db1/db_igt/ora_redo_03_a.rdo'

*** 2021-01-08T17:30:37.150427+05:45

Corrupt redo block 691858 detected: BAD BLOCK HEADER
 header size = 16, block size = 512
Seq: 0x00000000 Block: 0x00000000 Time: 134217728 Beg: 0x0 Cks: 0x0

=================
Check Current Status
=================

SET LINESIZE 160

COL STATUS FOR A20
COL REDOLOG_FILE_NAME FOR A60
SELECT
a.GROUP#,
a.THREAD#,
a.SEQUENCE#,
a.ARCHIVED,
a.STATUS,
b.MEMBER AS REDOLOG_FILE_NAME,
(a.BYTES/1024/1024) AS SIZE_MB
FROM v$log a
JOIN v$logfile b ON a.Group#=b.Group#
ORDER BY a.GROUP#;

    GROUP#    THREAD#  SEQUENCE# ARCHIVED     STATUS               REDOLOG_FILE_NAME                           SIZE_MB

---------- ---------- ---------- ------------ -------------------- ---------------------------------------- ----------
         1          1       4782 YES          ACTIVE               /oracle_db/db1/db_igt/ora_redo_01_a.rdo        1024

         2          1       4783 NO           CURRENT              /oracle_db/db1/db_igt/ora_redo_02_a.rdo        1024

         3          1       4695 NO           INACTIVE             /oracle_db/db1/db_igt/ora_redo_03_a.rdo        1024


All errors are from redo 03 group

oracle@my_server:/software/oracle/diag/rdbms/igt/igt/trace>% grep ORA-00312 *.trc | sort -u

igt_arc0_30446.trc:ORA-00312: online log 3 thread 1: '/oracle_db/db1/db_igt/ora_redo_03_a.rdo'

igt_arc1_30466.trc:ORA-00312: online log 3 thread 1: '/oracle_db/db1/db_igt/ora_redo_03_a.rdo'

igt_arc2_30476.trc:ORA-00312: online log 3 thread 1: '/oracle_db/db1/db_igt/ora_redo_03_a.rdo'

igt_arc3_30487.trc:ORA-00312: online log 3 thread 1: '/oracle_db/db1/db_igt/ora_redo_03_a.rdo'


================
Solution
================
SQL>ALTER DATABASE CLEAR UNARCHIVED LOGFILE GROUP 3;

Database altered.

SQL>SET LINESIZE 160

COL STATUS FOR A20
COL REDOLOG_FILE_NAME FOR A60
SELECT
a.GROUP#,
a.THREAD#,
a.SEQUENCE#,
a.ARCHIVED,
a.STATUS,
b.MEMBER AS REDOLOG_FILE_NAME,
(a.BYTES/1024/1024) AS SIZE_MB
FROM V$LOG a
JOIN V$LOGFILE b ON a.Group#=b.Group#
ORDER BY a.GROUP#;

    GROUP#    THREAD#  SEQUENCE# ARCHIVED     STATUS               REDOLOG_FILE_NAME                                               SIZE_MB
---------- ---------- ---------- ------------ -------------------- ------------------------------------------------------------ ----------
         1          1       4782 YES          INACTIVE             /oracle_db/db1/db_igt/ora_redo_01_a.rdo                            1024

         2          1       4783 NO           CURRENT              /oracle_db/db1/db_igt/ora_redo_02_a.rdo                            1024

         3          1          0 YES          UNUSED               /oracle_db/db1/db_igt/ora_redo_03_a.rdo                            1024


Group Log status meaning:
CURRENT - Current redo log. This implies that the redo log is active. 

ACTIVE - 
Log is active but is not the current log. It is needed for crash recovery. 

INACTIVE - 
Log is no longer needed for instance recovery. 

UNUSED
- Online redo log has never been written to. 
                      This is the state of a redo log that was just added, or just after a RESETLOGS, when it is not the current redo log, or after CLEAR UNARCHIVED LOGFILE command

CLEARING
- Log is being re-created as an empty log after an ALTER DATABASE CLEAR LOGFILE statement. After the log is cleared, the status changes to UNUSED.

CLEARING_CURRENT
- Current log is being cleared of a closed thread. The log can stay in this status if there is some failure in the switch such as an I/O error writing the new log header.


In alert.log:

Clearing online log 3 of thread 1 sequence number 4695

2021-01-11T01:17:54.055476+05:45

Errors in file /software/oracle/diag/rdbms/igt/igt/trace/igt_arc3_30487.trc:

ORA-00354: corrupt redo log block header

ORA-00353: log corruption near block 691858 change 38994289 time 01/08/2021 17:00:25

ORA-00312: online log 3 thread 1: '/oracle_db/db1/db_igt/ora_redo_03_a.rdo'

2021-01-11T01:17:54.055551+05:45

ARC3 (PID:30487): Error 354  while archiving

2021-01-11T01:17:54.055628+05:45

Closing local archive destination LOG_ARCHIVE_DEST_1 '/oracle_db/db2/db_igt/arch/arch0001_4695_1040477025.arc', error=354 (igt)

ARC3 (PID:30487): Committing creation of archive log '/oracle_db/db2/db_igt/arch/arch0001_4695_1040477025.arc', error=354

ARC3 (PID:30487): LNO:3 mismatch, expected sequence 4695 found T-1.S-0

2021-01-11T01:17:54.078378+05:45

Errors in file /software/oracle/diag/rdbms/igt/igt/trace/igt_arc3_30487.trc:

ORA-16038: log 3 sequence# 4695 cannot be archived

ORA-00354: corrupt redo log block header

ORA-00312: online log 3 thread 1: '/oracle_db/db1/db_igt/ora_redo_03_a.rdo'

ARC3 (PID:30487): Archival error occurred on a closed thread, archiver continuing

2021-01-11T01:17:54.078511+05:45

ORACLE Instance igt, archival error, archiver continuing

2021-01-11T01:17:57.357756+05:45

Completed: ALTER DATABASE CLEAR UNARCHIVED LOGFILE GROUP 3

2021-01-11T01:22:42.413279+05:45

ALTER SYSTEM ARCHIVE LOG

2021-01-11T01:22:42.483945+05:45

Thread 1 advanced to log sequence 4784 (LGWR switch)

  Current log# 3 seq# 4784 mem# 0: /oracle_db/db1/db_igt/ora_redo_03_a.rdo

2021-01-11T01:22:44.306443+05:45

NET  (PID:7901): Archived Log entry 4417 added for T-1.S-4783 ID 0x4c386ae1 LAD:1

2021-01-11T01:22:48.774541+05:45

Control autobackup written to DISK device



SQL> /

    GROUP#    THREAD#  SEQUENCE# ARCHIVED     STATUS               REDOLOG_FILE_NAME                                               SIZE_MB

---------- ---------- ---------- ------------ -------------------- ------------------------------------------------------------ ----------

         1          1       4782 YES          INACTIVE             /oracle_db/db1/db_igt/ora_redo_01_a.rdo                            1024

         2          1       4783 YES          ACTIVE               /oracle_db/db1/db_igt/ora_redo_02_a.rdo                            1024

         3          1       4784 NO           CURRENT              /oracle_db/db1/db_igt/ora_redo_03_a.rdo                            1024

Now, the redo log group 3 could be archived.
Issue fixed.


======================
Force Oracle to switch log file
======================
Option A - ALTER SYSTEM  SWITCH LOGFILE
This command is asynchronous.
This command is fast to return to the invoking program because the writing of the redo log to the OS filesystem is done in the background.  

Option B - ALTER SYSTEM ARCHIVE LOG CURRENT
This command is synchronous:  
This command waits until the online redo log has completed the writing of the redo log file to the filesystem.  
This command is safer because it waits for the OS to acknowledge (ACK) that the redo log has been successfully written.