Pages

Tuesday, February 6, 2018

Basic bash script example: input parameter validation, getting run_date, replacing a string in a file, and backing up the original file.

===========================
General
===========================
This script searches a file for a string and removes every line that contains it.

===========================
Code
===========================

my_user@my_server:~/scripts>% less remove_table_from_rep.sh 
#!/bin/bash
echo
echo $0 Starting...........................
echo


#----------------------------
#Validate Input Parameters
#----------------------------
PARAM_NUMBER=$#
echo Input Parameters Number: $PARAM_NUMBER
if [[ $PARAM_NUMBER -ne 3 ]]; then
  echo "Usage: $0 FOLDER_NAME FILE_NAME TABLE_NAME"
  exit 1
fi

FOLDER_NAME=$1
FILE_NAME=$2
TABLE_NAME=$3

BACKUP_FOLDER=/some/path/scripts/backup_ogg_param
RUN_DATE=`date +"%Y%m%d_%H%M%S"`

#----------------------------
#Remove String and Backup Original File
#----------------------------
grep -v ${TABLE_NAME} ${FOLDER_NAME}/${FILE_NAME} > ${FOLDER_NAME}/${FILE_NAME}_bak
cp ${FOLDER_NAME}/${FILE_NAME} ${BACKUP_FOLDER}/${FILE_NAME}_${RUN_DATE}
mv ${FOLDER_NAME}/${FILE_NAME}_bak ${FOLDER_NAME}/${FILE_NAME}

echo 
echo File ${FOLDER_NAME}/${FILE_NAME} was copied to  ${BACKUP_FOLDER}/${FILE_NAME}_${RUN_DATE}
echo $0 Finished...........................
echo
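
A hypothetical invocation (the folder, file, and table names below are illustrative only):

my_user@my_server:~/scripts>% ./remove_table_from_rep.sh /some/path/ogg/dirprm extract1.prm MY_SCHEMA.MY_TABLE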


Monday, February 5, 2018

Killing Zombie Jobs in DBA_JOBS_RUNNING

================================
General
================================
For an unknown reason, jobs scheduled via USER_JOBS stopped running.
When checking the BROKEN flag - its value is 'N'.
When checking the application log there is no error; the job seems to be stuck in the middle of execution, and since it never finished, a new job was not submitted.

================================
What to Check
================================
DBA_JOBS
DBA_JOBS_RUNNING
ora_j0xx_igt processes on Linux

Step 1.
Checking DBA_JOBS_RUNNING - the long-running sessions in V$SESSION are not related to any entry in DBA_JOBS_RUNNING.

The job value is always NULL.
It seems that these are "Zombie" jobs.

SELECT 'kill -9 '||proc.SPID AS "Linux Kill", 
       jobs.JOB as job,
       sess.sid||'.'||sess.serial# as sid_pid,
       sess.username 
 FROM V$SESSION sess, 
      V$PROCESS proc, 
      DBA_JOBS_RUNNING jobs
WHERE proc.addr = sess.paddr
  AND sess.type <> 'BACKGROUND'
  AND sess.logon_time < SYSDATE - 1
  AND sess.username = 'MYS_CELCO_MOCOQ'
  AND sess.sid = jobs.sid(+);

Linux Kill          job sid_pid         username
------------- --------- --------------- ---------------
kill -9 22776 979.61739 MYS_CELCO_MOCOQ
kill -9 22765 790.2795 MYS_CELCO_MOCOQ
kill -9 4104 54.20611 MYS_CELCO_MOCOQ
kill -9 4096 30.27007 MYS_CELCO_MOCOQ
kill -9 4110 228.21157 MYS_CELCO_MOCOQ
kill -9 20859 810.58589 MYS_CELCO_MOCOQ
kill -9 4106 991.32265 MYS_CELCO_MOCOQ
kill -9 4118 98.38363 MYS_CELCO_MOCOQ
kill -9 4108 843.2089 MYS_CELCO_MOCOQ
kill -9 4092 106.45951 MYS_CELCO_MOCOQ
kill -9 1161 850.55871 MYS_CELCO_MOCOQ
kill -9 1169 212.38989 MYS_CELCO_MOCOQ
kill -9 30451 828.6607 MYS_CELCO_MOCOQ
kill -9 4100 178.61219 MYS_CELCO_MOCOQ
kill -9 30790 215.36845 MYS_CELCO_MOCOQ
kill -9 4090 195.21231 MYS_CELCO_MOCOQ
kill -9 30801 838.15399 MYS_CELCO_MOCOQ
kill -9 4098 959.63365 MYS_CELCO_MOCOQ
kill -9 30796 899.31015 MYS_CELCO_MOCOQ
kill -9 4114 817.4443 MYS_CELCO_MOCOQ
kill -9 4088 771.53083 MYS_CELCO_MOCOQ
kill -9 1165 148.39835 MYS_CELCO_MOCOQ
kill -9 22771 849.23091 MYS_CELCO_MOCOQ
kill -9 6087 138.29993 MYS_CELCO_MOCOQ
kill -9 30793 230.13119 MYS_CELCO_MOCOQ
kill -9 30798 199.4211 MYS_CELCO_MOCOQ
kill -9 4094 804.13365 MYS_CELCO_MOCOQ
kill -9 4112 963.23979 MYS_CELCO_MOCOQ
kill -9 4085 126.8799 MYS_CELCO_MOCOQ
kill -9 1145 977.23831 MYS_CELCO_MOCOQ


Step 2.
Checking how long ago these jobs last ran.
The check was done on 05/02/2018.

SELECT DBA_JOBS_RUNNING.sid,
       DBA_JOBS_RUNNING.job,
       DBA_JOBS_RUNNING.last_date,
       ROUND(SYSDATE-DBA_JOBS_RUNNING.last_date) as days_since_last_run
  FROM DBA_JOBS_RUNNING, DBA_JOBS
WHERE dba_jobs.job = DBA_JOBS_RUNNING.JOB
ORDER BY DBA_JOBS.SCHEMA_USER;

sid     job     last_date           days_since_last_run
------- ------- ------------------- -------------------
890 281 28/10/2017 00:00:13             101
971 283 27/10/2017 23:50:13             101            
198 284 27/10/2017 21:00:48             101
787 282 28/10/2017 00:00:38             101
186 1382
953 1261
224 41 28/10/2017 00:00:23             101
226 43 27/10/2017 23:50:06             101
100 42 28/10/2017 00:00:43             101
124 44 27/10/2017 21:01:22             101
188 1361
941 123 27/10/2017 23:00:18             101
888 121 28/10/2017 00:01:48             101
39 243 27/10/2017 00:00:04             102
135 2585 27/10/2017 23:01:12             101
944 461 28/10/2017 00:01:54             101
924 1044
806 1043
837 1042
882 1041
207 1221
803 1241
156 85 28/10/2017 00:00:05             101
909 86 28/10/2017 00:00:31             101
812 88 27/10/2017 21:03:15             101
153 64 27/10/2017 21:03:01             101
945 62 28/10/2017 00:04:48             101
...
...

SELECT COUNT(*) FROM DBA_JOBS_RUNNING;
18

Step 3.
On Linux, there are many ora_j0xx_igt processes.
oracle@my_server:~/scripts>% ps -ef | grep j0 | wc -l
36


ps -ef | grep j0
oracle   12331     1  0 12:46 ?        00:00:00 ora_j030_igt
oracle   12511     1  0 12:46 ?        00:00:00 ora_j031_igt
oracle   12519     1  0 12:46 ?        00:00:00 ora_j032_igt
oracle   12540     1  0 12:46 ?        00:00:00 ora_j033_igt
oracle   12562     1  0 12:46 ?        00:00:00 ora_j034_igt
oracle   12590     1  0 12:46 ?        00:00:00 ora_j039_igt
oracle   12672     1  0 12:46 ?        00:00:00 ora_j040_igt
oracle   12676     1  0 12:46 ?        00:00:00 ora_j042_igt
oracle   16114     1  0 12:49 ?        00:00:00 ora_j005_igt
oracle   18766     1  0 12:25 ?        00:00:00 ora_j013_igt
oracle   19570     1  0 12:25 ?        00:00:00 ora_j011_igt
oracle   19572     1  0 12:25 ?        00:00:00 ora_j014_igt
oracle   20697     1  0 12:26 ?        00:00:00 ora_j016_igt
oracle   20700     1  0 12:26 ?        00:00:00 ora_j022_igt
oracle   20702     1  0 12:26 ?        00:00:00 ora_j023_igt
oracle   20705     1  0 12:26 ?        00:00:00 ora_j026_igt
oracle   23072     1  0 12:55 ?        00:00:00 ora_j007_igt
oracle   27217     1  0 12:59 ?        00:00:00 ora_j003_igt
oracle   27938     1  0  2017 ?        00:00:00 ora_j010_igt
oracle   29689     1  0 12:07 ?        00:00:00 ora_j024_igt
oracle   29691     1  0 12:07 ?        00:00:00 ora_j025_igt
oracle   29829     1  0  2017 ?        00:03:10 ora_j015_igt
oracle   32684     1  0 13:03 ?        00:00:00 ora_j021_igt
oracle   32699     1  0 13:03 ?        00:00:00 ora_j017_igt


ps -ef | grep j0 | awk '{print "kill -9 "$2}'
Execute the generated commands:

kill -9 5285
kill -9 5291
kill -9 5469
kill -9 9288
kill -9 9290
kill -9 9294
kill -9 12590
kill -9 14892
kill -9 16114
kill -9 18766
etc...
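
If it is acceptable to kill all of them in one pass, a minimal sketch (assuming the job queue processes are named ora_j0xx_igt; the grep -v grep keeps the grep process itself out of the list):

ps -ef | grep 'ora_j0' | grep -v grep | awk '{print $2}' | xargs -r kill -9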

After the processes are killed, re-run the check:

SELECT DBA_JOBS_RUNNING.sid,
       DBA_JOBS_RUNNING.job,
       DBA_JOBS_RUNNING.last_date,
       ROUND(SYSDATE-DBA_JOBS_RUNNING.last_date) as days_since_last_run
  FROM DBA_JOBS_RUNNING, DBA_JOBS
 WHERE dba_jobs.job = DBA_JOBS_RUNNING.JOB
ORDER BY DBA_JOBS.SCHEMA_USER;


Step 4.
Most of the jobs should have been cleared by the kill -9 commands.
This might take some time to be reflected in the database.

Step 5.
If there are still jobs remaining: remove and resubmit them via DBMS_JOB.
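
A minimal sketch of removing and resubmitting one job via DBMS_JOB (the job number, PL/SQL block, and interval below are illustrative only; connect as the schema that owns the job):

sqlplus MYS_CELCO_MOCOQ/<password>
SQL> -- remove the stuck job (job 281 is just an example)
SQL> exec DBMS_JOB.REMOVE(281);
SQL> commit;
SQL> -- resubmit it with the same PL/SQL block and interval it had before
SQL> variable jobno number;
SQL> exec DBMS_JOB.SUBMIT(:jobno, 'my_pkg.my_proc;', SYSDATE, 'SYSDATE + 1/24');
SQL> commit;
SQL> print jobno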

Sunday, February 4, 2018

Huge Pages and Oracle

=====================
What are HugePages
=====================
HugePages is a feature integrated into the Linux kernel since version 2.6.
By default, the Linux memory page size is 4 KB.
With HugePages, the memory page size may be between 2 MB and 1024 MB, depending on the kernel version and hardware.

=====================
HugePages and Oracle
=====================
For large SGA sizes, HugePages can give substantial benefits in virtual memory management.

Without HugePages, the memory of the SGA is divided into 4 KB pages, which have to be managed by the Linux kernel.
With HugePages, the page size is increased, thereby reducing the total number of pages to be managed by the kernel, and therefore reducing the amount of memory required to hold the page table in memory.
In addition, the memory associated with HugePages cannot be swapped out, which forces the SGA to stay memory resident.
This makes HugePages pretty much standard for Oracle 11g systems running on Linux x86-64.
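
A rough back-of-the-envelope illustration (the 10 GB SGA size below is assumed purely for the example):

# Pages the kernel must manage for a 10 GB SGA:
#   4 KB pages: 10 * 1024 * 1024 / 4 = 2,621,440 pages
#   2 MB pages: 10 * 1024 / 2       = 5,120 pages
echo $(( 10 * 1024 * 1024 / 4 ))   # 4 KB pages
echo $(( 10 * 1024 / 2 ))          # 2 MB HugePages (a starting point for vm.nr_hugepages)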

=====================
TLB and HugePages
=====================
TLB - Translation Lookaside Buffer.
The TLB is a memory cache that is used to reduce the time taken to access a user memory location.
It is part of the chip's memory-management unit (MMU).
When using HugePages, the kernel has fewer pages whose lifecycle must be monitored.
This increases the TLB hit ratio, since operating system maintenance of page states is reduced.

Oracle Documentation: Overview of HugePages




=====================
How to see current settings
=====================
Current HugePages usage:

New HugePages metric:
AnonHugePages - the number of pages that the newer Transparent HugePages mechanism currently has in use.

Old HugePages mechanism metrics:
HugePages_Total - how many HugePages can be allocated (the maximum value).
HugePages_Free - how many pages are still available for allocation.
The number of used HugePages is HugePages_Total - HugePages_Free.


>% grep -i HugePages /proc/meminfo
AnonHugePages:     81920 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

AnonHugePages - the amount of anonymous Transparent HugePages memory currently in use. In this example: 81920/2048 = 40 pages.

Hugepagesize - the size of a single huge page.
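
A small sketch that computes the used HugePages from these counters (assuming the standard /proc/meminfo field names):

>% awk '/^HugePages_Total/ {t=$2} /^HugePages_Free/ {f=$2} /^Hugepagesize/ {s=$2} END {printf "Used HugePages: %d (%d kB)\n", t-f, (t-f)*s}' /proc/meminfo
Used HugePages: 0 (0 kB)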

Another option to see current HugePages usage:
>% less /proc/vmstat | grep trans
nr_anon_transparent_hugepages 40

Are Transparent HugePages enabled:
>% less /sys/kernel/mm/redhat_transparent_hugepage/enabled
always madvise [never]
The value in brackets is the active setting: always or madvise - used; never - not used.


Linux processes kswapd0 and khugepaged
kswapd0
The process kswapd0 is the kernel process that manages virtual memory.
When the system runs low on memory, kswapd0 moves least recently used (LRU) pages out to swap.
This can cause extreme lag for the affected processes.
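
A quick way to watch for this (a sketch; the 5-second interval is arbitrary) - the si/so columns of vmstat show swap-in/swap-out activity:
vmstat 5
top -p `pidof kswapd0`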

khugepaged
khugepaged performs the huge page collapse operation:
it converts a set of normal-sized pages into a single huge page.
It is desirable for a range of addresses to need fewer TLB entries, but the conversion process is expensive, because the system needs to find a candidate set of pages to group together and then copy all the memory from the possibly scattered normal-sized pages into a single huge page.
The khugepaged kernel thread searches for candidate pages to collapse into a single huge page.
Even if khugepaged is not successful in converting normal-sized pages into huge pages, it may still be taking processor time to search for candidate pages.
You can see whether the khugepaged kernel thread is taking a significant amount of processor time with:
top -p `pidof khugepaged`

=====================
HugePages and SWAP 
=====================
When space is allocated with HugePages enabled and a buggy application keeps allocating memory, the server might run out of memory, including out of SWAP space.

Thursday, February 1, 2018

FLASHBACK and Recovery in Oracle

Flashback Database was introduced in Oracle 10g.


Rather than having to restore the database, flashback database allows the database to be rewound very quickly to a previous point in time.

This feature is particularly useful for testing and deployments.


Flashback Database can effectively be run in two modes.
In one mode you log every change to every block, which allows you to flash back the database to any previous point in time.
In the other mode, Oracle only tracks the "before" images of changed blocks, so that you can flash back to the restore point.
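
A minimal sketch of the two modes (the restore point name is illustrative; the full setup steps follow below):

-- Mode 1: full flashback logging - flash back to any time within the retention target
ALTER DATABASE FLASHBACK ON;

-- Mode 2: guaranteed restore point only - before-images are kept just for this point
CREATE RESTORE POINT before_upgrade GUARANTEE FLASHBACK DATABASE;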




======================================
TABLE
======================================

ALTER TABLE my_table ENABLE ROW MOVEMENT;

FLASHBACK TABLE my_table 
TO TIMESTAMP (SYSTIMESTAMP - INTERVAL '1' minute);

FLASHBACK TABLE my_table 
TO TIMESTAMP TO_TIMESTAMP('20051201 09:30:00', 'YYYYMMDD hh24:mi:ss');

SELECT CURRENT_SCN FROM V$DATABASE;
FLASHBACK TABLE my_table 
TO SCN 123456;


======================================
DATABASE
======================================

How to Set-up Flashback Database
1. Ensure db_recovery_file_dest is set.

sqlplus '/ as sysdba'
SQL> alter system set db_recovery_file_dest='+<FRA Diskgroup>' SCOPE=spfile;

2. Ensure db_recovery_file_dest_size is set

SQL> alter system set db_recovery_file_dest_size=100G SCOPE=spfile;

3. Stop and start the database

sqlplus '/ as sysdba'
SQL> shutdown immediate;
SQL> startup mount;

If flashback to any previous point in time is required, then turn flashback on using the following command
SQL> alter database flashback on;
SQL> alter database open;
SQL> alter system set db_flashback_retention_target=2880;

NOTES
Set the db_recovery_file_dest to an appropriate location for the flashback recovery files.
Set the db_recovery_file_dest_size to an appropriate size for the amount and size of the testing required.
Set the db_flashback_retention_target to an appropriate time, in minutes, to retain flashback capability.

Only run alter database flashback on; if there is a requirement to flashback to ANY previous point in time.

Determine if Flashback Database is Already Enabled
1. Run the following commands to determine whether Flashback is turned on.

sqlplus '/ as sysdba'
SQL> select flashback_on from v$database;

Creating and Using Flashback Restore points.
This worked example assumes the database is using ASM to manage its storage.

Creating a Restore point
Create a restore point whenever the database is in a state that it may need to be flashed back to.
Use the optional GUARANTEE FLASHBACK DATABASE clause to ensure that the restore point is not aged out of the fast recovery area (FRA) as dictated by the db_flashback_retention_target parameter.

1. You may want to create the restore point in mount mode. If so, put the database into mount mode now.

2. Create a restore point

sqlplus '/ as sysdba'
SQL> create restore point <restore point name> [GUARANTEE FLASHBACK DATABASE];

Rolling Back to a Restore Point
1. Identify the Restore point

sqlplus '/ as sysdba'
SQL> select name, time, guarantee_flashback_database from v$restore_point;

2. For a non-RAC environment, use the following commands to flash back to a restore point.

sqlplus '/ as sysdba'
SQL> shutdown immediate;
SQL> startup mount;
SQL> flashback database to restore point <restore point name>;
SQL> alter database open resetlogs;

3. For RAC instances use the following commands.

On one of the nodes, run: srvctl stop database -d <database name> -o immediate
sqlplus '/ as sysdba'
SQL> startup mount;
SQL> flashback database to restore point <restore point name>;
SQL> alter database open resetlogs;
SQL> shutdown immediate;
SQL> quit
srvctl start database -d <database name>
Run crs_stat -t to confirm that the database is back up okay.

NOTES
Any tables created and updated without the LOGGING option will be susceptible to block corruption errors when the database is flashed back. These can be remedied by issuing the TRUNCATE TABLE command against the affected object(s).

Dropping a Restore Point
1. Restore points can be dropped with the database open using the following commands

sqlplus '/ as sysdba'
SQL> drop restore point <restore point name>;


Monitoring Flashback Logging
After enabling flashback logging, Oracle keeps track of the amount of logging generated. This can be queried from v$flashback_database_log; the estimate gets better with age.
Note that this is the size of the flashback logs only and does not include space used by archive logs and RMAN backups.

1. Monitor flashback logs

sqlplus '/ as sysdba'
SQL> select estimated_flashback_size/1024/1024/1024 "EST_FLASHBACK_SIZE(GB)" from v$flashback_database_log;

Finding the Earliest Flashback Point
Querying V$flashback_database_log will show you the earliest point you can flashback your database to based on the size of the FRA and the currently available flashback logs.

1. Find the earliest flashback point

sqlplus '/ as sysdba'
SQL> alter session set nls_date_format='dd/mm/yy hh24:mi:ss';
SQL> select oldest_flashback_scn,oldest_flashback_time from v$flashback_database_log;

Disabling Flashback Database
Flashback to any previous point in time can be disabled with the database open. Any unused flashback logs will be automatically removed at this point, and a message detailing the file deletion is written to the alert log.

1. Disabling flashback

sqlplus '/ as sysdba'
SQL> ALTER DATABASE FLASHBACK OFF;

Troubleshooting
The following common errors can occur.

Message
ORA-38754: FLASHBACK DATABASE not started; required redo log is not available
Cause
Oracle needs to have the required archive logs in the archive destination at the time of flashback.

Solution
Use RMAN, so the missing archive logs can be restored from backup during the flashback. The required sequence and thread numbers are in the error message.

      rman target /
      run{
         allocate channel t1 device type 'sbt_tape';
         FLASHBACK DATABASE TO RESTORE POINT <restore point name>;
         }
Now resume the flashback process from the RESETLOGS step.