Pages

Tuesday, February 6, 2018

Basic bash script example: input parameter validation, getting run_date, replacing a string in a file, and backing up the original file.

===========================
General
===========================
This script searches a file for a string and removes every line that contains it.

===========================
Code
===========================

my_user@my_server:~/scripts>% less remove_table_from_rep.sh 
#!/bin/bash
echo
echo $0 Starting...........................
echo


#----------------------------
#Validate Input Parameters
#----------------------------
PARAM_NUMBER=$#
echo Input Parameters Number: $PARAM_NUMBER
if [[ $PARAM_NUMBER -ne 3 ]]; then
  echo "Usage: $0 FOLDER_NAME FILE_NAME TABLE_NAME"
  exit 1
fi

FOLDER_NAME=$1
FILE_NAME=$2
TABLE_NAME=$3

BACKUP_FOLDER=/some/path/scripts/backup_ogg_param
RUN_DATE=`date +"%Y%m%d_%H%M%S"`

#----------------------------
#Remove String and Backup Original File
#----------------------------
grep -v ${TABLE_NAME} ${FOLDER_NAME}/${FILE_NAME} > ${FOLDER_NAME}/${FILE_NAME}_bak
cp ${FOLDER_NAME}/${FILE_NAME} ${BACKUP_FOLDER}/${FILE_NAME}_${RUN_DATE}
mv ${FOLDER_NAME}/${FILE_NAME}_bak ${FOLDER_NAME}/${FILE_NAME}

echo 
echo File ${FOLDER_NAME}/${FILE_NAME} was copied to  ${BACKUP_FOLDER}/${FILE_NAME}_${RUN_DATE}
echo $0 Finished...........................
echo
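
A hypothetical invocation (the folder, file, and table names below are illustrative only):

my_user@my_server:~/scripts>% ./remove_table_from_rep.sh /some/path/ogg/dirprm extract1.prm MY_SCHEMA.MY_TABLE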


Monday, February 5, 2018

Killing Zombie Jobs in DBA_JOBS_RUNNING

================================
General
================================
For an unknown reason, jobs scheduled via USER_JOBS stopped running.
When checking the BROKEN flag - its value is 'N'.
When checking the application log there is no error; the job seems to be stuck in the middle of execution, and since it never finished, a new job was not submitted.

================================
What to Check
================================
DBA_JOBS
DBA_JOBS_RUNNING
ora_j0xx_igt processes on Linux

Step 1.
Checking DBA_JOBS_RUNNING - the long-running sessions in V$SESSION are not related to any entry in DBA_JOBS_RUNNING.

The job value is always NULL.
It seems that these are "Zombie" jobs.

SELECT 'kill -9 '||proc.SPID AS "Linux Kill", 
       jobs.JOB as job,
       sess.sid||'.'||sess.serial# as sid_pid,
       sess.username 
 FROM V$SESSION sess, 
      V$PROCESS proc, 
      DBA_JOBS_RUNNING jobs
WHERE proc.addr = sess.paddr
  AND sess.type <> 'BACKGROUND'
  AND sess.logon_time < SYSDATE - 1
  AND sess.username = 'MYS_CELCO_MOCOQ'
  AND sess.sid = jobs.sid(+);

Linux Kill          job sid_pid         username
------------- --------- --------------- ---------------
kill -9 22776 979.61739 MYS_CELCO_MOCOQ
kill -9 22765 790.2795 MYS_CELCO_MOCOQ
kill -9 4104 54.20611 MYS_CELCO_MOCOQ
kill -9 4096 30.27007 MYS_CELCO_MOCOQ
kill -9 4110 228.21157 MYS_CELCO_MOCOQ
kill -9 20859 810.58589 MYS_CELCO_MOCOQ
kill -9 4106 991.32265 MYS_CELCO_MOCOQ
kill -9 4118 98.38363 MYS_CELCO_MOCOQ
kill -9 4108 843.2089 MYS_CELCO_MOCOQ
kill -9 4092 106.45951 MYS_CELCO_MOCOQ
kill -9 1161 850.55871 MYS_CELCO_MOCOQ
kill -9 1169 212.38989 MYS_CELCO_MOCOQ
kill -9 30451 828.6607 MYS_CELCO_MOCOQ
kill -9 4100 178.61219 MYS_CELCO_MOCOQ
kill -9 30790 215.36845 MYS_CELCO_MOCOQ
kill -9 4090 195.21231 MYS_CELCO_MOCOQ
kill -9 30801 838.15399 MYS_CELCO_MOCOQ
kill -9 4098 959.63365 MYS_CELCO_MOCOQ
kill -9 30796 899.31015 MYS_CELCO_MOCOQ
kill -9 4114 817.4443 MYS_CELCO_MOCOQ
kill -9 4088 771.53083 MYS_CELCO_MOCOQ
kill -9 1165 148.39835 MYS_CELCO_MOCOQ
kill -9 22771 849.23091 MYS_CELCO_MOCOQ
kill -9 6087 138.29993 MYS_CELCO_MOCOQ
kill -9 30793 230.13119 MYS_CELCO_MOCOQ
kill -9 30798 199.4211 MYS_CELCO_MOCOQ
kill -9 4094 804.13365 MYS_CELCO_MOCOQ
kill -9 4112 963.23979 MYS_CELCO_MOCOQ
kill -9 4085 126.8799 MYS_CELCO_MOCOQ
kill -9 1145 977.23831 MYS_CELCO_MOCOQ


Step 2.
Checking how long ago these jobs last ran.
The check was done on 05/02/2018.

SELECT DBA_JOBS_RUNNING.sid,
       DBA_JOBS_RUNNING.job,
       DBA_JOBS_RUNNING.last_date,
       ROUND(SYSDATE-DBA_JOBS_RUNNING.last_date) as days_since_last_run
  FROM DBA_JOBS_RUNNING, DBA_JOBS
WHERE dba_jobs.job = DBA_JOBS_RUNNING.JOB
ORDER BY DBA_JOBS.SCHEMA_USER;

sid     job     last_date           days_since_last_run
------- ------- ------------------- -------------------
890 281 28/10/2017 00:00:13             101
971 283 27/10/2017 23:50:13             101            
198 284 27/10/2017 21:00:48             101
787 282 28/10/2017 00:00:38             101
186 1382
953 1261
224 41 28/10/2017 00:00:23             101
226 43 27/10/2017 23:50:06             101
100 42 28/10/2017 00:00:43             101
124 44 27/10/2017 21:01:22             101
188 1361
941 123 27/10/2017 23:00:18             101
888 121 28/10/2017 00:01:48             101
39 243 27/10/2017 00:00:04             102
135 2585 27/10/2017 23:01:12             101
944 461 28/10/2017 00:01:54             101
924 1044
806 1043
837 1042
882 1041
207 1221
803 1241
156 85 28/10/2017 00:00:05             101
909 86 28/10/2017 00:00:31             101
812 88 27/10/2017 21:03:15             101
153 64 27/10/2017 21:03:01             101
945 62 28/10/2017 00:04:48             101
...
...

SELECT COUNT(*) FROM DBA_JOBS_RUNNING;
18

Step 3.
On Linux, there are many ora_j0xx_igt processes.
oracle@my_server:~/scripts>% ps -ef | grep j0 | wc -l
36


ps -ef | grep j0
oracle   12331     1  0 12:46 ?        00:00:00 ora_j030_igt
oracle   12511     1  0 12:46 ?        00:00:00 ora_j031_igt
oracle   12519     1  0 12:46 ?        00:00:00 ora_j032_igt
oracle   12540     1  0 12:46 ?        00:00:00 ora_j033_igt
oracle   12562     1  0 12:46 ?        00:00:00 ora_j034_igt
oracle   12590     1  0 12:46 ?        00:00:00 ora_j039_igt
oracle   12672     1  0 12:46 ?        00:00:00 ora_j040_igt
oracle   12676     1  0 12:46 ?        00:00:00 ora_j042_igt
oracle   16114     1  0 12:49 ?        00:00:00 ora_j005_igt
oracle   18766     1  0 12:25 ?        00:00:00 ora_j013_igt
oracle   19570     1  0 12:25 ?        00:00:00 ora_j011_igt
oracle   19572     1  0 12:25 ?        00:00:00 ora_j014_igt
oracle   20697     1  0 12:26 ?        00:00:00 ora_j016_igt
oracle   20700     1  0 12:26 ?        00:00:00 ora_j022_igt
oracle   20702     1  0 12:26 ?        00:00:00 ora_j023_igt
oracle   20705     1  0 12:26 ?        00:00:00 ora_j026_igt
oracle   23072     1  0 12:55 ?        00:00:00 ora_j007_igt
oracle   27217     1  0 12:59 ?        00:00:00 ora_j003_igt
oracle   27938     1  0  2017 ?        00:00:00 ora_j010_igt
oracle   29689     1  0 12:07 ?        00:00:00 ora_j024_igt
oracle   29691     1  0 12:07 ?        00:00:00 ora_j025_igt
oracle   29829     1  0  2017 ?        00:03:10 ora_j015_igt
oracle   32684     1  0 13:03 ?        00:00:00 ora_j021_igt
oracle   32699     1  0 13:03 ?        00:00:00 ora_j017_igt


ps -ef | grep j0 | awk '{print "kill -9 "$2}'
Execute the generated commands:

kill -9 5285
kill -9 5291
kill -9 5469
kill -9 9288
kill -9 9290
kill -9 9294
kill -9 12590
kill -9 14892
kill -9 16114
kill -9 18766
etc...
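
If it is acceptable to kill all of them in one pass, a minimal sketch (assuming the job queue processes are named ora_j0xx_igt; the grep -v grep keeps the grep process itself out of the list):

ps -ef | grep 'ora_j0' | grep -v grep | awk '{print $2}' | xargs -r kill -9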

After the processes are killed, re-run the check:

SELECT DBA_JOBS_RUNNING.sid,
       DBA_JOBS_RUNNING.job,
       DBA_JOBS_RUNNING.last_date,
       ROUND(SYSDATE-DBA_JOBS_RUNNING.last_date) as days_since_last_run
  FROM DBA_JOBS_RUNNING, DBA_JOBS
 WHERE dba_jobs.job = DBA_JOBS_RUNNING.JOB
ORDER BY DBA_JOBS.SCHEMA_USER;


Step 4.
Most of the jobs should have been cleared by the kill -9 commands.
This might take some time to be reflected in the database.

Step 5.
If there are still jobs remaining: remove and resubmit them via DBMS_JOB.
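
A minimal sketch of removing and resubmitting one job via DBMS_JOB (the job number, PL/SQL block, and interval below are illustrative only; connect as the schema that owns the job):

sqlplus MYS_CELCO_MOCOQ/<password>
SQL> -- remove the stuck job (job 281 is just an example)
SQL> exec DBMS_JOB.REMOVE(281);
SQL> commit;
SQL> -- resubmit it with the same PL/SQL block and interval it had before
SQL> variable jobno number;
SQL> exec DBMS_JOB.SUBMIT(:jobno, 'my_pkg.my_proc;', SYSDATE, 'SYSDATE + 1/24');
SQL> commit;
SQL> print jobno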

Sunday, February 4, 2018

Huge Pages and Oracle

=====================
What are HugePages
=====================
HugePages is a feature integrated into the Linux kernel since version 2.6.
By default, the Linux memory page size is 4 KB.
With HugePages, the memory page size may be between 2 MB and 1024 MB, depending on the kernel version and hardware.

=====================
HugePages and Oracle
=====================
For large SGA sizes, HugePages can give substantial benefits in virtual memory management.

Without HugePages, the memory of the SGA is divided into 4 KB pages, which have to be managed by the Linux kernel.
With HugePages, the page size is increased, thereby reducing the total number of pages to be managed by the kernel, and therefore reducing the amount of memory required to hold the page table in memory.
In addition, the memory associated with HugePages cannot be swapped out, which forces the SGA to stay memory resident.
This makes HugePages pretty much standard for Oracle 11g systems running on Linux x86-64.
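
A rough back-of-the-envelope illustration (the 10 GB SGA size below is assumed purely for the example):

# Pages the kernel must manage for a 10 GB SGA:
#   4 KB pages: 10 * 1024 * 1024 / 4 = 2,621,440 pages
#   2 MB pages: 10 * 1024 / 2       = 5,120 pages
echo $(( 10 * 1024 * 1024 / 4 ))   # 4 KB pages
echo $(( 10 * 1024 / 2 ))          # 2 MB HugePages (a starting point for vm.nr_hugepages)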

=====================
TLB and HugePages
=====================
TLB - Translation Lookaside Buffer.
The TLB is a memory cache that is used to reduce the time taken to access a user memory location.
It is part of the chip's memory-management unit (MMU).
When using HugePages, the kernel has fewer pages whose lifecycle must be monitored.
This increases the TLB hit ratio, since operating system maintenance of page states is reduced.

Oracle Documentation: Overview of HugePages




=====================
How to see current settings
=====================
Current HugePages usage:

New HugePages metric:
AnonHugePages - the number of pages that the newer Transparent HugePages mechanism currently has in use.

Old HugePages mechanism metrics:
HugePages_Total - how many HugePages can be allocated (the maximum value).
HugePages_Free - how many pages are still available for allocation.
The number of used HugePages is HugePages_Total - HugePages_Free.


>% grep -i HugePages /proc/meminfo
AnonHugePages:     81920 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

AnonHugePages - the amount of anonymous Transparent HugePages memory currently in use. In this example: 81920/2048 = 40 pages.

Hugepagesize - the size of a single huge page.
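
A small sketch that computes the used HugePages from these counters (assuming the standard /proc/meminfo field names):

>% awk '/^HugePages_Total/ {t=$2} /^HugePages_Free/ {f=$2} /^Hugepagesize/ {s=$2} END {printf "Used HugePages: %d (%d kB)\n", t-f, (t-f)*s}' /proc/meminfo
Used HugePages: 0 (0 kB)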

Another option to see current HugePages usage:
>% less /proc/vmstat | grep trans
nr_anon_transparent_hugepages 40

Are Transparent HugePages enabled:
>% less /sys/kernel/mm/redhat_transparent_hugepage/enabled
always madvise [never]
The value in brackets is the active setting: always or madvise - used; never - not used.


Linux processes kswapd0 and khugepaged
kswapd0
The process kswapd0 is the kernel process that manages virtual memory.
When the system runs low on memory, kswapd0 moves least recently used (LRU) pages out to swap.
This can cause extreme lag for the affected processes.
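
A quick way to watch for this (a sketch; the 5-second interval is arbitrary) - the si/so columns of vmstat show swap-in/swap-out activity:
vmstat 5
top -p `pidof kswapd0`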

khugepaged
khugepaged performs the huge page collapse operation:
it converts a set of normal-sized pages into a single huge page.
It is desirable for a range of addresses to need fewer TLB entries, but the conversion process is expensive, because the system needs to find a candidate set of pages to group together and then copy all the memory from the possibly scattered normal-sized pages into a single huge page.
The khugepaged kernel thread searches for candidate pages to collapse into a single huge page.
Even if khugepaged is not successful in converting normal-sized pages into huge pages, it may still be taking processor time to search for candidate pages.
You can see whether the khugepaged kernel thread is taking a significant amount of processor time with:
top -p `pidof khugepaged`

=====================
HugePages and SWAP 
=====================
When space is allocated with HugePages enabled and a buggy application keeps allocating memory, the server might run out of memory, including out of SWAP space.

Thursday, February 1, 2018

FLASHBACK and Recovery in Oracle

Flashback Database was introduced in Oracle 10g.


Rather than having to restore the database, flashback database allows the database to be rewound very quickly to a previous point in time.

This feature is particularly useful for testing and deployments.


Flashback Database can effectively be run in two modes.
In one mode you log every change to every block, which allows you to flash back the database to any previous point in time.
In the other mode, Oracle only tracks the "before" images of changed blocks, so that you can flash back to the restore point.
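
A minimal sketch of the two modes (the restore point name is illustrative; the full setup steps follow below):

-- Mode 1: full flashback logging - flash back to any time within the retention target
ALTER DATABASE FLASHBACK ON;

-- Mode 2: guaranteed restore point only - before-images are kept just for this point
CREATE RESTORE POINT before_upgrade GUARANTEE FLASHBACK DATABASE;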




======================================
TABLE
======================================

ALTER TABLE my_table ENABLE ROW MOVEMENT;

FLASHBACK TABLE my_table 
TO TIMESTAMP (SYSTIMESTAMP - INTERVAL '1' minute);

FLASHBACK TABLE my_table 
TO TIMESTAMP TO_TIMESTAMP('20051201 09:30:00', 'YYYYMMDD hh24:mi:ss');

SELECT CURRENT_SCN FROM V$DATABASE;
FLASHBACK TABLE my_table 
TO SCN 123456;


======================================
DATABASE
======================================

How to Set-up Flashback Database
1. Ensure db_recovery_file_dest is set.

sqlplus '/ as sysdba'
SQL> alter system set db_recovery_file_dest='+<FRA Diskgroup>' SCOPE=spfile;

2. Ensure db_recovery_file_dest_size is set

SQL> alter system set db_recovery_file_dest_size=100G SCOPE=spfile;

3. Stop and start the database

sqlplus '/ as sysdba'
SQL> shutdown immediate;
SQL> startup mount;

If flashback to any previous point in time is required, then turn flashback on using the following command
SQL> alter database flashback on;
SQL> alter database open;
SQL> alter system set db_flashback_retention_target=2880;

NOTES
Set the db_recovery_file_dest to an appropriate location for the flashback recovery files.
Set the db_recovery_file_dest_size to an appropriate size for the amount and size of the testing required.
Set the db_flashback_retention_target to an appropriate time, in minutes, to retain flashback capability.

Only run alter database flashback on; if there is a requirement to flashback to ANY previous point in time.

Determine if Flashback Database is Already Enabled
1. Run the following commands to determine whether Flashback is turned on.

sqlplus '/ as sysdba'
SQL> select flashback_on from v$database;

Creating and Using Flashback Restore points.
This worked example assumes the database is using ASM to manage its storage.

Creating a Restore point
Create a restore point whenever the database is in a state that it may need to be flashed back to.
Use the optional GUARANTEE FLASHBACK DATABASE clause to ensure that the restore point is not aged out of the fast recovery area (FRA) as dictated by the db_flashback_retention_target parameter.

1. You may want to create the restore point in mount mode. If so, put the database into mount mode now.

2. Create a restore point

sqlplus '/ as sysdba'
SQL> create restore point <restore point name> [GUARANTEE FLASHBACK DATABASE];

Rolling Back to a Restore Point
1. Identify the Restore point

sqlplus '/ as sysdba'
SQL> select name, time, guarantee_flashback_database from v$restore_point;

2. For a non-RAC environment, use the following commands to flash back to a restore point.

sqlplus '/ as sysdba'
SQL> shutdown immediate;
SQL> startup mount;
SQL> flashback database to restore point <restore point name>;
SQL> alter database open resetlogs;

3. For RAC instances use the following commands.

On one of the nodes, run: srvctl stop database -d <database name> -o immediate
sqlplus '/ as sysdba'
SQL> startup mount;
SQL> flashback database to restore point <restore point name>;
SQL> alter database open resetlogs;
SQL> shutdown immediate;
SQL> quit
srvctl start database -d <database name>
Run crs_stat -t to confirm that the database is back up okay.

NOTES
Any tables created and updated without the LOGGING option will be susceptible to block corruption errors when the database is flashed back. These can be remedied by issuing the TRUNCATE TABLE command against the affected object(s).

Dropping a Restore Point
1. Restore points can be dropped with the database open using the following commands

sqlplus '/ as sysdba'
SQL> drop restore point <restore point name>;


Monitoring Flashback Logging
After enabling flashback logging, Oracle keeps track of the amount of logging generated. This can be queried from v$flashback_database_log; the estimate gets better with age.
Note that this is the size of the flashback logs only and does not include space used by archive logs and RMAN backups.

1. Monitor flashback logs

sqlplus '/ as sysdba'
SQL> select estimated_flashback_size/1024/1024/1024 "EST_FLASHBACK_SIZE(GB)" from v$flashback_database_log;

Finding the Earliest Flashback Point
Querying V$flashback_database_log will show you the earliest point you can flashback your database to based on the size of the FRA and the currently available flashback logs.

1. Find the earliest flashback point

sqlplus '/ as sysdba'
SQL> alter session set nls_date_format='dd/mm/yy hh24:mi:ss';
SQL> select oldest_flashback_scn,oldest_flashback_time from v$flashback_database_log;

Disabling Flashback Database
Flashback to any previous point in time can be disabled with the database open. Any unused flashback logs will be automatically removed at this point, and a message detailing the file deletion is written to the alert log.

1. Disabling flashback

sqlplus '/ as sysdba'
SQL> ALTER DATABASE FLASHBACK OFF;

Troubleshooting
The following common errors can occur.

Message
ORA-38754: FLASHBACK DATABASE not started; required redo log is not available
Cause
Oracle needs to have the required archive logs in the archive destination at the time of flashback.

Solution
Use RMAN, so the missing archive logs can be restored from backup during the flashback. The required sequence and thread numbers are in the error message.

      rman target /
      run{
         allocate channel t1 device type 'sbt_tape';
         FLASHBACK DATABASE TO RESTORE POINT <restore point name>;
         }
Now resume the flashback process from the RESETLOGS step.