Pages

Monday, February 5, 2018

Killing Zombie Jobs in DBA_JOBS RUNNING

================================
General
================================
For unknown reason, jobs stopped from being run on a scheduler from USER_JOBS.
When checking for broken - it is in 'N' value
When checking application log, there is no error, the job seems to have been stuck in the middle of execution, and since the a new job was not submitted.

================================
What to Check
================================
DBA_JOBS
DBA_JOBS_RUNNING
ora_j0xx_oraigt Processes on Linux

Step1.
Checking DBA_JOBS_RUNNING - no job is related to an entry in V$SESSION.

job value is allways NULL.
Seems that these are "Zombie" jobs.

SELECT 'kill -9 '||proc.SPID AS "Linux Kill", 
       jobs.JOB as job,
       sess.sid||'.'||sess.serial# as sid_pid,
       sess.username 
 FROM V$SESSION sess, 
      V$PROCESS proc, 
      DBA_JOBS_RUNNING jobs
WHERE proc.addr = sess.paddr
  AND sess.type <> 'BACKGRGROUND'
  AND sess.logon_time < SYSDATE -1
  AND sess.username = 'MYS_CELCO_MOCOQ'
  AND sess.sid = jobs.sid(+)

Linux Kill          job sid_pid         username
------------- --------- --------------- ---------------
kill -9 22776 979.61739 MYS_CELCO_MOCOQ
kill -9 22765 790.2795 MYS_CELCO_MOCOQ
kill -9 4104 54.20611 MYS_CELCO_MOCOQ
kill -9 4096 30.27007 MYS_CELCO_MOCOQ
kill -9 4110 228.21157 MYS_CELCO_MOCOQ
kill -9 20859 810.58589 MYS_CELCO_MOCOQ
kill -9 4106 991.32265 MYS_CELCO_MOCOQ
kill -9 4118 98.38363 MYS_CELCO_MOCOQ
kill -9 4108 843.2089 MYS_CELCO_MOCOQ
kill -9 4092 106.45951 MYS_CELCO_MOCOQ
kill -9 1161 850.55871 MYS_CELCO_MOCOQ
kill -9 1169 212.38989 MYS_CELCO_MOCOQ
kill -9 30451 828.6607 MYS_CELCO_MOCOQ
kill -9 4100 178.61219 MYS_CELCO_MOCOQ
kill -9 30790 215.36845 MYS_CELCO_MOCOQ
kill -9 4090 195.21231 MYS_CELCO_MOCOQ
kill -9 30801 838.15399 MYS_CELCO_MOCOQ
kill -9 4098 959.63365 MYS_CELCO_MOCOQ
kill -9 30796 899.31015 MYS_CELCO_MOCOQ
kill -9 4114 817.4443 MYS_CELCO_MOCOQ
kill -9 4088 771.53083 MYS_CELCO_MOCOQ
kill -9 1165 148.39835 MYS_CELCO_MOCOQ
kill -9 22771 849.23091 MYS_CELCO_MOCOQ
kill -9 6087 138.29993 MYS_CELCO_MOCOQ
kill -9 30793 230.13119 MYS_CELCO_MOCOQ
kill -9 30798 199.4211 MYS_CELCO_MOCOQ
kill -9 4094 804.13365 MYS_CELCO_MOCOQ
kill -9 4112 963.23979 MYS_CELCO_MOCOQ
kill -9 4085 126.8799 MYS_CELCO_MOCOQ
kill -9 1145 977.23831 MYS_CELCO_MOCOQ


Step 2.
Checking How long ago these jobs were run.
The check was done at 05/02/2018

SELECT DBA_JOBS_RUNNING.sid,
       DBA_JOBS_RUNNING.job,
       DBA_JOBS_RUNNING.last_date,
       ROUND(SYSDATE-DBA_JOBS_RUNNING.last_date) as days_since_last_run
  FROM DBA_JOBS_RUNNING, DBA_JOBS
WHERE dba_jobs.job = DBA_JOBS_RUNNING.JOB
ORDER BY DBA_JOBS.SCHEMA_USER;

sid     job     last_date           days_since_last_run
------- ------- ------------------- -------------------
890 281 28/10/2017 00:00:13             101
971 283 27/10/2017 23:50:13             101            
198 284 27/10/2017 21:00:48                        101
787 282 28/10/2017 00:00:38             101
186 1382
953 1261
224 41 28/10/2017 00:00:23             101
226 43 27/10/2017 23:50:06             101
100 42 28/10/2017 00:00:43             101
124 44 27/10/2017 21:01:22             101
188 1361
941 123 27/10/2017 23:00:18             101
888 121 28/10/2017 00:01:48             101
39 243 27/10/2017 00:00:04             102
135 2585 27/10/2017 23:01:12             101
944 461 28/10/2017 00:01:54             101
924 1044
806 1043
837 1042
882 1041
207 1221
803 1241
156 85 28/10/2017 00:00:05             101
909 86 28/10/2017 00:00:31             101
812 88 27/10/2017 21:03:15             101
153 64 27/10/2017 21:03:01             101
945 62 28/10/2017 00:04:48             101
...
...

SELECT COUNT(*) FROM DBA_JOBS_RUNNING;
18

Step 3.
On Linux,
There are many ora_j0xx_igt processes.
oracle@my_server:~/scripts>% ps -ef | grep j0 | wc -l
36


ps -ef | grep j0
oracle   12331     1  0 12:46 ?        00:00:00 ora_j030_igt
oracle   12511     1  0 12:46 ?        00:00:00 ora_j031_igt
oracle   12519     1  0 12:46 ?        00:00:00 ora_j032_igt
oracle   12540     1  0 12:46 ?        00:00:00 ora_j033_igt
oracle   12562     1  0 12:46 ?        00:00:00 ora_j034_igt
oracle   12590     1  0 12:46 ?        00:00:00 ora_j039_igt
oracle   12672     1  0 12:46 ?        00:00:00 ora_j040_igt
oracle   12676     1  0 12:46 ?        00:00:00 ora_j042_igt
oracle   16114     1  0 12:49 ?        00:00:00 ora_j005_igt
oracle   18766     1  0 12:25 ?        00:00:00 ora_j013_igt
oracle   19570     1  0 12:25 ?        00:00:00 ora_j011_igt
oracle   19572     1  0 12:25 ?        00:00:00 ora_j014_igt
oracle   20697     1  0 12:26 ?        00:00:00 ora_j016_igt
oracle   20700     1  0 12:26 ?        00:00:00 ora_j022_igt
oracle   20702     1  0 12:26 ?        00:00:00 ora_j023_igt
oracle   20705     1  0 12:26 ?        00:00:00 ora_j026_igt
oracle   23072     1  0 12:55 ?        00:00:00 ora_j007_igt
oracle   27217     1  0 12:59 ?        00:00:00 ora_j003_igt
oracle   27938     1  0  2017 ?        00:00:00 ora_j010_igt
oracle   29689     1  0 12:07 ?        00:00:00 ora_j024_igt
oracle   29691     1  0 12:07 ?        00:00:00 ora_j025_igt
oracle   29829     1  0  2017 ?        00:03:10 ora_j015_igt
oracle   32684     1  0 13:03 ?        00:00:00 ora_j021_igt
oracle   32699     1  0 13:03 ?        00:00:00 ora_j017_igt


ps -ef | grep j0 | awk '{print "kill -9 "$2}'
Execute the generated commands:

kill -9 5285
kill -9 5291
kill -9 5469
kill -9 9288
kill -9 9290
kill -9 9294
kill -9 12590
kill -9 14892
kill -9 16114
kill -9 18766
etc...

SELECT DBA_JOBS_RUNNING.sid,
       DBA_JOBS_RUNNING.job,
       DBA_JOBS_RUNNING.last_date,
       ROUND(SYSDATE-DBA_JOBS_RUNNING.last_date) as days_since_last_run
  FROM DBA_JOBS_RUNNING, DBA_JOBS
WHERE dba_jobs.job = DBA_JOBS_RUNNING.JOB
ORDER BY DBA_JOBS.SCHEMA_USER


Step 4.
Most of the jobs should have been cleared by kill -9 command.
This might take some time to reflect in the Database.

Step 5.
If there are still jobs remaining: via DBMS_JOBS remove and recreate it.

No comments:

Post a Comment