Commands to get top CPU usage
==============================
ps
top
/proc filesystem
sar
pidstat reports per-process CPU usage.
iostat reports CPU, disk I/O, and NFS statistics.
vmstat reports virtual memory statistics.
mpstat reports per-processor statistics.
==============================
Get number of CPUs
==============================
less /proc/cpuinfo | grep processor
processor : 0
processor : 1
processor : 2
processor : 3
processor : 4
processor : 5
processor : 6
processor : 7
processor : 8
processor : 9
processor : 10
processor : 11
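To get just the count rather than the full listing, grep -c works on the same field:

```shell
# Count logical processors instead of listing them.
grep -c '^processor' /proc/cpuinfo
```

On newer systems, nproc (coreutils) and getconf _NPROCESSORS_ONLN report the same number.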
==============================
ps
==============================
In ps output, CPU usage is expressed as a percentage.
Per the Linux man page:
CPU usage is currently expressed as the percentage of time spent running during the entire lifetime of a process. This is not ideal, and it does not conform to the standards that ps otherwise conforms to. CPU usage is unlikely to add up to exactly 100%.
Generally, a process is in one of two states: using the CPU, or not using the CPU (waiting, sleeping).
ps divides the CPU time used by the process by the entire lifetime of the process, so the value is actually the process's average CPU usage.
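This can be checked by hand: given cumulative CPU seconds and elapsed seconds, the ratio reproduces the ps %CPU figure. The numbers below are illustrative sample data (loosely matching the output further down), not live ps output:

```shell
# ps-style %CPU = cumulative CPU seconds / elapsed seconds * 100.
# Columns: pid, cpu_seconds, elapsed_seconds (illustrative sample data).
printf '7991 354000 361000\n17084 9200 72000\n' | \
  awk '{ printf "%s %.1f%%\n", $1, $2 * 100 / $3 }'
# 7991 98.1%
# 17084 12.8%
```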
ps -eo pcpu,pid,user,args --sort pcpu | sort -nr | head -10
or
ps -eo pcpu,pid,user,args | sort -k 1 -r | head -10
%CPU PID USER COMMAND
98.1 7991 iu vip_monitor --PROCESSNAME=MEX-MOVSP-99_MON_01 --EVENTNAME=MEX-MOVSP-99_MON_01_XEVT --CONFIG=VIP.INI NOT_CREATE_PROCESS
12.8 17084 oracle oracleigt (LOCAL=NO)
12.2 17063 oracle oracleigt (LOCAL=NO)
11.6 17049 oracle oracleigt (LOCAL=NO)
11.1 9490 oracle oracleigt (LOCAL=NO)
11.1 9488 oracle oracleigt (LOCAL=NO)
11.1 17110 oracle oracleigt (LOCAL=NO)
11.0 9494 oracle oracleigt (LOCAL=NO)
11.0 9492 oracle oracleigt (LOCAL=NO)
10.9 9496 oracle oracleigt (LOCAL=NO)
10.7 17072 oracle oracleigt (LOCAL=NO)
or
ps axo pid,etime,%cpu,%mem,cmd,args | sort -rn -k3 | head -10
23590 06:57 1.1 0.6 oracleigt (LOCAL=NO) oracleigt (LOCAL=NO)
6056 01:29 0.9 0.6 oracleigt (LOCAL=NO) oracleigt (LOCAL=NO)
24489 06:52 0.9 0.6 oracleigt (LOCAL=NO) oracleigt (LOCAL=NO)
24487 06:52 0.9 0.6 oracleigt (LOCAL=NO) oracleigt (LOCAL=NO)
24485 06:52 0.9 0.6 oracleigt (LOCAL=NO) oracleigt (LOCAL=NO)
24479 06:52 0.9 0.6 oracleigt (LOCAL=NO) oracleigt (LOCAL=NO)
24477 06:52 0.9 0.6 oracleigt (LOCAL=NO) oracleigt (LOCAL=NO)
24543 164-22:10:46 0.4 6.0 ora_dbw1_igt ora_dbw1_igt
24531 164-22:10:46 0.4 0.2 ora_dia0_igt ora_dia0_igt
9123 00:23 0.3 0.0 oracleigt (LOCAL=NO) oracleigt (LOCAL=NO)
==============================
top
==============================
top - sort by memory:
f -> select the columns to display
Shift+f -> select the column to sort by
top: provides information for the whole system
top -1: provides information at the per-core level
top
Tasks: 511 total, 2 running, 509 sleeping, 0 stopped, 0 zombie
Cpu(s): 2.0%us, 5.7%sy, 0.0%ni, 80.6%id, 11.7%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 20544592k total, 20505760k used, 38832k free, 21680k buffers
Swap: 10485752k total, 1009580k used, 9476172k free, 7699116k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7991 iu 25 0 104m 44m 10m R 100.0 0.2 959175:43 vip_monitor
1492 iu 22 0 7317m 3.8g 14m S 4.0 19.2 8780:06 java
23021 oracle 18 0 4443m 852m 714m D 3.3 4.2 7:39.41 oracle
650 root 10 -5 0 0 0 S 0.3 0.0 139:25.90 kswapd0
3595 root 10 -5 0 0 0 S 0.3 0.0 110:34.73 kjournald
5015 root 15 0 173m 9832 2648 S 0.3 0.0 88:54.38 snmpd
5674 root 15 0 2384 1016 708 S 0.3 0.0 84:48.11 cmasm2d
13512 root 15 0 13024 1440 828 R 0.3 0.0 0:00.05 top
17270 oracle 16 0 4316m 2.1g 2.0g D 0.3 10.5 1193:53 oracle
17274 oracle 16 0 4318m 2.1g 2.0g D 0.3 10.5 1211:16 oracle
1 root 15 0 10368 632 536 S 0.0 0.0 51:17.49 init
2 root RT -5 0 0 0 S 0.0 0.0 15:21.84 migration/0
3 root 34 19 0 0 0 S 0.0 0.0 0:24.12 ksoftirqd/0
4 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/0
5 root RT -5 0 0 0 S 0.0 0.0 11:48.20 migration/1
6 root 34 19 0 0 0 S 0.0 0.0 0:08.78 ksoftirqd/1
7 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/1
8 root RT -5 0 0 0 S 0.0 0.0 13:28.13 migration/2
9 root 34 19 0 0 0 S 0.0 0.0 0:31.18 ksoftirqd/2
10 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/2
How can %cpu of a single process be more than 100% ?
Tasks: 510 total, 28 running, 482 sleeping, 0 stopped, 0 zombie
Cpu(s): 7.3%us, 5.8%sy, 0.0%ni, 84.7%id, 2.2%wa, 0.0%hi, 0.0%si, 0.0%st
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1492 iu 22 0 12.5g 4.2g 14m S 109.0 21.6 29469:56 java
11536 oracle 16 0 4302m 1.1g 1.1g R 63.3 5.7 3:15.06 oracle
11600 oracle 16 0 4303m 1.1g 1.1g R 63.3 5.8 2:14.94 oracle
11581 oracle 16 0 4302m 1.1g 1.1g R 51.0 5.6 2:06.25 oracle
11439 oracle 16 0 4303m 1.2g 1.1g R 47.5 5.9 10:54.44 oracle
%CPU -- CPU Usage:
CPU data is displayed as a percentage.
CPU time is divided among the various states listed below.
The system-wide CPU info is displayed in this line:
Cpu(s): 7.3%us, 5.8%sy, 0.0%ni, 84.7%id, 2.2%wa, 0.0%hi, 0.0%si, 0.0%st
%us - User Space.
Time spent running any process that does not belong to the kernel.
%sy - System.
The percentage of time that the CPU spent running the kernel.
System time accrues when a user-space process needs something from the kernel: for example, allocating memory, performing I/O, or creating a child process.
%ni - nice. The percent of time that the CPU spent running user space processes that have been niced.
%id - idle
%wa - wait. Idle while waiting for an I/O operation to complete.
%hi - hardware interrupts.
%si - software interrupts.
hi and si: How much time the processor has spent servicing interrupts. Hardware interrupts are physical interrupts sent to the CPU from various peripherals like disks and network interfaces. Software interrupts come from processes running on the system. A hardware interrupt will actually cause the CPU to stop what it is doing and go handle the interrupt. A software interrupt doesn't occur at the CPU level, but rather at the kernel level.
%st - Applies to virtual machines. When Linux is running as a virtual machine on a hypervisor, the st (short for stolen) statistic shows how long the virtual CPU has spent waiting for the hypervisor to service another virtual CPU.
How can %cpu be more than 100%?
Consider the following top output:
1492 iu 22 0 12.5g 4.2g 14m S 109.0 21.6 29469:56 java
By default, top displays this as a percentage of a single CPU.
On multi-core systems, you can have percentages that are greater than 100%.
For example, if 2 cores are at 60% use, top will show a CPU use of 120%.
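A quick sanity check of that arithmetic (the per-core usages are made-up numbers):

```shell
# Two cores at 60% each -> top (in its default Irix mode) reports 120%.
printf '60.0\n60.0\n' | awk '{ s += $1 } END { printf "%.1f%%\n", s }'
# 120.0%
```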
==============================
/proc filesystem
==============================
The proc filesystem is a pseudo-filesystem which provides an interface to kernel data structures.
It is commonly mounted at /proc.
==============================
/proc/stat
==============================
less /proc/stat
cpu# user nice system idle iowait irq softirq steal
---- ----------- ------ ---------- ------------ ---------- ------ -------- -----
cpu 8681569546 20781 7500186849 107708594984 2544771039 3432741 26310998 0
cpu0 359643227 164 214697909 7299502792 29611065 0 600384 0
cpu1 321834812 163 235978841 7256042542 88853474 27294 1318276 0
cpu2 405713055 6237 225353119 6707407683 559505025 652466 5417931 0
cpu3 305060789 206 241572665 7322828222 33928786 32563 632171 0
cpu4 383078927 4489 222956973 6934212094 359472763 658643 3671609 0
cpu5 315596685 252 265718076 7301472241 20771256 28023 468875 0
cpu6 377777117 498 230329707 7124598350 169031743 364325 1953733 0
cpu7 298895376 223 261328034 7326041683 17090250 312030 387802 0
cpu8 592757961 450 325305289 6957671678 27858570 1 461516 0
cpu9 442884180 123 378368398 6994452907 87811179 8977 529606 0
cpu10 624129254 3624 371815238 6324644659 582304175 25986 1132519 0
cpu11 552014661 881 499794403 6803694480 46959840 124518 1466579 0
cpu12 764377170 2026 461568339 6335395961 341717633 66748 927570 0
cpu13 706913799 345 759805737 6412366366 22464769 549796 1954558 0
cpu14 1157943414 849 1148640534 5449024929 145227184 318891 2899630 0
cpu15 1072949110 245 1656953579 5159238389 12163321 262472 2488231 0
intr 26803333079 1744487151 85 0 3 283 2 0 0 1 0 0 0 114 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1497978303 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1525495804 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1351727636 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1327471158 0 0 0 0 0 0 0 1990263721 0 0 0 0 0 0 0 1847687028 0 0 0 0 0 0 0 1788738681 0 0 0 0 0 0 0 1695860299 0 0 0 0 0 0 0 1717259594 0 0 0 0 0 0 0 1726428623 0 0 0 0
ctxt 696305326360
btime 1378880020
processes 1672102393
procs_running 4
procs_blocked 0
less /proc/stat | grep cpu | awk '{print $1 " usage= " ($2+$4)*100/($2+$4+$5)"%"}'
cpu usage= 13.0613%
cpu0 usage= 7.29428%
cpu1 usage= 7.13873%
cpu2 usage= 8.59942%
cpu3 usage= 6.94624%
cpu4 usage= 8.03734%
cpu5 usage= 7.37443%
cpu6 usage= 7.86408%
cpu7 usage= 7.10376%
cpu8 usage= 11.6569%
cpu9 usage= 10.5077%
cpu10 usage= 13.6046%
cpu11 usage= 13.3894%
cpu12 usage= 16.2133%
cpu13 usage= 18.6153%
cpu14 usage= 29.7408%
cpu15 usage= 34.6033%
less /proc/stat | grep cpu | awk '{print $1 " idle%= " ($5*100)/($2+$3+$4+$5+$6+$7+$8)"%"}'
cpu idle%= 85.1687%
cpu0 idle%= 92.3513%
cpu1 idle%= 91.8013%
cpu2 idle%= 84.8603%
cpu3 idle%= 92.6464%
cpu4 idle%= 87.7298%
cpu5 idle%= 92.3761%
cpu6 idle%= 90.1385%
cpu7 idle%= 92.6871%
cpu8 idle%= 88.0265%
cpu9 idle%= 88.4917%
cpu10 idle%= 80.0176%
cpu11 idle%= 86.0783%
cpu12 idle%= 80.1533%
cpu13 idle%= 81.1274%
cpu14 idle%= 68.9396%
cpu15 idle%= 65.2732%
The first "cpu" line aggregates the numbers from all of the individual "cpuN" lines.
Each cpuN line corresponds to one logical processor.
The meanings of the cpu columns:
1st column : user = normal processes executing in user mode
2nd column : nice = niced processes executing in user mode
3rd column : system = processes executing in kernel mode
4th column : idle = idle
5th column : iowait = waiting for I/O to complete
6th column : irq = servicing interrupts
7th column : softirq = servicing softirqs
These numbers identify the amount of time the CPU has spent performing different kinds of work. Time units are in USER_HZ or Jiffies (typically hundredths of a second).
These numbers are cumulative totals since system startup.
To get the average idle percentage: idle% = ( idle * 100 ) / ( user + nice + system + idle + iowait + irq + softirq )
The remaining lines (intr, ctxt, btime, processes, ...) are not relevant here.
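Because these counters are cumulative, a momentary usage figure requires the difference between two samples. A minimal sketch, using only the first four fields of the aggregate cpu line:

```shell
# Overall CPU usage over a 1-second window, from two /proc/stat samples.
read -r _ u1 n1 s1 i1 _ < /proc/stat     # user, nice, system, idle
sleep 1
read -r _ u2 n2 s2 i2 _ < /proc/stat
busy=$(( (u2 - u1) + (n2 - n1) + (s2 - s1) ))
idle=$(( i2 - i1 ))
echo "CPU usage: $(( busy * 100 / (busy + idle) ))%"
```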
==============================
Get CPU processor(s) details.
==============================
less /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
stepping : 2
cpu MHz : 2399.384
cache size : 12288 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse
sse2 ss ht tm syscall nx pdpe1gb rdtscp lm constant_tsc ida nonstop_tsc arat pni monitor ds_cpl vmx smx est tm2 ssse3 cx
16 xtpr sse4_1 sse4_2 popcnt lahf_lm
bogomips : 4798.76
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: [8]
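Note that the dump above shows siblings: 8 but cpu cores: 4, i.e. hyper-threading: each physical core presents two logical processors. The breakdown can be pulled out directly:

```shell
# Logical processors, physical sockets, and cores per socket.
grep -c '^processor' /proc/cpuinfo
grep '^physical id' /proc/cpuinfo | sort -u | wc -l   # may print 0 inside some VMs
grep -m1 '^cpu cores' /proc/cpuinfo
```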
less /proc/cpuinfo | grep processor
processor : 0
processor : 1
processor : 2
processor : 3
processor : 4
processor : 5
processor : 6
processor : 7
processor : 8
processor : 9
processor : 10
processor : 11
processor : 12
processor : 13
processor : 14
processor : 15
==============================
sar
==============================
sar logs are written to the /var/log/sa/ directory.
These are binary files that can be viewed with the sar command.
To see today's data:
sar
To see the data for day 04 of the month:
sar -f /var/log/sa/sa04
Example:
>% cd /var/log/sa/
>% ls -ltr
-rw-r--r-- 1 root root 875204 Apr 26 23:53 sar26
-rw-r--r-- 1 root root 894192 Apr 27 23:50 sa27
-rw-r--r-- 1 root root 875204 Apr 27 23:53 sar27
-rw-r--r-- 1 root root 894192 Apr 28 23:50 sa28
-rw-r--r-- 1 root root 875206 Apr 28 23:53 sar28
-rw-r--r-- 1 root root 894192 Apr 29 23:50 sa29
-rw-r--r-- 1 root root 875206 Apr 29 23:53 sar29
-rw-r--r-- 1 root root 894192 Apr 30 23:50 sa30
-rw-r--r-- 1 root root 875206 Apr 30 23:53 sar30
-rw-r--r-- 1 root root 894192 May 1 23:50 sa01
-rw-r--r-- 1 root root 875206 May 1 23:53 sar01
-rw-r--r-- 1 root root 894192 May 2 23:50 sa02
-rw-r--r-- 1 root root 875215 May 2 23:53 sar02
-rw-r--r-- 1 root root 894192 May 3 23:50 sa03
-rw-r--r-- 1 root root 875205 May 3 23:53 sar03
-rw-r--r-- 1 root root 894192 May 4 23:50 sa04
-rw-r--r-- 1 root root 875206 May 4 23:53 sar04
-rw-r--r-- 1 root root 292016 May 5 07:40 sa05
>% sar
Linux 2.6.18-274.el5 (my_server) 05/05/2016
12:00:01 AM CPU %user %nice %system %iowait %steal %idle
12:10:01 AM all 21.21 0.00 3.18 5.98 0.00 69.63
12:20:01 AM all 16.86 0.00 2.68 6.24 0.00 74.21
12:30:01 AM all 22.11 0.00 3.72 5.79 0.00 68.38
12:40:01 AM all 17.10 0.00 2.61 6.17 0.00 74.12
12:50:01 AM all 21.81 0.00 3.50 6.35 0.00 68.35
01:00:01 AM all 15.50 0.00 2.57 6.24 0.00 75.69
01:10:01 AM all 19.68 0.00 2.94 6.01 0.00 71.38
01:20:02 AM all 17.21 0.00 2.59 6.09 0.00 74.11
sar -f sa04
Linux 2.6.18-274.el5 (my_server) 05/04/2016
12:00:01 AM CPU %user %nice %system %iowait %steal %idle
12:10:01 AM all 15.80 0.00 2.16 6.25 0.00 75.80
12:20:01 AM all 14.11 0.00 2.14 6.29 0.00 77.46
12:30:01 AM all 15.91 0.00 2.61 6.33 0.00 75.15
12:40:01 AM all 15.67 0.00 2.27 6.24 0.00 75.82
12:50:01 AM all 16.44 0.00 2.27 6.28 0.00 75.01
01:00:01 AM all 15.51 0.00 2.28 6.45 0.00 75.75
01:10:01 AM all 16.21 0.00 2.33 6.16 0.00 75.30
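When scanning a saved report for busy intervals, filtering on the %idle column (the last field) is handy. A sketch, run here against sample lines in the same format rather than live sar output:

```shell
# Flag sar intervals where %idle dropped below 70 (i.e. more than 30% busy).
printf '12:10:01 AM all 21.21 0.00 3.18 5.98 0.00 69.63\n12:20:01 AM all 16.86 0.00 2.68 6.24 0.00 74.21\n' | \
  awk '$NF < 70 { print $1, $2, "idle=" $NF }'
# 12:10:01 AM idle=69.63
```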
CPU usage per core
>% sar -P ALL | more
>% sar -f /var/log/sa/sa17 -P ALL
Linux 2.6.18-274.el5 (my_server) 05/05/2016
12:00:01 AM CPU %user %nice %system %iowait %steal %idle
12:10:01 AM all 21.21 0.00 3.18 5.98 0.00 69.63
12:10:01 AM 0 15.18 0.00 4.38 0.19 0.00 80.25
12:10:01 AM 1 16.67 0.00 5.33 29.82 0.00 48.17
12:10:01 AM 2 24.79 0.00 3.87 0.15 0.00 71.20
12:10:01 AM 3 12.03 0.00 3.45 0.98 0.00 83.55
12:10:01 AM 4 16.14 0.00 3.55 0.35 0.00 79.97
12:10:01 AM 5 14.43 0.00 3.59 0.96 0.00 81.02
12:10:01 AM 6 13.43 0.00 2.98 0.26 0.00 83.33
12:10:01 AM 7 34.18 0.00 4.01 0.70 0.00 61.10
12:10:01 AM 8 35.98 0.00 1.44 0.11 0.00 62.46
12:10:01 AM 9 16.99 0.00 2.34 58.52 0.00 22.15
12:10:01 AM 10 22.20 0.00 2.20 0.31 0.00 75.29
12:10:01 AM 11 34.14 0.00 2.44 0.91 0.00 62.51
12:10:01 AM 12 19.57 0.00 2.24 0.26 0.00 77.93
12:10:01 AM 13 25.10 0.00 2.89 0.64 0.00 71.37
12:10:01 AM 14 19.29 0.00 2.30 0.29 0.00 78.12
12:10:01 AM 15 19.29 0.00 3.81 1.23 0.00 75.67
12:20:01 AM all 16.86 0.00 2.68 6.24 0.00 74.21
12:20:01 AM 0 12.43 0.00 4.36 0.08 0.00 83.13
12:20:01 AM 1 10.29 0.00 4.85 23.72 0.00 61.13
12:20:01 AM 2 13.89 0.00 3.49 3.13 0.00 79.49
12:20:01 AM 3 8.23 0.00 2.95 0.51 0.00 88.31
12:20:01 AM 4 12.44 0.00 3.83 3.10 0.00 80.62
12:20:01 AM 5 8.45 0.00 2.92 0.73 0.00 87.90
12:20:01 AM 6 10.47 0.00 2.87 0.20 0.00 86.47
12:20:01 AM 7 4.65 0.00 1.77 0.93 0.00 92.66
12:20:01 AM 8 10.15 0.00 1.40 0.07 0.00 88.38
12:20:01 AM 9 7.75 0.00 1.76 65.39 0.00 25.10
12:20:01 AM 10 12.92 0.00 1.76 0.33 0.00 84.99
12:20:01 AM 11 46.04 0.00 1.56 0.42 0.00 51.98
12:20:01 AM 12 13.37 0.00 1.67 0.24 0.00 84.73
12:20:01 AM 13 20.81 0.00 2.56 0.62 0.00 76.01
12:20:01 AM 14 20.57 0.00 4.11 0.20 0.00 75.12
12:20:01 AM 15 57.35 0.00 1.10 0.23 0.00 41.32
12:30:01 AM all 22.11 0.00 3.72 5.79 0.00 68.38
CPU usage for a specific core.
In this example: sample for core 9, every 3 seconds, for 4 intervals.
>% sar -P 9 3 4 | more
Linux 2.6.18-274.el5 (my_server) 05/05/2016
08:32:02 AM CPU %user %nice %system %iowait %steal %idle
08:32:05 AM 9 0.33 0.00 0.00 0.00 0.00 99.67
08:32:08 AM 9 0.00 0.00 0.00 0.00 0.00 100.00
08:32:11 AM 9 0.00 0.00 0.33 0.00 0.00 99.67
08:32:14 AM 9 0.33 0.00 0.00 0.00 0.00 99.67
Average: 9 0.17 0.00 0.08 0.00 0.00 99.75
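The final Average line is simply the arithmetic mean of the interval samples; for the %user column above:

```shell
# Mean of the four %user samples -> matches sar's Average line.
printf '0.33\n0.00\n0.00\n0.33\n' | awk '{ s += $1 } END { printf "%.2f\n", s / NR }'
# 0.17
```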
==============================
pidstat
==============================
This command reports per-process activity.
With the -u flag and an interval, it reports CPU usage continuously.
But looking into the data, the %CPU it reports is relative to a single CPU, not aggregated across all CPUs as in top.
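If a per-CPU-normalized figure needs rescaling to a share of total machine capacity, divide by the CPU count (12 on this host, per the pidstat header below). The numbers here are illustrative:

```shell
# Rescale %CPU-of-one-core to %-of-total-capacity on a 12-CPU machine.
echo '63.3 12' | awk '{ printf "%.1f%%\n", $1 / $2 }'
# 5.3%
```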
pidstat | sort -ur -k4 | head -10
05:03:22 PM PID %usr %system %guest %CPU CPU Command
Linux 2.6.32-504.8.1.el6.x86_64 (deu-via-5-aps-10) 11/06/2016 _x86_64_ (12 CPU)
05:03:22 PM 13157 0.06 0.02 0.00 0.07 6 MountAgent
05:03:22 PM 10032 0.05 0.04 0.00 0.09 0 cmaperfd
05:03:22 PM 13153 0.05 0.02 0.00 0.07 6 DiskGroupAgent
05:03:22 PM 13151 0.05 0.02 0.00 0.07 6 ApplicationAgen
05:03:22 PM 13163 0.05 0.02 0.00 0.07 1 VolumeAgent
05:03:22 PM 13155 0.05 0.02 0.00 0.07 1 IPAgent
05:03:22 PM 13162 0.05 0.02 0.00 0.07 1 HostMonitor
05:03:22 PM 13159 0.05 0.02 0.00 0.07 0 NICAgent
For comparison, top output taken at the same time:
top - 17:03:49 up 276 days, 5:14, 3 users, load average: 0.12, 0.16, 0.14
Tasks: 876 total, 1 running, 874 sleeping, 1 stopped, 0 zombie
Cpu(s): 5.5%us, 0.6%sy, 0.0%ni, 93.7%id, 0.3%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 32622384k total, 32039952k used, 582432k free, 62948k buffers
Swap: 10485756k total, 47584k used, 10438172k free, 28074096k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6514 oracle 20 0 16.3g 8.4g 8.4g S 9.7 27.1 16:27.29 oracle
8160 oracle 20 0 16.3g 1.2g 1.2g S 7.8 3.8 48:46.99 oracle
82 root 20 0 0 0 0 S 1.9 0.0 393:05.88 kblockd/0
6500 oracle -2 0 16.3g 16m 14m S 1.9 0.1 19:08.42 oracle
13153 root 20 0 15956 3080 1832 S 1.9 0.0 285:33.33 DiskGroupAgent
19553 root 20 0 15708 1780 836 R 1.9 0.0 0:00.01 top
22461 oracle 20 0 16.3g 5.3g 5.3g S 1.9 17.0 1:58.89 oracle
1 root 20 0 19356 1060 852 S 0.0 0.0 2:25.99 init
The example below logs pidstat output every 10 minutes (600-second intervals):
pidstat -u 600 >/var/log/pidstats.log & disown $!
==============================
Script to get ongoing CPU usage
==============================
Make the script run every 5 minutes
crontab -e
*/5 * * * * /home/alec/workarea/DB/get_cpu_usage.sh
or
0,5,10,15,20,25,30,35,40,45,50,55 * * * * /home/alec/workarea/DB/get_cpu_usage.sh
less get_cpu_usage.sh
#!/bin/bash
export HOME_DIR=/starhome/iu/scripts
export TOP_LOG=$HOME_DIR/top_trace.log
export PROCESSES_LOG=$HOME_DIR/sorted_processes.log
export VMSTAT_LOG=$HOME_DIR/vmstat.log
export line_delimiter='================================='
export SLEEP_SEC=10
touch $TOP_LOG
touch $PROCESSES_LOG
touch $VMSTAT_LOG
vmstat 2 2 | head -2 > $VMSTAT_LOG
while true ; do
    echo $line_delimiter >> $TOP_LOG
    date >> $TOP_LOG
    echo $line_delimiter >> $TOP_LOG
    # -n 1 makes batch-mode top print a single iteration and exit
    top -b -n 1 | head -20 | grep -v Swap | grep -v Mem | grep -v ^$ | grep -v "load average" >> $TOP_LOG
    echo $line_delimiter >> $PROCESSES_LOG
    date >> $PROCESSES_LOG
    echo $line_delimiter >> $PROCESSES_LOG
    ps axo pid,etime,%cpu,%mem,cmd,args | sort -rn -k3 | head -10 >> $PROCESSES_LOG
    echo $line_delimiter >> $VMSTAT_LOG
    date >> $VMSTAT_LOG
    echo $line_delimiter >> $VMSTAT_LOG
    vmstat 2 2 | tail -1 >> $VMSTAT_LOG
    #ps -eo pid,ppid,rss,size,vsize,pcpu,pmem,cmd -ww --sort=vsize >> sorted_processes.log
    sleep $SLEEP_SEC
    #------------------------------------------
    # prevent the log files from growing forever
    #------------------------------------------
    mv -f `find $HOME_DIR -name "sorted_processes.log" -size +1000000k` $HOME_DIR/sorted_processes.log_bak 2> /dev/null
    mv -f `find $HOME_DIR -name "top_trace.log" -size +1000000k` $HOME_DIR/top_trace.log_bak 2> /dev/null
    mv -f `find $HOME_DIR -name "vmstat.log" -size +1000000k` $HOME_DIR/vmstat.log_bak 2> /dev/null
done
==============================
vmstat
==============================
vmstat is another option; it is not very detailed, as it only reports system-wide figures.
Example:
>% vmstat 2 6
procs -----------memory------------ ---swap-- -----io---- --system--- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
2 1 783628 182264 121292 12727872 0 0 54 74 0 0 7 6 85 2 0
32 1 783628 190512 121372 12728092 0 0 80 1736 1231 194318 11 4 78 7 0
45 4 783628 191120 121380 12728304 0 0 64 1944 1253 189737 16 1 77 6 0
76 1 783628 191500 121420 12728572 0 0 56 1764 1215 190488 8 1 85 7 0
33 1 783628 192172 121436 12728800 0 0 80 1578 1192 193625 7 1 85 7 0
95 2 783628 191472 121456 12728488 0 0 156 2280 1185 194941 29 3 62 5 0
vmstat 2 6
means: sample at a 2-second interval, 6 times.
The first line shows averages since the last reboot.
Subsequent lines show averages over the 2-second sampling period.
All of the CPU metrics are expressed as percentages, so the CPU values (us + sy + id + wa + st) always sum to 100.
us: User %
sy: System %
id: Idle %
wa: Wait %. Waiting for I/O and other external resources.
st: Steal Time. Only relevant for VMs:
the percentage of time a virtual CPU waits for a real CPU while the hypervisor is servicing another virtual processor.
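The sum-to-100 property can be checked on any captured vmstat data line (us, sy, id, wa, st are the last five columns):

```shell
# Verify us+sy+id+wa+st == 100 for a sample vmstat line.
printf '2 1 783628 182264 121292 12727872 0 0 54 74 0 0 7 6 85 2 0\n' | \
  awk '{ print $(NF-4) + $(NF-3) + $(NF-2) + $(NF-1) + $NF }'
# 100
```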
Sample output:
top_trace.log
=================================
Mon Apr 25 07:35:04 CDT 2016
=================================
Tasks: 551 total, 1 running, 550 sleeping, 0 stopped, 0 zombie
Cpu(s): 7.3%us, 5.8%sy, 0.0%ni, 84.7%id, 2.1%wa, 0.0%hi, 0.0%si, 0.0%st
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1492 iu 22 0 12.1g 4.2g 14m S 17.2 21.5 28455:25 java
17274 oracle 18 0 4316m 2.0g 2.0g D 3.8 10.4 1486:48 oracle
1490 iu 15 0 76968 1452 980 S 1.9 0.0 117:20.61 wrapper
15234 iu 15 0 13024 1348 724 R 1.9 0.0 0:00.02 top
17270 oracle 17 0 4320m 2.0g 2.0g D 1.9 10.4 1475:24 oracle
1 root 15 0 10368 572 536 S 0.0 0.0 51:56.89 init
2 root RT -5 0 0 0 S 0.0 0.0 18:03.14 migration/0
3 root 34 19 0 0 0 S 0.0 0.0 0:25.14 ksoftirqd/0
4 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/0
5 root RT -5 0 0 0 S 0.0 0.0 12:50.20 migration/1
6 root 34 19 0 0 0 S 0.0 0.0 0:10.18 ksoftirqd/1
7 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/1
8 root RT -5 0 0 0 S 0.0 0.0 15:52.97 migration/2
sorted_processes.log
=================================
Mon Apr 25 07:35:05 CDT 2016
=================================
21565 3-04:46:25 49.3 7.6 oracleigt (LOCAL=NO) oracleigt (LOCAL=NO)
9799 2-21:42:45 41.0 9.2 ora_j000_igt ora_j000_igt
1492 136-23:46:08 14.4 21.5 /software/java/x64/jdk1.7.0 /software/java/x64/jdk1.7.0_13/bin/java -Dcom.sun.management.jmxremote.port=1098 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dstarhome.file=activemq.xml -XX:NewSize=200m -XX:SurvivorRatio=6 -XX:MaxTenuringThreshold=4 -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseCMSInitiatingOccupancyOnly -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=256m -DserviceName=sparx -Xms4096m -Xmx4096m
7340 10-05:03:12 9.2 8.0 oracleigt (LOCAL=NO) oracleigt (LOCAL=NO)
10393 8-23:34:59 8.3 8.6 oracleigt (LOCAL=NO) oracleigt (LOCAL=NO)
22114 16-11:36:18 6.5 8.5 oracleigt (LOCAL=NO) oracleigt (LOCAL=NO)
22086 16-11:36:18 6.3 8.3 oracleigt (LOCAL=NO) oracleigt (LOCAL=NO)
1356 14-04:42:59 6.2 8.3 oracleigt (LOCAL=NO) oracleigt (LOCAL=NO)
13442 01:29 6.2 1.0 ora_p001_igt ora_p001_igt
22118 16-11:36:18 6.1 8.6 oracleigt (LOCAL=NO) oracleigt (LOCAL=NO)