
Wednesday, April 27, 2016

Linux CPU Usage

==============================
Commands to get top CPU usage
==============================
ps
top
/proc filesystem
sar

pidstat - reports per-process CPU usage
iostat  - reports CPU, disk I/O, and NFS statistics
vmstat  - reports virtual memory statistics
mpstat  - reports per-processor statistics

URL Examples for iostat, vmstat, mpstat
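As a quick illustration, mpstat from the sysstat package can sample every core at an interval; for example, 3 samples at 2-second intervals:

mpstat -P ALL 2 3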

==============================
Get number of CPUs
==============================
less /proc/cpuinfo | grep processor
processor       : 0
processor       : 1
processor       : 2
processor       : 3
processor       : 4
processor       : 5
processor       : 6
processor       : 7
processor       : 8
processor       : 9
processor       : 10
processor       : 11
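To get just the count of logical CPUs, either of these should work on most systems:

grep -c ^processor /proc/cpuinfo
nproc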


==============================
ps
==============================

In ps output, CPU usage is reported as a percentage.
Per the Linux man page:
CPU usage is currently expressed as the percentage of time spent running during the entire lifetime of a process. This is not ideal, and it does not conform to the standards that ps otherwise conforms to. CPU usage is unlikely to add up to exactly 100%.

Generally, a process is in one of two states: using the CPU, or not using the CPU (waiting, sleeping).
ps divides the CPU time used by the process by the elapsed lifetime of the process, so the value is actually the process's average CPU usage over its lifetime.
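A minimal sanity check of that formula for a single process (PID 7991 from the output below is used as the example; the etimes and times format specifiers need a reasonably recent procps, older versions only provide etime/cputime):

read etimes cputime pcpu <<< "$(ps -o etimes=,times=,pcpu= -p 7991)"
awk -v c="$cputime" -v e="$etimes" -v p="$pcpu" 'BEGIN {printf "manual=%.1f%%  ps=%s%%\n", 100*c/e, p}'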

ps -eo pcpu,pid,user,args --sort pcpu | sort -nr | head -10
or
ps -eo pcpu,pid,user,args | sort -k 1 -r | head -10
%CPU   PID USER     COMMAND
98.1  7991 iu       vip_monitor --PROCESSNAME=MEX-MOVSP-99_MON_01 --EVENTNAME=MEX-MOVSP-99_MON_01_XEVT --CONFIG=VIP.INI NOT_CREATE_PROCESS
12.8 17084 oracle   oracleigt (LOCAL=NO)
12.2 17063 oracle   oracleigt (LOCAL=NO)
11.6 17049 oracle   oracleigt (LOCAL=NO)
11.1  9490 oracle   oracleigt (LOCAL=NO)
11.1  9488 oracle   oracleigt (LOCAL=NO)
11.1 17110 oracle   oracleigt (LOCAL=NO)
11.0  9494 oracle   oracleigt (LOCAL=NO)
11.0  9492 oracle   oracleigt (LOCAL=NO)
10.9  9496 oracle   oracleigt (LOCAL=NO)
10.7 17072 oracle   oracleigt (LOCAL=NO)

or

ps axo pid,etime,%cpu,%mem,cmd,args | sort -rn -k3 | head -10
23590       06:57  1.1  0.6 oracleigt (LOCAL=NO)        oracleigt (LOCAL=NO)
 6056       01:29  0.9  0.6 oracleigt (LOCAL=NO)        oracleigt (LOCAL=NO)
24489       06:52  0.9  0.6 oracleigt (LOCAL=NO)        oracleigt (LOCAL=NO)
24487       06:52  0.9  0.6 oracleigt (LOCAL=NO)        oracleigt (LOCAL=NO)
24485       06:52  0.9  0.6 oracleigt (LOCAL=NO)        oracleigt (LOCAL=NO)
24479       06:52  0.9  0.6 oracleigt (LOCAL=NO)        oracleigt (LOCAL=NO)
24477       06:52  0.9  0.6 oracleigt (LOCAL=NO)        oracleigt (LOCAL=NO)
24543 164-22:10:46 0.4  6.0 ora_dbw1_igt                ora_dbw1_igt
24531 164-22:10:46 0.4  0.2 ora_dia0_igt                ora_dia0_igt
 9123       00:23  0.3  0.0 oracleigt (LOCAL=NO)        oracleigt (LOCAL=NO)

==============================
top
==============================
top - sorting and display keys:
f        -> select which columns to display
Shift+F  -> select the column to sort by (e.g. by memory)

top: shows aggregate information for the whole system.
Pressing 1 inside top (or running top -1 on newer versions) shows the information per core.

top
Tasks: 511 total,   2 running, 509 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.0%us,  5.7%sy,  0.0%ni, 80.6%id, 11.7%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  20544592k total, 20505760k used,    38832k free,    21680k buffers
Swap: 10485752k total,  1009580k used,  9476172k free,  7699116k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7991 iu        25   0  104m  44m  10m R 100.0  0.2 959175:43 vip_monitor
 1492 iu        22   0 7317m 3.8g  14m S  4.0 19.2   8780:06 java
23021 oracle    18   0 4443m 852m 714m D  3.3  4.2   7:39.41 oracle
  650 root      10  -5     0    0    0 S  0.3  0.0 139:25.90 kswapd0
 3595 root      10  -5     0    0    0 S  0.3  0.0 110:34.73 kjournald
 5015 root      15   0  173m 9832 2648 S  0.3  0.0  88:54.38 snmpd
 5674 root      15   0  2384 1016  708 S  0.3  0.0  84:48.11 cmasm2d
13512 root      15   0 13024 1440  828 R  0.3  0.0   0:00.05 top
17270 oracle    16   0 4316m 2.1g 2.0g D  0.3 10.5   1193:53 oracle
17274 oracle    16   0 4318m 2.1g 2.0g D  0.3 10.5   1211:16 oracle
    1 root      15   0 10368  632  536 S  0.0  0.0  51:17.49 init
    2 root      RT  -5     0    0    0 S  0.0  0.0  15:21.84 migration/0
    3 root      34  19     0    0    0 S  0.0  0.0   0:24.12 ksoftirqd/0
    4 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/0
    5 root      RT  -5     0    0    0 S  0.0  0.0  11:48.20 migration/1
    6 root      34  19     0    0    0 S  0.0  0.0   0:08.78 ksoftirqd/1
    7 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/1
    8 root      RT  -5     0    0    0 S  0.0  0.0  13:28.13 migration/2
    9 root      34  19     0    0    0 S  0.0  0.0   0:31.18 ksoftirqd/2
   10 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/2

How can the %CPU of a single process be more than 100%?

Tasks: 510 total,  28 running, 482 sleeping,   0 stopped,   0 zombie
Cpu(s):  7.3%us,  5.8%sy,  0.0%ni, 84.7%id,  2.2%wa,  0.0%hi,  0.0%si,  0.0%st
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
1492  iu        22   0 12.5g 4.2g 14m  S 109.0 21.6  29469:56  java
11536 oracle    16   0 4302m 1.1g 1.1g R 63.3  5.7   3:15.06 oracle
11600 oracle    16   0 4303m 1.1g 1.1g R 63.3  5.8   2:14.94 oracle
11581 oracle    16   0 4302m 1.1g 1.1g R 51.0  5.6   2:06.25 oracle
11439 oracle    16   0 4303m 1.2g 1.1g R 47.5  5.9  10:54.44 oracle

%CPU -- CPU usage:

CPU data is displayed as percentages.
CPU time is divided between various states, which are shown in this line:

Cpu(s):  7.3%us,  5.8%sy,  0.0%ni, 84.7%id,  2.2%wa,  0.0%hi,  0.0%si,  0.0%st

%us - user space.
           Time spent running processes that do not belong to the kernel.

%sy - system.
           The percentage of time the CPU spent running kernel code.
           System CPU usage occurs when a user-space process needs something from the kernel, for example allocating memory, performing I/O, or creating a child process.
       
%ni - nice. The percent of time that the CPU spent running user space processes that have been niced.

%id - idle

%wa - wait. Idle while waiting for an I/O operation to complete.

%hi - hardware interrupts.

%si - software interrupts.

hi and si: How much time the processor has spent servicing interrupts. Hardware interrupts are physical interrupts sent to the CPU from various peripherals like disks and network interfaces. Software interrupts come from processes running on the system. A hardware interrupt will actually cause the CPU to stop what it is doing and go handle the interrupt. A software interrupt doesn't occur at the CPU level, but rather at the kernel level.

%st -  Applies to virtual machines. When Linux is running as a virtual machine on a hypervisor, the st (short for stolen) statistic shows how long the virtual CPU has spent waiting for the hypervisor to service another virtual CPU.
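To capture this line non-interactively (for example from a script), top can run in batch mode for a single iteration; the grep pattern below matches both the old "Cpu(s):" and the newer "%Cpu(s):" header:

top -b -n 1 | grep -i "cpu(s)"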

How can %CPU be more than 100%?
Consider the following top output:
1492  iu        22   0 12.5g 4.2g 14m  S 109.0 21.6  29469:56  java
By default, top displays a process's CPU usage as a percentage of a single CPU.
On multi-core systems, a multi-threaded process can therefore show more than 100%.

For example, if a process keeps 2 cores 60% busy each, top will show 120% CPU for it.
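To see which threads contribute to a figure above 100%, top can show the threads of a single process (PID 1492 from the example above). Pressing Shift+I inside top toggles Irix mode off, which divides %CPU by the number of CPUs so the value stays at or below 100%.

top -H -p 1492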

==============================
/proc filesystem
==============================
The proc filesystem is a pseudo-filesystem which provides an interface to kernel data structures.  
It is commonly mounted at /proc.

==============================
/proc/stat
==============================
less /proc/stat
cpu#        user   nice     system         idle     iowait    irq  softirq steal
---- ----------- ------ ---------- ------------ ---------- ------ -------- -----
cpu   8681569546  20781 7500186849 107708594984 2544771039 3432741 26310998 0
cpu0   359643227    164  214697909   7299502792   29611065       0   600384 0
cpu1   321834812    163  235978841   7256042542   88853474   27294  1318276 0
cpu2   405713055   6237  225353119   6707407683  559505025  652466  5417931 0
cpu3   305060789    206  241572665   7322828222   33928786   32563   632171 0
cpu4   383078927   4489  222956973   6934212094  359472763  658643  3671609 0
cpu5   315596685    252  265718076   7301472241   20771256   28023   468875 0
cpu6   377777117    498  230329707   7124598350  169031743  364325  1953733 0
cpu7   298895376    223  261328034   7326041683   17090250  312030   387802 0
cpu8   592757961    450  325305289   6957671678   27858570       1   461516 0
cpu9   442884180    123  378368398   6994452907   87811179    8977   529606 0
cpu10  624129254   3624  371815238   6324644659  582304175   25986  1132519 0
cpu11  552014661    881  499794403   6803694480   46959840  124518  1466579 0
cpu12  764377170   2026  461568339   6335395961  341717633   66748   927570 0
cpu13  706913799    345  759805737   6412366366   22464769  549796  1954558 0
cpu14 1157943414    849 1148640534   5449024929  145227184  318891  2899630 0
cpu15 1072949110    245 1656953579   5159238389   12163321  262472  2488231 0

intr 26803333079 1744487151 85 0 3 283 2 0 0 1 0 0 0 114 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1497978303 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1525495804 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1351727636 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1327471158 0 0 0 0 0 0 0 1990263721 0 0 0 0 0 0 0 1847687028 0 0 0 0 0 0 0 1788738681 0 0 0 0 0 0 0 1695860299 0 0 0 0 0 0 0 1717259594 0 0 0 0 0 0 0 1726428623 0 0 0 0
ctxt 696305326360
btime 1378880020
processes 1672102393
procs_running 4
procs_blocked 0

less /proc/stat  | grep cpu | awk '{print $1 " usage= " ($2+$4)*100/($2+$4+$5)"%"}'

cpu usage= 13.0613%
cpu0 usage= 7.29428%
cpu1 usage= 7.13873%
cpu2 usage= 8.59942%
cpu3 usage= 6.94624%
cpu4 usage= 8.03734%
cpu5 usage= 7.37443%
cpu6 usage= 7.86408%
cpu7 usage= 7.10376%
cpu8 usage= 11.6569%
cpu9 usage= 10.5077%
cpu10 usage= 13.6046%
cpu11 usage= 13.3894%
cpu12 usage= 16.2133%
cpu13 usage= 18.6153%
cpu14 usage= 29.7408%
cpu15 usage= 34.6033%

less /proc/stat  | grep cpu | awk '{print $1 " idle%= " ($5*100)/($2+$3+$4+$5+$6+$7+$8)"%"}'
cpu idle%= 85.1687%
cpu0 idle%= 92.3513%
cpu1 idle%= 91.8013%
cpu2 idle%= 84.8603%
cpu3 idle%= 92.6464%
cpu4 idle%= 87.7298%
cpu5 idle%= 92.3761%
cpu6 idle%= 90.1385%
cpu7 idle%= 92.6871%
cpu8 idle%= 88.0265%
cpu9 idle%= 88.4917%
cpu10 idle%= 80.0176%
cpu11 idle%= 86.0783%
cpu12 idle%= 80.1533%
cpu13 idle%= 81.1274%
cpu14 idle%= 68.9396%
cpu15 idle%= 65.2732%


The first line "cpu" aggregates the numbers in all of the other "cpuN" lines. 
Each cpuN line is for a different CPU processor.

The meanings of the cpu columns:
1st column : user = normal processes executing in user mode
2nd column : nice = niced processes executing in user mode
3rd column : system = processes executing in kernel mode
4th column : idle = idle
5th column : iowait = waiting for I/O to complete
6th column : irq = servicing interrupts
7th column : softirq = servicing softirqs
8th column : steal = time stolen by the hypervisor (the trailing column in the output above)

These numbers identify the amount of time the CPU has spent performing different kinds of work. Time units are in USER_HZ or Jiffies (typically hundredths of a second). 

These numbers are cumulative since system startup.
To get the average idle percentage: idle % = ( idle * 100 ) / ( user + nice + system + idle + iowait + irq + softirq )

The other lines (intr, ctxt, btime, processes, procs_running, procs_blocked) are not interesting here.
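Because these counters only grow, the one-liners above report averages since boot. For current usage, sample /proc/stat twice and compute over the delta; a minimal bash sketch for the aggregate cpu line (2-second window):

t0=($(grep '^cpu ' /proc/stat)); sleep 2; t1=($(grep '^cpu ' /proc/stat))
total0=0; total1=0
for v in "${t0[@]:1}"; do total0=$((total0+v)); done   # sum all time columns, first sample
for v in "${t1[@]:1}"; do total1=$((total1+v)); done   # sum all time columns, second sample
idle_d=$(( t1[4] - t0[4] )); total_d=$(( total1 - total0 ))
echo "CPU usage over the last 2s: $(( 100 * (total_d - idle_d) / total_d ))%"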

==============================
Get CPU processor(s) details.
==============================
less /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 44
model name      : Intel(R) Xeon(R) CPU           E5620  @ 2.40GHz
stepping        : 2
cpu MHz         : 2399.384
cache size      : 12288 KB
physical id     : 0
siblings        : 8
core id         : 0
cpu cores       : 4
apicid          : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx pdpe1gb rdtscp lm constant_tsc ida nonstop_tsc arat pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm
bogomips        : 4798.76
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: [8]

less /proc/cpuinfo | grep processor
processor       : 0
processor       : 1
processor       : 2
processor       : 3
processor       : 4
processor       : 5
processor       : 6
processor       : 7
processor       : 8
processor       : 9
processor       : 10
processor       : 11
processor       : 12
processor       : 13
processor       : 14
processor       : 15
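In the output above, siblings = 8 together with cpu cores = 4 means hyper-threading is enabled (2 threads per core), and 16 logical processors / 8 siblings per package = 2 physical sockets. The same can be derived directly:

grep 'physical id' /proc/cpuinfo | sort -u | wc -l    # number of physical sockets
grep 'cpu cores' /proc/cpuinfo | sort -u              # cores per socket
grep -c ^processor /proc/cpuinfo                      # logical processors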


==============================
sar
==============================
sar logs are written to the /var/log/sa/ directory.
These are binary files that can be viewed with the sar command.
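If sadf (also part of the sysstat package) is available, the binary files can be exported as plain text, for example the CPU report for day 04 in a parse-friendly format:

sadf -d /var/log/sa/sa04 -- -u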

To see today's data:
sar

To see the data for day 04 of the month:
sar -f /var/log/sa/sa04

Example:
>% cd /var/log/sa/
>% ls -ltr
-rw-r--r-- 1 root root 875204 Apr 26 23:53 sar26
-rw-r--r-- 1 root root 894192 Apr 27 23:50 sa27
-rw-r--r-- 1 root root 875204 Apr 27 23:53 sar27
-rw-r--r-- 1 root root 894192 Apr 28 23:50 sa28
-rw-r--r-- 1 root root 875206 Apr 28 23:53 sar28
-rw-r--r-- 1 root root 894192 Apr 29 23:50 sa29
-rw-r--r-- 1 root root 875206 Apr 29 23:53 sar29
-rw-r--r-- 1 root root 894192 Apr 30 23:50 sa30
-rw-r--r-- 1 root root 875206 Apr 30 23:53 sar30
-rw-r--r-- 1 root root 894192 May  1 23:50 sa01
-rw-r--r-- 1 root root 875206 May  1 23:53 sar01
-rw-r--r-- 1 root root 894192 May  2 23:50 sa02
-rw-r--r-- 1 root root 875215 May  2 23:53 sar02
-rw-r--r-- 1 root root 894192 May  3 23:50 sa03
-rw-r--r-- 1 root root 875205 May  3 23:53 sar03
-rw-r--r-- 1 root root 894192 May  4 23:50 sa04
-rw-r--r-- 1 root root 875206 May  4 23:53 sar04
-rw-r--r-- 1 root root 292016 May  5 07:40 sa05 

>% sar
Linux 2.6.18-274.el5 (my_server)  05/05/2016
12:00:01 AM       CPU     %user     %nice   %system   %iowait    %steal     %idle
12:10:01 AM       all     21.21      0.00      3.18      5.98      0.00     69.63
12:20:01 AM       all     16.86      0.00      2.68      6.24      0.00     74.21
12:30:01 AM       all     22.11      0.00      3.72      5.79      0.00     68.38
12:40:01 AM       all     17.10      0.00      2.61      6.17      0.00     74.12
12:50:01 AM       all     21.81      0.00      3.50      6.35      0.00     68.35
01:00:01 AM       all     15.50      0.00      2.57      6.24      0.00     75.69
01:10:01 AM       all     19.68      0.00      2.94      6.01      0.00     71.38
01:20:02 AM       all     17.21      0.00      2.59      6.09      0.00     74.11

sar -f sa04
Linux 2.6.18-274.el5 (my_server)  05/04/2016
12:00:01 AM       CPU     %user     %nice   %system   %iowait    %steal     %idle
12:10:01 AM       all     15.80      0.00      2.16      6.25      0.00     75.80
12:20:01 AM       all     14.11      0.00      2.14      6.29      0.00     77.46
12:30:01 AM       all     15.91      0.00      2.61      6.33      0.00     75.15
12:40:01 AM       all     15.67      0.00      2.27      6.24      0.00     75.82
12:50:01 AM       all     16.44      0.00      2.27      6.28      0.00     75.01
01:00:01 AM       all     15.51      0.00      2.28      6.45      0.00     75.75
01:10:01 AM       all     16.21      0.00      2.33      6.16      0.00     75.30

CPU usage per core
>% sar -P ALL | more
>% sar -f /var/log/sa/sa17 -P ALL

Linux 2.6.18-274.el5 (my_server)  05/05/2016

12:00:01 AM       CPU     %user     %nice   %system   %iowait    %steal     %idle
12:10:01 AM       all     21.21      0.00      3.18      5.98      0.00     69.63
12:10:01 AM         0     15.18      0.00      4.38      0.19      0.00     80.25
12:10:01 AM         1     16.67      0.00      5.33     29.82      0.00     48.17
12:10:01 AM         2     24.79      0.00      3.87      0.15      0.00     71.20
12:10:01 AM         3     12.03      0.00      3.45      0.98      0.00     83.55
12:10:01 AM         4     16.14      0.00      3.55      0.35      0.00     79.97
12:10:01 AM         5     14.43      0.00      3.59      0.96      0.00     81.02
12:10:01 AM         6     13.43      0.00      2.98      0.26      0.00     83.33
12:10:01 AM         7     34.18      0.00      4.01      0.70      0.00     61.10
12:10:01 AM         8     35.98      0.00      1.44      0.11      0.00     62.46
12:10:01 AM         9     16.99      0.00      2.34     58.52      0.00     22.15
12:10:01 AM        10     22.20      0.00      2.20      0.31      0.00     75.29
12:10:01 AM        11     34.14      0.00      2.44      0.91      0.00     62.51
12:10:01 AM        12     19.57      0.00      2.24      0.26      0.00     77.93
12:10:01 AM        13     25.10      0.00      2.89      0.64      0.00     71.37
12:10:01 AM        14     19.29      0.00      2.30      0.29      0.00     78.12
12:10:01 AM        15     19.29      0.00      3.81      1.23      0.00     75.67
12:20:01 AM       all     16.86      0.00      2.68      6.24      0.00     74.21
12:20:01 AM         0     12.43      0.00      4.36      0.08      0.00     83.13
12:20:01 AM         1     10.29      0.00      4.85     23.72      0.00     61.13
12:20:01 AM         2     13.89      0.00      3.49      3.13      0.00     79.49
12:20:01 AM         3      8.23      0.00      2.95      0.51      0.00     88.31
12:20:01 AM         4     12.44      0.00      3.83      3.10      0.00     80.62
12:20:01 AM         5      8.45      0.00      2.92      0.73      0.00     87.90
12:20:01 AM         6     10.47      0.00      2.87      0.20      0.00     86.47
12:20:01 AM         7      4.65      0.00      1.77      0.93      0.00     92.66
12:20:01 AM         8     10.15      0.00      1.40      0.07      0.00     88.38
12:20:01 AM         9      7.75      0.00      1.76     65.39      0.00     25.10
12:20:01 AM        10     12.92      0.00      1.76      0.33      0.00     84.99
12:20:01 AM        11     46.04      0.00      1.56      0.42      0.00     51.98
12:20:01 AM        12     13.37      0.00      1.67      0.24      0.00     84.73
12:20:01 AM        13     20.81      0.00      2.56      0.62      0.00     76.01
12:20:01 AM        14     20.57      0.00      4.11      0.20      0.00     75.12
12:20:01 AM        15     57.35      0.00      1.10      0.23      0.00     41.32

12:30:01 AM       all     22.11      0.00      3.72      5.79      0.00     68.38

CPU usage for a specific core.
In this example: sample core 9 every 3 seconds, for 4 intervals.
>% sar  -P 9 3 4 | more
Linux 2.6.18-274.el5 (my_server)  05/05/2016

08:32:02 AM       CPU     %user     %nice   %system   %iowait    %steal     %idle
08:32:05 AM         9      0.33      0.00      0.00      0.00      0.00     99.67
08:32:08 AM         9      0.00      0.00      0.00      0.00      0.00    100.00
08:32:11 AM         9      0.00      0.00      0.33      0.00      0.00     99.67
08:32:14 AM         9      0.33      0.00      0.00      0.00      0.00     99.67
Average:            9      0.17      0.00      0.08      0.00      0.00     99.75
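sar can also sample live instead of reading a daily file; for example, overall CPU utilization every 3 seconds, 4 times:

>% sar -u 3 4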




==============================
vmstat
==============================
Another option, not very detailed.
Example:

>% vmstat 2 6
procs -----------memory------------ ---swap-- -----io---- --system--- -----cpu------
 r  b   swpd   free   buff    cache   si   so    bi    bo   in     cs us sy id wa st
 2  1 783628 182264 121292 12727872    0    0    54    74    0      0  7  6 85  2  0
32  1 783628 190512 121372 12728092    0    0    80  1736 1231 194318 11  4 78  7  0
45  4 783628 191120 121380 12728304    0    0    64  1944 1253 189737 16  1 77  6  0
76  1 783628 191500 121420 12728572    0    0    56  1764 1215 190488  8  1 85  7  0
33  1 783628 192172 121436 12728800    0    0    80  1578 1192 193625  7  1 85  7  0
95  2 783628 191472 121456 12728488    0    0   156  2280 1185 194941 29  3 62  5  0

vmstat 2 6
means: run at a 2-second interval, 6 times.
The first line reports averages since the last reboot.
Subsequent lines report averages over each 2-second sampling period.

Output:
All of the CPU metrics are expressed as percentages,
so the CPU columns (us + sy + id + wa + st) always sum to 100.

us: User %. Time spent running user-space (non-kernel) code.
sy: System %. Time spent running kernel code.
id: Idle %.
wa: Wait %. Idle while waiting for I/O and other external resources.
st: Steal time. Only relevant for VMs.
     The percentage of time a virtual CPU waits for a real CPU while the hypervisor is servicing another virtual processor.
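A small sketch to average the CPU columns over a run, skipping the headers and the first since-boot sample (the column positions assume the layout shown above):

vmstat 2 6 | awk 'NR>3 {us+=$13; sy+=$14; id+=$15; wa+=$16; n++} END {if (n) printf "avg us=%.1f sy=%.1f id=%.1f wa=%.1f\n", us/n, sy/n, id/n, wa/n}'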


==============================
pidstat
==============================
This command reports per-process activity.
With the -u flag and an interval, it logs CPU usage continuously.
Looking at the data, the %CPU values appear to be relative to a single CPU, and not aggregated across CPUs as in top.

pidstat | sort -ur -k4 | head -10

05:03:22 PM       PID    %usr %system  %guest    %CPU   CPU  Command
Linux 2.6.32-504.8.1.el6.x86_64 (deu-via-5-aps-10)      11/06/2016      _x86_64_        (12 CPU)
05:03:22 PM     13157    0.06    0.02    0.00    0.07     6  MountAgent
05:03:22 PM     10032    0.05    0.04    0.00    0.09     0  cmaperfd
05:03:22 PM     13153    0.05    0.02    0.00    0.07     6  DiskGroupAgent
05:03:22 PM     13151    0.05    0.02    0.00    0.07     6  ApplicationAgen
05:03:22 PM     13163    0.05    0.02    0.00    0.07     1  VolumeAgent
05:03:22 PM     13155    0.05    0.02    0.00    0.07     1  IPAgent
05:03:22 PM     13162    0.05    0.02    0.00    0.07     1  HostMonitor
05:03:22 PM     13159    0.05    0.02    0.00    0.07     0  NICAgent


top - 17:03:49 up 276 days,  5:14,  3 users,  load average: 0.12, 0.16, 0.14
Tasks: 876 total,   1 running, 874 sleeping,   1 stopped,   0 zombie
Cpu(s):  5.5%us,  0.6%sy,  0.0%ni, 93.7%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  32622384k total, 32039952k used,   582432k free,    62948k buffers
Swap: 10485756k total,    47584k used, 10438172k free, 28074096k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 6514 oracle    20   0 16.3g 8.4g 8.4g S  9.7 27.1  16:27.29 oracle
 8160 oracle    20   0 16.3g 1.2g 1.2g S  7.8  3.8  48:46.99 oracle
   82 root      20   0     0    0    0 S  1.9  0.0 393:05.88 kblockd/0
 6500 oracle    -2   0 16.3g  16m  14m S  1.9  0.1  19:08.42 oracle
13153 root      20   0 15956 3080 1832 S  1.9  0.0 285:33.33 DiskGroupAgent
19553 root      20   0 15708 1780  836 R  1.9  0.0   0:00.01 top
22461 oracle    20   0 16.3g 5.3g 5.3g S  1.9 17.0   1:58.89 oracle

    1 root      20   0 19356 1060  852 S  0.0  0.0   2:25.99 init


The example below logs pidstat output every 10 minutes (600 seconds):
pidstat -u 600 >/var/log/pidstats.log & disown $!
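To follow one specific process instead of the top consumers, pidstat also accepts a PID; for example, CPU usage of PID 1492 every 5 seconds until interrupted:

pidstat -u -p 1492 5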

==============================
Script to get ongoing CPU usage
==============================
Make the script run every 5 minutes

crontab -e
*/5 * * * * /home/alec/workarea/DB/get_cpu_usage.sh
or
0,5,10,15,20,25,30,35,40,45,50,55 * * * * /home/alec/workarea/DB/get_cpu_usage.sh

less get_cpu_usage.sh


#!/bin/bash

export HOME_DIR=/starhome/iu/scripts
export TOP_LOG=$HOME_DIR/top_trace.log
export PROCESSES_LOG=$HOME_DIR/sorted_processes.log
export VMSTAT_LOG=$HOME_DIR/vmstat.log
export line_delimiter='================================='
export SLEEP_SEC=10

touch $TOP_LOG
touch $PROCESSES_LOG
touch $VMSTAT_LOG
vmstat 2 2 | head -2 > $VMSTAT_LOG

while true ; do
  echo $line_delimiter >> $TOP_LOG
  date >> $TOP_LOG
  echo $line_delimiter >> $TOP_LOG
  top -b -n 1 | head -20 | grep -v Swap | grep -v Mem | grep -v ^$ | grep -v "load average" >> $TOP_LOG

  echo $line_delimiter >> $PROCESSES_LOG
  date >> $PROCESSES_LOG
  echo $line_delimiter >> $PROCESSES_LOG
  ps axo pid,etime,%cpu,%mem,cmd,args | sort -rn -k3 | head -10 >> $PROCESSES_LOG

  echo $line_delimiter >> $VMSTAT_LOG
  date >> $VMSTAT_LOG
  echo $line_delimiter >> $VMSTAT_LOG
  vmstat 2 2 | tail -1 >> $VMSTAT_LOG


  #ps -eo pid,ppid,rss,size,vsize,pcpu,pmem,cmd -ww --sort=vsize >> sorted_processes.log

  sleep $SLEEP_SEC

  #------------------------------------------
  #prevent the log files from growing forever
  #------------------------------------------
  mv -f `find $HOME_DIR -name "sorted_processes.log" -size +1000000k` $HOME_DIR/sorted_processes.log_bak 2> /dev/null
  mv -f `find $HOME_DIR -name "top_trace.log" -size +1000000k` $HOME_DIR/top_trace.log_bak 2> /dev/null
  mv -f `find $HOME_DIR -name "vmstat.log" -size +1000000k` $HOME_DIR/vmstat.log_bak 2> /dev/null


done
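Since the script itself loops forever with a 10-second sleep, an alternative to the cron entry is to start it once in the background; a sketch (the pid file path is just an example):

nohup /home/alec/workarea/DB/get_cpu_usage.sh >/dev/null 2>&1 &
echo $! > /tmp/get_cpu_usage.pid     # remember the PID so the collector can be stopped later
kill "$(cat /tmp/get_cpu_usage.pid)" # stop the collector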


Sample output:

top_trace.log
=================================
Mon Apr 25 07:35:04 CDT 2016
=================================
Tasks: 551 total,   1 running, 550 sleeping,   0 stopped,   0 zombie
Cpu(s):  7.3%us,  5.8%sy,  0.0%ni, 84.7%id,  2.1%wa,  0.0%hi,  0.0%si,  0.0%st
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1492 iu        22   0 12.1g 4.2g  14m S 17.2 21.5  28455:25 java
17274 oracle    18   0 4316m 2.0g 2.0g D  3.8 10.4   1486:48 oracle
 1490 iu        15   0 76968 1452  980 S  1.9  0.0 117:20.61 wrapper
15234 iu        15   0 13024 1348  724 R  1.9  0.0   0:00.02 top
17270 oracle    17   0 4320m 2.0g 2.0g D  1.9 10.4   1475:24 oracle
    1 root      15   0 10368  572  536 S  0.0  0.0  51:56.89 init
    2 root      RT  -5     0    0    0 S  0.0  0.0  18:03.14 migration/0
    3 root      34  19     0    0    0 S  0.0  0.0   0:25.14 ksoftirqd/0
    4 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/0
    5 root      RT  -5     0    0    0 S  0.0  0.0  12:50.20 migration/1
    6 root      34  19     0    0    0 S  0.0  0.0   0:10.18 ksoftirqd/1
    7 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/1
    8 root      RT  -5     0    0    0 S  0.0  0.0  15:52.97 migration/2


sorted_processes.log
=================================
Mon Apr 25 07:35:05 CDT 2016
=================================
21565  3-04:46:25 49.3  7.6 oracleigt (LOCAL=NO)        oracleigt (LOCAL=NO)
 9799  2-21:42:45 41.0  9.2 ora_j000_igt                ora_j000_igt
 1492 136-23:46:08 14.4 21.5 /software/java/x64/jdk1.7.0 /software/java/x64/jdk1.7.0_13/bin/java -Dcom.sun.management.jmxremote.port=1098 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dstarhome.file=activemq.xml -XX:NewSize=200m -XX:SurvivorRatio=6 -XX:MaxTenuringThreshold=4 -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseCMSInitiatingOccupancyOnly -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=256m -DserviceName=sparx -Xms4096m -Xmx4096m 
 7340 10-05:03:12  9.2  8.0 oracleigt (LOCAL=NO)        oracleigt (LOCAL=NO)
10393  8-23:34:59  8.3  8.6 oracleigt (LOCAL=NO)        oracleigt (LOCAL=NO)
22114 16-11:36:18  6.5  8.5 oracleigt (LOCAL=NO)        oracleigt (LOCAL=NO)
22086 16-11:36:18  6.3  8.3 oracleigt (LOCAL=NO)        oracleigt (LOCAL=NO)
 1356 14-04:42:59  6.2  8.3 oracleigt (LOCAL=NO)        oracleigt (LOCAL=NO)
13442       01:29  6.2  1.0 ora_p001_igt                ora_p001_igt
22118 16-11:36:18  6.1  8.6 oracleigt (LOCAL=NO)        oracleigt (LOCAL=NO)
