Process Monitoring

Introduction

Collectl has the ability to monitor processes in pretty much the same way as ps or top do as can be see here:

# collectl -sZ
# PROCESS SUMMARY (faults are /sec)
# PID  User     PR  PPID S   VSZ   RSS  SysT  UsrT Pct  AccuTime MajF MinF Command
21502  root     15  1749 S    6M    2M  0.00  0.00   0   0:06.40    0    0 /usr/sbin/sshd
21504  root     15 21502 S    4M    1M  0.00  0.00   0   0:00.79    0    0 -bash
22984  root     15     1 S    7M    1M  0.00  0.00   0   0:00.78    0    0 cupsd
23073  apache   15  1914 S   18M    8M  0.00  0.00   0   0:00.01    0    0 /usr/sbin/httpd

You can select processes to monitor by pid, parent, owner or command name (see section on Filters below). When using names, you can use partial or full names or even use strings that were part of the command invocation string such as parameters. The main benefit of monitoring processes with collectl is that you can coordinate the sample times of process data with any of the other subsystems collectl can monitor.

The way you tell collectl to monitor processes is to specify the Z subsystem and any optional parameters with --procopts. Since monitoring processes is a heavier-weight function, it is recommended to use a different interval, which can be specified after the main monitoring interval separated by a colon. The default is 60 seconds. Therefore, to monitor all the processes once every 20 seconds and the rest of the parameters every 5 simply say:

collectl -sZ -i5:20

The biggest mistake people make when running this command interactively is to leave off the interval or specificy something like -i1 and not see any process data. That is because the default interval is 60 seconds and they just haven't waited long enough for the output! This should obvious since collectl will announce it is waiting for a 60 second sample.

There are also a few restrictions to the way these intervals are specified. The process interval must be a multiple of the main interval AND cannot be less than it. If you specify a process interval without a main interval, the main interval defaults to the process interval.

Finally, as with other data collected by collectl, you can play back process data by specifying -p. While not exactly plottable data, you can specify -P and the output will be written to a separate file as time stamped space delimited data, one process per line.

Options

As stated earlier, you can specify options specific to process monitoring. These apply to all forms of process output unless otherwise noted, including --top and --procanalyze

c: include cpu times of children who have exited with their parent
f: use cumulative totals for maj/min page faults
i: Show alternate format which includes all io counters
m: Show alternate format which includes all memory sizes
p: Never look for new pids or threads after startup (improves performance)
r: Only show root name of command, leaving off path
t: Look for thread for all processes
w: Include command arguments, making a wider display. Can be combined with r.
z: Only show processes with non-zero sort field. This only applies to --top.

Filters

You can tell collectl to monitor a subset of processes by using the --procfilt switch followed by one or more process selectors, separated by commas (see collectl -x to see a detailed list). For the most part, the use of filters is pretty straighforward in that if you want to see all processes whose parent is 1234 as well as those that contain http in their command name, you would specify a filter of --procfilt P1234,chttp. However there is one important distinction to keep in mind. The c prefix says to select on the command name only whereas the f prefix says to look at the entire command string including arguments. In other words, if you're editing the file abc and try to select it via --procfilt cabc you'll never see it. This is particularly annoying when monitoring perl scripts since the name of the command is perl and the name of the script shows up as an argument.

If a plus sign immediately follows a process selector any processes selected by it will have their threads monitored as well. See collectl -x or man collectl for more details.

Dynamic Process/Thread Monitoring

A unique feature of process monitoring is that processes specified with a selection list via --procfilt do not have to exist at the time collectl is run. In other words, collectl will continue to look for new processes that match this selection list during every collection cycle! While this is indeed a good thing if that is what you want to do, it does come with a price in overhead: not a lot, but overhead never-the-less.

If you do not want this effect and only want to look at those processes that match the selection list at the time collectl is started, specify --procopts p to suppress dynamic process discovery. This holds for process threads as well, suppressing looking for new ones.

Perhaps the best way to see this in effect is to run collectl with the following command:

collectl -i:.1 -sZ --procfilt fabc

The .1 for an interval is not a mistake. It is there to show that you can indeed use collectl to spot the appearance of short lived processes - just don't do it unless you really need to. The --procfilt switch is saying to look for any processes invoked with a command that contains the string 'abc' in it. When this command is invoked there shouldn't be any output unless someone IS running a command with 'abc' in it. Now go to a different window or terminal and edit the file abc with your favorite editor. You will immediately see collectl display process information for your editor and when you exit the editor the output will stop.

The Time Fields

The SysT and UsrT represent the system and user time the line item spent during the current interval. One might think this means that in a 60 second interval the most time a process could spend is 60 seconds. Not quite! If this is a multi-processor/multi-core system the process could actually spend up to 60 seconds on each core, so just be careful how the times are interpretted. The Pct field is the percentage of the current interval the process had consumed in system and user time, which can also exceed 100% in multi-processor situations. Finally, since the AccuTime field accumulates these times it can exceed the actual wall clock time.

When run in non-threaded mode, the times reported include all time consumed by all threads. When run in threaded mode, times are reported for indivual threads as well as the main process. In other words, if a process's only job is to start threads, it will typically show times of 0. If you rerun collectl in non-threaded mode you will see it report aggregated times.

Process Memory Utilization

The types of memory utilization displayed as part of the process monitoring output are the Virtual and Resident sizes. However there are additional type of memory that collectl tracks and to see them as well as page faults you can select alternate process display format as follows:

# collectl -sZ -i:1 --procopts m
# PID  User     S VmSize  VmLck  VmRSS VmData  VmStk  VmExe  VmLib MajF MinF Command
 9410  root     R 81760K      0 15828K 14132K    84K    16K  3620K    0   18 /usr/bin/perl
    1  root     S  4832K      0   556K   212K    84K    36K  1388K    0    0 init
    2  root     S      0      0      0      0      0      0      0    0    0 kthreadd
    3  root     S      0      0      0      0      0      0      0    0    0 migration/0

Process I/O Statistics

As of collectl Version 2.4.0, if process I/O stats have been built into the kernel collectl will add 2 additional columns to the process display named RKB and WKB, noting in the following example I've set the display interval to 1 second and removed the initialization message from the output. As with all fields reported as rates/sec these will show consistent values independent of the interval and if you want the unnormalized value be sure to include that option with the -o switch as -on.

# collectl -sZ -i:1
# PROCESS SUMMARY (faults are /sec)
# PID  User     PR  PPID S   VSZ   RSS  SysT  UsrT Pct  AccuTime  RKB  WKB MajF MinF Command
    1  root     20     0 S    4M  552K  0.00  0.00   0   0:00.68    0    0    0    0 init
    2  root     15     0 S     0     0  0.00  0.00   0   0:00.00    0    0    0    0 kthreadd
    3  root     RT     2 S     0     0  0.00  0.00   0   0:00.02    0    0    0    0 migration/0

A particularly useful feature I've found is monitoring one or more processes by name (you can also monitor by pid, ppid and uid) to see what they're doing. In this case I'm using the dt program to write a large file and telling collectl to display any process whose command string matches dt as well as to include time stamps.

# collectl -sZ -i:1 --procfilt cdt -oT
# PROCESS SUMMARY (faults are /sec)
#          PID  User     PR  PPID S   VSZ   RSS  SysT  UsrT Pct  AccuTime  RKB  WKB MajF MinF Command
09:01:03 13577  root     20 12775 R    1M    1M  0.04  0.00   4   0:01.92    0  16K    0    0 ./dt
09:01:04 13577  root     20 12775 D    1M    1M  0.40  0.00  40   0:02.32    0 118K    0    0 ./dt
09:01:05 13577  root     20 12775 D    1M    1M  0.24  0.00  24   0:02.56    0  65K    0    0 ./dt

Finally, note that there is more process I/O data available but I chose to leave it off the default display and instead have the following alternate format. This is the same methodology used for reporting process memory utilitation, namely you only see VSZ and RSS in the default display but much more with --procopts m. Also note in this caseI chose 1/2 second monitoring as well as showing time in msec resolution:

# collectl -sZ --procopts i -i:.5 --procfilt cdt -oTm
#              PID  User     S  SysT  UsrT   RKB   WKB  RKBC  WKBC  RSYS  WSYS  CNCL  Command
09:03:24.003 13614  root     D  0.12  0.00     0   32K     0   32K     0    64     0  ./dt
09:03:24.503 13614  root     D  0.14  0.00     0   32K     0   32K     0    64     0  ./dt
09:03:25.003 13614  root     R  0.10  0.00     0   24K     0   24K     0    48     0  ./dt

The --top Switch

A feature that has been in collectl for awhile has been the --top switch which generates data in a format similar to the linux top command, though it was limited to process data only. However, since the inclusion of process I/O statistics as well as the inclusion of a few additional handy switches, this command has become much more useful as explained below.

In its simplest form, this switch tells collectl to simply display the top consumers of cpu. However, as of collectl V2.6.4 you can now now tell it to optionally display the list sorted by I/O or page faults. Here I'm simply looking for the top processes sorted by page faults with the command collectl --top flt and the display fills my window, which in this case is only 10 lines high. To look at the top consumers of I/O, simply use --top io instead:

# PID  User     PR  PPID S   VSZ   RSS CP  SysT  UsrT Pct  AccuTime  RKB  WKB MajF MinF Command
 3009  root     20     1 S    2M  280K  3  0.00  0.00   0   0:43.01    0    0    0    8 irqbalance
 7144  root     20  6485 R   81M   15M  2  0.00  0.06   6   0:01.70    0    0    0    5 /usr/bin/perl
    1  root     20     0 S    4M  556K  2  0.00  0.00   0   0:03.60    0    0    0    0 init
    2  root     15     0 S     0     0  2  0.00  0.00   0   0:00.00    0    0    0    0 kthreadd
    3  root     RT     2 S     0     0  0  0.00  0.00   0   0:00.10    0    0    0    0 migration/0
    4  root     15     2 S     0     0  0  0.00  0.00   0   0:00.06    0    0    0    0 ksoftirqd/0
    5  root     RT     2 S     0     0  0  0.00  0.00   0   0:00.30    0    0    0    0 watchdog/0
    6  root     RT     2 S     0     0  1  0.00  0.00   0   0:00.08    0    0    0    0 migration/1
    7  root     15     2 S     0     0  1  0.00  0.00   0   0:00.02    0    0    0    0 ksoftirqd/1

As discussed earlier, threads can be considered for displaying with --procopts t which requests all selected processes be examined for threads. You can also specify a subset of processes be examined by specifying a + with --procfilt but that's getting into more advanced concepts. Fnally, in the spirit of saving screen real estate collectl doesn't include command arguments in the output but including --procopts w will request a wider display that does include them. In fact you can get an even narrow display by including --procopts which requests only a command's root name be displayed so in the example about we would see perl instead if /usr/bin/per.

The following 3 successive displays are the result of monitoring a processes named thread.pl which creates a couple of threads 10 seconds apart which then do some I/O. In the first display we see the main script, which is actually run under the perl interpretter and so the string thread does exist as part of the argument string, but I chose to leave it off the output to save screen real estate:

collectl --top io --procfilt fthread --procopt t
# PROCESS SUMMARY (faults are /sec) 06:57:42
# PID  User     PR  PPID S   VSZ   RSS CP  SysT  UsrT Pct  AccuTime  RKB  WKB MajF MinF CommandnF Command
 7024  root     20  6725 S   61M    2M  2  0.00  0.00   0   0:00.00    0    0    0    0 /usr/bin/perl

A few seconds later the first thread starts up and immediately goes to the top of the list since it does have the dominant I/O:

# PROCESS SUMMARY (faults are /sec) 06:57:52
# PID  User     PR  PPID S   VSZ   RSS CP  SysT  UsrT Pct  AccuTime  RKB  WKB MajF MinF Command
 7065+ root     20  6725 R   73M    5M  0  0.88  0.12 100   0:01.98    0 291K    0    0 thread.pl
 7064  root     20  6725 S   73M    5M  2  0.88  0.11  99   0:01.98    0    0    0    0 /usr/bin/perl

And still later the second thread shows up, it too having a higher sort order than the root script:

# PROCESS SUMMARY (faults are /sec) 06:58:02
# PID  User     PR  PPID S   VSZ   RSS CP  SysT  UsrT Pct  AccuTime  RKB  WKB MajF MinF Command
 7098+ root     20  6725 R   83M    8M  1  0.12  0.02  14   0:00.86    0  29K    0    0 thread.pl
 7096+ root     20  6725 R   83M    8M  0  0.16  0.00  16   0:04.24    0  27K    0    0 thread.pl
 7095  root     20  6725 S   83M    8M  2  0.28  0.02  30   0:05.13    0    0    0    0 /usr/bin/perl

Naturally, as with all other data in collectl, you can record it to a file and play it back later using various combinations of options with --procopts and even --top. If you are using --top collectl simply displays a blank line between intervals. Also don't forget to try different sort options and experioment with the number of lines per process interval you want to examine since the default is your screen height and may be too big for playback purposes.

Including non-process data
The native top can natively show other types of data besides the top processes and so can collectl. Just specify those subsystems you are interested in with -s and they will be displayed in a scrolling window above the process data - by scrolling multiple lines of data, you are able to see history, something the linux command cannot do. You may also want to include timestamps with the brief data by using -oT to make it easier to read.

But don't stop with brief data, you can even show verbose data as well. However in the case of multiple subsystems it just isn't practical to show scrolling history and so you will only see the latest sample. If you choose to show a single verbose subsystem you will see scrolled data.

Finally, if you want to customize the way the screen real-estate is allocated between the process and other data, you can change the size of the process section by including the number of lines to display as the second argument to --top. You can also control the size of the subsystem data with --hr lines, a synonym for --headerrpeat lines.

Experimental --export proctree

This is an experimental (meaning subject to change) alternate format for displaying process data. Rather than simply show processes in the defaut PID order or sorted by a particular field when using the --top format, this format displays processes in a parent/child relationship. As with all --export formats, one can use this interactively, when playing back data or to send the data over a socket when using --address. At the very least, this could offer a good starting point for developing your own alternative process output formats.

There are actually 2 main functional components to this format, the main one being to determine the parent child relationship between all processes (there IS some additional overhead involved here). A second function is the aggregation of various counters and meters.

Proctree can also be combined with --top to limit the number of processes display OR in playback mode with or without --top. Consider the following output when playing back a file with --top --export proctree:

#  PID       PPID User     PR S   VSZ   RSS CP  SysT  UsrT Pct  AccuTime MajF MinF Command
00001           0 root     15 S    2G  108M  0  0.03  0.03   0   0:18.21    0    0 init
 05535          1 root     15 R  106M   15M  2  0.01  0.03   0   0:07.68    0    0  /usr/bin/perl
 05452          1 haldaemo 15 S   85M    7M  1  0.02  0.00   0   0:06.99    0    0  hald
  05453      5452 root     15 S   55M    3M  0  0.02  0.00   0   0:05.42    0    0   hald-runner
   05474     5453 root     16 S    9M  652K  0  0.02  0.00   0   0:05.41    0    0    hald-addon-storage:

One can quickly see that the total CPU consumption for this monitoring interval is 0.03 of both system and user time by simply looking at root process 1. Furthermore, of the system time 0.01 is consumed by 05535 while the other 0.02 is consumed by one of the children of 05452, actually the grandchild 05474. One should also note that any processes with no CPU time will be excluded from the display to keep the output reasonable dense. Without --top all processes are shown.

One can also use most of the process options as well (see --showsubopts for the complete set).

Additional interactive --top options
Proctree was really developed for real-time display with --top and so there are more available options, the main one to consider is the suppression of fields with zero in them. In the previous example, fields with 0 CPU were suppressed because by default --top sorts by CPU (even though we're not sorting). If one were to choose a different sort field with --top, proctree will use that field to suppress entries with zero in them. In fact, there are a number of different switches one can select interactively, one of which is to change the suppression value from 0 to something else.

So let's take a closer look at running in interactive mode by typing the command

collectl --top --export proctree

and at some time after the first data screen is displayed, type RETURN. You will now see a menu like this:

Enter a command and RETURN while in display mode:
  pid    only display this pid and its children
  a      toggle aggregation between 'on' and 'off'
  dxx    change display hierarchy depth to xx
  i      change display format to 'I/O'
  k      toggle multiplication of I/O numbers by 1024 between 'on' and 'off'
  m      change display format to 'memory'
  p      change display format to 'process'
  h      show this menu
  stype  where 'type' is a valid sorting type (see --showtopopts)
         entries with 0s in those field(s) will be skipped
  wxx    max width for display of command arguments
  z      toggle 'skip' logic between 'on' and 'off'
  Zxx    when skipping, only keep entries with I/O fields > xxKB
Press RETURN to go back to display mode...

This list shows a number of commands which will change the display contents and/or format much in the way you can do with the standard linux top command. First type RETURN to get back into real-time display mode and then simply type a command at any time while collectl is running and the command will take effect on the next display cycle.

These commands fall into several categories, one being those that toggle behavior such as aggregation, multiplication and the skip logic. By default, all values are aggregated up through their parent hierarchy and typing the a command followed by a RETURN will turn this behavior off. Similarly, when the values of the I/O counters are too large to easily read you can force their division by 1K with the k command. And finally, you can disable the logic that skips zero-based entries with the z command. If you'd rather skip on some value other 0 you can set the skip value with Zxxx.

Look at the display line above the following process data:

Process Tree 09:06:03 [skip when 'time'<=0 is 'on' aggr: 'on' x1024: 'off' depth 5]
#  PID       PPID User     PR S   VSZ   RSS CP  SysT  UsrT Pct  AccuTime MajF MinF Command
00001           0 root     15 S  674M  272M  1  0.00  0.06   6   0:09.96    0    0 init
 01766          1 root     15 S   50M   24M  0  0.00  0.06   6   0:01.30    0    0  /usr/sbin/sshd
  02142      1766 root     15 S   25M   14M  1  0.00  0.06   6   0:00.88    0    0   /usr/sbin/sshd
   02144     2142 root     15 S   18M   12M  1  0.00  0.06   6   0:00.87    0    0    -bash
    02229    2144 root     19 R   14M   10M  0  0.00  0.06   6   0:00.84    0    0     /usr/bin/perl

Following the time field you can see what the toggle states are of the three fields as well as the skip value and display depth. By default you only see 5 levels of the process hierarchy but can change this with the d switch. For example d7 will set the depth to 7.

As with other process displays, you can also choose whether you want to see the default display, one that shows all I/O fields or one focused on memory using the p, i or m commands. You can easily switch between these formats at any time.

If you enter a number as a command, this is interpretted as a process PID and the display will be adjusted such that this becomes the first entry in the display. If you would like to skip on something other than the current field, you can easily change that with the s command immediately following by one of the sort field names listed with --showtop. Finally, if using the wide command option with --procopts w, long command string will cause wrapping and make the display unreadable. The w command can be used to set the maximum width of the command field.

As with other collectl options, there are simply far too many combinations to describe which are appropriate for a particular situation (such as using --procopt) so it is recommended you experiment to better understand the many capabilities of proctree.

Process Analysis

If you've run collectl as a daemon and collected process data, you now have a huge pile of data and up until version 2.6.5 it wasn't entirely clear what you could do with is short of writing some scripts to try and interpret it. Now you have an alternative and that's to playback one or more data files with the --procanalyze switch. When this switch is passed to collectl, it generates a process summary file with the extension of prcs in the same direcory as specified with the -f switch. This file will contain one line for each unique process and the fields will be separated by collectl's field separator which by default is a space but something you can also change with the --sep switch.

The fields themselves summarize all the key data elements associated with each process making it possible to see the process start/end times, cpu consumption, I/O (if the kernel supports I/O stats), page faults and even the ranges of the different types of memory consumed. And since the data elements are separated by a single character delimeter you can easily load the file into your favorite spreadsheet and perform deeper analysis (the data is actually not very user friendly as written).

It is also important to remember a couple of things:

Not all processes show up in the output if they were created and exited between samples
You can control the time frame that is analyzed by using the --from and -thru switches which can be a useful thing to do when you're interesed in a specific time period and don't want this file too cluttered
The --procanalyze overrides collectl's default behavior of processing all the data that has been collected and so will only generate the prcs file(s). If you want to generate plot data for other subsystems at the same time be sure to include then with -s.
By default, all data is written in space-separated format which is collectl's standard default. Since command arguments (if any) are also space-separated they will each show up in a different spreadsheet cell, which shouldn't be a big deal but if you want the entire command string together, you can always use --sep 9 to make the data tab-delimited or even choose a different separator.

Understanding Processing Overhead

This is intended to be a brief description of how process monitoring works with the hope that it will help one use the capability more efficiently and avoid unnecessary processing overhead. Normally the overhead is modest, but if you intend to run at higher monitoring rates or looking at threads it's worth reading further.

Collectl maintains 2 data structures that control monitoring: pids-to-monitor and pids-to-ignore. These lists are built at the time collectl starts, so if --procopts p is not specified, the effect is to execute a ps command and save all the pids in the pids-to-monitor list. If filters are specified with --procfilt, only those pids that match are placed in pids-to-monitor list and the rest placed in the pids-to-ignore list and so you can see that when filters are used there can be a significant reduction in overhead since collectl need not examine every processes data.

If collectl is only monitoring a specific set of processes, either because --procopts p was specified or procfilt was used and only specified specific pids (not ppids), on each monitoring pass collectl only looks at the pids in the to-be-monitored list. In other words, this is as efficient as it gets because it needn't look for processes if neither list, aka newly created processes.

If doing dynamic process monitoring, every monitoring pass collectl has to read all the pids in /proc to get a list of ALL current processes. While it ignores any in the do-not-monitor list, it must look at the rest. If any of these are in the to-be-monitored list and have had thread monitoring requested, additional work is required to see if any new threads have shown up. Any processes not in the to-be-monitored list are obviously NEW processes and must then be examined to see if they match any selection criteria and this involves reading the /proc/pid/stat file. That pid is then placed in one of the two lists. It should be understood that during any particular interval a lot of processes come and go, such as cat, ls, etc. However, these are short lived enough as to not even be seen by collectl, unless of course collectl is running at a very fine grained monitoring level.

Occasionally a process being monitored disappears because it had terminated. When this happens its pid is removed from the to-be-monitored list.

Finally, these data structures (and a couple of others that have not been described) need maintenance to keep them from growing. If the number of processes to monitor has been fixed, this maintenance is significantly reduced.

So the bottom line is if you have to use dynamic monitoring, try to bound the number of processes and/or threads. If you really need to see it all, don't be afraid to but just be mindful of the overhead. Collecting all process data with the default interval has been observed to take about 1 minute of CPU time, which is less than 0.1%, on a lightly loaded Proliant DL380, but that load will be higher with more active process.

RESTRICTIONS

You cannot specify --procfilt during playback mode. If you need to look at a subset of the data consider using a filter like grep.
Thread monitoring is limited to 2.6 kernels.
Process I/O monitoring is limited to kernels that have that capability enabled and that didn't even appear before 2.6.22. If you don't see the file /proc/self/io, your kernel was not built with process I/O accounting enabled and you need to get one that has the following enabled: CONFIG_TASKSTATS, CONFIG_TASK_XACCT and CONFIG_TASK_IO_ACCOUNTING.
There is a bug in the way the kernel reports I/O stats (see bugzilla 10702) such that when you exclude threads from the display the parent process I/O counts are not aggregated into the I/O counts from the threads resulting in misleading results. Andrea Righi has provided a patch here that provides the correct aggregation, but of course that will require a kernel rebuild. He has also informed me that his patch has been included in the 2.6.26 kernel so this should no longer be an issue from that version forward.

updated Dec 16, 2008