The following are descriptions of just some of collectl's many features and are
not intended as a tutorial on how and when to apply them:
Fine Grained, Non-Drifting Monitoring
If the Time::HiRes Perl module is installed, which you can verify with the
command collectl -v, you will be able to run with non-integral
sampling intervals. Whether integral or fractional, sample times will align
very closely to the whole second and will not drift the way they do with
just about every other tool. You will also be able to use -om to
have all times reported in msec.
Low Overhead
Collectl uses very little CPU. In fact, it has been measured to use <0.1% when run
as a daemon with the default sampling intervals of 60 seconds for process and slab
data and 10 seconds for everything else. The overhead can increase on systems that
have dozens of disks or run hundreds of processes, so you may want to measure its
effect on your own system if you are concerned.
Summary vs Detail Data
You can report aggregated performance numbers for many devices such as CPUs, disks,
interconnects such as Infiniband or Quadrics, networks and even the
Lustre file system.
However, you can also report on individual devices if you want to see how the aggregate
load is being generated. Be sure to see the documentation
for more details, particularly the examples and tutorials in the Getting Started guide.
Brief vs Verbose Format
It is often more useful to see less data but for more devices, and collectl recognizes
this by providing brief format as the default interactive display format. This
allows you to see what a number of subsystems are doing on a single line, making it
much easier to spot inconsistencies in the data by scanning a column of numbers. If
you want more detail and are willing to look at multiple lines per sample, verbose
format is what you want. In fact, if you record collectl data you can play
it back in both formats, first brief to look for problems and then again in
verbose to see more of what is happening. The same technique applies to
summary and detail data as well.
Although you can also display interactive output in plot format, this is really intended
for its namesake, plotting. By generating output (or simply playing back recorded data)
in this format, you can then feed the resultant files into plotting tools that recognize
delimiter-separated fields, such as gnuplot,
Excel or even OpenOffice.
The default field separator is a space; if you need something different, you can change
it via the --sep switch.
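Plot-format files are straightforward to consume from scripts. Below is a minimal sketch of a reader for this style of output, assuming a space-separated file whose last '#'-prefixed line names the columns; the column names shown in the sample are hypothetical, and the exact header layout varies by subsystem, so check your own files.

```python
def read_plot_file(text, sep=" "):
    """Parse collectl-style plot output: '#'-prefixed header lines naming
    the columns, followed by one sample per line. Returns a list of dicts
    mapping column name to the (string) value for each sample."""
    rows = []
    header = None
    for line in text.strip().splitlines():
        if line.startswith("#"):
            # The last header line before the data names the columns.
            header = line.lstrip("# ").split(sep)
            continue
        if header is None:
            continue  # data before any header; nothing to map it to
        rows.append(dict(zip(header, line.split(sep))))
    return rows

# Hypothetical two-sample excerpt for illustration only.
sample = """#Date Time [CPU]User% [CPU]Sys%
20240101 00:00:10 12 3
20240101 00:00:20 14 2"""
rows = read_plot_file(sample)
print(rows[0]["[CPU]User%"])  # → 12
```

From here the rows can be handed to any plotting library; passing a different `sep` mirrors what the --sep switch changes on the collectl side.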
Aligned Monitoring Intervals
If you've installed Time::HiRes and are using an integral sampling interval,
by default collectl will align its sampling on integral second boundaries.
In interactive mode samples will be taken as close as possible to
the nearest second and when run as a daemon, samples will align to the
top of the minute. In the latter case this means that if you're running on a cluster
with synchronized clocks, all instances of collectl will collect their
samples within a few msec of each other, making it much easier to correlate
events across the cluster.
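The no-drift behavior comes from sleeping until the next absolute interval boundary rather than sleeping for a fixed delay, so scheduling jitter never accumulates. A minimal sketch of the idea (not collectl's actual Perl implementation):

```python
import time

def aligned_sampling(interval, n_samples):
    """Take n_samples aligned to multiples of `interval` seconds since the
    epoch. Each sleep targets the next absolute boundary, so any wakeup
    jitter affects only that one sample and does not drift over time."""
    timestamps = []
    # First boundary strictly after "now".
    next_tick = (time.time() // interval + 1) * interval
    for _ in range(n_samples):
        delay = next_tick - time.time()
        if delay > 0:
            time.sleep(delay)
        timestamps.append(time.time())  # take the sample here
        next_tick += interval           # advance by an absolute step
    return timestamps
```

Had the loop used `time.sleep(interval)` instead, each iteration's overhead would be added on top of the interval and the sample times would slide later and later, which is exactly the drift the text describes in other tools.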
Process and Slab Monitoring
Both of these are higher-overhead activities, but collectl provides a secondary
interval so they can be gathered at a lower frequency. In addition, one can
specify a number of filters to select processes by pid, parent, owner, or even name,
and can also request that process threads be included. As for slabs, here too one can
filter by name, and during interactive display one can request that only slabs whose
values have changed be displayed, significantly reducing the output and enhancing its
readability.
Collectl also has the ability to display process and slab data in a way similar to the
top and slabtop commands, each with a number of sorting options, including the
ability to sort processes by the top I/O users and slabs by the changes in allocated
memory, neither of which is currently available in other utilities.
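Sorting by "top I/O users" means ranking processes by how much I/O they did between two samples, not by their lifetime totals. A small sketch of that ranking step, using made-up pids and counter values (collectl itself reads the real per-process counters):

```python
def top_io(prev, curr, n=3):
    """Rank processes by I/O performed between two samples.
    prev and curr map pid -> cumulative I/O bytes at each sample time;
    returns the n largest (pid, delta) pairs, busiest first."""
    deltas = {pid: curr[pid] - prev.get(pid, 0) for pid in curr}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Hypothetical cumulative counters from two consecutive samples.
prev = {101: 1_000, 102: 4_000}
curr = {101: 9_000, 102: 4_500, 103: 2_000}
print(top_io(prev, curr, 2))  # → [(101, 8000), (103, 2000)]
```

Note that pid 103, absent from the first sample, is still ranked: a newly started process gets credited with everything it has done so far.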
New to the 2.4.0 release is the monitoring of process I/O statistics; see
the documentation for more details.
Interrupt Reporting by CPU
New to the 2.5.0 release, you can now report interrupts at the CPU level and even
examine how they change in more detail at the individual interrupt level.
Socket Support
Rather than display its output on the terminal or write it to a file, collectl can
send its data over a socket as well, making it possible to integrate it with other
monitoring tools.
Exportable data formats
If you don't like the format of the data collectl presents, feel free to write your
own using --export. Several exporters come with collectl, for writing
S-Expressions and List Expressions, and even for exporting UDP data to ganglia.
In the case of the first two, the data can also be sent over the TCP socket interface.
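A consumer of the list-expression feed can stay very simple. The sketch below assumes the exported payload is a series of plain "name value" lines (the metric names shown are hypothetical); verify the exact format your collectl version emits before relying on it.

```python
def parse_lexpr(payload):
    """Parse list-expression style output: one 'name value' pair per line.
    Returns a dict mapping metric names to float values; lines that do not
    fit the pattern are skipped rather than treated as errors."""
    metrics = {}
    for line in payload.strip().splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # header, blank, or otherwise malformed line
        name, value = parts
        try:
            metrics[name] = float(value)
        except ValueError:
            pass  # value was not numeric; ignore it
    return metrics

# Hypothetical payload for illustration only.
sample = """cputotals.user 12
cputotals.sys 3
nettotals.kbin 1841.2"""
print(parse_lexpr(sample)["nettotals.kbin"])  # → 1841.2
```

The same parser works whether the payload arrived over the TCP socket interface or was read from a file, since it only cares about the line format.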
Tested at scale within the HP Public Cloud
Running on all servers at a monitoring frequency of 1 second, collectl has been validated to
run efficiently in some of the most demanding environments.
IPMI monitoring for fans and temperature sensors
The main reason this is experimental is that vendors all report IPMI sensor data
differently, even within their own product lines. This is an attempt to table-drive
the parsing of the data in such a way that collectl can rationalize its display.
API for importing additional data
If you've been a collectl user but wanted to import some additional data,
this is the way to go. This API provides
seamless, full-function integration into collectl! Your data can be displayed in brief, verbose
and detail formats. It can be written to plot files, accessed from multiple systems at the same time
with colmux, and even sent over a socket to external clients, all while appearing as a core
part of collectl.
Top-anything across a cluster
With the inclusion of colmux, you can now run collectl on multiple machines in a cluster at
one time and be presented with an integrated view from all nodes, sorted by the column
of your choice.