Collectl

Latest Version: 3.7.3 April 1 2014

 
I've decided to stop building rpms and srpms since they're now available via various distros
If you can't find one, just unpack the tarball and run INSTALL

I've also decided to deprecate native lustre support in collectl, as Peter Piela has graciously volunteered to take over support for lustre V1.8 and beyond by writing a custom plugin, which he will support once it's available. If you'd like to help with testing, feel free to contact him directly.

There are many times when you find yourself needing performance data. These can include benchmarking, monitoring a system's general health or trying to determine what your system was doing at some time in the past. Sometimes you just want to know what the system is doing right now. Depending on what you're doing, you often end up using different tools, each designed for that specific situation.

Unlike most monitoring tools, which either focus on a small set of statistics, format their output in only one way, or run either interactively or as a daemon but not both, collectl tries to do it all. You can choose to monitor any of a broad set of subsystems, which currently include buddyinfo, cpu, disk, inodes, infiniband, lustre, memory, network, nfs, processes, quadrics, slabs, sockets and tcp.

The following is an example taken while writing a large file and running the collectl command with no arguments. By default it shows cpu, network and disk stats in brief format. The key point of this format is all output appears on a single line making it much easier to spot spikes or other anomalies in the output:

[mjs@poker] collectl

#<--------CPU--------><-----------Disks-----------><-----------Network---------->
#cpu sys inter  ctxsw KBRead  Reads  KBWrit Writes netKBi pkt-in  netKBo pkt-out
  37  37   382    188      0      0   27144    254     45     68       3      21
  25  25   366    180     20      4   31280    296      0      1       0       0
  25  25   368    183      0      0   31720    275      2     20       0       1
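
As a rough illustration of why one sample per line makes anomalies easy to spot, the sketch below parses the brief-format output shown above and flags samples that stand well above a column's average. It is written in Python purely for illustration: the names parse_brief and spikes and the 1.5x threshold are assumptions made for this example and are not part of collectl.

```python
# Sample brief-format lines copied from the output above.
SAMPLE = """\
#<--------CPU--------><-----------Disks-----------><-----------Network---------->
#cpu sys inter  ctxsw KBRead  Reads  KBWrit Writes netKBi pkt-in  netKBo pkt-out
  37  37   382    188      0      0   27144    254     45     68       3      21
  25  25   366    180     20      4   31280    296      0      1       0       0
  25  25   368    183      0      0   31720    275      2     20       0       1
"""


def parse_brief(text):
    """Return one dict per sample line, keyed by the column names in the header."""
    columns = None
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("#"):
            # The banner line contains '<'; the other header line names the columns.
            if "<" not in line:
                columns = line.lstrip("#").split()
            continue
        # This sample holds plain integers only; values such as "2G" would need
        # extra handling that this sketch does not attempt.
        values = [int(v) for v in line.split()]
        rows.append(dict(zip(columns, values)))
    return rows


def spikes(rows, column, factor=1.5):
    """Indices of samples whose value exceeds factor times the column mean.

    The factor is a hypothetical threshold chosen for this example.
    """
    mean = sum(r[column] for r in rows) / len(rows)
    return [i for i, r in enumerate(rows) if r[column] > factor * mean]


rows = parse_brief(SAMPLE)
print(spikes(rows, "netKBi"))  # prints [0]: the first sample's 45 KB stands out
```

Because every sample is a single line, the same few lines of parsing work for any mix of subsystems, as long as the values are plain numbers.
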
In this example, taken while writing to an NFS mounted filesystem, collectl displays interrupts, memory usage and nfs activity with timestamps. Keep in mind that you can mix and match any data and in the case of brief format you simply need to have a window wide enough to accommodate your output.
[mjs@poker] collectl -sjmf -oT

#         <-------Int--------><-----------Memory-----------><------NFS Totals------>
#Time     Cpu0 Cpu1 Cpu2 Cpu3 Free Buff Cach Inac Slab  Map  Reads Writes Meta Comm
08:36:52  1001   66    0    0   2G 201M 609M 363M 219M 106M      0      0    5    0
08:36:53   999 1657    0    0   2G 201M   1G 918M 252M 106M      0  12622    0    2
08:36:54  1001 7488    0    0   1G 201M   1G   1G 286M 106M      0  20147    0    2
You can also display the same information in verbose format, in which case you get a single line for each type of data at the expense of more screen real estate, as can be seen in this example of network data during NFS writes. Note how you can actually see the network traffic stall while waiting for the server to physically write the data.
[mjs@poker] collectl -sn --verbose -oT

# NETWORK SUMMARY (/sec)
#          KBIn  PktIn SizeIn  MultI   CmpI  ErrIn  KBOut PktOut  SizeO   CmpO ErrOut
08:46:35   3255  41000     81      0      0      0 112015  78837   1454      0      0
08:46:36      0      9     70      0      0      0     29     25   1174      0      0
08:46:37      0      2     70      0      0      0      0      2    134      0      0
This last example shows detail format, which produces multiple lines of output for a particular type of data, in this case interrupts. We've also elected to show the time in msecs.
[mjs@poker] collectl -sJ -oTm

#              Int    Cpu0   Cpu1   Cpu2   Cpu3   Type            Device(s)
08:52:32.002   225       0      4      0      0   IO-APIC-level   ioc0
08:52:32.002   000    1000      0      0      0   IO-APIC-edge    timer
08:52:32.002   014       0      0     18      0   IO-APIC-edge    ide0
08:52:32.002   090       0      0      0  15461   IO-APIC-level   eth1
Output can also be saved in a rolling set of logs for later playback or displayed interactively in a variety of formats. If all that isn't enough, there are additional mechanisms for supplying data to external tools by generating output as s-expressions, a format of choice for some tools such as supermon, or in another format called list-expressions. This output can be written to a file or sent over a socket. Collectl can even send data to ganglia or, through collectl's API, collect almost any type of data of your choice AND do it efficiently. You can even create files in space-separated format for plotting with external packages like gnuplot. The one below was created with colplot, part of the collectl utilities project, which provides a web-based interface to gnuplot.

Collectl runs on all Linux distros (it's included as part of Fedora) and only requires Perl. If the Perl Time::HiRes module is installed, you will be able to use fractional intervals and display timestamps in msecs. If the Compress::Zlib module is installed, the recorded data will be compressed and will therefore use on average 90% less storage when recording to a file. Also note that the above links are not for RPMs. If you'd rather work with RPMs, there are far too many versions out there to link to, so I'm sorry to say you're on your own.

Did you know there was an inconsistency in the way Linux reported disk metrics that wasn't even noticed/fixed until the 2.6-14 kernel was released? Collectl did. Or how about the fact that network stats may not be accurately reported by most network monitoring tools at one-second intervals? See this page for a description of the problem and how you can get more accurate stats by simply running collectl at a sub-second interval.
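
To give a feel for how a one-second sampling interval can misreport network rates, here is a toy model in Python. It assumes, purely for illustration, that the kernel refreshes its network counters slightly more often than once per second; the 0.9765-second period and the function name updates_per_sample are assumptions for this sketch and are not taken from this page. Under that assumption a one-second sampler occasionally catches two refreshes in a single sample and reports roughly double the true rate, while a sampler whose interval matches the refresh period always catches exactly one.

```python
def updates_per_sample(sample_interval, refresh_period, samples):
    """Count how many counter refreshes land in each sampling interval.

    Times are integers in units of 0.1 ms so the arithmetic is exact.
    """
    counts = []
    for n in range(1, samples + 1):
        seen_now = (n * sample_interval) // refresh_period
        seen_before = ((n - 1) * sample_interval) // refresh_period
        counts.append(seen_now - seen_before)
    return counts


REFRESH = 9765  # assumed refresh period of 0.9765 s (illustrative only)

one_second = updates_per_sample(10000, REFRESH, 50)   # sample every 1.0 s
matched = updates_per_sample(REFRESH, REFRESH, 50)    # sample every 0.9765 s

print("1.0 s sampling, refreshes per sample:", sorted(set(one_second)))
print("matched sampling, refreshes per sample:", sorted(set(matched)))
```

In this model, a sample that contains two refreshes would show up as a spike that never happened on the wire, which is the kind of artifact a sub-second interval is meant to avoid.
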

If you're still not sure whether collectl is right for you, take a couple of minutes to look at the tutorial to get a better feel for what collectl can do. Also be sure to check back and see what's new on the website, sign up for a Mailing List or watch the Forums.

"I absolutely love it and have been using it extensively for months."

Kevin Closson: Performance Architect, Oracle Corporation

"Collectl is indispensable to any system admin."

Matt Heaton: President, Bluehost.com