Collectl

SourceForge.net Logo
 

Latest Version: 2.6.2 Apr 29, 2008

 

Home | Architecture | Features | Documentation | Releases | FAQ | Support
Support for Lustre 1.6.4.3 and OFED 1.3

There are a number of times in which you find yourself needing performance data. These can include benchmarking, monitoring a system's general heath or trying to determine what your system was doing at some time in the past. Sometimes you just want to know what the system is doing right now. Depending on what you're doing, you often end up using different tools, each designed to for that specific situation.

Unlike most monitoring tools that either focus on a small set of statistics, format their output in only one way, run either interatively or as a daemon but not both, collectl tries to do it all. You can choose to monitor any of a broad set of subsystems which currently include cpu, disk, inodes, infiniband, lustre, memory, network, nfs, processes, quadrics, slabs, sockets and tcp. The following is an example of simply running the collectl command with no arguments and using its default settings. Below we see what the cpu, network and disk were doing while writing a large file:

#<--------CPU--------><-----------Disks-----------><-----------Network---------->
#cpu sys inter  ctxsw KBRead  Reads  KBWrit Writes netKBi pkt-in  netKBo pkt-out
  37  37   382    188      0      0   27144    254     45     68       3      21
  25  25   366    180     20      4   31280    296      0      1       0       0
  25  25   368    183      0      0   31720    275      2     20       0       1
Output can also be saved in a rolling set of logs for later playback or displayed interactively in a variety of formats. If all that isn't enough there are additional mechanisms for supplying data to external tools via a socket interface or by generating its output as s-expressions, a format of choice for some tools such as supermon. You can even create files in space-separated formats for plotting with external packages like the one below which was done with gnuplot using 1 second samples.

Collectl runs on all linux distros and only requires perl. If the perl Time::Hires module is installed, you will be able to use fractional intervals and display timestamps in msecs. If the Compress::Zlib module is installed the recorded data will be compressed and therefore use on average 90% less storage when recording to a file. Also note that the above links are not for RPMs. If you'd rather work with RPMs there are far too many versions out there to link to and so I'm sorry to say you're on your own.

Did you know there was an inconsistency in the way Linux reported disk metrics that wasn't even noticed/fixed until the 2.6-14 kernel was released? Collectl did. Or how about the fact that network stats may not accurately reported by most network monitoring tools at one second intervals? See this page for a description of the problem and how you can get more accurate stats by simply running collectl at a sub-second interval.

If you're still not sure if collectl is right for you, take a couple in minutes to look at the tutorial to get a better feel for what collectl can do. Also be sure to check back and see what's new on the website, sign up for a Mailing List or watch the Forums.