Collectl |
||
Latest Version: 3.6.1 Feb 20 2012 |
||
![]() |
Graphite support, run as non-root daemon, more switches
There are a number of times in which you find yourself needing performance data. These can include benchmarking, monitoring a system's general heath or trying to determine what your system was doing at some time in the past. Sometimes you just want to know what the system is doing right now. Depending on what you're doing, you often end up using different tools, each designed to for that specific situation.
Unlike most monitoring tools that either focus on a small set of statistics, format their output in only one way, run either interatively or as a daemon but not both, collectl tries to do it all. You can choose to monitor any of a broad set of subsystems which currently include buddyinfo, cpu, disk, inodes, infiniband, lustre, memory, network, nfs, processes, quadrics, slabs, sockets and tcp.
The following is an example taken while writing a large file and running the collectl command with no arguments. By default it shows cpu, network and disk stats in brief format. The key point of this format is all output appears on a single line making it much easier to spot spikes or other anomalies in the output:
[mjs@poker] collectl #<--------CPU--------><-----------Disks-----------><-----------Network----------> #cpu sys inter ctxsw KBRead Reads KBWrit Writes netKBi pkt-in netKBo pkt-out 37 37 382 188 0 0 27144 254 45 68 3 21 25 25 366 180 20 4 31280 296 0 1 0 0 25 25 368 183 0 0 31720 275 2 20 0 1
[mjs@poker] collectl -sjmf -oT # <-------Int--------><-----------Memory-----------><------NFS Totals------> #Time Cpu0 Cpu1 Cpu2 Cpu3 Free Buff Cach Inac Slab Map Reads Writes Meta Comm 08:36:52 1001 66 0 0 2G 201M 609M 363M 219M 106M 0 0 5 0 08:36:53 999 1657 0 0 2G 201M 1G 918M 252M 106M 0 12622 0 2 08:36:54 1001 7488 0 0 1G 201M 1G 1G 286M 106M 0 20147 0 2
[mjs@poker] collectl -sn --verbose -oT # NETWORK SUMMARY (/sec) # KBIn PktIn SizeIn MultI CmpI ErrIn KBOut PktOut SizeO CmpO ErrOut 08:46:35 3255 41000 81 0 0 0 112015 78837 1454 0 0 08:46:36 0 9 70 0 0 0 29 25 1174 0 0 08:46:37 0 2 70 0 0 0 0 2 134 0 0
[mjs@poker] collectl -sJ -oTm # Int Cpu0 Cpu1 Cpu2 Cpu3 Type Device(s) 08:52:32.002 225 0 4 0 0 IO-APIC-level ioc0 08:52:32.002 000 1000 0 0 0 IO-APIC-edge timer 08:52:32.002 014 0 0 18 0 IO-APIC-edge ide0 08:52:32.002 090 0 0 0 15461 IO-APIC-level eth1

Collectl runs on all linux distros (it's included as part of Fedora) and only requires perl. If the perl Time::Hires module is installed, you will be able to use fractional intervals and display timestamps in msecs. If the Compress::Zlib module is installed the recorded data will be compressed and therefore use on average 90% less storage when recording to a file. Also note that the above links are not for RPMs. If you'd rather work with RPMs there are far too many versions out there to link to and so I'm sorry to say you're on your own.
Did you know there was an inconsistency in the way Linux reported disk metrics that wasn't even noticed/fixed until the 2.6-14 kernel was released? Collectl did. Or how about the fact that network stats may not accurately reported by most network monitoring tools at one second intervals? See this page for a description of the problem and how you can get more accurate stats by simply running collectl at a sub-second interval.
If you're still not sure if collectl is right for you, take a couple in minutes to look at the tutorial to get a better feel for what collectl can do. Also be sure to check back and see what's new on the website, sign up for a Mailing List or watch the Forums.
"I absolutely love it and have been using it extensively for months." |
| Kevin Closson: Performance Architect, Oracle Corporation |
"Collectl is indispensable to any system admin." |
| Matt Heaton: President, Bluehost.com |