Exporting Data to Graphite

Introduction

With the release of collectl version 3.6.1, you can send collectl data directly to graphite. For existing collectl users, this provides yet another way to store and plot collectl data, whether on a single system or hundreds. For graphite users who are not yet collectl users, you now have access to literally hundreds of performance metrics.

Usage

You use this export like any other. The only required option is the address to send the data to, as in the following example:

collectl --export graphite,192.168.1.113
However, because this export by design produces no terminal output, there are only two real ways to make sure it is doing what you expect. One is to inspect graphite's whisper storage area for your particular host name and confirm that the data you are collecting is in fact showing up there:
ls /opt/graphite/storage/whisper/poker
cpuload  cputotals  ctxint  disktotals  nettotals
The other is to run with the debug mask set to 1 (d=1), which tells the graphite module to echo all the data it is sending to graphite. Note that in this example, even though collectl is collecting cpu, disk and network data, only the disk and network data (s=dn) is sent to graphite. You might do this when logging more data to disk than you are sending to graphite, as is the case here:
collectl --export graphite,192.168.1.113,d=1,s=dn --rawtoo -f /var/log/collectl
poker.disktotals.reads 0 1325609789
poker.disktotals.readkbs 0 1325609789
poker.disktotals.writes 0 1325609789
poker.disktotals.writekbs 0 1325609789
poker.nettotals.kbin 0 1325609789
poker.nettotals.pktin 1 1325609789
poker.nettotals.kbout 0 1325609789
poker.nettotals.pktout 0 1325609789
Tip: if you add 8 to the debug flag, e.g. d=9, the graphite module does not actually establish the connection with graphite's carbon listener but only echoes the data that would have been sent.
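When checking debug output by script rather than by eye, it can help to split each echoed line into its parts. The following is a minimal sketch, not part of collectl: it assumes every line has the form `host.metric value timestamp` as in the sample output above, and that the host name itself contains no dots. The names `Sample` and `parse_line` are hypothetical.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    """One line of debug output (hypothetical helper type)."""
    host: str
    metric: str
    value: float
    timestamp: int


def parse_line(line: str) -> Sample:
    """Split 'host.metric value timestamp' into its fields.

    Assumes the host name contains no dots, so the first dot
    separates the host from the metric name.
    """
    path, value, timestamp = line.split()
    host, metric = path.split(".", 1)
    return Sample(host, metric, float(value), int(timestamp))


sample = parse_line("poker.nettotals.pktin 1 1325609789")
print(sample.host, sample.metric, sample.value, sample.timestamp)
# prints: poker nettotals.pktin 1.0 1325609789
```

If your host names include a domain, this split would put only the first label in `host`, which is one reason you might want the e= switch described below.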

Once you're happy with the switch settings, be sure to update the DaemonCommands in /etc/collectl.conf and restart the collectl daemon to make them take effect.

Switches unique to graphite

e=escape
When sending data to graphite, collectl prefaces each line item with the hostname. If that name includes a domain name, the extra dots add additional levels to the variable names, which may not be desirable. If you specify an escape character, those dots will be replaced by that character.
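To illustrate the effect of the escape character, here is a minimal sketch of the substitution described above. It is not collectl's own code; the function name `metric_prefix` and the host name `poker.example.com` are made up for the example.

```python
def metric_prefix(hostname: str, escape: str = "") -> str:
    """Return the prefix placed in front of each metric name.

    With no escape character the hostname is used unchanged, so each
    dot in it becomes an extra level in graphite's metric tree.
    """
    if escape:
        return hostname.replace(".", escape)
    return hostname


print(metric_prefix("poker.example.com") + ".cputotals.user")
# prints: poker.example.com.cputotals.user
print(metric_prefix("poker.example.com", "_") + ".cputotals.user")
# prints: poker_example_com.cputotals.user
```

In the first case graphite would file the data under poker, then example, then com; in the second it stays under a single poker_example_com node.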

r=seconds
By design, collectl calls the export module as soon as the required data has been collected, and collection is synchronized to the nearest millisecond across a cluster. This means all instances of collectl will send their data to graphite at almost exactly the same time. This burst of data can overwhelm graphite. To reduce the load when that is found to be a problem, or if you just want to smooth out the load, you can use r=seconds, which means delay sending your data to graphite by a random amount of time, chosen at microsecond granularity, of no more than seconds.

There is an additional caveat: this stall must have completed by the end of the current data collection period, so you're restricted to a maximum delay of the interval less 1 second. This means that if you run collectl with -i1, you can't use r=. However, since most users run collectl with intervals of 5 or 10 seconds, values of 4 or 9 should be more than sufficient. And if you choose a collection interval of 30 seconds, you may still want to use a value of r closer to 5 or 10 seconds so that the data will arrive at graphite reasonably close together.
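The constraint on r can be sketched as follows. This is an illustration of the rule stated above, not collectl's implementation; the function name `send_delay` is hypothetical, and drawing the delay uniformly between 0 and r is an assumption.

```python
import random


def send_delay(r: float, interval: float) -> float:
    """Pick a random delay in seconds, no larger than r.

    The delay must finish before the next collection starts, so r may
    be at most the interval less 1 second. With a 1 second interval
    that maximum is 0, which is why r cannot be used with -i1.
    """
    max_r = interval - 1
    if r <= 0 or r > max_r:
        raise ValueError(f"r must be > 0 and <= {max_r} for interval {interval}")
    # Assumption: a uniform draw; the source only says "random".
    return random.uniform(0, r)


delay = send_delay(4, interval=5)   # r=4 with -i5 is allowed
print(0 <= delay <= 4)
# prints: True
```

Calling `send_delay(1, interval=1)` or `send_delay(5, interval=5)` raises ValueError in this sketch, mirroring the restriction described above.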

To see what other switches are valid, you can get the graphite module itself to tell you, like this:

collectl --export graphite,h

Communications

Collectl will attempt to establish a TCP connection to the specified address and port; the default port is 2003. If that connection cannot be established, collectl will report an error but not exit! This is because graphite itself may be down and need to be restarted.

collectl --export graphite,192.168.1.113,d=1,s=dn
Could not create socket to 192.168.1.113:2003.  Reason: Connection refused
By design, collectl assumes the graphite address is correct and will try to reconnect every monitoring interval. Further, to avoid generating too many errors, it will silently continue to retry and only report the connection failure every 100 times, a constant you can modify in the graphite.ph header if you really care. Once graphite comes back online, collectl will again start sending data to it.
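The retry behavior described above can be sketched roughly as follows. This is not the code in graphite.ph (which is Perl); the class name `CarbonSender`, its parameters, and the choice to report the very first failure before going quiet are all assumptions made for illustration. The connect step is injectable so the logic can be tried without a running graphite.

```python
import socket


class CarbonSender:
    """Hypothetical sender that retries the connection once per interval."""

    def __init__(self, host, port=2003, report_every=100, connect=None):
        self.host = host
        self.port = port
        self.report_every = report_every
        # Default: a plain TCP connection to the carbon listener.
        self._connect = connect or (
            lambda: socket.create_connection((host, port), timeout=5))
        self.sock = None
        self.failures = 0
        self.messages = []   # error reports; a real tool would print or log them

    def send(self, lines):
        """Try to deliver one interval's lines; return True on success."""
        if self.sock is None:
            try:
                self.sock = self._connect()
                self.failures = 0
            except OSError as err:
                # Report the first failure, then only every report_every-th.
                if self.failures % self.report_every == 0:
                    self.messages.append(
                        f"Could not create socket to {self.host}:{self.port}."
                        f"  Reason: {err}")
                self.failures += 1
                return False
        try:
            payload = "".join(line + "\n" for line in lines)
            self.sock.sendall(payload.encode("ascii"))
        except OSError:
            self.sock = None   # drop the connection; retry next interval
            return False
        return True
```

A caller would invoke `send()` once per monitoring interval. While graphite is down the call returns False and data for that interval is not delivered; once the connection succeeds again, sending resumes without a restart.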

updated November 9, 2012