The second major form of logging is writing data to one or more tabularized files, also known as plottable files. Data associated with the core subsystems goes to a file with the extension tab, while detail data associated with devices such as CPUs, disks, and networks goes to one of several other files.
The biggest benefit of raw files is that they are very lightweight to create, since no additional processing is performed on the data. Because they contain the unaltered /proc data from which collectl derives the numbers it reports, it is always possible to go back and look at the original data. In some cases the raw file contains data that was easier to collect than to ignore, and in these situations one can actually see more data than is normally available. In fact, the --grep switch is available for searching the raw files and prefacing the matches with timestamps, something the standard grep command cannot do.
As their type implies, plottable files have their data in a form that is ready to be plotted with tools like gnuplot, or loaded directly into a spreadsheet like OpenOffice or Excel or any other tool that can read space-separated data. When collectl generates these files while it is running, the data can be read as it is being written, making real-time monitoring and display possible. For situations where a tool requires data to be delimited by something other than spaces, one can change the data separator with --sep. And where a tool such as rrd requires the date to be in UTC format, you can change the timestamp format using --utc.
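As a rough illustration of reading space-separated data of this kind, here is a minimal Python sketch. The sample text, its column names, and the helper `read_plottable` are all hypothetical; the sketch only assumes that comment lines start with `#` and that the last such line before the data names the columns. Pass a different `sep` if the separator was changed with --sep.

```python
import io

# Hypothetical sample: the column names and values are invented for
# illustration and are not taken from a real collectl file.
SAMPLE = """\
# hypothetical header comment
#Date Time User% Sys%
20110221 10:00:00 12 3
20110221 10:00:10 15 4
"""

def read_plottable(stream, sep=" "):
    """Return (column_names, rows) from separator-delimited text.

    Assumes lines starting with '#' are comments and that the last
    comment line seen before a data line names the columns.
    """
    columns = []
    rows = []
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # Remember the most recent comment line as the column header.
            columns = line.lstrip("#").strip().split(sep)
            continue
        rows.append(dict(zip(columns, line.split(sep))))
    return columns, rows

columns, rows = read_plottable(io.StringIO(SAMPLE))
print(columns)
print(rows[0]["User%"])
```

Values are kept as strings here; a real consumer would convert the numeric columns before plotting.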
While large files are nothing new to collectl, playing them back, either to drill into the data or simply to generate plot files, can become very expensive in terms of time and CPU load. In some extreme cases it can take tens of minutes to process a single large raw file, and even in normal cases it will take multiple minutes. Having collectl write to 2 separate files doesn't add any overhead or disk space, but it can significantly reduce the playback time when you are not interested in slab or process data, which is often the case during initial analysis. As a data point, on my development system single compressed collectl logs are on the order of 35MB. When using the -G switch, collectl generated a pair of files in which the process/slab data is about 34MB and the file with the rest of the data is only 1MB. This makes the raw file where all the subsystem details are stored very efficient to process in playback mode, taking about a minute compared to 5 minutes when that file includes slab and process data.
For most users this is all you need to know. On the other hand, if you want to use collectl to feed data to other tools, or perhaps log to both raw and plot files at the same time, read on...
The main benefit in requesting collectl to write its data in plottable form is that data becomes available for immediate plotting without any post-processing required, the one expense being some additional processing overhead. However there are a few potential limits in doing so that should be understood.
First and foremost, once a plottable file has been created the original data from which it was created is lost forever. In many cases that is fine as many users feel there is really no need to go back to the original source. However, one often collects summary data because that is what they are interested in, but then later decides they want to look at the details. This can be easily done by just replaying the raw file and requesting details be displayed or (re)written to a plottable file. If the raw file had not been generated, this option would not be possible.
A second limitation with plottable data files is that one cannot easily examine the data by timeframes and when there are multiple data files involved, it is not easy to look at all the data together as time-oriented samples without plotting it. It is always possible to write a script that merges this data together, but that functionality is natively built into collectl when used in playback mode.
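To give a sense of what such a merging script involves, here is a minimal Python sketch that joins rows from two sources on their timestamp. The function `merge_by_timestamp`, the `Date` and `Time` keys, and the sample values are hypothetical, and the rows are assumed to have already been parsed into dictionaries.

```python
def merge_by_timestamp(*tables):
    """Merge lists of row dicts on (Date, Time), sorted by timestamp.

    Assumes dates look like YYYYMMDD and times like HH:MM:SS, so that
    plain string ordering matches chronological ordering.
    """
    merged = {}
    for table in tables:
        for row in table:
            key = (row["Date"], row["Time"])
            # Create the combined row on first sight, then add columns.
            merged.setdefault(key, {}).update(row)
    return [merged[key] for key in sorted(merged)]

# Hypothetical rows, as if parsed from two different plottable files.
cpu_rows = [
    {"Date": "20110221", "Time": "10:00:00", "User%": "12"},
    {"Date": "20110221", "Time": "10:00:10", "User%": "15"},
]
disk_rows = [
    {"Date": "20110221", "Time": "10:00:10", "Reads": "7"},
    {"Date": "20110221", "Time": "10:00:00", "Reads": "5"},
]

merged_rows = merge_by_timestamp(cpu_rows, disk_rows)
for row in merged_rows:
    print(row)
```

A sample that appears in only one file simply yields a row with fewer columns, which a real script would need to decide how to handle.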
Finally, there are times when one might wish to go back and look at non-normalized data. For example, if 3 processes are created over a 10 second period, collectl will report a rate of 0 process creations/second because it rounds down. The only way to see what really happened is to play the data back with -on, which tells collectl not to normalize the data, so it reports the value of the counter rather than its rate.
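The arithmetic behind the 3 processes in 10 seconds example can be sketched in a few lines of Python. The function `normalized_rate` is hypothetical and simply truncates the per-second rate to an integer, which is an assumption about how the rounding down works rather than collectl's actual code.

```python
def normalized_rate(counter_delta, interval_seconds):
    """Per-second rate truncated to an integer (assumed rounding)."""
    return int(counter_delta / interval_seconds)

created = 3    # processes created during the interval
interval = 10  # length of the interval in seconds

# Normalized view: 3 / 10 = 0.3, which truncates to 0 per second.
print(normalized_rate(created, interval))

# Non-normalized view: the change in the counter itself.
print(created)
```

The normalized view hides the activity entirely, while the counter value shows that 3 processes were in fact created.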
In most cases none of these restrictions should be a concern, but there may be occasions in which they are and that is where the --rawtoo switch comes in. When specified in conjunction with -P, collectl will generate raw data in addition to the plottable data, making it possible to go back to the source if/when necessary. The only real overhead is the amount of disk space required since the raw data is already sitting in a buffer and ready to be written. If the plottable files are being generated in uncompressed format, the size of the compressed raw file becomes even less significant.
If you are a little confused, and you probably should be, try experimenting with various combinations of switches and see which files get generated.
|updated Feb 21, 2011|