DaemonCommands = -f /var/log/collectl -r00:01,7 -m -F60 -s+CEYZ
./configure
make
make install
That's all it takes. However, collectl must be run as root and your system must support ipmi. The easiest way to tell is to check whether the command dmidecode | grep IPMI runs without error. If you get the error Could not open device at /dev/ipmi0... you are not running as root. If you get some other error, your system probably does not support ipmi, and even if you were able to install ipmitool you won't be able to use it.
The next step is to start the ipmi driver, and this is generally done via the command service ipmi start on a RedHat system or something like /etc/init.d/ipmi start on others. On some systems such as HP blades, you may need to install a custom ipmi driver such as hp-OpenIPMI and start that instead of the standard driver.
At this point you should be able to execute the command ipmitool sdr and see all your sensor data, or the commands ipmitool sdr type fan and ipmitool sdr type temp to see just fan and temperature data:
[root@bl460-63 ipmitool-1.8.9]# ipmitool sdr
UID Light        | 0 unspecified     | ok
Int. Health LED  | 0 unspecified     | ok
VRM 1            | 0 unspecified     | cr
VRM 2            | 0 unspecified     | cr
Temp 1           | 47 degrees C      | ok
Temp 2           | 34 degrees C      | ok
Temp 3           | 30 degrees C      | ok
Temp 4           | 30 degrees C      | ok
Temp 5           | 31 degrees C      | ok
Temp 6           | 30 degrees C      | ok
Temp 7           | 30 degrees C      | ok
Temp 8           | 66 degrees C      | ok
Temp 9           | 20 degrees C      | ok
Virtual Fan      | 37.24 unspecifi   | nc
Enclosure Status | 0 unspecified     | nc
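If you want to post-process this listing yourself, the three pipe-separated columns are easy to pull apart. The following is a minimal Python sketch, not part of collectl (which is written in perl); the function name parse_sdr is hypothetical, and it assumes every sensor line has exactly three columns as in the sample above.

```python
def parse_sdr(text):
    """Split `ipmitool sdr` output into (name, reading, status) tuples."""
    rows = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 3:
            continue  # skip the shell prompt, blank lines, anything else
        rows.append(tuple(parts))
    return rows


# Two lines taken from the sample output above.
sample = """\
Temp 1           | 47 degrees C      | ok
Virtual Fan      | 37.24 unspecifi   | nc
"""

for name, reading, status in parse_sdr(sample):
    print(f"{name}: {reading} ({status})")
```

Lines that do not split into three fields, such as the prompt line, are silently skipped.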
You can control the way ipmi data is displayed in playback mode using --envopts and one of 3 switches: these let you report only fan or only temperature data or, if you are reporting both (the default), request that the 2 types of data be displayed on separate lines. This last option can be useful if you have a lot of devices on which to report.
The following is an example of time-stamped output on an HP BL460c Blade, first without any options:
collectl.pl -sE -i::1 -oT
# ENVIRONMENTAL STATISTICS
#          VFan Temp1 Temp2 Temp3 Temp4 Temp5 Temp6 Temp7 Temp8 Temp9 Power
08:39:15 37.240    47    35    30    30    33    30    30    58    24   206
08:39:16 37.240    47    35    30    30    33    30    30    58    24   206
08:39:17 37.240    47    35    30    30    33    30    30    58    24   206
With --envopts M, fan and temperature data are displayed on separate lines:
collectl.pl -sE -i::1 -oT --envopts M
### RECORD 1 >>> opteron167 <<< (1218022891.002) (Wed Aug 6 07:41:31 2008) ###
# ENVIRONMENTAL STATISTICS
# CFAN1 CFAN2 CFAN3 CFAN4 CFAN5 CFAN6 CFAN7 CFAN8 CFAN9 CFAN10 SFAN1 SFAN2
   6200  6000  6200  6200  6200  5800  6200  6000  6200   6000  6000  6200
# CTEMP0 CTEMP1 STEMP
      51     48    29
### RECORD 2 >>> opteron167 <<< (1218022892.002) (Wed Aug 6 07:41:32 2008) ###
# ENVIRONMENTAL STATISTICS
# CFAN1 CFAN2 CFAN3 CFAN4 CFAN5 CFAN6 CFAN7 CFAN8 CFAN9 CFAN10 SFAN1 SFAN2
   6200  6000  6200  6200  6200  5800  6200  6000  6200   6000  6000  6200
# CTEMP0 CTEMP1 STEMP
      51     48    29
Fan 1
Fans
CPU FAN1
SYS FAN1
Fan1A (CPU)
FAN CPU0
FAN MOD 1A RPM
Fan Redundancy
On the one hand, collectl could simply report the names exactly as they are reported, but formatting them in such a way as to provide a compact display is impossible. Given that the collectl standard reporting format is a single data header line, multiple-line headers are not an option. While it is tempting to simply determine the widest device name and use that for a header width, on systems that report over a dozen devices you couldn't fit them all on the same line, and that's only for the systems that have been tested.
After looking at all these different names and formats, one common theme did emerge. All devices appear to have optional numbers (I didn't see any with just letters) and those numbers, if present, have optional letters. Furthermore, there seems to be some sort of optional type associated with many as well. This led to the idea of a standard naming for these devices as follows:
[type]Fan|Temp[devicenumber[deviceletter]]
in which the type field would be limited to a single character. Applying this scheme to the examples above leads to the following name mapping:
Fan 1            Fan1
Fans             Fan
CPU FAN1         CFAN1
SYS FAN1         SFAN1
Fan1A (CPU)      CFan1A
FAN CPU0         CFAN0
FAN MOD 1A RPM   MFAN1A
Fan Redundancy   RFan
This is admittedly not perfect but seems like a reasonable compromise, and since collectl will report the device names in the same order returned by ipmitool it is not all that difficult to figure out how collectl chose to map them.
After examining many different types of device name formats, it was determined that most tended to follow the pattern of
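To make the naming scheme concrete, the mapping table can be reproduced with a small heuristic. The following Python sketch is not collectl's actual perl code: the function name short_name and both regular expressions are assumptions, chosen only so that the eight example names come out as listed in the table.

```python
import re


def short_name(sensor):
    """Map a raw sensor name to [type]Fan|Temp[number[letter]], or None."""
    m = re.search(r"(fan|temp)s?", sensor, re.IGNORECASE)
    if m is None:
        return None                      # not a fan or temperature sensor
    name = m.group(1)                    # keep the case used by the sensor
    rest = sensor[:m.start()] + " " + sensor[m.end():]

    # Instance: first run of digits, optionally followed by one capital letter.
    instance = ""
    inst = re.search(r"(\d+)([A-Z]?)\b", rest)
    if inst:
        instance = inst.group(1) + inst.group(2)
        rest = rest[:inst.start()] + " " + rest[inst.end():]

    # Type: first letter of whatever text is left over, if any.
    letter = re.search(r"[A-Za-z]", rest)
    type_char = letter.group(0).upper() if letter else ""
    return type_char + name + instance


for raw in ["Fan 1", "Fans", "CPU FAN1", "SYS FAN1", "Fan1A (CPU)",
            "FAN CPU0", "FAN MOD 1A RPM", "Fan Redundancy"]:
    print(f"{raw:<16} {short_name(raw)}")
```

Names outside this sample may well need extra handling, which is the gap the remapping rules described later are meant to fill.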
prefix type instanceNumber suffix
Where things get a little crazy is that sometimes the actual instance number can be part of the prefix OR sometimes the instance contains a letter.
All that said, collectl breaks a device name into these components, assuming a numeric instance. It then applies the minimal set of tests/modifications; note that there are examples of all these cases in the sample names shown earlier:
Fan CPU0 Tach,3480
Prefix: Name: Fan Instance: Suffix: CPU0 Tach
Fan1A (CPU),EAh,ok,29.3,Performance Met
Prefix: Name: Fan Instance: 1 Suffix: A (CPU)
FAN MOD 1A RPM,5775,RPM,ok
Prefix: Name: FAN Instance: Suffix: MOD 1A RPM
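One way to picture this initial split is as a single regular expression with four groups. The Python sketch below is an approximation rather than collectl's real parser: the name split_name and the pattern are assumptions, and the follow-up tests that collectl applies afterwards are not reproduced. It does yield the same components as the three debug samples above.

```python
import re

# prefix, then Fan/Temp, then an optional numeric instance, then the rest.
PATTERN = re.compile(r"^(.*?)(fan|temp)\s*(\d*)\s*(.*)$", re.IGNORECASE)


def split_name(line):
    """Return (prefix, name, instance, suffix) for one sensor line."""
    sensor = line.split(",")[0]          # drop the data fields after the name
    m = PATTERN.match(sensor)
    if m is None:
        return None                      # neither a fan nor a temperature
    prefix, name, instance, suffix = m.groups()
    return prefix.strip(), name, instance, suffix.strip()


for line in ["Fan CPU0 Tach,3480",
             "Fan1A (CPU),EAh,ok,29.3,Performance Met",
             "FAN MOD 1A RPM,5775,RPM,ok"]:
    print(split_name(line))
```

Because the instance is only recognized when digits directly follow the name, FAN MOD 1A RPM ends up with an empty instance, exactly as in the third sample.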
If your system is not in this standard set, you can either add your own rules to /usr/share/collectl/envrules.std (assuming your system type can be obtained through dmidecode) or put them in a standalone file and tell collectl to use it instead of the standard one using --envrules. If you do use your own file it should simply contain lines of the following form (no stanza preface), noting that spaces and comments (lines preceded with a #) are permitted:
[ignore]
/pattern/
...
[pre]
/pattern1/replace1/
/pattern2/replace2/
...
[post]
/pattern1/replace1/
/pattern2/replace2/
...
If you know perl (and you really should if you want to do this), this will look familiar: if you specify [ignore], collectl builds a perl pattern match command and ignores any strings returned by ipmitool that match. This is a good way to reduce the volume of sensors on systems that may have dozens of them when you're only interested in a specific subset.
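To illustrate the file layout, here is a hedged Python sketch that reads a standalone rules file of this form and applies its [ignore] patterns. It is not how collectl itself reads the file; parse_envrules and is_ignored are hypothetical names, Python regular expressions differ from perl's in places, and patterns that themselves contain a / are not handled.

```python
import re


def parse_envrules(text):
    """Parse [ignore]/[pre]/[post] sections into lists of rules."""
    rules = {"ignore": [], "pre": [], "post": []}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                               # blank line or comment
        header = re.fullmatch(r"\[(ignore|pre|post)\]", line)
        if header:
            section = header.group(1)
            continue
        if section is None or not (line.startswith("/") and line.endswith("/")):
            raise ValueError(f"unexpected line: {raw!r}")
        fields = line[1:-1].split("/")             # text between the slashes
        if section == "ignore" and len(fields) == 1:
            rules["ignore"].append(fields[0])
        elif section != "ignore" and len(fields) == 2:
            rules[section].append((fields[0], fields[1]))
        else:
            raise ValueError(f"wrong number of fields: {raw!r}")
    return rules


def is_ignored(sensor_name, rules):
    """True if any [ignore] pattern matches the sensor name."""
    return any(re.search(p, sensor_name) for p in rules["ignore"])


example = "# skip the voltage regulators\n[ignore]\n/VRM/\n"
rules = parse_envrules(example)
print([n for n in ["VRM 1", "Temp 1", "Virtual Fan"] if not is_ignored(n, rules)])
```

With the single ignore pattern in the example, only Temp 1 and Virtual Fan remain.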
In the cases of [pre] and [post] a perl substitution command is built out of the pattern and replace strings and applied to the sensor names. There is one caveat about [post]: it only applies to the actual derived sensor name and not the instance, so if you want to change a specific instance consider using a pre string to make a unique sensor name and then change it to what you really want with post. So looking at the string
FAN MOD 1A RPM
and the processing rules described in the previous section, the MOD suffix will be prepended to FAN and the first letter used to name the device MFAN, losing the instance information, which is 1A.
There are at least 3 options here. The first is to simply remove MOD from each name which we can do with the rule:
/ MOD//
which will result in the instance names being picked up correctly because they will now immediately follow FAN. In fact, if you include --envdebug along with your rules you'll see the results of the replacement:
FAN MOD 1A RPM,5775,RPM,ok
Pre-Remapped 'FAN MOD 1A RPM' to 'FAN 1A RPM'
Prefix: Name: FAN Instance: 1 Suffix: A RPM
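The second option is to move MOD in front of FAN by capturing the text on either side of it (note the space between $1 and $2, which is needed to produce the remapped name shown in the debug output):
/(.*) MOD (.*)/MOD $1 $2/
which results in the following parsing: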
FAN MOD 1A RPM,5775,RPM,ok
Pre-Remapped 'FAN MOD 1A RPM' to 'MOD FAN 1A RPM'
Prefix: MOD Name: FAN Instance: 1 Suffix: A RPM
Unfortunately, in order to make perl interpret the $1 and $2 symbols an eval is required, which generates a little extra overhead. While that is not horrible, an even better solution is the third option, which doesn't use any special $ symbols:
/FAN MOD/MOD FAN/
which produces exactly the same results as the previous example except without the eval command.
There is in fact at least one other mechanism for those who are not all that familiar with perl, included here only for completeness, and that is to simply hardcode the replacement of each device with the desired output. In other words
/FAN MOD 1A RPM/MOD FAN1 A/
/FAN MOD 2A RPM/MOD FAN2 A/
/FAN MOD 3A RPM/MOD FAN3 A/
etc
will produce strings that can also be properly parsed without involving $ variables, but this means you need to specify each unique device name to remap. It also causes every pattern matching statement to be executed for each device, which results in slightly more overhead.
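The three pre rules discussed above can be tried outside collectl before committing them to a rules file. The sketch below uses Python's re.sub as a stand-in for the perl substitution that collectl builds, so it is an approximation only: perl's $1 and $2 become \1 and \2, count=1 mimics a perl substitution without the g modifier, and the capture-group rule is written with a space between the two groups so that it yields the remapped name shown in the debug output.

```python
import re

name = "FAN MOD 1A RPM"

# Option 1: drop " MOD" so the instance directly follows FAN.
option1 = re.sub(r" MOD", "", name, count=1)

# Option 2: capture both sides of MOD and move MOD to the front.
option2 = re.sub(r"(.*) MOD (.*)", r"MOD \1 \2", name, count=1)

# Option 3: the same result with a literal replacement, no capture groups.
option3 = re.sub(r"FAN MOD", "MOD FAN", name, count=1)

print(option1)  # FAN 1A RPM
print(option2)  # MOD FAN 1A RPM
print(option3)  # MOD FAN 1A RPM
```

Options 2 and 3 give identical strings here, which is why the literal form is preferable when it is available.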
Power Monitoring
Currently all the systems that power monitoring has been tested on report it as the field Power Meter and, without more examples, the parsing is set up to look specifically for that field.
Performance and Alternate IPMI Devices
In some situations there may be multiple ipmi devices over which to communicate and if so, the default one may not necessarily be the fastest one. If you think the ipmi commands are taking too long to execute, try a simple experiment like this:
ipmitool sdr dump /tmp/xxx
time for i in `seq 1 10`; do ipmitool -S /tmp/xxx sdr > /dev/null; done;
real 0m20.476s
user 0m0.004s
sys 0m0.015s
As you can see, even though the command only used 0.02 seconds of CPU time, the elapsed time was over 20 seconds, a good indication something is not right. If you look in /proc/ipmi you may see more than one directory, as in the following case:
[root@hpdc3dmgt1 ~]# ls /proc/ipmi
0 1
This means there are 2 different IPMI devices and since the default is device 0, let's try repeating the command above on the other device. Also notice that since we've already initialized our cache file we do not need to reissue the ipmitool sdr dump command:
time for i in `seq 1 10`; do ipmitool -S /tmp/xxx -d1 sdr > /dev/null; done;
real 0m0.487s
user 0m0.004s
sys 0m0.013s
See how the elapsed time is only a fraction using device 1? To tell collectl to use this device instead of the default, simply specify the number in the --envopts switch, for example
collectl -sE --envopts 1
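If there are more than two devices, the same experiment can be scripted. The following Python sketch is only an illustration of the manual steps above: the function names are hypothetical, it assumes the cache file has already been created with ipmitool sdr dump, and it relies solely on the -S and -d options used in the commands shown.

```python
import os
import subprocess
import time


def sdr_command(cache_file, device=None):
    """Build the ipmitool command line, optionally for a specific device."""
    cmd = ["ipmitool", "-S", cache_file]
    if device is not None:
        cmd += ["-d", str(device)]
    return cmd + ["sdr"]


def elapsed(cmd, repeats=10):
    """Wall-clock seconds needed to run cmd `repeats` times."""
    start = time.perf_counter()
    for _ in range(repeats):
        subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start


def compare_devices(cache_file, proc_dir="/proc/ipmi", repeats=10):
    """Time the sdr command on every numbered device under proc_dir."""
    timings = {}
    for entry in sorted(os.listdir(proc_dir)):
        if entry.isdigit():
            dev = int(entry)
            timings[dev] = elapsed(sdr_command(cache_file, dev), repeats)
    return timings


# Example (needs root and a running ipmi driver):
#   for dev, secs in compare_devices("/tmp/xxx").items():
#       print(f"device {dev}: {secs:.3f}s")
```

The device with the smallest elapsed time is the one to pass to --envopts.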
Restrictions
Some systems report what appear to be device codes in the data field and the data in the 4th field, and I don't know why. For now, when this occurs collectl reports the 4th column as the data instead. If this breaks other things it will have to be removed and invalid data reported for those systems that do not report it in column 2.
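As a rough illustration of this workaround, the Python sketch below picks the data column from a comma-separated sensor line like those in the debug output shown earlier. It is an assumption, not collectl's code: the name sensor_value is hypothetical, and treating a hex string ending in h (such as EAh) as a device code is a guess based on the one sample in this document.

```python
import re

# Assumption: a device code looks like hex digits followed by 'h', e.g. EAh.
DEVICE_CODE = re.compile(r"^[0-9A-Fa-f]+h$")


def sensor_value(line):
    """Return (name, value), falling back to the 4th field for device codes."""
    fields = [f.strip() for f in line.split(",")]
    value = fields[1]
    if DEVICE_CODE.match(value) and len(fields) > 3:
        value = fields[3]                # data lives in column 4 instead
    return fields[0], value


print(sensor_value("Fan1A (CPU),EAh,ok,29.3,Performance Met"))
print(sensor_value("FAN MOD 1A RPM,5775,RPM,ok"))
```

For the first line the value comes from the 4th field (29.3); for the second it stays in the 2nd field (5775).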
updated June 25, 2010