Infiniband

Monitoring

As of V3.7.3, collectl monitors Infiniband by reading 64-bit counters whenever the HCA supports them, and virtually all HCAs do. This means several things:

The easiest way to tell whether your HCA supports 64-bit counters is to run perfquery -x. If it works, you have 64-bit counters. Alternatively, you can run:

collectl -sx --showheader

and if it displays an X in the flag field, you have them. If you do have 64-bit counters but collectl doesn't report the X, you have an older version of collectl installed. The code that deals with 32-bit counters will be left in place for a while but will eventually be removed. The rest of this documentation covers monitoring the narrower 32-bit counters and is largely unchanged from before.
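The perfquery check described above can be scripted. The following is a minimal sketch, not collectl's own code: it assumes that "works" means perfquery -x exits with status 0, and the function name has_64bit_counters is made up for this example.

```python
import subprocess


def has_64bit_counters(run=subprocess.run):
    """Return True if `perfquery -x` exits successfully.

    Assumption: a zero exit status means the HCA answered the
    extended (64-bit) counter query. `run` is injectable for testing.
    """
    try:
        result = run(
            ["perfquery", "-x"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # perfquery is not installed or is not on the PATH
        return False
    return result.returncode == 0
```

A False result only says the query did not succeed; it does not distinguish a missing utility from an HCA without 64-bit counters.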

32 Bit Counters

The most important thing you should know about 32-bit monitoring is that it is destructive. This means that every time collectl reads the counters from the HCA it immediately resets them to zero, thereby destroying their previous contents. Note that this does not apply to error counters, which are never reset.

The obvious question is why. The perhaps less than obvious answer is that when the hardware specifications for Infiniband HCAs were written, it was decided that performance counters would not wrap, probably because nobody thought someone might want to do continuous sampling. In any event, at even modest traffic rates HCAs with 32-bit counters quickly reach their maximum values and stop incrementing, rendering them useless for performance monitors like collectl. Collectl's solution to this problem is to read the counters and immediately reset them to 0. As long as the next sampling period occurs before the counters fill up, this methodology comes reasonably close to reflecting the traffic rates (some counts are lost between the read and the reset).
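To get a feel for how quickly a 32-bit counter fills, here is a rough back-of-the-envelope sketch. It assumes the data counters count 4-byte words (as the Infiniband port data counters do) and uses an illustrative data rate of 1 GB/s. The function names are invented for this example and are not part of collectl.

```python
def seconds_to_saturate(bytes_per_second, counter_bits=32, bytes_per_count=4):
    """Time until a non-wrapping counter reaches its maximum value."""
    max_count = 2 ** counter_bits - 1
    return max_count * bytes_per_count / bytes_per_second


def rate_from_reset_counter(count, interval_seconds, bytes_per_count=4):
    """Bytes/sec for a counter that was reset at the start of the interval.

    Because the counter was zeroed after the previous read, the value
    itself is the traffic for the interval; no subtraction is needed.
    """
    return count * bytes_per_count / interval_seconds


if __name__ == "__main__":
    # Illustrative rate only: 1 GB/s of payload
    print(round(seconds_to_saturate(1e9), 2), "seconds to saturate")
```

At the assumed 1 GB/s a 32-bit counter sticks at its maximum after roughly 17 seconds, so the sampling interval has to be comfortably shorter than that for the read-and-reset approach to work.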

However, this methodology has a downside: while collectl is monitoring the Infiniband stats, nobody else can (including other copies of collectl). Unfortunately there is no solution to this problem short of redesigning the HCA, and that's simply not going to happen. One alternative would be to come up with a mechanism in which the read/reset of the counters is moved into an OFED module which exports them to /proc or /sys as rolling counters. This was in fact done in a pre-OFED version of Voltaire's IB stack which is currently supported by collectl. If you would like to hear more details on how this was done, feel free to contact me, post something in a collectl forum, or write to the mailing list.
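The rolling-counter idea can be illustrated with a small simulation. This is only a sketch of the concept, assuming a single owner performs the destructive read and accumulates the result into a total that any number of readers can sample. The class names are hypothetical, and it says nothing about how the Voltaire stack actually implemented it.

```python
class SaturatingCounter:
    """Simulated 32-bit hardware counter that sticks at its maximum."""

    MAX = 2 ** 32 - 1

    def __init__(self):
        self.value = 0

    def add(self, n):
        # Hardware behaviour described above: stop at MAX, never wrap
        self.value = min(self.MAX, self.value + n)

    def read_and_reset(self):
        value, self.value = self.value, 0
        return value


class RollingCounter:
    """Sole owner of the hardware counter; exposes an ever-growing total."""

    def __init__(self, hw):
        self.hw = hw
        self.total = 0

    def poll(self):
        # Only this method ever resets the hardware counter
        self.total += self.hw.read_and_reset()
        return self.total
```

Each monitoring tool would then read total without resetting anything and compute its own deltas between samples, so several tools could watch the same port at once.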

If you want to run collectl but also prevent it from doing destructive monitoring, simply comment out the line in /etc/collectl.conf that begins with PQuery =. Collectl will then report that Infiniband monitoring has been disabled whenever someone tries to monitor it.
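If you manage many nodes, the edit can be scripted. The sketch below works on the text of the configuration file and assumes that # starts a comment in collectl.conf. The function name disable_pquery is made up for this example, and /path/to/perfquery in the usage note is a placeholder rather than a real setting.

```python
def disable_pquery(conf_text):
    """Return conf_text with every line starting with 'PQuery' commented out.

    Assumption: '#' is the comment character in collectl.conf.
    Lines that are already commented out are left unchanged.
    """
    out = []
    for line in conf_text.splitlines(keepends=True):
        if line.lstrip().startswith("PQuery"):
            line = "#" + line
        out.append(line)
    return "".join(out)
```

For example, a line reading PQuery = /path/to/perfquery would become #PQuery = /path/to/perfquery, and all other lines pass through untouched. Keep a backup of the original file before writing the result back.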

Monitoring Mechanics

The main purpose of this section is to help you understand how monitoring works, so that when it doesn't you might be able to figure out what went wrong. There are 2 different ways collectl can monitor Infiniband: one for the OFED stack, which is the Infiniband stack of choice these days, and the other for pre-OFED stacks.

OFED

The OFED stack can be identified by the presence of the /sys/class/infiniband directory. If that directory exists, collectl looks inside it to find which HCAs are present and which ports are active. This information is then used to query the HCA via the perfquery utility.
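The discovery step can be approximated as follows. This is a sketch rather than collectl's actual logic: it assumes the usual sysfs layout of <hca>/ports/<n>/state, where the state file holds a string such as "4: ACTIVE". The function name is hypothetical.

```python
import os


def active_ib_ports(sys_root="/sys/class/infiniband"):
    """Return [(hca, port), ...] for ports whose state file says ACTIVE."""
    found = []
    if not os.path.isdir(sys_root):
        return found  # no OFED stack detected
    for hca in sorted(os.listdir(sys_root)):
        ports_dir = os.path.join(sys_root, hca, "ports")
        if not os.path.isdir(ports_dir):
            continue
        for port in sorted(os.listdir(ports_dir)):
            state_file = os.path.join(ports_dir, port, "state")
            try:
                with open(state_file) as fh:
                    state = fh.read()
            except OSError:
                continue  # unreadable or missing state file
            # Expected form is "<number>: <NAME>"; compare the name only
            if state.split(":")[-1].strip() == "ACTIVE":
                found.append((hca, port))
    return found
```

An empty result on a machine that should have Infiniband is a hint that either the stack is not loaded or no port has come up, both of which are worth checking before suspecting collectl.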

Unfortunately, with each release of OFED that utility seems to move to another location, and collectl tries to cope by using a search path defined in /etc/collectl.conf. As of the 2.5.1 release of collectl, if it still can't find the utility it will try to find its location with rpm and then add that path to collectl.conf. If a future OFED release eliminates or replaces perfquery, collectl will break.
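The search-path lookup amounts to trying each directory in turn and taking the first executable match. The sketch below illustrates that idea only. The function name and the directory list passed to it are assumptions, and the rpm fallback is deliberately left out.

```python
import os


def find_utility(name, search_dirs):
    """Return the first executable called `name` in search_dirs, else None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

Calling find_utility("perfquery", dirs) with your own list of candidate directories shows quickly whether the utility is somewhere collectl could be told to look.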

Pre-OFED

All pre-OFED monitoring code has been removed.

Debugging

Collectl has a variety of debugging capabilities built into it, the main one being the debug switch -d. To use this switch you specify a bit mask, which is then tested against a variety of settings that tell collectl what to display. For debugging interconnect problems simply use -d2. All possible bit settings and their meanings are listed at the beginning of the collectl script itself.
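Because the value given to -d is a bit mask, a value such as 6 is simply the combination of bits 2 and 4. The helper below is a generic illustration of that decomposition. It is not taken from collectl and does not know what any individual bit means.

```python
def debug_bits(mask):
    """Return the individual power-of-two bits set in mask, lowest first."""
    bits = []
    bit = 1
    while bit <= mask:
        if mask & bit:
            bits.append(bit)
        bit <<= 1
    return bits
```

So -d6 enables whatever -d2 and -d4 each enable on their own.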

If collectl runs without errors but you're not seeing IB traffic reported when you think you should, you can always use -d4 or even -d6, which show the values of the counters returned by both perfquery and get_pcounter. If those values don't change, something outside of collectl must be wrong.

One example of a non-collectl problem involved a system that had IB configured and started, which could be verified by seeing an ib0 interface show up with ifconfig. However, when running collectl -sN, which shows the traffic over all the network interfaces, there was never any traffic on the ib interface, while there was unexpected traffic on one of the eth interfaces. Clearly something was wrong, and looking at the routing showed that the routes were set such that all traffic to the Infiniband address was being routed over the eth interface.
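A routing problem like this one can be confirmed by asking the kernel which interface it would use for a given destination. The sketch below assumes the iproute2 command ip route get <address> is available and that its output contains the token dev followed by the interface name. It only parses text you pass in, and the function name is hypothetical.

```python
def route_device(ip_route_get_output):
    """Return the interface named after 'dev' in `ip route get` output."""
    tokens = ip_route_get_output.split()
    if "dev" in tokens:
        i = tokens.index("dev")
        if i + 1 < len(tokens):
            return tokens[i + 1]
    return None
```

If the interface returned for the peer's Infiniband address is an eth interface rather than ib0, the traffic is not going over Infiniband and the routes need fixing.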
updated Feb 04, 2014