Lustre

Overview

The first thing to understand about Lustre reporting is that in most cases, where the server(s) are already configured and you just want to monitor them, all you need to do is specify -sl or -sL and collectl will do the right thing. It automatically detects the type of service(s) currently running and records or displays the appropriate data. If you select -sl on a system that doesn't have Lustre installed, collectl warns you and then disables that switch.

Controlling Which Data is Displayed

Lustre records a wealth of performance data, far more than makes sense to display all the time, so by default collectl displays only minimal information such as bytes and operations read and written. At the client detail level, Lustre can even break this data down by filesystem and by OST. To provide the greatest flexibility, you can control how data is collected and displayed through several complementary switches: -s, --lustopts and --lustsvc.

In the spirit of letting you display whatever you want, collectl allows multiple values for --lustopts and tries to display the results appropriately. The easiest approach is often just to experiment; in most cases you'll get what you're looking for. A few combinations of -s and --lustopts do not make sense, and if you choose one, collectl will tell you.

What About Playback?

As always with playback, unless told otherwise, collectl plays back its recorded data using the parameters selected at collection time. In other words, if you specify --lustopts OBR in record mode, collectl records both RPC buffer and read_ahead stats, and on playback it displays both as well. However, you can also specify --lustopts during playback, even though it is typically thought of as a collection-only switch, to force the output to be what you want. If you select a statistics type that wasn't recorded, it is still displayed, but as zeros.
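The playback rule above can be modeled in a few lines. This is a minimal Python sketch, not collectl's code: the stat-type names (rpc_buffer, read_ahead, metadata), their fields, and the function playback_view are all hypothetical stand-ins. It shows the two documented behaviors: with no playback override, whatever was recorded is shown; with an override, only the requested types are shown, and types that were never recorded appear as zeros.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical field layout per stat type; collectl's real records differ.
FIELDS: Dict[str, Tuple[str, ...]] = {
    "rpc_buffer": ("rpcs", "pages"),
    "read_ahead": ("hits", "misses"),
    "metadata": ("opens", "closes"),
}


def playback_view(recorded: Dict[str, Dict[str, int]],
                  requested: Optional[Set[str]] = None) -> Dict[str, Dict[str, int]]:
    """Choose which stat types to show for one recorded interval.

    recorded:  stat type -> {field: value}, as captured at record time
    requested: stat types forced at playback (a --lustopts-style override);
               None means "show whatever was recorded"
    """
    if requested is None:
        # Default: replay exactly the types selected during collection.
        return {stat: dict(values) for stat, values in recorded.items()}
    view: Dict[str, Dict[str, int]] = {}
    for stat in sorted(requested):
        if stat in recorded:
            view[stat] = dict(recorded[stat])
        else:
            # Requested but never recorded: still displayed, every field zero.
            view[stat] = {field: 0 for field in FIELDS[stat]}
    return view


recorded = {"rpc_buffer": {"rpcs": 12, "pages": 96},
            "read_ahead": {"hits": 40, "misses": 2}}
print(playback_view(recorded))
print(playback_view(recorded, {"read_ahead", "metadata"}))
```

In the second call, read_ahead keeps its recorded values, metadata is filled with zeros, and rpc_buffer is dropped because it was not requested.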

Recognizing Service Configuration Changes

In some cases Lustre services change after collectl starts. This includes services starting and stopping as well as the configurations of those services changing; for example, a client might occasionally mount or unmount different Lustre filesystems. Lustre may not even be running when collectl starts, in which case you'll get a message saying so, and that collectl cannot determine the system type since it could be a client, MDS, OSS or some combination. Not to worry. Collectl periodically checks for configuration changes and automatically adjusts both the data it collects and anything it is currently displaying. However, this can also change the output format. If you know the system type could change and you want the output format to stay consistent, use --lustsvc as described in the next section.

Changing the Default Recording/Display Behavior

Sometimes you want specific control over what data is recorded or displayed rather than the default behavior, or collectl starts before Lustre does and cannot determine the type of system. The former is typically the case when a system plays multiple roles by providing more than one service. For example, if a system is configured as both an OSS and a client, every time you run collectl it collects or displays data about both, and sometimes this is NOT what you want. At other times you may have developed reports or graphs that expect data in a standard format, but what you collected is a subset (or superset) of that data.

To override the default behavior for the Lustre portion of the data (remember that you can control the display of individual subsystems with -s), use --lustsvc to specify the type of service(s) you're interested in, and collectl will pay attention only to those, both when recording to a file and when displaying. Naturally, when displaying data for services you never collected data on, those services print as zeros.
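The effect of forcing the service list on output format can be sketched as follows. This is a hypothetical Python model, not collectl's implementation: the service labels "client", "mds" and "oss", the single per-service counter, and the function format_interval are illustrative assumptions, and the labels are not the actual values --lustsvc accepts. Without an override, the columns follow whatever services are detected in each interval and can change; with an override, the columns stay fixed, forced services that are not running print as zeros, and detected services that were not forced are ignored.

```python
from typing import Dict, List, Optional, Set


def format_interval(active: Dict[str, int],
                    forced: Optional[Set[str]] = None) -> List[str]:
    """Render one interval as 'service=value' columns.

    active: services detected in this interval -> a hypothetical per-service
            counter (e.g. KB read); services not running are simply absent
    forced: services named via a --lustsvc-style override; None means
            follow whatever is currently detected
    """
    # Without an override the columns track detection, so they can change.
    services = sorted(active) if forced is None else sorted(forced)
    # With an override, forced-but-absent services print as zero and
    # detected-but-unforced services are left out.
    return [f"{svc}={active.get(svc, 0)}" for svc in services]


# A client that later also runs an OSS, then unmounts its filesystems.
intervals = [{"client": 5}, {"client": 7, "oss": 3}, {"oss": 4}]
for sample in intervals:
    print(format_interval(sample), format_interval(sample, {"client", "oss"}))
```

The left-hand column set changes from interval to interval, while the right-hand one always contains the same two columns, which is what reports or graphs expecting a standard format would need.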

If all this sounds confusing, just experiment with various combinations of -s, --lustopts and --lustsvc and observe the behavior.
updated Mar 26, 2010