Lustre
Overview
The first thing to understand about lustre reporting is in most cases, where one has configured
the server(s) and just wants to monitor them, all one need do is specify -sl or -sL and
collectl will do the right thing.
It will automatically detect the type of service(s) currently running and will
either record or display the appropriate data.
If you select -sl and the system doesn't have lustre installed, it will warn
you and then disable that switch.
Controlling Which Data is Displayed
Lustre records a wealth of performance data, far more than makes sense to
display all the time, and so by default collectl displays minimal information
such as bytes/operations read and written. At the client detail level lustre
can differentiate this
data at the filesystem and even the OST level! In order to accommodate the broadest flexibility
one is allowed to control the way data is collected/displayed via several complementary
switches.
- -s: As is normally the case, one can specify '-sl' for summary level data, '-sL' for
detail data or combine them to get both. However, since the client detail data can actually
be presented at the individual filesystem or OST level, there is an option to show the OST
level details (filesystem details are the default) see --lustopts O.
- --lustopts: This switch is used to provide further detail about the types of data that
is to be collected/displayed. There are 5 such values that collectl cares about:
- B - rpc buffer level data.
- D - disk block statistics, which applies to both MDS and OSS servers. One should also note
this is specific to HP SFS and this data is not available in the open source version.
- M - client metadata (note that this was the default prior to collectl V1.6.2).
- O - for client details only, show results by OST
- R - read_ahead statistics. Unlike the other options, which generate a lot of data,
--lustopts R may be used with brief mode.
As it turns out, nothing is quite as simple as it seems and while the following
case is not typical, it needs to be addressed for completeness. Since collectl allows one to
collect one set of data and to later display a different set,
consider what happens in one were
to collect multiple types of lustre data for a client using --lustopts MR, but then just play back
the basic client data which is collected without specifying --lustopts.
By default, playback mode defaults
to the settings data was collected with and to change the display one needs to explicitly
change those settings. To meet this need, use --lustsvc, which is described in more
detail later.
In the spirit of letting the user display whatever they want to, collectl will allow one to
select multiple values for --lustopts and it will try to display the results appropriately.
Perhaps the easiest thing to do is just experiment and in most cases you'll get what you're
looking for. There are a few combinations of -s and --lustopts that do not
make sense and if you choose one, you will be told.
What About Playback?
As is always the case with playback, unless otherwise told to do something else, collectl
will playback its recorded data
based on the parameters selected for collection. In other words, if you specify
--lustopts OBR
in record mode, collectl will record both RPC buffer and read_ahead stats. When you play the
data back, it will then display both as well. However, you also have the option of specifying
--lustopts, typically thought of as a collection-only switch, and it will force the output to what
you'd like it to be. If you select a statistics type that hasn't been recorded,
that information will be displayed, but as zeros.
Recognizing Service Configuration Changes
In some cases lustre services may change after collectl starts. In fact, it may not
even be running and if so you'll get a message telling you it is not and that collectl
cannot determine the system type since it could be a client, MDS, OSS or some
combination. This includes services starting and stopping as well as the configurations
of those services themselves changing.
For example one might occasionally mount/umount different lustre filesystems on
a client. Not to worry. Collectl
periodically checks for configuration changes and automatically adjusts the data it collects
as well as anything it may be currently displaying. However this can also lead to the
output format changing. If you know that the system type could change and you simply want to
force the type of output to be consistent, use --lustsvc as described
in the next section.
Changing the Default Recording/Display Behavior
There are some times when you want specific control over what data is recorded or
displayed rather than the default behavior OR collectl starts before lustre does and
it can't determine the type of system it is. This is typically the case when a system
is playing multiple roles by providing more than one service. For example, if a system
has been configured as both an OSS and a client, every time you run collectl you will collect
or display data about both and sometimes this is NOT what you want. There may be other times
where you have developed some reports or graphs that expect data in a standard format and
you've collected a subset (or superset) of data.
To override this behavior of the lustre portion of the data (remember you can
control the displaying of individual subsystems with -s), use --lustsvc to
specify the type of service(s) you're interested
in and collectl will only pay attention to those, both for recording to a file as well as
display. Naturally when displaying
data for services you never collectled data on, those services will print as zeros.
If all this sounds confusing, just experiment with various combinations of -s,
--lustopts and --lustsvcs and observe the behavior.