Verbose Data

Data is reported in this form when either --verbose is used or at least one type of requested data has no brief form, such as any detail data or inodes, processes or slabs. Specifying some of the Lustre output options with --lustopts, such as B, D and M, will also force verbose format.

Buddy (Memory Fragmentation) Data, collectl -sb

# MEMORY FRAGMENTATION SUMMARY (4K pages)
#     1Pg    2Pgs    4Pgs    8Pgs   16Pgs   32Pgs   64Pgs  128Pgs  256Pgs  512Pgs 1024Pgs

This table shows the total number of memory fragments by pagesize in increasing powers of 2 for all the memory types.
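
As a rough illustration of how to read this table, the memory it represents can be totalled by weighting each column by its fragment size. The Python sketch below assumes the counts are free blocks of the given size and uses the 4K page size from the header; the function name and the sample counts are invented for the example.

```python
PAGE_KB = 4  # page size in KB, as stated in the summary header


def free_kb_from_fragments(counts, page_kb=PAGE_KB):
    """Total KB represented by fragment counts for 1, 2, 4, ... page blocks."""
    # Column i holds the number of fragments that are 2**i pages long.
    return sum(n * (2 ** order) * page_kb for order, n in enumerate(counts))


# Hypothetical counts for the 11 columns 1Pg .. 1024Pgs
sample = [10, 4, 0, 1, 0, 0, 0, 0, 0, 0, 2]
print(free_kb_from_fragments(sample))
```

Many small fragments and few large ones for the same total is the usual sign of fragmentation.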

CPU, collectl -sc

# CPU SUMMARY (INTR, CTXSW & PROC /sec)
# User  Nice   Sys  Wait   IRQ  Soft Steal Guest NiceG  Idle  CPUs  Intr  Ctxsw  Proc  RunQ   Run   Avg1  Avg5 Avg15 RunT BlkT

These are the percentages of time the system spends running in each of the modes, averaged across all CPUs. While User and Sys modes are self-explanatory, the others may not be:

User Time spent in User mode, not including time spent in "nice" mode.

Nice Time spent in Nice mode, that is, running processes whose priority has been lowered with the nice command and which show the "N" status flag when examined with "ps".

Sys This is time spent in "pure" system time.

Wait Also known as "iowait", this is the time the CPU was idle during an outstanding disk I/O request. This is not considered to be part of the total or system times reported in brief mode.

Irq Time spent processing interrupts and also considered to be part of the summary system time reported in "brief" mode.

Soft Time spent processing soft interrupts and also considered to be part of the summary system time reported in "brief" mode.

Steal Time spent in other operating systems when running in a virtualized environment

Guest Time spent running a virtual CPU for guest operating systems under the control of the Linux kernel, new since 2.6.24

NiceG Time spent running a niced guest (virtual CPU for guest operating systems under the control of the Linux kernel), new since 2.6.33
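
To make the relationship with brief mode concrete, here is a minimal Python sketch that folds the verbose fields into a single system percentage the way the descriptions above suggest: Sys, IRQ and Soft are added together and Wait is left out. Whether any other field is folded in is not stated here, so treat this as an assumption; the function name and sample values are hypothetical.

```python
def brief_system_time(verbose):
    """Combine Sys, IRQ and Soft percentages into one system figure."""
    # Wait (iowait) is deliberately excluded, as described in the text.
    return verbose["Sys"] + verbose["IRQ"] + verbose["Soft"]


# Hypothetical verbose percentages for one interval
sample = {"User": 12, "Nice": 0, "Sys": 5, "Wait": 3, "IRQ": 1, "Soft": 2, "Idle": 77}
print(brief_system_time(sample))
```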

This next set of fields applies to processes

Proc Process creations/sec.

Runq Number of processes in the run queue.

Run Number of processes in the run state.

Avg1, Avg5, Avg15 Load average over the last 1, 5 and 15 minutes.

RunT Total number of processes in the run state, not counting collectl itself

BlkT Total number of processes blocked, waiting on I/O

Disks, collectl -sd

If you specify filtering with --dskfilt, the disks that match the pattern(s) will be either included in or excluded from the summary data. However, the data is still collected, so if it is recorded to a file it can be viewed later.

# DISK SUMMARY (/sec)
#KBRead RMerged  Reads SizeKB   KBWrit WMerged Writes SizeKB

KBRead	KB read/sec
RMerged	Read requests merged per second when being dequeued.
Reads	Number of reads/sec
SizeKB	Average read size in KB
KBWrit	KB written/sec
WMerged	Write requests merged per second when being dequeued.
Writes	Number of writes/sec
SizeKB	Average write size in KB
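
The two SizeKB columns can be reproduced from the other fields. The Python sketch below assumes the average size is simply the KB transferred divided by the number of requests in the same interval, which is an assumption rather than something stated above; the function name is made up for the example.

```python
def avg_size_kb(kb_per_sec, ops_per_sec):
    """Average request size in KB; 0 when there were no requests."""
    if ops_per_sec == 0:
        return 0
    return kb_per_sec / ops_per_sec


# Hypothetical interval: 2048 KB/sec read using 64 reads/sec
print(avg_size_kb(2048, 64))
```

Guarding against zero requests matters on idle disks, where both values are 0.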

Inodes/Filesystem, collectl -si

# INODE SUMMARY
#    Dentries      File Handles    Inodes
# Number  Unused   Alloc  MaxPct   Number
   40585   39442     576    0.17    38348

DCache
Dentries Number	Number of entries in directory cache
Dentries Unused	Number of unused entries in directory cache
Handles Alloc	Number of allocated file handles
Handles MaxPct	Percentage of maximum available file handles
Inodes Number	Number of inodes in use
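
The MaxPct column is a ratio of allocated handles to the system maximum. The Python sketch below assumes the figures come from text in the style of /proc/sys/fs/file-nr (three numbers: allocated, unused, maximum); this section does not say where collectl reads them, and the maximum used in the sample is invented.

```python
def handle_usage(file_nr_text):
    """Return (allocated, percent_of_max) from file-nr style text."""
    allocated, _unused, maximum = (int(v) for v in file_nr_text.split())
    return allocated, round(100.0 * allocated / maximum, 2)


# Hypothetical contents: 576 handles allocated out of a maximum of 338000
print(handle_usage("576 0 338000"))
```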

NOTE - as of this writing I'm baffled by the dentry unused field. No matter how many files and/or directories I create, this number goes up! Shouldn't it go down?

Infiniband, collectl -sx

# INFINIBAND SUMMARY (/sec)
#  KBIn   PktIn  SizeIn   KBOut  PktOut SizeOut  Errors

KBIn	KB received/sec.
PktIn	Packets received/sec.
SizeIn	Average incoming packet size in KB
KBOut	KB transmitted/sec.
PktOut	Packets transmitted/sec.
SizeOut	Average outgoing packet size in KB
Errors	Count of current errors. Since these are typically infrequent, reporting them as a rate would either hide them entirely or lose their values to round-off.

Lustre

Lustre Client, collectl -sl

There are several formats here, controlled by the --lustopts switch. Detail data is also available for these. Specifying -sL results in data broken out by file system, and --lustopts O further breaks it out by OST. Also note that the average read/write sizes are only reported when --lustopts is not specified.

# LUSTRE CLIENT SUMMARY
# KBRead  Reads SizeKB  KBWrite Writes SizeKB

KBRead	KB/sec delivered to the client.
Reads	Reads/sec delivered to the client, not necessarily from the lustre storage servers.
SizeKB	Average read size in KB
KBWrite	KB/sec delivered to the storage servers.
Writes	Writes/sec delivered to the storage servers.
SizeKB	Average write size in KB

# LUSTRE CLIENT SUMMARY: METADATA
# KBRead  Reads KBWrite Writes  Open Close GAttr SAttr  Seek Fsynk DrtHit DrtMis

KBRead	KB/sec delivered to the client.
Reads	Reads/sec delivered to the client, not necessarily from the lustre storage servers.
KBWrite	KB/sec delivered to the storage servers.
Writes	Writes/sec delivered to the storage servers.
Open	File opens/sec
Close	File closes/sec
GAttr	getattrs/sec
SAttr	setattrs/sec
Seek	seeks/sec
Fsync	fsyncs/sec
DrtHit	dirty hits/sec
DrtMis	dirty misses/sec

# LUSTRE CLIENT SUMMARY: READAHEAD
# KBRead  Reads KBWrite Writes  Pend  Hits Misses NotCon MisWin FalGrb LckFal  Discrd ZFile ZerWin RA2Eof HitMax  Wrong

KBRead	KB/sec delivered to the client.
Reads	Reads/sec delivered to the client, not necessarily from the lustre storage servers.
KBWrite	KB/sec delivered to the storage servers.
Writes	Writes/sec delivered to the storage servers.
Pend	Pending issued pages
Hits	prefetch cache hits
Misses	prefetch cache misses
NotCon	The current pages read that were not consecutive with the previous ones.
MisWin	Miss inside window. The pages that were expected to be in the prefetch cache but weren't. They were probably reclaimed due to memory pressure
LckFal	Failed grab_cache_pages. Tried to prefetch page but it was locked.
Discrd	Read but discarded. Prefetched pages (not yet read by the application) have been discarded either because of memory pressure or lock revocation.
ZFile	Zero length file.
ZerWin	Zero size window.
RA2Eof	Read ahead to end of file
HitMax	Hit maximum readahead issue. The read-ahead window has grown to the maximum specified by max_read_ahead_mb

# LUSTRE CLIENT SUMMARY: RPC-BUFFERS (pages)
#RdK  Rds   1K   2K   ...  WrtK Wrts   1K   2K   ...

This display shows the size of RPC buffer distribution buckets in K-pages. You can find the page size for your system in the header (collectl --showheader).

RdK KBs read/sec

Rds Reads/sec

nK Number of pages of this size read

WrtK KBs written/sec

Wrts Writes/sec

nK Number of pages of this size written

Lustre Meta-Data Server, collectl -sl

As of Lustre 1.6.5, the data reported for the MDS changed, breaking out the Reint data into 5 individual buckets, which are the last 5 fields described below. For earlier versions those 5 fields are replaced by a single one named Reint.

# LUSTRE MDS SUMMARY
#Getattr GttrLck  StatFS    Sync  Gxattr  Sxattr Connect Disconn Create   Link Setattr Rename Unlink

Getattr	Number of getattr calls, for example lfs osts. Note that this counter is not incremented as the result of ls - see Gxattr
GttrLck	These are getattrs that also return a lock on the file
StatFS	Number of stat calls, for example df or lfs df. Note that lustre caches data for up to a second so many calls within a second may only show up as a single statfs
Sync	Number of sync calls
Gxattr	Extended attribute get operations, for example getfattr, getfacl or even ls. Note that the MDS must have been mounted with -o acl for this counter to be enabled.
Sxattr	Extended attribute set operations, for example setfattr or setfacl
Connect	Client mount operations
Disconn	Client umount operations
Create	Count of mknod and mkdir operations, also used by NFS servers internally when creating files
Link	Hard and symbolic links, for example ln
Setattr	All operations that modify inode attributes including chmod, chown, touch, etc
Rename	File and directory renames, for example mv
Unlink	File/directory removals, for example rm or rmdir

The following display is very similar to the RPC buffers in that the sizes of different size I/O requests are reported. In this case these are requests sent to the disk driver. Note that this report is only available for HP's SFS.

# LUSTRE DISK BLOCK LEVEL SUMMARY
#Rds  RdK 0.5K   1K   ...  Wrts WrtK 0.5K   1K   ...

Rds	Reads/sec
RdK	KBs read/sec
nK	Number of blocks of this size read
Wrts	Writes/sec
WrtK	KBs written/sec
nK	Number of blocks of this size written

Lustre Object Storage Server, collectl -sl

# LUSTRE OST SUMMARY
# KBRead   Reads  SizeKB KBWrite  Writes  SizeKB

KBRead	KB/sec read
Reads	Reads/sec
SizeKB	Average read size in KB
KBWrite	KB/sec written
Writes	Writes/sec
SizeKB	Average write size in KB

Lustre Object Storage Server, collectl -sl --lustopts B

As with client data, you only get read/write average sizes when --lustopts is not specified.

# LUSTRE OST SUMMARY
#<--------reads-----------|----writes-----------------
#RdK  Rds   1K   2K   ...  WrtK Wrts   1K   2K   ....

RdK	KBs read/sec
Rds	Reads/sec
nK	Number of pages of this size read
WrtK	KBs written/sec
Wrts	Writes/sec
nK	Number of pages of this size written

Lustre Object Storage Server, collectl -sl --lustopts D

# LUSTRE DISK BLOCK LEVEL SUMMARY
#RdK  Rds 0.5K   1K   ...   WrtK Wrts 0.5K   1K   ...

RdK	KBs read/sec
Rds	Reads/sec
nK	Number of blocks of this size read
WrtK	KBs written/sec
Wrts	Writes/sec
nK	Number of blocks of this size written

Memory, collectl -sm

# MEMORY SUMMARY
#<-------------------------------Physical Memory-------------------------------------><-----------Swap------------><-------Paging------>
#   Total    Used    Free    Buff  Cached    Slab  Mapped    Anon  Commit Locked Inact Total  Used  Free   In  Out Fault MajFt   In  Out

Total	Total physical memory
Used	Used physical memory. This does not include memory used by the kernel itself.
Free	Unallocated memory
Buff	Memory used for system buffers
Cached	Memory used for caching data between the kernel and disk, noting direct I/O does not use the cache
Slab	Memory used for slabs, see collectl -sY
Mapped	Memory mapped by processes
Anon	Anonymous memory. NOTE - this is included with mapped memory in brief format
Commit	According to RedHat: "An estimate of how much RAM you would need to make a 99.99% guarantee that there never is OOM (out of memory) for this workload."
Locked	Locked Memory
Inact	Inactive pages. On earlier kernels this number is the sum of the clean, dirty and laundry pages.
Swap Total	Total Swap
Swap Used	Used Swap
Swap Free	Free Swap
Swap In	KB swapped in/sec
Swap Out	KB swapped out/sec
Fault	Page faults/sec resolved by not going to disk
MajFt	These page faults are resolved by going to disk
Paging In	Total number of pages read by block devices
Paging Out	Total number of pages written by block devices

Notes If you include --memopts R, memory and swap values will be displayed as changes/sec between intervals rather than absolute values, in addition to page fault information, which is already displayed as rates. This switch also honors -on, in which case the values are not normalized to a rate but displayed as changes in size per interval.
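
The difference between the two behaviours can be sketched in a few lines of Python. This is only an illustration of the arithmetic described in the note, not collectl's own code; the function name, its parameters and the sample values are hypothetical.

```python
def mem_change(prev_kb, curr_kb, interval_secs, normalize=True):
    """Change in a memory value between two samples.

    normalize=True  -> change per second (the --memopts R behaviour)
    normalize=False -> raw change per interval (as when -on is also given)
    """
    delta = curr_kb - prev_kb
    return delta / interval_secs if normalize else delta


# Hypothetical samples taken 10 seconds apart
print(mem_change(1000, 1600, 10))
print(mem_change(1000, 1600, 10, normalize=False))
```

A negative result simply means the value shrank over the interval.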

If you include --memopts with P or V, collectl will only display Physical or Virtual memory. The default is PV and will display both.

Memory, collectl -sm --memopts ps

The p and s options allow you to display data about page and/or steal and scan information. If you want this data combined with the standard physical or virtual data you must explicitly request them as well. The columns show how the memory is allocated for the respective sections.

# MEMORY SUMMARY
#<---Other---|-------Page Alloc------|------Page Refill-----><------Page Steal-------|-------Scan KSwap------|------Scan Direct----->
#  Free Activ   Dma Dma32  Norm  Move   Dma Dma32  Norm  Move   Dma Dma32  Norm  Move   Dma Dma32  Norm  Move   Dma Dma32  Norm  Move
    14M  136K     2    69   13M     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0

Network, collectl -sn

The entries for error counts in the following are actually the total of several types of errors. To get individual error counts, you must either include --netopts e or report details on individual interfaces in plot format by specifying -P. Transmission errors are categorized by errors, dropped, fifo, collisions and carrier. Receive errors are broken out for errors, dropped, fifo and framing errors.

If you specify filtering with --netfilt, the names that match the pattern(s) will be either included in or excluded from the summary data. However, the data is still collected, so if it is recorded to a file it can be viewed later.

# NETWORK SUMMARY (/sec)
# KBIn  PktIn SizeIn  MultI   CmpI  ErrsI  KBOut PktOut  SizeO   CmpO  ErrsO

KBIn	Incoming KB/sec
PktIn	Incoming packets/sec
SizeIn	Average incoming packet size in bytes
MultI	Incoming multicast packets/sec
CmpI	Incoming compressed packets/sec
ErrsI	Total incoming errors/sec. This is an aggregation of incoming error counters. To see explicit error counters use --netopts e
KBOut	Outgoing KB/sec
PktOut	Outgoing packets/sec
SizeO	Average outgoing packet size in bytes
CmpO	Outgoing compressed packets/sec
ErrsO	Total outgoing errors/sec. This is an aggregation of outgoing error counters. To see explicit error counters use --netopts e
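
The aggregation behind ErrsI and ErrsO can be sketched as a simple sum over the error categories listed at the start of this section. The Python below is an illustration only; the dictionary keys, the function name and the sample counts are made up for the example.

```python
# Error categories named in the text for each direction
RX_ERROR_TYPES = ("errors", "dropped", "fifo", "framing")
TX_ERROR_TYPES = ("errors", "dropped", "fifo", "collisions", "carrier")


def total_errors(counters, kinds):
    """Sum the per-type error rates; a missing type counts as zero."""
    return sum(counters.get(kind, 0) for kind in kinds)


# Hypothetical per-second error rates for one interval
rx = {"errors": 1, "dropped": 4, "fifo": 0, "framing": 2}
tx = {"errors": 0, "dropped": 1, "fifo": 0, "collisions": 3, "carrier": 0}
print(total_errors(rx, RX_ERROR_TYPES), total_errors(tx, TX_ERROR_TYPES))
```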

Network, collectl -sn --netopts e

This alternative format, which is displayed when you specify --netopts e, enumerates the individual error types. You cannot see both output formats at the same time.

# NETWORK ERRORS SUMMARY (/sec)
#  ErrIn  DropIn  FifoIn FrameIn    ErrOut DropOut FifoOut CollOut CarrOut

ErrIn	Receive errors/sec detected by the device driver
DropIn	Receive packets dropped/sec
FifoIn	Receive packet FIFO buffer errors/sec
FrameIn	Receive packet framing errors/sec
ErrOut	Transmit errors/sec detected by the device driver
DropOut	Transmit packets dropped/sec
FifoOut	Transmit packet FIFO buffer errors/sec
CollOut	Transmit collisions/sec detected on the interface
CarrOut	Transmit packet carrier loss errors detected/sec

NFS, collectl -sf

As of version 3.2.1, by default collectl collects and reports on all versions of nfs data, both clients and servers. One can limit the types of data reported with --nfsfilt and if only server or client data has been selected, only that type of data will be reported as shown in the 2 forms below. When both server and client data are being reported they will be displayed side by side. As with brief format, if filters have been selected they will be displayed in the header.

# NFS SUMMARY (/sec)
#<---------------------------server--------------------------->
# Reads Writes Meta Comm  UDP   TCP  TCPConn  BadAuth  BadClnt

Reads	Total reads/sec
Writes	Total writes/sec
Meta	Total nfs meta data calls/sec, where meta data is considered to be any of: lookup, access, getattr, setattr, readdir and readdirplus. Note that not all nfs versions report every one of these calls the way V3 clients/servers do.
Comm	Total commits/sec
UDP	Number of UDP packets/sec
TCP	Number of TCP packets/sec
TCPConn	Number of TCP connections/sec
BadAuth	Number of authentication failures/sec
BadClnt	Number of unknown clients/sec

# NFS SUMMARY (/sec)
#<----------------client---------------->
# Reads Writes Meta Comm Retrans  Authref

Reads	Total reads/sec
Writes	Total writes/sec
Meta	Total nfs meta data calls/sec, where meta data is considered to be any of: lookup, access, getattr, setattr, readdir and readdirplus. Note that not all nfs versions report every one of these calls the way V3 clients/servers do.
Comm	Total commits/sec
Retrans	Number of retransmissions/sec
Authref	Number of authrefreshes/sec
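
As an illustration of how the Meta column is built up, the Python sketch below sums the six call types named in its description and treats any call a given nfs version does not report as zero. The dictionary layout, function name and sample rates are assumptions made for the example.

```python
# The call types the text counts as meta data
META_CALLS = ("lookup", "access", "getattr", "setattr", "readdir", "readdirplus")


def meta_per_sec(call_rates):
    """Sum the meta data call rates, ignoring every other call type."""
    return sum(call_rates.get(name, 0) for name in META_CALLS)


# Hypothetical per-second call rates; setattr and readdir* are not reported
rates = {"read": 120, "write": 80, "lookup": 30, "access": 12, "getattr": 45, "commit": 5}
print(meta_per_sec(rates))
```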

NFS, collectl -sf -nfsopts C

The data reported for clients is slightly different, specifically the retrans and authref fields.

# NFS CLIENT (/sec)
#<----------RPC---------><---NFS V3--->
#CALLS  RETRANS  AUTHREF    READ  WRITE

Calls	Number of RPC calls/sec
Retrans	Retransmitted calls
Authref	Authentication failed
Read	Number of reads/sec
Write	Number of writes/sec

Slabs, collectl -sy

As of the 2.6.22 kernel, there is a new slab allocator, called SLUB, and since there is not a 1:1 mapping between what it reports and the older slab allocator, the format of this listing will depend on which allocator is being used. The following format is for the older allocator.

# SLAB SUMMARY
#<------------Objects------------><--------Slab Allocation-------><--Caches--->
#  InUse   Bytes    Alloc   Bytes   InUse   Bytes   Total   Bytes  InUse  Total

Objects
InUse	Total number of objects that are currently in use.
Bytes	Total size of all the objects in use.
Alloc	Total number of objects that have been allocated but not necessarily in use.
Bytes	Total size of all the allocated objects whether in use or not.
Slab Allocation
InUse	Number of slabs that have at least one active object in them.
Bytes	Total size of all the slabs.
Total	Total number of slabs that have been allocated whether in use or not.
Bytes	Total size of all the slabs that have been allocated whether in use or not.
Caches
InUse	Not all caches are actually in use. This includes only those with non-zero counts.
Total	This is the count of all caches, whether currently in use or not.

This is the format for the new SLUB allocator

# SLAB SUMMARY
#<---Objects---><-Slabs-><-----memory----->
# In Use   Avail  Number      Used    Total

One should note that this report summarizes those slabs being monitored. In general this represents all slabs, but if filtering is being used these numbers will only apply to those slabs that have matched the filter.

Objects

InUse The total number of objects that have been allocated to processes.

Avail The total number of objects that are available in the currently allocated slabs. This includes those that have already been allocated to processes.

Slabs

Number This is the number of individual slabs that have been allocated and are taking physical memory.

Memory

Used Used memory corresponds to those objects that have been allocated to processes.

Total Total physical memory allocated to processes. When there is no filtering in effect, this number will be equal to the Slabs field reported by -sm.

Sockets, collectl -ss

# SOCKET STATISTICS
#      <-------------Tcp------------->   Udp   Raw   <---Frag-->
#Used  Inuse Orphan    Tw  Alloc   Mem  Inuse Inuse  Inuse   Mem

Used	Total number of sockets allocated, which can include additional types such as domain.
Tcp
Inuse	Number of TCP connections in use
Orphan	Number of TCP orphaned connections
Tw	Number of connections in TIME_WAIT
Alloc	TCP sockets allocated
Mem
Udp
Inuse	Number of UDP connections in use
Raw
Inuse	Number of RAW connections in use
Frag
Inuse
Mem
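
These fields line up with text in the style of /proc/net/sockstat, where each line is a section name followed by name/value pairs. That collectl reads this file is an assumption, not something stated here, so the Python below is only a sketch of how such text could be turned into the columns above. The function name and the sample values are invented.

```python
def parse_sockstat(text):
    """Return {section: {field: int}} from sockstat-style text."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        section, rest = line.split(":", 1)
        tokens = rest.split()
        # Tokens alternate between a field name and its integer value.
        stats[section.strip()] = {
            name: int(value) for name, value in zip(tokens[0::2], tokens[1::2])
        }
    return stats


# Hypothetical sample in the sockstat layout
SAMPLE = """sockets: used 294
TCP: inuse 5 orphan 0 tw 2 alloc 7 mem 1
UDP: inuse 3 mem 2
RAW: inuse 0
FRAG: inuse 0 memory 0
"""
stats = parse_sockstat(SAMPLE)
print(stats["sockets"]["used"], stats["TCP"]["tw"], stats["UDP"]["inuse"])
```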

TCP, collectl -st

These are the counters one sees when running the command netstat -s, whose output is very verbose. Since this format is an attempt to compress those field names to 6 characters or less, sometimes something gets lost in the translation. As described in the brief data formats, the actual TCP data displayed is based on the value of --tcpfilt. As with brief data, everything is displayed on a single line, which can be quite wide. That is even more reason to use this switch, especially since the default format is over 200 columns wide! The following definitions are based on the value of that filter:

--tcpfilt i

# TCP SUMMARY (/sec)# TCP STACK SUMMARY (/sec)
#<----------------------------------IpPkts----------------------------------->
# Receiv Delivr Forwrd DiscdI InvAdd   Sent DiscrO ReasRq ReasOK FragOK FragCr

Receiv		- total packets received/sec
Delivr		- incoming packets delivered/sec
Forwrd		- packets forwarded
DiscdI		- discarded incoming packets
InvAdd		- packets received with invalid addresses
Sent		- requests sent out/sec
DiscrO		- discarded outbound requests
ReasRq		- reassembled requests
ReasOK		- reassembled OK
FragOK		- fragments received OK
FragCr		- fragments created

--tcpfilt t

# TCP SUMMARY (/sec)# TCP STACK SUMMARY (/sec)
#<---------------------------------Tcp--------------------------------->
# ActOpn PasOpn Failed ResetR  Estab   SegIn SegOut SegRtn SegBad SegRes

ActOpn	- active connections opened/sec
PasOpn	- passive connection opened/sec
Failed	- failed connection attempts
ResetR	- connection resets received
Estab	- connections established
SegIn	- segments received/sec
SegOut	- segments sent out/sec
SegRtn	- segments retransmitted
SegBad	- bad segments received
SegRes	- resets sent

--tcpfilt u

# TCP SUMMARY (/sec)# TCP STACK SUMMARY (/sec)
#<------------Udp----------->
#  InDgm OutDgm NoPort Errors

InDgm	- packets received/sec
OutDgm	- packets sent/sec
NoPort	- packets received to unknown port
Errors	- packet receive errors

--tcpfilt c

# TCP SUMMARY (/sec)# TCP STACK SUMMARY (/sec)
#<----------------------------Icmp--------------------------->
# Recvd FailI UnreI EchoI ReplI  Trans FailO UnreO EchoO ReplO

Recvd	- ICMP messages received
FailI	- incoming ICMP messages failed
UnreI	- input destination unreachable
EchoI	- input echo requests
ReplI	- input echo replies
Trans	- ICMP messages sent
FailO	- outbound ICMP messages failed
UnreO	- output destination unreachable
EchoO	- output echo requests
ReplO	- output echo replies

--tcpfilt T

# TCP SUMMARY (/sec)# TCP STACK SUMMARY (/sec)
#<------------------------------------------TcpExt----------------------------------------->
# FasTim Reject DelAck QikAck PktQue PreQuB HdPdct AkNoPy PreAck DsAcks RUData REClos  SackS

FasTim	- TCP sockets finished time wait in fast timer
Reject	- packet rejects in established connections because of timestamp
DelAck	- delayed ACKs sent
QikAck	- times quick ACK mode activated
PktQue	- packets directly queued to recvmsg prequeue
PreQuB	- bytes directly received in process context from prequeue
HdPdct	- packet headers predicted
AkNoPy	- acknowledgements for received packets not containing data
PreAck	- predicted acknowledgements
DsAcks	- DSACKS sent for old packets
RUData	- connections reset due to unexpected data
REClos	- connections reset due to early close
SackS	- SackShiftFallback
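
The mapping between --tcpfilt letters and the column groups shown above can be summarized in a small lookup. The Python sketch below only restates the five sections from this page; it assumes several letters can be combined in one filter value, and the function name is hypothetical.

```python
# Filter letter -> column group heading, as listed in the sections above
TCPFILT_SECTIONS = {
    "i": "IpPkts",
    "t": "Tcp",
    "u": "Udp",
    "c": "Icmp",
    "T": "TcpExt",
}


def sections_for(tcpfilt):
    """Return the column groups selected by a --tcpfilt value, in order."""
    unknown = [ch for ch in tcpfilt if ch not in TCPFILT_SECTIONS]
    if unknown:
        raise ValueError("unknown --tcpfilt letter(s): " + "".join(unknown))
    return [TCPFILT_SECTIONS[ch] for ch in tcpfilt]


print(sections_for("it"))
```

Note that the letters are case sensitive: t selects the Tcp group while T selects TcpExt.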

updated July 23, 2014
