Disk Statistics Anomaly

First Some Background

My original collectl development was done on a 2.4 kernel. At the time I would not only validate its output against existing tools (since most of the data collectl reports is available elsewhere), I would also induce system loads and make sure I was seeing rational numbers. It should therefore come as no surprise that when I got my hands on my first 2.6 kernel I repeated the process. What did surprise me were the disk statistics I saw when writing a 5GB file to /tmp with dt, the Data Test Program:

[root@cag-dl145-11 ~]# collectl -sd -oT
waiting for 1 second sample...
#         <-----------Disks----------->
#Time     KBRead  Reads  KBWrit Writes
10:05:31       0      0  114184    159
10:05:32       0      0  663324    148
10:05:33       0      0  164592    153
10:05:34       0      0       0    154
10:05:35       0      0       0    201
10:05:36       0      0       0    228
10:05:37       4      0       0    312

Note that typical single-disk write performance for large files tends to be in the 40-50MB/sec range.

The obvious question is how I can be seeing over 100MB/sec during 3 consecutive intervals and then nothing. It is also curious that there are writes but no data being written.
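To put the sample in perspective, the KBWrit column can be converted to MB/sec and compared with the typical range quoted above. The following is only a small sketch of that arithmetic: the numbers are copied from the collectl output, the conversion assumes 1MB = 1024KB, and the 50MB/sec threshold is simply the upper end of the typical range rather than a measured limit for this particular disk.

```python
# (time, KBWrit, Writes) rows copied from the collectl sample above.
samples = [
    ("10:05:31", 114184, 159),
    ("10:05:32", 663324, 148),
    ("10:05:33", 164592, 153),
    ("10:05:34", 0, 154),
    ("10:05:35", 0, 201),
    ("10:05:36", 0, 228),
    ("10:05:37", 0, 312),
]

# Upper end of the 40-50MB/sec range; an assumption, not a measurement.
TYPICAL_MAX_MB_PER_SEC = 50


def kb_to_mb(kb):
    """Convert KB to MB, assuming 1MB = 1024KB."""
    return kb / 1024


def summarize(rows):
    """Return (time, MB/sec rounded to 0.1, writes, exceeds_typical) per row."""
    result = []
    for time, kb_written, writes in rows:
        mb = kb_to_mb(kb_written)
        result.append((time, round(mb, 1), writes, mb > TYPICAL_MAX_MB_PER_SEC))
    return result


for time, mb, writes, suspicious in summarize(samples):
    flag = "exceeds typical disk rate" if suspicious else ""
    print(f"{time} {mb:8.1f} MB/sec {writes:4d} writes  {flag}")
```

The first three intervals all come out well above what a single disk could plausibly sustain, while the remaining intervals report a steady stream of writes with no KB written at all.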

Here Is The Explanation

It turned out that the block device driver code that updates the statistics was incrementing the byte counters as candidate blocks were selected in the buffer cache and queued for I/O, not when they were actually sent to the driver, which is when the number-of-I/Os counter was being incremented. The bytes/sec figures were therefore measuring the rate at which bytes were queued for I/O rather than the actual I/O rate.

It is also of interest that this problem existed in the 2.4 kernel as well, but there the I/O queue held only 128 entries, so a request spent so little time in the queue that nobody noticed. When the 2.6 kernel first came out the queue size was 1024 and the situation became much more noticeable. In fact, if you changed the queue size on the 2.6 system the numbers again became more reasonable. Note also that Direct I/O, which bypasses the buffer cache, would not cause this to occur.
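The effect of counting bytes at queue time while counting I/Os at dispatch time can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not kernel code: the function name simulate, the 300KB request size, the 900-request workload and the rate of 15 requests per tenth of a second are all hypothetical values chosen to give a disk rate of roughly 45000KB/sec. Only the two queue depths, 128 and 1024, come from the text above.

```python
from collections import deque


def simulate(total_requests, request_kb, queue_depth,
             dispatch_per_step, steps_per_sec=10):
    """Toy model of a bounded I/O queue.

    Returns one (kb_counted_at_queue_time, kb_counted_at_dispatch_time,
    writes) tuple per simulated second.
    """
    queue = deque()
    remaining = total_requests
    per_second = []
    while remaining > 0 or queue:
        queued_kb = dispatched_kb = writes = 0
        for _ in range(steps_per_sec):
            # The writer refills the queue from the buffer cache, which
            # happens at memory speed, so it always fills to the limit.
            while remaining > 0 and len(queue) < queue_depth:
                queue.append(request_kb)
                remaining -= 1
                queued_kb += request_kb   # byte counter bumped at queue time
            # The disk accepts a fixed number of requests per step.
            for _ in range(min(dispatch_per_step, len(queue))):
                dispatched_kb += queue.popleft()
                writes += 1               # I/O counter bumped at dispatch time
        per_second.append((queued_kb, dispatched_kb, writes))
    return per_second


# Hypothetical workload: 900 requests of 300KB, disk takes 150 requests/sec.
for depth in (1024, 128):
    print(f"queue depth {depth}")
    for queued, dispatched, writes in simulate(900, 300, depth, 15):
        print(f"  queued={queued:7d}KB  dispatched={dispatched:6d}KB  writes={writes}")
```

In this model the 1024-entry queue swallows the entire workload in the first second, so the queue-time counter shows one large burst followed by seconds of zero KB with a steady number of writes, which is the same shape as the collectl sample. With 128 entries the queue-time counter stays much closer to the dispatch-time counter, consistent with the problem going unnoticed on the 2.4 kernel.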

Does the Problem Still Exist?

In a word, no. I sent my findings to the maintainer of the block device driver, and he changed the way the statistics are measured somewhere around the 2.6-14 kernel, so this is no longer a problem.