Memory Info Monitoring

Introduction
Collectl reports the standard values with respect to memory, which are fully documented under the data definitions. Trying to rationalize the way these values relate to each other can be a frustrating experience because they rarely add up to what you expect. Part of the reason for this is that not every byte is accounted for in every category. This can be further complicated because at boot time some devices will actually grab memory that the kernel will never even see, and so the total memory will not always equal the physical amount of installed memory.

If one looks at /proc/meminfo, which shows many more types of memory than collectl reports, it begs the question "why not report it all?" and the simple answer is there is just too much. Further, collectl uses a second file, /proc/vmstat, to gather virtual memory stats, which adds even more possible candidates to report. Again, collectl tries to report the values of most use.
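
To get a feel for just how much data is in those files, one can always peek at them directly. For example (the vmstat field names shown here are just a sampling):
head -5 /proc/meminfo
grep -E '^(pgpgin|pgpgout|pswpin|pswpout)' /proc/vmstat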

Brief, verbose and detail
Like the other subsystems, memory reports its data in these 3 formats as well. However, there are a few important caveats to note, starting with the one described below.
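
Before getting to those, and assuming collectl's usual switch conventions of lowercase subsystem letters for summary data and uppercase for detail, a minimal sketch of requesting each format looks like this:
collectl -sm             # brief memory summary
collectl -sm --verbose   # the same data in verbose format
collectl -sM             # memory detail, reported per NUMA node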

Don't get fooled by cache memory vs used and free memory
One of the most confusing things for people who aren't familiar with linux memory management is the interpretation of cache memory and its relationship to used and free memory. As one might expect, as cache memory increases so does used memory, and free memory decreases accordingly.

What fools people is that the first (or many) times they see low free memory, they think their system is running out of memory when in fact it is not. If they reboot, the memory frees up, but then starts to fill again. So what's going on? It turns out that whenever you read or write a file, unless you explicitly tell linux not to, it passes the file through the cache, which increases the amount of cache memory used and drops free memory. What many people do not realize is that until a file is deleted or the cache is explicitly cleared, it remains in cache, and as a result, if the system accesses a lot of files the cache will eventually fill up and reduce the amount of free memory.
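
A simple way to demonstrate this for yourself follows; the file path is just a placeholder, and dropping the caches requires root:
free -m                                    # note the cached and free columns
cat /path/to/some/large/file > /dev/null   # pull the file through the page cache
free -m                                    # cached goes up, free goes down
sync; echo 3 > /proc/sys/vm/drop_caches    # explicitly clear the caches
free -m                                    # the free memory comes back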

Naturally linux can't allow the cache to grow unchecked, and so when it reaches a maximum set by the kernel, older entries will start to age out. In other words, reading a file will be extremely fast when it is in cache, but access slows to disk speeds when it is not.

The only real way to tell what is going on is to look at the disk subsystem while accessing a file. If a complete or partial file is in cache, read I/O rates will be much higher than normal. If a file that will completely fit in cache is written, the I/O rates will again be very high, because what is actually being reported is the rate at which the cache is being filled. It is only when a file is a lot larger than cache that the I/O rates slow down, operating only as fast as dirty data in cache can be written to disk; this is in fact the only real way to measure how fast your disk subsystem actually is.
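
One way to watch this happen, sketched here with a placeholder file path, is to monitor the memory and disk summaries together while forcing a large sequential read:
collectl -smd -oT &                           # memory and disk data with timestamps
dd if=/path/to/largefile of=/dev/null bs=1M   # read the file and watch the rates
kill $!                                       # stop collectl when done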

tip: --vmstat
Some people find the way vmstat reports virtual memory information to be very handy in some cases. The only problem with vmstat is that it doesn't write its output to a file, and even if you wrap it in a script to do so, you're now stuck with memory information in a specific format; if you do want to plot it, that takes a little more effort too.

Collectl's --vmstat switch is actually turned into --export vmstat internally, and so it reports data the same way vmstat does, but now you get some added bonuses: the data can be recorded to a file, played back later, and timestamped or plotted like any other collectl output.
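
For example, it works both live and during playback; the playback file below is the same one used later on this page:
collectl --vmstat
collectl -p /var/log/collectl/poker-20110928-000000.raw.gz --vmstat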

tip: don't forget about --grep
As previously mentioned, there is a lot more data contained in the memory /proc structures than collectl reports. So does that mean you're out of luck if you want to see the value of, say, Committed_AS? Absolutely not, at least not during playback. Since collectl records the contents of the /proc data in its original format in its raw files, you can always use linux's grep (or zgrep for compressed files) to search them for a particular pattern like this:
zgrep Committed_AS /var/log/collectl/poker-20110928-000000.raw.gz
Committed_AS:   889272 kB
Committed_AS:   889272 kB
Committed_AS:   889272 kB
Committed_AS:   889272 kB
Committed_AS:   889272 kB
Committed_AS:   889272 kB
Committed_AS:   889272 kB
Committed_AS:   891664 kB
Committed_AS:   891664 kB
Committed_AS:   891664 kB
Committed_AS:   891664 kB
But unfortunately this does nothing to tell you what times the values correspond to. You could have included the lines containing >>> in your search, which do carry UTC timestamps, but those aren't easily mapped to conventional time formats. Now look at this:
collectl -p /var/log/collectl/poker-20110928-000000.raw.gz --grep Committed_AS -oT
00:00:00 Committed_AS:   889272 kB
00:00:10 Committed_AS:   889272 kB
00:00:20 Committed_AS:   889272 kB
00:00:30 Committed_AS:   889272 kB
00:00:40 Committed_AS:   889272 kB
00:00:50 Committed_AS:   889272 kB
00:01:00 Committed_AS:   889272 kB
00:01:10 Committed_AS:   891664 kB
00:01:20 Committed_AS:   891664 kB
Pretty slick! And since this is collectl playback, you can use other switches like --from/--thru, or even change the timestamp format and/or see msec too, as sketched below. Also remember this trick can be applied to any data collectl records, though memory tends to be the most interesting.
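
A sketch of combining those switches might look like the following, with the caveat that the exact time formats accepted by --from/--thru, and the use of m in -oTm to show msec, are assumptions here:
collectl -p /var/log/collectl/poker-20110928-000000.raw.gz --grep Committed_AS -oTm --from 00:00:30 --thru 00:01:10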

updated September 29, 2011