
There is only one unit of measurement

bcheck just simply rocks.

After beating my head against a wall for 2 days looking for a memory bug in LAM/MPI using valgrind (a memory-checking debugger for Linux), bcheck found the error within about 3 test runs on Solaris.

Don't get me wrong -- valgrind rocks as well. valgrind is a fabulous tool and I'm extremely glad that it's available (many thanks Julian!). But bcheck somehow gave me more detailed information than valgrind did.

...actually, I guess that's not entirely true. I was sitting here thinking about it while writing this entry and I figured out why valgrind didn't tell me the same information that bcheck did. Here's the scoop:

In this case, the problem was both a read from unallocated memory and a duplicate free within LAM's Myrinet network device. bcheck reported these problems, but valgrind did not. Why?

It all comes back to Myrinet -- arrgh! On Linux systems, LAM/MPI has to use its own memory allocator (a derivative of the venerable ptmalloc) so that it can catch calls to sbrk() and guarantee that memory is unpinned before it is returned to the OS. Hence, valgrind is probably not intercepting these calls because it doesn't know that LAM's replacements are now the "real" free(), sbrk(), etc.
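To make that concrete, here is a toy sketch of the general effect. It is emphatically not LAM's ptmalloc-derived allocator; the names pool_alloc, pool_free and demo_double_free are made up for illustration. The point is only that once an application hands out memory from its own arena, the system malloc()/free() never see the individual blocks, so a tool that watches only those calls has nothing to report. The allocator itself is the only thing in a position to notice a duplicate free.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy pool: a fixed arena carved into equal-sized blocks. */
#define POOL_BLOCKS 4
#define BLOCK_SIZE  64

static unsigned char pool_arena[POOL_BLOCKS * BLOCK_SIZE];
static int pool_in_use[POOL_BLOCKS];

/* Hand out the first unused block, or NULL if the pool is exhausted.
 * Note that the system malloc() is never called. */
void *pool_alloc(void)
{
    for (int i = 0; i < POOL_BLOCKS; i++) {
        if (!pool_in_use[i]) {
            pool_in_use[i] = 1;
            return &pool_arena[i * BLOCK_SIZE];
        }
    }
    return NULL;
}

/* Release a block. Returns 0 on success, or -1 if the block was already
 * free (a duplicate free). The system free() is never called, so only
 * this bookkeeping can catch the mistake. */
int pool_free(void *p)
{
    ptrdiff_t offset = (unsigned char *)p - pool_arena;

    /* Only pointers previously returned by pool_alloc() are valid here. */
    assert(offset >= 0 && offset < (ptrdiff_t)sizeof(pool_arena));
    assert(offset % BLOCK_SIZE == 0);

    int i = (int)(offset / BLOCK_SIZE);
    if (!pool_in_use[i]) {
        return -1;
    }
    pool_in_use[i] = 0;
    return 0;
}

/* Free the same block twice and count how many duplicate frees the pool
 * itself detects. Returns -1 if no block could be allocated. */
int demo_double_free(void)
{
    int caught = 0;
    void *p = pool_alloc();

    if (p == NULL) {
        return -1;
    }
    if (pool_free(p) != 0) {
        caught++;
    }
    if (pool_free(p) != 0) {   /* the bug: second free of the same block */
        caught++;
    }
    return caught;
}
```

Calling demo_double_free() exercises the second, buggy pool_free(), which the pool's own bookkeeping flags. Take away that in-use flag and the duplicate free goes through without anyone noticing, which is roughly the situation I suspect valgrind was in.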

This doesn't happen on Solaris because Solaris has a bug deep within its kernel such that gm can't atomically allocate-and-pin memory, and therefore LAM/MPI doesn't need to replace malloc/free/etc. (that's the short version, omitting all the juicy details). Hence, bcheck is able to see/report on the "true" malloc/free, but valgrind isn't.

So Valgrind rocks! Bcheck rocks! Memory-checking debuggers are life!


This page contains a single entry from the blog posted on August 21, 2003 9:18 AM.