I did much work on IMPI today.
Lesson for the wise: never write/debug parallel programs with only two nodes. Always use at least three. Three is probably better than four, actually, if your program has to work for all general cases.
I already knew this, but I discovered it again the hard way today. I'm working with HP and MPI Software Technology on our IMPI demo for SC'2000; I thought that I pretty much had LAM ready to go on Friday. Today, I tried it with three clients (instead of just two, up in nd.edu) -- i.e., two clients in nd.edu and a client down here in squyres.com for a local display (the demo is a GUI plot of the Mandelbrot set -- the plotting is calculated in parallel, and the results are sent to the display master to be shown on X).
Everything worked great with two clients, but started barfing horribly with three clients. Ugh! I had to go around and fix all the places where I had made bad assumptions and whatnot.
So, kids, please don't program in parallel with just two nodes -- always have adult supervision and use three, four, or two hundred nodes.
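To make that concrete, here is a minimal sketch of the kind of assumption that hides with two nodes and bites with three. It is not the actual LAM or demo code; the function names (`only_worker_two_ranks`, `worker_ranks`) are made up for illustration. With two ranks, "the worker" is simply "the rank that isn't the master", so a shortcut like `1 - master` looks fine. With three ranks it silently skips one.

```c
#include <assert.h>

/* Hypothetical shortcut: "the" worker is whoever is not the master.
 * Only correct when the job has exactly two ranks (0 and 1). */
int only_worker_two_ranks(int master)
{
    return 1 - master;
}

/* General version: write every rank except `master` into `out`
 * (which must have room for size - 1 entries) and return the count. */
int worker_ranks(int master, int size, int *out)
{
    int rank;
    int count = 0;

    assert(size > 0 && master >= 0 && master < size);

    for (rank = 0; rank < size; ++rank) {
        if (rank != master) {
            out[count++] = rank;
        }
    }
    return count;
}
```

With `size == 2` the two functions agree. With `size == 3` and master 0, `worker_ranks` reports ranks 1 and 2, while the shortcut only ever knows about rank 1, so whatever rank 2 sends never gets received.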
It didn't help that there were actually other bugs in the demo code that we're supposed to run (the parallel Mandelbrot stuff was originally written by the MPICH guys and then modified by the NIST folks for specific purposes of the IMPI SC'2000 demo). I found at least two bugs today (remember: broadcasting pointer values across multiple architectures is meaningless) -- possibly more, but I think I've blocked them from my memory to prevent further trauma.
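On the pointer point: a pointer is only meaningful inside the address space of the process that made it, and across architectures even its width and byte order can differ. A common fix is to send a position relative to a buffer that every process has, and let each receiver turn it back into a pointer of its own. Here is a minimal sketch of that idea, with no MPI in it so that it stands alone; the helper names are hypothetical, and this is not how the demo code was actually fixed (the post doesn't say).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sender side: turn a pointer into `base[0..len)` into a plain index.
 * An index means the same thing in every process; the pointer does not. */
uint32_t pointer_to_index(const double *base, const double *p, size_t len)
{
    assert(p >= base && p < base + len);
    return (uint32_t)(p - base);
}

/* Receiver side: turn the index back into a pointer into the
 * receiver's OWN copy of the buffer. */
double *index_to_pointer(double *base, uint32_t index, size_t len)
{
    assert(index < len);
    return base + index;
}
```

In a real MPI program, the index (an ordinary integer) is what would go through `MPI_Bcast`, and each rank would call something like `index_to_pointer` on its local buffer afterwards.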
I also had a few bugs left in LAM -- the code for calculating host and client colors and sizes looked like a Darwinian experiment gone horribly wrong. I had to evolve that code into something better and greater -- to make it more than the sum of its parts. Now, it rocks with the rest of LAM.
I just can't help it -- LAM rocks.
It all seems to be working now. It's happily checked back into CVS, and hopefully I'll be done with that for a while...
Conversed with a guy at GE Aircraft Engines today. They're using LAM for somethingorother. He asked for a good feature on Friday (see his post on the LAM list); so I moved our discussion off the list and we'll iterate through a few things trying to get it right.
In related news, GE announced today that it is acquiring Honeywell. And "Just Jack" will stay on as CEO for an additional several months (he was going to retire next April, IIRC) until the end of 2001. You just can't go wrong with "Just Jack".
Glory be to the Father, the Son, and GE's stock price, amen.