It's been quite a while since I've done an entry.
Some of my lateness is due to travel (of course), but some of it is due to plain laziness.
So let's sum up the last week...
Some quickies:
- I learned that my mom reads this journal. Hey mom -- sign up on the mailing list, it's easier!
- Gone through oodles of details for the house. We've picked out paint, carpet, and lighting fixtures.
- Tracy has picked out the appliances. GE rocks.
- Had a great trip to Wisconsin. More on this below.
- Firmed up the three topics for the chapters for my dissertation (two were already known; the third kept moving around). In no particular order:
  - Interoperable MPI (IMPI) / LAM. This chapter is essentially written.
  - Generalized master / worker scheme, with its demo application -- the parallel Ogg/Vorbis encoder.
  - Scalable LAM (SLAM): port some of the technology from Minime back to LAM, including the tree-based booting and TCP connection caching.
- I returned to Louisville on Wednesday to find that Bell South had taken a backhoe to my phone lines. I was without telephone and internet for over a day. It sucked.
- I'm really displeased with boost, and aside from the Boost Graph Library (BGL), I'm not going to use it anymore. More below.
- Everyone in the LSC is now scattered around different offices in Fitz/Cushing. They're cleaning out 325 so that the carpet can be replaced and the room can be cleaned (after the annual winter/roof/water leaking/mold disaster).
- Lumsdaine just formally announced that he's going to after this semester.
Ok, now for some longer explanations.
Wisconsin trip was good. Met several of the Condor folks, including Miron Livny (the head PI up there). He's a good guy, and really smart. He's the only other professor type whom I've met who views software the same way we (Lumsdaine/the LSC) do:
- Software should just work; it needs to suck less
- Releasable software (portable, stable, etc., etc. -- not "research quality" software) is equal to a journal article
He actually wasn't as big on checkpointing as we were; he's more interested in bringing MPI into the dynamic computing world, where faults can (and do) happen. It took all of about 15 minutes to figure out how to run LAM under the static Condor scheduler. It will take a good deal longer to figure out how to [re]define MPI semantics to work in a dynamic environment where nodes can fail. Then it will take a little time to implement those in LAM.
After getting LAM to work in their static scheduler, we decided that the first step would be to get LAM to run in a "debugging / interactive" mode in Condor. That is, do something like:
unix% condor_lamboot -np 16
unix% condor_mpirun -np 16 foobar
...more condor_mpirun commands
unix% condor_lamhalt
The first step reserves up to 16 nodes (however, one or more may disappear at any time if a user returns to their computer, etc.). The last step releases any of the nodes that are still left.
Definitely an imperfect scheme, but a good first step -- and necessary to do anything more complicated.
This is going to be good stuff!
I got really annoyed with Boost today. The following is essentially an e-mail that I sent to Jeremy, Andy, and Rich about my experiences with Boost today:
Boost has a long way to go before it becomes usable for the Common User. As it has been all along, boost is really only suitable for its own developers and a handful of other hard-core geeks. Boost needs to become a lot more friendly towards the user before it can hope to become widely accepted.
Software needs to suck less, and boost is not fulfilling that requirement right now.
Here's my story, and why I'm annoyed at boost:
I used a "progress" class from boost in some of my dissertation code. It just prints a simple ASCII progress bar, from 0 to 100%. Handy for some long-running computations. I could well have written this myself, but decided to use boost, well, for pretty much the same reasons I love the STL (it's already written, it just works, etc., etc.).
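For what it's worth, a class like this really is small. Here is a minimal sketch of the idea, assuming a fixed-width bar drawn with '*' characters. The names AsciiProgress, advance, and render_full_bar are made up for illustration; this is not Boost's interface.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a "progress" class (not Boost's API).
// Prints '[' up front, then '*' characters as work completes,
// and closes the bar with "] 100%" once all units are done.
class AsciiProgress {
public:
    // total: number of work units expected; width: '*' characters at 100%.
    AsciiProgress(std::size_t total, std::ostream& out, std::size_t width = 50)
        : total_(total), width_(width), done_(0), drawn_(0),
          finished_(false), out_(out) {
        assert(total > 0 && width > 0);
        out_ << '[';
        out_.flush();
    }

    // Record that `units` more work units have completed.
    void advance(std::size_t units = 1) {
        done_ += units;
        if (done_ > total_) done_ = total_;  // clamp at 100%
        const std::size_t want = done_ * width_ / total_;
        for (; drawn_ < want; ++drawn_) out_ << '*';
        if (done_ == total_ && !finished_) {
            out_ << "] 100%\n";
            finished_ = true;
        }
        out_.flush();
    }

private:
    std::size_t total_;
    std::size_t width_;
    std::size_t done_;
    std::size_t drawn_;
    bool finished_;
    std::ostream& out_;
};

// Convenience for trying it out: run a whole bar into a string.
inline std::string render_full_bar(std::size_t total, std::size_t width) {
    std::ostringstream os;
    AsciiProgress bar(total, os, width);
    for (std::size_t i = 0; i < total; ++i) bar.advance();
    return os.str();
}
```

In a long-running loop you would construct it with the iteration count and std::cout, then call advance() once per iteration.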
I haven't touched this code in a month or two, and when I resurrected it today, I figured that I should go get the latest version of Boost, because I saw that there were some BGL fixes, and I'm finally getting to the point where I'm going to need the BGL. This turned out to be a Big Mistake.
I downloaded boost_all.zip (which still doesn't have a version number in the name) from boost.org. There was no indication that this was an unstable release, so I figured that I'd be safe.
I unzipped it (sigh; why still no tar.gz version?) and inserted it into my source tree on my linux box here at home. Four unexpected and Bad things happened:
- File locations have changed. Specifically, there are .cpp files that no longer exist. Normally, as a library user, I wouldn't care one bit about this. But since boost provides no build interface, this is highly relevant to the user.
  I grumbled a bit, but modified my Makefile.am's to adjust.
- boost/config.hpp seems to be broken with g++ 2.95.2 -- g++ complains of bad preprocessor mojo. It seems that g++'s preprocessor doesn't allow you to split #if statements across multiple lines with "\".
  I edited boost/config.hpp to fix the error.
- boost/cstdint.hpp is broken for the same reason. There may be others that are broken in the same way; I was only using the "progress" class, and this is apparently all that they used.
  I edited boost/cstdint.hpp to fix the error.
- boost/timer.hpp (from Beman himself!) does not protect #include <limits> with #if BOOST_NO_LIMITS. g++ does not have <limits>. Hence, broken.
  After poking around a bit to figure out why this was happening, I edited boost/timer.hpp to fix the error.
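The kind of guard I mean for the <limits> problem looks roughly like this. It's a sketch only, not the actual contents of boost/timer.hpp: BOOST_NO_LIMITS is the macro mentioned above (assumed to be defined when the compiler lacks <limits>), and largest_long is a made-up helper to show the fallback.

```cpp
#include <cassert>   // only for the quick check in the note below
#include <climits>   // LONG_MAX, available even on older compilers

// Only pull in <limits> when the configuration says it exists.
#ifndef BOOST_NO_LIMITS
#  include <limits>
#endif

// Hypothetical helper: the largest value a long can hold,
// with a <climits> fallback when <limits> is unavailable.
inline long largest_long() {
#ifndef BOOST_NO_LIMITS
    return std::numeric_limits<long>::max();
#else
    return LONG_MAX;
#endif
}
```

Either branch gives the same answer, so a quick check like assert(largest_long() == LONG_MAX) should pass with or without BOOST_NO_LIMITS defined.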
After cooling off a bit, I concluded the following:
- Although boost is continually evolving and filenames/locations are going to keep changing, until a build and/or install mechanism is in place, users will continue to get burned by .cpp files moving, etc. This needs to be fixed. Soon.
- I realize how hard it is to distribute software, especially catching all the minor details before you ship a tarball -- but didn't anyone test g++? Having [at least] three separate things broken on the latest g++ version seems like a glaring oversight.
  There may be some rationale here that I'm not aware of that makes this "ok" (there's nothing in the documentation that I saw about this), but Joe User is going to download boost_all.zip and just expect it to work on his Linux boxen. He's certainly not going to comb through the mailing list archives looking for why it doesn't even compile.
- Other than the BGL, I'm not going to use boost anymore. I will use the BGL because it's probably safe to assume that there's some LSC-inspired sanity in that package (i.e., I have much higher faith in that package vs. the others because I trust the people involved), but I'm going to write my own "progress" class. It's just not worth my effort to use any other part of boost.
Jeremy replied that there had been some confusion among the boosters -- the version that was released today was released before it was ready. Although this may be true, the problems that I was having were unrelated to the problems that he was talking about. So I stick by my statement that other than the BGL, I won't be using Boost anytime in the near future.
I dug up some old minime code today (the threaded booter) and familiarized myself with how it worked again. I then dove into LAM and added hooks for threads and whatnot (nothing will be checked in, of course, until after 6.5 is released!).
I took a good long look at the current lamboot mechanism in LAM and had to sit and doodle out several designs before I got one that elegantly meshed the assumptions that are built into LAM with the new ideas from the threaded booter. I'll try to implement it tomorrow; it will be a real pain without the STL and C++ strings. :-(
Tracy and I went out to the house today. I took loads of pictures, and we measured out several rooms for planning purposes, etc. I can't decide which of the two bedrooms at the end of the 2nd floor hallway will be my office. I may have to visit the house in the morning and see how the sun glare is in the front one (that's my first choice, but if the sun glare is too much, I'll have to use the back one).
I bought Tracy a new cell phone this week because a) Valentine's day is next week, b) she'll be on jury duty all week, and c) she lost her old cell phone about 3-4 weeks ago.
I found out about some cool undocumented features from the sales guy. Most of them are scary-looking codes with no explanations, so even though I know they're there, I won't touch them for fear of breaking my phone. One cool one, however, lets me program the top line of my phone's "idle" display to say whatever I want. This is extremely helpful because Tracy and I now have identical phones...
And of course the day after I bought Tracy her new phone, she found her old one.
Figures.
Do you ever wonder where the subjects come from for my journal entries?
Most are movie or TV quotes. Some are totally random and off the top of my head.
That's all for now.