February 2001 Archives

February 3, 2001

All I ask is that you obey me like the Will of God

I have somehow angered Herman, the God of Automobiles.

Last Wednesday, I came out in the morning to my car and noticed a huge crack in it. It was fine on Tuesday night. It doesn't look like an impact crack; perhaps it was thermal stress...?

Thursday afternoon, when I drove in to South Bend, I drove up to Chez Costech and ran over a bottle that I didn't see. Not only did I get a flat, but the bottle managed to gash the side of my tire (which isn't repairable), so I had to buy a whole new one. The folks at Basney Honda were quite nice and hooked me up (they didn't even charge me labor, which was nice -- Kudos to the "Jeff" guy who worked there!), but it was $65 that I didn't particularly want to spend.

I'm worried because these things typically come in threes, and I have 3 more long drives ahead of me (to Madison, from Madison, and back to Looieville).

Whatever I did, Herman, I'm sorry.

In other Big News, Tracy and I have finally decided on a house. Here's a breakdown of the big details:

  • 2300 square feet
  • 2 floors
  • 4 bedrooms (all on second floor)
  • Laundry room upstairs
  • Entryway off front door is open all the way up to the second floor; the stairs go around the edge
  • Sitting room on first floor
  • Dining room
  • Big kitchen
  • Great room
  • 2 car garage
  • Patio out back
  • Basement

We picked out the cabinets and countertop last week, and put some "good faith" money down on the house so that the builder would customize it for us. Yes, it's a brand new house -- not even complete yet. Tracy's working on picking out colors and carpets this weekend.

And thanks to Alan Greenspan, we got a truly awesome interest rate on our mortgage -- 6 7/8%. Rock on!!

We expect to close by the end of the month (Tracy worked out these details after I left for SBN, so I don't know them offhand). We'll spend the next month cleaning and moving in and whatnot (we'll kinda be taking our time with this), and probably move in by the end of March.

Woo hoo!

And now for some quickies...

  • Went to the Keenan Revue with Arun, Perk, and Co. Was quite fun. Some of the skits were really funny. I won't give anything away here, but the wheelchair bit was my favorite.

  • Lummy and I are heading to Madison tomorrow to visit the Condor folks. Should be a great trip; I'm pretty excited about it. I'm giving a talk there on Monday afternoon; I need to finish it!!

  • All told, Arun, Jeremiah, Brian, Raja, and I spent probably about 3-4 hours discussing quoting and shell escaping rules for LAM on Friday. Wow. In the end, we decided to punt, and only allow simple stuff -- no quoting will be allowed. Maybe someday.

  • Brian gave a talk on IPv6 at LSC lunch yesterday, which was quite informative. When LAM 6.5 gets out the door, he'll be looking at supporting IPv6 with LAM, and doing some cool things with collectives with it. We'll see how that does.

  • Arun and I spent some time yesterday with the Myrinet RPI and tried to make it work right (still a problem with long messages), but got sidetracked wondering if it would be worth it to replace the state machine that we don't fully understand with our own state machine. A few hours and a full white board later, we decided to stick with the one that we already have and try to make it work. A re-write will be necessary, but not right now.

  • Arun and I also decided that he'll next move on to do Totalview support for LAM rather than VIA. It is both RPI work and helps move Arun up a level in the LAM code -- gives him greater exposure. Plus, when it's done, it will be immensely helpful for debugging LAM itself.

That's it for now -- I somehow have whacked my journal client up here on AFS, so I'm using one on queeg via DSL, and somehow backspace doesn't work in Emacs, which is quite frustrating.

February 11, 2001

Did you know that he helped Matthew with his weight problem?

It's been quite a while since I've done an entry.

Some of my lateness is due to travel (of course), but some of it is due to plain laziness.

So let's sum up the last week...

Some quickies:

  • I learned that my mom reads this journal. Hey mom -- sign up on the mailing list, it's easier!

  • Gone through oodles of details for the house. We've picked out paint, carpet, and lighting fixtures.

  • Tracy has picked out the appliances. GE rocks.

  • Had a great trip to Wisconsin. More on this below.

  • Firmed up the three topics for the chapters for my dissertation (two were already known; the third kept moving around). In no particular order:

    • Interoperable MPI (IMPI) / LAM. This chapter is essentially written.

    • Generalized master / worker scheme, with its demo application --
      the parallel Ogg/Vorbis encoder.

    • Scalable LAM (SLAM): port some of the technology from Minime back to LAM, including the tree-based booting and TCP connection caching.

  • I returned to Louisville on Wednesday to find that Bell South had taken a back hoe to my phone lines. I was without telephone and internet for over a day. It sucked.

  • I'm really displeased with boost, and aside from the Boost Graph Library (BGL), I'm not going to use it anymore. More below.

  • Everyone in the LSC is now scattered around different offices in Fitz/Cushing. They're cleaning out 325 so that the carpet can be replaced and the room can be cleaned (after the annual winter/roof/water leaking/mold disaster).

  • Lumsdaine just formally announced that he's going to after this semester.

Ok, now for some longer explanations.

Wisconsin trip was good. Met several of the Condor folks, including Miron Livny (the head PI up there). He's a good guy, and really smart. He's the only other professor type whom I've met who views software the same way we (Lumsdaine/the LSC) do:

  • Software should just work; it needs to suck less

  • Releasable software (portable, stable, etc., etc. -- not "research quality" software) is equal to a journal article

He actually wasn't as big on checkpointing as we were; he's more interested in bringing MPI into the dynamic computing world, where faults can (and do) happen. It took all of about 15 minutes to figure out how to run LAM under the static Condor scheduler. It will take a good deal longer to figure out how to [re]define MPI semantics to work in a dynamic environment where nodes can fail. Then it will take a little time to implement those in LAM.

After getting LAM to work in their static scheduler, we decided that the first step would be to get LAM to run in a "debugging / interactive" mode in Condor. That is, do something like:

    unix% condor_lamboot -np 16
    unix% condor_mpirun -np 16 foobar
    ...more condor_mpirun commands
    unix% condor_lamhalt

The first step reserves up to 16 nodes (however, one or more may disappear at any time if a user returns to their computer, etc.). The last step releases any of the nodes that are still left.

Definitely an imperfect scheme, but a good first step -- and necessary to do anything more complicated.

This is going to be good stuff!

I got really annoyed with Boost today. The following is essentially an e-mail that I sent to Jeremy, Andy, and Rich about my experiences with Boost today:

Boost has a long way to go before it becomes usable for the Common User. As it has been all along, boost is really only suitable for its own developers and a handful of other hard-core geeks. Boost needs to become a lot more friendly towards the user before it can hope to become widely accepted.

Software needs to suck less, and boost is not fulfilling that requirement right now.

Here's my story, and why I'm annoyed at boost:

I used a "progress" class from boost in some of my dissertation code. It just prints a simple ASCII progress bar, from 0 to 100%. Handy for some long-running computations. I could well have written this myself, but decided to use boost, well, for pretty much the same reasons I love the STL (it's already written, it just works, etc., etc.).

I haven't touched this code in a month or two, and when I resurrected it today, I figured that I should go get the latest version of Boost, because I saw that there were some BGL fixes, and I'm finally getting to the point where I'm going to need the BGL. This turned out to be a Big Mistake.

I downloaded boost_all.zip (which still doesn't have a version number in the name) from boost.org. There was no indication that this was an unstable release, so I figured that I'd be safe.

I unzipped it (sigh; why still no tar.gz version?) and inserted it into my source tree on my linux box here at home. Four unexpected and Bad things happened:

  1. File locations have changed. Specifically, there are .cpp files that no longer exist. Normally, as a library user, I wouldn't care one bit about this. But since boost provides no build interface, this is highly relevant to the user.

    I grumbled a bit, but modified my Makefile.am's to adjust.

  2. boost/config.hpp seems to be broken with g++ 2.95.2 -- g++ complains of bad preprocessor mojo. It seems that g++'s preprocessor doesn't allow you to split #if statements across multiple lines with "\".

    I edited boost/config.hpp to fix the error.

  3. boost/cstdint.hpp is broken for the same reason. There may be others that are broken in the same way; I was only using the "progress" class, and this is apparently all that they used.

    I edited boost/cstdint.hpp to fix the error.

  4. boost/timer.hpp (from Beman himself!) does not protect #include <limits> with #if BOOST_NO_LIMITS. g++ does not have <limits>. Hence, broken.

    After poking around a bit to figure out why this was happening, I edited boost/timer.hpp to fix the error.

After cooling off a bit, I concluded the following:

  1. Although boost is continually evolving and filenames/locations are going to keep changing, until a build and/or install mechanism is in place, users will continue to get burned by .cpp files moving, etc. This needs to be fixed. Soon.

  2. I realize how hard it is to distribute software, especially catching all the minor details before you ship a tarball -- but didn't anyone test g++? Having [at least] three separate things broken on the latest g++ version seems like a glaring oversight.

    There may be some rationale here that I'm not aware of that makes this "ok" (there's nothing in the documentation that I saw about this), but Joe User is going to download boost_all.zip and just expect it to work on his Linux boxen. He's certainly not going to comb through the mailing list archives looking for why it doesn't even compile.

  3. Other than the BGL, I'm not going to use boost anymore. I will use the BGL because it's probably safe to assume that there's some LSC-inspired sanity in that package (i.e., I have much higher faith in that package vs. the others because I trust the people involved), but I'm going to write my own "progress" class. It's just not worth my effort to use any other part of boost.

Jeremy replied that there had been some confusion among the boosters -- the version that was released today was released before it was ready. Although this may be true, the problems that I was having were unrelated to the problems that he was talking about. So I stick by my statement that other than the BGL, I won't be using Boost anytime in the near future.

I dug up some old minime code today (the threaded booter) and familiarized myself with how it worked again. I then dove into LAM and added hooks for threads and whatnot (nothing will be checked in, of course, until after 6.5 is released!).

I took a good long look at the current lamboot mechanism in LAM and had to sit and doodle out several designs before I got one that elegantly meshed the assumptions that are built into LAM with the new ideas from the threaded booter. I'll try to implement it tomorrow; it will be a real pain without the STL and C++ strings. :-(

Tracy and I went out to the house today. I took loads of pictures, and we measured out several rooms for planning purposes, etc. I can't decide which of the two bedrooms at the end of the 2nd floor hallway will be my office. I may have to visit the house in the morning and see how the sun glare is in the front one (that's my first choice, but if the sun glare is too much, I'll have to use the back one).

I bought Tracy a new cell phone this week because a) Valentine's day is next week, b) she'll be on jury duty all week, and c) she lost her old cell phone about 3-4 weeks ago.

I found some cool undocumented features from the sales guy. Most of which are scary looking codes with no explanations, so even though I know they're there, I won't touch them for fear of breaking my phone. One cool one, however, lets me program the top line of my phone's "idle" display to say whatever I want. This is extremely helpful because Tracy and I now have identical phones...

And of course the day after I bought Tracy her new phone, she found her old one.


Do you ever wonder where the subjects come from for my journal entries?

Most are movie or TV quotes. Some are totally random and off the top of my head.

That's all for now.

February 14, 2001

...made entirely from copies of Steve Miller's Greatest Hits

Arun complains about .8Mbps from his server in his dorm room to the engineering building on ND's campus.

Tests showed that I was getting .3Mbps from squyres.com to nd.edu yesterday. So Arun has no room to complain. :-)

Actually, as Arun pointed out later, we should both complain!

House preparations are going well.

For anyone who is really bored, I took a whole schload of pictures of our unfinished house and put them online at:


I've been working on integrating the multi-threaded tree booter into LAM (nothing will be CVS committed until after LAM 6.5 is released, of course). I've had some interesting (and frustrating) problems, but it seems to be going more-or-less well.

When I originally wrote it, it was outside of the LAM framework, so I re-wrote/copied some of the LAM stuff for basic network services and whatnot, frequently putting it in a C++ kind of context (using the STL, making basic objects, etc.). So I've been stripping that stuff out and reverting back to LAM's C interface for these services.

It's coming along swimmingly.

DSL is getting installed in my church; they responded with a telephone installation date of 22 Feb, 2001. They're already up on a LAN; they use individual modems to connect to AOL right now, which is terribly inefficient. DSL will be a Good Thing for them.

With all the hullabaloo about ssh1 this week and last, I upgraded to OpenSSH. Took a little bit of pain, because I need the AFS token passing support, so I had to compile it myself. What isn't obvious is that "OpenSSH" is a BSD-specific application. You have to get "Portable OpenSSH" to run on linux (or anything else) machines.

With some futzing, I got the AFS stuff to work.

Then I started mucking around with SSH2. Took a bit more futzing to get that to work.

Important fact: I don't know if I selected this during installation or if it's a Mandrake default -- you have to configure OpenSSH with --with-md5-passwords to get password authentication on the server side to work properly.

After all that (I was using Portable OpenSSH 2.3.0p1), I was randomly getting "authentication response too long" errors when I tried to connect to an openssh server. I asked Todd about this (he's a FreeBSD guy), and he mentioned that they "had problems with RSA authentication somewhere around 2.3.0".

So I got the latest CVS copy of Portable OpenSSH (which is version 2.3.2), and all seems to be well. I don't know if it was the client or the server that was whacky, but I suspect it was the client -- I couldn't connect to an openssh 2.3.2 server with it either (same error: authentication response too long). I don't know what the difference is between 2.3.0 and 2.3.0p1. On my 'drake 7.2 laptop, I have RPMs installed for openssh 2.3.0, and they seem to work just fine, so perhaps p1 broke something...?

But the CVS copy seems to be working, so I'm happy with that.

I may have to switch to gnome. I caught a bit about "Evolution" on /. the other day -- it looks like a free version of MS Outlook. Very cool. But it has lots of dependencies, and seems fairly gnome-specific.

I'm not inspired to try it at the moment, but I might well be upgrading all my current linux boxen (3) after I graduate to whatever latest/greatest stuff is out there, which may include switching to gnome, etc.

I actually only use KDE right now because it was the default when I installed linux on my laptop. Not having used KDE or Gnome before, I took KDE simply because it was the first in the list on the login screen.

Nina and Joe from the LAM list made a good suggestion today (Nina indirectly asked about it some 2 weeks ago, and we never got to it... oops), so I put it into the main-line LAM tree so that it will be released in 6.5.

I added a "-s" option to the lamboot command. Normally, the stdout/stderr of the LAM daemon on the node where lamboot is run is left open. This is so that LAM's internal "tstdio" package can function properly. tstdio is an emulation of normal stdio, but it works in a parallel environment, and funnels everything back to the lamd on the node where you booted.

Anyway, we normally leave stdout/stderr open on the local node for this reason. The stdout/stderr on all remote nodes is closed. However, Joe and Nina both wanted to do:

    rsh somenode lamboot hostfile 

It's important to remember that rsh requires two criteria before quitting:

  1. The application that it launches finishes (lamboot in this case)
  2. stdout/stderr from the application that it launches and all of its children are closed

This makes sense, actually; normally you'd want to see the output from all the children processes that you rsh over to some node, and wouldn't want rsh to finish before they did, because then you wouldn't see all the output.

But in this case, it causes rsh to hang. Since 99% of LAM users don't use tstdio, I added "-s" that will force the closing of stdout/stderr on the local node, so that "rsh somenode lamboot -s hostfile" will allow rsh to complete.
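
The hang is easy to reproduce without rsh at all. Here is a rough Python sketch, where a local pipe stands in for the rsh connection and a background "sleep" stands in for the lamd that lamboot leaves behind. The function name and the two command strings are made up for the illustration; nothing here is LAM code.

```python
import subprocess
import time

def seconds_until_eof(shell_command):
    """Run a shell command and time how long its stdout pipe stays open."""
    start = time.monotonic()
    proc = subprocess.Popen(["sh", "-c", shell_command], stdout=subprocess.PIPE)
    proc.stdout.read()   # returns only once every holder of the pipe has closed it
    proc.wait()
    return time.monotonic() - start

# The shell itself exits right after the echo in both cases.
# Case 1: the background process inherits stdout and keeps the pipe open.
holds_pipe = "sleep 2 & echo started"
# Case 2: the background process gave up stdout/stderr (roughly what "-s" forces).
drops_pipe = "sleep 2 >/dev/null 2>&1 & echo started"
```

Calling seconds_until_eof(holds_pipe) should take about as long as the background process lives, while seconds_until_eof(drops_pipe) should return almost immediately, which mirrors the difference between "lamboot" and "lamboot -s" under rsh.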

More information than you wanted, but I wanted it archived in my journal. :-)

I seem to have years' worth of data in my palm pilot datebk. Rich Murphy suggested trying the "purge" option.

If I forget your birthday next year, it's his fault.

Be glad that you weren't in Toronto today.


Look at the logo on one of the Cupid's butts. I can't believe that my friend works there. :-)

xmms crashed yesterday at 10:55am. At the time, it was running 928 processes out of 1023 -- almost 93%. xmms now has ... processes out of ....

February 19, 2001

Are you going to cry right here, or run to the bathroom?

I saw the last episode of News Radio last Thursday (the one where everyone but Dave and Matthew go to Jimmy's new radio station in New Hampshire). The next night, they ran the first show -- they're starting the cycle over from the beginning. Marvelous!

I love the differences between the first few episodes and the rest of the show:

  • The break room is a production room
  • Joe is played by some other actor (and I don't think the character's name is Joe)
  • Beth's character is a decidedly different personality than in later episodes of the show

There are probably other differences, but those jump out at you.

Network performance to nd.edu has been pretty bad this week. It was especially bad when the OIT screwed our entire switch and dropped everything down to half-duplex 100Mbps.


I got write access to the Ogg/Vorbis CVS repository last night. I've complained about so many build process problems that Monty (the lead developer) finally just gave me write access to go fix them myself (he's actually tried to give me write access before, but I refused). I committed a few minor patches, but a few issues still remain in debate. It's a bit more complicated because all this stuff has to compile on windoze (which I won't spend any time on), and because beta4 is due out imminently. Hence, everything is frozen except for bugs.

I've actually got several patches that I submitted long, long ago but were never applied (mostly build-process things) that I will likely apply after beta4 is released. The fact that you still can't build with native Solaris compilers without modifying the included Makefile.am's is still a sore spot that I plan to fix (they include the foolish GNU dependency generating thingy that breaks building for all non-gcc compilers).

There's also heated debate about the use and implementation of getopt_long(). It's silly, but still important. Ugh. Again, this probably won't be Really Fixed until after beta4.

Not that I have time for any of this, anyway...

I've been doing lots of dissertation hacking this week. I'll save that for a second journal entry; I have some important facts to report there.

Not too much else has been going on -- I've been really concentrating on dissertation stuff this past week. I hear that Arun and co. hung a "Lust for Glory" sign underneath ND's "Engineering Week" sign. Most excellent. Arun is promising pictures.

Look Dave, no strings.

Ugh. I've spent the past few days fighting the return semantics of rsh and ssh. In trying to make the tree-based booter industrial strength by putting it into LAM, I found out that not all rsh implementations are created equal. Grrrr...

It seems that some versions of rsh pretend to close stderr, but will in fact actually send things across it later. i.e., read() will return 0, but then will later return a positive number and have valid bytes in the buffer.


There are also some mysterious things happening that I don't fully understand yet (this only happens when you scale to above 25 nodes or so). So I finally decided that if rsh cannot be trusted, the whole framework in LAM for generic remote-launching is wrong. i.e., the whole issue is about determining if the remote program started successfully or not. How to do this in a programmatic fashion?

It currently goes like this (and rsh can be replaced with ssh or whatever):

  1. Open two pipes
  2. fork() a child process
  3. Close the respective pipe ends in the parent and child processes
  4. Tie the pipes to the stdout and stderr in the child process
  5. The child exec()s the rsh command
  6. The parent watches the pipes:
    • If something comes across stderr, our heuristic says to abort
    • If something comes across stdout, buffer it
    • When stderr and stdout close, the child is done, quit the loop
  7. The parent calls waitpid() to wait for the child to die
  8. If the return status of the child is not 0, abort

If we incorrectly determine that a remote program failed to start (i.e., it actually did start, but the local node thinks it didn't), the remote program gets stranded, and is left running forever because no one will ever contact it again. Among other reasons why this is bad, this is anti-social behavior.

Plus, the code is complicated as well because of the stateful nature it has to maintain while checking multiple data sources in a non-blocking way. Ugh. And I didn't even mention how we have to check and see if the other side is running a Bourne or Korn shell...

The long and the short is that the remote agent (rsh, ssh, whatever) cannot be trusted to give reliable information. So the only thing to do is to disregard the information that it gives and determine if the remote program started correctly by a different means. One way to do that is to have the remote process call the spawning process back with a TCP socket.

If the remote process doesn't call back within a timeout period, the spawner can reason that it failed and give up on it. If the remote process starts up properly and is unable to contact the spawner (perhaps it took a long time to start, and the spawner has timed out already), it will just abort. This prevents orphaned remote processes.

Specifically, I'm looking at something like:

  1. Parent creates listening socket for the callback
  2. Parent launches a thread to wait for the callback on that socket
  3. Parent makes three pipes (for stdin|out|err)
  4. Parent fork()s a child
  5. Parent closes appropriate ends of the pipes
  6. Parent launches two threads to monitor the pipes
  7. Parent launches a thread to block on waitpid()
  8. Child closes appropriate ends of the pipes, ties the other ends to stdout|err
  9. Child exec()'s the remote agent
  10. Parent blocks on a queue

    • When either of the pipe threads wake up on a read, they buffer the data and put it in an event and queue it up for the parent
    • Closing either of the pipes is similar -- an event is queued up for the parent followed by the thread committing suicide
    • When waitpid() returns, the return status is queued up in an event for the parent, and the thread commits suicide
    • When the listening thread succeeds on accept(), it begins the authentication/connection protocol. Upon success, it queues up an event for the parent (including the open socket file descriptor) and commits suicide.

  11. When all the threads die, it means that the remote process has started up, the remote process has authenticated and indicated that it wants to run, a socket is still open to the remote process, the remote agent is now dead, and all threads/processes have been reaped, so the parent can now continue.
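
Here is a rough Python sketch of that thread-per-event-source layout, under some loud assumptions: spawn_with_callback and make_argv are hypothetical names, the "agent" is just a local subprocess, the callback "protocol" is a single short message with no authentication, and the callback socket is closed after one read instead of being handed to the parent.

```python
import queue
import socket
import subprocess
import threading

def spawn_with_callback(make_argv, timeout=10.0):
    """Launch an agent; gather pipe, exit, and callback events from a queue.

    make_argv(port) is a hypothetical hook that builds the agent command
    line and tells the launched program which port to call back on.
    """
    events = queue.Queue()
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    listener.settimeout(timeout)
    port = listener.getsockname()[1]

    def wait_for_callback():
        try:
            conn, _ = listener.accept()
            conn.settimeout(timeout)
            message = conn.recv(64)        # stand-in for a real startup protocol
            conn.close()
            events.put(("callback", message))
        except OSError:                    # includes the accept/recv timeout
            events.put(("callback", None))
        finally:
            listener.close()

    def pump(name, stream):
        # Plain blocking reads: no combined state machine needed.
        for chunk in iter(lambda: stream.read1(4096), b""):
            events.put((name, chunk))
        events.put((name + "_closed", None))

    proc = subprocess.Popen(make_argv(port), stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)

    def reap():
        events.put(("exit", proc.wait()))

    workers = [threading.Thread(target=wait_for_callback),
               threading.Thread(target=pump, args=("stdout", proc.stdout)),
               threading.Thread(target=pump, args=("stderr", proc.stderr)),
               threading.Thread(target=reap)]
    for worker in workers:
        worker.start()

    result = {"stdout": b"", "stderr": b"", "callback": None, "exit": None}
    pending = len(workers)                 # one terminal event per thread
    while pending:                         # the parent blocks on the queue
        kind, payload = events.get()
        if kind in ("stdout", "stderr"):
            result[kind] += payload
        elif kind.endswith("_closed"):
            pending -= 1
        else:
            result[kind] = payload
            pending -= 1
    for worker in workers:
        worker.join()
    proc.stdin.close()
    return result
```

The parent never looks at the pipes, the child status, or the socket directly; it only consumes events, so each thread can use simple blocking calls. A result whose "callback" entry is None means the launched program never phoned home within the timeout.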

In the previous scheme, the remote agent would launch the remote program. The remote program would immediately close stdin|out|err and then fork a child into the background as a user-level daemon, and then quit. This would allow the remote agent to finish normally (hah!). The child process would then continue on to do whatever it was launched to do.

In the new scheme, there is no need to have the remote agent finish until the callback to the spawner has completed and there is no more gain to having the remote agent process around anymore. i.e., in the previous (linear) scheme, it was necessary for the remote agent to quit before the next step would proceed (wait for a callback). In this scheme, they are independent events -- the remote agent quitting has little bearing on the callback since those are in different threads. Indeed, it may be advantageous to have the remote agent stick around until the callback occurs successfully to give one more way to abort the remote process if something goes wrong. That is, if something goes wrong and the callback gets mucked up, send a signal or some kind of message down the stdin pipe to the remote agent, which will get passed to the remote process that will cause the remote parent and child to abort.

Additionally, just like giving each remote process a thread to manage it, giving a thread to each of the stdout and stderr pipes eliminates the combined state machine and uses blocking reads. This makes the algorithm for monitoring the pipes much simpler. Hence, we can monitor the pipes, waitpid(), and the callback separately, and therefore greatly simplify the code (why didn't I think of this earlier?).

Jeff's law of non-blocking:

writing blocking algorithms is much simpler than writing non-blocking algorithms.

Jeff's law of blocking:

writing concurrent blocking algorithms introduces its own problems, but generally only in terms of infrastructure, and they are typically problems that are already solved.

What's even cooler is that the remote process can startup, call back the spawner, and give a "I'm ready to go" message, or "things suck over here; I can't run so I'm going to abort" message. i.e., the remote process can decide whether it's going to run or not (e.g., check to see if the load is not too high) and send back a yay or nay to the spawner. Even cooler than that -- an integrated startup protocol allows for authentication instead of security through obscurity (security through obscurity isn't, for those of you who care!).

I'm currently in the middle of re-writing all this code (it takes time to setup the infrastructure and whatnot). The result should

xmms currently has 619 of 703 processes on queeg (88%).

February 20, 2001

What we need around here is an anti-whining ordinance!

Quote of the day (and I sent this to a list of people who will be meeting next month to discuss a Scyld-like cluster installation thingy, and no, I don't know everyone who is on the list):

...However, if it ever came down to a WWF Smackdown-style Deathmatch between LAM and MPICH, I've got several football-playing undergrads on my side. This is their latest exploit:


So I stretched the truth a bit. I've seen Arun and Brian play football; it's not that much of a lie, is it?

February 21, 2001

I think Joe saw us in the movie theater last night

I've gotten an unexpected result from my thread booter.

This happened when booting across the ND helios cluster of 161+ sparcs (some of which should fail, BTW -- at least 2-3 are down at any given time, and about 5-10 run a different version of Solaris than the rest, such that there are shared library linker problems trying to run on them).

Even with about 10-20 nodes expected to fail, about 1/3 of them fail to boot properly on a regular basis. This is many more than expected.

The main reason is that the parent that is trying to boot them times out. i.e., if the child does not callback on the socket within N seconds, the parent decides that the remote boot must have failed (even if the boot does succeed at some point later, and the child does try to boot). The parent rules that that child is dead and moves on to the next.

The weird thing is that this was happening a large percentage of the time; much more than I expected. Worse than that, it was inconsistent -- I would get different results every time I did a helios-wide boot (even if they were separated by only a few seconds). This is clearly not good enough.

One solution is to increase the timeout time (I was using timeout values of 5, 10, and 30 seconds -- the problem occurs with all the values). Increasing the timeout value to 120 seconds seems to make it work most of the time; most bootable helios machines actually boot properly. However, this significantly adds to the overall boot time because we now have to wait 2 minutes for each individual failure before moving on to the next child, which is undesirable.

So I think I need to change my booting algorithm yet again (this is the point of research, isn't it?).

  • Still keep the basic tree-based structure.

  • To overcome the problem with slow children, we need a system where the work of one child can be given to another, but need to keep this in a tree-based structure (vs. a monolithic server) so that we don't run out of resources. That is, some kind of first-come, first-serve basis, since we know that if a child requests work, it is ready to go. Faster children will naturally ask for more work.

  • Right now, each parent node receives a list of all the children that it is responsible for booting. It divides this list up into N sub-lists (where N is the number of immediate children that it will boot), spawns a thread for each, and gives each thread one of the sub lists. This needs to change.

  • Instead, spawn off N threads and give them each one child to boot. The parent thread keeps the rest of the list of nodes that it is ultimately responsible for booting.

  • If a child fails to boot by some kind of immediate failure (e.g., a ping to that child fails), the parent can kill that thread and launch a new thread and give it the next node from its master list.

  • When [if] a child actually boots successfully (which is defined by the grandchild opening a socket back to the child and saying, "Ok, I'm ready -- gimme some work"), it asks the parent for a subset of nodes from the parent's pool. The parent will give a list roughly of size total_size/N so that each descendant's subtree will be about the same size, which the child then passes on to the grandchild.

  • Aside from the parent keeping the list of children, this is more or less how it happens now.

  • Here's the new part: when a parent finishes (actually, when any node finishes -- whether it was originally a parent or a leaf), it sends an "all done -- got any more work?" message to its parent.

    • If the parent's parent has any more work, i.e., it has some nodes left in its pool because one or more of its children were slow to boot, it will give a subset list (of about the same size as it has given out to every other node who queried) to the querying child.

    • If the parent's parent doesn't have any more work, it passes the request to its parent, where the same procedure is repeated. If any work is eventually found, it is sent back down the tree to the original child who queried.

As such, with this scheme, it is possible for a grandchild (or some node even further down) to steal work from a slow child. This scheme can allow for long timeouts which may be necessary (particularly in an active workstation environment), but still allow for speed in the overall boot -- we just eliminate the blocking on slow children by potentially taking away their work from them.

A side effect of this is that the overall tree may become lopsided. But that doesn't really matter much, because the idea here is to parallelize in time, not in space. So if we have a slightly taller-than-optimal meta-tree at the end, it doesn't matter -- the meta-tree is only for booting and will be discarded anyway.
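The pool bookkeeping in the scheme above could look something like the following. This is only a minimal single-process sketch, not the actual booter code: the names WorkPool, take_chunk, and remaining are made up for illustration, there are no threads or sockets here, and the chunk size is simply fixed at total_size/N when the pool is created.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: one node's pool of hosts that still need booting.
// All names here are illustrative assumptions, not taken from LAM.
class WorkPool {
public:
    // hosts:  every node this pool is ultimately responsible for
    // fanout: N, the number of immediate children
    // parent: the pool one level up the tree (nullptr at the root)
    WorkPool(std::vector<std::string> hosts, std::size_t fanout,
             WorkPool* parent = nullptr)
        : hosts_(std::move(hosts)), parent_(parent) {
        assert(fanout > 0);
        // Hand out chunks of roughly total_size / N, but never zero.
        chunk_ = std::max<std::size_t>(hosts_.size() / fanout, 1);
    }

    // First-come, first-served: whoever asks gets the next chunk.
    // If this pool is empty, the request is passed up to the parent;
    // an empty result means nobody above had work left either.
    std::vector<std::string> take_chunk() {
        if (hosts_.empty()) {
            if (parent_ != nullptr) {
                return parent_->take_chunk();
            }
            return std::vector<std::string>();
        }
        const std::size_t n = std::min(chunk_, hosts_.size());
        const auto first = hosts_.end() - static_cast<std::ptrdiff_t>(n);
        std::vector<std::string> out(first, hosts_.end());
        hosts_.erase(first, hosts_.end());
        return out;
    }

    std::size_t remaining() const { return hosts_.size(); }

private:
    std::vector<std::string> hosts_;
    WorkPool* parent_;
    std::size_t chunk_;
};
```

A fast node that runs out of its own work just calls take_chunk() again; the request climbs the tree until some ancestor still has nodes in its pool, which is how the work of a slow child ends up stolen.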

It's good to be a gangsta

Some hard-learned C++ knowledge:

cin always uses file descriptor 0.
cout always uses file descriptor 1.
cerr always uses file descriptor 2.

This is particularly annoying when you close file descriptors 0, 1, and 2 and re-open them to something else, because cin/cout/cerr will still use those file descriptors, and will read/write to the new things that you opened!

I finally figured out that that was what was happening on my tree-based booters -- I had debugging cerr's that ended up writing down sockets, which caused havoc on the remote side, because it got unformatted and unexpected messages. Doh!!!

In hindsight, this completely makes sense (and may even be by design; I don't have an iostream book handy). Consider that cin/cout/cerr are not tied to the OS -- they have no way of knowing when file descriptors 0, 1, and 2 have been closed and reopened to something else. For example, cout's operator<<(...) presumably eventually boils down to:

write(1, ...);

In which case, cout has had no indication that file descriptor 1 has been closed and re-opened into something else.
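Here is a small way to see the effect for yourself. It is a sketch that assumes a POSIX system and a typical iostream implementation where cout ends up writing to descriptor 1; the function name cout_after_reopen is made up, and it uses dup2() to re-point descriptor 1 at a file, which has the same effect as closing it and re-opening it to something else.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>
#include <fstream>
#include <iostream>
#include <string>

// Hypothetical demo: write msg through std::cout while descriptor 1
// points at a file instead of the terminal, then return whatever
// landed in that file.
std::string cout_after_reopen(const std::string& path,
                              const std::string& msg) {
    std::cout.flush();            // push out anything already buffered
    const int saved = dup(1);     // remember the real stdout
    assert(saved >= 0);

    const int fd = open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0600);
    assert(fd >= 0);
    dup2(fd, 1);                  // descriptor 1 now refers to the file
    close(fd);

    // cout is never told that descriptor 1 changed underneath it.
    std::cout << msg << std::flush;

    dup2(saved, 1);               // put the real stdout back
    close(saved);

    std::ifstream in(path.c_str());
    std::string got;
    std::getline(in, got);
    return got;
}
```

Calling cout_after_reopen("/tmp/demo.txt", "debug message") should leave the message in the file rather than on the terminal. Swap the file for a socket and you get exactly the kind of stray debugging output described above.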

Just a point of wisdom for readers out there... it caused me three days of grief.

February 26, 2001

We have IPv6 telephones

Whew! What a week.

Tracy and I got caught in the snowstorm out east last week. It took over 3 hours to drive from Philadelphia to Baltimore (and we used the non-congested roads) -- a drive that normally takes 1 hour 45 minutes. Top speed in Alan's Range Rover was about 40 mph. Woof!

And how about Southwest Airlines -- $55 each way for Tracy and me to fly to Baltimore from Louisville, for a grand total of $220 (round trip). That was literally an order of magnitude cheaper than all the other airlines.

More house stuff is going on. We close in T-7 days if all goes well. The builder is working feverishly to finish the house by next Monday. Tracy and I went to the house again this weekend and took a bunch more pictures -- have a look if you care:


Things are looking really good; they've painted the house and added all kinds of things. Kitchen cabinets go in tomorrow. Carpet and kitchen floor go in on Wednesday or Thursday.

I've got a bunch of details to work out this week, including life insurance, an updated will, getting a certified check for the closing costs, details with the builder, etc., etc. Will be busy, and it does detract from the time that I spend dissertating, but it will be worth it.

Network Solutions is the root of all evil.

I got a really stupid tech help person who spoke poor English on Saturday morning when I called to ask why www.lam-mpi.org still didn't resolve in DNS. Suffice it to say that an extremely frustrating 30 minute conversation ensued. After several hours of cooling down, I figured that what that guy had told me couldn't possibly be correct (i.e., I had to wait to ensure that it was a factual response, not an emotional response), so I called again.

I sat on hold for about an hour (apparently the weekend staff is pretty small!) before I got someone. This guy was actually very intelligent and generally knew a lot more about his job than the first guy. We finally figured out the problems and he set some things up to make everything work nicely. lam-mpi.org is now propagating around the world and will soon come to a DNS server near you.

There still seems to be some kind of hitch between ns1.lam-mpi.org and ns1.squyres.com -- the DNS on squyres.com can't seem to do a zone transfer from the main DNS server. I think it might have something to do with Curt's firewall setup.

Chatted with Brian and Arun this weekend about the upcoming release and general LAM Things; we're really, really close. The last few beta tarballs have all been caused by AIX. AIX sucks.

Word to the wise: AIX's "make" is broken. Use GNU make instead. Unbelievable...

Much overhauling had to be done to LAM's web pages to get them to appear nicely on www.lam-mpi.org. In the end, we decided to start a whole new CVS module for these pages (which actually makes a lot of sense). Some of the directory structure has changed, and we had to fix a whole lot of broken assumptions that version 6.3.something would be the most current version. All the tutorials and the MPI implementation list database are now part of www.lam-mpi.org. 6.5 is the default version of LAM/MPI. Oodles of fixes and changes to the web site.

I'm using checkbot to check that all the links and whatnot in the web site are correct. It's a handy little tool. It takes an hour or so to run, so it conveniently leaves me to do other things, and mails me when it's done. It dumps its output into HTML so it's convenient to view in a browser.

I found checkbot on freshmeat.

Ogg/Vorbis announced early this morning that they released beta4.

Woo hoo! I guess that since I have CVS write access to Ogg/Vorbis, I'm officially an author as well. Or am I? Either way, I'll take credit when things go Right, and disavow myself if/when things go wrong. ;-)

Interestingly enough, they changed their license from LGPL to BSD because too many .com'ers were afraid of the GPL. This is actually exactly my stance on the GPL for LAM/MPI. We're in discussions right now as to what to change our license to (right now we have a proprietary ND license, but it says you can do anything you want with the code -- it's more or less the BSD/Artistic license, but ND lawyers wrote it). So someday we'll change the LAM/MPI license, and probably to a stock license so that people understand it easily, but probably not until we go to IU.

I got my master/slave tree-based booter running yesterday.

Very cool. When it's done, it dumps the final tree that it used into GraphViz format so that you can view it as a jpg or postscript or whatever. The tree that it uses changes every time I boot on the helios cluster -- new downed nodes, and/or timing between when one slave steals work from another forces changes in the overall tree structure. It's very cool.
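For the curious, dumping a tree into GraphViz format takes very little code. The following is just a sketch of the idea, not what the booter actually does: the function name to_dot, the graph name boot_tree, and the edge-list representation are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: turn (parent, child) boot edges into GraphViz
// DOT text. Host names are assumed not to contain double quotes.
std::string to_dot(
    const std::vector<std::pair<std::string, std::string> >& edges) {
    std::ostringstream out;
    out << "digraph boot_tree {\n";
    for (std::size_t i = 0; i < edges.size(); ++i) {
        assert(!edges[i].first.empty() && !edges[i].second.empty());
        // One line per edge: "parent" -> "child";
        out << "  \"" << edges[i].first << "\" -> \""
            << edges[i].second << "\";\n";
    }
    out << "}\n";
    return out.str();
}
```

Save the returned text to a file such as tree.dot and run something like "dot -Tps tree.dot -o tree.ps" to get a picture of the tree.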

I need to drop in the final bits to make it do a complete lamboot and launch a lamd after the tree is made. This won't be too hard. But I'll probably work on the IMPI chapter of my dissertation first:

  1. I'm in a writing mood; I spent all yesterday afternoon doing web page content
  2. I printed it out while I was in Philadelphia and marked it up with a bunch of changes. I think I can get it in dissertation shape if I spend about a day on it

So off to dissertating I go!

Broadband to your washing machine

I just saw this great quote and felt the need to share.

It's from Chris "Monty" Montgomery regarding the new release of Ogg/Vorbis. He was asked what the difference was between this release and the last one:

It kicks much booty. More booty than has been kicked in recent memory ... Booty being virtual ass, in this case.

I love software developers. We're really bizarre people.

February 27, 2001

Dave and Lisa have been conducting a secret office affair for the past several months

A darkness has fallen over the land.

Brian found some unexplained failures in LAM testing yesterday. These failures occurred on multiple platforms in different places. The failures were that test programs just "hung" for apparently no reason.

This is not good!

We're tracking this down, but it will take some time. :-(

Some quickies:

  • I'm dissertating on my IMPI chapter. It's basically the MPIDC paper, but I've added a bunch more sections and topics, and so far I've added one more figure (might possibly add more; after all, a picture is worth a thousand words). Even with just this one chapter, my dissertation is up to 41 pages. Woof! This has actually been the majority of my time recently.

  • Thank goodness for comments in code! Writing about IMPI (i.e., something that happened over 24 hours ago) has been difficult because I have to refresh myself with exactly how all this stuff works. Thank goodness I put in [sometimes lengthy] comments in all the IMPI code so that I can have a hope of remembering what I was thinking when I wrote that code.

  • My church (Epiphany) had their DSL router installed yesterday. I'm going to stop by on Thursday and see if I can get them up and running.

  • House stuff seems to be progressing well. T-6 days.

  • We finally decided that we goofed on the Cisco proposal yesterday. Lummy and I were wandering through a bewildering array of Cisco equipment, trying to decide what we should ask for, and finally realized that we had no clue -- we should have involved a Cisco sales rep long before this point. Since yesterday was the deadline for proposals, we decided to punt and do a proposal later, from IU.

Back to dissertating...

queeg has been up for 72 days without rebooting.

There are 926 xmms instances running on queeg out of 1005 processes total -- 92%. I predict an xmms crash in the not-too-distant future.

February 28, 2001

Don't mess with the guy with the way-back machine, Dave

Did much LAM work today, and very little dissertation work. :-(

The good news is that pending some final approvals and Arun finishing his part of the FAQ, and Brian making the RPMs, we should be ready to release! We're waiting on the approval of some of the Llamas, and then I might release it to the Linux distributors (RedHat, SuSE, etc.) and let them give it a final test -- just to ensure we didn't do anything stupid.

I did find one bug tonight; not in the MPI functionality itself, but in the new program lamnodes, which I fixed up. Amazing that it lived so long. We'll have to think of a way to test that kind of stuff -- our current test suite really only tests MPI things, not LAM things.

I'm downloading the new Mandrake 8.0 beta; I'll install it on my laptop. It has some things in it that I really want to try out (Eazel, Nautilus, Evolution, KDE 2.1, etc., etc.). I haven't used my laptop seriously in a while, so there's nothing important on it -- this is a perfect opportunity to wipe it clean and try out 8.0.

Not much else exciting happened today. Did a bunch of house followup stuff, and, as I've done before, I'll spare you all from the boring details. Suffice it to say that everything is proceeding right on track, and things look good for Monday. T-5 days.

373 of the current processes running on queeg are xmms; that's 80%.

About February 2001

This page contains all entries posted to JeffJournal in February 2001. They are listed from oldest to newest.
