Archiving some more test results...
At Lummy's suggestion, I compared lamboot against a serial, ring-like boot at several different sizes to see how the two topologies stack up. My hypothesis was that they would be roughly equivalent: the rsh latency would dominate any differences in bookkeeping and efficiency between the two codes.
I used the threaded scaleboot version (not that it mattered, 'cause there would only be one thread/child anyway). Here are the results:
| Program | 8 nodes | 32 nodes | 147 nodes |
|---|---|---|---|
| lamboot | 0:23.1 | 3:18 | 15:xx |
| ring boot | 0:22.6 | 3:15 | 15:06 |

(Wall-clock times, in minutes:seconds.)
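One quick way to sanity-check the "rsh latency dominates" hypothesis is to divide each total by the node count. The sketch below just does that arithmetic on the numbers in the table; the helper names are made up for illustration, and the 147-node lamboot run is left out because it was never timed precisely. At each size, the per-node cost comes out nearly the same for both programs (roughly 2.9 s/node at 8 nodes and roughly 6.1 to 6.2 s/node at 32 and 147 nodes).

```python
# Per-node boot cost computed from the timings in the table above.
# Helper names are hypothetical; the numbers come straight from the table.

def to_seconds(mmss: str) -> float:
    """Convert an 'm:ss' or 'm:ss.s' wall-clock string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + float(seconds)

def per_node_seconds(runs: dict) -> dict:
    """Map node count -> average seconds spent per node."""
    return {nodes: to_seconds(t) / nodes for nodes, t in runs.items()}

lamboot = {8: "0:23.1", 32: "3:18"}  # 147-node run only has '15:xx'
ring_boot = {8: "0:22.6", 32: "3:15", 147: "15:06"}

if __name__ == "__main__":
    for name, runs in (("lamboot", lamboot), ("ring boot", ring_boot)):
        for nodes, secs in per_node_seconds(runs).items():
            print(f"{name:9s} {nodes:4d} nodes: {secs:5.2f} s/node")
```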
Unfortunately, I forgot to run /bin/time on the biggest lamboot, so I could only go off the timestamps from my Unix prompt (hence the 15:xx). Doh...
Also, after all this large-scale testing with lamboot, I am soooo glad that I wrote lamhalt (to replace wipe): it takes down a running LAM by simply sending messages to all the lamds, instead of doing a whole new round of rsh's to each machine to kill the daemons.
As Arun says, "'wipe' sounds silly and doesn't have the syllable 'lam' in it." lamhalt rocks.