General Archives

July 12, 2000

First entry

So here we are.

I'm copying an idea from several undergrads (Pete, Arun, Perk, Brian -- in no particular order; I have no idea who had the idea first). All besides Evil Brian have their own custom journal scripts (Evil Brian has his hosted on a .com). Similar to batch queueing systems, there's a complicated hierarchy of who is derived from whom, and who stole what features from who else (if the grammar is wrong, deal).

When I decided to get into "that crazy journal thing that all the wacky kids are doing these days", Pete gave me a copy of his journal code, and I thought to myself, "Wait, this can't be right. It's under 100 lines or so. Nope, can't be right." So what did I do? I wrote my own.

One week and 1,887 lines of new C++ code and 875 new lines of PHP code later, I have my own journal system. It's chock full of features; I think it will even write simple Pascal programs for you. But lest we be accused of plagiarism, let's give full credit to the other code bases that I stole from to make the jeffjournal package:

  • Shell client: 1,887 lines of C++ code

  • Back end web support: 857 lines of PHP code

  • GNU readline library: 21,222 lines of C code
    While readline actually set me back about a week (moral of the story: be very careful about including configure-generated C header files in C++ code), it is truly cool and extremely useful.

  • inilib library: 2,178 lines of C code
    A truly cool project for reading/writing .INI-style files.

  • minime libraries: 11,585 lines of code
    This is my dissertation project. I pirated the use of the socket and console (i.e., readline) interfaces out of it.

So this is actually... well, it's a lot of code (can't do simple math anymore and am too lazy to fire up bc). Ok, this was just over the top. But what else are you gonna do with a DSL connection?
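For the record, here is a minimal sketch of the sum that bc would have produced, using the figures from the itemized list above. The PHP count is taken from the list (857); the sentence before the list says 875, and which of the two is right is not clear from the entry. The dictionary keys are just labels chosen for illustration.

```python
# Line counts as given in the itemized list (PHP taken as 857, per the list).
line_counts = {
    "shell client (C++)": 1887,
    "web back end (PHP)": 857,
    "GNU readline (C)": 21222,
    "inilib (C)": 2178,
    "minime": 11585,
}

# Everything that ended up in the jeffjournal package.
total = sum(line_counts.values())

# Only the two pieces that were newly written for the journal.
new_code = line_counts["shell client (C++)"] + line_counts["web back end (PHP)"]

print(f"total lines:   {total}")
print(f"newly written: {new_code} ({new_code / total:.1%} of the total)")
```

With these figures the total comes to 37,729 lines, of which 2,744 are new.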


I plan on having some semblance of a journal out here for the world to see. Readers can expect to see gritty coding nuances, general musings on [un]reality, and lots of other boring things. Probably mainly boring things (I'm a geek, what do you want?).

Readers should not expect to get too many journal entries next week, and should expect to get none the week after that (I'm getting married next weekend; I've been verboten to touch computers on our honeymoon -- what's a geek to do? Oh yeah... :-).

That's enough for now. Outta here.

July 13, 2000

Jeff's Journal

Got that + thing worked out -- the last journal message made it look like all the journal code stuff was written in C instead of C++. Ugh!

New C++ code
Still has some bugs to work out
Close enough now; sleep

In the words of Jimmy James, "No, I've never... had... much luck with jobs until I stumbled onto this multi-billionaire thing."

Jeff's Journal

Eric Roman mailed out an interesting project that he heard about recently: rexec. Seems to be a new project under the old name for transparent and secure remote execution from the CS folks at Berkeley. Printed out the paper; it should make a good read.

More wedding things today; finalizing contracts with the Marriott (pizza-n-beer, yum), finalized numbers to Tippecanoe (rehearsal dinner, yum), table layout for the reception, etc. Getting down to the finer details -- T-9 days.

Spent the afternoon cleaning up the minime code -- I made bunches of changes to the socket and console routines to be able to write the shell client for the journal system (jjc).

Hmm.. just found an annoying bug in jjc: C-h C-h (i.e., hitting backspace twice) brings up the emacs -nw apropos list, but hitting C-g to abort the apropos list somehow makes jjc think that the emacs child has finished, and therefore jumps back to the prompt, but then seg faults and dies. Ugh! Gonna have to fix that one. :-)

Some guy mailed me today about parallel bladeenc. Apparently, his company (www.scyld.com) is releasing their own MPI soon. He suggested that I add a two-line fix to parallel bladeenc that allows MPI_INIT to fail, and then allows it to proceed in a serial fashion. This is a truly cool idea, actually. He was motivated by the fact that they support a "serial" MPI dynamic library that allows mpirun-less invocations of MPI programs. It contains stubs of all the MPI functions and simply fails (i.e., returns != MPI_SUCCESS) if you invoke any of them (e.g., MPI_INIT). Hence, if your code is smart, it takes the failure of MPI_INIT to mean that it should run in serial. So I made the quick change to parallel bladeenc; it'll go out in the next release (whenever that is).
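The control flow of that fallback can be sketched roughly as follows. This is only an illustration in Python of the decision being made, not the actual two-line change to parallel bladeenc (which is C code calling the real MPI library). The names stub_mpi_init, working_mpi_init, and choose_mode are hypothetical stand-ins, and the value used for MPI_SUCCESS here is a placeholder.

```python
MPI_SUCCESS = 0  # placeholder value, for illustration only


def stub_mpi_init():
    """Hypothetical stand-in for the 'serial' library's MPI_INIT stub: always fails."""
    return 1  # anything != MPI_SUCCESS


def working_mpi_init():
    """Hypothetical stand-in for an MPI_INIT that succeeds (e.g., run under mpirun)."""
    return MPI_SUCCESS


def choose_mode(mpi_init):
    """Try to initialize; treat failure as a request to run serially."""
    if mpi_init() == MPI_SUCCESS:
        return "parallel"
    return "serial"


print(choose_mode(stub_mpi_init))     # init fails, so fall back to serial
print(choose_mode(working_mpi_init))  # init succeeds, so run in parallel
```

The point of the design is that the program makes this decision once, at startup, and never needs to know which library it was linked against.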

Speaking of parallel bladeenc, I mailed Tord about a week or two ago asking questions about the MP3 format itself -- Jeremy Faller and I spent about half a day trying to make parallel bladeenc generate diffable output to serial bladeenc. We didn't succeed, and actually came up with many more questions than answers/solutions, but we understood why parallel bladeenc's output is different than serial bladeenc's. The parallel output is actually probably lower quality -- something we'd like to fix. But we can't do that until we understand the output format of MP3 more... Still waiting for an answer from Tord. :-(

July 17, 2000

Jimmy has fancy plans, and pants to match

More wedding stuff today. Spent all day waiting for a friggen' package from UPS that never arrived (they tried to deliver it Friday, left a note saying that they'd deliver it Monday). Ugh. Got lots of other wedding planning stuff done, though.

Helped Don Peterson with some C++ stuff today. I sent him a bunch of code (that I actually tested), and then discussed mods to this the rest of the day (well, actually discussed my typos in the mails mostly -- 'cause I was sending him mods that I hadn't tested -- ugh!).

Talked to the DoD investigator that covers this area again today. An undergrad who graduated from CSE a year or two ago and is now in the Air Force (did the ROTC thing here at ND) is being assigned to a "sensitive" job in the Air Force. I've talked to this investigator several times over the past several years about various other students who I knew who went on to various DoD/DoE jobs. Pretty standard stuff, actually -- not as impressive as it sounds. :-)

He's a nice guy. I've talked to him about his daughter (she's a Signal Corps LT, like me) and various other military stuff (he's ex military himself -- a warrant officer). We talked about the person he came to talk about, and then we chatted for a while before he left.

Sepeta is coming over later to watch Fight Club.

July 19, 2000

Donkey, donkey, donkey, donkey, donkey

Whoo hoo!!

Here we go into the home stretch... Journal readers should not expect another journal entry for about 1.5 weeks or so. It's Wednesday before my wedding, and I likely will only be in sporadic contact with internet-enabled computers (a new innovation, so I hear) for a while. There's much to do, and little time to do it!

vacation has been enabled, and I've proverbially passed the buck to others for the next 1.5 weeks.

My wedding day comes
Friends and family to South Bend
Screw the rest of you!

I wasn't an english major for nuttin'. Did I mention that I'm moving to Looieville?

See you all in 1.5 weeks.

("Hey, does anyone know how long Jeff will be gone?")

July 31, 2000

To the moon!

Back to reality.

What a week. This'll be a pretty long journal entry, as I have abbreviated entries for the entire past week in this one entry, as I have had little to no computer access the entire time (and I wanna know who bet that I would check my e-mail while on vacation -- they lost!). Some notes are kinda sketchy 'cause I didn't start taking journal notes until Friday or so. You'll deal.


Friday, 21 July, 2000

T-1 day. I spent the morning in the office hurriedly trying to finish the wedding program. My Big Thing was that the music had to be in the program (i.e., not just the words). Tracy's church in Looieville only puts the words in the Sunday programs, and it really annoys me because I don't know all their songs, and it makes it really hard to sing them. Since we have a lot of non-Domer folks coming to the ceremony, I wanted to put the music in the program.

So here's another problem: I decided to do the program in MS Word on the assumption that Tracy would be able to edit it as well. i.e., I could do some work, e-mail it to Tracy, have her make some edits, send it back to me, and repeat as necessary. Bad assumption on my part -- Tracy's MS Word couldn't read my file (i.e., it came out as garbage), even though they were the same version of Word.

Know what I like about Microsoft products? Nothing at all.

Also particularly annoying is the scrolling behavior when in two-column landscape mode (that I used 'cause the programs were folded in half). If you go to the bottom of the left column and hit the down arrow, one would expect to go to the top of the right column -- i.e., go down with the text. Nope -- you go to the top of the left column on the next page. There's other non-intuitive (IMHO) scrolling like that as well. Needless to say, I was strongly wishing that I had just done the whole thing in LaTeX by the end of the ordeal.

I ended up scanning in the music and placing it in the document. It all turned out ok in the end, but I think that Word really made it take longer than it should have. Ugh!!!

Renzo (the best man) and Lynn (his wife) picked me up and we ran to Kinko's to run off the programs (I had some nice paper that I wanted to use). Kinko's could do it by 9pm at the earliest, but we needed them at the rehearsal at 5pm, so that was no good. This was kind of frightening, because Kinko's has never failed me before.

So we went to Copy Max (of Office Max). They were able to do it just fine. Dr. Romi was working, so I said hi to her as well. While they were doing it, Renzo and I went to pick up our tuxes at Bernardo's. Both of us needed slight alterations to our tuxes (which they do on the premises). While we were waiting, my dad called and was surprised when I reminded him to pick up his tux (<sigh> -- good help is so hard to find these days!). So I told him I would pick him up shortly and get his tux with him. John Shipman (another groomsman) also called during this time, so I told him I'd pick him up as well.

Renzo and I finished, swung by the Marriott and picked up my Dad and John and promptly went back to Bernardo's. We ran into Mark Payne (Tracy's brother, another groomsman) and her father getting fitted for their tuxes as well. After getting all of that straightened out, we ran by Copy Max and picked up the programs. John's response to the text that I wrote about him in the program was, "Jeff, I have two words for you: rat bastard." BTW, be sure to ask him what "wizard fries" are. :-)

I got dropped off at my apartment so that I could change and go meet Fr. Hesburgh (Fr. Ted wanted to meet with Tracy and me for about an hour before the ceremony and have a chat). Tracy met me at his office on the 13th floor of the Hesburgh Library right at 4pm. While we were waiting, I looked around his waiting room and noticed a corner of it completely filled with military stuff. I saw a big picture of an SR-71. Apparently it's the same SR-71 that he flew in and broke Mach 3.3 in. This guy has had an amazing life, and is still a really down-to-earth guy.

Tracy had never met him before; I'd met him a handful of times. We had a nice chat, and Fr. Hesburgh gave us his collected wisdom of marriage from his life (he was a marriage counselor for many years, and has probably married thousands of couples in his time). I'm really glad that we were able to have him preside over our ceremony in the Basilica at Notre Dame -- it was way cool. If you've never met Fr. Hesburgh, I highly recommend making an appointment and just going to have a chat with him. He loves to meet with people (particularly current students) to just shoot the breeze. He's got some amazing stories and is probably the most famous person you or I will ever meet.

After our chat, Tracy and I went over to the Basilica for the rehearsal. The Basilica staff is very Draconian about schedules -- you have 45 minutes for your rehearsal, and that's it (which is completely understandable -- 4 couples get married there every Saturday; it takes a finely tuned machine to keep it running smoothly). We ran over a bit, but they were not able to interrupt Fr. Hesburgh (it's his church, after all!), which, I have to admit, we were kinda counting on. :-)

The rehearsal dinner was at Tippecanoe Place, and went very well. My dad gave a really nice speech at the end, and gave me his self-winding chronometer (a highly tuned watch, for all you laymen) that he got from Luzern, Switzerland (which, coincidentally, is where Dr. Lumsdaine's family is from, and is the name of 8 machines in the LSC) when he was a teenager. He gave a good speech which included the following statistic:

There are approximately 90,000 living ND graduates. Jeff has been at ND for the graduations of about 25% of them.

Wow -- if that doesn't date me, I don't know what will!

John, Renzo, and Darrell came over to my apartment for a cigar and a beer or two to calmly round out the evening. We hung out by the smoking table for perhaps the last time. There was a party going on in the apartment above mine, which was very amusing. Jeremy Faller and Kevin Barker and their respective weekend significant-others showed up after a while, too. So we were all hanging out by the smoking table, which was fun.

After everyone left, it was just Kevin, Danielle, and me left at Chuck's old place. I packed for the cruise, and laid out my clothes for the wedding tomorrow.


Saturday, 22 July, 2000

Ms. Tracy Payne and I were married in the Basilica of the Sacred Heart on the campus of the University of Notre Dame on 22 July, 2000. Renzo and Lynn came and got me around 7:30am. Did a bunch of pictures before the ceremony (my parents were late... <sigh>). The wedding ceremony went well (aside from a little confusion about my name... :-). Pictures were good, too, but very numerous (a little rushed in the church, 'cause Hesburgh's homily went a bit long, but hey -- it's his house, he can do whatever he wants! Plus, it was a pretty nice homily :-). Oodles of pictures down in the grotto and whatnot, and then a limo with Renzo and V to the reception (Marriott, downtown South Bend).

The reception was a blast. It was way cool to see so many friends and family all in one place (thanks, everyone, for coming!). Started with a typical receiving line followed by dinner (ok, it was really lunch, but you have to s/lunch/dinner/g for a reception -- it's a protocol thing). Gotta love being at the head table -- you get served first! There was an open bar, etc., etc. Renzo gave a good best man toast. Cutting the cake went really well, too -- Tracy and I did an impromptu (and very minor) cake-on-the-nose deal that apparently went over pretty well (many "aww..."'s and "that's cute"'s, etc., etc.). When I was eating my piece of cake, however, Jeremy Faller had the verve to say right in my ear, "Hey Jeff... seafood!"

As a Pavlovian response (no, really!), I turned around to face the crowd, and did seafood with my wedding cake. Tongue out, cake/icing everywhere -- the whole 9.7 yards. True class all the way (Tracy was so proud. No, really!). Many flashbulbs went off, so I had better get a few copies of those pictures.

Sidenote: the only thing that I knew about my wedding for the past several years was that there was going to be free alcohol available during the whole schameel (Irish Catholic and all that). We had an open bar before dinner, freely flowing wine during dinner (reference: Jesus/"that Cana wedding"), and open bar again after dinner. I mention this only because I was particularly proud to see the whole ND crowd cheer and stampede for the bar as soon as it opened again after dinner. I salute you, my fine feathered friends -- you inspire us all (reference: Bill McNeal/NewsRadio).

Many people danced, which was cool. The DJ did really well -- played all the typical ND songs which kept everyone dancing (except for the Madonna song, which cleared the floor -- and I again blame Faller [guilt by association]). I'll spare the details here, but I danced a good deal of the time, and still managed to greet most of the guests at least briefly.

After the reception broke up, we had a pizza-n-beer party (again in the Marriott) a few hours later in which a good number of people showed up (more than we anticipated, actually -- we ran the Marriott out of pizza, so we switched to hot wings). More way coolness, 'cause the setting was much more informal than the reception.


Sunday, 23 July, 2000

After all that, Tracy and I had to get up at 3:45am to catch our 5:15am flight to Miami (V drove us to the airport). Aside from being early, the flight went well, and we boarded the Royal Caribbean (RCCL) cruise ship Voyager of the Seas. It's an amazing ship. It's the largest cruise ship in the world (although not the largest ship in the world -- there's still a few oil tankers that have that prestigious honor). Here's some impressive stats about the ship:

  • It has more crew space than RCCL had on their entire first cruise ship.

  • I think there were 3200+ passengers on this trip; 108 honeymoon couples.

  • Voyager is several times larger than a US nuclear aircraft carrier.

  • It's so big that it has 2 wake-reduction generators under the ship to limit the size of its wake while in port.

  • It has no rudders -- it has three propellers, two of which can rotate 360 degrees to steer the ship.

  • Voyager has a climbing wall, miniature golf course, inline skating track, ice skating rink, countless pools, hot tubs, and bars, a full theater, 3 story dining room, a 3 story promenade, billions of deck chairs, etc., etc.

  • It's just fricken' huge.

Voyager is a most excellent example of Engineering with Extreme Prejudice. Tracy and I actually borrowed my friend Darrell's 3-tape video series about the design and building of the ship. My deep admiration and respect goes out to all of the designers, architects, and builders.

So anyway, we arrived in Miami with no problems (although we were dead tired), and got to the boat via a shuttle bus. Did I mention that it's a big fricken' boat (henceforth referred to as BFB)? There was a monstrously long line for check in, but it actually went pretty quickly, and we got on the boat in fairly short order.

After wandering aimlessly for a little while, we found our cabin (#7572). It had a little couch, mini table, dresk (i.e., combo dresser/desk), several large dressing mirrors, a mini safe, a closet with several shelves, a bathroom, a queen-sized bed (or possibly king-sized -- we never did figure that out), 2 nightstands, a phone, and a balcony. The balcony had two chairs and a mini table. The amount of furniture makes the whole arrangement sound larger than it really was; it was actually fairly... cozy (we're convinced that the cabin was actually built around some of the larger pieces of furniture [reference: Engineering with Extreme Prejudice]). But it was ours for the week, so it was perfect.

We wandered around for a bit (did I mention that this was a BFB?) and had lunch in the Windjammer Cafe.

Sidenote: It seems that they use the same names for things on all RCCL boats. Tracy and I took a cruise on Grandeur of the Seas a few years ago, and it also had a Windjammer Cafe. Indeed, many of the other cafes, bars, pools, etc., etc., had the same names on Voyager as they did on Grandeur. Coincidentally, the Cruise Director (i.e., the main PR face) was the same guy from our previous cruise on Grandeur. This must have been a promotion for him -- Voyager has been at sea for less than a year (launched in November of 1999), and apparently RCCL took the brightest and best from its other cruise ships to staff it.

Sidenote: Food on a cruise ship is amazing. There's no end to the supply of it and it's all free. Drinks are just about the only food that you pay for. Sodas and regular stuff like that come free when you're having a meal, but you have to pay for them when you get one from a bar, for example. Alcoholic beverages always cost money. But you pay for everything with a cruise charge card (which also serves as a room key); no cash is used on the boat. Pretty handy, actually. And it works out well for RCCL, because you have no concept of how much money you're spending. Anyway, cruise food is never ending; there is really good food available just about 24 hours a day. It's a truly amazing feat of logistics, actually -- providing chef-level food (i.e., with all the little garnish decorations, ice sculptures, people in tall white hats, etc.) for so many people in various locations around the BFB around the clock. Let's call it Cooking with Extreme Prejudice.

We had a mandatory muster drill before the ship sailed. This is apparently required by maritime law in an attempt to prevent the need for movies like Titanic from ever being filmed again. All passengers meet on the muster deck underneath their life boat and stand in rank and file for an attendance check (kinda like the Army). Our muster captain's name was Regina. Even though it was 4:30 in the afternoon, it was hot in the Miami port. The passengers were somewhat restless, but we got through it.

There was a lot of activity in the port while we were sitting there, waiting to sail; powerboats, jet skis, and even a water-based airplane were going hither and thither. Some powerboat even sped by the entire Voyager and mooned the entire BFB during the muster drill. Needless to say, this involved having his ass in the breeze for probably a full minute or so as his boat sped down the length of the BFB. True class!

We got a package with our cruise that entitled us to a bottle of Champagne in our cabin upon sailing, so Tracy and I enjoyed it on our balcony while sailing out of Miami Port. It was amazing to see how many powerboats, jet skis, and people on shore stopped to wave as we sailed. Indeed, a large number of cars pulled over on the highway to watch us go, too. Since there are a non-trivial number of cruise ships that have Miami as their home port, you'd think that Miamians would be jaded to seeing the cruise ships set sail. Apparently not. But this does raise the question: why is the fundamental human response to seeing a cruise ship sail by to wave? Without fail during the entire week, whenever we sailed by some group of people, one or more of them would wave. Is this a Pavlovian response? Have all of us, in some prior life, been conditioned to wave at cruise ships as they go by in order to receive a food pellet? Maybe it's just Waving with Extreme Prejudice.

We also discovered that our room's TV actually functioned as an interactive system that provided not only tons of information about our scheduled island stops, but allowed us to order room service, check our cruise charges, order excursion tickets, etc., etc. Pretty neat, actually.

The main dining room serves dinner in two shifts: main seating and second seating. Tracy and I opted for second seating. It is typical for cruise ships to ask a few demographic questions about you when you buy the ticket for the purposes of (among other reasons) finding compatible people to seat you with during dinner. However, there was some kind of mix up with our table. The maitre d' took us to our table, but it was filled to capacity with 80 year old ladies. So they had to move us to a different table (which wasn't a bad thing -- while I personally have nothing against 80 year old ladies, we were glad to sit with people closer to our own age). Amazingly enough, they did this with big paper maps of the entire dining room rather than on a computer. We got moved to table 476 with the following people (whose names we did not remember at all on the first night):

  • Randall and his 8 year old son Blake from Texas. Blake (who appeared to be both highly intelligent for an 8 year old as well as highly annoying), only showed up to dinner once that week, though, and Randall only showed up twice. Indeed, you can get food just about anywhere on the boat -- the main dining room is not the only place to get dinner. I guess they didn't like us. Bah.

  • Marty and her 18 year old son John. Friendly folk from the San Francisco area.

  • Tina and her 14 year old son Peter. Also friendly folk from New York City.

  • Mercedes and her ?15? year old daughter Daniella (not sure I spelled those right) from Florida. Nice people, but kinda quiet. They also usually sat directly on the other end of the table, so Tracy and I didn't get to talk to them much.

All in all, a pretty likeable crowd. Not exactly our age bracket, but much closer than the little old ladies at our real table. Tracy and I thought it highly ironic that we, the honeymooners, were at a table of divorcees with their children (indeed, we were pegged as honeymooners on the first night), but it actually worked out really well. As you'll see below, we got along quite well with everyone and had a great time all week. Indeed, we were frequently among the last to leave after dinner every night.


Monday, 24 July, 2000

This was a day at sea en route to our first destination: Labadee, Haiti (see Tuesday). Tracy and I did nothing, and did it all day. I mainly read Cryptonomicon while Tracy sunned on deck (while I did come back with a little bit of color, I'm not much of a sun worshiper. I sometimes come out of Cushing at ND at night and am surprised to see that entire weather systems have moved in and out during my day at work, completely unbeknownst to me).

The ship was moving at 17 knots which meant that it was really windy on deck. Some things that I have noticed so far:

  • Many families are using walkie talkies to communicate with each other on the boat. I wonder how well they work -- i.e., if you're in the depths of the BFB, do they really work well enough to talk to your mother on the upper pool deck?

  • The staff on the ship use 2-way phone/walkie talkie things to communicate with each other. And they even work when we're out at sea, miles from any possible commercial cell coverage. So do they have their own cell on the boat itself? Hmm. Interesting.

  • The rank insignia of the officers on the boat vary widely: the lowest seems to be indicated with shoulder boards that have a narrow white stripe on a wide yellow stripe. But the shoulder board stripe combinations are widely different after that -- different widths of yellow and white stripes, sometimes white on yellow, sometimes just plain yellow, etc., etc. I'll try to figure this out over the course of the week.

  • The several hundred cash registers on board the BFB (the various shops, the bars, etc.) all use flat screen touch-sensitive monitors. No keyboards. This must have cost a large chunk of change! But it seems to work well for them -- very little footprint and no additional keyboard, and you can do all data entry with an index finger. Didn't really get a chance to look at them (they're inevitably facing the other way), so I don't know what OS they were running, but it's probably either some flavor of Windoze or a custom OS/application. Probably 'doze.

Had lunch at an on board Johnny Rockets (reference: cruise food, above). Apparently, Johnny Rockets is a chain of 50s-style burger joints, complete with the staff in white aprons, paper hats, 50's music blaring out of jukeboxes, etc., but I'd never heard of them before. Had a good burger and shake (but it was not a $5 shake, mind you). I think the most surreal point of my Johnny Rockets experience was when the whole staff got up to do the Hand Jive when it started playing over the jukebox. Let me clarify exactly why this was surreal: the entire staff was multi-ethnic -- not a single soon-to-be-DWM (i.e., no Caucasians) among them. This is not intended to be a racist statement -- it just struck me as odd to see the Hand Jive, in which you picture John Travolta and a bunch of other decidedly white 50's males with greased back hair and leather jackets, performed by people from other countries (literally; every staff member's nametag also identified the country that they were from -- Voyager's crew was from something like 50+ different countries). Their English was markedly better during the song, too; is that how America is known and identified? By show tunes from Grease? If I ever get mistaken for a foreign spy and am interrogated by the CIA, am I going to have to (in addition to knowing all the World Series and Super Bowl winners from the past 100 years) be able to sing any Grease show tune upon command?

We also attended a wine tasting in the afternoon. We got to sample nine different wines, which was pretty cool. Most of them were good, but I didn't like two of them. The people at our table (don't remember any of their names) immediately pegged us as honeymooners as well.

We went to the show before dinner -- an "intro" show, which had several acts, all punctuated/MC'ed by the Cruise Director.

Dinner attire was "smart casual" -- I wore my new suit. John showed us a game called "spoons". It's one of those "try and figure out the rules" kinds of games, so I won't go into detail here. I happened to figure out the rules first, which was irritating to the others at the table (reference: cocky, flippant, arrogant). I then introduced everyone to "Big Black Frying Pan" which, although different, is along the same lines. Tina was about ready to murder someone by the end of dinner because these games can be quite frustrating when you can't figure them out, but much fun was had by all.


Tuesday, 25 July, 2000

We arrived at RCCL's private area on Haiti: Labadee. In the words of a stand up comedian that we saw on the boat, "Labadee is apparently the Haitian word for 'damn hot'." Labadee is a little peninsula with nice beaches and all the usual water sports. Tracy and I rented a jet ski and took a tour several miles down the Haitian coast with it.

Neither of us had ridden a jet ski before, and it was BIG fun. We had to watch a Yamaha safety video before skiing off, which featured a perky US Coast Guard officer giving all kinds of rules and safety tips. I found this pretty ironic, since we were in Haiti.

I drove down the coast, and Tracy drove back. Did I mention that jet skis are way fun? (reference: Top Gun movie, "I feel the need... the need for speed!", reference: Fr. Hesburgh's SR-71 flight) Our guide pointed out some nifty things about the island, all of which I promptly forgot. For safety reasons, they had us drive in a single file line, [supposedly] 100 yards behind each other. We got stuck behind Slow Redhaired Lady twice, which was kind of a drag (pun intended), but other than that, the speed was great.

Jet skis are not hard to drive: just squeeze the trigger/throttle, steer with handlebars, and go. The only trick to get is that the steering is waterjet-powered, and can be delayed by a fraction of a second or so -- something you have to get used to and compensate for.

The driver wears this harness thing that has two hand grips on the side for the passenger to hold on to. Since I drove down first, I had the harness on first. When we switched half way through the trip, we were somewhat rushed (since no one else switched drivers), and Tracy didn't adjust the harness at all, and it fit very loosely on her (there's just more of me to love, that's all!). Hence, the hand grips were pretty useless to me, and Tracy almost bounced me off the jet ski a few times. Much, much fun. I highly recommend it.

After the jet ski tour in the morning, we went back to the ship, got lunch on board (although most of the food service had been temporarily moved to the island), and went back and lounged on the beach for the rest of the day (i.e., I sat in the shade and continued the Cryptonomicon).

There was a "repeat cruiser"'s reception where they were passing out Champagne like water, so Tracy and I naturally attended. Got a closer look at the Captain's rank: 4 medium-wide yellow stripes with a big yellow diamond at the top. I think there are a small number of other ranks that have yellow diamonds as well.

The dress at dinner was "formal". I had rented a tux from the ship to wear that night (they tell you ahead of time that two dinners will be "formal dress"). This was Blake's one and only appearance at dinner, and he annoyed everyone by figuring out the spoons game within minutes (I told you he was smart!).

We went to the show after dinner, which was a stand up comedian. He was ok -- somewhat repetitive, but we laughed.

Sidenote: friends of mine mentioned that they didn't want to go on Voyager because it's just too many people -- the tendency to wait in line for things would be just too much. However, I've noticed that we rarely wait in lines very long. They seem to have the crowd/traffic control issues worked out pretty darn well (reference: Engineering with Extreme Prejudice). Yes, there are billions of people around, but once you get past that, it doesn't really impact much. There are, however, a noticeably larger number of children on this cruise than there were on our last cruise (many other people have remarked on this as well).

When we returned to our room, we found a manta ray made of towels on our bed. Very amusing and rather cute -- it was made by the cabin steward when he made up our room. I think our cabin guy from our last cruise did something similar as well. A friend of mine told me that when she went on a cruise, their cabin steward would make crash-test dummies from their clothes. For example, when they came back from dinner one night, there was a pair of legs and feet sticking out from one side of the bed and a body, arms, and head sticking out of the other (all made with their clothes), making it look like the bed had fallen on the crash-test dummy. Funny stuff.


Wednesday, 26 July, 2000

Arrival at Ocho Rios, Jamaica.

We slept in and got room service breakfast (reference: cruise food). We lounged around our balcony and continued to explore the ship before our afternoon excursion into Jamaica.

We signed up for a yacht tour that left right from the same dock as Voyager. The first stop was Dunn's River Falls. The falls were actually impressive enough -- a gently sloping 900 feet in the vertical direction, quite beautiful, and you actually can climb the falls (the main attraction). However, the climb was actually somewhat frustrating, because you are limited by really slow people in front of you, so you can take about 3 steps and then have to wait. So we both walked away from there with a less than "that was awesome" feeling.

The yacht tour continued on to some waters off the coast of Jamaica for snorkeling. We were further annoyed that they didn't have enough snorkel masks for everyone on the boat, and Tracy and I had to wait quite a while for someone to finish before we could go snorkeling. And then the water was really choppy, and Tracy got a little queasy. So all in all, the yacht tour was kind of a bust.

The BFB set sail again around 5pm, heading for Cozumel, Mexico. We went to a honeymooners reception that night, where, again, champagne was poured freely (who can ignore free alcohol?).

Dinner attire was "casual". Can't remember anything eventful from dinner, but I'm sure it was fun. :-)

When we returned to our room, there was a towel elephant waiting for us.


Thursday, 27 July, 2000

Another day at sea, this time en route to Cozumel, Mexico. We basically did nothing all day again; I continued reading Cryptonomicon and Tracy sunned on the deck.

We went to the Bingo game in the afternoon. They play all week and have a rolling jackpot (more below). We didn't win at all (they play 5 games in one session), but it was fun anyway (must be deep-seated Irish/Catholic roots in me that enjoys a good rowdy, full-contact game of Bingo -- Bingo with Extreme Prejudice).

Dinner attire was formal, so I wore my tux again. I had a blue paisley vest this time, though, instead of the standard black cummerbund that I wore last time. We had a formal portrait taken too (same package as the champagne in our room when we first sailed). But we didn't go to the main dining room -- we went to the quaint Italian restaurant that you have to get reservations for (although everything is still free -- reference: cruise food). The food was excellent, and we got a nice bottle of wine with dinner.

Went to the show after dinner, entitled "Dreamscape", where we met up with Tina, Mercedes, and Marty. The theater is really quite excellent, and I haven't really talked about it much yet, so I'll describe it now. It's a 2-floor theater (main floor seating and a balcony), very nicely decorated such that you can easily imagine that you're in a mid-sized playhouse in London. The stage setup is very high-tech -- they can do many different kinds of effects and have tons of props, curtains, booms, etc. They even have an orchestra pit and movable sections in the stage (i.e., in the vertical direction, which was handy during various portions of the shows). The sound booth was in the back on the first floor, and the lighting booth was in the back of the balcony (why do the lighting cronies always get shafted?). Full bar service on both floors with waiters/waitresses, which was nice.

"Dreamscape" was a bit trippy, but parts of it were good. My favorite part was several people dressed up in [apparently] velcro suits that would throw themselves up on a wall (Letterman-style) in various shapes and letters and whatnot. Very amusing. There was also a stand up comedian at 12:15am that we wanted to see, but we had to get up early for our tour in Cozumel, so we didn't go.

I accidentally put the "do not disturb/please make up room" card out facing the wrong way -- it said "do not disturb" so we didn't get a towel animal this evening. But we heard that it would have been a little dog.


Friday, 28 July, 2000

Arrival at Cozumel, Mexico.

We signed up for a rather lengthy tour of the Tulum ruins -- a Mayan city. This is actually on the Mexican mainland, not on the Cozumel island. So we took a ferry to the mainland, and a bus to the city itself. Our tour guide took us around the city a bit and told us all about it. Very cool stuff, actually (note to self: gotta investigate the Mayan numeral system -- the Mayans were really into math and calendars in their lifestyles and religion). Only a few buildings were left standing, but you could walk around much of it.

This was apparently the last city that the Mayans built, and they actually enclosed it within a wall (which is evidently unusual for them). They did some amazing things with sunlight -- they made specific holes in walls and buildings so that on the equinox and solstice, the rising sun would appear in specific places in rooms, walls, etc., etc. Truly, the entire city was built with fundamentals and exactness that required Engineering with Mega-Extreme Prejudice. I wonder whether many modern contractors could achieve the level of exactness that the Mayans did (piping sunlight through strategic holes in walls and buildings across the entire city, for example -- amazing).

The city was directly on the coast, too; there were paths leading down the cliff on which the city was built to the beach (important for sea trade, apparently). They even had a lighthouse to warn of reefs and whatnot.

After returning from the Tulum tour, Tracy and I ventured out to Cozumel itself for some shopping. I was looking for a good t-shirt, but came up empty (they all appeared cheesy to me. It's amazing how I'll take and wear any freebie computer t-shirt, but when it comes to buying one, I'm extremely picky). Tracy got a silver necklace. We walked around a bit and saw the waterfront of Cozumel, but then had to return to the ship before it sailed.

One surreal experience: on the approximately 3-5 minute cab ride from the BFB to downtown Cozumel, I saw 42 Volkswagen Beetles. Yes, 42 (and that's not even counting the VW busses). Not the new models -- the old-style VW beetles (and many of them were fairly new). Absolutely incredible. If you ever have a desire to get a VW Beetle, go to Cozumel. Apparently they still have a VW Beetle factory in Cozumel, hence, in an amazing show of local support, everyone proudly drives around in their locally-made Beetles yelling whatever it is that proud Beetle owners yell (in Spanish). Either that, or it's just amazingly cheap to buy a Beetle there.

Dinner dress was casual. I introduced Peter to the concept of placing a sugar packet on the handle of a fork (or spoon, but forks give straighter trajectories) and slamming down on the curved end to launch the sugar packet across the room. The heavier sugar packets work better, such as pure sugar cane sugar. It's actually amazingly hard to do right -- it's difficult to get any distance out of the sugar. It's a delicate balance of placing the sugar correctly on the handle of the utensil and hitting the other end just right to get any kind of distance. If you don't perform these steps just right, any/all of the following will happen:

  • the sugar packet will only go straight up (and therefore straight down)

  • the sugar packet will veer wildly off-course and end up in the soup of someone at an adjoining table

  • you'll end up launching your eating utensil across the table/room

What followed was a medley of sugar football, where just about all of us at the table tried to make field goals from as far a distance away as possible. I actually managed to make one down the length of our [fairly long] table into Marty's lap (a perfect 3 pointer, if I do say so myself!). The rest were comical attempts that usually ended up horribly wrong (oops) followed by our whole table pretending that nothing happened ("Jeez, I don't know sir -- we don't have any sugar packets mysteriously ending up in our soup. Must be a problem with your table; you should call technical support."), punctuated by waiters, wine stewards, or any other Person of Responsibility walking by. Great fun was had by all (mothers included!).

When we got back to our room, there was a towel monkey hanging from the ceiling in our room. The best part was that he was wearing Tracy's sunglasses. It was so funny that we had to take some pictures with it.


Saturday, 29 July, 2000

Another day at sea, this time en route back to Miami.

Yet another day of doing nothing (one of the important reasons we took this cruise -- to relax!). Much more reading of Cryptonomicon and jotting notes for this journal down.

We went to the afternoon session of Bingo -- the rolling jackpot was over $10k. It works like this: the last game of the session is always "cover all", meaning that you have to get every number on your board before you can call Bingo. They start the week with a coverall bingo jackpot of some value X (which is some complicated formula that has to do with how many people play, the number of letters in the Roman numeral representation of the number of seconds since midnight on January 1, 1970, and the number of revolutions the engines have made since sailing away from Miami). You win the jackpot if you cover your board within the first 50 balls called. If no one wins, the jackpot rolls over to the next session (where a new and entirely different formula is applied to calculate the new value of X to add in).

So anyway, it's not unusual for the jackpot to be huge by the end of the week. During the last session of the week, the jackpot goes to whoever is the first to cover their board regardless of how many balls it takes. Hence, everyone and their brother (and their dog, cat, and platypus) shows up for the last session. Tracy and I got to within 2 numbers on one of our boards, but didn't win. The jackpot was actually split between two winners -- lucky sods.
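For the geeks in the audience, the rollover rule described above can be written down in a few lines. This is only a sketch in Python: the function name `settle_coverall` and the dollar amounts are made up for illustration, the real formula for X is unknown, and splitting the pot between tied winners is not modeled.

```python
def settle_coverall(jackpot, balls_to_cover, last_session, ball_limit=50):
    """Settle one cover-all game.

    jackpot        -- pot going into this game (hypothetical amount)
    balls_to_cover -- balls called before the first board was fully covered
    last_session   -- True for the final session of the week
    ball_limit     -- the "first 50 balls" cutoff from the description

    Returns (payout, carried_over).
    """
    if last_session or balls_to_cover <= ball_limit:
        # Either someone covered in time, or it is the last session,
        # where the first cover wins no matter how long it took.
        return jackpot, 0
    # Nobody covered within the limit: the whole pot rolls over.
    return 0, jackpot


# Mid-week session, first cover on ball 57: nothing is paid out.
print(settle_coverall(1000, 57, last_session=False))
# Last session, same slow cover: the pot is paid anyway.
print(settle_coverall(10000, 57, last_session=True))
```

The only special case is the final session, which is why the pot tends to survive all week and then go to whoever covers first on the last day.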

Nothing else memorable that day -- just lots of relaxing. There were some interesting lightning storms off the port side of the boat within the clouds and whatnot; very beautiful. Some rain actually came over the boat, too; Tracy and I were sitting in one of the covered hot tubs at the time and just watched the sheets of rain plummeting down onto the deck, with various thunder claps and lightning flashes. Cool.

There was a "goodbye" show before dinner which had several kinds of acts: magic, comedy, music, dancing, etc. Not a bad show.

We played more sugar football at dinner (casual dress). John wasn't there last night, so he was introduced to it this evening. Two of Peter's friends joined us during dessert (their parents had already finished dinner and left), so we introduced them to sugar football as well. I repeated my record-setting distance, but also flipped my fork all the way down the table as well, knocking over a glass and scaring the bejesus out of the new kids (no pain, no gain). Again, more fun was had by all. An elderly woman at an adjoining table was glaring heavily at us. Marty pointed her out to us, and as a unit, everyone at our table turned and looked at her (reference: cocky, flippant, arrogant). Most amusing.

The string quartet came by our table this evening and asked for requests. John, being a smartass, asked for "Stairway to Heaven". And wouldn't you know it -- they knew it. I've never heard Stairway rendered on an acoustic guitar, two violins, and a huge bass before. Most interesting. They did a pretty good job, I have to admit! But it was still surreal.

Tracy and I had a final stroll around the ship after dinner, and then went back to our cabin to pack (you have to put your luggage out before midnight so that they can collect it for debarkation in the morning by order of your flight time). No towel animal this evening; bummer.


Sunday, 30 July, 2000

We ran into Marty, John, Tina, and Peter in the morning right before debarkation. Said goodbyes and the like.

Flight from Miami to O'Hare was no problem (although the mysterious ecosystem that we call "airline travel" [hereafter referred to as the Nemesis] somehow changed our flight number and moved back our departure time by about 15 minutes. While this was slightly alarming (since the Nemesis had previously not informed us of this fact), it was actually no big deal because our layover in Chicago was supposed to be over 2 hours). However, upon arrival in Chicago, we discovered that our flight to South Bend had been canceled. Doh!!!

What followed was several hours of standing in line, attempting to communicate with lower echelon Nemesis peons (LENPs), and generally trying to discover a) where our luggage was, and b) how to finish our journey to South Bend. These are seemingly simple questions, but the answers proved difficult to find.

The location of our luggage is still a mystery -- it is currently lost within the vortex of the Nemesis. We hope to find it tomorrow (Monday); multiple LENPs assured me that it would find its own way to South Bend, and magically be delivered to my door. I attribute this proposed luggage self-exploratory behavior to the non-Euclidean properties found within the Nemesis (reference: price/distance ratios found on such sites as BizTravel, Travelocity, etc.); indeed, to my knowledge, my luggage has never moved itself before, but it is relatively new luggage (just got it this past Christmas), so it may have habits that I am unaware of. We ended up getting a rental car voucher from American and driving back to South Bend (which turned out to be uneventful).

Since we got a point-to-point rental (i.e., ORD to SBN), mileage and time don't matter -- the car just has to be at the SBN Avis terminal within 24 hours -- we decided to spite the Nemesis and drive straight to Macri's and celebrate being home with some Big Beers. Most excellent.

We're back in Turtle Creek now. Spoke briefly with Dog on the phone about news from the past week and checked my e-mail; only had 10MB of new mail, or 360 new messages (much, much lower than I thought, but I did unsubscribe from most lists and remove myself from most aliases before I left last week). Read some of the most important-looking messages; I'll check the rest tomorrow. Found several messages for Jeremy Faller on my answering machine (which I find rather amusing -- most were from a woman from his moving services who adopted an increasingly annoyed tone because Jeremy was not answering her messages). Also found that the ceiling in my bathroom is leaking from the apartment above me again -- the floor was rather wet and smelly. Gonna have to talk to Turtle Creek management about this tomorrow.


Monday, 31 July, 2000

Well, this journal entry has taken a good amount of time to write, so we get Monday as well. :-)

The LENPs have located our luggage, and indeed, it has mysteriously made its way to South Bend by itself. We picked it up when we returned the rental car. Since then, it hasn't moved by itself (at least when I was looking); it must be tired from the trek to South Bend from Chicago.

Tracy and I spent the rest of the day packing her car with more junk from my apartment. There's now very, very little left. Mainly my TV, VCR, the server, an ND flag, some clothes, and all the junk in my office. Gotta take my stereo receiver in to Best Buy to get serviced, though -- I think 2 of the 3 video channels have been fried over the years (it's under warranty, so the service should be free. Woo hoo!).

Gonna go head in to work now, see if I can catch Lummy before he heads back to Cali, and say hello to everyone in the lab.

August 1, 2000

Platypus face

Finished Cryptonomicon this morning. A mostly good book, but I have to admit that I was a bit disappointed by the ending. It was too vague, and tried to imply a lot of answers but really left a bunch of things unanswered definitively (but not in a "wait for the sequel" kind of way). Plus, some of the things that were tied up in the last few pages of the book were (IMHO) plainly obvious by that point, and it was just a relief to get to the point where they actually stated what you had been assuming for the last 100-200 pages.

It's a monster of a book -- over 900 pages long. There's a bunch of good WWII storyline in there, as well as a somewhat-weak storyline about setting up a data haven in the modern world with a bunch of cool crypto stuff tying the two together. So my review: the first many-hundred pages were ok (indeed, the style of the book takes a few shifts a few hundred pages in), but the ending was decidedly weak. Still, I'd recommend it to others.

I spent most of yesterday reading and answering e-mail, but spent a few hours with Jeremiah discussing what he wants to do for a master's project. He was initially leaning towards doing STL in OOMPI (to which end he's been cleaning up OOMPI and gearing it up for 1.0.3 release -- a nontrivial task!). It's been good, I think -- it was an excellent introduction to "real world" computing, and how hard it really is to write Quality Software.

In the past few weeks, he has been running regression compiles and tests on all kinds of combinations of platforms, operating systems, and compilers. He hacked up a bunch of shell scripts to do this, and has generally learned a lot about it (try it yourself -- it's a lot harder than you would think). But this has inspired him to move away from STL/OOMPI and to tackle a long-standing issue for the LSC: a rock-solid regression compiling and testing agent that can be used to perform compiles and runs on all manner of combinations of setups such that it can be used to test software before it is released. We talked about this for an hour or two last night and brought up all kinds of issues. He seems pretty interested in it, and it could be a great project for the lab as well as a good master's project.

Had to fix up some weirdness on wedding.squyres.com today -- it seems that the Apache processes were spinning endlessly and creating a huge load. Dunno exactly what caused it, but Ed and Don have been working on their fantasy football pages, so they may have tickled some PHP bug or something. Restarting apache seemed to fix the problem. Gotta set up virtual hosting for their hostname, though. Will do that tonight.

Heading down to Looieville soon -- taking the latest Mandrake CD with me, and will bring my SBN router with me. The SBN router will become the router down in Looieville (hence, the web server, router, mail server, and soon, the DNS server). The current Kentucky router will become my desktop workstation and just sit behind the firewall. Might do other services from that machine (e.g., DHCP, NFS for home dirs, etc.). I plan to set up bind in a week or two, too -- Darrell and I will be secondaries for each other. Hence, my router machine will likely become squyres.com as well.

I'll probably keep the mail services on pennyhost, though. Who knows -- I might take that over as well, but I'd want to find some web-enabled email management software first (i.e., a good webmail client, ability to change forwarding/storage, etc.). A project for a future day.


Just found out that the OIT Solution Center sells W98 CD's, but only the first edition -- not OSR2 (hasn't OSR2 been out for 1-2 years now?). How much do they suck?

Do you know what I like about the OIT Solution Center? Nothing at all.


Answered some IMPI mail apparently from the guy at HP who is working on their IMPI implementation. Looks like we may have left a sentence or two out of the IMPI standard -- he raised a valid clarification issue. Oops. I've pinged Judy and Bill at NIST to see what they want to do about this (i.e., how to fix the doc).

Tons of LAM and other MPI messages remain in my inbox -- will have to start getting to them tonight...

August 3, 2000

Chocolate moose musings II

Take II on this entry (note to self: write some kind of primitive HTML tag checker to ensure that tags are closed properly in journal entries).
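A checker like the one in that note can be a single pass with a stack: push each opening tag, pop when the matching close shows up, and anything still on the stack at the end was never closed. What follows is a rough Python sketch of the idea, not the journal client's actual code. The names `check_tags`, `auto_close`, and `VOID_TAGS` are hypothetical, the list of tags that never need closing is a guess, and edge cases such as a `>` inside an attribute value are ignored.

```python
import re

# Tags that never take a closing tag (assumed list; extend as needed).
VOID_TAGS = {"br", "hr", "img"}

# Matches <tag ...> and </tag>; group 1 is "/" for a closing tag.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")


def check_tags(text):
    """Return (unclosed, stray).

    unclosed -- opening tags never closed, outermost first
    stray    -- closing tags that did not match the innermost open tag
    """
    stack = []
    stray = []
    for match in TAG_RE.finditer(text):
        closing, name = match.group(1), match.group(2).lower()
        if name in VOID_TAGS:
            continue
        if not closing:
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
        else:
            stray.append(name)
    return stack, stray


def auto_close(text):
    """Append closing tags for everything left open, innermost first."""
    unclosed, _ = check_tags(text)
    return text + "".join("</%s>" % name for name in reversed(unclosed))


print(check_tags("Free food -- <strong>woo hoo!!"))
print(auto_close("<i>some <b>nested text"))
```

A client could call `check_tags` when the entry is finished to complain about anything left open, and fall back to `auto_close` at submit time.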


Spent all of yesterday rearranging the computer room in Tracy's (er... our) apartment. Reconfigured the network to incorporate my router box properly -- now I have a desktop machine (albeit with a flaky 3Com card... #$!@$!@$!!!!) that is not responsible for the router, web server, etc., etc. Still not finished yet, but we're closer.

The new router is the latest Mandrake (but without the latest Kernel -- couldn't get that to work with ReiserFS properly. Screw it). It's currently running apache/php/mysql and sendmail. Future plans include mailman and bind. I kinda need DNS running soon, 'cause mail is currently kludged to look like it came from lsc.nd.edu (shh!!); need a proper squyres.com name other than wedding.squyres.com. :-) It doesn't appear to be perfect yet (Don's still having X forwarding issues via OpenSSH), but I've already removed the monitor and hidden it under the desk.

The desktop is a Compaq desktop with serious I/O suckage. I just backed up all the data on it [temporarily] to AFS, leaving the way clear to upgrade it to the latest Mandrake when I return to KY on Saturday. I'm also a bit wary of upgrading that machine because it has some special SCSI drivers in it that took Dog and me *several* days to get right the last time we installed Linux on here. Let's hope that these SCSI drivers are mainstream enough to be in the main distros these days!


Also spent a bit of time yesterday helping Don and Ed configure their fantasy football league on www.fhffl.com (which is really wedding.squyres.com -- gotta love DSL!) -- it's part of a long-standing deal which is now probably defunct because Lummy is likely moving to IU, but what the hey. In helping them possibly move to a real database rather than text-file-based data storage, I had to explain a lot of database concepts to them (no DB background at all, but they're smart guys). We're having another infamous "beer-n-computer science meeting" at MBC tonight. Yummy. Will code for beer!


Mmmm... Chemical Brothers... mmm...

While I'm upgrading everything, I just got the latest linux netscape (4.74). Let's see what kind of mess it can create, now!


Went to see a Louisville River Bats minor league baseball game last night with Tracy and a bunch of people from GE (a freebie from the good folks at GE). The stadium is brand new -- only been operating this year. The game was by no means a sellout, but there was a pretty good sized crowd there. Nice stadium, too -- bigger than the Silverhawks stadium -- it even has an upper deck. Their mascot is a purple fuzzy dude who has some flaps hanging off his arms that are supposed to pass as bat wings. He came out during the later innings with a t-shirt gun. Very amusing -- it could launch tightly scrunched t-shirts into the upper deck from where he was standing near the dugout.

Met several of Tracy's coworkers' kids, had some beer, and mmm... ballpark hot dogs. Is there anything in this life as good as a ballpark hot dog and/or brat? Quite yummy. And to top it all off, we won the game. The River Bats had a cool 3-run homer in the first or second, sucked for most of the 2-7th innings, and then had a rally and won something like 10-5.


Now I gotta drive back. Will solve the X forwarding problem later (seems to have something to do with the fact that openssh X auth != regular ssh X auth, and the fact that Goofy's shoes, contrary to popular belief, were at least 2 sizes too small).

August 4, 2000

Jeff's Journal

Tied up some loose ends today:

  • Checked into the error that Arun reported that he was getting with parallel bladeenc; couldn't reproduce it. Sent him the latest copy to try. Turned out to be an embarrassing use of a variable before it was initialized in LAM's mpirun. Additionally, we accrued command line arguments into a fixed-length string that could be overflowed (oh for STL strings...). Doh!

  • Replied to mp3check author dude (see previous entry).

  • Finally fixed the "delete" button in the MPI listing stuff; I think it was malfunctioning before and deleting all the data in the database. Oops!

  • Replied to Bill George at NIST about some pending IMPI errata w.r.t. IMPI_H_ACKMARK and IMPI_H_HIWATER -- the IMPI doc doesn't clearly state how these values should be arbitrated. Bill and I are discussing what the mechanism should be. Actually, the mechanism is clean: min(a, b). The discussion is about whether the value should be applied universally to all hosts or on a host-pairwise basis. I'm [currently :-)] in favor of the latter. We'll see how it works out.

  • Installed GNU mailman 2.0b5 on mail.lsc today. Apparently the previous versions had some security problems. Oops. I tried to setenv CFLAGS to -fast, 'cause there is a small C portion in mailman (most of it is in python), but it still used just "-O". I suspect non-careful use of AC_PROG_CC in its configure.in script (curses, autoconf foiled again!!).

  • Got minime in a compilable state again. Working on a primitive html tag checker so that I won't leave unterminated tags again. It should bitch if you leave tags unterminated when you finish typing the rant, and automatically close them if you "submit" without fixing them. Simple stack-based thing (gotta love the STL!). I also added a warning if it removes "LocalWords:" lines when you submit (not when you re-edit).

  • Finally had a meeting with the Grad School people (they're nice and reasonable people once we all get in a room together and talk over the issues -- they even want to take us out to lunch for our troubles. Free food -- woo hoo!!), and we worked out all the "final" kinks in the ndthesis style. Changed a few things in the sample thesis, and we should be good to go!

  • Helped Jeremiah ship OOMPI 1.0.3. It's on Freshmeat now -- everyone go check it out! Artificially inflate our stats! Whooo hoo!!
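To make the IMPI_H_ACKMARK / IMPI_H_HIWATER question from the list above concrete: both options use min(a, b), and they differ only in scope. The Python sketch below is an illustration, not text from the IMPI standard. The host names and values are invented, and `arbitrate_universal` and `arbitrate_pairwise` are hypothetical helper names.

```python
from itertools import combinations


def arbitrate_universal(proposed):
    """One value for everybody: the minimum over all hosts."""
    return min(proposed.values())


def arbitrate_pairwise(proposed):
    """One value per pair of hosts: min(a, b) for that pair only."""
    return {
        (a, b): min(proposed[a], proposed[b])
        for a, b in combinations(sorted(proposed), 2)
    }


# Made-up values that three hosts might propose for one parameter.
proposed = {"hostA": 64, "hostB": 16, "hostC": 32}

print(arbitrate_universal(proposed))  # 16
print(arbitrate_pairwise(proposed))
# {('hostA', 'hostB'): 16, ('hostA', 'hostC'): 32, ('hostB', 'hostC'): 16}
```

With these numbers, the universal rule drags the hostA/hostC pair down to 16 because of hostB, while the pairwise rule lets that pair keep 32. The cost of the pairwise rule is that each host has to track one value per peer.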

Still to do:

  1. Finish dissertation. Graduate. Earn lots of money. Take over the world (the DomeCam's still down, after all).

Must go join Brian and Pete for wings, beer, and a last "hang out" night at Chuck's old place.

I say, deliver me from Swedish furniture!

Tales of an ND grad student

Got back up to ND. Lummy was here in the office -- I thought he was still in CA; pleasant surprise. Looks like we'll be heading to Berkeley in a few weeks for a few days for some design meetings about the BLD. Should be fun and interesting.

OOMPI 1.0.3 is just about ready to roll, but I found a possible problem; may require more testing...

Inilib is getting closer, too -- perhaps in a few weeks. It still rocks, though -- it's heavily used in this journal client, for example (gotta love pre-release access!). :-)

Spent last night talking with Ed and Don about databases and their fantasy football setup. They bought the beers and dinner, so I guess I couldn't complain. I gave them a database on www.fhffl.com, and they'll start playing with the setups that we described last night. They do some nifty things with pulling down info from other web sites (NFL sites and the like) to feed their data pool.

Mmmm... the power of PHP and MySQL... mmm...

Got a reply from the mp3check author dude (it doesn't work on big endian machines). He claimed to have fixed the endian problems, but I found a bunch of compiler issues (I'm assuming that he's using g++ -- wow, does g++ suck!). Even after getting it to compile, it still doesn't work on big endian machines properly. Bonk! I sent him a reply with tons of info to keep him busy.

It seems that I have way too many MP3s out in AFS space -- I filled up the lums CCSE volume. Whoops! The irony is -- I literally tried to download them all to wedding.squyres.com earlier that day, but realized that I don't have a hard drive large enough for all of them, so I deferred to this weekend when I'll buy a new hard drive large enough to hold them all (I currently have 8GB of MP3s, and that's perhaps 1/4-1/3 of my CDs). Since the CCSE volume was full, I downloaded a bunch of them and deleted them off AFS to give much more working space.

August 5, 2000

Leaving Las Vegas^H^H^H^H^H^H^H^H^HTurtle Creek

I closed a major chapter in my life today. I left the apartment where I have lived for just over 6 years. Indeed, I have lived in South Bend more-or-less continuously (minus some summers and Army time) for a few days shy of 11 years -- the majority of my adult life.

However, I'm about to start a new chapter, too -- I'm moving to Louisville, KY, to go live with my new wife, Mrs. Tracy Payne Squyres.

At the risk of sounding sentimental, I feel compelled to present a few reflections of my mixed feelings.


I have been gradually moving my stuff down to Louisville over the past two months or so. Still, today was the final day of my lease, and (by design) I loaded the last of my stuff in my car this morning, cleaned the apartment thoroughly for the last (first?) time, locked the door, and left.

It was surprisingly hard. I'm not an overly-sentimental kind of guy; indeed, I'm from the MTV generation and have the attention span and short-term memory of a skittish cat. The apartment itself is pretty crappy; it's small, didn't have too much sound protection from other apartments in the building, had very hard water, crappy cabinets, etc., etc. But it was home. I have lived in that location for quite a long time -- it had become a part of me. I've had many good times, many bad times, and some just downright weird times in that apartment. The good times always come to mind first, which is one of the reasons that it was hard.

This morning, as I was cleaning and packing, I was musing on the history of my time in that apartment. This is the end of a 7 year streak -- I initially moved in with Mr. Huy Phan (EE grad student) back in the summer of 1994 (he had some other roommate for the previous academic year; I never knew who it was). Huy eventually moved out and went back to France. Mr. Brian McCandless (CS grad student) then moved in with me. Brian graduated a few years later, and Chuck (EE grad student) moved in. Chuck was only around for a semester and a half; Kevin Barker (CS grad student) moved in before Chuck even left. So that apartment has seen a continuous stretch of a single lease since 1993 -- 6 people. And I got to clean the apartment today. Did I get the short end of the stick, or what?

I found all manner of interesting things in the apartment today:

  • A grand total of 41 pens, pencils, markers, and various other sundry writing utensils. And all of my commonly-used pens are already down in Louisville -- where did these come from? Why did we have them? We certainly didn't write that much. A mystery.

  • I found -- still in shrink wrap -- a mini gas grill. Who the heck did that belong to?

  • I also found a boom box. I have no idea whose it is, nor how it got into my apartment. The left channel doesn't work, but I'll bet that it could be fixed fairly easily. I gave it to Pete and Brian.

  • The couch that Tracy bought (used) in her freshman year and gave to me when I moved into the apartment in 1994 has now been passed on to Pete and Brian. May it continue to give them good service.

  • The Christmas lights that have hung in the apartment for years (literally), and have been on continuously since April or so (it's all about uptime, baby) have also been bequeathed to Pete and Brian as a symbol of Bachelorhood.

All in all, I was surprisingly happy -- albeit sentimental -- about moving out today. This is surprising because I absolutely detest moving; after loading each carful of stuff over the past two months, I always found myself emotionally drained because a little piece of me was leaving. But today was different. I realized that I actually do have closure with this place -- I'm ready to move on and become a husband and start the next chapter of my life. This move has been planned for quite some time now, and I guess that I've been subconsciously preparing for it all along.

Flashback to last night. I went out with Pete (just graduated CS from ND in May, and is just starting as a CS grad here this semester) and Brian (CS undergrad, starting his senior year here at ND) -- the same guys who inherited most of my stuff. For those of you who don't know, Brian has been one of my students for a year or two now; Pete worked for me for about a year as well. We went to BW-3s, had some wings and beer, and played trivia. It was much fun. We went back to Turtle Creek, had a few more beers and pizza, and used the Smoking Table one last time. More fun. In short, it was a perfect evening; we just hung out, were generally stupid, and got a little philosophical at times. These guys will become the next set of urban legends in the College of Engineering at Notre Dame; I am leaving ND in capable hands.

Back to today.

I said goodbye to Troy (one of the maintenance guys at Turtle Creek), and asked him to be nice to me when he does the final inspection of the apartment. He always liked us, and took pretty good care of us (when things broke, he always came pretty quickly and fixed them). I said goodbye to my apartment (it's a thing that I have -- I always have to say goodbye to places that I've lived), and got in my car and drove away.

Metallica's "No Leaf Clover" was playing on the radio as I drove away.

Goodbye, Turtle Creek.

(that was surprisingly hard to type)


Louisville -- here I come!!! Woo hoo!!!

#              ##
### #
# #
##### #
# #
### #
# ##

(yes, that's a Unix banner ":-)")

Chapter 29

9:45pm EDT. Took a little longer than usual because of weather and construction.

But I am now home.

August 7, 2000

Tales of a Fourth Grade Nothing

Spent much of yesterday opening wedding presents. Yummy! Got lots of free stuff. Got lots of stuff that we didn't ask for, but hey -- don't look a gift horse in the mouth (who came up with that expression, anyway?). We cataloged everything in our handy-dandy wedding software database (don't laugh -- there are a good 10-15 wedding software packages out there these days; it's big business! And it was truly helpful in organizing stuff). Now comes the hard part -- gotta write all those thank you letters.

Got a 40GB hard drive from CompUSA for all my MP3s. I'll install that RSN. Got some net books at B&N, too. There's a new Cussler book out, but it's still in the Big Paperback size (which is just about as expensive as the hard cover). The final Reality Dysfunction book is still not out in paperback (bonk!). And the latest Area 51 book is still not out yet. (Ok, I just admitted it to the world -- I'm into cheesy sci-fi and action books for recreational reading. You'll deal.)

Got some replies from Tord about parallel bladeenc. I read them, and I think I understand what he's saying. Unfortunately, action on these items gets pushed on the stack until other things finish up. :-(

Set up GNU mailman on wedding.squyres.com for Don and Ed. Might move my journal mailing list here, too, but probably not before I get bind running to give this machine a decent name. Still haven't quite decided what to do with squyres.com mail yet, because several other members of the family use it, and I don't feel like hosting it. Hmm. Will require some thought.

Went over to Laura and Paul's later because they had tons of extra food from a wedding that they went to on Saturday. Saw Melinda and Reuben as well. Good fun. Came back and crashed afterwards. Mmm... sleep...


Oops - the GNU mailman that I set up for Don/Ed isn't quite functional. Had to fix a few things (I only briefly tested the web interface yesterday before we left for Laura/Paul's). Mailman's woes seem to be related to some sendmail issues, too.

I've now spent a good chunk of this morning fighting with sendmail w.r.t. my firewall and whatnot, and getting it to do what I want (it still doesn't). I remember the days when sendmail setup was simple and easy to understand. Wait... no I don't.

August 8, 2000

I drank what?

Spent too much time on ndthesis yesterday. Hopefully, we're 100% done with it.

Went out to switch my cell phone down here to Louisville yesterday (SBN's Alltel just got bought by Verizon, which is everywhere -- quite handy for me!), and found out that I only had something like 20 days left on my contract. So I ended up upgrading to one of those wacky digital phones that has voice mail, call waiting, no roaming (important because I'll be traveling a bunch), etc., etc. I think it even writes optimized high performance scientific code.

Talked to Faller yesterday; he sounds like he's doing well in Bahston. He had some ideas regarding parallel bladeenc and Tord's replies to us; he's still convinced that we can generate output from parallel bladeenc that is diffable to the serial bladeenc. The crux of the issue is that the parallel and serial outputs are the same up until the last frame of the first slave's output. And even that frame is the same... until a point. This is the point where slave 0 runs out of input data, and therefore -1 pads the rest of the frame (it took us a while to understand that this is what was happening). The next slave's output is completely different from the serial output -- it's not like the serial output is then just shifted down into the next frame (which would be easy to fix). I think it has something to do with what Tord mentioned: that MP3 is only differential within each frame, but does depend on a small number of bytes from the previous frame (which is somehow not strictly classified as differential across the frames -- I think it has to do with framing setup and the like, although it does affect the output data).
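For the curious, here's a rough Python sketch of where that padding comes from. It assumes the input samples are divided evenly among the slaves without regard to frame boundaries (this entry doesn't say how parallel bladeenc really divides them), and it uses the 1,152 samples per frame of MPEG-1 Layer III. All of the function names are made up for illustration. It says nothing about the bytes that a frame borrows from the previous one, which is the part Tord was talking about.

```python
FRAME = 1152  # PCM samples per MPEG-1 Layer III frame


def naive_split(n_samples, n_slaves):
    """Split the input evenly, ignoring frame boundaries.

    Returns a list of (start, length) pairs, one per slave.
    """
    base, extra = divmod(n_samples, n_slaves)
    chunks = []
    start = 0
    for i in range(n_slaves):
        length = base + (1 if i < extra else 0)
        chunks.append((start, length))
        start += length
    return chunks


def padding_needed(length, frame=FRAME):
    """Samples of padding required to fill out the chunk's last frame."""
    return (-length) % frame


def aligned_split(n_samples, n_slaves, frame=FRAME):
    """Split so that every chunk except the last is a whole number of frames."""
    total_frames = -(-n_samples // frame)  # ceiling division
    base, extra = divmod(total_frames, n_slaves)
    chunks = []
    start = 0
    for i in range(n_slaves):
        nframes = base + (1 if i < extra else 0)
        length = min(nframes * frame, n_samples - start)
        chunks.append((start, length))
        start += length
    return chunks


if __name__ == "__main__":
    for start, length in naive_split(10000, 2):
        print(start, length, padding_needed(length))
```

With a naive split, slave 0's last frame is only partly real data, and slave 1 starts in the middle of what the serial encoder would treat as a single frame. Splitting on frame boundaries removes the padding in the middle of the stream, but by itself it would not account for the dependence on the previous frame described above.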

Anyway, Jeremy is convinced that we can have the master re-frame the output data from the slaves and thereby create diffable output. He's gonna spend a few days reading the MP3 file formats and papers; we'll talk again when he's done.

I rediscovered the Goodness of Streaming Audio yesterday. Gotta love DSL.


I was hit by two inspirations a few minutes ago, which I promptly mailed off to Arun (who is giving 1.5 LAM talks today):

LAM: The Code to Glory
PVM: The Code Less Traveled

Don't get me wrong -- while I'm certainly not a PVM guy, nor would I ever write any new code in PVM, let us not downplay the importance of PVM in the Grand Scheme of Things. It was the first widespread "standardized", portable parallel code tool ("standardized" is in quotes because it was really only a research project -- it wasn't a real standard). Hence, it was the first time that you could write a parallel code on one kind of machine and run it on others (rather than have to re-develop it for every new kind of parallel computer that you tried to run on). Plus, it worked on clusters -- a prime candidate for development of parallel codes (especially considering that running on the Big Iron costs $$$).

So my statement really reflects the Way It Is Now -- most new parallel users use MPI, not PVM. Indeed, many parallel hardware vendors don't actively develop PVM anymore; they only develop their MPI. However, there are probably uncountable millions of lines of legacy code out there. PVM is like Fortran -- it will never really go away.

And this is not to say that MPI won't some day be replaced by something More Useful. I'm quite convinced that MPI is not The Answer; it's just the best that we have right now.


Spent this morning answering some backlogged LAM mail. Will spend the rest of today finishing off all the current backlog of LAM mail, continuing setup of queeg (my Linux desktop -- was having some problems getting SSL/pine to compile), wedding gift reconciliation (one of our registries screwed up and allowed people to buy 3-4 of an item that we only asked for 1 of), and minime hacking.

In the words of the Ancient Masters, "After 3 days without programming, life becomes meaningless."

August 9, 2000

Smashing the stack

Spent time yesterday and today going over the complexities of Health Insurance. I have become convinced that Health Insurance is a scam run by a bunch of expatriate armadillos down in Arizona. Only they could dream up such convoluted and bizarre rules, regulations, and policies. Or perhaps it was just a committee.

I say, deliver me from expatriate Arizonian armadillos!

Engineering: overthrowing armadillos.


In other news, I finished the next round of enhancements for my journal client:

  • it warns you about unclosed html tags, and will [admittedly stupidly] close them if you submit without fixing them

  • it removes some tags automatically, like <html> and the like

  • it warns you and automatically removes "LocalWords:" lines so that you can run ispell on your entry and not have to worry about remembering to delete those lines before you submit
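A minimal sketch of those three checks, in Python rather than the client's actual C++. The list of disallowed tags, the list of tags that never need closing, and every name below are assumptions for illustration; the real client's rules aren't spelled out in this entry.

```python
import re
from html.parser import HTMLParser

# Assumed list; the entry only says "<html> and the like".
DISALLOWED_RE = re.compile(r"</?(?:html|head|body)\b[^>]*>", re.IGNORECASE)
# Tags that never take a closing tag, so they are not tracked.
VOID_TAGS = {"br", "hr", "img"}


class TagTracker(HTMLParser):
    """Keeps a stack of tags that have been opened but not yet closed."""

    def __init__(self):
        super().__init__()
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Drop the most recent matching open tag, if there is one.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i] == tag:
                del self.open_tags[i]
                break


def clean_entry(text):
    """Return (cleaned_text, warnings) for a journal entry."""
    warnings = []
    kept = []
    for line in text.splitlines():
        if line.lstrip().startswith("LocalWords:"):
            warnings.append("removed a LocalWords: line")
        else:
            kept.append(line)
    body = "\n".join(kept)

    body, count = DISALLOWED_RE.subn("", body)
    if count:
        warnings.append(f"removed {count} disallowed tag(s)")

    tracker = TagTracker()
    tracker.feed(body)
    tracker.close()
    # "Stupidly" close whatever is still open by appending closers at the end.
    for tag in reversed(tracker.open_tags):
        warnings.append(f"unclosed <{tag}>; closing it")
        body += f"</{tag}>"
    return body, warnings
```

Appending the closing tags at the very end is the "admittedly stupid" part: it keeps one entry from bleeding into the next, but it makes no attempt to put the closer where it belongs.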

Perk pointed out HTML Tidy, which does more or less what is outlined above, but doesn't do the disallowed-tags thing. But it is much smarter about closing tags, replacing incorrect tags with real tags, etc. It also [unfortunately] automatically adds a <TITLE>, which I don't want it to do.

Who knows -- might replace my functionality with HTML Tidy someday. But this works for today, and prevents one <strong from messing up all journal entries.

All for the glory of LAM.


Sadly my telephone headset is falling apart. I need new ear muff thingies (the current ones are flaking off one little black flake at a time), and some wire in the cord is loose -- it cuts in and out randomly. And you know what they say about hardware problems... Actually who the hell cares what they say? Just go buy another one; hardware isn't interesting.

Finally, I got to spend a little quality time with minime today (woo hoo!). Continued to work on the encryption and authentication schemes for the sockets; not quite right yet, but see an older journal entry that describes the scheme.

Minime: coding for fun and profit. Actually, wait, I'm a grad student. s/ fun and profit//, leaving us with "Minime: coding for".

Perfect.

August 10, 2000

Cleveland rocks

Got minime to compile on Linux again. A while ago, I did some ugly things with signals in a solaris/sysv-specific way that disallowed compilation on Linux for a while. Finally got around to fixing it today; this marks the first journal entry in quite a while that has been submitted from a Linux box instead of ssh-ing to ND to use the Solaris journal client (which is ironic, actually, since the journal server is sitting right here next to me -- ssh-ing up to ND made the data go much farther to get to its ultimate destination). Whooo hoooo!!

At Brian's advice, I went and got Mozilla M17 (source). It's still compiling.

I love inilib. It does such nice things for me. :-)

Motivation for saying that: Perk and I have been having a conversation about using the "HTML Tidy" program to clean up journal entries before they are submitted vs. using an internal parser (that I have already written). Turns out that "HTML Tidy" is 95% better -- it's much smarter about closing tags, but it does a few icky things. Best way to resolve it? Have a user-definable option! Let them choose between the internal parser and HTML Tidy. And inilib just takes care of storing that for me. Make today an Inilib day.
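To give a flavor of what such an option looks like in an .INI-style file, here is a small sketch using Python's standard configparser module instead of inilib. The section name "journal", the key "cleaner", and its two values are all invented for this example; the real client's option names are not given here.

```python
import configparser
import io

DEFAULT_CLEANER = "internal"
VALID_CLEANERS = {"internal", "tidy"}


def load_cleaner_choice(ini_text):
    """Read the cleaner preference from .INI text, falling back to the default."""
    parser = configparser.ConfigParser()
    # Seed the default first so a missing section or key is not an error.
    parser.read_dict({"journal": {"cleaner": DEFAULT_CLEANER}})
    parser.read_string(ini_text)
    choice = parser.get("journal", "cleaner").strip().lower()
    return choice if choice in VALID_CLEANERS else DEFAULT_CLEANER


def save_cleaner_choice(choice):
    """Return .INI text that records the user's preference."""
    parser = configparser.ConfigParser()
    parser["journal"] = {"cleaner": choice}
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(save_cleaner_choice("tidy"))
```

Anything unrecognized falls back to the internal parser, so a typo in the file can't stop an entry from being submitted.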
Favorite phrase of today: "beaten on the head by a Mozilla stick."

Faller asked for a copy of the LSC Coding Standards today. Must be spreading that to the good folks at Analog Devices. LSC: The World Domination Tour.

Did some LAM work today; added auto-generation of man pages from structured comments in source code. It's something a) I foolishly promised on the LAM list, and b) oh yeah, users indicated that they wanted on the LAM user survey. Kinda neat, actually. Had to re-create man pages for MPI_Comm_spawn and MPI_Comm_spawn_multiple -- I had made all the MPI-1 man page comments back on Dec 31/Jan 1 while I was waiting for the world to implode. Had to do some icky things to fool automake into a) putting them in the distribution, and b) installing them when "make install" is invoked. Yuk.
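The general idea of turning structured comments into man pages can be sketched in a few lines of Python. The comment layout used here (a block starting with "/*@ name - summary" and ending with "@*/") is a simplified format invented for this example, not a description of what the real tool accepts, and the function names are hypothetical. The .TH and .SH requests are the standard man macros.

```python
import re

# Hypothetical comment layout:
#   /*@ name - one line summary
#      free-form description text
#   @*/
BLOCK_RE = re.compile(
    r"/\*@\s*(?P<name>\w+)\s*-\s*(?P<summary>[^\n]*)\n(?P<body>.*?)@\*/",
    re.DOTALL,
)


def to_man(source, section="3"):
    """Return {name: nroff text} for every structured comment in source."""
    pages = {}
    for match in BLOCK_RE.finditer(source):
        name = match.group("name")
        summary = match.group("summary").strip()
        body = "\n".join(
            line.strip() for line in match.group("body").strip().splitlines()
        )
        lines = [
            f".TH {name} {section}",
            ".SH NAME",
            f"{name} \\- {summary}",
            ".SH DESCRIPTION",
            body,
        ]
        pages[name] = "\n".join(lines) + "\n"
    return pages


if __name__ == "__main__":
    demo = "/*@ Example_func - does a thing\n   Longer text.\n@*/\n"
    print(to_man(demo)["Example_func"])
```

A real generator also has to escape description lines that begin with a period (nroff would treat them as requests) and handle sections such as parameters and return values; this sketch does neither.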

Speaking of LAM, finally resolved Mr. Pascal's issues with LAM/MPI. Turns out that you have to use a special option to the Free Pascal compiler to tell it to link to libc; if you manually link with "-lc", it won't work (for lack of a longer explanation). I asked about such a thing a month or two ago in the initial set of e-mails with Mr. Pascal, but he didn't know about it then. Yesterday, we initiated contact with the Free Pascal developers, and they immediately mentioned this special switch. Oh well, live and learn (but try to avoid Cobol whenever possible).

We're still getting bounced messages from the LAM list from <ptavares@dsg.dei.uc.pt>. Dog claims they're nowhere to be found on mpi.nd.edu's sendmail queue, but the bounces keep coming back. We'll probably get them for another 2-3 days. <sigh> I'll be very happy when we can switch to GNU Mailman (gotta wait for IU vs. ND decision first).

Continued to rip my CDs. It's going nice and slow, but now I have plenty of disk space.

Back to minime hacking...

August 12, 2000

Entry of a 1000 URLs

I'm up in South Bend, and yes, my cell phone works. It's not in digital mode, though (bummer!). I found that yet another company has been sucked into the Verizon Wireless void -- Air Touch Paging. So I tried to send myself a test text message from their web page, but it didn't work. Perhaps it will when I return to digital areas on Monday...

In other news, I had a good chat with Loomsdale yesterday (sorry, that's Dr. Loomsdale to you, Gentle Reader). We haven't really connected recently, especially with this whole wedding thing of mine that happened a short while ago. 'Twas good to catch up. Got a few more details on the whole IU vs. ND thing (sorry, not at liberty to put them in my journal, so get off my back already, ok?!?!) -- we'll see how that plays out.

This led to more chats with Jeremiah, Rich, Dog, and Brian, which led to nothing productive getting done yesterday. Dog and I finally gave up, got some food at Wendy's, and went to see Scary Movie. If you accept it as a totally stoopid movie, it's actually quite amusing.

I introduced Dog to some of the wonders of PHP and MySQL last night, too.

Stayed at Ed and Suzanne's last night. Saw them this morning and we chatted for a while. Came in to work and did various LAM/MPI things:

  • squashed a bug relating to -laio not propagating down to hcc and hf77 properly when compiling with ROMIO

  • squished another w.r.t. profiling and using both MPI_Init and PMPI_Init in the same program

  • played with CVSweb and ViewCVS, mainly to see if it would be worthwhile to put the LAM/MPI CVS repository out on the web for read-only access (a thought that has been nagging me for a while, especially since 6.3.3 has taken so long to release!). Decided that I liked ViewCVS better than CVSweb. I mailed the LAM/MPI mailing list to see if anyone would be interested.

  • in the middle of fixing some warnings from the Portland Group C compiler

  • played some more with the doctext package from the MPICH group to fix some bugs w.r.t. the nroff-generating code. I'm iterating with Bill Gropp about this -- it affects the man pages that get generated for LAM/MPI.

But now, on to more interesting things! Minime calls. Had some interesting minime thoughts yesterday while driving up from Looieville. We'll put those in a separate journal entry.

August 16, 2000

Is one of us supposed to be a dog in this conversation?

It's been a bit since my last journal entry; the lapse is mostly due to travel. Woof! So here we go...

Added a few new features to the journal client: you can now preview your journal entry in lynx and/or netscape before submitting it. I'll probably add one more option to run HTML tidy (either automatically or manually -- haven't decided yet).

Spent this past weekend at Notre Dame. I was supposed to meet some friends, of whom one is entering ND's law school this semester. Signals got crossed (read: I had the wrong time in my palm pilot -- DOH!!!) and I missed them. So I spent the weekend with Suzanne and Ed, and helped them buy a laptop, second hard drive for their desktop (for Linux, of course), and a new monitor. Spent much of Sunday afternoon/evening installing stuff on the laptop and desktop. The desktop's modem was flaky under linux; it was most frustrating. I think I have a spare to send to them.

I found out that I definitely don't have text paging enabled on my cell phone. I got back to a digital area (why is Verizon/SBN still analog? Grumble) and tried to page myself from their web page. It said that the page was sent, but it never came in on my phone. I guess I could pay more for such a thing, but I really don't think that I need it.

Saw Lummy on Friday and Monday; had some good chats with him. The Big News is that he's going to stay at ND. He accepted ND's offer, and we're just going to reap the benefits from it (read: lots and lots of funding!). Some side effects: guaranteed post doc funding (woo hoo!!), a new computer for me (800mhz soon-to-be linux box). Rock on!!

I noticed today that ND's college of engineering started giving out engineering rings this past graduation. I want one! Luckily, I've got a graduation left at ND, so I'll likely get one. :-) Pretty cool things, those rings.

Started looking at Vorbis as an alternative to MP3. I've had a disappointing show of contributions and whatnot from the bladeenc community -- Jeremy Faller and I still have some unanswered questions about MP3. Ogg/Vorbis appears to be a much cleaner process, and an active development community. It is supposedly Much Better than MP3 in terms of quality, documentation, legal issues (i.e., there are none), and encoding speed (the beta encoders are already faster than real time). They even have an XMMS plugin, which means that it's good enough for me!

I started a "has anyone thought about parallelism?" thread on the vorbis-dev list today and got several immediate replies. Talked to one of the dudes who is -- I think -- one of the main contributors, and we came to the conclusion that it should be possible to do a similar thing to the vorbis encoder that I did with parallel bladeenc (although there are still some unanswered questions). So it might be interesting.

Must continue with minime hacking now... must code minime... must code minime... must code minime...

August 18, 2000

Tastykakes

Another day of coding.

Sent off an old modem to dad (mom's modem got fried last week). Sent an old CD rom to John, along with an ISA card and associated cable. Damn, I'm just a nice guy.

I'm re-downloading M17 mozilla (from CVS this time, as if it will make a difference) so that I can get SSL in mozilla (see previous journal entry about how netscape must die). It will probably compile for the next few hours.

inilib meeting with Brian today got pushed off until tomorrow morning.

Did a reconcile of our wedding registry gifts between what the stores say we got and what we actually got. We got more of some things than the stores listed, which means that people found deals elsewhere. I'm all in favor of people saving money when they buy us stuff.

(netscape just finished downloading itself, and is now running configure)

I really need new foam ear thingies for my telephone headset -- they shed on my ears and it looks like I have a five o'clock shadow on my ears (and my, that's attractive!) after using it. Must remember to go to Radio Shack tomorrow...

Getting closer to LAM release. The RedHat folks are freezing this weekend (did I mention this in a previous entry? Can't remember), so they want a version that is "as close as possible", but we're anticipating putting out a LAM update RPM when it goes stable. Ugh -- and he (the RedHat Guy, whose name has a non ASCII character, so I can't type it too easily) found an embarrassing bug in the 6.3.3b27 that I put out earlier today. 6.3.3b28, coming right up!

Found a particularly annoying bug in tping today; I thought it would be a simple bug to fix, but turned out to be hard to find until I realized that some buffers were getting allocated too small, thereby creating overflows. Damn the overflows -- LAMming speed!!

I have one major issue that I want to solve before putting LAM through all the regression tests; a user can't get PTY support to work on SCO Unix -- at least one LAM node bails before MPI_Init. Hmm. I can't tell if it's his setup or if LAM is actually doing something wrong. Hmm.

dell.com says that my new computer is estimated to be shipped on August 22. Yummy!

I ripped all my Yes and Led Zeppelin CDs today. I have many, many Yes CDs. I'm in the middle of Pink Floyd now. I installed the beta vorbis XMMS plugin, and it works like a champ. However, it takes up 100% of the CPU vs. single-digit% when playing MP3s. Hmm. Let's hope it gets better (it is still beta, after all).

My lynx, netscape, and LocalWords enhancements to the journal client continue to please me.

My DSL alarm light just flashed alarmingly. I think it's a signal for me to go to bed.

Forces of Nature

Spent some time with Brian and inilib today; fixed a bunch of things in the docs, but it looks pretty damn good. Grasshopper has learned much in his inilib time.

"When you have learned to snatch the error code from the trap frame, it will be time for you to leave."
- The ancient masters

Spent some more time with the SCO LAM user who's been having problems. One of the two problems can definitely be chalked up to UTFS; the other may also be (he's testing now, and has to install a new compiler). In which case, LAM may be in the clear for all the regression tests and eventual release!!

Looks like the trip to Berkeley is going to happen next week. So I'll likely be up in the Bend in the early part of the week, and go to CA from there.

August 21, 2000

Of Palm Pilots and Daisys

A productive weekend. Forgive typos; on a low-bandwidth link (minime doesn't seem to want to compile on Linux again... grr...)

My new computer unexpectedly showed up on Saturday (wasn't expecting it until mid this week or so; most likely after I went to CA). Wooo hoo!!! It's decked out to the gills (I can't resist the opportunity to list all its power features):

  • Pentium III/800mhz. 32k L1, 256K L2.

  • 256MB ECC/RDRAM.

  • 20GB disk.

  • 12x DVD drive (and windoze DVD software).

  • 8x/4x/32x CDRW drive.

  • 3 button mouse.

  • Altec Lansing THX speaker setup. This stuff is amazing -- an approximately 2'-per-side cube subwoofer and 4 speakers. We hooked it up to the VCR on Saturday to watch Episode one -- amazing sound!

  • 21" Trinitron monitor (19.8" viewable, .25-.26 dot pitch).

  • 32MB DDR nVidia GeForce2 GTS 4x AGP video card (I don't know what most of those letters mean, so I assume it can be directly translated from Ancient Hebrew to "fucking cool").

  • Windows 98 (I tried to keep a Windoze partition, but it completely barfed with my network card, so Windoze is gone gone gone... Linux!).

It's fast fast fast. However, I have noticed I/O constraints that are typical on Intel architectures. Oh well -- you can't have everything (where would you put it?). But with the speed of this machine, I'll likely do at least some local development rather than ssh-ing to nd.edu and doing everything from up there.

As practically obligatory, I went out and bought the Matrix DVD to test my DVD drive with. Hopefully, I'll get to test it later today (gotta find some Linux DVD software...).

Other things this weekend: did some "apartment" errands; got me a bookshelf, keyboard tray-thing for my desk, a 4 drawer filing cabinet. Tracy got me a warm fuzzy robe for my birthday (because I really liked the complimentary robe on our cruise); soon enough it will be cool enough to wear it around here. Might as well subscribe to the telecommuting lifestyle, eh?

I should point out that this new 'puter can rip/encode CDs like nobody's business (and that's what I've had it doing...).

August 22, 2000

No anchovies, please

Arun's tattoo has a drop shadow (saw it in person for the first time today). Way cool.

Quotes from the lab tonight:

"Yeah Yassir!"

"To the US Army!"

"He did some amazing work on wicker..."

"There's the 'public journal Arun' and the 'private Arun' that is much cruder and more disgusting..."

"Yeah, I thought I was going to have a life, but when I plugged it all into CorporateTime, it turns out that I can't swing it. But I've got a really cool room... so that's gotta count for something!"


Yeah, I think that about sums up the night fairly well.

That and we saw some fireworks at the end of the Mob quad to celebrate the beginning of the school year (and free food!); fireworks on campus -- a first for me.

August 25, 2000

The first rule of the LSC is...

Extremely interesting quote from the paper on small-world phenomenon:

Thus we see that minimizing the transmission rate of a network is not necessarily the same as minimizing its diameter... in addition to having short paths, a network should contain latent structural cues that can be used to guide a message towards a target.

I finished the paper today (ignoring all the complicated math stuff that went right over my head and into the wall behind me. I hope I don't get fined for the mark that it left).

CorporateTime may be nice, but it certainly has an interface that rivals that of a blind baboon's arrangement of a sock drawer. I can't tell you how many times I made incorrect appointments in ctime last night because it put pm when I was expecting am, or when it put am when I was expecting pm, or, even worse, when it put pm when I really meant pm, but I changed it to am on general principle (or vice versa). I'm guessing that the ctime interface designers were in the Southern hemisphere, where all this makes sense.

I should mention that I went to see Arun's room in Stanford Hall last week. I promised him that I'd put it in my journal, but thought better of it so as not to ruin the surprise for anyone who hasn't seen it yet. So all I'll say is: it's FABULOUS. If you haven't been yet, I strongly urge you to go see it. It's much better than Cats; I'll go see it, again and again.

I'm helping proof a book that Jeremy is writing -- spent much of the day doing that. Hats off to Jeremy for a great use of the word "esoterica". To celebrate, everyone should use the word "esoterica" in a sentence today. Together, we can form a secret personhood of politically-correct dictionaphobics who use big words just for the pure art of it.

Also started Dog doing some LAM development. Yet another reason why LAM will take over the world -- when you have programmers like Dog, who in their right mind will refuse?

August 26, 2000

Colored and mixed paper only

Interesting note that I discovered in pine yesterday and only correctly identified today... I'm on a few ezine lists, and have been for quite a while. Only yesterday did I actually scroll down to the bottom of one of the messages (past all the advertisements, etc.). At the bottom was a note that did not look like it was part of the letter -- indeed, it turns out to be a message from pine itself:

[ Note: This message contains email list management information ]

where the "email list management information" is a menu option. Selecting it brings up a pine screen explaining that the message contains meta information that can automatically unsubscribe you from the list... select here to unsubscribe. Not a difficult thing to implement, but I've just never seen pine be able to do this before, so it must be some kind of standard.

Indeed, it turns out that a line in the message's header triggers it (names changed to protect the guilty):

List-Unsubscribe:

And it seems that pine can handle more than just List-Unsubscribe -- there must be some set of approved tokens after List- that pine knows how to handle. Interesting random note.
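As far as I can tell, headers of this kind are the ones described in RFC 2369, which defines List-Help, List-Unsubscribe, List-Subscribe, List-Post, List-Owner, and List-Archive, each carrying one or more URLs in angle brackets. Whether that is exactly the set pine looks for is my assumption. Here's a small Python sketch of how a mail reader might pick them out; the function name and the sample addresses in the usage are made up.

```python
import re
from email import message_from_string

# Header names defined by RFC 2369.
LIST_HEADERS = (
    "List-Help",
    "List-Unsubscribe",
    "List-Subscribe",
    "List-Post",
    "List-Owner",
    "List-Archive",
)


def list_commands(raw_message):
    """Return {header name: [urls]} for each list-management header present."""
    msg = message_from_string(raw_message)
    found = {}
    for name in LIST_HEADERS:
        value = msg.get(name)
        if value is not None:
            # Each URL is enclosed in angle brackets; several may be listed.
            found[name] = re.findall(r"<([^>]*)>", value)
    return found


if __name__ == "__main__":
    sample = (
        "From: someone@example.com\n"
        "List-Unsubscribe: <mailto:list-off@example.com>\n"
        "Subject: hello\n"
        "\n"
        "body\n"
    )
    print(list_commands(sample))
```

A reader that finds any of these can offer the corresponding action as a menu item, which is more or less what the pine note appears to be doing.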


Lummy and I rented the Fight Club DVD last night so that he could see it ("The first rule of the LSC is that you do not talk about the LSC. The second rule of the LSC is that you do not talk about the LSC!"). The plan was to watch it on his new Vaio (I know... don't even bother mentioning it...). We got back to our hotel (Skanky, Inc.), but the DVD wouldn't play. With a little further investigation, we discovered that the DVD decoding software had not been loaded. Lummy's playing with it now (the Win2K CD was here at the office); we'll give it a whirl later.


I noticed today that Ace of Base's song Wave Wet Sand has some satellite-like noises in the background (not that I've ever actually heard a satellite making noises, but I've seen enough movies to know exactly what they sound like such that I can pick them out of a lineup without any hesitation. "Yes officer, #3 is the same exact sound from the KDP1138 from Enemy of the State"). Coincidence, or plot? Only higher volumes and sleep-induced learning will tell.


Back to proofing the GGCL stuff...

August 27, 2000

Possession of a stolen shovel

Just saw Harold and Maude on DVD with Lummy (on the Vaio, but got a bigger screen and real speakers for this). Trippy, yet interesting movie. It's somewhat of a mix of "get everything you can out of life" intermixed with a bunch of really funny suicide scenes (the fire one, I think, was my personal favorite). Definitely a black comedy if there ever was one. Where else can you see a Jaguahearse? Or a mother who wears a different wig every day?

I'd recommend it. It's a funny movie if you're in the right mood. There's a bunch of subtle things in there to keep you thinking, too. Overall thumbs up: I rate it as 5 minutes.

August 29, 2000

xor is not good encryption

It's been a while since I've done a journal entry, mostly because I was traveling all of yesterday. Woof. Let's see what has happened...


Spent most of the day down in the "lower Bay area" at Cleanscape -- the Attol people. Saw overviews of their products (which are pretty cool, IMHO) for testing software. It all started at SC99 when I saw their products/docs at their booth. Pretty cool stuff -- it would represent a fundamental change in the way that we do software in the LSC, but I really think that it would be a positive change, and allow us to write higher quality code.

Saw their presentations all day, met bunches of their people, etc., etc., and had lunch with them. In addition to the Attol line, we also briefly discussed their "qef" tool, which is a "make" replacement. It has a lot of the features in it that we have discussed in the context of the Software Carpentry stuff, but it has the disadvantage of being proprietary, and therefore not useful to us since we want to distribute source code (i.e., users would also need "qef" in order to compile our stuff). At present, it cannot "export" its build process, for example, to work on systems that do not have qef installed. Bummer.

Lummy had another meeting after this, and I went in search of a Fed Ex to send Jeremy the edits that I had made to the GGCL docs. After a good bit of searching (and I didn't even have a map!), I found a Kinko's with a Fed Ex drop, but the last pickup of the day had already happened (it was about 5pm by this point). So then I had to find the real Fed Ex place and then go pick up Lummy.

We chatted a bit more about the Attol stuff. He's somewhat against it, mainly for the reason that the test suites that it generates need the Attol run time systems in order to run. This stuff is proprietary, and distributed in binary form (e.g., libattol.a), and therefore we couldn't distribute it to anyone. Hence our test suites would only run for us, not for anyone else. The Cleanscape people were nebulous about "perhaps we can work out an agreement for distribution of the run time...", but neither of us has faith that that would actually be able to happen in a way suitable for freeware. Additionally, there's a pretty steep price tag. We should be able to afford it, but it's always a concern.

We got the latest/greatest version of the software from them, and will probably install it in nd.edu for Lummy and others to play with (Rich Lee and I played with it several months ago; we both liked it). We may also be able to make a "fake" Attol run time library that would be suitable for distribution -- stub out the necessary functions with little or no content in them. We'll see how it goes.

Needless to say, it's fairly obvious how I'm leaning -- I think these tools would be great for the LSC. It would get us out of the testing framework business, something that has occupied a lot of our time in the past. It also gives us cross-platform testing capability -- any flavor of unix [that is supported by Attol, which is just about all of them], and 'doze. Could be useful.

We got back to the lab around 7-8pm after unsuccessfully trying to find food in the South Bay area. We got High Tek Burritos instead -- I got the world-famous Godzilla High Tek Burrito. I highly recommend it to anyone coming to Berkeley.

Answered all the e-mail that had piled up during the day, and started on some issues with inilib that Brian raised. It got late, we got tired (I had done a lot of driving...), and we left before I finished.


This morning, I set to work on inilib again, and saw an email from Brian with a key insight to solving the current issue (having to do with the compiler complaining about non-const references to temporaries). Running with that, and with the ultra-cool C++ keyword mutable, I was able to fix things the Right Way. inilib is looking good. We have a code review scheduled for this Friday, but I think we're essentially done. Getting very close to release! I'll plug it into the jjc/Minime later today and see how it really shapes up.

Had several hours of BLD planning with Lummy, Eric Roman, Mike Welcome, and Paul Hargrove. More discussions/arguments/resolutions. Looks like neat stuff. Lummy and I are going to spend some time writing out a list of requirements from what we have figured out in our "round table brainstorming" sessions, and see if the process can move forward more formally after this.

A great word emerged from the planning sessions -- "flamework", which evidently means something like, "a framework that we're all arguing about."

September 1, 2000

All things being equal, LAM rocks

And version 1.0.6 of the MPI 2 C++ bindings has been released with extraordinarily little fanfare. See what's new (it's actually nothing very interesting :-). The test suite still hangs in MPICH, but they say that that's ok, 'cause neither they nor I can figure out why... Seems to be some kind of Heisenbug in MPICH itself (shrudder).


Took the red eye with Lummy last night. Got to Cincinnati at 6am. Got to South Bend around 9am. Came to the lab and have been here ever since.

I gave my talk on the generalized master/slave parallelism stuff at lunch. It seemed to go well, but I wish that I had had a blackboard or whiteboard to use. :-(

Had a code review with Arun w.r.t. LAM/gm. Arun seems to have some kind of medical condition in his thumbs that prevents him from hitting the spacebar -- for this, I forgive him for the enormous lack of white space in his code (making it squished together and hard to read -- but who am I to judge? Oh... wait. I'm his boss). We recompiled LAM and his test program with the Solaris compilers so that he can use bcheck to find some Random Badness (there's at least one write to unallocated in a simple MPI_INIT/MPI_FINALIZE program -- oops).

Spent the rest of the afternoon finishing up the MPI 2 C++ bindings so that it can be released so that Elliott can continue working on what Mike Shepherd started -- finishing the rest of the C++ bindings for the MPI-2 functions. So 1.0.6 has been released and I created a tag in CVS, so now I'll go commit all of Mike Shepherd's stuff. Woo hoo! (also have to re-import the C++ bindings to LAM/MPI... mmm... find stupid CVS manual for 3rd party imports... ggggrrrrrphhhh...)

Gonna go meet Lynzo and some other random bones for dinner after the pep rally. Go Irish, beat Aggies! (I have to admit, I'm not hopeful this year, but good ol' 87 Jabari Holloway is one of the captains -- if anyone can lead that team to victory, it is he).

September 3, 2000

I am serious. And stop calling me Shirley.

Notre Dame won yesterday vs. Texas A&M -- 24 to 10. It was our home opener, and it was hotter than two dogs... er... lying around after a big run (it was apparently 116 on the field).

Quote of the Day from Arun, when we briefly discussed yesterday's game after I came in today. I made some remark about how I got a little sunburn and showed him my ultra-cool watch band tan line (chicks dig it, just like chicks dig MPI). Arun replied, "It must suck to be genetically inferior that way."

Classic.

Saw many old friends this weekend -- had dinner with Schleggue (although he joined us late), Lynzo, Vern, Pam Tyner, and Rachel Canata at Macri's on Friday. We then went to Corby's and then Senior Bar. As is my moral obligation, I got Lynn nicely drunk on vodka tonics. Game day was fun; hung out with Ed and Suzanne a bit and then saw them later about 10-15 rows below us in the student section. We sat with Dog, Jeremy Siek, Katie, Mike Niemier, Brian Bussing and his fiance Dana Collins. A good time was had by all, and we all drank a lot of water ("it's not just water -- it's Notre Dame Water"). We saw our boy Jabari out on the field, and he looked good -- he made some good plays, had some good catches and blocks; he generally did us proud. After the game, I even saw him take a bit of a leadership role with the guys on the field, further confirming my previous journal entry that if this team is going anywhere this semester, Jabari is going to have a lot to do with it. For those who have never met Jabari, he's a great guy -- really nice, tries to study hard (I can't even imagine trying to get all my work done *and* have a hellish practice and travel schedule; it's hard to be an NCAA athlete at Notre Dame...), etc. Jabari rocks.

We went out to dinner last night at Outback and all of us had too much to eat (Mary and Pete Calizzi joined us, too). A good time was had by all. We saw Ruth Riley there with some of her family/friends/whatever, but we didn't bother her. We went back to Dog's place afterwards and watched the Matrix. Then everyone hit the road (no one was staying close). It was good to see them all again.

I'm here in the lab for an inilib code review (and I'm late, 'cause I'm typing this entry...) with Brian, so with a big shout out to all my homies out there, PEACE, OUT.
(BTW, we're listening to "The Moog Cookbook" here in the lab. Does life get any better than this?)

September 4, 2000

Wedding 2K

Spent the day continuing setting up my new machine; still haven't got X quite right because I can't get KDE working right. I'm working in plain old twm, and it's stifling. Ugh.

Did some more cleanup around the house (it's still a wreck from all the wedding presents), and finally watched our wedding video with Tracy (it actually came last week, but we were both traveling). There are some utter classic moments in there (funny how everyone else's wedding video is cheesy, but yours is fantastic...):

  • Renzo, while we're standing around before the ceremony: "You just give the signal, and we'll get you right outta here."

  • Fr. Hesburgh: "Jerry and Tracy..." (actually, I have to provide some context here -- Fr. Hesburgh was fantastic, and he recovered quite well from his little error)

  • Faller (off camera), "Hey Jeff -- seafood!" (the camera caught this whole scene quite well. Had to back it up and watch it a few times)

  • Dog: "We couldn't get that bastard Sepeta up here because he's hitting on their dates!" (pointing at Barker and Faller)

There was much Meghan in the video as well. It was funny, too, to notice that Patrick got just about all the face time in the ceremony, and Chris got just about all the face time during the reception.

Some other funny scenes as well; some classic dancing/reception footage. One that Tracy didn't even see right away (it's off to the side of the frame, and it happens very quickly) -- we had to back this up and watch it a few times. After the wedding party dance, I stole Diann away from Darrell, who is left standing on the dance floor, looking forlorn. Shipman notices this, runs over into Darrell's outstretched arms, and they start dancing. The look on Courtney's face and her resulting body language is absolutely classic. Renzo quickly steps in with Courtney, and the camera pans away. The whole thing takes about 3-4 seconds.

Gotta answer some LAM mail now, then go to sleep...

September 5, 2000

Miles of code before I sleep

I was updating my xmms RPMs today (for Mandrake), and noticed that they have an ogg vorbis xmms plugin RPM. I installed it and played some .ogg files with it. I was pleased to notice that my previous concern about the vorbis xmms plugin hogging the CPU while playing songs has been fixed (or they just compiled it better than I did); playing a 160+kbps ogg stream has the load hover around 0.05 (i.e., comparable to .mp3). Very nice; perhaps this vorbis stuff has promise!

Spent much of the day working on pending LAM issues:

  • Finally fixed the SCO user's problem. Turned out to be a race condition in the file descriptor passing code. Interesting that it never showed up on any other operating system; it may be a SCO-specific issue (the sender was sending three file descriptors and then closing the pipe; SCO apparently discards any unreceived messages when the sender closes, even if the receiver still has the pipe open). Who knows. Putting a simple sender-waits-for-an-ACK scheme fixed the problem. It's interesting to note how hard it was to find the problem, and how it was trivial to fix it once the exact problem had been determined. It was really hard to find the problem because my troubleshooting was limited to e-mail only; I do not have a SCO machine to test on, and the user's boss ixnay'ed the possibility of me getting a guest account to test with.

  • Found a real race condition in the LAM code to launch executables on remote nodes (at lamboot time, not at mpirun time). It is possible for output from remote nodes to be dropped before mpirun has a chance to see it if rsh exits too quickly. It's not immediately clear to me how to fix this problem... It seems to only have become evident with a few LAM users with the advent of faster processors and networks.

  • Fixed a minor issue with the --with-rsh logic in configure.in that a helpful user pointed out.

  • Added some much more user-friendly "there is no lamd running" messages (via the lam-helpfile) to all the LAM executables and to MPI_Init.
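
For the curious, here is a minimal sketch of the sender-waits-for-an-ACK idea from the SCO bullet above. It is Python rather than LAM's own code, it passes two pipe descriptors instead of three, and every name in it (send_fds_then_wait, recv_fds_then_ack, demo, the one-byte ACK) is made up for illustration; the real fix may well look different.

```python
import array
import os
import socket
import threading

ACK = b"\x06"  # hypothetical one-byte acknowledgement (ASCII ACK)


def send_fds_then_wait(sock, fds):
    """Pass descriptors over a Unix socket, then block until the peer ACKs."""
    payload = array.array("i", fds)
    sock.sendmsg([b"F"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, payload)])
    # The fix: do not return (and so do not close) until the receiver
    # confirms that it has actually picked the descriptors up.
    if sock.recv(1) != ACK:
        raise RuntimeError("receiver never acknowledged the descriptors")


def recv_fds_then_ack(sock, maxfds):
    """Receive up to maxfds descriptors, then tell the sender it may close."""
    fds = array.array("i")
    _, ancdata, _, _ = sock.recvmsg(1, socket.CMSG_LEN(maxfds * fds.itemsize))
    for level, kind, data in ancdata:
        if level == socket.SOL_SOCKET and kind == socket.SCM_RIGHTS:
            usable = len(data) - (len(data) % fds.itemsize)
            fds.frombytes(data[:usable])
    sock.sendall(ACK)
    return list(fds)


def demo():
    """Pass both ends of a pipe, then write and read through the copies."""
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()
    result = {}

    def receive():
        result["fds"] = recv_fds_then_ack(receiver, 2)

    worker = threading.Thread(target=receive)
    worker.start()
    send_fds_then_wait(sender, [read_end, write_end])
    sender.close()  # safe now: the receiver has already acknowledged
    worker.join()
    receiver.close()

    got_read, got_write = result["fds"]
    os.write(got_write, b"hello")
    data = os.read(got_read, 5)
    for fd in (read_end, write_end, got_read, got_write):
        os.close(fd)
    return data
```

The only line that really matters is the blocking recv in the sender. Without it, the sender is free to close while the descriptors are still in flight, which is harmless on most systems but fatal on one that (as SCO apparently does) throws away unreceived messages at close time.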

Released 6.3.3b32 with these changes. Pending issues:

  • The race condition with rsh.

  • The MPI 2 C++ bindings still seem to be broken under some conditions (e.g., when using --without-fc). @#$%#@$%#@$%#@!!!!!

  • An IRIX user is complaining about some socket issue at mpirun time. I've pinged him to try the 6.3.3 beta, but I doubt that this will fix his problems. We'll have to see how this one pans out.


My 800mhz machine is fast (provided that it's only doing one thing at a time -- it is still an Intel box, after all...). Times expressed in min:sec:

                                        800mhz machine   Ultra 30 (athos)
Run autoconf and friends for LAM/MPI:   0:07             0:23
Run configure for LAM/MPI:              0:32             1:22
Full build of LAM/MPI:                  3:20             12:56

I did the build on athos, which is admittedly not the fastest machine (not only is it only 300mhz, it has limited memory; I should have used a hydra, I suppose, which would have been half the mhz of the intel machine and had a lot more RAM). But the build was about 4x faster (again, with the big caveat that the machine is doing little else at the time).
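
Since bc was not handy, here is a small sketch that does the arithmetic on the timings above. The helper names (to_seconds, speedup) are made up for this example; the numbers are the ones in the table.

```python
def to_seconds(stamp):
    """Convert a "min:sec" string such as "12:56" into seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def speedup(fast, slow):
    """Return how many times shorter the fast time is than the slow one."""
    return to_seconds(slow) / to_seconds(fast)


# (800mhz machine, athos) pairs copied from the table above
timings = {
    "autoconf and friends": ("0:07", "0:23"),
    "configure": ("0:32", "1:22"),
    "full build": ("3:20", "12:56"),
}

for step, (intel, athos) in timings.items():
    print(f"{step}: {speedup(intel, athos):.2f}x")
```

The full build is 776 seconds against 200, or roughly 3.9x, which is where the "about 4x" comes from. The two shorter steps come out lower, at roughly 3.3x and 2.6x.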

But these figures certainly do inspire me to do some development locally rather than remotely to nd.edu. Happiness all around!

September 6, 2000

T.P.R. Report / Initech

When I drove down here a few days ago, I noticed some water dripping behind my glove compartment. We didn't go out and have a good look at it until today. We picked up the floor mat (which was good and wet still), and it was soaked underneath with a healthy chunk of mildew growing on my bottom carpet.

Bonk.

I have an appointment on Friday morning to take the car in and have whatever it is that is broken fixed (I am a code wizard, not a car wizard).

Finally got my IO streams book from Amazon today -- I accidentally put the wrong apartment number on the "ship to" address, and UPS got really confused. I called yesterday and they re-shipped it again to the right address (no charge, whoo hoo!). Got the Office Space CD, too. Yummy (already ripped into MP3s, and I'm listening to them right now...).

ROMIO and MPICH released new versions today. Luckily, the new ROMIO is just about the same as the old one (configure/build-wise), so since I had the foresight to document what I did last time, I mainly followed the same steps and ROMIO seems to be integrated into LAM/MPI just fine.

\begin{bitch}

CVS third party importing sucks, for multiple reasons:

  • It does not record which files have disappeared or moved from release to release. That is, the initial import is fine. But when you import a new version over the old one, you would think that it would just snapshot the new one and keep the old one as just history. i.e., files that existed in the first version but do not exist in the second version should not show up upon checkouts. Not so.

    For example, in the MPI 2 C++ bindings, we moved a bunch of header files from one directory to another. I did the 3rd party import in CVS of the new version, and then updated my local copy of LAM. Suddenly I had 2 copies of all the header files -- one in the old location, and one in the new location. Other than cvs remove'ing each old .h file, I didn't see any way to correct the situation. So I just blew away the old 3rd party imports (well, actually, I just moved them... never delete!!), and imported the C++ bindings as if it was their first import.

  • If you third party import a distribution tarball that uses automake, plan to be hosed. It screws up all the timestamps such that it tries to invoke automake and friends when you ./configure/make it. And since it's a distribution tarball, you don't have things like acconfig.h, so autoheader will fail. And it goes downhill from there.

    The only solution that I found was to do a massive touch of all the files in the third party source directory tree such that every file in the tree has the same timestamp. Icky. Horrible. Shrudder. But it works.

    But we shouldn't need something like this -- I'm open to better solutions (perhaps just including the tarball itself...? Hmmmm...!)

\end{bitch}
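
The "massive touch" workaround described above can be sketched in a few lines. This is a Python stand-in for a recursive touch, not the command that was actually run; touch_tree and its stamp parameter are names invented for the example.

```python
import os
import time


def touch_tree(root, stamp=None):
    """Give every regular file under root the same access and modification time.

    Returns the number of files touched. Symbolic links are skipped so that
    a dangling link does not abort the walk.
    """
    if stamp is None:
        stamp = time.time()
    touched = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            os.utime(path, (stamp, stamp))
            touched += 1
    return touched
```

With identical timestamps, make has no reason to think that any generated file is older than its source, so it should stop trying to re-run automake and friends. Calling touch_tree on the imported source directory right after a checkout would be the assumed usage.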

Did a bunch of LAM work today, but I might have just found a new issue under Solaris. It seems that mpirun is hanging. Ugh!!! Was it something that I did in the extra synchronization that I added for SCO?

Miles to code before I sleep...

September 8, 2000

Hey Pac Man, what's up?

I noticed today that Mandrake is shipping Netscape 4.75 with full 128 bit encryption through its normal "update" channels.

Pretty cool -- you no longer have to go through hoops and hurdles to get a fully-secured (hah!) Netscape with all the kewl plugins and whatnot (i.e., in RPM form).

September 10, 2000

Singing backup chicken

Ahh... the Nebraska game.

It was an amazing game. I really did not expect that ND would play so well -- we were ranked 23/25 (according to what poll you looked at), yet we stayed head to head with Nebraska (#1) for nearly the whole game. Our offense was a little off, but then again, Nebraska has a great offense. We had 2 amazing runbacks (one from a punt, the other from a kickoff) for 14 of our 21 points. At the end of regulation play, we were tied at 21.

We lost in overtime; we got a field goal, they got a touchdown (sadly, overtime has never been good to us). So we lost by 3 points. But it's a helluva lot better than the spread -- 13.5 points. It was a fantastic game. Tracy, Jim, Anna, and I were watching it in a local Damon's (sports bar). When Nebraska finally won, a few Nebraska fans started cheering loudly. I turned to them and said, "You just beat #25." That shut them up immediately.

So even though we lost, I can only picture it as a win. They won by a fluke (and a really, really fast quarterback); it really could have gone either way (and yes I would have been saying that if we had won, too). And they didn't play down to us; we played right on par with what the news media calls the #1 team in the nation.

We must go up in the polls for this (it doesn't look like they've been updated yet). It would be nice to see Nebraska go down, but I don't know if that will happen (FSU, #2, barely won against Georgia Tech -- who isn't even ranked -- yesterday; it looks like Georgia Tech had a pretty amazing game as well). Michigan (#3) had a pretty convincing win over Rice, so maybe...? Who knows. I've become convinced over the years that the two sports polls are based on a random function, anyway.


On a lighter side, Tracy, Jim, Anna and I went to a restaurant (can't remember the name...) after seeing "The Cell" (which I give about 2 minutes; it was... ok, but not good or great). We caught the tail end of the University of Louisville vs. Grambling football game. I've never seen comedy in football before, but this was definitely it. UL won the game 52 to nothing, and the score said it all. The Grambling players really looked like they were trying hard, but their attempts were just comical. I can only imagine that they don't get a lot of funding, or perhaps their coaching and practices are terrible, or... I have no idea. But it was the funniest thing that I've seen in quite a while. UL just stomped all over them (and I'm not even a UL fan!).

September 11, 2000

Do elephants sweat?

Wooooo-eeeeee.... the paper is up to 25 pages now (and I haven't written the majority of section 7 yet!). I spent the entire day revising it. Properly designing a software system is a lot of work. But (like I've said countless times before), it's cool stuff. There are some really delicious issues and problems that you would never expect from a plain ol' manager/worker problem. I think I've got one more major revision before I unleash it to some others to read.

Had some more interaction with the guys who are having rsh/lamboot issues. Seems like rsh is not the problem after all. It may be faulty handling of stderr/stdout processing. The guy was running some simulations on his cluster; he said that he would try some new code of mine when that finished. We'll see if we can finally solve this problem.


Blockbuster sucks.

They sent a threatening letter to me at my parents' house in Philadelphia claiming that I had not returned the Fight Club DVD to the Berkeley Blockbuster store for almost two weeks. They happened to mention that the matter had already been turned over to a collection agency. Great.

I checked with Lummy and he definitely remembers returning it (we rented 3 DVDs; Fight Club had to be back in 1.5 days, the others were 5 day rentals). We returned Fight Club before it was due, and watched the other 2 later. I was with Lummy on one of the "return to Blockbuster" trips, but not the other, and I couldn't remember which was which.

Anyway, I called the Berkeley store and told them that I was absolutely positive that I had returned the DVD. The guy looked it up in the computer and said, "Oh yeah... we found it on the shelf later." Over 2 weeks later, apparently!!

So I was about to be fined and have a big bad black spot put on my credit record because of some clerk kid's stupid mistake in Berkeley. Blockbuster was about to fine me without even checking with me (the letter claimed that Blockbuster tried to call and snail mail me, but I never got any messages or snail mail). What the heck is that all about? And then they send the final notice to somewhere that I haven't lived for well over a decade.

The whole thing kinda pisses me off. I don't know how excited I'll be to go back to a Blockbuster.


Ahhh.... screw it. Miles to code before I sleep.

Mmmmm... code..... mmm.....

September 12, 2000

The cockpit? What is it?

I received 2 packages today -- how exciting! God, Internet shopping is great.

  • The first package was from Amazon, and it contained all the CD's that I ordered (I finally have all the CD's for the MP3s that I own -- some of which I have been looking for for quite some time. See yesterday's journal entry about the word "soundtrack" in internet music search engines... grr...): MI-2 soundtrack, Chemical Brothers/Surrender, Groove soundtrack, Go soundtrack, Fight Club soundtrack, Various Artists for the Masses. The ones that weren't already ripped are finishing MP3 encoding right now...

    I'm listening to the Groove soundtrack. Sounds like hip stuff. Nothing earth shattering so far, but it's good coding background music. It's really heavy on the bass (even on my mondo sub-woofer's minimum setting!), so I can't turn it up very much because I live on the second floor of an apartment building. Since I like to have semi-loud music on while I'm coding/working, does this justify my saying "I need a house to support my coding style"?

  • The book Advanced Programming in the Unix Environment by W. Richard Stevens. It came highly recommended by fellow Llama Nick. This book has everything -- would that I had known of its existence before! It could have saved me much exploration and experimentation with pseudo-ttys, various IPC mechanisms, passing file descriptors, random issues with SIGABRT, and other sundry bits of Unix system-level things. <sigh> I was glad to see that I had gotten 5 of the 6 guidelines for daemon processes in Minime, though (I didn't set minimed's umask to 0 -- oops. I was very careful about every file that it opened, but setting the umask would be better).

    I can't remember where I ordered this book from; I found it on www.bestbookbuys.com. I highly recommend this URL for anyone who is buying books off the web -- it saved me somewhere between $10-20 on this book.
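
Back on the umask guideline from the Stevens book above: a daemon inherits the umask of whatever started it, and that mask silently strips permission bits from every file the daemon creates. The sketch below only demonstrates that effect. It is Python rather than Minime's C++, and mode_created_under is a name invented for the example.

```python
import os
import stat
import tempfile


def mode_created_under(mask, requested=0o666):
    """Create a scratch file under the given umask; return its permission bits."""
    old_mask = os.umask(mask)
    try:
        with tempfile.TemporaryDirectory() as scratch:
            path = os.path.join(scratch, "probe")
            # The kernel applies (requested & ~mask) at creation time.
            fd = os.open(path, os.O_CREAT | os.O_WRONLY, requested)
            os.close(fd)
            return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        # Always restore the caller's umask; it is process-wide state.
        os.umask(old_mask)
```

Asking for 0o666 under a umask of 0o077 yields 0o600, while under a umask of 0 the file gets exactly the mode the code asked for. Clearing the mask once at startup means the permissions a daemon requests are the permissions it gets, instead of depending on who launched it.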

Speaking of handy URLs, someone pointed out http://www.amazing-bargains.com/ to me the other day, particularly their section about buy.com. They always list some good deals for buy.com, like coupons for "$10 off any order of $50 or more" and whatnot. I wish that I had known about that a month or two ago -- I bought a PCMCIA network card from them. Ah well -- next time.

Still working on the paper. The text portion of the first half of the paper still heavily reflected that I originally wrote this as a list of bullets, and is requiring much re-writing. The second half was mostly ok 'cause I had already re-written much of it. :-)

Happy, happy, joy, joy...

I think we finally fixed the race condition in booting LAM. Many thanks to some helpful LAM users and their patience for helping slog through this obscure issue. We've got a few more tests to run to ensure that it's done, and I sent the new code out to the Debian user who initially reported the bug, but I think that I finally understand what the problem was, and how I fixed it.

I found a new <blockquote> attribute the other day -- type=cite -- that looks really cool in netscape (be sure to check this journal entry out on the web). Doesn't appear to do much over normal <blockquote> in lynx. I wonder what it will do in pine.

Blockquote type eek cite
How I love thee, let me count
the ways, 1, 2, 3...

Rich Murphy points out that if I had listened to his wisdom, I would have owned Advanced Programming in the Unix Environment long ago. Oops.

Rich Murphy, wise man
Woe is he who ignores Right
Yea, a life of pain

I like strawberries.

Strawberry red car
My sugar momma wants one
Hell, she can buy one

The new Chemical Brothers CD that I bought, Surrender, simply rocks. I highly recommend it to others.

Chemical Brothers
Their rocktitude humbles me
True block rockin' beats

Lights! Camera! Act... shit. Call makeup.

I really can't type. I'd like to correct some typos in the last journal entry (the humor value was probably lost because of the mistakes. Sigh)...

Rich Murphy, wise man
Woe is he who ignores Rich
Yea, a life of pain

This was a minor mistake, but the english major in me cringed when I saw it (what does that say that an english degree provides an inner sense of Badness about a Japanese form of poetry?):

Strawberry red car
My sugar momma likes them
Hell, she can buy one

September 14, 2000

Mysteries of the milkshake

Exciting changes today...

Darrell called me with the joyous news that PacBell finally hooked up his DSL today (it only took 2 months. The most comical part of the saga was, after 1.5 months, after 2 house calls from PacBell technicians, Darrell got a call saying, "We finally figured out what the problem with your DSL line is. Your local Bell office doesn't support DSL.")

I spent about an hour or two with Darrell setting up our DNS servers. Darrell already had experience with this, so most of the pain and learning curve was avoided. Seemed pretty straightforward afterwards, but took a little understanding to get there. So Darrell and I are now secondary DNS servers for each other (kresge.com and squyres.com). We did some testing and it all seems to be working. Pretty cool stuff.

Darrell's with NSI (the evil empire), and he submitted his DNS change to them earlier today. They supposedly updated at 5pm EST, but as of now (12:23am EST the following day), nd.edu machines still don't see the change.

I'm with register.com, and it took a little explaining to them exactly what I wanted to do (had to do it on the phone). Turned out that it was their silly web interface that confused me, and we submitted my DNS change as well. They supposedly update tomorrow morning. Indeed, nd.edu machines don't see the change yet, but when I'm on my machines, "whois squyres.com" shows all the new stuff. Cool!

I've already added a few names to squyres.com -- introducing the new, improved JeffJournal! When the DNS change propagates out to the world, the JeffJournal archives will be located at the following URL:

http://jeff.squyres.com/journal/

If that isn't vain, I don't know what is. But hey, I only do it... because I can.

Had to do some screwing with my apache settings to get the virtual hosting stuff working with www.squyres.com, wedding.squyres.com, and www.fhffl.com. Learned some things about how to get Apache really confused today. Could be useful someday.

Arf -- just got a bounced message from nd.edu from an automated message on wedding.squyres.com. It seems that I had router.squyres.com as the first entry for that machine in /etc/hosts, which doesn't exist in DNS. Oops. Fixed.


On other fronts, I continued to be distracted by getting motivated to figure out what the numbered ports that showed up in netstat -a were on my 2 machines. Turns out that most of them had to do with NFS (which I used between my router and my desktop so that I can serve my MP3s from the big disk on my router to the xmms on my desktop).

I got further inspired to ditch NFS because I thought of a truly cool way to serve up my MP3s without NFS -- using http and the streaming capabilities of xmms (I already have a web server running, so...). I wrote up a minimalistic PHP script that allows me to navigate the directories and files in my MP3 directory tree. Clicking any of them invokes a PHP thingy to generate an .m3u MP3 playlist file on the fly, and send it to xmms. With the directory-browsing aspects of the scripty-foo, I can queue up multiple levels of MP3s:

  • My entire MP3 tree (and click the "random" button on xmms for [probably] weeks of no-repeat play!)
  • The directory for any artist (which contains all their CDs)
  • The directory for any CD
  • An individual file

Actually, I could have just said "I can enqueue any tree of files, to include the special case of a tree of one file."
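
The "tree of files, including a tree of one file" idea can be sketched roughly like this. It is Python, not the PHP script itself, and playlist_for, mp3_root, and base_url are all names made up for the example. It assumes the web server exposes the MP3 directory under base_url.

```python
import os
from urllib.parse import quote


def playlist_for(mp3_root, relative_path, base_url):
    """Return .m3u text: one URL per .mp3 at or below relative_path."""
    start = os.path.join(mp3_root, relative_path)
    if os.path.isfile(start):
        # The special case: a tree of exactly one file.
        found = [start]
    else:
        found = []
        for dirpath, dirnames, filenames in os.walk(start):
            dirnames.sort()  # walk artists and CDs in a stable order
            for name in sorted(filenames):
                found.append(os.path.join(dirpath, name))

    lines = []
    for path in found:
        if path.lower().endswith(".mp3"):
            rel = os.path.relpath(path, mp3_root).replace(os.sep, "/")
            lines.append(base_url.rstrip("/") + "/" + quote(rel))
    return "".join(line + "\n" for line in lines)
```

An .m3u file in its simplest form is just a list of locations, one per line, so the whole job is a directory walk plus URL quoting. Anything facing the outside world would also want to reject relative paths containing "..", which this sketch does not do.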

It was surprisingly easy. It's truly cool. I may someday be inspired to make it a bit more aesthetic and have more options... but why?

xmms stops just short of offering a full set of remote controls from the command line (I had to add an appropriate application handler for .m3u in netscape to call xmms), but I guess it's sufficient.


Ok, back to work now... the paper is really almost finished. I was halfway through the last code review when Darrell called me today...

(BTW, the jeffjournal client is fantastic -- it just informed me that I left a <CENTER> unclosed from line 37 before I mistakenly submitted it, causing all kinds of formatting madness, and potentially threatening the world's existence. We are pleased.)

September 15, 2000

How to Succeed In Coding Without Really Trying

nd.edu finally joins the rest of the masses in recognizing my new DNS server. Welcome to the new and improved JeffJournal! For all of you out there who bookmarked the JeffJournal in your web browser, it has now moved:

http://jeff.squyres.com/journal/

And remember... I only do it because I can.


Had to re-rip some CD's 'cause their MP3s seemed to be a bit skewed. Sometimes they cut off right in the middle of a song or something like that. I attribute this to when I was ripping CDs on my laptop, which has limited disk space. Turns out that when grip runs out of disk space, it just merrily stops the current song and goes on to the next with no indication of warning. Hence, I believe that some percentage of my MP3s are flawed, so I think I'll have to re-rip some of them over the next few months.

I finally finished a first copy of the manager/worker paper yesterday. There really are some delicious complications in the whole aspect of Things that make it fun. I even wrote the whole paper without writing a single line of code -- it's 100% pure design. There's a good chance that I'll use that paper as a guideline to write a parallel vorbis encoder. Gotta practice what I preach, after all. And it can only make the paper better.

I missed an MPI talk at ND yesterday. Bonk. It sounded like it would have been interesting. :-(

Tracy and I won't be going up to the Purdue game this weekend; her travel schedule was too much this week. Oh well. :-\ Hopefully, the boys will rally with the loss of Arnaz and Irons and the Irish will still prevail.

I'm noticing that my bandwidth between my desktop and my router is really crappy -- I'm just copying over the MP3s that I ripped on my desktop and only getting anywhere between 47 and 69 kB/s. Ick. I see the collision light coming on on my hub a lot; seems like this may be causing too much binary backoff. Might be time to invest $50 in a switch...

Spent some time on LAM yesterday. I noticed an annoying security issue yesterday, and spent some time hacking around in the lamd and the rest of the user-level LAM libraries ensuring that all internal files that LAM uses are opened with "other" and "group" permissions zeroed out. And then it turns out that Solaris doesn't like to abide by the umask when it opens named sockets. Ugh. So I had to go the ssh route and move all the LAM sockets and temporary internal files into their own directory (which does abide by the umask) to guarantee security. Ugh.
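
A rough sketch of the private-directory trick, for illustration only. It is Python rather than LAM's own code, and make_private_dir, bind_socket_in, and is_owner_only are hypothetical names. The point is that the directory's permissions guard everything inside it, so it no longer matters what mode the socket itself ends up with.

```python
import os
import socket
import stat


def make_private_dir(path):
    """Create a directory that only its owner can enter."""
    os.mkdir(path, 0o700)
    # The umask can only strip bits from mkdir's mode; chmod pins it exactly.
    os.chmod(path, 0o700)
    return path


def bind_socket_in(private_dir, name):
    """Bind a Unix-domain socket whose path lives inside private_dir."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(os.path.join(private_dir, name))
    return sock


def is_owner_only(path):
    """True if neither group nor other has any permission bits set."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0
```

Other users cannot traverse a 0700 directory, so they cannot reach a socket or temporary file inside it regardless of that file's own permission bits.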

That's all for now; more news from Washington as our reporters check in.

September 16, 2000

Do you Yahoo?

A good day. We beat Purdue with a last second field goal to make the score 23-21 in favor of the Good Guys. We watched the game at the local BW-3's, and met some subway alums there. I guess I haven't really watched too many games away from South Bend (where most everyone is an ND fan), and I haven't really met/talked to too many subway alums. They're interesting folk -- no ties to ND, but are completely rabid about ND and its football program. The people that we met were really nice and we had a good time with them. I'm sure that we'll see those folks again, as well as other subway alums here in Louisville (the NBC affiliate down here broadcasts SEC games, not ND games, hence we have to go to sports bars to see the game).

There were some Purdue folks in the bar, too, and they were dumbfounded when the field goal actually went in (to be fair, we were too :-). By the numbers, we probably should have lost that game -- I don't know for a fact, but I'd be willing to bet that Purdue has us beat in just about every stat. Our guys played well, but we lost two key players (QB on offense, and ?DB&? on defense), so both squads were critically short. The new QB stepped up pretty well, but it was his first college game and he made a few mistakes. Still, he did pretty well and I certainly don't fault him for anything. At the end of the day, he delivered, and we won the game. He's got lots of time to improve, and I'm certainly pleased with what he did today. Good job, Greg. Looks like the students were pretty pleased at the end of the game; they were all over the field in and around the players. Rock on.

So we'll see what happens in the polls tomorrow. Purdue was 11 or 12 or something, and we were 21 or something, and I think we'll both be 2-1. We'll see.

We went to dinner with Janna (Jim+Anna) again, which was fun. New microbrew here in town. Not bad beer, but a little too sweet for me. Good conversation, and much fun was had. Janna has a satellite dish, and next week's game is on PPV, so we'll be heading over to their place to watch it. Hmm... actually, checking the network schedules, it looks like it's on ABC. That would make it a bit more convenient...


I finished my paper the other day (I think that I mentioned this in a previous journal entry), and posted it to the vorbis-dev list yesterday, too, just for the heck of it. Finally got a response from someone today who said that it was good stuff. Good to hear, but they didn't have any ideas, suggestions, or comments. Oh well.


Since my computer has been idle most of the day, I started running the distributed.net stuff. It appears that they're focusing on the OGR project. I don't really know what it is, but judging from the stats graph, most of the keyspace has already been exhausted. It's really slow. Since I started the client last night around 11:30pm on my 800MHz machine, it's only done about 4.3 OGR packets. Wow.


I haven't been running bind for 72 hours yet, and they just released a new version. Apparently bind 9.0.0 has been released. I'm a lazy bastard -- I'll wait for the Mandrake RPM. :-)

September 17, 2000

Your spleen and you: do you have a good relationship?

Not much to report today. Spent a little time upgrading my PGP tie-ins to pine, so that it actually does things correctly (been meaning to do this for quite a while, actually). It will decrypt multi-part messages, messages that are signed, or messages that have additional content besides just encryption. Happiness.

Did some more organizing of my finances and finally got my credit card statement to balance with what is on my bank's web page. Woo hoo!

Signed up for a better AT&T plan today. The service is exactly the same; it's just an arbitrarily complicated pricing scheme to make plans seem different. It's amusing, though: 4 of AT&T's big plans (and don't consider these descriptions legally binding -- go to AT&T's web site for full descriptions) are:

  • $0.10/minute, any time of day. This is apparently what plan we were on.

  • $0.05/minute from 7pm-7am, $0.09/minute from 7am-7pm ($5/month minimum).

  • $0.05/minute, any time of day, with a $7.95/month additional charge.

  • $0.07/minute, any time of day, with a $4.95/month additional charge.

The interesting thing is that AT&T marketing makes it sound like they have actually calculated the mathematical derivative for each plan. For example, it says, "You should use this plan if you are spending over $x.xx a month, or if you are spending over $y.yy, you should use this plan..."

But here's the kicker (as I'm sure all good, thinking people out there noticed): spending $x.xx on which plan?!? I hate marketing dweebs. Do people actually fall for this stuff?

Anyway, we did the math (i.e., compared the plans over our last 3 phone bills), and signed up for the $0.05/minute any time plan. Indeed, 2 months ago, this would have saved close to $40 on our bill. Yikes! (Granted, there were some pretty long wedding planning phone calls, but still...)
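For the curious, "doing the math" on the four plans above looks something like the following. It works in whole cents to dodge floating-point fuzz. Treating the "$5/month minimum" as a minimum monthly charge is my reading of the plan, and the 600 daytime / 300 nighttime minutes are invented numbers, not our actual bill.

```python
def monthly_costs_cents(day_minutes, night_minutes):
    """Monthly cost of each plan, in cents, for the given usage.

    Day means 7am-7pm and night means 7pm-7am, as in the plan list.
    """
    total = day_minutes + night_minutes
    return {
        "10c flat": 10 * total,
        # Assumption: the $5 minimum is a floor on the monthly charge.
        "5c night / 9c day": max(500, 5 * night_minutes + 9 * day_minutes),
        "5c + $7.95/month": 5 * total + 795,
        "7c + $4.95/month": 7 * total + 495,
    }


def cheapest_plan(day_minutes, night_minutes):
    """Name of the plan with the lowest cost for the given usage."""
    costs = monthly_costs_cents(day_minutes, night_minutes)
    return min(costs, key=costs.get)


if __name__ == "__main__":
    # Hypothetical month: 600 daytime minutes, 300 nighttime minutes.
    for plan, cents in monthly_costs_cents(600, 300).items():
        print(f"{plan:20s} ${cents / 100:7.2f}")
    print("cheapest:", cheapest_plan(600, 300))
```

The break-even points fall out of the same arithmetic: the $0.05 plan with the $7.95 charge matches the flat $0.10 plan at 159 minutes a month (15.90 either way), and beats it for anything above that.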

September 18, 2000

The Art of Barbering

When was the last time you were in a barber shop?

I just got a haircut today in a local Louisville barber shop. I have a long-standing theory that you can tell a lot about a town from their barber shops.

Barber shops are a mostly male-oriented club. True, you'll see mothers in barber shops to bring their sons in for haircuts, and you'll even see the not-too-uncommon female barber (indeed, the barber shop where I went in South Bend had one male barber -- the owner -- and two female barbers). I guess it would be more correct to say that the clientele is almost entirely male.

Humorous anecdote: I went to my typical barber in South Bend a few days before my wedding to get a trim. The woman asked me if I wanted my normal military high-and-tight cut. I told her no, I was getting married in a few days and my bride-to-be told me that she wanted "some hair on my head lest flashbulbs reflect off my head and ruin all the pictures." An older guy was getting his hair cut down the row from me. In a low, grisly voice, he said, "You're getting married? Come over here, boy, we gotta talk."

The conversations that flow around barber shops tend to reflect the popular attitudes of the area. Here in Kentucky, I hear about tobacco crops (they actually have pro-tobacco ads on TV here), the military, and University of Louisville and University of Kentucky football.

Frank's barber shop on the campus of Notre Dame is filled with ND memorabilia. Frank loves to hear about student perceptions on campus, football, the band, ROTC, or any other ND-related or military-related topic (he was in the military himself, in younger years).

At the Ft. Knox barber shops, the talk is actually fairly sparse. There's some chatter, but mostly people are there because they have to be there (regulation haircuts and all); it's part of the job. But there are some retired folk who still come on base for haircuts and the gossip with the barbers and soldiers.

The barber shops that I used to visit back outside of Philadelphia are much the same. Typically 40-60 year old male barbers who have the look and feel of someone who has seen and done everything. The ability to strike up a conversation about any random topic. Sports are common, the military is another. Politics, of course (especially with this being an election year), is a big topic as well.

My conclusion is that the barber shop is a social island in the midst of hustling and bustling metropolises. The pace tends to be a little bit slower there than the rest of life. Granted, South Bend and Louisville aren't huge cities, and neither are the suburbs outside Philadelphia where I would get my hair cut. There's typically some kind of talk going on about something, and -- especially in a small barber shop -- the barber knows many of the patrons by name and how they usually want their hair cut.

Indeed, I've asked most of my barbers why they chose to cut hair for a living. Most of them laugh and make some kind of remark about the never-ending demand (how often have you ever walked in to a barber shop and been seated immediately?), but then they have all said that it's for the people. Many had careers before becoming barbers, but left them for one reason or another and became barbers because of the wide variety of people that they would meet. Hence, they're using barbering as a vehicle -- it's not for love of cutting hair, for example -- to see a sample of the world that we live in. The local barber probably has a pretty good feel for the community around him/her -- probably more so than most. Indeed, the Art of Barbering (as I call it) seems to have little to do with cutting hair. It seems to be very similar to salesmanship, or bartending. Some people are good at it -- naturally easy to talk to, good listeners (yet still expressing their opinions in order to keep the conversation going), etc., etc.

This is hardly a startling conclusion by any stretch of the imagination. But I sometimes wonder what a long-term study of barber shops, their clientele, and the conversations that occur there would show. Who knows -- it might even be worth some kind of degree in Sociology or something. :-) But the barber shop is something that many of us take for granted and rarely notice. It's just something that you have to do once a month or so.


There was no point to any of the above. I'm just pointing out something that most of us take for granted, and that we rarely notice. No real reason.

...but at the same time, has anyone else ever noticed this?

The Art of Barbering Too

Followups for the Art of Barbering. Any other comments are welcome:


From Rich:

Absolutely true! In San Diego (I believe America's 6th largest city), the barber shops are remarkably similar to South Bend, or anywhere else I've been. (Ask Jason about Vitos... the cops... etc.)

There's just something about going to a place where they do your side burns and the back of your neck with hot shaving cream and a straight edged razor. (To me, there's something particularly Arun-esque about this line of conversation.)


From Arun:

Interesting comments, I hadn't really thought about it, but thinking back it must be quite interesting. I imagine the barber shops/beauty salons of Las Vegas Hotels must be especially interesting. I got my hair cut at one and in the short time I was there there were 3 wedding parties passing through in one stage or another.

This raises an interesting point -- are there [at least] two fundamental kinds of barbers? Those who have a handle on the local community and those whose community is mainly composed of transients (e.g., tourists)? And of the second type (I have to admit, I don't think that I've met any of that type):

  • Why did they get into barbering? The same reasons?

  • What do they yield from the Art of Barbering? It certainly isn't a feel for the local community -- there isn't one. What do they get a feel for? What are the conversations in their shops like?

And in this case, I suppose the Art of Barbering can be abstracted to a higher level, such as those who primarily interact with tourists (but then again, Vegas is truly unique!). For example, what are the differences between clientele of the T.G.I. Friday's in South Bend vs. the clientele of the T.G.I. Friday's in Vegas?


Again, this has no point. Just idle wonderings of someone waiting for X latency between squyres.com and nd.edu...

September 19, 2000

Goulash or spackle: you decide

My car looks fantastic!
I had to take it in to be detailed to get rid of the mildew smell from when my AC self-imploded (read: the output valves got clogged and all the water ran off into my front passenger footroom. Eeewww!!). I took the car in this morning, and when I went to pick it up, I was amazed: the car looks 5 years younger. They vacuumed and shampooed everything, and used the make-the-plastic-look-new stuff. They buffed and shined, and gave my car a complete exterior car wash. It looks amazing.

I could see people that I drove by gaping at my car, then touching their nose, pointing to my car and saying to their neighbor, "You see? That's what a 1993 Honda Civic is supposed to look like."


Spent too much time on LAM/MPI today. But I resolved some important bugs:

  • We finally got confirmation that we fixed the lamboot race condition. Hurray for the good guys!

  • I found a bug in the lamd today such that any new process that it forked (e.g., via mpirun) would inherit all the file descriptors of the named unix socket client connections that the lamd had open. Oops. The spawn code now closes everything except stdin, stdout, and stderr (which it replaces with whatever mpirun/lamexec gives it, anyway).

  • I made the show_help() function a bit more robust in that it will try harder (and smarter) to find the helpfile. It will even display a specific error if it finds the helpfile but can't open it (e.g., if the process is out of file descriptors). Indeed, we now save errno properly so that when we use the %perror or %errno tokens in the LAM helpfile, it will display the correct errno, not just the last one.
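The file descriptor fix in the second bullet boils down to "after fork, close everything above stderr before running the user's program." Here's a Python sketch of that idea. It is not the lamd's C code, the helper names are invented, and the 65536 cap on the descriptor range is an arbitrary choice to keep the sketch quick on systems with huge limits.

```python
import os
import resource


def close_inherited_fds():
    """Close every descriptor except stdin, stdout, and stderr."""
    soft_limit, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_limit == resource.RLIM_INFINITY or soft_limit > 65536:
        soft_limit = 65536  # arbitrary cap for this sketch
    # closerange silently skips descriptors that are not open.
    os.closerange(3, soft_limit)


def fd_is_open(fd):
    """True if fd refers to an open descriptor in this process."""
    try:
        os.fstat(fd)
        return True
    except OSError:
        return False


def child_sees_fd_closed(fd):
    """Fork, clean up in the child, and report whether fd was closed there."""
    pid = os.fork()
    if pid == 0:
        exit_code = 2  # reported if the cleanup itself blows up
        try:
            close_inherited_fds()
            exit_code = 1 if fd_is_open(fd) else 0
            # A real spawner would exec the user's program here.
        finally:
            os._exit(exit_code)
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0


if __name__ == "__main__":
    read_end, write_end = os.pipe()  # stands in for a client socket
    print("closed in child:", child_sees_fd_closed(read_end))
    print("still open in parent:", fd_is_open(read_end))
    os.close(read_end)
    os.close(write_end)
```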

We still may be having issues with really large numbers of nodes, though. Theoretically, we should be able to go up to 1024 - 4 (stdin, stdout, stderr, and a socket to the local lamd) = 1020 ranks, since 1024 is how many file descriptors the fd_set type used with select(2) can handle, but we seem to be falling way short of that for some reason. There's a user in Germany who is trying to use LAM with 528 nodes (he was thrilled when I gave him a copy of the 6.3.3 beta with lamhalt in it -- he says that a lamboot can take up to 10 minutes!). I am still investigating this.

An engineer from GE Aircraft Engines mailed me today, concerned about the [accidental] inclusion of the GNU license in LAM 6.3.2, because they want to use LAM internally. I told him that all was well -- its inclusion was accidental and I would never cut off my shuga-momma's company like that.


Other random acts of goodness:

  • Hooked Janna up with John's extra ND/Stanford tickets.

  • Saw a neat article today (from dad) about how Scott Malpass has really, really grown the ND endowment since he started managing it. Did you know that ND was one of the initial investors of Yahoo!?

  • Got into an interesting discussion with Arun and Rich yesterday about barber shops when Rich said something about "Arun-esque". This triggered a long forgotten memory about the word "Arunesque", which I shared with them. Long story short: "Arunesque" means "to celebrate", or "to perform a ceremony for".

  • Since they don't seem to broadcast News Radio down here, I have had to replace it with something else. The Drew Carey show seems to do nicely. I've always liked Drew Carey, and his shows are pretty funny. I highly recommend them to anyone who hasn't seen them -- I'd rate most of them at 17.5 minutes.

  • I took the most recent copy of LAM's inetexec.c (the code that uses rsh to spawn things on remote machines), C++-ized it, and started working on it to do tree-based boots, and to allow nodes to fail during the boot. I stole a bunch of minime code to do this as well -- the result will get merged back into minime before it gets merged back into LAM -- because I wanted to do it in a small system first. Minime isn't large, but it sure isn't small (12,000+ lines of C++ code).

  • Tracy's music group at church had a little "congrats" reception for us last night. Free food and wine, plus they gave us a bunch of gift certificates. I love all the free stuff that you receive when you get married; I should do it more often. No, wait...

Miles to code before I sleep...

(I've pointed this out before, but I just love jjc. It pointed out 3 places where I didn't close my HTML tags properly, and let me go back and edit it before I submitted. With all the <code> tags that I used in this entry [which pine does not show, sadly -- see the web page at http://jeff.squyres.com/journal/], I accidentally repeated <code> instead of the proper closing tag a few times. Happy, happy, joy, joy...)

September 21, 2000

El Blockbuster sucketh

The saga continues.

Blockbuster sucks.

How much do they suck? Let me count the ways...

My dad mailed me today that I got a nasty letter from a collection agency demanding the return of the Fight Club DVD to the Berkeley Blockbuster. This is after I got a threatening letter from Blockbuster a while ago saying "return the Fight Club DVD or else". I had already called them and got it straightened out (I did return it on time -- they lost it... and later found it). See previous journal entries for the story so far.

So anyway, this collection agency is threatening to screw with my credit for some mistake that I had nothing to do with. I had to call the Berkeley Blockbuster store again to figure out what was wrong. The manager pulled up my account and said, "I see we cleared you on the Fight Club problem, but I see a late charge on Hot Boys..."

WHAT?!?!

I've never even heard of such a movie, nor does it sound like I would want to see it. Ever. I conveyed this to the manager and he sounded very skeptical.

"Did you report your card as lost?" he asked me.

"No -- I have it right here in my wallet".

Puzzled silence from California.

"Oh wait... I'm looking at someone else's account; they rented Fight Club as well. How do you spell your name again?"

<sigh>

So he finally pulls up my account. "Oops... looks like we marked you as credited here in the store, but no one notified the collection agency..."

Yeah, no kidding.

Thanks Blockbuster. You suck. Let's hope you get it right this time.

September 24, 2000

Internet, internot

Bummer. We lost to Michigan State yesterday, and in the last few minutes of the game, too. Bonk. So much for the season...

We watched the game at Janna's house, and had a good time with them. We stayed for dinner. I hooked Jim up with a new version of WinAmp afterwards, and I have a bunch of his and Anna's CDs to rip this week.

Many errands to do today -- clean the apartment, thank you notes (no, really!), etc.

September 27, 2000

I am pepperoni

Heisenlocks are hard to fix (where "Heisenlock" == "a deadlock where you can't know the deadlock and its location at the same time", a la Heisenbugs). Particularly the ones that seem to move around.

How do you know when you have fixed it? You stop getting deadlocks. But if it only locked periodically to begin with (as is the nature of Heisenlocks), how do you know that you just haven't tested enough to run into a deadlock?

I pose this question because a) it's happening to me today, and b) it happened to me with PIPT. After months of testing, the PIPT decided to lock up right in front of our sponsors. After I finally figured out the problem (several days later, mind you), I noticed that I hadn't changed the problematic code in a long time. That is, the bug had survived for months without causing deadlock. But then it suddenly did. <sigh>

It's rare to encounter Heisenlocks, understand the whole picture, and say "Aaaahhhh.... yes, this is exactly the problem that I am looking for." Indeed, the code is typically so complex and the race condition so thorny that it is difficult to get the overall picture until after the fact.

Hence, we have one of Jeff's laws of multithreaded programming:

Easy race conditions are typically obvious to find. Heisenlocks tend to be caused by extremely subtle race conditions that usually "could never happen" because of x, y, and z, where one or more of x, y, or z (or, more likely, some previously unconsidered "tautology" w) is proven to be false -- typically after multiple days of hacking, around 3am amidst much wailing, gnashing of teeth, and caffeine.

I certainly do not believe in changing random things until something seems to work as a whole solution. Sometimes I am reduced to this behavior (e.g., when I run out of ideas), but I always work to pin down the exact reason for success/failure after I find something that "seems to work". It is crucial to understand why it works, lest you fix only a symptom of a problem, not the real problem. This is the only way to be sure to fix a problem rather than guess that it is fixed because it "seems" to be fixed.
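To make the "could never happen" part concrete, here's a toy version of one classic timing-dependent deadlock: two threads taking the same two locks in opposite order. This is only an illustration, not the PIPT bug or today's bug. The barriers are there to force the unlucky interleaving on every run (in real code it might show up once in months), and the timeout on the second acquire is what keeps the demo from hanging forever.

```python
import threading


def opposite_order_attempt(timeout=0.2):
    """Run two threads that grab two locks in opposite order.

    Returns a dict saying whether each thread got its second lock.
    With plain blocking acquires both threads would wait forever.
    """
    lock_a = threading.Lock()
    lock_b = threading.Lock()
    both_hold_first = threading.Barrier(2)
    both_tried_second = threading.Barrier(2)
    results = {}

    def worker(name, first, second):
        with first:
            # Wait until the other thread also holds its first lock.
            # This is the "unlucky timing" made deterministic.
            both_hold_first.wait()
            got_second = second.acquire(timeout=timeout)
            results[name] = got_second
            # Keep holding `first` until both threads have given up,
            # so neither attempt is rescued by an early release.
            both_tried_second.wait()
            if got_second:
                second.release()

    threads = [
        threading.Thread(target=worker, args=("t1", lock_a, lock_b)),
        threading.Thread(target=worker, args=("t2", lock_b, lock_a)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(opposite_order_attempt())
```

Take the barriers out and the same code will usually run clean, which is exactly why "I stopped seeing deadlocks" proves so little. The real fix for this particular toy is to have every thread take the locks in the same order.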

Heisenlock quandary
How can this be happening?
Effect without cause

October 1, 2000

I have failed

I have failed.

I noticed that one of my students -- we'll call him "Fred" to protect the guilty -- had the following process running yesterday on one of the LSC machines:

fred pts/17 Tue 7pm 3:09 telnet rodrigues-8a.student.nd.edu

I am greatly saddened; all the Righteous have long since struck "telnet" from their working vocabulary, and save it only for debugging of ASCII protocols such as SMTP and HTTP, and use some form of encryption for normal remote access (e.g., ssh).

Alas, Fred, where did I go wrong? How did I not stress the importance of security? I feel like a parent who has just found out that their child has been a habitual drug user for multiple years.

Oh yea, the way of telnet is easy -- it is fast, universal, and yea, it may be ingrained in typing habits. But the path of Righteousness is never easy. Installation of ssh takes time (but is not difficult), and requires remembering to type "ssh" instead of "telnet" (half as many characters, I might add).

And so spoketh the great System Administrator in the Sky:

...He who uses telnet for personal use shall be damned in the fires of script kiddies. His boxen shall become IRC bots, and be owned by demons half his age. He shall be scoffed at by his new owners as yet another useless academic. His boxen shall become slow and bogged down with new traffic, and there will be great wailing and gnashing of teeth. None shall hear his screams (for the Righteous do not look at unencrypted traffic).

Fred (you know who you are): you need help. If you don't get help from NDLUG, please, get help somewhere.

October 3, 2000

Mangos and Margins

After the whole hydra time sink, got some good things done today...

  • Officially re-opened the hydra for business today.

  • Someone noticed a minor error with parallel bladeenc last week, and I finally got around to checking it out (in between compiles of real work today). Turns out he found a bona-fide bug in the shutdown routines -- it only showed up under MPICH because LAM rocks (i.e., if you do a singleton init with MPICH, you get MPI_COMM_WORLD == MPI_COMM_NULL, which is icky). I noticed that I had a few unreleased things in parallel bladeenc, but I didn't release them -- I just edited a 0.92.1b4 tarball with the fix, and called it 0.92.1b5. The Freshmeat announcement is in their queue. Maybe someday I'll test and release the unreleased stuff that I have in CVS, but not right now...

  • I hooked John up with SSL/IMAP on www.squyres.com (a.k.a. shipman.ws -- my first non-.com hosting!). I also hooked him up with authenticated and SSL-encrypted SMTP access -- pretty cool stuff. So he can relay through www.squyres.com to his heart's content, because he's fully authenticated using SASL, and all of his traffic (not just his IMAP traffic) is SSL-encrypted. Gotta figure out how to make pine do that (encrypt and SASL-ize SMTP traffic); he's using Outlook Express.

  • I hit the RedHat guys up for some free stuff for SC'2000. I hope it's not too late to get stuff from them...

  • Talked to Regina today, more about buying a house. She had some good advice.

  • Called and volunteered at my church. I'm such a great guy. ;-)

Turns out that I'll be leaving for ND Thursday morning and staying there for about 1.5 weeks. The Stanford game is this weekend, and then I'll be staying on to meet Rusty when he comes to campus next week, and for various meetings, etc., etc. Larry Augustin is coming to ND this Thursday, and I might get to meet him. Should be fun and interesting.


Had a pleasant experience with headsetzone.com today. I ordered a new telephone headset the other day (once you start using headsets, you'll never go back. They're geeky looking, but, man, they're fantastic! The telemarketer-grade ones are truly awesome [which is what I have]) since my current headset is getting fritzy. They called me today about my order because I ordered an AC adapter, not realizing that the amp already comes with an adapter. So they kindly whacked the extra adapter from my order before sending it on its merry way.

I think that .com's are starting to realize that service is very important -- you can't just put a bunch of products up on an https and expect people to buy.


Random question: what happens when you put version control meta directories under version control? Apparently, that's what one former LSC student tried to find out. I ran across this directory today by accident (line broken up for web/browser display purposes, and name changed to protect the guilty):

~lsc/ccse/lums/Archives/Students/STUDENT/xmpibackup/RCS/RCS/RCS/RCS/\
RCS/RCS/RCS/RCS/RCS/RCS/RCS

Do you think that God uses CVS? If so, what version are we? Are we a branch, or the main trunk? Can you imagine meeting a later version of yourself? Just think of all the new, cool features that you'd have!

A: "Ah yes, this is Jeffv1.7. The current version, Jeffv13.2 is much more advanced -- it has additional pincher claws, direct audio/visual/pseudo-sensing input feeds, extra-sensory perception (v7.2), electro-skeletal implants for strength and flexibility, web slingers (not spider-man like, these are the real thing), he's on the Space Football team as first string quarterback, etc., etc. Oh, and it can code like nobody's business."

B: "But Jeffv1.7 can already code 'like nobody's business!'"

A: "Yeah, but this is better."

B: "How much better?"

A: "11.5 better."

B: "Ah, so he goes to 11 then, does he?"

October 5, 2000

Blueberry pineapples

Candles from Pier 1 seem to burn poorly. I will not buy any from there in the future. But then again, perhaps it is Louisville's great altitude above sea level...

Got all the nmap stuff working in my threaded booter. Cool stuff!

Tried to import boost into my project today so that I could start using GGCL and a cool progress meter class that they have, but I was sadly disappointed in the usability aspects of it. For one thing, it extracted itself in ".", not in a separate subdirectory. Then there are no README or INSTALL files, no Makefiles, no configure, no nothing. Just a bunch of files and you're left to figure out how and what to use. Disappointing.

I started a rant about this on the boost list, and one guy is being somewhat silly. I decided to wait a few hours before responding again just so that I don't really start slamming him; I am new on the list, after all.

I watched the Voyager season premiere tonight. Good stuff. Left some hooks for later in the season, too. Could be very interesting -- this is the last season, after all.

Brian reminded me that I totally forgot to put the XMPI hooks into LAM. Doh. So I spent an hour or two on that tonight. Adding a single function in LAM requires many things:

  • A new file in share/mpi with the body of the function

  • Modify share/mpi/Makefile.am to add the new file

  • A new fortran binding for the function in its own file in share/mpi/f77

  • Modify share/mpi/f77/Makefile.am to add the new file

  • If adding profiling versions of the function, add entries in share/pmpi/Makefile.am and share/pmpi/f77/Makefile.am

  • Add a new "block" type (essentially an enum for that function) in share/include/blktype.h; shift the hiwater block type up to accommodate the new function

  • Add a new string for that enum in share/etc/blktype.c

  • Add the appropriate prototype in share/include/mpi.h

  • Add the four name #defines in share/include/MPISYS.F (eight if doing profiling versions of the functions)

  • Write a man page for the function in its file in share/mpi

It's off to South Bend in the early AM tomorrow. Miles to drive after I sleep...

October 7, 2000

Caffeine-free Microsoft

Didn't get a lot done research-wise today, but it still seemed like a good day.

I made some progress in LAM; cleaned up a little code, made a fix that a helpful LAM user suggested, etc. We currently don't have a hope of compiling LAM with a C++ compiler -- it was originally written with pre-ANSI function declarations. As such, there are still billions of them throughout the code, and it would take a long time to convert them all to real ANSI declarations (which C++ compilers require). Don't quite know what to do here -- it doesn't seem like it would be easy to write a scripty-foo to automagically convert everything... Harumph.

Talked with Jeremy about boost things; reorganizing the directory tree, a potential build process, etc. I sent our ideas to a guy on the boost list who I was discussing this stuff with. He replied, but I haven't had time to look over what he said yet.

Talked with Arun about LAM progress. Seems like it is going well, but annoying mid-terms will halt its progress for about a week. Similarly with Brian and XMPI.

Went to Larry Augustin's talk today. No real shockers in his talk -- we've heard most of it before (open source will save the world, etc., etc.), but it wasn't a bad speech, I suppose. Others didn't like it at all. Oh well.

Had to make a command decision on the SC2000 paraphernalia today -- the company couldn't do beach balls in the time that we needed them. :-( So we opted for footballs; we'll see if they can do those in time.

Arun and I listened to "Slut" for several hours this evening. Wonderful. The song is not what you would expect at all -- it's quite hauntingly beautiful. I suppose that my image of the song would be shattered if I actually listened to the lyrics and found out that it's some kind of pig-worship satan song or something. It's amazing how I could listen to that song on repeat for hours on end and not be able to tell you a single word of what they were singing. It's that good.

I opened up the LAM/MPI CVS tree for anonymous read-only access tonight. We'll see if people actually check it out...

October 11, 2000

A reddish green

I completely forgot to mention Stoopidcomputing things...

We're giving out cool freebies at SC'2000. The orders went in this morning:

  • The pocketknives got nixed. With extreme prejudice.

  • 500 LAM LED-light keychains. They'll be translucent blue and have a white LAM logo on them.

  • 900 mouse pads (I don't know what the hell we're going to do with the extras -- having 900 mouse pads in one place just sounds like an inherently dangerous operation. Are there FCC rules against that?). They're all LAM/MPI mouse pads, with the LAM logo and URL in the top right, "Dept of CSE/ND" propaganda (phone, fax, URL) across the bottom (Kogge paid for it all, after all), and a bunch of MPI function bindings across the majority of the surface area.

    The cool thing is that we've got three flavors of mouse pads (300 each):

    • C
    • Fortran
    • C++

    That is, they vary in the language of the bindings that are displayed on the mouse pad. We're actually predicting that the Fortran ones will go much faster than the C or C++ ones.

Anyway, it's all cool stuff (mainly working on the assumption that if it's free, it's gotta be cool). Should be a fun time down at SC'2000. A picture of our booth is available at http://www.indiana.edu/~rindiana/. A map of where we'll be located on the show floor is at http://www.sc2000.org/exhibits/floor.htm (scroll down to the bottom -- we're a purple booth, number R701).

October 13, 2000

Fuzzy ethernet

Some food for thought.

PBS is just plain sucking. It's unfortunately been flakey ever since we upgraded it. :-( I did find a bug in our AFS/PBS shepherding code a few days ago that resulted in tokens being allowed to expire during PBS jobs that ran longer than the length of your initial token (which I think defaulted to 10 hours, regardless of what your real default is), but that was our fault, not PBS's.

Yesterday, there was one job that was "stuck" in the queue and wouldn't die. The job was long done and gone, but PBS thinks that it's in an illegal state, and won't let it leave the queue. Hence, the node that that job was on wasn't released. Today, there are many more jobs like that (but those jobs are still running). I have no idea what the problem is, and I'm kinda annoyed.

We asked again for PBSPro (i.e., the commercial version) -- we first asked about 3-4 weeks ago -- and the PBS guys replied that it was taking them longer than they thought to set up their online store (even though PBSPro is free for educational users). :-( I'm kinda hoping that PBSPro will fix some of this flakiness that we've been seeing. :-(


Rusty from Argonne was here yesterday. His talk was good; I'd seen most of the material before, but it was good stuff anyway. We had good chats with him about optimizing MPI collectives (there are some really cool algorithms for this out there..), the future of LAM and MPICH, MPICH's Abstract Device Interface (version 3), my threaded booter (I gave him a copy of it, too), MPICH's mpd, etc. We had dinner at the Lumsdaine Grill, because Someone forgot to get a babysitter so that we could go to the LaSalle Grill. Ah well -- it was a good home-cooked meal, so I shouldn't complain. :-)

I downloaded the ADI-3 document, and it's huge! Compared to the spartan RPI (request progression interface) approach in LAM, ADI-3 is gargantuan.

I just noticed a post on the Beowulf list -- someone posted LAM vs. MPI/Pro (a commercial MPI) vs. MPICH results. The TCP numbers are clearly in LAM's favor. This, obviously, is because LAM rocks. However, MPI/Pro and MPICH have VIA results (which are obviously better than TCP results)... we need a VIA device... You can see the results for yourself. LAM ROCKS!!!

I've been working on IMPI stuff this week. I got the IMPI attributes on communicators working (i.e., on MPI_COMM_WORLD -- since we don't do anything other than MPI_COMM_WORLD yet, we don't have to maintain these attributes on other communicators, which would take some additional bookkeeping, because relative rank order can change, etc., etc.). I also got MPI_Bcast working in fairly short order.

I noticed a good number of typos and one inconsistency in the IMPI standard. Hence, I am proud to say that I am personally responsible for every item in the IMPI errata document. Well, ok, I only helped discover the first one (an issue with the protocol hiwater/ackmark values), but I still had a hand in it.

This is all for the SC'2000 IMPI demo with HP and MPI/Pro -- we're going to run a GUI Mandelbrot program across all three MPI implementations. Should be pretty cool, actually. We had our second teleconference today, and things appear to be going well. We plan to test the stuff across the internet next week. HP and MPI/Pro have been using LAM to test their IMPI implementations. I gave them instructions for CVS access today, so that they can get the MPI_Bcast and color stuff.

I just can't help it -- LAM ROCKS!


Seriously, though, it is very cool to be working on a project that matters. That is, LAM is probably only used by a few thousand people around the world (at most), but there are many devoted fans who use it every day. Indeed, many people's software relies on ours to function properly -- much real-world work depends on what I do in LAM. It's very cool.

The level of responsibility can be a bit scary at times (indeed, I remember the first time that I noticed a .mil site downloading LAM; I told Lummy about it, and he just smiled and said, "sleep tight!"). Real world stuff uses my code. Hence, if I fuck up, Bad Things can happen. For example, I know for a fact that companies like GE and Exxon use LAM/MPI.

But isn't this the level of responsibility that a good engineer should embrace? I think so. Being Careful about what you do is not just a state of mind, it is a way of life.


Saw a talk from Vince's advisor today about link-time optimizations. Interesting stuff. Similar to things that are available in Solaris (e.g., -O5, where multiple runs generate profiling feedback data that speed up subsequent runs), but it was neat to hear how it works. He was using it in conjunction with MPICH, so I set him straight in his ways -- since they're using TCP/IP, if they really want asynchronous message passing, they should use LAM since we can do it (via the lamd mode, which has its own tradeoffs -- the asynchronous message passing mode isn't free, so to speak).

He sounded intrigued, and said that he would get the latest version of LAM and give it a whirl. And so we progress, one user at a time, towards world domination...


Well, ND's network is going to start shutting down for maintenance in about 15 minutes, so I'm outta here. Next journal entry will be from home.

October 14, 2000

Calamari airlines

ND vs. Navy -- finally a fun game to watch.

Aside from two big mistakes the defense made late in the game (and to be fair, it was at least our second- or third-string players, who don't have too much experience), we dominated the game. Those are the games I like: boring and dominating. This is the whole reason that the wave cheer was invented -- the fans need something else to do to occupy their time.

But CBS's coverage of college football really sucks. They don't get good angles, their camera operators get faked out and don't follow the football, they rarely show replays (even on penalties). And their announcers talk more about anecdotes than about the game that is being played. They suck.

NBC's games take forever, but you get the whole nine yards (hah!) with them -- tons of replays, game strategy speculation, etc., etc.


In other news, ND's network seemed to come back up without much of a hitch. I was on briefly at about a quarter of one this morning and it was back. And the latency from squyres.com to nd.edu seems to be a lot better (granted, there are no students on campus right now, so traffic in and out of nd.edu is probably pretty low). But at least I'll probably have good connectivity for the next week. :-)

October 15, 2000

Clairvoyance and Corn Flakes: Coincidence or Fate?

Last night, Tracy and I went to see a local production of Dracula. I'm a big fan of theater, especially after having done a bunch of productions in stage crew in both high school and undergrad college. The production was actually quite good -- it was theater in the round, with a fully-functional single set.

The technical setup was actually quite impressive (being an engineer and an ex-stage crew type, I tend to notice these things). I couldn't find the control room, for example -- it was that well hidden. Or perhaps the control room was distant from the actual production area, and the techies watch by video (I'm guessing here, but that would be a pretty cool setup).

This production had a few extra twists that separated it from others that I have seen. For example, Lucy had a female friend, Nina, who died before she did. Nina came back as a vampire and started attacking children around London.

Props to a bunch of the special effects, too:

  • Some various pyro, bangs, pops, flashes, etc.

  • Using deep sustained bass noise, very hard to hear -- the kind of sound that you subtly feel rather than hear -- that created a feeling of dread and fear. Very cool.

  • The professor killed Nina with a wooden stake through the heart while she was sleeping in her coffin. Since it was theater in the round, it actually happened right below me -- not 10 feet away. The stake actually appeared to go into Nina, and blood squirted everywhere. Again, very, very cool. That alone made the price of admission well worth it -- who wouldn't pay to see a beautiful vampire seductress screaming in the throes of death, with blood squirting everywhere?

  • Once or twice, a character had a sudden moment of clarity and realization. The clock in the corner of the study suddenly got very loud (tick, tick, tick), as if the focus of the world suddenly got very narrow. And then the ticks got subtly farther apart -- creating the illusion of slowing down time, and heightening fear.

  • Dracula "disappeared" at one point by means of what I assume was a hydraulic trapdoor in the floor of the stage (I caught a glimpse of it). He was surrounded by a cloak, which suddenly fell to the floor, and he was no longer in it (having been in theater for a while, I was proud of myself for anticipating the classic misdirection designed to make you look away from him for a second while his head disappeared downward -- no one else that I was with noticed it). Most excellent.

  • In the final scene, where they drive a stake through Dracula's heart while he's sleeping in his coffin (more blood squirting everywhere -- yummy), they kill him, and then close the coffin. A few seconds later, his hand pops through the top of the coffin in a feeble attempt to strangle the professor, who successfully evades his grasp. Seconds later, they open the coffin again to really kill Dracula, but all that is there is a skeleton. Cool!

All in all, a good production. The actress who played the maid was a little weak, but the badass transformation of Count Dracula to a Vampire (multiple times, too!) made up for it.


The Director's Cut of the movie The Abyss was on TV tonight; I hadn't seen it in quite a while. Most people aren't aware that there is a 10-15 minute sequence at the end that was chopped from the version that was released. It was all about war and violence in the human race (a sort of commentary on today's society), and how the water people almost killed everyone on the planet with enormous tidal waves. With this sequence, much more of the movie makes sense.

I'd advise renting it to those who haven't seen it -- I'll give it a rating of 10 minutes.


We finally finished all of our thank-you notes from the wedding today. Woo hoo! We had gin and tonics in the excellent ND drink glasses that Brian/Arun gave us.

And speaking of alcohol... I think Arun's proclamation of not drinking until Momar's 40th anniversary is a sham!! He admitted in his journal that he had Kahlua pancakes, and later had One Enormous SuperPankake with some kind of flavored liqueur in it.

Hence, I think Arun's thin guise of "not drinking" has fallen away -- we now see him for the closet alcoholic that he is. Was it really "Sprite" that he was drinking all Sophomore year (by the gallon, I might add)? Does he really like "water" and "Dr. Pepper" that much? I think not, gentle readers. Yes, it's true -- Arun was even kicked out of the 1996 Olympics (Bulgarian all around gymnastics team), for his excessive indulgence in what he called "pixie sticks", and "Mr. Pibb". Said Mr. Rodrigues at the time, "I just love pixie sticks and Mr. Pibb. Don't knock it until you've tried it! Now don't bother me -- I've got to go practice my Triple Lindie."

(...catch the rest of this exclusive story in a special expose section in this week's National Enquirer)


My fricken' router has frozen 5 times tonight. Destroyed a good uptime, too. It seems that one of the NICs is getting overloaded (I'm trying to ftp/scp/whatever 4.5GB from my router to my desktop, which hangs the machine after a while). Sucks!! I don't quite know what to do about this yet -- I need to get that data over to my desktop so that I can burn a CD of it. Arf!

In other linux woes, during one of my router crashes this evening, it caused the xmms on my desktop machine to freeze. So I did a "ps" to kill it. I found no less than 662 copies of xmms running. No joke.

My desktop has an uptime of over 37 days, and I've been logged in to a single KDE session for probably over half of that time. I guess there's some kind of leak in xmms that's causing that to happen. Weirdness. For example, I see that there are already 11 copies on my desktop now.

Some testing shows that a new one appears every time a new song starts. I'll bet that they are terminated-but-not-reaped threads (remember: linux emulates threads with duplicated processes). <sigh> Open source software can suck sometimes. :-(
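
One way to check that guess is to look at the process state column: terminated-but-not-reaped processes show up with a state starting with "Z". The sketch below is only an illustration; the function name count_states is made up, and it assumes a procps-style ps that understands "-o stat= -o comm=".

```shell
#!/bin/sh
# count_states NAME (hypothetical helper)
# Reads "STATE COMMAND" lines on stdin and reports how many processes are
# called NAME, and how many of those are zombies (state starts with "Z").
count_states() {
    awk -v name="$1" '
        $2 == name { total++; if ($1 ~ /^Z/) zombies++ }
        END { printf "%d total, %d zombie\n", total, zombies }
    '
}

# Typical use (assumes a procps-style ps):
#   ps -e -o stat= -o comm= | count_states xmms
```

If nearly all of the copies come back as zombies, the terminated-but-not-reaped theory holds; if they are mostly sleeping, something else is going on.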


Did some LAM work today. Turns out that I was a bit sloppy and checked some crap back into CVS that didn't work. Oops. :-( Caused Arun a bit of pain, too. Double oops. :-(

But it's fixed now -- it compiles (and seems to work) with and without IMPI support. I also added some stuff for XMPI to drop communicator name traces during MPI_Init for MPI_COMM_WORLD, MPI_COMM_SELF, and MPI_COMM_PARENT (if it exists). I added man pages for MPI_*_set_name and MPI_*_get_name, too, just for good measure. I've got to finish the IMPI extensions to MPI_Reduce tomorrow.


Found a new "hauntingly beautiful" song today. It's not quite "Slut", but it might be close. It's Tori Amos' "Carnival", from the MI-2 soundtrack. I've put it on repeat, but my router (which streams my MP3s to me) has been rebooting, so I haven't heard it continuously enough yet. I'll keep you posted.

October 16, 2000

Pumpkin flavored telephones

Happy happy, joy joy!!

The amazing show News Radio is now showing down here in Louisville!

(or, more specifically, I just found out that it is showing down here in Louisville -- it may have been here for some time)

It's on A&E at 6:30pm and 12:30pm.

After fighting PBS all day, I am bounding with joy to be able to watch News Radio again (the floor of pi is 3).

Life is Good.

October 23, 2000

You insult me.. and of course, my cane.

It's been a few days since I did a journal entry, mainly because I've been traveling. Let's catch up...


Left on Friday night to go to Chicago. Tracy and I flew Southwest from Louisville to Midway. Flying Southwest is an interesting experience. It's a cross between the best of "People's Express" (where you sat on milk crates in the hold, but they were damn cheap tickets) and the Orient Express (there's some really shady people on there, and most people don't speak English). Got to Midway around 8pm, picked up our Avis car, and drove to Jill's.

Seeing Jill was great -- Jill owns her own condo on the north side, right near the lake. We had dinner and caught up with Jill, which was much fun. The next day, we walked along Lake Michigan (very cool) and went to the Hogshead Bar to watch the ND vs. West Virginia game. The game itself was kinda sloppy; we had moments of brilliance, but all told, the final score didn't tell the story of the game. We won, but save a few critical plays, WVa almost beat us.

I randomly ran into some people that I knew at the Hogshead -- two of my old roommates, Mike and Brian (it was good to catch up with them), and an old CS grad named Dan (journal policy not to put in last names to protect the not-so-innocent). He works at a .com in Chicago called www.ubid.com. We chatted about that for a while. His brother is a froshy at ND, and is thinking about CS. Good for him!

After the game, Jill and Tracy and I ran to Marshall Fields to pick up a wedding gift for the reception that Tracy and I were going to that night (stoopid Marshall Fields -- they don't have their wedding registry online yet!!). Tracy and I raced up to Lincolnshire for the reception (the wedding was about a month ago, in Italy) and made it pretty much just in time.

It was fun -- I didn't know anyone (it was one of Tracy's co-workers who got married), but we saw a bunch of GE people that Tracy knew, and they were nice folk. We had a good bunch of laughs, and a good time was had by all. Hell, the booze was free -- how can you go wrong?

By the end of the evening, however, my ears hurt from the music. They had a live band, and they were actually pretty good -- it was a Benny Goodman orchestra-style band, but played all kinds of music. Their singers were quite good, and very lively (dancing on the dance floor while singing, etc., etc.). They even had a mixer boy, but I came to hate him because I saw him keep edging up the "master volume" slider. Bastard. I hope that his MPI programs rot in hell.

We flew back Sunday morning and got back here around noon. I did e-mail but was otherwise uninspired to do any work, so I lazed around and watched TV. A good Sunday. :-)


Bandwidth to nd.edu is sucking again. Well, it's not sucking, but it's certainly not nearly as nice as it was during break last week. For example, streaming MP3s from nd.edu to squyres.com is pausing all the time. Icky.


After having been gone for the weekend, I am shocked to discover that my Mojo level has fallen to about 850,000 (it was about 980,000 when I left). This amazes me -- I left my mojo server running all weekend, but I personally did nothing with it all weekend, and yet somewhere in there I spent about 130,000 mojo. How could that happen?

That's not the whole story, of course -- I do have about 100,000 mojo "coming in" (when people spend mojo with you, it doesn't necessarily come in right away; there's a credit system for totals up to 10,000 mojo -- see http://www.mojonation.net/ for more details), so I actually didn't lose all that much -- but it still seems wrong. That is, I have mojo going out at a much higher rate than it is coming in!

I hope that it's just still bugs in the system. It doesn't take an accountant to realize that even though my consolidated total isn't much less than when I started, you can't spend what you don't have, so if mojo [actually] is going out faster than it is [actually] coming in, you're screwed!


Did some more research into DSL for my church. They want to get DSL for the following reasons:

  • They have 3 separate computing resources right now that they want to consolidate into one bill:
    • The Youth Center, which is physically distant from the church's main administrative offices, uses e-mail, and has a $9.95/month Juno account.

    • The main admin offices have an AOL account at something like $21.99/month, with 7 e-mail accounts.

    • They have a web site that's hosted at a local company for something like $19.99/month.

    This comes to a total of something like $52/month. DSL will at least double that, but there are other factors as well...

  • They only have a total of 8 e-mail accounts, but have at least 12-15 people who need e-mail. Hence, they're maxed out right now, and need to expand.

  • They only have so many phone lines at the church; when people are on the phone for e-mail or web, that's one (or more) phone lines that can't be used for regular business.

And actually, the admin offices are already wired on a LAN, so they're pretty well set up. After some preliminary investigation, prices in this area for 192kbps/SDSL (the church is technically considered a business, so they can't get the cheaper residential rates) are between $100-120/month.

Still need to contact a few more vendors (I'm doing it during lengthy compiles and/or network transfers nd.edu<-->squyres.com) to get some more options. It's not just the base bandwidth that they charge for -- they all have different services in terms of number of mailboxes offered (for free), how much web space they offer, whether there's a dialup line (for the Youth Center), etc., etc.


WHOO HOOO!!!

My boss from my army unit just e-mailed me -- he got me a tentative position in Army high performance computing; apparently I'll be in the "hacking" group. This could be interesting!

This is just the results from a few preliminary meetings that he has had with a group (in Aberdeen, MD, I think). We'll see where it goes.

But at least it looks like I won't be forced to go back to be a signal platoon leader somewhere. Whoo hoo!!


I've changed my "Dissertation" topic on the journal to "Technical", because I find that most of the "Dissertation" stuff that I send is only sometimes related to my dissertation work. Most often, it's just some technical stuff that may or may not be related to my dissertation, or anything at all, for that matter.

There's enough bad vibes in here to run a Voodoo factory

I did much work on IMPI today.

Lesson for the wise: never write/debug parallel programs with only two nodes. Always use at least three. Three is probably better than four, actually, if your program has to work for all general cases.

I already knew this, but I discovered it again the hard way today. I'm working with HP and MPI Software Technology on our IMPI demo for SC'2000; I thought that I pretty much had LAM ready to go on Friday. Today, I tried it with three clients (instead of just two, up in nd.edu) -- i.e., two clients in nd.edu and a client down here in squyres.com for a local display (the demo is a GUI plot of the Mandelbrot set -- the plotting is calculated in parallel, and the results are sent to the display master to be shown on X).

Everything worked great with two clients, but started barfing horribly with three clients. Ugh! I had to go around and fix all the places where I had made bad assumptions and whatnot.

So, kids, please don't program in parallel with just two nodes -- always have adult supervision and use three, four, or two hundred nodes.


It didn't help that there were actually other bugs in the demo code that we're supposed to run (the parallel Mandelbrot stuff was originally written by the MPICH guys and then modified by the NIST folks for specific purposes of the IMPI SC'2000 demo). I found at least two bugs today (remember: broadcasting pointer values across multiple architectures is meaningless) -- possibly more, but I think I've blocked them from my memory to prevent further trauma.

I also had a few bugs left in LAM -- the code for calculating host and client colors and sizes looked like a Darwinian experiment gone horribly wrong. I had to evolve that code into something better and greater -- to make it more than the sum of its parts. Now, it rocks with the rest of LAM.

I just can't help it -- LAM rocks.

It all seems to be working now. It's happily checked back into CVS, and hopefully I'll be done with that for a while...


Conversed with a guy at GE Aircraft Engines today. They're using LAM for somethingorother. He asked for a good feature on Friday (see his post on the LAM list), so I moved our discussion off the list and we'll iterate through a few things trying to get it right.

In related news, GE acquired Honeywell today. And "Just Jack" will stay on as CEO for an additional several months (he was going to retire next April, IIRC) until the end of 2001. You just can't go wrong with "Just Jack".

Glory be to the Father, the Son, and GE's stock price, amen.

Past present participle future improbably never tense

(this is a few days old -- I started it before last weekend. So take all present tense to be past tense)


Learned some wisdom today. It was painful, so I'm going to share in the hope that others may save some time...

On the eternal quest to have "proper" Makefiles, we had quite an elaborate setup for dependencies in LAM/MPI (the automake stuff for generating dependencies is broken for non-GNU make). The only problem was, it didn't work for VPATH builds. We were somehow under the mistaken impression that you didn't need make depend in VPATH builds.

Sidenote: For those of you unfamiliar with VPATH builds, it's a slight variation on the GNU standard "./configure ; make all install" Scheme of Doing Things. It allows you to use one source code tree to build multiple binary trees. i.e., you download a random tarball, expand it to its source tree, and then run "./configure ; make all install" multiple times simultaneously. What's the benefit? For building on multiple architectures, and/or with different configure options, of course! If you think about it, this is a really handy mechanism.

It works like this (I slightly lied above): you expand the tarball, and make a new directory to build in. And then run configure (and make) from that new directory. For example:

unix% gunzip -c foo-1.0.2.tar.gz | tar xf -
# ...makes foo-1.0.2/ directory...
unix% mkdir build
unix% cd build
unix% mkdir sparc-sun-solaris2.7
unix% cd sparc-sun-solaris2.7
unix% ../../foo-1.0.2/configure \
--prefix=/yadda/yadda/yadda/sparc-sun-solaris2.7
# ...much output...
unix% gmake all install

(The final "gmake" is necessary because Sun's native make isn't VPATH enabled)

Hence, you can have multiple of these puppies running simultaneously, all from the same code tree. This is really handy in development, too, when you need to test on multiple architectures simultaneously.

But now I see the error of my ways (it took developing on Solaris and Linux simultaneously with the same code base to show me this piece of wisdom). Hence, I set about to make our depend target work properly for VPATH and non-VPATH builds. Easier said than done.

Although I already knew this, I have finally and firmly decided that make's rules for syntax (particularly quoting) SUCK.

We use the GNU tools automake and libtool to build LAM/MPI (the use of libtool doesn't actually matter here, I just wanted to use it to mention our sponsors -- buy GE products today). Now previous journal entries have shown how automake can be your friend, but automake can also be your enemy (very similar to power tools, in this respect). This journal entry has nothing to do with automake (buy GE appliances).

In our automake setup, we include a top-level Makefile.depend file that has our "depend" target. It was fairly lengthy and involved, and it applied to the whole tree, so this made sense.

For an hour or two, I tried to make it do VPATH stuff properly. This involved the following:

  • Getting the source file list
  • Running makedepend on all of the source files

Sounds pretty simple, eh? Not so, gentle reader, not so. Here's why:

  1. First off, GNU make sucks. I don't know if this is a documented "feature" or not, but it certainly makes no sense to me. So when you have a list of source files (e.g., "BLAH = foo.c bar.c baz.c"), GNU make happily prefixes each of them with the VPATH for you.

    Whoo hoo! This saves a lot of trouble of doing it manually. After all, none of the source code files are actually in this directory -- we have to add some kind of prefix to get to each of them.

    However -- closer examination reveals gmake's suckage. The last file in the list does not get the VPATH prefix applied! Why? I have no idea. But it pretty much fucks up the whole scheme -- it's pretty useless to get all but the last one.

    It's not ok if you only get five chicken McNuggets when you order the six-piece combo at the drive through. Heck, no. You get all six or it's throwdown time.

    As such, I had to write code to a) strip off the VPATH prefix from each entry (if it was there), and then b) add it back on to every entry. Not that this was extraordinarily difficult (but escaping the sed expressions in the Makefile was a bitch...), but I shouldn't have had to do this.

  2. With the re-VPATH-prefixed list of source files, you can run makedepend. But oops, it barfs. It seems that it can't find the file lam_config.h. Arrgh -- that's the one that configure generated via autoheader. It seems that automake isn't smart enough to add -I$(top_builddir)/share/include to CFLAGS -- it adds -I$(srcdir)/share/include instead. What the hell is the point of that?

    (translation: automake is adding a -I for the source tree, not the build tree. But the config .h is always put in the build tree -- not the source tree. So I'm not quite sure what the logic is here)

    So we have to manually add the -I for the build tree. Not nice -- we shouldn't have to do this -- but very easy to do, so move on.

  3. Whoo hoo! It seemed to work! Checking the generated Makefile... #@$%@#$%@#%@#$%!!!!!!!

    All the dependency entries are for "VPATH/foo.o", and "VPATH/bar.o", etc. instead of "foo.o" and "bar.o". That is, we're building foo.o, not ../../foo-1.0.2/src/foo.o. Hence, the Makefile has to show the right dependency.

    CRAP.

    So we have to add some more sed mojo to post-parse the Makefile and strip out the VPATH prefixes from the generated dependencies.

  4. Ok, run again. Seems to work this time. Let's try it on the whole source tree...

    Barf-o-rama. One of the source directories in LAM has almost 250 source files in it. Adding "../../lam-6.3.3b44/share/mpi" to every entry in the list quickly overflowed the shell's buffer for a single variable. Hence, it just dropped all the additional filenames.

    So I had to add a loop around the file list to only process about 20-25 at a time. <sigh> This really became painful at some point; I hurt.

    Trying once more... #@$%@#$%@#$%@#!!!!!

    Since we're running makedepend multiple times, it only saves the output of the last run in the generated Makefile. Hence, it saves the dependencies of the last 20 or so files; all the previous dependencies are snipped each time makedepend runs.

    Luckily, makedepend has a -f option to specify where to send the output, so we can save it in a temp file and tack on successive results to the end of the Makefile.

  5. Try again.... <sigh> Still no love.

    Now it's not ditching the previous results at all. Since makedepend isn't running on the main Makefile, it doesn't snip the previous dependencies. Hence, we have to do it ourselves. Redirect some input to ed to snip out all lines after "# DO NOT DELETE" (seems pretty ironic, doesn't it?) and catenate the new results on after that.

  6. Finally... it works.
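
For the curious, the prefix juggling in steps 1 and 3 boils down to something like the sketch below. This is not the actual LAM code: the function names are made up for illustration, and it uses plain shell pattern matching rather than sed, which sidesteps most of the quoting pain. It also assumes file names without newlines.

```shell
#!/bin/sh
# Illustrative sketch only; the function names are hypothetical and this is
# not the contents of LAM's config/run_makedepend.

# normalize_sources PREFIX FILE...
# Step 1: strip PREFIX/ from any name that already carries it, then add it
# back to every name, so the whole list ends up prefixed consistently
# (including the last entry). Prints one name per line.
normalize_sources() {
    prefix=$1
    shift
    for f in "$@"; do
        case $f in
            "$prefix"/*) f=${f#"$prefix"/} ;;
        esac
        printf '%s\n' "$prefix/$f"
    done
}

# strip_dep_targets PREFIX
# Step 3: filter dependency lines on stdin, removing PREFIX/ from the start
# of each rule (the target), while leaving the prerequisites alone.
strip_dep_targets() {
    prefix=$1
    while IFS= read -r line; do
        case $line in
            "$prefix"/*:*) line=${line#"$prefix"/} ;;
        esac
        printf '%s\n' "$line"
    done
}
```

For example, `normalize_sources ../src ../src/foo.c baz.c` prints ../src/foo.c and ../src/baz.c, and piping "../src/foo.o: ../src/foo.c" through `strip_dep_targets ../src` gives "foo.o: ../src/foo.c".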

That whole process actually took quite a while -- adding additional quoting for make (especially in the sed expressions) made it arbitrarily difficult. So somewhere near the end, I said fuck it, and moved the whole thing off to a Bourne shell script. It actually became much easier at that point -- I should have done that much earlier. The depend target actually became pretty small at that point; it just calls that script with a small number of arguments followed by the list of files (also as command line arguments to prevent single-shell-variable-overflows).
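
The batching and "# DO NOT DELETE" handling from steps 4 and 5 can be sketched roughly as follows. Again, this is an illustration under assumptions, not the real script: DEPEND_CMD and both function names are hypothetical, awk stands in for the ed trick, and file names are assumed to contain no spaces. The default command relies on makedepend's -f- option, which writes to standard output instead of editing a Makefile.

```shell
#!/bin/sh
# Sketch only; not the real config/run_makedepend.
# DEPEND_CMD is a hypothetical knob: any command that takes source file
# names and writes dependency lines to stdout.
: "${DEPEND_CMD:=makedepend -f-}"

MARKER='# DO NOT DELETE'

# snip_old_deps MAKEFILE
# Keep everything up to and including the marker line, drop what follows.
# If the Makefile has no marker yet, append one.
snip_old_deps() {
    tmp=$1.tmp.$$
    awk -v m="$MARKER" '
        { print }
        index($0, m) == 1 { found = 1; exit }
        END { if (!found) print m }
    ' "$1" > "$tmp" && mv "$tmp" "$1"
}

# append_deps_in_batches SIZE MAKEFILE FILE...
# Snip the old dependencies once, then run DEPEND_CMD on at most SIZE files
# at a time, appending each batch's output after the marker.
append_deps_in_batches() {
    size=$1
    makefile=$2
    shift 2
    snip_old_deps "$makefile" || return 1
    while [ $# -gt 0 ]; do
        batch=
        n=0
        while [ $# -gt 0 ] && [ "$n" -lt "$size" ]; do
            batch="$batch $1"
            shift
            n=$((n + 1))
        done
        # $batch is unquoted on purpose so it splits back into file names.
        # Any marker line the command itself prints is dropped.
        $DEPEND_CMD $batch | sed "/^$MARKER/d" >> "$makefile"
    done
}
```

Something like `append_deps_in_batches 20 Makefile $SOURCES` would then be the whole job. Snipping once up front and only ever appending afterwards is what keeps the earlier batches from being thrown away.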

The moral of the story: it works now. It works for VPATH, it works for non-VPATH. If you want the script, grab it via LAM's anonymous CVS access -- it's config/run_makedepend. The depend target itself is in the top-level directory, in a file named Makefile.depend.

Save the planet: reuse code. Feel free to steal/improve this depend target. Your country depends on it.

It's all about the subliminal.

October 25, 2000

I love Kung Fu movies...

Some quickies...

  • Dad got the "LoveLetter" virus on all his 'doze machines at the store yesterday (it spread itself via mounted drives and went rampant across three machines). Viruses suck; it automatically overwrote all .jpg and .vbs files on all three machines. It's not quite clear where it initially came from, either. Dad had up-to-date virus protection, but he had an older version of Norton AntiVirus, and it wasn't automatically checking e-mails, so I suspect that this is where it came from. <sigh>

  • Possibly going to see "Rent" with Janna in December. That should be fun. I saw it in London, and laughed uproariously at "You can take the girl out of [New] Jersey, but you can't take the [New] Jersey out of the girl." Being from Philadelphia, this is enormously funny to me (we make fun of New Jerseyans all the time). But it's apparently an American joke, because no one else laughed.

  • An old ROTC cadet of mine (Trent) is now out of the military and working at GE Appliances. Small world.

  • Not sure if I'm going up to ND this weekend or not; should know by the end of the day.

  • The HP guy (CQ) found some bugs in my IMPI code for synchronous sends. Ugh. This is proving troublesome to track down...

  • The motherboard/PROM on the Airmics mail server is fried; it is crashing multiple times a day. Suckage. They're trying very hard to get the new server set up, but it just takes time...

October 29, 2000

There's no private property in the LSC!

Many days, no journal entry. The usual nemesis is at fault: traveling.

I've been up here at ND for the latter half of this week. Mainly for SC2000 coordination (the freebie mouse pads arrived way early. Yay!) and other miscellaneous tasks. I also made my famous "hockey puck" chocolate chip cookies this week for the efforts of the Engineering Graphics department (ok, mainly because Joanne from EG said that I owed them cookies for their efforts). For the uninitiated, it is widely known that I make the World's Best Chocolate Chip cookies. They're roughly the size and shape of hockey pucks (hence, the name); none of these twice-the-diameter-of-a-nickel and paper thin kinds of chocolate chip cookies for me. Hell no. Soft and chewy in the middle with a 1lb bag of chocolate chips in the mix just "so that there should be enough". One of these cookies can serve as a meal. A double batch made 12 cookies this week.

Anyway, that all went well, and we finished up our virtual posters for SC2000. I had to use some evil powerpoint animations in them, but they'll be ok. We still need a result graph from LAM/myrinet (more on this below) for the slides, but everything else is finished.

Sidenote: Myrinet is a proprietary network that runs at gigabit speeds, i.e., roughly an order of magnitude faster than 100Mbps ethernet. You can run TCP/IP over Myrinet -- they provide a driver for it -- but it's at a significant cost in performance over "native" communication over the Myrinet hardware. "Native" communication is provided through a library called "gm". Hence, we're adding a "native gm driver" to LAM to utilize this ultra-fast communication over Myrinet in LAM directly, rather than relying on TCP/IP over Myrinet. This is what Arun has been working on since the beginning of the summer. We want to have [at least] a beta of this stuff working to show off at SC2000.

Arun and I tried to make a result graph for LAM/gm -- just a basic one showing "TCP over Myrinet is good, but gm over myrinet is better!", but we unexpectedly got bad seg faults and couldn't produce anything. This consumed the rest of my Friday evening, and most of Saturday morning.

Before we could all launch into extensive debugging, though, we had to otherwise finish up the slides. Got some good slides for LAM/gm (Arun), XMPI (Brian), and IMPI (me). After everyone else had left, Lummy wandered in (while I was still working on the slides; perhaps 6:30pm or so). Had a long chat with him about the future of LAM and whatnot. It was especially interesting with the prospects of MPICH's going through an entire re-write (with the focus on their ADI-3 work now -- already a 70+ page document!). MPICH 1.2.1 is probably pretty near the end of the line for that code base; MPICH 2.0 will probably have some elements stolen from MPICH 1.2.x, but will likely be mostly from scratch. This is really cool stuff, actually.

I spent the rest of the night upgrading the version of GM that we had. We reported what appeared to be a bug in gm to Bob -- one of the authors (a very helpful guy, actually), and he said, "you're using a really old version of gm -- you should upgrade and see if the problem just goes away". Ugh. How embarrassing! Turns out we were using gm-1.1.3, and the latest is gm-1.2.3. Oops.

myri.com is apparently connected to the world through a 300 baud modem; it took about an hour to download the 1.2.3 tarball (only a few megs). It took a few tries to get it installed properly -- we have really old Myrinet hardware (probably a few generations behind current stuff). Myrinet utilizes a kernel module in Solaris, so you have to take some care to build and install it properly. And compiling on the Solaris 2.5.1 140MHz machines is just painfully slow. Ugh.

So I finally got everything up and running around 11pm or midnight. I ran some test programs, and finally decided that everything was working properly. Then I ran a simple test program through bcheck. Badness. Lots of "read from uninitialized" errors from within libgm itself. Crap!!

After a lot of source diving in libgm, I determined that the problem was a buffer that was supposedly being initialized by an ioctl() call into the gm kernel module. The upper libgm was providing the buffer and expecting the lower kernel module to fill it in. It took a lot of hacking around and source tracing in workshop to absolutely verify that the lower kernel module was, indeed, filling that buffer properly, but it remained a mystery to me as to why bcheck would think that the buffer was uninitialized. Worse yet, sometimes bcheck reported that everything was fine -- no read-from-uninitialized. Hmm.

Heisenbugs suck.

It didn't occur to me until Saturday morning that bcheck couldn't possibly know that the buffer was filled -- bcheck only monitors the process under debug; it doesn't monitor the kernel module at all. So it makes perfect sense that while the buffer is initialized by the kernel module, bcheck simply has no knowledge of it; hence, it reports it as uninitialized when upper libgm reads from it. Although this doesn't explain why bcheck sometimes reported that all was well, I'm 99% sure that this is what is happening. Bob later confirmed my suspicions, too.

Hence, I [effectively] added a memset() to the upper libgm code, and bcheck finally only reported Truly Bad Things -- similar to what we had to do in LAM when we know that uninitialized buffers are ok ("when you optimize code, all coding guidelines and rules are out the window, and painfully splattered on the ground below").

I then set about trying to debug a simpler example than NetPIPE (which is a de facto MPI latency/bandwidth benchmark program) -- the program that we were trying to use to get some result graphs for LAM/gm. I made a simple "hello world" ping pong MPI program, and tried to debug that. Arun came in around 11am or so, and we set about stepping through the internals of the gm progression engine inside LAM. Not for the meek.

It's good that Arun came, 'cause he wrote the stuff, and I wasn't completely familiar with it (indeed, I had only seen the internals once before -- when we had a code review about a month or so ago). So his explanations and rationale were quite helpful. We finally tracked down a repeatable kind of error in the simple ping-pong program, but then had to leave for the football game.


ND vs. Air Force. Wow. A real nail-biter, there. I can't believe that we won. It's horrible to say, but our offense really did not look good at all during the game. We had one decent drive, and it was full of 3rd and longs. The rest of our points were off lucky Big plays and the like. :-( Granted, I was in the stadium and didn't have the benefit of instant replay and the like, but it didn't look pretty from the student section.

Our defense was kinda shaky, too. We had some great stops a few times -- held them to 3 points at least twice, for example, a blocked field goal (which put us in overtime -- and later gave us the game), etc. But they were able to throw all over us all day. Our pass defense was just not good.

But in overtime (!) we managed to win the game. Air Force went first and we held them to 3 points. We then came back and got a touchdown, putting the final score at 34-31. Amazing. It's our first overtime victory -- we were previously 0 for 3 in OT.

Some other random points about the game:

  • Great flyby from 3 F-16s (or F-18s...?) during halftime. Well timed, and it was led by some 1LT who graduated from ND in '97.

  • There was some woman behind us who was clearly visiting some friends here at ND. Whenever she opened her mouth, stupid came out. Some memorable quotes:

    • (during the band's halftime tribute to the military, where there were various military people on the field with the band, the American flag and the flags of all four services were flying on the field) "Is this some kind of Halloween thing?"

    • "I just love that Leprechaun guy! I just wanna scoop him up and hug him!"

    • "So they're not really downs, are they? They're attempts at downs, right? So why does everyone call them downs?"

  • Saw Tony and some other JeffJournal fans after the game. Felt kinda silly, because I didn't recognize Tony right away (it's the beard! I swear it!) -- duh. But then later, I realized that I really hadn't seen Tony since last spring, and I felt [somewhat] better. :-)

  • Tracy and I went to see Pay it Forward afterwards. Not a bad flick. Not quite as complicated and intricate as I had hoped, but still not bad. So I think I'll give it an official vote of "sympathy".


This morning... back in the lab, and I think I've narrowed the problem down in LAM/gm to an unexpected receive. A pointer is not getting reset properly in the gm progression engine, and when an unexpected receive (definition below) comes in, the engine tries to search a linked list for a request that is no longer valid (and has actually already been freed). Hence, sometimes it works, and sometimes it doesn't.

I suspect that this is just an error from the "translation" of the TCP engine to gm. i.e., we literally copied the TCP progression engine and gm-ized it; I suspect that this bug is just an error in the gm-ization process. Hopefully, this will be the last Big Bug...

An "unexpected message" is one of the Big Concepts for MPI implementors. It is possible that a user does a send from one rank before doing a receive on the target rank. Hence, the message may actually arrive at the target before the necessary bookkeeping has occurred to set up to receive that message. When the target gets such a message, it files the message in the "unexpected" queue. When the matching receive is finally posted, it first checks the unexpected queue to see if the message has already arrived before going to check the message passing hardware for the message. There's a lot more to it than that, of course, but that's the gist of it.

Hence, when LAM/gm receives an unexpected message, it's checking the list of outstanding receives improperly.
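The two-queue bookkeeping described above can be sketched in a few lines. This is a toy model in Python, not LAM's actual C code: the class name, the method names, and the use of a (source, communicator, tag) tuple as the signature are all assumptions made for illustration.

```python
class Matcher:
    """Toy model of receive matching on one rank (hypothetical, not LAM's code)."""

    def __init__(self):
        self.posted = []      # receives that were posted and are still waiting
        self.unexpected = []  # messages that arrived before their receive

    def on_arrival(self, signature, payload):
        """Called when a message shows up from the network."""
        for i, recv in enumerate(self.posted):
            if recv["signature"] == signature:
                # A receive was already waiting: complete the oldest match.
                del self.posted[i]
                recv["payload"] = payload
                return recv
        # Nobody asked for this yet: file it as unexpected.
        self.unexpected.append((signature, payload))
        return None

    def post_receive(self, signature):
        """Called when the user posts a receive."""
        for i, (sig, payload) in enumerate(self.unexpected):
            if sig == signature:
                # The message got here first: hand it over immediately.
                del self.unexpected[i]
                return {"signature": signature, "payload": payload}
        recv = {"signature": signature, "payload": None}
        self.posted.append(recv)
        return recv
```

The important property is that each queue is searched oldest-first and that a matched entry is removed right away, so nothing ever points at a request that has already been completed.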

I'm off to go squash this fucker, then a visit to Chez Lummy, then back to Looieville. Rock on!

November 1, 2000

But it was my leadership that got you in that dress

This is prep-week for SC2000 -- so most entries are likely to be technical. Deal.

The boys from HP have done it again -- they found a rather gaping hole in my IMPI implementation in LAM. Doh!

Quick explanation:

  • When a "long" message is sent across IMPI boundaries (where "long" messages are defined as longer than an agreed-upon number of bytes), it is broken up into 2 or more packets, where packets have a previously-agreed-upon maximum length. The first packet of a long message is sent "eagerly" (i.e., right away), and is marked as "first of a long" -- it is called a DATASYNC packet. When the receiver gets a DATASYNC, it allocates enough space for the whole message, does some other bookkeeping, and then sends back a SYNCACK telling the sender "go ahead and send the rest of the packets; I'm ready."

  • Messages in MPI are identified by the communicator that they are on (essentially, a unique communications space) and the tag that they use (a user-specified integer that distinguishes between messages). Messages that have the same source, destination, communicator, and tag (for purposes of this discussion) have the same signature -- meaning that MPI cannot tell multiple messages with these same characteristics apart when matching them to receives.

  • When you send a message in MPI, you have to receive with the same signature. Hence, the signature of a send and a receive must agree.

  • Note that the signature has nothing to do with the contents of the message. Two messages with the same signature may contain completely different data, and even be completely different sizes. More to the point -- the message signature is a user-specified set of attributes, so it's up to the user to assign meanings to them; MPI just provides a flexible way to distinguish between different messages with the signature mechanism.

  • MPI has a message ordering guarantee for single-threaded, non-wildcard operations. That is, two messages sent with the same signature must be matched by the receiver in the order that they were sent. If I send message A with signature Z, and then immediately send message B, also with signature Z, right behind it, A must be received before B. If you think about it, it's pretty intuitive, actually.
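To make the long-message protocol from the first bullet concrete, here is a minimal Python sketch of how a sender might split a message. DATASYNC comes from the description above; the "SHORT" and "DATA" labels, the function name, and the simplifying assumption that the "long" threshold equals the maximum packet length are all made up for illustration and are not taken from the IMPI specification.

```python
def packetize(message: bytes, max_packet_len: int):
    """Split a message into (kind, chunk) packets.

    Assumption for this sketch: a message is "long" exactly when it does not
    fit in one packet. Only the first packet of a long message would go out
    eagerly; the rest would wait for the receiver's SYNCACK.
    """
    if max_packet_len <= 0:
        raise ValueError("max_packet_len must be positive")
    if len(message) <= max_packet_len:
        return [("SHORT", message)]
    chunks = [message[i:i + max_packet_len]
              for i in range(0, len(message), max_packet_len)]
    return [("DATASYNC", chunks[0])] + [("DATA", c) for c in chunks[1:]]
```

For example, a 10-byte message with a 4-byte packet limit becomes one DATASYNC packet followed by two DATA packets, while a 3-byte message goes out whole.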

For a longer discussion of MPI, see the MPI Forum web site. For a longer discussion on IMPI, see the IMPI web site.


Something that I hadn't thought about before was that long and short messages have different protocols. Take the following example:

  • Send long message A with signature Z.
  • Send short message B with signature Z.

Given what was discussed above, only the first packet of A will actually be sent to the receiver, whereas the entirety of message B will be sent to the receiver (because it's shorter than the length of one packet). However, A has to wait for an ACK and then the rest of the packets from the sender before it can be fully delivered to the receiver.

My implementation of IMPI didn't take this into account at all -- it just served up messages as soon as they became available. It didn't take into account the fact that long messages may be "in progress" and a short message may sneak in before the long message completed, and thereby violate the MPI message ordering guarantee. Doh!

Hence, I had to spend the majority of today writing diagrams and flow charts, and then implementing a "gate" at the delivery end of IMPI such that it watches for long and short messages, and has a somewhat-complicated state machine to only allow messages by when long messages are not already in progress. If a long message is in progress, the just-received message (even if it's the first of a long, itself) is queued up. When the long at the head of the queue finishes (i.e., we sent the ACK back to the sender, and the sender sent us the rest of the packets in the message), the rest of the queue can progress until either the queue drains or the first of a long is encountered (then we have to send the ACK back to the sender, wait for the rest of the packets, etc.).
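The gate's behavior can be sketched as a small state machine. This is a simplified Python model, not the real impid code: it assumes one gate per sender, and the class, the method names, and the two callbacks (deliver and send_syncack) are hypothetical.

```python
from collections import deque


class DeliveryGate:
    """Toy ordering gate: hold everything behind an in-progress long message."""

    def __init__(self, deliver, send_syncack):
        self.deliver = deliver            # callback: hand a full message upward
        self.send_syncack = send_syncack  # callback: tell sender to continue
        self.queue = deque()              # arrivals held behind the current long
        self.long_in_progress = None      # id of the long being received, if any

    def on_short(self, msg):
        if self.long_in_progress is None:
            self.deliver(msg)
        else:
            self.queue.append(("short", msg))

    def on_first_of_long(self, msg_id):
        if self.long_in_progress is None:
            self._start_long(msg_id)
        else:
            # Even the first packet of another long has to wait its turn.
            self.queue.append(("long", msg_id))

    def on_long_complete(self, msg_id, msg):
        if msg_id != self.long_in_progress:
            raise ValueError("completed a long message that was not in progress")
        self.deliver(msg)
        self.long_in_progress = None
        # Drain until the queue is empty or another long has to start.
        while self.queue and self.long_in_progress is None:
            kind, item = self.queue.popleft()
            if kind == "short":
                self.deliver(item)
            else:
                self._start_long(item)

    def _start_long(self, msg_id):
        self.long_in_progress = msg_id
        self.send_syncack(msg_id)
```

In this model the ACK for a queued long is only sent once it reaches the head of the queue, which is what keeps a later short message from overtaking it.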

Not a simple undertaking.

After all this, I got it working with HP's IMPI implementation. I found a bunch of memory leaks in our proxy agent (the "impid") and fixed up all of those. There's still a ton of "blocks in use" when the impid quits, but those are all from the internals of Solaris and there's nothing that I can do about them. :-\

After fixing those, I released LAM 6.4a6 to HP and MPI Software Technology (the two MPI vendors that we have to demo this stuff with next week at SC2000).


I love bcheck. I can't imagine how I programmed before I discovered it. Go RTFM if you don't know what it is.


One thing that bugs the crap out of me, though, is our implementation of what is called the IMPI server. The IMPI server is basically used as a rendezvous point at the beginning of a run. All the MPI implementations meet there, have some coffee, exchange some meta information, and then go off and shake their booty.

Needless to say, this all contains lots of socket code. The server allows you to specify what port it sits on to listen for the MPI implementations to meet at, or take a randomly assigned OS port. It's frequently convenient to use a fixed port for repeated runs, so that you can just do !! (or up-arrow) in the server and client windows and not have to change the port number in the various command line arguments.

However, sometimes when one tries to fire up the server again, it complains that the socket is "already in use", and you can't reclaim it for several minutes while the OS times out. Result: you have to go change the port number in all the command line parameters, which is a pain.

The thing is, I don't know why it says that the port is "already in use" -- I don't know the conditions that lead up to this. Indeed, take something like sendmail or apache -- it can always fire up on the correct port (25, 80, respectively) no matter what state it was previously shut down in. This suggests that it's not a client action that guarantees that the port will be open, but a server action. But I'll be damned if I know what it is. :-(

If anyone has any insight here (and is still reading this :-), please enlighten me...
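One likely explanation, offered as a sketch rather than a diagnosis of the IMPI server itself: after a TCP connection closes, the side that closed first keeps the connection in the TIME_WAIT state for a while, and by default bind() refuses to reuse that port until it expires. Long-running servers usually avoid this by setting the SO_REUSEADDR socket option before calling bind(). The helper name and defaults below are made up for illustration; the socket calls are the standard ones.

```python
import socket


def make_listener(port: int, host: str = "127.0.0.1") -> socket.socket:
    """Create a listening TCP socket that can rebind to a recently used port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set BEFORE bind(); it lets bind() succeed even if old
    # connections on this port are still sitting in TIME_WAIT.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(5)
    return sock
```

Passing port 0 still asks the OS for a randomly assigned port, so the same helper covers both modes the server supports.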

November 8, 2000

Would you like a mouse pad?

Been a busy week.

We're all here at SC'2000.

Things were not going well at the beginning.

They appear to be going well now.

The schmoozometer has been set on ultra.

Lots of people are lovin' LAM.

Must go.

Miles to code before I sleep.

November 10, 2000

Would you like a mouse pad?

There were some rocky parts, but I think we had a good SC2000 overall.

This is an epic journal entry. Cope.


Sunday

Some of us met in the lab where we gawked at the LAM and LSC shirts that Jeremy picked up on Saturday night. They rocked. The nd.edu network went out around 11:45 (note: this is important for later).

Long flights, a three hour layover in Midway (what a crazy place), arrival in Dallas. Lummy met us at the Dallas airport. ATA lost Arun's luggage, but we waited there for a while anyway. Got in, had dinner (which was Much Fun), and started some slides in Arun/Brian's room.

The hotel has high speed internet access, but nd.edu was down. Luckily, nd.edu's vBNS link was still up, so we could get in via Berkeley or Argonne. So life was still ok -- we could still get to our e-mail and do some work. No biggie.


Monday

We got to the exhibition floor somewhere around 9am. We appraised the situation, said hi to all the good IU and Purdue folks, and started to get our stuff together. The commodity link to nd.edu was still down, so we started downloading LAM and XMPI via Argonne.

Then.. BAM!!!

nd.edu's vBNS link went down.

And stayed down.

Life sucked.

As Arun said in his journal, epics have been written about less. We cobbled together [mostly] working versions of LAM and XMPI from backup and working copies at Berkeley and IU and our laptops. Ugh. We were all cursing Ameritech (supposedly the cause of nd.edu's outage, but I still blame the OIT).

It was a race against time to get all our stuff downloaded, assemble it from the various repositories around the country, couple it together with some missing software from ftp.gnu.org, battle a shaky SciNET (the network on the SC2000 show floor -- it kept going in and out), and get it all working.

The deadline was 7:30pm -- my IMPI demo. We finally got enough downloaded, and I met with the HP people. We were further confounded by the fact that the union folks made us clear the aisles in order to lay all the carpet between all the booths. Hence, I couldn't travel to the HP booth to coordinate with CQ (HP's IMPI guy -- his real name is Asian, and probably unpronounceable to us Americans, so he goes by "CQ"). I finally got over there around 4pm, and we did some testing.

After a bit of futzing, we got it up and running with HP as the master and displaying on his machine (we had to download and install ssh because they didn't have it, and the IU demo machines didn't have telnet (yeah!)). But it all worked out.

After some more battling (battling low battery power, shaky SciNET connections, and pesky sales droids), we got it to work properly with the IU booth machines as the master. Whoo hoo!!

We also converted Matt from Purdue from MPICH to LAM. We reduced the complexity of his Makefiles dramatically, and showed him the goodness of lamboot, mpirun, etc. He said, "I'm a convert!". Another happy customer.

MPI Software Technology (MST), however, wasn't quite as lucky. :-( They didn't bring the right kind of fiber connectors to get on SciNET, and then the local Fry's was out of the right kind. Their IMPI implementation was not quite finished, either. I managed to download a recent copy (nd.edu came back up that evening) of LAM's IMPI distribution tarball. I downloaded a copy to their LAN and helped him get it up and running (they previously had some problems trying to install LAM, but I don't quite know why...). Rossen thanked me, and started debugging.

So my demo went off at 7:30pm and it seemed to go off well. I had a varying size of crowd watching. I was a bit annoyed, though, because literally at the last minute, I got switched to the other ImmersaDesk, and nothing was set up right. It took a good 10 minutes to get it set up right just so that I could bring up my slides. It was somewhat embarrassing because the NIST folks (the people who funded our IMPI work) were standing there waiting for me to start talking. But it eventually turned out ok.

We gave out a surprising number of LAM key chains (they were quite popular!). We walked around a bit and saw a few people, and it was generally pretty good.

We left there, dropped our stuff off back at our room, and went to the Beowulf Bash (which was conveniently in our hotel). It was pretty cool; when we got there, they were announcing that more beer was coming imminently (and it did :-). We chatted with Dave from Myricom (an ND grad) and swapped ND stories. I also chatted with Doug from Paralogic, Don from Scyld, and Dan from Scyld.

Dan chatted with all of us for a while -- they do some really cool stuff in Scyld for their clusters. They have an rfork() call that forks things onto nodes (and an associated rkill()), and do process migration all over the place. They directly load the BIOS to boot linux in 3 seconds, and then get everything else from the cluster master. I don't know all the details, but it sounds good.

I also chatted with Dan about the parallel MP3 encoder that I wrote a while ago (he downloaded it and was amazed that something he downloaded from a .edu site -- particularly the LAM/MPI site -- just worked when he ran ./configure / make with his MPICH distribution). He also wanted to talk about a parallel ogg vorbis encoder, and wants to write a paper about it for Linux Journal (I think it was LJ -- can't recall offhand). This could be really cool. I think we might do it.

I sent Dan an e-mail later saying, "let's do it -- how do you want to proceed?" We'll see what happens. Also, Scyld is interested in LAM -- to do so, we would probably need to ditch the lamd. In such a case, Scyld would have to provide some services like process management (which I think they already do), an out-of-band messaging channel (which might be harder), potentially trace gathering, and name/value publishing. We'll see how this all works out.

After all the schmoozing, Brian and Arun and I had cigars downstairs and had a good chat about all kinds of things. Rock on.


Tuesday

Saw some MPI papers in the morning. Two were about one-sided implementations. The third was about... er... something. One guy presented results with LAM. Whoo hoo!!

We schmoozed all day. We officially ran out of key chains. We got several t-shirts from several companies, including a really nice button down shirt from Veridian (the PBS folks).

We talked to all kinds of people -- so many that I actually don't remember everything that happened that day. It was good. I do remember chatting with the Myricom folks quite a bit, though, and chatting with the PBS folks, NIST people, and HP.

I stopped by to see how MST was doing with IMPI. They were still having some problems, but I didn't have time to debug with him. I came back later and helped some more -- turns out that he wasn't zeroing out the upper 12 bytes in the IPv6 address, so LAM wasn't able to find a match in the source address. Hence, dropped packets. This turned into goodness; the MST/LAM ping pong tests started working.
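A small Python sketch of why stale bytes in the address field would break matching. It assumes, purely for illustration, that the host address travels in a 16-byte (IPv6-sized) field with an IPv4 address in the final 4 bytes and zeros in the other 12; the function names are hypothetical, and the exact layout should be checked against the IMPI specification.

```python
import socket

ADDR_FIELD_LEN = 16  # assumed size of the address field (IPv6-sized)


def ipv4_in_field(dotted: str) -> bytes:
    """Pack an IPv4 address into a 16-byte field, zeroing the other 12 bytes."""
    return bytes(ADDR_FIELD_LEN - 4) + socket.inet_aton(dotted)


def same_host(field_a: bytes, field_b: bytes) -> bool:
    """A receiver that compares the whole field, as a plain memcmp() would."""
    return field_a == field_b


# A sender that fills in only the IPv4 part and leaves the rest uninitialized
# (modeled here as 0xff garbage) no longer matches the address that the
# receiver has on file for it.
good = ipv4_in_field("10.0.0.1")
sloppy = b"\xff" * 12 + socket.inet_aton("10.0.0.1")
print(same_host(good, ipv4_in_field("10.0.0.1")))  # True
print(same_host(good, sloppy))                     # False
```

With a whole-field comparison, a packet from the sloppy sender cannot be matched to a known source, which is consistent with the dropped packets described above.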

Dinner was with the Research@Indiana folks at Fish: An Upscale Seafood Restaurant. All us ND students sat together (except George, who sat with Jesus, 'cause they got there a bit after us). Our conversation was mostly about the GPL, licenses, etc. It was pretty good, all around. A good time was had by all, and the food was excellent.


Wednesday

Got in a bit early to set up the LAM and XMPI demos. We had some real problems. :-( We uncovered some bugs in XMPI at literally the last minute, so I canceled the XMPI demo, and we did just the LAM demo. We actually had some problems there, too -- we had problems making a user MPI program fail in a controllable way (we wanted to show the usefulness of running a LAM/MPI program under a debugger). But we finally got it, and it worked out ok.

However, we did have major problems with the Sun Workshop debugger -- we just couldn't get it to run. gdb didn't work, either. We had 4 UltraSPARC 10 machines to run down here, but they weren't quite set up the way that we were expecting. In particular, we asked for tcsh to be our default shells. But after some painful processes of elimination, we proved that the tcsh that was installed on those Suns was broken -- it caused gdb to fail, and it sometimes caused logins to hang and tcsh CPU usage to go to around 95% or so. VERY annoying, and very difficult to track down -- how often do you actually suspect the shell itself? No, you assume other things are wrong (like your . files, the OS, etc.). But switching to csh fixed everything. I've never seen anything like it before.

But we didn't figure this out before the LAM demo, so we actually ran on nd.edu machines and used gdb (firing up the Workshop debugger involved just way too much time). The demo and talk actually went well, though.

I talked with a whole bunch of people throughout the rest of the day -- we wandered the floor some more, talked to some ASCI people, Tony and company at MST, the Compaq sales guys, etc., etc. During my "booth duty" time, I chatted with lots of people about LAM/MPI and ND (including some people whose sons/daughters are currently at ND), and particularly with a guy from Sweden about LAM who mentioned that he wanted the ability to checkpoint LAM/MPI processes so that he could take his nodes down and do maintenance on his cluster. And then when he's done, restart the process and keep the MPI job going. I initially said no, you can't do it because of the "socket problem" (i.e., you can't checkpoint sockets -- more info below), but then I started thinking about it, especially with respect to the Condor checkpoint library (very cool stuff). We chatted about this for a while, and I ended up putting it in the background because other things were going on.

Spent a bit more time with Rossen and his IMPI. I don't recall what the exact error was, but we found it and fixed it, and after Rossen worked out the rest of the details, it later worked with LAM/MPI in the pmandel code. Woo hoo!

Spent a good amount of time debugging XMPI and LAM's demo (and figured out the tcsh/csh issues). After figuring out the csh problem, LAM pretty much fell in line right away. Brian and I spent the rest of the afternoon debugging XMPI and stayed after everyone left. We fixed up most everything and fixed up some nagging bugs.

Renzo called in the middle of this and we setup stuff for the BC game at ND this weekend. He's in Vegas this weekend, so no family dinner with Lynzo and the chunky monkey. Bonk! :-(

One of the problems was actually an error in Sun Workshop 5.0's <fstream> implementation. VERY ANNOYING. It turns out that using getline(fstream&, string&) to read in a blank line makes eof() start returning true. ARRGGGHHH!!!

Once we figured this out, Brian and I left for dinner (around 8pm). We passed the Myrinet folks, and chatted with them for a while (lots of laughs -- we share the same exact feelings about writing software, users, distributing software, etc., etc.). They recommended an Italian restaurant for dinner.

Brian and I headed out for dinner, and I brought up the checkpoint/restart problem with Condor's library. We talked about this for a while (we were in one of those cool Italian restaurants with paper tablecloths, so we could draw on it with the provided crayons, etc. Very handy!). A good dinner, with good food. We caught a cab back.


Thursday

More LAM pimping. Had more good chats with Myricom/Bob Feldman; seems like we could have quite a future there. Near the end of the day, talked to InfiniBand people about using their stuff as a high speed fabric for LAM. Had a look at some other booths; talked to the NPACI people, who had some REXEC people, and shared some info about LAM (since REXEC has some common elements with LAM).

Went over to the RealWorldComputing booth; they have some cool stuff, including SCore MPI. Meant to look at that last year, but...

Then we talked to a few linux integrators, pimping LAM. One hadn't heard of LAM (bastards!), but the other was Linux Networks. "Hey Jeff... we talked last year" was the greeting. Amazing. And apparently, Dog and Brian had been there about 5 minutes previously. But we had a nice chat and he gave us t-shirts.

Then the expo was over. We cleared our stuff out of the Research@Indiana booth and went back to the hotel. As we were getting on the shuttle bus, I said to Arun, "hey... some Swedish guy came up to me yesterday and gave me a great idea about checkpointing MPI jobs in LAM..." and then I stepped on the bus. I heard behind me, "Hey... you're the LAM guys! We've been meaning to find you!"

Turns out that the Condor grad students were standing right behind us and heard me mention checkpointing and noticed who we were. It further turns out that they've been having similar ideas -- wanting a checkpointable/migratable MPI. So we chatted on the bus, and then chatted some more in the bar before they had to catch a cab back to the airport. REALLY cool stuff, and we think we can do it. There are some delicious complications, but the fact of the matter is: no one else can do this, and it would be truly fantastic if we could do it.

Condor wants a checkpointable MPI and one that they can schedule/migrate around in Condor, and we want a checkpoint/restartable MPI. This could be the start of a really, really cool collaboration. I'll jot down the notes that are in my head in a technical journal entry after this. I'm still brimming over with goodness about this; I actually think we can make it all work (and get a bunch of papers, become famous, and take over the world). How cool is that?

We then met everyone else from the LSC and went to dinner at the Spaghetti Warehouse in the West End. Good food, and good conversation -- a good time was had by all.

And now I'm back here, typing it all in so that I don't forget it.

Now on to the technical journal entry about Condor/LAM...


So all in all, it was a good SC2000.

Yes, I would like a mouse pad.

I forgot to mention that I am Mouse Pad Pimp Daddy. We came to Dallas with 900 LAM mouse pads (300 C, 300 Fortran, and 300 C++). WE HAVE NONE LEFT!!! I think that I personally handed out about 700 of them.

Rich from the OIT told me that I could be a used car salesman.

November 13, 2000

On diatribes and dianetics

A good weekend.

I haven't finished typing up my technical thoughts on LAM/Condor yet; that's forthcoming.

This weekend was good -- I got back to SBN on Friday evening and briefly stopped at Ed-n-Suzanne's for a most excellent tuna sandwich. I then met up with Renzo and we ended up going to Senior Bar, where we ran into lots more people, Stina, Jason [current 'bone section leader], lil'Putt, Jill B., Jason B., Deli, Catherine K., etc. It was a good time. We then hiked to my office to get the parking pass.

The next morning, I was blading to where Renzo and Schleggue were parked when I ran into Jill B. again. During the conversation, my phone rang; it was Renzo, asking where the hell I was. Oops! I was now very late. But I eventually got over there, and Schleggue, Renzo, and I had some good conversation before we ended up heading over to the Putt tailgater.

More fun was had there by all. Tracy eventually joined us (she drove up that morning), and we all headed into the game. We smuggled Renzo and Schleggue into the student section, which was cool. Mike N., Brian B., his fiance Dana, Jeremy S., and Katie M. joined us as well. It was a fun game; a few nervous moments, but we ended up stomping on the hated BC Eagles, so the day ended well.

Thinking that we were smart, we ordered Papa John's right after the game from the stands on the rationale that it would take forever to get the pizza and we'd be at Oak Hill long before it arrived. Indeed, the PJ person told me that it would be 60-90 minutes before the pizza came.

We ran into Vernon by the car, and invited him along. Jason Brost left a message on Schleggue's voice mail (apparently the #$@%@#$% wireless circuits get very busy in SBN during football games, and many calls don't get through, so they get switched to voicemail) indicating that he might drop by. So we decided that we didn't order enough pizzas. I called PJ back (it was 30 minutes after our first order at this point) to see if we could add another pizza to the order. The PJ person told me that the delivery guy had already left. DOH!!

So Tracy and I got out of the car (which was stymied in a long line of cars waiting to exit the Hesburgh library parking lot area) and started jogging to Oak Hill (me), and to PJ itself (Tracy). I didn't beat the pizza guy, but he went to the wrong address anyway, so he ended up coming back not long after I got there (which was a good bit before Renzo et al. arrived in the car). Good exercise to jog from the Hesburgh parking lot to Oak Hill, but God, I despise running...

Tracy and I went to mass at the Basilica the next day, but it was so crowded that we had to stand in the vestibule for the whole mass. After a brief trip to the Grotto, Tracy headed home, and I went to a SC'2000 roundup meeting at Lummy's. We chatted about LAM, SC2000, and future directions. Looks like Jeremiah, Ron, and Brian will eventually be joining the LAM Team. Woo hoo!!

Ron also mentioned an ANSI-izer tool that we could use on the LAM source code. Mmm.... I've been wanting to do that for quite a while. Since there are 900+ source files in LAM/MPI, the standing rule has been to ANSI-ize each file whenever you edit it (it's just too much to go through and do them all at once). But having a tool to do it would be fabulous...

Ron also mentioned the LXR, which we might use to create an annotated, self-referencing hyperlinked version of the LAM source code. That too, would be quite cool. Lummy's big on web-enabled groupware things, so we're probably going to explore a few of those for the LSC as well.

I drove home, took care of a bunch of emails and things that popped into my head while I was driving, and then watched the X files with dinner.

Now on to finishing that technical discussion of LAM/Condor...

November 14, 2000

Candelabra

Ok, so I didn't spend much (any) time on the Condor/LAM stuff yesterday. I spent most of the day finishing up the Password Storage and Retrieval system (PSR) originally written by Dale Southard. We use it with our batch queueing system (PBS) to get AFS tokens when jobs are submitted, and to automatically refresh tokens before they expire so that AFS authentication lasts throughout the entire submitted job.

It's pretty cool stuff -- it uses public/private keys for storing the user's password and whatnot. I've made it fully automake-ized, cleaned it up a bunch, added it to CVS, fixed a few bugs, ensured that it works with both Transarc's proprietary development AFS libraries and the krb4 freeware AFS libraries, and updated the patch to the OpenPBS source code (it's dynamically generated now, too). I finished early this morning and sent it off to Dale for review, and to Bob at PBS so that he can give the patch a once-over.

Hopefully -- that will be it, and I'll be able to release it and get it out of my hair.

Today will be spent answering 3 old LAM emails and working on the LAM/Condor description:

  • Keith from Citifinancial: he has discovered that when in fault tolerant mode, if you mpirun before the lamd's have discovered that one of the other lamd's is down, mpirun will get the wrong information and sit forever trying to spawn a job on a node where the lamd is gone. Hence, deadlock. Need to fix this.

  • Dave from GE: wants to get the native signal/error handler fired when LAM intercepts a SIGSEGV, SIGBUS, SIGFPE. Seems like a reasonable request; need to work with him a little more to get the details right.

  • Patricia from Dec: thinks that she has found a problem with MPI_Intercomm_merge in LAM. Need to check this out; I think she sent a sample program that shows the error.

Off to work...

November 16, 2000

Winter is the finest 7 months of the year in Wisconsin

Been cleaning up LAM code for the past 48 hours. Trying to make it compile with a C++ compiler. You have no idea how painful it is.

And just when I thought I had a handle on it (I got liblam.a and libmpi.a and a bunch of supporting apps to compile cleanly), I moved into the lamd tree.

Oh, pain, pain, pain!

I'm in function pointer hell.

The original Llamas did everything in the pre-ANSI way, which was to simply declare a function pointer with the right return type, but with no arguments in the argument list. I guess this works...?

Part of the problem is that many of the lamd functions are supposed to return function pointers [effectively] to themselves. More to the point, they have to return pointers to functions that have the same signature as themselves. That is, function A has to return a pointer to function A (or a function that has the same signature as A).

After dinking around with this for quite a while, I sat back and thought about it, and it turns out that C/C++ can't do this legally. i.e., you can't declare a function that returns a pointer to a function with the same signature. It's a recursive problem -- trying to do so changes the return parameter type, which then changes the function signature, which then changes the return parameter type... etc., etc.

A more concrete example:

ret_type func_name(arg_list);

The goal is to have a function signature (call it func_sig) that encompasses all of that. However, func_sig must equal ret_type, which, if you think about it, can't be. Hence, C/C++ is unable to describe this abstraction.

This is actually very interesting (to me, at least), because I've never run across something that C/C++ just couldn't do because of its language specification. Sure, there are tons of things that C/C++ is not good at, but I can't recall ever running across something that it just couldn't do because of its language.
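The direct declaration described above really cannot be written, since the type would have to contain itself. A common workaround (not from the original entry, and not tried against the LAM tree) is one level of indirection: wrap the function pointer in a struct and have the function return the struct. A minimal sketch, with every name below hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; none of this is LAM code. */
struct step;                                  /* incomplete for now */
typedef struct step (*step_fn)(int *visits);  /* returns the wrapper */

/* The wrapper: its only member points to a function that returns
   this same struct type, which is what breaks the recursion. */
struct step {
    step_fn next;
};

static struct step step_b(int *visits);

static struct step step_a(int *visits)
{
    struct step s;
    ++*visits;
    s.next = step_b;            /* hand back a same-signature function */
    return s;
}

static struct step step_b(int *visits)
{
    struct step s;
    ++*visits;
    s.next = (*visits < 4) ? step_a : NULL;   /* stop after four calls */
    return s;
}

/* Drive the chain until a function returns a NULL successor;
   returns how many functions were called. */
int run_steps(void)
{
    int visits = 0;
    struct step s;
    s.next = step_a;
    while (s.next != NULL) {
        s = s.next(&visits);
    }
    assert(visits > 0);
    return visits;
}
```

Calling run_steps() walks a, b, a, b and then stops. The cost is that callers get the pointer out of a struct member instead of using the return value directly, so this sidesteps the limitation rather than removing it.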

Anyway, getting tired -- off to bed before I screw up the LAM tree...

November 17, 2000

Extra thrifty lima beans

New version of Mojonation came out a few days ago. I noticed this because I suspected a memory leak in Mojonation because my router would become increasingly slow (although I never checked its memory usage... doh!) and swapping activity would become much more pronounced (I have a loud disk drive in that machine :-).

So I restarted mojonation today, and it told me that there was a new version available on the web site. Among other things, it fixed a memory leak. :-) We'll see how this bad boy performs now...

Additionally, Lummy sent around a hot tip about Linux's hdparm which allows you to tweak the performance of your IDE hard drives. I tweaked a bit on my laptop and got a good amount of speedup. Same for my router -- tweaked a bit and got some improvement (from about 4.something MB/sec to 6.something MB/sec). On my desktop machine, the performance increase was dramatic. I went from 4.83 MB/sec to 25.50 MB/sec! That rocks!


Per request, I created web archives for our LSC staff internal mailing list today. Some pearls of wisdom have been mailed across the list (C++ tricks, location of Friday lunch order files, etc.) and been lost. Web archives fix that.

I also made it a real mailing list instead of a sendmail alias. GNU mailman ROCKS.


I forgot to mention in the journal that a few days ago (or was it last night? Time has no meaning...), I formally released the Password Storage and Retrieval system (PSR) that allows OpenPBS jobs to run with AFS authentication. I also pinged the Condor guys about it (today) since I seem to recall that Dale said something about how they were interested in it. But I could be hallucinating.

Speaking of Condor, I mailed off the huge technical entry about LAM/Condor (curses -- it just occurs to me that I set the category incorrectly on last night's journal entry!) to the Condor folks. Erik says that he'll read it this weekend in depth and discuss it with the other Condors next week.

I wonder if they refer to themselves as Condors as we refer to ourselves as uber-auth^H^H^H^H^H^H^H^HLlamas.


Off to do some LAM debugging, and then more dissertation writing. Gotta get a skeleton together at the very least.

Got to Hell, Costas

The Moog rocks.

I found Arun's The Moog Cookbook in my laptop as a leftover from SC2000. So I had to rip it into MP3s and have been enjoying it all day on my surround sound speakers. It's no "Slut", but it's not bad.

And of course, I'm gonna have to buy the damned CD now. Damn morals... arrghh...

Had a dentist appointment this morning. He tells me that all four of my wisdom teeth are gonna have to come out, as well as one more that's as rotten as a skunk roadkill in Alabama in the middle of July. And baby, that's rotten.


Spent the majority of the rest of the day finishing typing up my notes on Condor/LAM. I'll send those in a separate journal entry.

I did spend a little time looking into anti-virus software for my church. What a scam. You basically have to subscribe to anti-virus software these days -- pay a yearly fee for the privilege of continuing to get anti-virus updates. On the one hand, I can see how the company is continuing to provide a service, and that service should be paid for. But on the other hand, it's more like a tax -- if you run in the Windoze or Mac world, you need to have anti-virus software. Hence, you will have to pay whatever they charge. And it's not like there's tons of competition in the anti-virus world: there's essentially two companies, and their prices after 2 years of subscriptions are essentially a wash.

Don't let me get started on a rant here, but have you noticed how the whole security industry is founded upon the mistrustful nature of humans? Remember ARPANET? (of course, few of us "young 'uns" actually remember the ARPANET, but we've all read about it) There was no security -- everyone just trusted each other. There were no passwords, no secured protocols, no encryption. It just worked.

Such a system is inconceivable these days -- releasing the 'net to the rest of the world has brought out the worst in humans. Online scams, cracking, stealing of information, viruses -- it's all now commonplace and people almost expect it. Or, even worse, they have the attitude, "I don't have any important information -- no one would bother to hack into my system..." But that's a whole different topic; I digress.

So to combat this, the whole virtual security industry sprang up pretty much overnight. It's probably a multi-billion dollar business. And it can't even offer any guarantees. And it's all because humans suck, morally speaking. Especially the high-school punks who break in just for the sport of it, and don't realize that each of their pranks actually costs thousands of dollars. These kids don't even have a realization that what they are doing is wrong. It doesn't matter how easy it is -- it's still wrong. Just because I know that the Smiths leave their front door unlocked during the day doesn't mean that I actually walk into their house and start poking around.

And viruses. What the hell is the point of that? They're not directed attacks. They are potentially wide-spread attacks with massive collateral damage to innocent people who did nothing wrong other than open an e-mail attachment. Why? What could the virus writer possibly derive from that? Some kind of sick, twisted joy at the fact that their virus brought down hundreds of mail servers (e.g., Melissa), or wiped out thousands of hard drives around the world? My dad's hardware store got hit with a virus recently. It instantly went out across his Windoze network and infested 3 workstations. Luckily, the virus was fairly benign -- it only whacked all his .jpg and .gif files. But it could have been much, much worse. And that computer network is his livelihood -- if all that data goes away, he's screwed. All because some high school kid thought it would be fun.

I'm grossly stereotyping here, sure. So sue me, but I'm mad.

This may seem to be a bit of a stretch, but bear with me... I talked to a guy in GE Medical Systems one day -- he was a manager in their product development section. I told him that I was a computer scientist. He said he loved to get newly graduated comp sci majors working for him. He said that without fail, within the first month or so of all new comp sci hires, he would take them down to a hospital and show them real patients whose lives depended upon the software that they wrote. A bug, a simple seg fault, an overflowed buffer, a bad logic test, and someone will die.

So the things that we do on computers (as computer scientists) we tend to imagine all stays "in the computer", and it can be hard to realize that what we do actually affects real life. But it does. The medical systems example is rather extreme, but I even went off in a previous journal entry about how LAM/MPI is used in people's daily lives, and the things that LAM/MPI is used for are in even more people's daily lives. Indeed, my favorite example of one project that uses LAM/MPI is the US Naval Surface Warfare center (SWAPAR). They use some of the MPI-2 dynamic process management features of LAM/MPI to simulate large scale naval battles, and use that to help shape navy tactics and policy.

So what we do is real. It matters. And it matters when that punk releases a virus that goes off and destroys a few thousand random hard drives. It matters a lot to the people whose hard drives it crashed. And it offends me that others in my profession do these kinds of things.

But to end this very random and wandering diatribe on a positive note, the next time you're sitting in a movie theater watching some naval battle and some "military smart" friend tries to explain the actual tactics to you, just nod sagely, touch your nose, and say, "Yes, I know. I wrote the book that wrote the book. I am an uber-author. I am the alpha to this omega. I am a Llama."

Migrating racks of LAM

I've got a bunch of things that I want to put down about the possibility of making LAM/MPI checkpoint/restartable. I'll break it into multiple parts:

  • Some LAM terminology
  • The "checkpointing sockets" problem
  • Possibilities
  • lamd problems
  • Possibilities with Condor
  • Checkpointing without Condor
  • Making this portable
  • Other problems


Some LAM terminology

Since others will be reading this text, I'm going to throw in some LAM definitions that I'll be re-using throughout the text below:

  • lamd: The lamd is the LAM daemon that is run on every host in a "normal" LAM run-time environment. It provides several services to running LAM/MPI jobs, such as process control, an out-of-band messaging channel, key=value global publishing, a scoping mechanism, etc.

  • C2C: An acronym for "client-to-client", meaning that MPI communication goes directly from the source process to the destination process. This is usually via TCP sockets, but can also be via shmem or GM (myrinet), or whatever other network connects to MPI ranks.

  • nsend() / nrecv(): the function calls in the LAM/MPI implementation that are used for the out-of-band messaging channel. That is, MPI ranks can use nsend() and nrecv() to send messages to each other. These messages go from the source rank to the local lamd, then to the remote lamd, and then to the destination rank. Hence, the out-of-band messaging channel goes through the lamd, not through C2C channels.

  • LAM universe: one instance of the LAM/MPI run-time environment. That is, the LAM run-time environment is typically instantiated with the lamboot command and a file specifying a list of hosts. The LAM universe then exists among that set of hosts.

Here's a few assumptions that we make because of the LAM/MPI environment:

  • LAM/MPI is completely user-level. All processes belong to the user -- nothing runs as root. That is, each user has their own set of lamd's and user MPI programs.

  • LAM/MPI currently cannot "overlap" universes except in batch systems. By "overlap", I mean have multiple, different LAM universes of the same user on the same machine. i.e., while a user can run as many MPI programs as they want in a single LAM/MPI universe (and even have them share the same machines safely without interfering with each other), you cannot have multiple LAM/MPI universes on the same machine without a special exception. It will be trivial to make LAM be able to overlap universes in a Condor environment, but I felt that I should mention this.


The "checkpointing sockets" problem

So the Condor project has a library that can checkpoint a running program and start it up again at a later point. It can even migrate it to a different machine. That is, it serializes the entire image of the process (stacks, heap, program, data, etc., etc.) and dumps it into a file (or socket, apparently). The astute reader will recognize that things like open files will present a problem in this scheme -- particularly in the case of migration. i.e., if a process has an open file and it migrates to a new node, what happens with read() and write() calls in the process to that open file on the new node?

The answer is that the library leaves a "proxy" agent (I think their terminology for it is a "shadow process") back on the original node. So read() and write() calls on the new node are proxied back to the original node where the real operation takes place, and the result is piped back to the new node where the program is running.

This is all fine and good for most system calls -- i.e., intercept all system calls, shuttle them back to the proxy agent, and then pipe the results back -- but it doesn't work for sockets. More to the point, it could work with sockets (at least I think it could), but then performance on the sockets will suck, and that is unfortunately important to us in MPI-land (i.e., latency would rise dramatically, and there could be potential bandwidth issues as well, depending on the proxy implementation). Hence, we have "the socket problem".

The solution is to close all sockets before allowing an MPI job to be checkpointed, and then re-establish them after the job has been restarted. Multiple problems arise from this, though. The MPI job will assumedly still know where its sibling ranks were located (and could therefore reestablish sockets to them), but zero or more ranks may have moved -- so trying to establish sockets to the old addresses may not work anymore. LAM needs to become aware of which ranks moved and where they moved to.

This is particularly problematic with LAM's shared memory/TCP scheme. i.e., if rank X migrates, it needs to re-figure out if rank Y is on the same machine or not. Specifically, it needs to re-initialize its entire connection table and either [re]connect its sockets, or [re]setup shared memory to communicate with Y. Even more generally than the TCP/shmem problem, this is definitely going to change the RPI somehow.
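A minimal sketch of that "re-figure out" step, assuming each rank's location can be reduced to a hostname string: after a migration, a rank walks the new location table and picks shared memory for peers on its own host and TCP for everyone else. The struct, enum, and function names are hypothetical and do not correspond to LAM's actual RPI data structures.

```c
#include <assert.h>
#include <string.h>

#define MAX_HOST 64

/* Hypothetical transport choices for one peer. */
enum transport { XPORT_SELF, XPORT_SHMEM, XPORT_TCP };

/* Hypothetical location record: just the host a rank runs on. */
struct rank_loc {
    char host[MAX_HOST];
};

/* Rebuild the per-peer transport table for rank `me` from scratch,
   using only the (possibly new) locations of all ranks. */
void rebuild_transports(const struct rank_loc *locs, int nranks, int me,
                        enum transport *out)
{
    int i;
    assert(me >= 0 && me < nranks);
    for (i = 0; i < nranks; ++i) {
        if (i == me)
            out[i] = XPORT_SELF;
        else if (strcmp(locs[i].host, locs[me].host) == 0)
            out[i] = XPORT_SHMEM;   /* same machine: shared memory */
        else
            out[i] = XPORT_TCP;     /* different machine: socket */
    }
}
```

Rebuilding the whole table, rather than patching individual entries, keeps the post-migration path identical to the startup path.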

There are other issues as well -- how do we start up a LAM job under Condor? LAM currently uses a separate daemon process (the lamd) for a bunch of additional services, such as process control (fork/kill), an out-of-band message channel, and a global database for arbitrary key=value pairs (for MPI-2 MPI_PUBLISH). I guess it also functions as a scope mechanism as well -- providing a "universe" for a single user.


Possibilities

For efficiency reasons, we may want to checkpoint/migrate only some ranks -- not all of them. Hence, there are two kinds of ranks: a rank that will get checkpointed (and possibly migrated), and a rank that will not. It seems to make sense to notify the entire parallel application (i.e., all ranks) when even one rank is checkpointed with intent to exit (e.g., because it will be migrated). So there are even two types of checkpoints: (a) one to just save the process's state (i.e., checkpoint the entire parallel application just for save/backup purposes), and (b) one to migrate one or more of the ranks to a different node.

We'll discuss (b) first (checkpointing for the purpose of migrating), because it lays the groundwork for (a).

Checkpointing for migration: the checkpointed rank

So it seems that LAM needs to take some actions before it allows itself to be checkpointed, and then immediately after it restores from a checkpoint. So if a LAM job can get some signal when it wants to be checkpointed (possibly via nrecv() from the local unix named socket, which we currently implement with SIGUSR2 so that the MPI process knows to go check the socket), a signal handler can be fired, read the message, realize that it wants to be checkpointed, flush and close down and invalidate all its communication channels (including the local unix socket to the lamd [or lamd-like underlying services] sockets, GM ports, shmem, etc.), and then checkpoint itself. This will require at least one new RPI function so that we can keep the RPI abstraction clean and apply this to all of our RPIs -- close/invalidate procs (with the assumption that no new communication will happen before we re-invoke _rpi_c2c_addprocs() to re-add all the communication channels again).
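A minimal sketch of the SIGUSR2 notification, using plain POSIX sigaction(). Note one deliberate difference from the description above: this handler only sets a flag, and the flush/close/checkpoint work would happen later when the library polls the flag. That is a conservative choice made for this sketch (very little is safe to call inside a signal handler) and is not necessarily how LAM does it. The function names are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for sigaction() under strict -std= modes */
#endif

#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set by the handler, cleared by the poller. */
static volatile sig_atomic_t checkpoint_requested = 0;

/* Handler: record that a message is waiting, nothing more. */
static void on_sigusr2(int signo)
{
    (void) signo;
    checkpoint_requested = 1;
}

/* Install the handler; returns 0 on success, -1 on failure. */
int install_checkpoint_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigusr2;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR2, &sa, NULL);
}

/* Returns 1 (and clears the flag) if a request arrived since the
   last poll, 0 otherwise. The caller would then go read the message
   and start closing down its communication channels. */
int poll_checkpoint_request(void)
{
    if (checkpoint_requested) {
        checkpoint_requested = 0;
        return 1;
    }
    return 0;
}
```

In a real library the poll would sit inside the progress loop, so a rank that is busy in user code would not react until its next call into MPI.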

The Condor guys tell me that there is a checkpoint_and_exit() function that, when called, dumps the state of the program out to a file (or a socket), and then exits. Very handy! When the process is restored, it just returns from this function. Ultra cool!

So after returning from this function, an MPI rank must obtain the [potentially new] locations of its sibling ranks. I'm thinking that this will come from an nrecv() from the underlying infrastructure (i.e., Condor) -- it will get an array of information saying where everything is (how to do different RPI's? GM ports vs. TCP addresses/ports, for example? Might have to re-init those as well; re-look for open GM ports, etc.).

That is, the run-time system that potentially moved the ranks in the first place will know precisely where all the ranks are, so it can provide the location information to each rank. Once this information is provided to each rank, the ranks can effectively re-do some of the stuff that they did during startup (contact their local "lamd", establish C2C communications with the other ranks by calling _rpi_c2c_addprocs(), etc. I'll explain why "lamd" is in quotes later).

Specifically, the sequence of events on a single MPI rank will be something like the following:

  • Receive SIGUSR2.
  • nrecv() a message indicating three things:

    • One or more MPI ranks is going to migrate.
    • Whether this rank needs to checkpoint.
    • Whether this rank is going to migrate.

  • Flush all C2C and local "lamd" communications.
  • Close down all C2C connections.
  • Close down connection to the local "lamd".
  • If this rank is to checkpoint:

    • If this rank is to migrate, call checkpoint_and_exit(). The steps below will commence when the rank has been migrated and starts up again, and returns from checkpoint_and_exit().
    • If this rank is not going to migrate, call checkpoint().

  • Re-establish a local socket with the local "lamd".
  • nrecv() a message with new location information on all MPI ranks.
  • Repeatedly invoke _rpi_c2c_addprocs() (and whatever else is necessary, perhaps _rpi_c2c_init()?) to re-establish C2C communication channels.
  • Return from SIGUSR2 handler and continue processing in user code as if nothing had happened.

I think that's essentially it. There's a bunch of details in there, of course, particularly in the re-initializing C2C connections bit, but that should all be resolvable with some clear and potentially clever re-entrant C2C init code. Hence, when we go through this checkpoint/migrate phase and re-establish C2C communications, we essentially re-initialize the C2C subsystem -- do the exact same thing as when we do it the first time. That would probably be the cleanest approach.
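The sequence of events listed above can be sketched as a single function that walks the steps in order. Everything here is a stand-in: the message struct and step names are hypothetical, and the real checkpoint() / checkpoint_and_exit() calls are only recorded as strings because their actual signatures are not given in this entry.

```c
#include <assert.h>
#include <string.h>

#define MAX_STEPS 16

/* Hypothetical contents of the message read via nrecv(). */
struct ckpt_msg {
    int some_rank_migrates;  /* one or more ranks will migrate */
    int i_checkpoint;        /* this rank needs to checkpoint */
    int i_migrate;           /* this rank is going to migrate */
};

/* Records which steps ran, in order, instead of doing real work. */
struct step_log {
    const char *steps[MAX_STEPS];
    int count;
};

static void record(struct step_log *trace, const char *what)
{
    assert(trace->count < MAX_STEPS);
    trace->steps[trace->count++] = what;
}

/* Walk the checkpoint/migrate sequence for one rank; returns the
   number of steps taken. */
int run_checkpoint_sequence(const struct ckpt_msg *msg,
                            struct step_log *trace)
{
    trace->count = 0;
    record(trace, "flush");
    record(trace, "close_c2c");
    record(trace, "close_lamd");
    if (msg->i_checkpoint) {
        /* A migrating rank would exit here and resume on the new
           node; a non-migrating rank just saves state and goes on. */
        if (msg->i_migrate)
            record(trace, "checkpoint_and_exit");
        else
            record(trace, "checkpoint");
    }
    record(trace, "reconnect_lamd");
    record(trace, "recv_locations");
    record(trace, "addprocs");
    return trace->count;
}
```

The only branch is the checkpoint step itself, which matches the observation that checkpointed and non-checkpointed ranks otherwise follow the same path.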

Checkpointing for migration: the non-checkpointed ranks

Upon further thought, I guess there is little difference between checkpointed ranks and non-checkpointed ranks. There could be a slight optimization in that it is really only necessary to send new location information for ranks that have migrated -- the old location information is sufficient for any rank that has not migrated. However, it may make it easier in terms of less complexity to only have one code path -- just receive all new location information.
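For the "only send what changed" optimization, the underlying system would need to diff the old and new location tables. A minimal sketch, assuming a location is just a host name plus a port (the struct and function names are hypothetical):

```c
#include <assert.h>
#include <string.h>

#define HOST_LEN 64

/* Hypothetical location record for one rank. */
struct peer_addr {
    char host[HOST_LEN];
    int  port;
};

/* Compare two location tables entry by entry. Writes the indices of
   ranks whose location changed into `moved` (which must have room
   for nranks entries) and returns how many there were. */
int find_moved_ranks(const struct peer_addr *before,
                     const struct peer_addr *after,
                     int nranks, int *moved)
{
    int i, n = 0;
    assert(nranks >= 0);
    for (i = 0; i < nranks; ++i) {
        if (strcmp(before[i].host, after[i].host) != 0 ||
            before[i].port != after[i].port) {
            moved[n++] = i;
        }
    }
    return n;
}
```

The diff itself is cheap; the extra complexity lands on the receiving side, which would need a second code path for partial updates. That is the argument for always sending the full table.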

However, the question does arise -- when one MPI rank out of a parallel job is migrated, what happens to the other ranks while the rank is in process of moving? There are two approaches:

  • Make the other ranks freeze and wait until the migrating ranks have been restored and C2C communications have been re-established. This certainly makes implementation of the MPI side easier -- the non-migrating ranks can just sit blocking on the nrecv() waiting for new location information. The underlying "lamd" can just delay sending the new location information until the migrating ranks have been restored.

  • Allow the other ranks to continue in the user program while the MPI rank(s) in question migrate. They would have to freeze at the first blocking communication involving the rank(s) that are being migrated. Any non-blocking communication can continue (e.g., Isend, Send_init, etc.), but would have to be "suspended", indicating that they just get put in a queue, and will only be attempted when the destination rank(s) are actually restored from migration and C2C communication has been restored to them.

    This will add complexity to the MPI implementation, and it slightly changes the scheme presented above -- the non-migrating ranks will have to delay the second part of the scheme (i.e., starting with the nrecv() to get the new location information) until they get a second signal indicating that one or more of the migrating ranks are now ready.

    This could get arbitrarily complicated -- take the case where N ranks migrate. What if they get restored at different times? i.e., if one rank gets restored much earlier than the rest -- does the underlying "lamd" signal the other ranks in the job with just the new location information for that one rank? Or does it wait for all N ranks to be restored before signaling everyone? The coarse-grain approach is clearly easier; the question is what actually happens most of the time: does Condor (and others) piecemeal restore migrated processes, or all at once?
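The "suspended" non-blocking sends in the second approach amount to a per-process queue that holds sends aimed at migrating ranks and releases them on restore. A minimal sketch under stated assumptions: fixed-size arrays, a counter standing in for actually putting bytes on the wire, and hypothetical names throughout.

```c
#include <assert.h>
#include <string.h>

#define MAX_RANKS   8
#define MAX_PENDING 32

struct pending_send { int dest; int tag; };

struct send_queue {
    int migrating[MAX_RANKS];                 /* 1 while a rank is away */
    struct pending_send pending[MAX_PENDING]; /* suspended sends */
    int npending;
    int nsent;                                /* stand-in for the wire */
};

void queue_init(struct send_queue *q)
{
    memset(q, 0, sizeof(*q));
}

void mark_migrating(struct send_queue *q, int rank)
{
    assert(rank >= 0 && rank < MAX_RANKS);
    q->migrating[rank] = 1;
}

/* Returns 1 if sent immediately, 0 if suspended, -1 if the queue
   is full. */
int post_isend(struct send_queue *q, int dest, int tag)
{
    assert(dest >= 0 && dest < MAX_RANKS);
    if (!q->migrating[dest]) {
        ++q->nsent;
        return 1;
    }
    if (q->npending == MAX_PENDING)
        return -1;
    q->pending[q->npending].dest = dest;
    q->pending[q->npending].tag = tag;
    ++q->npending;
    return 0;
}

/* Called once `rank` is restored and C2C is back: release its
   suspended sends in order and keep the rest queued. Returns the
   number released. */
int rank_restored(struct send_queue *q, int rank)
{
    int i, kept = 0, released = 0;
    assert(rank >= 0 && rank < MAX_RANKS);
    q->migrating[rank] = 0;
    for (i = 0; i < q->npending; ++i) {
        if (q->pending[i].dest == rank) {
            ++q->nsent;
            ++released;
        } else {
            q->pending[kept++] = q->pending[i];
        }
    }
    q->npending = kept;
    return released;
}
```

Releasing per rank corresponds to the fine-grained variant, where ranks may be restored at different times. The coarse-grained variant would simply call rank_restored() for every migrated rank at once.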

So this raises some interesting questions:

  • With the "easy" model of making all MPI ranks wait until all migrated processes are restored, is there really much of a difference in migrating one rank versus migrating all ranks? They all block waiting for the one migrated rank to be restored anyway, which matters particularly if that one rank can't be restored immediately. For example, the MPI rank that was migrated was running on an idle workstation that suddenly became non-idle, forcing the MPI rank to migrate. But say that there are no more idle workstations available, so this MPI rank must wait in limbo for a while for another machine to become idle. But during this time, the entire rest of the MPI application must also wait. What happens to the accounting records during this time? Are Condor users "charged" with the time that the rest of their MPI ranks are blocking?

  • There is also the argument that most MPI programs tend to operate at least in some kind of lock-step. i.e., the MPI ranks are at least loosely synchronized (e.g., per iteration). So even if the non-migrating ranks are allowed to continue, they'll eventually block anyway because they'll try to communicate with a rank that is in process of migrating (or, by the domino effect, try to communicate with a rank who is blocking trying to communicate with a rank that is in progress of migrating, etc.), which could potentially (and usually!) eventually cause the whole MPI process to block anyway. More to the point: is there anything gained by allowing non-migrating MPI ranks to continue while one or more MPI ranks are in process of migrating? My gut feeling says no.

Hence, it may make sense to really only migrate the entire MPI process at once, or only migrate ranks when it is known that they can be placed immediately. This may not be possible, so it may be easiest to just make all MPI ranks block until migrated ranks are restored and C2C communication is restored. The accounting issue still needs to be addressed, though.

However, I have very little experience in the dynamic process migration area -- I'm curious as to what the Condor folks have to say about these ideas and questions.
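The domino effect mentioned above can be made concrete with a toy model: each rank is either migrating, or is about to block on one particular peer (or on nobody). A rank ends up blocked if it is migrating or if the rank it waits on is blocked. This is a deliberately simplified sketch, and the single "waits on" peer per rank is an assumption made for illustration, not a claim about how real MPI programs behave.

```c
#include <assert.h>

/* waits_on[i] is the rank that rank i will next block on, or -1 if
   it does not need to communicate. migrating[i] is 1 for ranks that
   are currently being migrated. Fills blocked[] with 0/1 and returns
   the number of ranks that end up blocked. */
int count_blocked(const int *waits_on, const int *migrating,
                  int nranks, int *blocked)
{
    int i, changed, total = 0;
    assert(nranks >= 0);
    for (i = 0; i < nranks; ++i)
        blocked[i] = migrating[i] ? 1 : 0;
    /* Propagate until nothing changes: a rank waiting on a blocked
       rank becomes blocked itself. */
    do {
        changed = 0;
        for (i = 0; i < nranks; ++i) {
            int w = waits_on[i];
            if (!blocked[i] && w >= 0 && w < nranks && blocked[w]) {
                blocked[i] = 1;
                changed = 1;
            }
        }
    } while (changed);
    for (i = 0; i < nranks; ++i)
        total += blocked[i];
    return total;
}
```

With a chain where rank 0 waits on 1, 1 on 2, and 2 on a migrating rank 3, all four ranks end up blocked, which is the loosely-synchronized case the argument is about. If no rank depends on the migrating one, only that rank is counted.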


Checkpointing for saving state (no migration)

For checkpoints that do not involve migration -- i.e., checkpointing just for the purpose of saving state -- it may or may not be necessary to close all communications channels. On the one hand, no rank is migrating, so it would seem silly to close and re-establish communications with the exact same location information. On the other hand, if we want to re-start the checkpointed process later, the re-started process will return from the checkpoint() (notice -- not checkpoint_and_exit()) function. If we re-start the process on an entirely different set of nodes (e.g., a PBS or Condor job is checkpointed and then later fails because someone powers off a node, so we restart the job in a later PBS/Condor job -- the ranks will be on entirely different machines and have a different topology), we will need to re-learn the location knowledge and re-establish C2C channels.

Using this argument, it's probably better to treat a backup/save checkpoint (even with no migration involved) as a checkpoint with all ranks migrating (per the procedures shown in the previous section), so that all ranks close all communications channels and then receive new location information from the underlying system (lamd/Condor) and then re-establish all communication channels.

This would allow the most flexibility for re-starting a job. That is, even if the job does get restarted from a set of migration files, it doesn't matter if it is on the same set of nodes or not -- it will re-establish all C2C communication channels and continue from where it left off.


lamd problems

The lamd is really helpful in standalone environments. But does it really make sense in a Condor (or other run-time system) environment? We mainly use the lamd for the following kinds of services:

  • Process control (startup, shutdown, abort)
  • Out-of-band messaging
  • key=value publishing
  • File transfer (mainly for non-uniform filesystems)
  • Scoping mechanism

Normally, each MPI rank is associated with a single lamd that is located on the same machine. They communicate through a named unix pipe. When the lamd sends a message to an MPI rank, it pushes a message down the socket and then tweaks the process with SIGUSR2.

Note that there may be multiple MPI ranks per lamd -- it is common to run multiple MPI ranks on a single machine. In this case, they all share a common lamd (although the MPI ranks don't know or care that they are sharing a lamd).

It should also be noted that the out-of-band messaging can also be the primary message channel for an MPI job. That is, C2C communications aren't necessarily set up. It's a run-time flag to mpirun -- the user can specify to use the lamd for all communication instead of C2C. Although this imposes extra hops on all messages (even MPI_Send / MPI_Recv messages), it can provide true asynchronicity for non-blocking messages. That is, LAM/MPI is single threaded, so it can only make progress on messages while it is inside of LAM/MPI function calls. In the "lamd" mode, once a message is given to the lamd, the lamd is a separate process, so it can make progress on the message independently of the main thread of control in the user program. While this may seem counterintuitive and incur too much extra overhead, several LAM users who rely on non-blocking message passing have told us that they can get significant speedup using this mode as opposed to C2C.

So LAM's normal model is that each MPI rank has a single lamd that it is associated with. This may be problematic with Condor (or any other run-time system) for multiple reasons:

  • If the MPI rank ever migrates off a given machine, the lamd will also have to be migrated with it. Hence, both processes will need to be treated as a single process by Condor, which I assume would create some special exceptions in the Condor code. This is not attractive.

  • Even worse, if multiple MPI ranks are sharing a single lamd, if one of those MPI ranks migrates and the others do not, what happens to the lamd? It would seem that we need to create a new one on the machine where the MPI rank migrates to, and then have the network of lamd's reorient themselves to include the new lamd. Or, if the MPI rank migrates to a node that already has a lamd, it can just join that lamd, and no new lamd is necessary. But this would seem quite complex to implement!

Hence, it would seem desirable to be able to ditch the lamd when running in some other run-time environment (such as Condor).


Possibilities with Condor

The upshot of our short conversation with the Condor folks is that a LAM/MPI program will need to interact with their "starter" somehow, or have a custom LAM/MPI starter written that knows things about MPI programs.

My first impression (and admittedly, I don't know much about how Condor works) is that the least-cost solution here would be to have a custom LAM/MPI "starter" that can mimic the lamd services. It would seem that Condor must already provide most of what we need; the starter can simply provide a translation between what LAM/MPI expects and the native Condor underlying services. Hence, the majority of LAM/MPI wouldn't need to change -- it just opens up a local unix socket to what it thinks is the lamd, but in reality it's a Condor "starter" (or whatever).

More specifically, some of LAM's calls such as nsend(), nrecv(), rploadgo(), rpdoom(), etc., can probably translate to Condor semantics without too much trouble. So if Condor can open a socket and effectively have an nrecv() implemented locally, it can receive local packets from MPI ranks, and then process and interpret them.

Admittedly, this would put more of a burden on the Condor folks, but I think we could help out a bit as well. :-)


Checkpointing without Condor

In a non-Condor environment, it would still be highly desirable to be able to checkpoint. Can we do this without the rest of Condor? I would assume that we could make it so. I think that the key for doing this outside of Condor would be a new pseudo-daemon in the lamd to handle these kinds of things -- to furnish the new location data, for example. We'll probably also need a command like rempirun to restart a checkpointed job. Possible scenarios include:

  • A separate LAM executable (mpicheckpoint) that can checkpoint a running MPI program to a set of rank files. The checkpointing will follow the same scheme as outlined above. A run-time flag can specify whether the job should stop or continue after the checkpoint. It might also be desirable to provide a LAM-specific API call for this as well (MPIL_Checkpoint(char* directory, int stop_flag) or something). Note: we're not talking about migrating here; see below.

  • A separate LAM executable (rempirun) can take a set of rank files from mpicheckpoint and restart the job on an arbitrary set of nodes. Note that this would not have to happen in the same LAM universe -- it could happen much later, for example, after the LAM universe that the original job was running in has been destroyed and a new one takes its place. Some extra condor-checkpoint-library bootstrapping is probably necessary to restart the job, but after that, it just uses the lamd to get the new location data, etc., just like it would in a Condor environment.

  • A separate LAM executable (lammoverank) can migrate one or more ranks to different nodes within the current LAM universe. This can work exactly the same way as it does in Condor. As mentioned above, this will require an extra pseudo-daemon in the lamd to know where ranks are moving and provide new location data to all the ranks.
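
To make the "extra pseudo-daemon" idea a bit more concrete, here is a toy sketch of a rank-location registry: ranks publish where they live, a migration is just a re-publish, and interested parties get told. Everything here (the class name, the `(hostname, port)` address format, the callback style) is an assumption made up for illustration; nothing like this exists in LAM today.

```python
from typing import Callable, Dict, List, Tuple

Location = Tuple[str, int]  # (hostname, port) -- an assumed address format


class RankLocationRegistry:
    """Toy stand-in for a pseudo-daemon that tracks where each rank lives."""

    def __init__(self) -> None:
        self._locations: Dict[int, Location] = {}
        self._listeners: List[Callable[[int, Location], None]] = []

    def subscribe(self, callback: Callable[[int, Location], None]) -> None:
        """Register a callback that fires whenever any rank's location changes."""
        self._listeners.append(callback)

    def publish(self, rank: int, location: Location) -> None:
        """Record a rank's (new) location and tell every listener about it."""
        self._locations[rank] = location
        for callback in self._listeners:
            callback(rank, location)

    def lookup(self, rank: int) -> Location:
        """Return the most recently published location for a rank."""
        return self._locations[rank]
```

A migration of rank 0 from one node to another would then be two `publish(0, ...)` calls in a row, with `lookup(0)` returning the second location afterwards. A real version would have to deal with messages in flight during the move, which this sketch ignores completely.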


Making this portable

There is a desire to run LAM/MPI in other run-time environments (as alluded to in comments above) in addition to Condor. Scyld is an obvious target, since they have their own set of process control stuff (bproc) and whatnot. Scyld might be a bit more challenging because they seem to only support process control, not the other services that we need. Someone (Jeremiah?) suggested that we might be able to get away with one lamd somewhere in the system; I'm not quite sure that this would work, but it will definitely take a) further thought on the issue, and b) investigation of bproc and the rest of the Scyld infrastructure.

PBS is another obvious target (as well as any other batch schedulers). It would be nice to ditch the lamd in a batch environment, and rely on the batch system's underlying services for process control (the benefits are obvious, not the least of which is job accounting and guaranteed cleanup, a notorious problem for non-native support in batch schedulers), but the out-of-band messaging and global publishing still need to happen as well. PBS's TM can do the process control and can do the global publishing too (IIRC), but I don't think it provides any kind of out-of-band messaging. That will require more thought... Our initial ideas about PBS/TM (from a while ago) didn't include ditching the lamd, but perhaps this is a bit more natural extension of making this whole concept portable (i.e., replacing the lamd with underlying services, when available).

Or will a "one lamd" idea work here, too? Not sure how such an idea will work, but it's worth thinking about.

The real trick, however, will be to do this in a run-time-decidable way. That is, it would be nice, at run time to decide which underlying service to use -- native lamd, Condor, PBS/TM, Scyld, etc. That is, a user can take the same executable (assuming that their LAM was compiled for support for all of them) between all systems without having to recompile/relink. That would be nice, but not an absolutely necessary goal.
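
One way the run-time decision could look, as a rough sketch: probe the environment in a fixed order and fall back to the native lamd. `PBS_JOBID` is set by PBS inside a job; the Condor and Scyld variable names below are placeholders invented for this sketch, as are the backend labels and the function name.

```python
import os
from typing import Mapping, Optional

# Ordered probes: the first environment variable that is present picks the
# backend. Only PBS_JOBID is a real variable; the other two are hypothetical.
BACKEND_PROBES = (
    ("PBS_JOBID", "pbs-tm"),
    ("LAM_SKETCH_CONDOR", "condor"),
    ("LAM_SKETCH_SCYLD", "scyld"),
)


def select_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the underlying run-time service, defaulting to the native lamd."""
    env = os.environ if env is None else env
    for variable, backend in BACKEND_PROBES:
        if variable in env:
            return backend
    return "lamd"
```

With an empty environment this returns `"lamd"`, and with `PBS_JOBID` set it returns `"pbs-tm"`. The point is only that the choice happens at startup, so the same executable can move between systems without a recompile or relink.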

Upon a moment's reflection, from the proposed schemes above, the difference between native lamd and Condor would not be known to the MPI process -- if Condor truly emulates the lamd, there's no need to know. Whether or not the LAM has been compiled with checkpoint/migrate support is an entirely different issue (because I assume we'll need to get some Condor headers/libraries and some #if code for the checkpoint/migrate LAM code).

In order to make this workable for PBS/TM and/or Scyld (i.e., to keep the abstraction level clean), we'll have to implement lamd services in the lower levels of PBS/TM and Scyld as well. Hmm. I guess we'll have to cross the line into the root-level services earlier than we thought!

For PBS/TM, all the TM stuff is in one file, so extending that should be easy. But to do true messaging, it may take a bit more -- we may have to do some actual hacking in the MOM itself. It could be as simple as adapting the lamd's to fit in the MOM. We'll have to see. As for Scyld, I have no idea. :-)


Other problems

  • Voluntary vs. involuntary checkpointing. Is there much of an issue here? Probably not -- I don't see why involuntary checkpointing can't work just like voluntary checkpointing.

  • How about open files and whatnot? Particularly after a migration? Condor can proxy this stuff back to the original node, but does this make sense in a batch situation? What if we don't own those nodes anymore? This might be ok for Condor, but what about PBS / Scyld? It would seem bad for PBS. :-(

  • Are we trying to solve the "node goes down" problem? i.e., involuntary checkpoint at timed intervals (to files, not sockets...?), and if a node crashes at some point, we can rempirun the set of checkpoint files (which would seem highly desirable). But what about open files, etc.? If the node crashes, there's no Condor proxy to take the request back to on the original node ('cause it's down). So does checkpointing with the Condor library solve the "node goes down" problem? Or perhaps only in a limited scope (i.e., your open files won't be preserved)...? Granted, anything outside of the MPI API is outside the scope of what we need to worry about, but this does seem to be a "real world" concern that would be good to take care of. Even if it just means setting open file descriptors to -1 or NULL upon restoration of the process so that the job can know that the files are closed or something.

  • So what happens to lamboot and lamhalt under Condor? Do they effectively become no-ops (we can't ditch them, because users will still invoke them)? And then mpirun talks to various Condor services (for example) to do the things that the lamd would have done? One of the current functions of mpirun is to serve as a rendezvous point for the ranks so that they can all become aware of each other. Does this still need to be? It would seem that it would need to be changed somehow -- since the migration problem changes all the location information anyway, Condor itself must provide a way to get this information, potentially making mpirun's rendezvous point irrelevant.

  • Does this (running under Condor, PBS/TM, or Scyld) make sense with the MPI_COMM_CONNECT and MPI_COMM_ACCEPT models? i.e., how does a Condor job get more nodes? Or how do multiple Condor jobs join together? In vanilla LAM, only jobs in a single universe can join together. Will this be true in Condor (etc.)? More to the point:

    • What would it mean to allow multiple LAM universes to join together? What about the obvious security concerns with this?

    • How will a universe be defined in Condor? Will you have to (for example) ask for M nodes and start M different jobs and have them CONNECT / ACCEPT to each other?

    • If this is the case (still only connect within a single universe), is CONNECT / ACCEPT useful within a Condor context?

    • The same question applies to SPAWN -- does the user have to request a maximum number of nodes ahead of time? Or, when SPAWN is invoked, does this have to allocate nodes from Condor dynamically and then spawn on them? This scheme would seem attractive, but it may cause the MPI application to hang while waiting for nodes to become available?

    • In a dynamic environment like Condor, is dynamic processing useful at all, given that a SPAWN may have to block waiting for the underlying system to make nodes available? Does the whole MPI application (or, at least the ranks who invoke SPAWN) have to block waiting for this to happen? (no one has answered this yet -- it's not even defined in the MPI standard)


Summary

So these are my initial thoughts. In spite of all the unanswered questions listed above, I believe that this can work. Some trips Wisconsin<-->South Bend and some teleconferencing and a ton of e-mail will likely be necessary. But this is ultra cool stuff, and will be immediately useful to lots of people in the real world. Plus, we'll get lots of papers out of it, become famous, and one or two people might get degrees out of it. :-)

November 19, 2000

17 days and a wakeup

We effectively stomped on Rutgers yesterday. Woo hoo!!

We looked a bit sloppy at times; their quarterback was quite good, actually, although he was a bit too hasty and kept taking high-risk passes. So we kept intercepting them. :-) Aside from a few nervous points, it was a fun game to watch. Go Irish!


Spent some of yesterday playing with modules in LSC's AFS space. I preliminarily made up modules for PBS, LAM, MPICH, Workshop, and Forte6. We will probably make up modules for all the GNU stuff (although they'll be broken up into several modules -- the compilers and auto* and libtool, Gnome, and the rest of the GNU stuff, or somesuchlikethat). Lummy wants to go a bit hog wild and have our own copies of latex, X, etc. We'll see -- we've been trying to have a higher bandwidth discussion about this for a few days and keep missing each other.

This all precipitated because I'm genuinely worried about having all the GNU file utilities first in our path rather than the Solaris ones. If I want to work in Linux, I'll work in Linux. If I want to work in Solaris, I want to work in Solaris -- not Linux. I've been burned a couple of times by having the GNU stuff first in my path (ar, ranlib, make, etc.) rather than the Solaris stuff, and I don't want that to be. It just scares me, 'cause we'll end up coding for GNU-specificisms without even knowing it. And that will suck (that's one of my pet peeves: people who code for GNU-specific extensions and say, "just use gcc" everywhere. They don't understand what they are saying. Although I have personally discussed this with many people, I'll put it here in my journal to get it on the record: take the Alpha processor, for example. When you switch from Tru64 to Linux, you lose at least 10% of the performance [there are hard numbers to prove this]. And when you switch from custom compilers to gcc you lose at least another 10% of performance [I'm speaking of high-performance applications, of course]. gcc just doesn't have the punch on all platforms. Portability is only half the story).

Anyhoo, we're going to split it up somehow. The exact mechanism remains to be seen. Modules are pretty nice, actually, and surprisingly easy to set up and maintain. We've been meaning to do this for quite a long time, and we really should have done it a while ago.


Saw the movie "Bounce" with Ben Affleck and Gwyneth Paltrow last night with Janna and Tracy. Yes, it was a concession to the ladies (who wanted to see it). I'll give it a sympathy, but that doesn't really rate the quality of the movie because it's just not my kind of movie. So if you want an honest rating, go see it yourself.


Today will be spent putting together a real skeleton for my dissertation. I've started this a few times, but really need to carry through and actually put all the .tex into one place and start shaping it up to be a real dissertation.

Off to write... whoo hoo!

November 21, 2000

Who needs green beans?

Dentistry, while painful, is interesting.

Here are some interesting factoids that I learned this morning while having a cavity filled:

  • Dentists' drill tips are made of a diamond/metal carbide. They spin at many thousands of RPMs, and when combined with a little spray of water, vaporize whatever they come into contact with.

  • The jaw nerves are split in half. So when they give you novocaine, it only numbs up half of your jaw/face. Right now, the right half of my chin all the way up to (and including!) my right ear is numb.

  • Modern cavity fills are multiple layered: I forget the name of the first one, then a "primer" layer, and then a bonding agent. The bonding agent (IIRC) is light activated -- so they have a "light gun" that shines a many-watt highly-intense light on the tooth to make the bonding agent cure. There's an orange shield around the nozzle so that the dentist can watch/direct the light without being blinded.

  • It's difficult to talk when half of your tongue is numb.

  • We have nerves in our teeth only for the sake of knowing when something is wrong. i.e., the nerves in our teeth only serve as warning indicators. Sharks do not have nerves in their teeth. Godgineer must have figured that since sharks lose teeth all the time (and promptly grow new ones to replace the lost ones), it would be less efficient to put the warning indicators in there. Since we humans only get two sets of teeth, having the "failure alert system" was a good engineering decision.

  • It feels really, really weird to drink something and only feel it on half of your tongue.

  • Dentist drills can go at different speeds, not only for the different types of work that they do, but also because it is possible to resonate within the jaw and within specific teeth. Hence, if a patient starts resonating with a given drill, the dentist can switch to a drill with a different set of harmonics. (No I'm not making this up; it happened to me this morning!)


My sister is hosting the big Squyres Clan Thanksgiving Dinner this year; just about everyone in the family will be there. She came up with the bright idea early yesterday afternoon to rent a PlayStation "for the boys", and called my brother-in-law at work to go rent one (apparently his work is literally right across the street from Blockbuster). So he popped across the street and found a PS. But wait.. it wasn't a PS... it was a PlayStation2!! They apparently only have one, and someone had returned it literally 5 minutes previously. So Rob rented it along with several games and took it home to hook it up.

He didn't go back to work.

It should be much fun!


I've been playing with modules in the LSC AFS space. I have them pretty much stable and working now. There are two distinct sets of modules: ones that are cross-platform (e.g., LAM, MPICH), and several more that are platform-specific (e.g., we only have SSL/pine compiled for sparc-sun-solaris2.6 and sparc-sun-solaris2.7). Loading the lsc module loads a default set for a given architecture -- the default cross-platform ones and then a platform-specific lsc module that loads any platform-specific modules that we have for that platform.
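
The resolution logic is simple enough to sketch. This is Python for illustration only (real modulefiles are not written this way), and the module names and table layout are guesses based on the examples above, not the actual LSC setup.

```python
from typing import Dict, List

# Hypothetical module tables; the real LSC module names may differ.
CROSS_PLATFORM: List[str] = ["lam", "mpich"]
PLATFORM_SPECIFIC: Dict[str, List[str]] = {
    "sparc-sun-solaris2.6": ["ssl", "pine"],
    "sparc-sun-solaris2.7": ["ssl", "pine"],
}


def modules_for(arch: str) -> List[str]:
    """Default cross-platform modules first, then anything built for arch."""
    return CROSS_PLATFORM + PLATFORM_SPECIFIC.get(arch, [])
```

So `modules_for("sparc-sun-solaris2.6")` gives the cross-platform set plus SSL and pine, while an architecture with no entry in the table just gets the cross-platform set.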

All in all, it's pretty neat stuff. Kinda annoying, though, since aliases aren't inherited by the shell. So you have to go through some extra hoops and hurdles to make that work right.

It's also kinda annoying that the IRIX machines on campus have their own modules, but use a much older version of the modules package. Hence, in order to interoperate -- and yes, this is counter-intuitive -- we have to use the older modules version, not the newer version. Go figure (using the newer module version with the older modules causes seg faults, but using the older module version with the newer modules works fine). So that causes some extra hoops and hurdles as well. Ugh. It would be nice if there was one uniform version of module stuff across all of campus.

But they certainly do make it easy to switch between versions of things, and make maintaining packages easier because each package has its own discrete module.

December 1, 2000

Dave, I'm not really in the box

What, you expected some kind of regular entries? Pshaw.

As usual, my travel has interrupted my regular flow of journal entries. Here's one that summarizes the last week or so...


Flew back to Philly to my 'rents place with Tracy for Thanksgiving. We flew via Cleveland which was apparently getting snowed in when we arrived. So we diverted to Cincinnati and came back to Cleveland before being able to land. Luckily, the good folks at Continental were able to get us on another flight to Philly that night, so all was well.

Had a big clan gathering at my sister's in Allentown the next day, which was pretty cool. Everyone was there with the exception of my cousin Maggie. The PlayStation 2 was killer, too. I whomped my younger cousins at Tekken Tag Team, too, which was very cool, 'cause they have traditionally been much better than me at video games (go figure). "Who's your daddy?!?!", "Hey Chris, let me show you how to DIE", and "You're so weak, your momma tried to give you up for adoption and the Lemming family wouldn't even take you" were all popular phrases during this session.

We watched the traditional annual showing of "Airplane"; a Squyres classic. Brilliant movie.

  • "Give me Ham on five, and hold the Mayo."
  • "No thanks, we gave at the office."
  • "The cockpit? What is it?"
  • "That's when my drinking problem started."
  • "It's a damn good thing he doesn't know how much I hate his guts."
  • "Looks like I picked the wrong week to quit sniffing glue."
  • "I've got to concentrate... concentrate... concentrate..."


Spent part of Friday working on mom's computer at home hooking up DSL to it. Part of the problem is that mom's Windoze installation is really broken somehow. It takes over 6 minutes to boot (i.e., to get to the Windoze login popup). It seems like it's timing out while looking for something during bootup, but I never figured out what it was. The first time I installed the DSL software, it was really flakey. It's with Verizon DSL, and they use this weird (IMHO) PPP-over-ethernet stuff. So you still have to "dial up" to get connected to DSL. And the IP address comes over that, too, so it's not regular DHCP. This kinda killed my plan to hook up my Linux laptop to their DSL and do things like check mail, etc.

There might well be some PPPOE Linux software out there; I haven't had a chance to check yet.

Anyhoo, I got it working more-or-less properly, but it didn't help much that the hard drive on that machine is failing. Every time I ran scan disk, it would find more bad clusters. Not good. The PPPOE installation finally got so flakey that I removed mom's pre-existing Netscape and the whole PPPOE installation and started from scratch (gotta love the non-deterministicness of Windoze!), and that seemed to make it much happier. I installed the ZoneAlarm firewall, too, which was kinda neat. Since they don't have a fixed IP, and aren't "connected all the time", the chances of an attack are smaller, but are still there, so I installed it. It's not perfect, but it's not a bad firewall.

I was going to spend some time on Saturday trying to figure out why it takes so long to boot that machine, but I caught some weird 18 hour flu that's going around the Northeast right now, and it killed me for the whole day. Tracy and I were supposed to go out to a nice dinner that night, so that didn't happen, either. Bonk. But the Irish had a convincing win over USC to round out our season. So there's probably a pretty good chance that we'll go to a bowl. Woo hoo! I've been hearing Fiesta, but I haven't been following it too closely.

Dad and I used the small business pricing from Dell to order a new Windoze computer for Tracy (she currently has a P1-133 with 24MB of RAM, which is painfully slow) under the "Wayne True Value" auspices (since my dad owns a small business, they apparently don't check, but it is legal since my dad bought it, and I do lots of consulting for him). It's actually supposed to come today (1 Dec), according to UPS.

I also got plane tix back to Philly in about 2 weeks to install DSL at his store. He's got a LAN of machines that need to be adjusted and whatnot such that DSL will be safe to install (need to change all the IPs, harden up the unix server a bit, etc.). I won't be able to use linux as a firewall (my initial plan) because of the PPPOE issue, so I'll have to use Windoze 98's internet connection software (blech; although I will be looking for some Linux PPPOE software so that I don't have to do this).

Tracy and I flew back on Sunday morning without any major incidents.


I drove up to ND on Monday morning to get there in time for Kevin Barker's MS defense. It was all about the percolation model. Pretty neat stuff. It still performs poorly right now, but it's still in the early stages of development. He passed! Whoo hoo!!

Went to dinner with Kevin, his parents, and Shannon. It was good to see Shannon again; she's funny. She's also starting a PhD program (this Spring, IIRC) in Ohio; rock on! Dinner was good (at Basil's). Then I went back and hung out with Suzanne and Ed and watched a few episodes of Level 9. Not a bad show; it's got a good mix of techno geek stuff and action.

Kevin turned in his thesis the next day, and ended up staying that night as well, so a bunch of us took him out to the Mishawaka Brew Co for a few beers: Mike N, Dog, Jeremy, Ron, Shannon. Great conversation all around, and lots of laughs. Good to see/hang out with Kevin again. Perhaps he'll come back to ND; that would rock.


@#%#@%#@$ I had forgotten to fill out my reimbursement form for SC2000, so I had to wait until the CSE offices opened in the morning before I could leave to drive back home. I had to drive straight to my church where I'm doing some volunteer consulting with them for their various computer things (as I think I've mentioned before, they have a LAN with about a dozen windoze machines on it). We talked some more about DSL (we're putting it up to the budgeting committee in about 2 weeks), ordered a site license for Norton Anti-Virus, and discussed a few other random things. The anti-virus media should arrive in a few days; we planned on me coming back next week to install it on all the machines.

Planning for DSL involves a surprising number of details:

  • Moving their web site; it's on some local Louisville hosting service now, but DSL comes with 20 free MB of web hosting space
  • Moving their DNS name for the same reason
  • Re-training their web masters to use the new location, not the old location (should be easy, but...)

  • Change the IP addresses that each machine has; I think they're random right now. I'll have to change them to be 192.168.x.y or 10.x.y.z or whatever.
  • They use AOL for all their mail now; this DSL service comes with 20 free mailboxes.
  • Changing all their e-mail addresses to be @churchofepiphany.com (and decide what the format of the e-mail IDs will be; an internal policy decision for them).
  • Change everyone over from AOL browser/e-mail to Netscape/IE (haven't decided yet) and some mail client, likely Netscape/Outlook/Outlook Express (haven't decided yet). This will involve re-training everyone.
  • Ensuring that everyone's address book and web bookmarks can be snarfed from the AOL software to the new software.
  • Setup/ensure that the dialup works for the one workstation that they have off site (this DSL provides a free dialup for remote users).

  • Shut down the AOL accounts fairly quickly after this all happens to prevent the two-email-address syndrome.
  • Shut down the Juno account that the off-site user is currently using for the same reason.
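
For the IP renumbering item in the list above, a small sketch of how an address plan could be generated with Python's standard `ipaddress` module. The 192.168.1.0/24 default, the host names, and the choice to hold back the first usable address for the router are all assumptions for illustration; the real plan hasn't been decided.

```python
import ipaddress
from typing import Dict, List


def assign_private_addresses(hosts: List[str],
                             network: str = "192.168.1.0/24") -> Dict[str, str]:
    """Hand out consecutive addresses from a private range, one per host."""
    net = ipaddress.ip_network(network)
    if not net.is_private:
        raise ValueError(f"{network} is not a private range")
    addresses = iter(net.hosts())  # skips the network and broadcast addresses
    next(addresses)                # assumed: first usable address is the router's
    plan: Dict[str, str] = {}
    for host in hosts:
        plan[host] = str(next(addresses))
    return plan
```

For a dozen machines a /24 is plenty. Writing the resulting table down is part of the documentation job mentioned in the next paragraph.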

As a good engineer, I have to document everything that I do for the above. Most importantly, however, what needs to be documented is the firewall configuration. This DSL service comes with a Netopia router which can also act as a DHCP server and firewall. It's supposed to be easy to configure, but we'll see. This needs to be documented because I won't be there forever.

Some other projects that they may wish to investigate after the DSL stuff gets all happily installed (probably mid-late January):

  • Shared fax on the LAN (should be easy, I think).
  • Group scheduling of resources (conference rooms, the community center, etc.).
  • Random training classes, perhaps even some "intro" and "advanced" kinds of classes.


Had all four of my wisdom teeth out yesterday, as well as one more molar (he wanted to keep hanging out with the wisdom teeth, apparently). I was knocked out for the procedure. I think I pseudo-surfaced in the middle of it, 'cause I felt some rather strong forces (not pain, just pulling, etc.) on the right side of my jaw. It only took about an hour, actually. Apparently, my upper right wisdom tooth gave them a few problems (nothing major), but everything else went fine.

My jaw was fairly sore all yesterday; they gave me some mild pain killers and some antibiotics so that nothing gets infected. I go back next week to have my stitches removed. All in all, it wasn't nearly as eventful as I thought it would be (I guess I expected much more pain). My jaw is still fairly sore, and I'm not back on solids yet (chewing is somewhat painful), but that's supposed to go away in a few days.

It's funky, though -- I can feel the end of the line of teeth with my tongue where that last non-wisdom molar used to be (on the upper left). So I can feel the end of my tooth line, which I have never been able to do before. Funky.


I'll be heading back to ND next week to visit with Beman Dawes from the Boost group. He's coming to visit with Jeremy, Rich, and Andy. I'll tag along for usability and other kinds of user-concerns, but probably not too much in the design and other stuff.

That's about it for now. Gotta get back to work...

December 2, 2000

Your *what* hurts?

I have to admit, I've never seen it actively snowing in Louisville.

It's not really accumulating, but it is kinda nice (I'm a cold weather person).

My jaw is doing ok; a bit sore, but manageable. I actually managed to have a few slices of pizza last night.

Tracy's new Dell 800MHz came yesterday, and I spent a good amount of time setting it up. Copied a lot of stuff from her old Windoze machine to this one (it was a pain in the butt to export/import her addressbook and message folders from Outlook 98 to Outlook Express 5.5.1 [Outlook 98 isn't available anymore, and Outlook 97, which I have on CD, doesn't do IMAP]). Finally got everything over, though.

One annoyance, though -- Outlook Express has some nice rule-filtering capabilities such as "take messages with such-and-such subject and automatically put them in folder foo". Very handy. But it doesn't work with IMAP inboxes! Why not?!

Got the latest distributed.net client and installed it on there, too. But it only seems to want to do RC64 -- it simply won't do OGR. Weird!

The new version of mojonation sucks. It keeps coming up with an error that causes it to lock up. That is, it only runs for about 5-10 minutes and then locks up (something to do with bad XML parsing). Woof. I also wonder why they don't use their mojonation-announce list to announce new versions; it's advertised on their site, etc., but I never get announcements from it. I only get new versions when I happen to notice them. Weird.

Just a few quickies today. Gonna spend some time on the ogg vorbis encoder and LAM today (-pty fix in Linux and still trying to get some reasonable fault tolerance issues worked out).

December 6, 2000

Kudos to you, sir. And kudos again!

These are interesting times that we live in.

A series of random things have been occurring. Hence, this will be a random journal entry, written while I eat my lunch.


I have discovered that the HTML element <HR> is not always centered by default, particularly when it is less than 100% of the width of the browser. I don't know if this is specified in the HTML spec or not, but I have found that KDE's Konqueror browser (which actually isn't a bad browser, surprisingly enough!) does not automatically center the following:

<HR WIDTH=50%>

which means that all the thousands of screaming jeffjournal fans out there that are viewing my journal archives in Konqueror are wondering why there are half-line separators on the left in their browsers. Oops.

I have now mended my ways, and write dramatic half-line separators like this:

<CENTER><HR WIDTH=50%></CENTER>

All is right within the world.


Along the same lines, it's quite tiresome to type out "<CENTER><HR WIDTH=50%></CENTER>" (particularly when you have to escape it to write it in example form so that you can read the "raw" HTML in HTML). So I think I need to add some special "escapes" to jjc such that things like this are automatically done for me.

I'm thinking of escapes for:

  • Dramatic half-line separators

  • URLs will automatically be linked

  • In a wiki/doctext-kind-of-way, make lists easier (where "easier" == "use some abbreviated syntax that jjc will expand into the correct HTML for <UL> / <LI> / </UL>, etc.")

  • In a wiki/doctext-kind-of-way, make linking be easier (similar definition of "easier" as above)

  • In a wiki-kind-of-way, make <code> and <strong> and <em> be easier, 'cause I use them all the time.
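
A rough sketch of what three of the escapes above could look like, written in Python for brevity rather than in jjc's own code. The abbreviated syntax is invented for this example (a line of `----` for the half-line separator, a leading `* ` for list items, bare `http://` URLs for links); none of it is existing jjc behavior.

```python
import html
import re

URL_RE = re.compile(r"(https?://[^\s<]+)")
HALF_RULE = "<CENTER><HR WIDTH=50%></CENTER>"


def expand_inline(text: str) -> str:
    """Escape raw text for HTML, then turn bare URLs into links."""
    text = html.escape(text, quote=False)
    # Note: punctuation glued to the end of a URL ends up inside the link.
    return URL_RE.sub(r'<A HREF="\1">\1</A>', text)


def expand(source: str) -> str:
    """Expand the (hypothetical) abbreviated syntax into HTML, line by line."""
    out = []
    in_list = False
    for line in source.splitlines():
        stripped = line.strip()
        is_item = stripped.startswith("* ")
        if in_list and not is_item:
            out.append("</UL>")          # the list ended on the previous line
            in_list = False
        if stripped == "----":
            out.append(HALF_RULE)        # dramatic half-line separator
        elif is_item:
            if not in_list:
                out.append("<UL>")
                in_list = True
            out.append("<LI>" + expand_inline(stripped[2:]) + "</LI>")
        else:
            out.append(expand_inline(line))
    if in_list:
        out.append("</UL>")
    return "\n".join(out)
```

Because ordinary text gets HTML-escaped on the way through, this would also take care of the "escape it to write it in example form" annoyance. It does mean that deliberate raw HTML in an entry would need its own escape hatch.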


Also, Lummy has managed to get SourceForge running on our web server. He claims that some of the features in their diary stuff are superior to jjc (not surprising). However, I might have to steal some of them and put them in jjc, since I've kinda grown attached to it.

Stealing such features, however, has a fairly low priority.


I'm actively working on a parallel ogg encoder (stole the oggenc code, adding a whole new parallel personality to it, and renamed it to poggenc -- oggenc effectively has a brother now). I re-read my white paper on generalized manager/worker using both threads and MPI (and wow, it was long!) to remember all the thoughts that I had about that. Found a few minor errors, and was annoyed to discover that my formulae at the end were just about entirely wrong. Math sucks.

So I've coded up a bunch of the framework so far, and have started classes for the input, worker, and output threads. I've added the necessary #define's for thread safety within Vorbis (which it isn't yet, in the way that I need for this -- it can't handle multiple threads simultaneously encoding on the same stream), and #define's for MPI. Seems to be going well.

Happily hacking
Abstract turns into concrete
Parallel vorbis

Hacking hacking hacking...


I was manually archiving the web logs on www.lsc.nd.edu yesterday (really gotta finish automating that process someday...), and I did the normal "bzip2 combined_log". I did this on a Hydra node (400MHz UltraSPARC II). After a good many minutes, it didn't show any sign of finishing (the logfile was approximately 189MB).

I had expected it to take a while, but it was taking even longer than I had anticipated. The idea popped into my head: "I wonder how much faster the new Sun UltraSPARC III would be able to do this!" We actually have a SunBlade (750MHz UltraSPARC III) on loan from Sun (shhh!!!), so I copied the log file to its local disk and started a bzip2 of it (Solaris 8 seems to ship with bzip2 -- rock on).

After 50 minutes, the SunBlade finished. The Hydra node looked like it was about 1/3 of the way finished. Ouch. I then thought, "uh oh -- I don't know if these two versions of bzip2 are the same; am I comparing apples and apples?" So I checked. Oops -- the Hydra was using bzip2 0.9.0b and Solaris 8 (the SunBlade) had bzip2 0.9.0c -- both of which dated back to 1998.

So I went out and found that the current version of bzip2 is 1.0.1. I downloaded it to both machines and compiled it with "-fast -xarch=native -xtarget=native", and re-ran the test (both from the local hard drive, of course).

The SunBlade finished in 7 minutes flat. The Hydra node finished in about 15:30. Wow.

Morals of the story:

  • The bzip2 that we have (had) on AFS sucked. I recompiled the new one with optimization and put it out on AFS. I got the OIT to update theirs (Solaris 7 tree), too.

  • The bzip2 that ships with Solaris 8 sucks.

  • The SunBlade was slightly more than twice as fast as our UltraSPARC II. But that only naturally follows, 'cause its clock speed was almost twice that of the Hydra node.

Just a few interesting data points, nothing more.
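
For the curious, the arithmetic behind those morals can be spelled out in a few lines. The constant and function names here are just for illustration; the timings are the ones reported above, converted to seconds (the Hydra figure is "about 15:30", so treat the result as approximate).

```cpp
#include <cassert>

// How many times faster the "fast" run was than the "slow" run.
double speedup(double slowSeconds, double fastSeconds) {
    assert(fastSeconds > 0.0);
    return slowSeconds / fastSeconds;
}

// Timings reported above, converted to seconds.
const double kHydraNewBzip2 = 15 * 60 + 30;  // about 15:30, 400MHz Hydra node, bzip2 1.0.1
const double kBladeNewBzip2 = 7 * 60;        // 7:00, 750MHz SunBlade, bzip2 1.0.1
const double kBladeOldBzip2 = 50 * 60;       // 50 minutes, SunBlade, bundled bzip2 0.9.0c
```

speedup(kHydraNewBzip2, kBladeNewBzip2) comes out to roughly 2.2, against a clock ratio of 750/400 = 1.875. speedup(kBladeOldBzip2, kBladeNewBzip2) is roughly 7.1, which lumps together the newer bzip2 version and the optimized build.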


In Lummy's never-ending quest for good web collaborware, I found a bug-tracking system called RT that seems to be a web-ified version of ANL's req system.

It seems to be pretty nice -- it has a bunch of features without being overly complicated (ever had a look at bugzilla? I can't even understand how to use that thing!). It doesn't do everything, but it seems to do most of what we need. Most importantly, IMHO, it has an e-mail interface (something that Jitterbug lacks), so the admins don't have to go to the web page to do quick-n-dirty bug tracking things.

I set it up on my router and let some of the guys in the lab play with it. General consensus was that it wasn't bad. I tried to get the CVS copy of RT going (has a bunch more features than the current stable release), but it seems to be not-quite-ready for prime time yet. We'll have to wait for that, I guess. :-)

Lummy said he might try and tie RT into the SourceForge that is running on lsc.nd.edu, but it's more likely that he'll just make some kind of primitive e-mail interface to the bug tracking system that is already in SF. Or, it's more likely that we'll all just bitch about it and nothing will get done. :-)


I noticed something annoying about the CVS version of Vorbis. Some background...

Vorbis is the music format. Ogg is the file format. That is, you pack vorbis data into .ogg files. There are separate libraries to do each. Additionally, there's a third library, called ao, to write output to sound devices. So to compile oggenc (the ogg/vorbis encoder), you need to configure it with:

--with-ao-prefix=DIR --with-ogg-prefix=DIR
--with-vorbis-prefix=DIR

This seemed pretty silly to me, especially since you typically install all three libraries and oggenc into the same place. So I hacked up their .m4 files to check the $prefix if the corresponding --with-* option was not specified, and submitted it to the vorbis-dev mailing list. We'll see if the patches are accepted (they were really only a few lines of shell script; not rocket science).

However, the vorbis-dev mailing list seems to currently be down. I see from the web archives that a few posts (including mine) have been sent since last Friday, but I haven't received any of them. I sent a few queries but have heard nothing back yet. Hmm.


Speaking of oggenc, I have now done much coding of a parallel version (the part above about working on poggenc was written yesterday; I just haven't submitted this journal entry yet :-). I have been following the design laid out in my white paper about mixing threads and MPI for multi-level parallelism, and it seems to be going well. Most of the infrastructure is done, and I'm just starting to code up the parallel aspects (shipping the audio data to remote nodes, shipping the ogg data back, etc.).

When I have some semblance of a working copy, I'll probably ping Dan at Scyld again.


I obviously didn't go to ND as planned this week to meet with Beman. I hear his trip was a great success and many great things were discussed, but I simply couldn't do 9+ hours of driving this week over a 2 day span; it was just too much. I guess I'll meet him some other time.


Tracy's new 'doze machine seems to really chug through RC5 packets. And it has recently decided to start doing OGR packets (I couldn't get it to do OGR before; go figure). The distributed.net client seems to suspend itself unpredictably, however. For example, I left the computer on since yesterday for the sole purpose of RC5 hacking, and turned off the monitor. This morning, I turned it on and the last activity in the distributed.net log was from yesterday.

Weird.


I watched the SciFi channel's Dune saga, parts 2 and 3 (missed part 1). Not bad. I saw the original Dune movie a while ago, and I guess I rate these two as about the same. Some of the special effects were good (in the new one), some kinda sucked. I read the original Dune book, but none of the sequels.

But I'll probably watch the sequel movies when they come out; I enjoyed this version of Dune.


I went to have my stitches out today from the oral surgeon (had my wisdom teeth removed last week). No big deal there. The doctor, coincidentally, is an ND grad (he mentioned it when he saw my ND varsity jacket). I knew that I liked him for a reason.

More interesting, however, was my drive to and from the doctor's office. It solidified my understanding of "every action has an equal and opposite reaction."

The office is a few miles away, and I basically take one road to get there (a fairly main traffic artery). There are many lights between my apartment and the doctor's office. On the way to the office, I only had to stop for one red light. On the way back, almost all the lights were red.

Everything seems to work out evenly.

However, I noticed that the muffler is going out on my car. @#%@#$%@#$!!!!

And of course, in the 45 minutes that I was gone (which was the only time I left the apartment during the business day all week), I missed an Airborne Express shipment from outpost.com (free overnight shipping!) with Turbotax. Quicken/Turbotax is the only reason that I use a Windoze machine with any regularity. For those who don't, and if you happen to have a spare 'doze machine lying around, I highly recommend them. They're great products (I wish they had Unix equivalents; there's gnucash, which, as I understand it, is more or less like a less-mature Quicken, but no equivalent for Turbotax).

December 13, 2000

But honey, there'll *always* be women in rubber flirting with me...

Yow. Been traveling. Journal suffers. Millions weep and gnash their teeth.

Until now.

Be at peace, gentle reader.


Went and saw the play "Rent" with Janna and Tracy last week. It's a good play, depressing and cathartic. I've seen it before (in London), but none of the others had. We were supposed to go to Tracy's work Christmas party afterwards, but the play wasn't over until about 10:30, and by the time we would have gotten there, it would have been over.

Went and had a beer with Janna afterwards, which is always cool. As Anna likes to remind me, "they're my only friends in Louisville" (which isn't far from the truth!). I haven't had much opportunity to get out and meet random people here in Louisville, but this doesn't really concern me. I think that after I get my Ph.D., I might try my hand at an adjunct faculty position at the University of Louisville or something -- something to get me in touch with the geek crowd down here. Who knows.

But until then, like I said, I'm not really too worried. I do have a small group of friends down here, and I've met several of Tracy's work colleagues; they're all nice folk.


Flew to Philadelphia at an absurdly early hour on Saturday morning. Drove from the airport straight to dad's hardware store and started to work on hooking his store's LAN up to Verizon DSL. This actually entailed several things:

  • There is one SCO unix server on the LAN, 3 windoze 98 machines, and 5 DOS machines (!). This was actually mentioned in a recent /. article -- the DOS machines are fully functional; there's really no need to replace them. They use attractive ANSI graphics to do inventory queries, price lookups, etc., etc. 3 of those 5 DOS machines are actually cash registers, and are quite functional (and have been for many years). All 5 machines are very dependable. Sure, there have been a few random quirks, but for the most part, they have served remarkably well over the years, and continue to do so.

    However, whoever at True Value designed the network did so poorly. It seems like the IP addresses were chosen at random. As such, they were unsuitable for connection to the internet. So I had to convert all the IPs to be of the form 192.168.x.y (one of the approved private networking domains). Changing the Unix machine was easy. Changing the windoze 98 machines was also fairly easy. Changing the DOS machines proved to be a little more work -- their TCP driver is loaded dynamically from the DOS command line (it's a .SYS file loaded by a proprietary INET command for their TCP stack). But get this: the IP number, netmask, hostname, etc., etc., are all encoded in this .SYS file.

    I had to resurrect the memories of how to alter the .SYS files out of long-term storage (the percolation model, but several cycles were wasted during stalls while waiting for the memories to surface). But in the end, I triumphed. Actually, I have to hand it to those old DOS programmers -- once I remembered how to do it, it wasn't too bad of an interface for the time (it's all command-line driven).

    So I got everything on 192.168.x.y. Just for the heck of it, the Unix server was alone on 192.168.30.x, the 3 windoze 98 and 2 DOS "lookup" machines were on 192.168.20.x, and the 3 cash registers were on 192.168.10.x. Everything appeared to be working smoothly.

  • First glitch: Dad opened up at noon on Sunday (special Christmas hours; he's not usually open on Sundays). At 11:59am, I got an intercom call from him, "The credit card functionality in the cash register isn't working -- it freezes up." Doh! I had tested cash transactions and they all worked fine, but I hadn't tested credit card transactions (the cashier swipes a card in a slot that is built into the computer keyboard -- pretty slick, actually -- and the cash register makes some TCP or RPC calls to the Unix server [not sure which; I've never been privy to the internals of the True Value code], which has multiple modems that it uses to make the outgoing call to the credit card center, verify the data, etc. There's a little progress screen on the cash register [ANSI graphics, mind you!] during this time: "Looking for modem" / "Dialing" / "Sending" / "Waiting" / "Approved").

    It took over an hour to figure out what was going on. Actually, my dad spotted the problem without realizing it. While rebooting (one of several reboots while we were trying to figure this out) to get around the "hang" produced by the faulty behavior, he said, "Hey look at this -- one of these status messages that zips by quickly in the beginning flashed a negative number. Does that mean anything?"

    It turns out that the cash register was calculating its ID incorrectly. In the True Value system, cash registers are numbered sequentially from 1 (stupid Cobol programmers -- they must have married Fortran programmers!). Each physical cash register has a fixed ID that never changes. It seems that our 3 registers now thought that they had IDs of -17, -18, and -19. Doh!!

    So even though the main cash-register-processing-routines in the Unix server (every transaction is transmitted back to the main server in the back room) were happily accepting most transactions from negative-numbered cash registers, it seems that the credit card authorization routines were saying, "Hey -- you're a negative number cash register. This must be a mistake. Go away." And therefore the cash register would hang, because it would either not get a response from the server, or it would get an error response that it didn't know how to interpret.

    So that identifies the problem (always an important -- and frequently overlooked -- step). Now, what was the cause? The only thing that I had changed was the IP address.

    No way.

    Way.

    It seems that the True Value programmers calculate the cash register's ID number off the IP address. More specifically, if the IP address is w.x.y.z, the cash register's ID is (20 - z).

    No, I'm not kidding.

    The /etc/hosts file on the Unix server with all the original IP addresses had the cash registers starting with x.y.z.20. I had changed them to be 192.168.10.1, 192.168.10.2, and 192.168.10.3. Doh!! Fixing them up to be .20, .21, and .22 solved the problem.

    How fucked up is that?!?

  • For the next part, for a long series of reasons that really aren't worth going into, we decided that my dad's windoze 98 desktop machine would be the DSL gateway into the LAN using Microsloth's Internet Connection Sharing functionality. So I first disconnected dad's machine from the internal LAN and brought up Verizon DSL on it. I installed a firewall, got everything working, etc., etc. They use PPP-over-ethernet as opposed to standard DHCP-style setup. Hence, it actually uses the Windoze dial-up networking functionality to establish a DSL connection to the internet. Really weird. I read some RFCs and position papers about this (PPPOE is actually standardized, and will be in the mainline Linux 2.4 kernel), but I still don't see the benefit. "You mean I still have to invoke kppp to activate my 'always on' connection?" It just seems weird to me.

    So I activated the Internet Connection Sharing (ICS) stuff, and noticed that it changed the IP address from 192.168.20.x to 192.168.0.1. Hum. I'll bet that was for a reason -- default router on a network, etc., etc. So I looked it up in the online help -- sure enough, it says that you can use addresses in the range 192.168.0.2 through 192.168.0.253.

    WHOA!! WTF?!? They changed a class C private network to a class D!! $%@#$%@#% I can see forcing the ICS server to be 192.168.0.1, but why the hell did they make a netmask of 255.255.255.0 instead of 255.255.0.0?!? That annoyed the hell outta me. And it had the following consequences:

    1. I had to go change the addresses on the other windoze 98 machines to be 192.168.0.x.

    2. Even worse, I had no DNS IP numbers to put in the Windoze 98 machines, since PPPOE "takes care of this for you" (yet another aggravating aspect of PPPOE -- DHCP can do this as well, but you can still manually override it on the client, if you want. You can't override it with PPPOE). Indeed, Verizon wouldn't give me their DNS server IP addresses for this very reason, "It's not supported". I managed to get them myself (whois and nslookup, no rocket science there), but they're not accessible from inside their internal DSL network. Arrggghh!!

    3. Hence, I had to fully surrender the windoze 98 machines to the ICS setup: I had to set the clients to use DHCP to get an IP address (apparently the ICS server turns into a DHCP server on the local network), disable DNS, and enable a setting labeled "Use DHCP for WINS resolution", which apparently did the DNS resolution stuff.

      I'm not sure what WINS is, but I thought it had to do with NETBIOS stuff. Apparently not...?

    4. What further sucked was that these machines now had an automatic netmask of 255.255.255.0. Which means that they couldn't reach the Unix server, because it was on 192.168.30.1 -- i.e., outside the netmask range, so they were sending packets to the default gateway, which had no idea what to do with them and probably dumped them on Windoze's equivalent of /dev/null.

      So I had to put my unix server on 192.168.0.250. This is actually risky, because since the ICS server is a DHCP server, 192.168.0.250 is in the range of addresses that it is allowed to give out. Hence, I could have an IP address conflict. I can only hope that the DHCP server does the Right Things and sticks to low numbered IPs (there are only 2 clients, after all), and re-uses them when new DHCP requests come in.

      That just really burns me up -- that the stupid Microsoft programmers automatically assigned a class D netmask to a class C network. Only assigning 0.1 through 0.250 via DHCP would be fine, but the prohibitive netmask prevents [safe] interoperability with anything else that is not Microsloth on the same local network. I suppose that I shouldn't be surprised by this, but it still sucks. In hindsight, I should have done this with a Linux router and it would have been much easier and less time consuming. <sigh>

      MICROSOFT SUCKS!!!
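
To make the cash register ID glitch above concrete, here is a minimal sketch of deriving an ID from the last octet of the IP address. It is a guess at the behavior, not True Value's code: the names lastOctet and registerId are invented, and the sign convention is an assumption. The entry writes the formula as (20 - z), but that would give positive numbers for .1 through .3; the sketch uses z - 20 because that is what reproduces the reported -17, -18, and -19.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Return the last octet of a dotted-quad address such as "192.168.10.3".
int lastOctet(const std::string& ip) {
    const std::string::size_type dot = ip.rfind('.');
    // With no dot at all, treat the whole string as the octet.
    const std::string tail = (dot == std::string::npos) ? ip : ip.substr(dot + 1);
    return std::atoi(tail.c_str());
}

// Assumed rule: the ID is the last octet's offset from a fixed base (20 here).
int registerId(const std::string& ip, int base = 20) {
    return lastOctet(ip) - base;
}
```

Under that assumption, 192.168.10.1 through 192.168.10.3 map to -19, -18, and -17, while addresses ending in .20 and up map to non-negative IDs again, which is why renumbering the registers fixed things.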

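The netmask problem above (why the Windoze 98 clients could no longer reach 192.168.30.1) boils down to a bitwise AND. A minimal sketch, with invented helper names parseIPv4 and sameSubnet, and no validation beyond "four numbers separated by dots":

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Parse a dotted-quad string into a 32-bit integer (most significant octet
// first). Returns 0 on malformed input; octet ranges are not checked.
std::uint32_t parseIPv4(const std::string& s) {
    unsigned a = 0, b = 0, c = 0, d = 0;
    if (std::sscanf(s.c_str(), "%u.%u.%u.%u", &a, &b, &c, &d) != 4)
        return 0;
    return (a << 24) | (b << 16) | (c << 8) | d;
}

// Two hosts are on the same subnet when their addresses agree on every bit
// that is set in the netmask.
bool sameSubnet(const std::string& x, const std::string& y, const std::string& mask) {
    const std::uint32_t m = parseIPv4(mask);
    return (parseIPv4(x) & m) == (parseIPv4(y) & m);
}
```

With a mask of 255.255.255.0, a client at 192.168.0.2 and the server at 192.168.30.1 differ in the third octet, so the client hands the packet to its default gateway. With 255.255.0.0 they would have matched, and with the server moved to 192.168.0.250 they match under either mask.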

I also had to reconstruct my mom's windoze 98 machine at home. It was basically so broken that it required a full reinstall. To make matters worse, the physical C drive had bad clusters on it, and every time you ran scandisk, it would find more bad clusters. Hence, it was going bad slowly. Not to worry -- we had a second physical disk already in the machine. So I just swapped the two disks (there are 2 IDE interfaces in the box; the boot disk was the master on one by itself, the second was a slave on an interface with the CDROM) and reinstalled on the old D drive.

Whoops -- the machine now takes over 5 minutes to boot (!). There were three distinct locations in the boot where it appeared to stall -- waiting for something for which it apparently eventually would time out and continue. I only discovered today (i.e., after 2 days) that swapping the disks back to their original positions on the IDE interfaces (even though the boot disk is now different -- changed in the BIOS) eliminated two of the three delays in the boot time. I couldn't believe that this was true, so I swapped the disks and changed the BIOS settings back, and sure enough, it timed out in 3 places instead of 1. That's fucked up. The Intel architecture sucks.

And as for the last time out during the boot -- I have no idea what the heck that was. Dad has a total of 3 machines from this company, and the other two don't do this. It even did it after a fresh, clean install of Windoze 98, so it must be something in the hardware. That's fucked up.


Dad was a bit disappointed in the e-mail performance of Verizon's SMTP servers. It was really slow to send mail this weekend. Much slower than his old dialup account (56k). Indeed, several times it timed out and we had to click "send" again in his e-mail software.

A call to the Verizon help desk got a recorded message, "Verizon customers may be experiencing difficulty sending and receiving mail. We are aware of the problem and are working to fix it..."

It turns out that Verizon got heavily spammed. A /. article about it said that Verizon is convinced that it was deliberate and malicious. Apparently, they brought up more servers yesterday to try to cope with the load, but are still trying to automate the spam rejections.

Along the same lines, my dad does a lot of stuff for the ND Alumni Club of Philadelphia. One of the things that he does is send out mass e-mails to both the club members (it's a pretty active club, actually) and students on campus from Philadelphia (e.g., he passes along e-mails about rides home for students). Hence, he can send out an e-mail with several hundred BCC recipients.

His old ISP was a small local firm, and he got permission to do this. Verizon's max recipient count is 40. And especially in light of the spam problem they had this weekend, they were not interested in raising it at all. So I have to setup some special stuff on lists.squyres.com for Dad to relay his messages out. The problem is that he has a database with the e-mail addresses in it that he sends to, and it gets updated frequently (multiple times a week). So he needs an automated mechanism to import a whole new list of subscribers and completely ditch the old list. I'm thinking of some scripting with GNU mailman to make this work...


I hate Windoze. I am so glad that I don't have to use it on a daily basis. I can't imagine how people actually get work done with it. A very large company that I know (no names mentioned...) has their employees run "weekly updates" from the IT department on all their Windoze boxen (i.e., it runs automatically when you boot up). If it's Monday, it asks if it should run the weekly update. You can click on "No, please delay the update; do it later" up to 3 times. After that, it will run the update no matter what.

The updates routinely take around 2 hours -- if they run smoothly, which they usually don't. The update frequently hangs/crashes in the middle of the process (your only indication of this is if you happen to notice that the hard drive light stops blinking for an extended period of time, upon which you have to reboot and start the update over again). You also can't use the computer during that time. They don't schedule them to run when no one is there (e.g., 3am), because to save money on electricity, everyone is required to turn off their workstation at night. Hence, this weekly update procedure is guaranteed to make their computer unusable for at least 5% of their ANSI standard work week (2.5% when the standard gets updated to reflect common practice).

I just can't imagine having to work in an environment like that.


On the drive between my dad's store and my parents' home, I saw a most curious thing: a cell phone tower that is disguised as a pine tree. It is painted brown and has evergreen branches on it. It's still a dead giveaway because it stands out much taller than any of the trees around it, but it did cause me to do a double-take.


I have to say: this election is working out exactly as rmurphy4 predicted. The latest bit: FL legislature picking electors.

Kudos to you, sir! And kudos again!

December 15, 2000

The Moog Cookbook

I swear that Perk's older brother works in the MailBoxes, Etc., here in Louisville.

It's either him, or someone that looks exactly like what Perk will look like in about 5-7 years.


It is 4:53pm.

Dad's network is finally alive again.

And I don't know why.

At time T, we were at state A.

It didn't work.

We changed one thing, theoretically moving to state B.

This, of course, entailed a reboot.

State B didn't work. So we changed back to state A.

And rebooted.

Suddenly it worked.


And before you ask, I'm quite sure that we only changed one thing, and then changed it back. Yes, something changed during that time, but it sure as heck wasn't from something that we did -- Windoze did something internally.

I think that this is what bugs me most of all about Windoze -- its nondeterminism. It doesn't matter how smart you are, nor how much you know about computers: sometimes it works, sometimes it doesn't. There are absolutely no guarantees about consistent behavior in Windoze.

I'm just bitter at the end of a long, frustrating day where I got absolutely nothing done. <sigh>

The uptime on my desktop linux box is 98 days and counting.

What if we read the space news?

I've been attacking the stack of mail in my inbox from my foray to Philadelphia.

Lummy asked me to step into the Boost discussions about directory structure and whatnot. They do not seem to understand the necessity of separating a source code tree from an installed tree. We'll see where this discussion goes.

Had scads of LAM mail, both from the list and people who mailed me individually. Plowing through all of that...

Got some good responses about parallel vorbis. Turns out that since vorbis is a differential encoding method, it will require the same technique that I used in parallel bladeenc -- sending some redundant input blocks to each processor in order to "build up state", so to speak. This is kind of a bummer; it means that the parallel output will likely not be diffable against the serial output. But some of the vorbis developers indicated that the general idea should be able to work. Hopefully, I'll be able to work on this later today.
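
The "redundant input blocks" idea could be planned out something like this. It is only a sketch of the chunking arithmetic, not poggenc or bladeenc code: the Chunk struct, planChunks, and the notion of a fixed number of warm-up blocks are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Chunk {
    std::size_t warmupBegin;  // first block fed to the encoder (state build-up only)
    std::size_t begin;        // first block whose encoded output is kept
    std::size_t end;          // one past the last block whose output is kept
};

// Split totalBlocks input blocks across workers. Each worker also receives up
// to `overlap` blocks from just before its own range, which it encodes purely
// to build up encoder state and whose output it throws away.
std::vector<Chunk> planChunks(std::size_t totalBlocks, std::size_t workers,
                              std::size_t overlap) {
    std::vector<Chunk> plan;
    if (workers == 0)
        return plan;
    const std::size_t perWorker = (totalBlocks + workers - 1) / workers;  // round up
    for (std::size_t begin = 0; begin < totalBlocks; begin += perWorker) {
        Chunk c;
        c.begin = begin;
        c.end = (begin + perWorker < totalBlocks) ? begin + perWorker : totalBlocks;
        c.warmupBegin = (begin > overlap) ? begin - overlap : 0;
        plan.push_back(c);
    }
    return plan;
}
```

For 10 blocks, 3 workers, and an overlap of 2, the kept ranges are [0,4), [4,8), and [8,10), with the second and third workers starting their input at blocks 2 and 6. How many warm-up blocks vorbis actually needs is exactly the open question.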

Lummy bought me a webcam (Intel Personal Camera or something like that) for use in teleconferencing up with ND. I can use my headset with it, which is trez kewl. It's also detachable from the computer and can serve as a portable digital camera. It's not the world's greatest camera or anything, but it could prove to be useful. Brian and I had a few difficulties getting a netmeeting going on between squyres.com and nd.edu last night. Apparently, the fact that my Windoze box is on a private network behind my router is the problem. Brian/Pete found a kernel module for the NAT that should fix the problem, but I haven't been able to successfully compile/install it yet. We'll see how that goes; could be really useful in terms of communicating with the Home Office.


Windoze really really sucks. My dad, after I got DSL all working and whatnot -- his desktop computer is the Internet Connection master for the local LAN in his store, i.e., DSL comes into his machine and is routed to the rest of the local LAN from there -- was having problems with a tax program that he uses to pay taxes on the salaries for his employees. It's supposed to dial an external phone number through the modem and then Do Its Thing. But for some reason, it didn't work last night when Dad tried to do it.

So Dad called tech support for this program this morning. Their advice? "Yeah, we've had problems with people who have DSL -- even if you still use a modem, our software doesn't seem to connect properly. Let's try something; go to the control panel, network icon, and remove the entry 'Dialup Adapter'..."

Needless to say, this fucked up everything!

A bit of background: Verizon DSL uses PPP over Ethernet (PPPOE) instead of normal DHCP stuff. PPPOE uses dialup connections to establish connectivity (it's weird; supposedly "it's easier on the user, because they already understand the dialup-to-connect concept". I think it's just stupid -- DHCP can do everything that PPPOE does, and not have to go through an additional dialup step). So when the tech weenie had Dad delete the dialup adapter, that totally fucked up the DSL connectivity.

I've now been on the phone with my dad for over 3 hours and it still doesn't work yet (we've finally managed to get the DSL connectivity back, but the internet connection sharing to the rest of the LAN is not working again yet). Windoze absolutely sucks.


I'd be willing to bet that the first few steps of the tech support checklists in just about every Windoze-based support center go something like this:

  1. Ask the following question: "Have you rebooted your machine since the problem started?"

  2. If the user answers no, have them reboot and move on to the next caller.

  3. Ask the following question: "Have you uninstalled and reinstalled the product that you're having a problem with?"

  4. If the user answers no, tell them to uninstall/reinstall and move on to the next caller.

  5. ...

Windoze absolutely sucks.

December 18, 2000

Do you *have* to squash my face like this?

It was a sad day yesterday.

Queeg, my linux desktop here at home, had its 100-day uptime destroyed by a hair dryer that tripped the power breaker.

There was much wailing and gnashing of teeth.

But I might use the opportunity to upgrade queeg to Mandrake 7.2.

Do deee...

December 19, 2000

Who the hell do the "Citizens for Broadcasting Decency" think they are?

I finally got my muffler replaced yesterday.

That was $140 I didn't want to spend. Ugh!!


A series of random thoughts:

  • Getting my muffler fixed (actually, it's the pipe between the engine and the muffler, hence it wasn't covered by the warranty) and running a bunch of Christmas errands took a good portion of my day yesterday.

  • I talked to Darrell on the phone and found out that his parents live just a few miles from Tracy's parents down in Florida. It's a small fricken' world!

  • I love the ability to track my UPS packages on the web.

  • For those of you who keep bugging me, yes, I did screw up on a previous journal entry -- I meant to say class B/C networks in the entry about DSL and MS's internet connection sharing, not class C/D. It was late, and I was tired when I wrote that entry.

  • Turns out that MS netmeeting doesn't work between two different private networks. i.e., from my 'doze box behind my router (which has a 192.168 address) to another box behind a different router (which also has a private IP address, perhaps of the 192.168 variety). It's not a shortcoming of netmeeting itself, per se, it's a shortcoming of the underlying protocol -- H323. Bummer. However, with a special H323 NAT module in the IP masquerading stuff in Linux, you can do a netmeeting between my private windoze box and somewhere else directly on the net -- as long as the box on the private net is the one who initiates the call -- which is quite handy.

  • I wrote a schload of Christmas cards yesterday (will send them today) along with our obligatory Christmas letter. I'll post it here in a few days; gotta wait for people to get the snail mail version first. :-)

  • I wonder what it would be like to own an elephant.

  • Darrell has a great picture of his decorated house at http://christmas.kresge.com/. Isn't running your own DNS cool?

  • Louisville has gotten 3 snowfalls of about 2 inches each. Each one has sent the city into a panic. It's quite funny to watch.

  • Lummy and I are fixin' to go to Madison to meet with the Condor folks in early February. I sent some dates up to Erik, and he's checking into it.

  • Progress on poggenc is coming along swimmingly.

That's about it. Back to coding. Mmmm.... coding....

Razzem frazzem...

Oops -- got the URL for Darrell's decorated house incorrect.

http://christmas.kresge.com/

What the hell has happened to quality control these days?

December 21, 2000

The Real Deal with Bill McNeal

Several cool things happened yesterday.

  • I re-released PSR
  • Brian re-released inilib (ok, that was actually 2-3 days ago)
  • My sister got engaged (no URL available :-)


PSR is the Password Storage and Retrieval system that we use with OpenPBS to get AFS authentication with PBS jobs.

We've had problems with our installation of PBS over the last several months (ever since we upgraded to OpenPBS, actually). It turns out that one of the components of PBS, the Mom (a daemon that runs on each compute node and manages the user jobs that are launched on it) was at fault.

Actually, it was our patches to the Mom that were at fault. We had to patch the Mom to include bits to launch some PSR kinds of things (first, a program to get an AFS token, second, a program to "shepherd" the user's job and re-up the AFS token before it expires. Doing this allows a user's job to run for much longer than the life of their token -- their token is magically renewed for them for the entire life of their job).

We used the popen() call to invoke these two commands. Unfortunately, we didn't think that popen() would have the child process inherit the open file descriptors from the parent. But it does. Doh!!

Specifically, the Mom has multiple sockets open, including one that it is listen()ing on. To make a long story short, having multiple processes sharing the same open socket is a Bad Thing, and it caused Ickyness in PBS's runtime because it typically disrupted PBS's internal protocols.

Adding the following code to the beginning of the PSR executables solved the problem:

/* close every inherited descriptor above stdin/stdout/stderr */
for (i = 3; i < sizeof(fd_set) * 8; ++i)
    close(i);

However, I still think that this is not perfect -- not knowing the internals of the Mom, I think it is still possible to get a race condition where Badness can occur. This can happen if the PSR executable is launched and then some Event happens on the socket before the PSR executable is able to close it. I think the real solution is to make the sockets be close-on-exec in the Mom, but I'm not sure. I've mailed the PBS guys to see what they think.
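
For the record, here is a minimal sketch of what "close-on-exec" would look like, assuming plain POSIX fcntl(). The helper names are made up, and where the Mom would actually call this is a guess since its internals aren't described here.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Mark a descriptor close-on-exec so that it is closed automatically in any
// program started via exec() (popen() ends up calling exec() as well).
// Returns true on success. (Hypothetical helper, not part of PBS.)
bool set_close_on_exec(int fd)
{
    assert(fd >= 0);
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return false;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) != -1;
}

// Report whether the close-on-exec flag is currently set on fd.
bool is_close_on_exec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    return flags != -1 && (flags & FD_CLOEXEC) != 0;
}
```

The point of doing it this way is that the flag is set once, in the parent, right after the socket is created, so there is no window in which the child holds the socket before it gets around to closing it.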


If you don't already use inilib, you need to. It will save your life! I classify it in the same category as the STL -- you could write something to do the same thing, but why?

inilib is a C++ library that reads and writes .ini files. While this in itself is unremarkable, its cool aspects include:

  • Simple 2D array-like accessors. For example:

    foo["section"]["key_name"] = keyvalue;

  • Small API; easy to remember and use
  • Automatic write-upon-destruction semantics (if desired)
  • Script-like automatic type conversion semantics. This is truly cool. By abusing some of the properties of C++ on the back end, we can do things like this:

    int i = 37;
    foo["section"]["some_integer_key"] = i;
    string s = "1000";
    foo["section"]["some_integer_key"] = s;
    i = foo["section"]["some_integer_key"];
    // i now equals 1000

    More to the point, you can use the inilib objects like Perl or PHP objects -- all the type conversions are automatic and safe. This is utterly cool.
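
    How might that work? The following is only a sketch of the general trick (store everything as text, convert on the way in and out with a templated assignment and conversion operators). The Value class and IniFile typedef are hypothetical stand-ins, not inilib's actual classes or implementation.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-in for a stored value: kept as text internally.
class Value {
public:
    // Assignment from any streamable type (int, double, std::string, ...).
    template <typename T>
    Value& operator=(const T& v)
    {
        std::ostringstream out;
        out << v;
        text_ = out.str();
        return *this;
    }

    // Conversions back out; these are what make "i = value;" compile.
    operator int() const { return convert<int>(); }
    operator double() const { return convert<double>(); }
    operator std::string() const { return text_; }

private:
    template <typename T>
    T convert() const
    {
        std::istringstream in(text_);
        T result = T();
        in >> result;  // result stays 0 if the text is not numeric
        return result;
    }

    std::string text_;
};

// Two-level map gives the foo["section"]["key"] syntax for free.
typedef std::map<std::string, std::map<std::string, Value> > IniFile;
```

    With an IniFile named foo, the int/string/int round trip from the example above compiles as written. One wrinkle of this particular sketch: assigning a Value to an already-declared std::string is ambiguous, so declare and initialize the string in one statement instead.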

So anyway, go start using inilib. You'd be surprised how often you want to save a config file from your program; inilib just works.


My sister got engaged last night. It was a typical dinner-romantic-walk kind of proposal, but I'm sure that Alan delivered it with style. Needless to say, Terry accepted. Woo hoo! So we'll have another Squyres wedding in the next 1-2 years. Alan's a good guy; I think he'll make a great addition to the family. Now we get to meet his family (who all live in Indiana, not too far from ND, I might add!).

Rock on!

December 22, 2000

I believe in straight lines

A Motley Bag o' Notes today.

  • By my logs, K-Mart calls are definitely on the rise (my home phone number is the same as K-Mart's, but with 2 digits swapped). Must be because of the holiday season. The callers are also getting more and more polite -- most people are saying "sorry". Interestingly enough, there's four standard responses from people during a given K-Mart call:

    • Ok
    • Thanks
    • Sorry
    • [click] (i.e., hangup)

    I've been keeping logs on this in a flat text file. Someday I gotta hack up a PHP script and a MySQL database to do this online so that everyone can see the numbers...

  • Bob from Veridian (is that what they're called these days?) e-mailed me with a better solution to the PSR problem with PBS. Apparently, there's an internal PBS MOM call named fork_me() that does all the Right Things to fork a child process. This is orders of magnitude better than the popen() that we use now. Oops. I'll have to go back and fix that one up...

  • Progress on poggenc is coming along swimmingly. I now pass the input .wav data all the way through the five distinct states in the state machine (input, input queue, encode, output queue, output) -- there's three separate progress bars to watch (eye candy!).

    The progress bars, themselves, turned out to be an interesting sub problem -- you want to update them all the time, but only want to actually display the value periodically. And you only need to display it if it's different than last time. But then you run into a problem when you have more than one file. So here's a sample progress line for one file:

     foo.wav            |********75%**     ||********73%**    ||********70%*      | 

    How exactly do you show the progress of multiple files (it's quite possible that multiple files are being processed simultaneously) without having to link in a curses library? Showing one line is simple -- you just output a \r instead of a \n (I think that even works in 'doze as well). But with multiple lines, without the ability to make the cursor go "up" a line, you can't do it.

    So I punted for now and just have it redisplay the whole thing again if there's more than one file being processed simultaneously. This isn't the main focus of the work, after all. :-)

    Things still to do for poggenc (in no particular order):

    1. add overlap of inputs. This is mainly a function of the input queue; need to add some extra logic to save a few readsets of the input from the tail end of every dequeue of the input and prepend them to the next dequeue.

    2. add vorbis/ogg processing.

    3. MPI stuff (only does threading for now).

    4. juice up the eye candy. If it's not worth watching, it's not worth running.

  • xmms continues to crack me up. I think I've mentioned this before here in the journal, but there's a thread leak in it such that every song it plays launches a new thread. This thread never dies. And since in Linux threads are implemented as processes, you can see how many threads are running.

    As of right now (9:52am), I have 331 xmms processes on my Dell desktop.
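
Going back to the progress bar sub-problem above (update constantly, redraw only when the displayed value changes, overwrite the line with \r), here is a minimal single-bar sketch. The Progress class and its method names are invented for illustration; this is not poggenc's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical progress tracker: update() is cheap and can be called for
// every chunk of data; the bar only needs redrawing when the percentage
// differs from the one that was last drawn.
class Progress {
public:
    explicit Progress(std::size_t total)
        : total_(total), done_(0), shown_percent_(-1)
    {
        assert(total > 0);
    }

    // Record more completed work (clamped to the total).
    void update(std::size_t amount)
    {
        done_ += amount;
        if (done_ > total_)
            done_ = total_;
    }

    int percent() const { return static_cast<int>(done_ * 100 / total_); }

    // True when the visible percentage differs from what was last drawn.
    bool needs_redraw() const { return percent() != shown_percent_; }

    // Build one line starting with '\r' so it overwrites the previous one.
    std::string render(std::size_t width = 20)
    {
        shown_percent_ = percent();
        std::size_t stars = width * static_cast<std::size_t>(shown_percent_) / 100;
        return "\r|" + std::string(stars, '*') +
               std::string(width - stars, ' ') + "| " +
               std::to_string(shown_percent_) + "%";
    }

private:
    std::size_t total_;
    std::size_t done_;
    int shown_percent_;  // -1 means nothing has been drawn yet
};
```

In the encode loop this would be used as "if (p.needs_redraw()) std::cout << p.render() << std::flush;". It does nothing for the multiple-file case; that still needs either cursor movement or the redisplay-everything punt.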

Back to poggenc!

And Bill, don't let those fat bastards in Congress stick it to you

I have declared today "Annoying Female Vocalist Day", or AFVD, for short.

It's all queued up in xmms -- I'm set for hours of uninterrupted AFV's.

Sidenote: That is a definite benefit of digital music; you can just queue up hours and hours of music and then not have to futz with it. It's pretty much the same reason that they came out with 3- and 5-CD players. But with digital music, you can queue up [more or less] an infinite amount of music -- you're not just restricted to 3 or 5 CDs' worth. There's been many a workday where I've queued up more than 18 hours of music when I start working, and then don't bother with my music for the rest of the day.

Sidenote: I love my telephone headset. It has a noise-canceling mike; I can have my music on fairly loud in the background and the person I'm talking to on the phone can't hear it at all. It's also stereo -- it has 2 earphones, contrary to most headsets. I find this to be extremely useful. It also allows sounds to be piped in from my computer -- so I can play MP3s directly over the telephone. While not an amazingly useful feature, it has been practical once or twice.

Back to AFVD.

We're starting out with Alanis Morissette; I've got about 1.5-2 hours of her queued up: Supposed Former Infatuation Junkie and Jagged Little Pill.

Then on to Bjork -- both Telegram and Debut. That voice; my God -- how did that happen? Hearing these albums always makes me feel like I want to donate large sums to children's charities.

We pour salt into the wound by following up with Ace of Base. True, AoB isn't completely female, but they satisfy both requirements of a) having female vocalists, and b) being annoying. We've got The Bridge and The Sign from AoB.

Then Erasure. Ok, not female at all. But he sounds female, and is definitely annoying. I Say, I Say, I Say.

The pain stays alive with Jewel -- Pieces of You and Spirit. Annoying to the max.

The agony continues with Loreena McKennitt, The Book of Secrets. There's that one cool song on there that has a really bizarre-o video with some midgets running around, but other than that, it's annoying.

Nina Hagen comes next. Just ask anyone in the LSC (particularly Dog) -- Revolution Ballroom is one of the worst albums of all time. This is why you've probably never heard of her.

P.J. Harvey's Rid of Me -- so aptly titled -- prolongs the horror. I think there's one song on the album with a few quiet parts so that they can pass it off as multiple cuts to the radio stations (after all, if you're not Floyd, they're not going to play 20-30 minute songs).

Suzanne Vega's 99.9F rounds out the mix. Just like P.J., I think there's one song on this album. Consider: monotone singing with a guitar. Need I say more?


So it's going to be a Very Long day. I'll have to rely on my coding skills and embroil myself deep in hackery to keep from having my spirit crushed. It will be a true testament to my abilities if I can come out of this day and still be sane.

See you on the other side...

December 31, 2000

The saddest of all keys

Just for posterity's sake, I'll make a brief journal entry.

Trip to FL to Tracy's parents for about a week was good. It was warm. We did nothing, and lots of it. I did take my laptop and spend the better portion of a day in the warm sunshine working on parallel oggenc, though.

Came back up here to a few inches of snow -- it's nothing compared to what other parts of the country are currently experiencing, but it's very unusual for Louisville to have this much snow for so long!

The last two days have been almost nothing but parallel oggenc coding/debugging. I've been pretty active on the vorbis-dev list since I got back; much discussion has occurred. Monty tells me that he's got a branch where the thread-safe encoding stuff is partially done (whoo hoo!!!); can't wait for that to become mainstream.

I've gotten pretty far with poggenc, and it appears to almost be working. Things yet to do:

  • Output ogg data appears to be getting hosed somewhere in the output queue. It varies from completely bad output to occasional "blips" in the output audio, but I think it's symptoms of the same problem.

  • MPI stuff hasn't been written yet; it's all threaded right now.

  • My stats displays will have to be re-thunk a bit. Right now, there's 3 separate displays (done that way on purpose; one for input, encoding, and output). But it might make it easier to have a single object for all three (rather than three objects) because I think I want to have an "oggenc compatibility mode" stats display where there's only one progress bar so that programs like grip can run poggenc without having to analyze a different progress bar.

  • All the stream shutdown bookkeeping isn't written yet; I don't think there's much, but I'm concentrating on getting it working before I make all the memory cleanup at the end work properly.

The first item is the focus of my current work; it's been a bear to find so far -- I can't manage to track it down somehow. We'll see...

Heading out to a New Year's party later tonight which should be fun. I'll probably be up at ND later this week (not for sure yet, but likely).

Special shout out to my homie Suzanne -- happy birthday!

January 1, 2001

My car just hit a water buffalo...

Of 1004 processes running on my desktop, 897 of them are xmms.
Even with the thread leak in xmms, I have 107 processes running on my desktop. I'd like to see Windoze do that.

Er... actually, no I wouldn't.


I caught one of the lead developers of Ogg/Vorbis (Monty) on #vorbis (IRC) today -- it marks the first time I have ever used IRC, actually. Amusingly enough, when I tried to run one of the stock IRC clients that comes with Mandrake, it fired up Gnome for me! I'm a KDE user (for no particular reason; I won't participate in any WM religious wars). So now I somehow have both Gnome and KDE running simultaneously. Amusing.

Monty answered a question that has been causing me fits for 3 days now (audio output incorrect when queueing up ogg packets for later writing, see yesterday's journal entry). That should fix up the Big Problem with poggenc.

I'm redoing the stats bit right now; more robust, less flakey, and displays a spinner reliably now. Almost done. After that's done, I'll incorporate the fix from Monty and see if that does the trick (it should; I tried it in a different context). Then to finish up the bookkeeping issues and test for memory leaks (I'm thinking that there are many...). bcheck is your friend.

Monty also tells me that the thread safe stuff won't be on the main CVS trunk for a week or two yet; I'll have to put in the duplicate-input stuff in the input queue. I was hoping to avoid that. Oh well.
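
For reference, the duplicate-input idea (save the tail end of each dequeue and prepend it to the next one) can be sketched roughly like this. Everything here is an assumption for illustration: the OverlapReader name, plain ints standing in for samples/readsets, and reading from an in-memory vector rather than the real input queue.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: hands out blocks of samples where each block starts
// with the last `overlap` samples of the block handed out before it.
class OverlapReader {
public:
    OverlapReader(const std::vector<int>& input, std::size_t block,
                  std::size_t overlap)
        : input_(input), block_(block), overlap_(overlap), pos_(0)
    {
        assert(block > 0);
    }

    // Returns the next block, or an empty vector once input is exhausted.
    std::vector<int> next()
    {
        if (pos_ >= input_.size())
            return std::vector<int>();

        std::vector<int> out(tail_);  // start with the saved overlap
        std::size_t end = pos_ + block_;
        if (end > input_.size())
            end = input_.size();
        out.insert(out.end(), input_.begin() + pos_, input_.begin() + end);
        pos_ = end;

        // Save the tail of what was just handed out for the next call.
        std::size_t keep = overlap_ < out.size() ? overlap_ : out.size();
        tail_.assign(out.end() - keep, out.end());
        return out;
    }

private:
    std::vector<int> input_;
    std::size_t block_;
    std::size_t overlap_;
    std::size_t pos_;
    std::vector<int> tail_;
};
```

With input 1..7, a block size of 3 and an overlap of 1, the blocks come out as {1,2,3}, {3,4,5,6}, {6,7}.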

Back to coding...


Fiesta Bowl is tonight -- Go Irish!

January 8, 2001

Her father was a Shriner, know what I mean? Nudge, nudge, wink, wink...

It's been quite a while since I've done an entry, and I blame Arun. Without his daily (and sometimes more than that) journal entries, how can I be expected to remember to do my own?


How cool is this? Tracy's products are up on GE's web site.

http://www.geappliances.com/cooktop/

Until about 3-4 weeks ago, I thought she was working on 2 models: 1 gas, and 1 electric. But she's really been working on about 90 different models! Yes, nine-zero.

Cool! (GE even did some kinda cool stoopid-browser-trix things on that website, too)

They just started going down the production line a few weeks ago (which is pretty cool in itself); they won't be available in stores for a little while yet.

My wife rocks.


I've been doing a lot of development work with poggenc. The first generation is essentially finished -- I'm currently working on plugging up the last few memory leaks. I have found at least 1 bug in the Sun Forte 6.1 STL implementation -- std::vector::resize() causes a read-from-uninitialized error. Doh.

poggenc is still threads-only (no MPI yet). I thought that I knew a lot about threading before I started this, only to discover that I didn't know jack about threading. My original design had many locking bottlenecks, such that encoding with multiple threads (or even one thread!) had so much overhead that it was slower than hell. I had to redesign a bunch of the interfaces and reduce the numbers of locks necessary by a lot in order to get the processing time down to a reasonable level.
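
One common way to cut down on lock traffic (not necessarily what the poggenc redesign did) is to grab several work items per lock acquisition instead of one. A minimal sketch, assuming modern std::mutex for brevity; BatchQueue is a made-up name:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

// Hypothetical work queue. pop_batch() takes the lock once for up to
// `max_items` items instead of once per item.
template <typename T>
class BatchQueue {
public:
    void push(const T& item)
    {
        std::lock_guard<std::mutex> guard(lock_);
        items_.push_back(item);
    }

    // Remove and return up to max_items items under a single lock.
    std::vector<T> pop_batch(std::size_t max_items)
    {
        assert(max_items > 0);
        std::vector<T> batch;
        std::lock_guard<std::mutex> guard(lock_);
        while (!items_.empty() && batch.size() < max_items) {
            batch.push_back(items_.front());
            items_.pop_front();
        }
        return batch;
    }

private:
    std::mutex lock_;
    std::deque<T> items_;
};
```

A worker thread that pulls 16 items at a time touches the mutex 16 times less often than one that pulls them singly, at the cost of slightly lumpier load balancing.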

Still, however, it's less than linear speedup with multiple threads on SMPs. Of course, nothing can exhibit perfectly linear speedup, but this isn't close enough for my liking. I'll continue to investigate that.
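
To put numbers on "less than linear", the usual arithmetic is speedup = serial time / parallel time, and efficiency = speedup / number of threads. The sketch below also includes Amdahl's law as a rough upper bound; that is a standard textbook formula brought in here for illustration, not something measured on poggenc, and the function names are made up.

```cpp
#include <cassert>

// Speedup is serial time divided by parallel time.
double speedup(double serial_seconds, double parallel_seconds)
{
    assert(parallel_seconds > 0.0);
    return serial_seconds / parallel_seconds;
}

// Efficiency is speedup per thread; 1.0 would be perfectly linear.
double efficiency(double serial_seconds, double parallel_seconds, int threads)
{
    assert(threads > 0);
    return speedup(serial_seconds, parallel_seconds) / threads;
}

// Amdahl's law: best-case speedup when only `parallel_fraction` of the
// work can be spread over `threads` threads.
double amdahl_limit(double parallel_fraction, int threads)
{
    assert(threads > 0);
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / threads);
}
```

For example, with invented timings of 120 seconds serial and 40 seconds on 4 threads, the speedup is 3 and the efficiency is 0.75.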

I started some web pages to explain how this works, with the idea that some of this text can be morphed into dissertation-quality text afterwards. i.e., the web pages are a dry run for a dissertation chapter.


Saw an old Army cadet of mine this past weekend; Brent and his wife Aimee (I hope I spelled her name correctly). He was one of my Airborne plebes; I beat up on him as part of his training (and he's a better person for it! :-). It's a small world -- he now works for GE Appliances here in Louisville. It was good to see him again, and to hear what he ended up doing in the Army, and what he's doing now. Ironically, he outranks me -- he finished as a Captain, while I'm still a 1st Lieutenant. Life is amusing that way...

He's working on an idea with an old commander of his who is at the Army War College. It's an overhaul of the Army's evaluation system. It's pretty cool, actually. There's a web and technology component (which is why he asked me). He asked if I could help, and I probably will throw a bit of advice their way (contributing to the open source/freeware cause, of course), but I don't have time to do any actual programming for them. Ah well. :-\


Over the past few days, we've (me-n-Andy) been coordinating a trip up to the University of Wisconsin/Madison for a visit with the Condor folks. We've got it all set on the first week of February, but I forgot my @#$#@$% dentist appointment that week. Arrghh!! Tomorrow, I've gotta see if I can get it rescheduled (my dentist isn't open on Mondays).

Other than that, it looks like it's going to be a great trip; I'm going to give a talk on LAM. After a little discussion (we've got a mailing list setup for the LAM and Condor folks for ongoing collaboration), we decided to split my talk into three parts:

  • MPI vs. PVM: theoretical / practical reasons, with a few small code samples
  • Talk about how the lower layers of LAM work (daemon-based stuff, etc.)
  • An intro to what we're hoping to do with a Condor + LAM collaboration, what I've tentatively nicknamed "Lamdor" (like the name?)

It should be a good time.


Speaking of the Goodness of LAM, there's a Linux Integrator company (Aspen Systems -- http://www.aspsys.com/) who wants to install an 800 node Beowulf with LAM and Myrinet 2000. How cool is that?!

LAM: Lust for Glory!


I visited ND last week for a few days. The lab is a total disaster with water damage and whatnot. However, they've come up with the most sensible idea for solving the problem that I've heard in years: instead of trying to fix the roof, they're going to essentially install an upside-down umbrella in the attic under the roof to catch all the water that seeps in from the roof. This water will be funneled to a new drain pipe that they installed inside Cushing. That's right -- they drilled a hole through the floor of 325 Cushing, and also through the floor in the room below us, and will be installing a massive drain pipe from the attic all the way down to the ground floor and outside, so that the leaking water can flow all the way from the roof to the outside, safely.

Engineering wise, it's actually pretty cool.


While I was at ND, I managed to grab Dan from Scyld on the phone. We had a good chat. He's very pleased with the progress on poggenc, and we talked about LAM/Scyld as well. We think we came up with a hack for LAM/Scyld. It's not perfect, but it will [hypothetically] allow:

  • LAM to work on Scyld machines.
  • An RPM of LAM to be distributed that will work on both Scyld and non-Scyld machines (decision is made at run time).

We'll see how that works out.


Also while I was at ND, Dog and I met with Paul and Johanes to "turn over the keys" of the Hydra. Dog and I are now no longer the primary caretakers of the Hydra -- Paul and Johanes are. Of course, we'll be in a transition mode for a while; Paul and Johanes will probably have to consult us with any problems with PBS for some time. But at least we've started the transition.

Two things I have to do before I am fully out of the loop:

  • Integrate the Maui Scheduler and QBank software into PBS. This is because Rich Sudlow has finally decided to take us up on the CTC deal where ND HPCC users get 10% of the cycles of the hydra per month. To do this, we need an allocation-tracking program (QBank), and a scheduler that can interface with it (Maui). I'll install this stuff, and tell Paul/Johanes about how it is setup when it is done. Hydra PI's and students will either get an unlimited monthly allocation, or an allocation so large that they cannot spend it all. All the HPCC users will share a common allocation that amounts to 10% of the hydra cycles per month.

    Interestingly enough, the Maui scheduler did not compile under Solaris. It was a handful of small items that were "wrong". I corrected them and sent a patch to the Maui scheduler list. The author was very grateful and promised to include the fixes in the next release. How cool is that?

  • Finish the PSR once and for all -- there's some calls to popen() that need to be replaced with formal fork()/exec() stuff (for various technical reasons). This is of lower priority, but it does need to get done eventually.
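
As a rough idea of what the fork()/exec() replacement for popen() could look like, here is a sketch using only standard POSIX calls. It is not the PBS-internal fork_me() routine (whose interface isn't described here), the function name is made up, and the cap of 1024 descriptors is an assumption to keep the loop cheap.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Run a program in a child process with every inherited descriptor above
// stderr closed first. Returns the child's exit code, or -1 on failure.
// (Hypothetical helper for illustration only.)
int run_without_inherited_fds(const char* path, char* const argv[])
{
    assert(path != 0);
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        // Child: drop everything except stdin/stdout/stderr, then exec.
        long max_fd = sysconf(_SC_OPEN_MAX);
        if (max_fd < 0 || max_fd > 1024)
            max_fd = 1024;  // assumed cap for this sketch
        for (int fd = 3; fd < max_fd; ++fd)
            close(fd);
        execv(path, argv);
        _exit(127);  // only reached if execv() failed
    }

    // Parent: wait for the child and hand back its exit code.
    int status = 0;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the descriptors are closed in the child before the new program ever starts, the launched program never sees the Mom's sockets at all. A real shepherd launcher would presumably not block in waitpid(); that part is just to make the sketch easy to try.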


In Army news, after a weird sequence of events, it looks like I'll be heading down to ARL/STB (Army Research Lab / Software Technology Branch) Atlanta for one more 2 week stint before they get shut down. I have to do my annual 2 week tour before 1 March, so I could go down there imminently. It depends on the trip to Madison and my dentist appointment; we'll see what happens there.

Also, my PMO (personnel management officer... took a minute to remember that) at AR-PERSCOM (Army Reserve Personnel Command) sent me an e-mail at the end of the day saying that she's got a line on a new position for me in ARL since STB is being shut down. I'll be talking to her tomorrow about it. This likely means that I won't be heading back to be a BSO (Battalion Signal Officer) for some combat unit after this 2 weeks. Wooo hoo!


As of 8:47pm, I have 433 xmms's running on queeg.

January 11, 2001

I'm ballooning my ass off up here!

A quickie tonight, 'cause I'm tired and want to go to bed. Typos be damned.


I sent a broad list of suggested meeting topics to the lamdor list today. We'll see what the Condor folks think. I also sent my lengthy discourse on all my thoughts about Lamdor that I wrote about a week or two after SC2000. I was amazed to see that I had written 638 lines of text. Good God! I talk a lot.


I have spent 3 excruciating days tracking down a LAM problem on AIX 4.3.3 in 64 bit mode. Craig Stewart and friends down at IU ROCK, by the way. I absolutely needed an AIX 4.3.3 account somewhere to track this down, and after calling/e-mailing everyone I could think of to no avail, Lummy suggested the IU folks. They got us an account within a matter of hours. Amazing.

This all came up because some guy (Shahryar) in ibm.com e-mailed the LAM mailing list saying that he was having problems getting LAM to work under AIX 4.3.3 in 64 bit mode. It turns out that his boss is an old LAM guy from Ohio State, so we felt obligated to help him out. :-)

Amusingly enough, over the span of 3-4 days, I have more e-mail from Shahryar (141 from him) than I have from any other single LAM user. The next contender is Keith from Citibank, from whom I have 117 e-mails -- but that was over a period of several months. The next largest number of mails that I have from a single LAM user is 39.

Wow.

Huge props to Darrell and Rich for helping me figure this out. There were deep discussions today in e-mail about kernel-level stuff (I have to admit, it was somewhat amusing to watch a BSD guy and a SYSV guy duke it out). There were many others that helped find this bugger, too -- thanks to everyone.

Here's an e-mail that I sent about it earlier today:


Subject: Bloody AIX!

For the past several days, I have been struggling with an issue under AIX 4.3.3. This may affect you in the future (it has to do with blocking and non-blocking sockets), so I thought I'd pass it on.

The quick moral of the story: friends don't let friends use AIX.

Or, in the original German: AIX ist pronouncen "aches"

January 12, 2001

Radio broadcasting, wrought iron smelting... it's all pretty much the same thing

A few things that I forgot to mention last night...


As of 7pm on the 10th, I had 788 xmms instances running on queeg. xmms finally crashed yesterday; I'd assume that we were over 800 when it finally died at 12:41pm yesterday (the 11th). Right now, I have 98 xmms instances on queeg.

I need to write an automated scripty to monitor this so that I can keep a record of the most number of xmms instances; probably a simple cron job that appends the date and the number of xmms instances to a flat file would be fine.

...done. cron will fire this thingy up every 5 minutes. Because when you have an 800MHz machine, it's important to bog it down with utterly useless crap. Oh yeah, I need to write that PHP K-Mart Phone Call tracker, too, with a MySQL back-end and automated report generators...


Got one response back from one of the Condor guys already about my really long summary of what we need to do for Lamdor. Cool!


It looks like Jeremiah (of the clan LSC) will be forced to make his LSC Friday Lunch scripty thing in a webified doo-hickey. It'll be good for him; he'll have to learn PHP, which will, fundamentally, make him a better person.

PHP makes the world a better place.


Johnny hooked me up with an account on his MS Exchange server at home the other day. I used MS Outlook 2k to hook up to it. Why would I do such a thing?

Well, Outlook is actually not a bad program, truth be told. It has many nice features. That being said, I don't know if I'll ever be able to use a GUI mail client because I'm so conditioned to a green-screen mail client, but... My sister Robin needed some advice on enterprise-wide calendaring; having an account on an MS Exchange server where I could step through each window with my sister over the phone (her company uses Outlook/Exchange as well) was quite helpful.


I think we'll try to have a "welcome to the internals of LAM" party/meeting next week with Ron and Brian. Arun will likely be there, too, since he's really only been exposed to a small portion of the LAM internals. I'll have to think about what I want to talk about, and how to orient the guys to a source tree of 80+ directories and 950+ files.

An annoyance that we ran into in LAM the other day had to do with libtool. libtool can be your friend, but it can also be your enemy.

It seems that -- at least in some environments -- libtool does not like source filenames with more than one '.' in them. i.e., "foo.c" is ok, but "foo.bar.c" is not ok. It's some kind of regexp problem inside libtool somewhere (I tracked it down once) -- they just made the bad assumption that there would only ever be one period in the filename, the one that separates the basename from the extension.

We had a handful of filenames in LAM that had two dots. I don't know why, but libtool barfs on these only in some environments. I didn't bother figuring out why; I know that I've seen this before and that I somehow managed to fix it (of course, I can't remember how, now). But on the rationale that when we start distributing LAM with libtool-enabled builds, someone will run into this problem, Arun and I just went through and renamed all the "bad" filenames using s/\./_/.
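
Spelled out as code, the renaming rule amounts to "turn every dot except the last one into an underscore". A small sketch, assuming bare filenames with no directory part; fix_dots is a made-up name:

```cpp
#include <cassert>
#include <string>

// Replace every '.' except the last one (the extension separator) with '_'.
// Assumes a bare filename, not a path like "./foo.bar.c".
std::string fix_dots(const std::string& filename)
{
    std::string result = filename;
    std::string::size_type last = result.rfind('.');
    if (last == std::string::npos)
        return result;  // no dots at all, nothing to do
    for (std::string::size_type i = 0; i < last; ++i) {
        if (result[i] == '.')
            result[i] = '_';
    }
    return result;
}
```

So "foo.bar.c" becomes "foo_bar.c", while "foo.c" is left alone.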

This is kind of annoying in CVS, because there's two ways to do it:

  1. Go muck around in the CVSROOT and rename the repository files manually
  2. Use CVS to remove the old filename, and then use CVS to add the new filename

Either way is ucky, but the latter preserves the history in case you need to roll back to an older version, so that's what we did (with comments in the logs about where to find all the previous versions of these files, since they now have CVS versions of 1.0.1).


It's 8:10am. I have 102 xmms instances running on queeg.

January 14, 2001

Honneysuckle matchheads

Still working on parallel oggenc. Ugh! There's some internal massive memory leak that is proving incredibly elusive (it must have something to do with the way that I'm invoking the Ogg/Vorbis API incorrectly...). However, I have proven to myself that I have plugged all of poggenc's holes.

C++ can be really helpful. I have a templated buffer pool class; it is used to allocate and then recycle buffers so I don't have to new / delete forever.

In order to prove that nothing is getting lost in this templated class (i.e., everything eventually gets deleted), I had to put some couts in the destructor (shh!). But since the class is templated and used in many cases, seeing a general: "X buffers remain unaccounted for" is not helpful -- I need to know which instance has buffers that remain unaccounted for.

Enter C++'s typeid construct. With it, I can do:

cout << typeid(this).name() << " has " << size << " buffers unaccounted for" << endl;

which shows the real type of the templated instance. Very cool, and very useful. <typeinfo> is your friend.


Brandon, an ND CSE senior (along with some others, I think), has written the ultimate Palm Pilot killer app: it plays the ND fight song, alma mater, and the Victory Clog. It's still in beta, but I managed to snag a copy of it and it seems to mostly work. Brandon says they're still working on it, and will send me a copy when they hit 1.0.

It's not like I know a bazillion people who would want an app like that or anything...


Saw some movies this weekend:

What Women Want: A good flick; watched it with Tracy and Janna. There was stuff in there for both men and women. Got a bit mushy towards the end (it is a romantic comedy, after all), but all in all an enjoyable movie. I give it 12:30.

Keeping the Faith: Quite amusing -- saw this one on video. It's with Ed Norton (Fight Club!), Ben Stiller, and Jenna Elfman. This one, too, slowed down a bit towards the end, but there was a good supply of one-liners to make it enjoyable. I give it 15:00, partly on the strength that Ed Norton rocks 'cause of Fight Club, Ben Stiller is just really funny, and Jenna was really hot.

It seems that Arun is going to show Eraserhead for the movie club this week. What a horrendous choice. Eraserhead has the dubious honor of being the only movie that I have ever returned to a video store without watching it in its entirety. It was too fucked up for me -- I turned it off somewhere about halfway through. Granted, that was at least 15-17 years ago, but still, I have memories of that movie sucking Big Time.


I'm waiting for a bcheck run of poggenc to finish (it takes quite a while, even with a small sample) that will hopefully shed some light on my memory leak woes.


Miron, head PI for the Condor project, sent some wisdom to the Lamdor list today: let's just concentrate on getting LAM jobs to run in Condor before we do all the checkpoint/migration stuff. I was under the impression that we had to do the checkpoint/migration stuff to get LAM to run under Condor, but Erik informs me that they have a static scheduler that allows things to run uninterrupted, and therefore not have to have checkpointable/migratable code.

This is good to know -- it makes a nice, clean abstraction break between these goals (getting LAM to work in Condor, and getting LAM to be checkpointable/migratable).


I had 487 instances of xmms running on queeg a little while ago -- 85% of all processes on queeg were xmms. However, that caused xmms to eat up over half of my RAM, which was really slowing things down. So I had to kill and restart xmms.
However, X itself still consumed about half of my physical memory even after I killed xmms. Perhaps there's some gradual memory leak in the X server as well. Who knows. I restarted X and all was well (X had been running for about 30 days; while that's not perfect, I suppose it's [fairly] forgivable).


Tracy and I contacted a realtor (on the recommendation of several co-workers of Tracy's) and started looking at houses on Saturday. We're going to look at more tomorrow.

It was actually surprisingly fun.

I never thought that I'd be able to walk into a house and say, "Nope. This one won't do," and actually mean it, and have reasons for saying it other than just being cocky, flippant, and arrogant.

Damn, I'm getting old.


Ah! The bcheck run is done. Back to coding... squishing little buggies...

Tastykakes and beer: health food for a Gnu generation

Typo in the last journal entry. The C++ typeid example should read:

cout << typeid(*this).name() << " has " << size << " buffers unaccounted for" << endl;

Forget not to dereference this, lest the wrath of the incorrect answer descend upon you, and add to your time in Purgatory.
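
For completeness, here is a stripped-down sketch of the buffer pool idea with the typeid trick in the destructor. It is a guess at the general shape, not poggenc's actual class: BufferPool and its member names are invented, and the string returned by name() is implementation-defined (some compilers return a mangled name).

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <typeinfo>
#include <vector>

// Hypothetical buffer pool: hands out buffers, takes them back for reuse,
// and complains at destruction if any were never returned.
template <typename T>
class BufferPool {
public:
    BufferPool() : outstanding_(0) {}

    ~BufferPool()
    {
        if (outstanding_ != 0) {
            // typeid(*this) names this particular template instantiation.
            std::cout << typeid(*this).name() << " has " << outstanding_
                      << " buffers unaccounted for" << std::endl;
        }
        for (std::size_t i = 0; i < free_.size(); ++i)
            delete free_[i];
    }

    // Reuse a recycled buffer if one is available, otherwise allocate.
    T* acquire()
    {
        ++outstanding_;
        if (free_.empty())
            return new T();
        T* buffer = free_.back();
        free_.pop_back();
        return buffer;
    }

    // Return a buffer to the pool for later reuse.
    void release(T* buffer)
    {
        assert(buffer != 0 && outstanding_ > 0);
        --outstanding_;
        free_.push_back(buffer);
    }

    std::size_t outstanding() const { return outstanding_; }

private:
    BufferPool(const BufferPool&);             // not copyable
    BufferPool& operator=(const BufferPool&);  // not assignable

    std::size_t outstanding_;
    std::vector<T*> free_;
};
```

A BufferPool<int> and a BufferPool<double> report different type names, which is exactly what makes the "unaccounted for" message useful when the template is instantiated in many places.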

January 20, 2001

A complaint about the complaint box. Delicious.

As I was driving home from Notre Dame yesterday, I drove south into a snow storm, which is really odd. Normally, it's the other way around -- you go North to get snow.


Had a good couplea days at ND.

I had a rockin' LAM pow-wow with Arun, Brian, Ron, and Dog. Dog was more of an observer, but he has been an official Friend of LAM for quite some time. When he gets some Spare Time(TM), he does need a Master's project, so it's possible that he'll do something in LAM. We'll see.

We discussed all the things that are Going On in LAM, and came to a few decisions:

  • The next release of LAM will be 6.5, not 6.3.3. Mainly PR reasons, but also to signify that this is quite a big change since [the currently available] 6.3.2.

  • First order of business this semester is to get 6.5 out the door. There's one or two issues that I'm going to look into this weekend, and then start giving tarballs to Ron and Brian for formal testing.

  • Ron will probably start looking into Totalview support. That will be way cool; having a real parallel debugger that supports LAM.

  • Brian is going to start looking into IPv6 support. This could give us some really cool things, such as optimized collectives (using IPv6's native multicast ability), security in the lamd (using IPsec), etc.

  • Arun's going to finish the Myrinet RPI. He's having problems with long messages right now; hopefully that will get fixed Real Soon Now. He'll likely look into the VIA RPI after that, and dabble a bit in compression at the RPI level. This is an interesting sub-note: I think I had the inspiration to use compression in MPI during a drive SBN<-->Louisville. Sometimes it's not worth it, but sometimes it may make a huge difference in terms of bandwidth. It would be our ringer for ping-pong tests. :-)

  • We'll probably have a series of quicker sub-releases (hopefully!) that incorporate major new features. e.g., 6.5.1 may have Myrinet support. 6.5.2 may have Totalview support. 6.5.3 may have some TCP RPI optimizations (e.g., tiny messages, fixed linked list handling). And so on. We can't really do this now because the 6.5 tree is very different than the 6.3.2 tree.


Didn't get to see Ed-n-Suzanne too much; maybe we'll have to do dinner one of these times when I go up there. Cleo went barking crazy when I came home both nights. I think Cleo's non-barking acceptance rate is a complicated function. There are multiple factors:

  • Whether I initially come in during the day or at night (day, 1 = day, 0 = night)

  • Whether Cleo is there when I initially arrive (only if during the day) (at_home, 1 = home, 0 = not home)

  • Whether I come home at night by car or by foot (car, 1 = by car, 0 = by foot)

  • How many days I have been there (days)
  • Phase of the moon (moon, fraction from 0 to 1)

These factors have led me to the following equation (too bad mathML isn't yet implemented anywhere...):

chance_of_bark_at_night = \sum{i = 0}{1 \step{.1}}{\frac{1}{days} \times ((day * .75) (at_home * .75))^{(i != moon)}}

A team of 13 scientists that have been studying my visits to Chez Costech came up with that formula. I'm quite sure it's right.

January 22, 2001

"Slut" has been playing continuously for 2 days

Spent much of this weekend looking at houses again.

Found two great houses over by Janna, and we were all set to sit down and slog through the details of deciding which to get, and then found out that neither of them has DSL availability.

Arrgh!!


I'm doing a bunch of LAM work right now to enable Ron and Brian to start the release process for LAM 6.5 (did I mention in the journal already that we're going to call the next release 6.5 instead of 6.3.3? It's a long complicated story [e.g., where's 6.4?], but there are definite reasons for everything. To summarize: there's been major changes since 6.3.2 such that we didn't feel that an increase in the release number was sufficient to describe the enormity of the change. It isn't quite as revolutionary as should indicate a major number change, so we settled for a minor number change. There.).

I've added a whole schload of programs to the lamtests test suite, and added a few more canonical example programs to our "examples/" directory -- something I've been meaning to do for a while. We have some good examples already, but none are the "standard" examples that are typically used in MPI, like the pi approximation program and the ring program.

Now I'm briefly diverting to write a few man pages (we have a bunch left unwritten for MPI-2 functions, so we've divided them up into groups and assigned them to various Llamas. Tackling them a few at a time is a good way to whittle the number of unfinished pages down to a small number, as the limit goes to 0). Mostly MPI-2 dynamic functions for me.

After that, I'll finally get around to fixing MPI_COMM_SPAWN and MPI_COMM_SPAWN_MULTIPLE -- there's something wrong with using app schemas such that you still have to give a process count on the root or something (you shouldn't have to). And I think the error code reporting is futzed up somehow (lamteam advised me of this about a month ago or something. Not a huge deal since errors typically cause aborts, but it is possible that someone could set the error handlers to return and expect to get valid error codes back).

Then if all else looks good (oops... looks like I have a seg fault in one of the new test programs...), I'll hand the tarballs over to Ron/Brian to begin the release process.

Long live LAM!


There are currently 144 copies of xmms running on queeg, which is 63% of all processes.

January 24, 2001

Choco-latte dead-head sickers

I just had a lengthy journal entry about how LAM 6.3.3b52 is officially dubbed "release candidate 1". Happiness all around.

Unfortunately, I hit ctrl-c at the jjc prompt, and all was lost. Doh. Gotta put something in jjc to prevent that from happening in the future. :-(

Suffice it to say that we're starting the formal LAM release process. I put some way-cool centralized error reporting stuff in the lamtests module (there was a lengthy explanation of it in the Journal Entry That Is Now Deceased; it's too late to re-type it all now), and generally expanded the testing base. This actually resulted in finding a few more bugs and minor memory leaks for obscure cases in LAM (which is a good thing -- yay for testing!).

I will, however, re-print an excerpt from a LAM user that I got today:

"I wrote you a while ago regarding C++ extensions for MPICH. By now we've switched to LAM. Feature availability convinced us to do so... :)"

I replied to her that all Right Thinking people use LAM. Resistance is futile.


Here's a cute one -- kudos to anyone who can decipher it:

 10001001101111010000011110011101111111010101000001100110 11001011100101110110001000001100010110010111101001110100 11001011110010010000011011101101111111011101111110000000 

It's in Darrell's .sig.


Now that Brian and Ron will run with LAM's release process, I'll head back to poggenc... By the looks of vorbis-dev, Ogg/Vorbis beta 4 is pretty close. There's still some broken things in terms of building in non-gcc/non-Linux/non-shared-library environments, so I'll keep bitching about those. :-)


There are currently 413 xmms instances running on my machine out of a total of 497 processes. 83% of the jobs on queeg are xmms.

January 25, 2001

It tastes exactly like licking a shag rug

xmms finally clobbered queeg today.

There were 536 xmms instances on queeg out of 623 total processes -- 86%. Things were running at an absolute crawl when I came back from dinner. So I had to kill xmms, ending the 6 day streak of playing "Slut" continuously. <sigh>


Found a few minor bugs in LAM today; Brian and Ron start formally testing tomorrow. Woo hoo!

I also realized that I needed to re-read and update the README, INSTALL, and RELEASE_NOTES files. Doh! I have a great example in the RELEASE_NOTES file, though -- check it out:

     % LAM_MPI_FOO="green eggs and ham"
     % export LAM_MPI_FOO
     % mpirun N -x DISPLAY,SEUSS=author samIam

I got access to a friend of a friend's BSD box for some LAM testing. He's quite a nice guy (his name is Todd), and has come through for LAM a few times before. Never underestimate the friendliness of fellow programmers on the internet.

Kudos to you, Todd! And kudos again!

And Kudos to Craig down at IU for getting us AIX access! And kudos again! Er... actually... he got us AIX access... perhaps we should be cursing him...?

All for the glory of LAM.


IRC is actually fairly interesting. The Ogg/Vorbis developers hang out in a channel on irc.openprojects.net, so I pop in there periodically to ask questions, etc. BitchX is an amazingly powerful program; I'm sure that I only understand about 2% of its functionality.


I am so fed up with ROMIO. It turns out to be pretty broken on *BSD platforms. Words cannot express.

Miles to code before I sleep.

January 27, 2001

Are you Doobie Keebler?

Yin and yang.

Yin: The LAM release cycle is under way. After some struggle, I solved a bunch of issues with the build process that had to do with automake and bizarre timestamps. We're up to LAM 6.3.3b55.

Yang: In chatting with some Ogg/Vorbis developers on IRC this evening about some problems that I have been having, it turns out that doing a parallel Ogg/Vorbis encoder simply may not be possible due to the nature of the Ogg/Vorbis encoding algorithm.

More details to follow. I don't fully understand the encoding process, so I don't completely grok what they told me; need to sleep on it.

:-(

February 3, 2001

All I ask is that you obey me like the Will of God

I have somehow angered Herman, the God of Automobiles.

Last Wednesday, I came out in the morning to my car and noticed a huge crack in it. It was fine on Tuesday night. It doesn't look like an impact crack; perhaps it was thermal stress...?

Thursday afternoon, when I drove in to South Bend, I drove up to Chez Costech and ran over a bottle that I didn't see. Not only did I get a flat, but the bottle managed to gash the side of my tire (which isn't repairable), so I had to buy a whole new one. The folks at Basney Honda were quite nice and hooked me up (they didn't even charge me labor, which was nice -- Kudos to the "Jeff" guy who worked there!), but it was $65 that I didn't particularly want to spend.

I'm worried because these things typically come in threes, and I have 3 more long drives ahead of me (to Madison, from Madison, and back to Looieville).

Whatever I did, Herman, I'm sorry.


In other Big News, Tracy and I have finally decided on a house. Here's a breakdown of the big details:

  • 2300 square feet
  • 2 floors
  • 4 bedrooms (all on second floor)
  • Laundry room upstairs
  • Entryway off front door is open all the way up to the second floor; the stairs go around the edge
  • Sitting room on first floor
  • Dining room
  • Big kitchen
  • Great room
  • 2 car garage
  • Patio out back
  • Basement

We picked out the cabinets and countertop last week, and put some "good faith" money down on the house so that the builder would customize it for us. Yes, it's a brand new house -- not even complete yet. Tracy's working on picking out colors and carpets this weekend.

And thanks to Alan Greenspan, we got a truly awesome interest rate on our mortgage -- 6 7/8%. Rock on!!

We expect to close by the end of the month (Tracy worked out these details after I left for SBN, so I don't know them offhand). We'll spend the next month cleaning and moving in and whatnot (we'll kinda be taking our time with this), and probably move in by the end of March.

Woo hoo!


And now for some quickies...

  • Went to the Keenan Revue with Arun, Perk, and Co. Was quite fun. Some of the skits were really funny. I won't give anything away here, but the wheelchair bit was my favorite.

  • Lummy and I are heading to Madison tomorrow to visit the Condor folks. Should be a great trip; I'm pretty excited about it. I'm giving a talk there on Monday afternoon; I need to finish it!!

  • All told, Arun, Jeremiah, Brian, Raja, and I spent probably about 3-4 hours discussing quoting and shell escaping rules for LAM on Friday. Wow. In the end, we decided to punt, and only allow simple stuff -- no quoting will be allowed. Maybe someday.

  • Brian gave a talk on IPv6 at LSC lunch yesterday, which was quite informative. When LAM 6.5 gets out the door, he'll be looking at supporting IPv6 with LAM, and doing some cool things with collectives