## October 1, 2000

### I have failed

I have failed.

I noticed that one of my students -- we'll call him "Fred" to protect the guilty -- had the following process running yesterday on one of the LSC machines:

fred pts/17 Tue 7pm 3:09 telnet rodrigues-8a.student.nd.edu

I am greatly saddened; all the Righteous have long since struck "telnet" from their working vocabulary, and save it only for debugging of ASCII protocols such as SMTP and HTTP, and use some form of encryption for normal remote access (e.g., ssh).

Alas, Fred, where did I go wrong? How did I not stress the importance of security? I feel like a parent who has just found out that their child has been a habitual drug user for multiple years.

Oh yea, the way of telnet is easy -- it is fast, universal, and yea, it may be ingrained in typing habits. But the path of Righteousness is never easy. Installation of ssh takes time (but is not difficult), and requires remembering to type "ssh" instead of "telnet" (half as many characters, I might add).

And so spoketh the great System Administrator in the Sky:

...He who uses telnet for personal use shall be damned in the fires of script kiddies. His boxen shall become IRC bots, and be owned by demons half his age. He shall be scoffed at by his new owners as yet another useless academic. His boxen shall become slow and bogged down with new traffic, and there will be great wailing and gnashing of teeth. None shall hear his screams (for the Righteous do not look at unencrypted traffic).

Fred (you know who you are): you need help. If you don't get help from NDLUG, please, get help somewhere.

## October 3, 2000

### Mangos and Margins

After the whole hydra time sink, got some good things done today...

• Officially re-opened the hydra for business today.

• Someone noticed a minor error with parallel bladeenc last week, and I finally got around to checking it out (in between compiles of real work today). Turns out he found a bona-fide bug in the shutdown routines -- it only showed up under MPICH because LAM rocks (i.e., if you do a singleton init with MPICH, you get MPI_COMM_WORLD == MPI_COMM_NULL, which is icky). I noticed that I had a few unreleased things in parallel bladeenc, but I didn't release them -- I just edited a 0.92.1b4 tarball with the fix, and called it 0.92.1b5. Freshmeat announcement is in their queue. Maybe someday I'll test and release the unreleased stuff that I have in CVS, but not right now...

• I hooked John up with SSL/IMAP on www.squyres.com (a.k.a. shipman.ws -- my first non-.com hosting!). I also hooked him up with authenticated and SSL-encrypted SMTP access -- pretty cool stuff. So he can relay through www.squyres.com to his heart's content, because he's fully authenticated using SASL, and all of his traffic (not just his IMAP traffic) is SSL-encrypted. Gotta figure out how to make pine do that (encrypt and SASL-ize SMTP traffic); he's using Outlook Express.

• I hit the RedHat guys up for some free stuff for SC'2000. I hope it's not too late to get stuff from them...

• Called and volunteered at my church. I'm such a great guy. ;-)

Turns out that I'll be leaving for ND Thursday morning and staying there for about 1.5 weeks. The Stanford game is this weekend, and then I'll be staying on to meet Rusty when he comes to campus next week, and for various meetings, etc., etc. Larry Augustin is coming to ND this Thursday, and I might get to meet him. Should be fun and interesting.

I think that .com's are starting to realize that service is very important -- you can't just put a bunch of products up on an https and expect people to buy.

Random question: what happens when you put version control meta directories under version control? Apparently, that's what one former LSC student tried to find out. I ran across this directory today by accident (line broken up for web/browser display purposes, and name changed to protect the guilty):

 ~lsc/ccse/lums/Archives/Students/STUDENT/xmpibackup/RCS/RCS/RCS/RCS/\ RCS/RCS/RCS/RCS/RCS/RCS/RCS

Do you think that God uses CVS? If so, what version are we? Are we a branch, or the main trunk? Can you imagine meeting a later version of yourself? Just think of all the new, cool features that you'd have!

A: "Ah yes, this is Jeffv1.7. The current version, Jeffv13.2, is much more advanced -- it has additional pincher claws, direct audio/visual/pseudo-sensing input feeds, extra-sensory perception (v7.2), electro-skeletal implants for strength and flexibility, web slingers (not spider-man like, these are the real thing), he's on the Space Football team as first string quarterback, etc., etc. Oh, and it can code like nobody's business."

A: "Yeah, but this is better."

B: "How much better?"

A: "11.5 better."

B: "Ah, so he goes to 11 then, does he?"

### To Be or Not To Be

First entry in a while... (started this entry yesterday)

I am thoroughly worn out. I have just spent about 48 hours trying to get PBS configured properly for the hydra after all the nodes and front ends have been upgraded to Solaris 7/64 bit. I was finally [mostly] victorious, but I am left feeling cynical and wondering why I wasted 2 days of valuable time when I could have been working on my dissertation. I must rant.

\begin{rant}

First, some technical details about why this was difficult, and why a seemingly simple thing took so long to do. We don't use a vanilla distribution of PBS. We use a clever patch (and a few extra executables) from Dale Southard to enable proper AFS authentication when our PBS jobs are run. This package needs libraries to perform AFS authentication; you can use the proprietary Transarc AFS libraries or the freeware krb4 libraries (http://www.pdc.kth.se/kth-krb/).

My initial goal was to build everything in 64 bit mode, because a) Curt had some horror stories about trying to run 32 bit AFS binaries in 64 bit mode, and b) it seemed the Right Thing to Do. Knowing that everything had to be 64 bit in order to link properly, I set about trying to build PBS in 64 bit mode.

It took a bit of research (thanks docs.sun.com!) to figure out how to compile in 64 bit mode in the first place. It took further research into the PBS docs to figure out all the ./configure flags that I wanted, etc., etc. (the PBS docs are somewhat hard to read, IMHO...). This all took a good chunk of time -- I just wanted to build a vanilla PBS first, and then try to build Dale's stuff (and probably recompile PBS to integrate it).

Being a forward-thinking person, I took Dale's stuff and updated all of it (because I foresee the need to do this whole process again in the not-too-distant future). I put in a proper automake process, with a full configure script to automagically figure out all the things that it needs to figure out so that you don't have to go fill in the Makefile yourself. That took a while, but I believe that it was worthwhile to do.

After a bunch of experimenting and poking around, I determined that the provided Transarc libraries are all 32 bit. Useless. So I went and got the krb4 package, and tried to compile it in 64 bit mode. Unfortunately, krb4 didn't want to compile in 64 bit -- it complained about some missing types.

<sigh>

So I said "Fuck it, I'll just build everything in 32 bit mode. Who cares?"

And I did.

And it worked.

...sort of.

The PBS moms would periodically die at random. I figured that it was because Dale's PBS patch had bit rotted, and was causing badness in the mom. So I put in all kinds of syslog() calls trying to track down where the problem was. I never saw any of the syslog messages. It made me think that the problem wasn't with the AFS code (!).

Luckily, Bob Henderson of PBS/Veridian came to my rescue and informed me that if PBS is to be run in a 64 bit environment (like Solaris 7), it, too, must be compiled in 64 bit mode so that it can read /proc properly. Without that, PBS will surely crash.

Arf. So now I have to get everything to compile in 64 bit mode.

krb4 took some tweaking (it's missing some typedefs that don't appear to be a problem if you compile in 32 bit mode -- go figure), but I finally got it to compile properly.

Dale's stuff also uses the RSA encryption routines from rsaref, so I had to compile that, too. Wow -- that thing must have been written a long time ago, 'cause about nothin' is standard. It's weird as hell. For example, it compiles to rsaref.a, not librsaref.a. Weird...

After that, I got Dale's stuff to link properly. It wasn't until much later that I discovered that rsaref wasn't happy in 64 bit mode. Trying to generate some keys, it sat and spun endlessly instead of actually producing output. Dale actually rescued me here -- he pointed to a web page that indicated that there is a bad typedef for UINT4 in rsaref/source/global.h that is an unsigned long instead of an unsigned int -- hence, in Solaris 7/64 bit mode, it was coming up as 8 bytes instead of 4 bytes. Changing the typedef and recompiling rsaref fixed everything, but figuring that out and fixing it took quite a while.

Dale's stuff links into the PBS mom, and I had some serious linker issues here. Turns out that both AFS (krb4) and PBS define routines for some MD5 stuff. It took a bit of creative side stepping, and changing Dale's patch, but I finally got it to work right.

After that, I had ended up creating 3 different PBS configurations: one for the PBS server (heracles), one for all the PBS client machines (athos, etc.), and one for all the hydra nodes. And I wrote a script to install each one. Not difficult, but not trivial either -- it took a lot of iterations to get the three scripts right.

So all in all, it took the better part of 2 days to get this all figured out and working properly. Ugh.

So why am I unhappy about this? It's not the work -- I don't mind that. And I learned some good stuff while doing this. But I really need to be working on graduating. And this is not that kind of work. Even worse, we're doing this for people who don't care -- they expect that we do this. They use the hydra much more heavily than we do, but we have to take all the pain of administrating it.

Don't get me wrong, I like all of our users -- they're nice people, after all -- but they all don't have a clue as to how much work it takes to keep it running (which, in retrospect, is the mark of a good sysadmin). We're basically doing this out of the goodness of our hearts, and losing valuable time because of it. After about 10am this morning, I couldn't help thinking to myself, "Why am I doing this? Is anyone going to notice? Is anyone going to care?" These questions are quite cynical, and reflected my frustration at the time. Indeed, the answer to the second question is "no", which is one of the reasons that I can say that we are good sysadmins (more about this below).

Just to be clear -- I'm defining "good sysadmin" from the viewpoint of a user. Users who have a good sysadmin barely know that they have a sysadmin; for the most part, things just "work". They don't have to keep continually updating their personal work habits to work with their computing environment. i.e., there is one environment, and it stays more or less uniform so that after users make the initial adjustment to work within it, they are rewarded with a fairly constant look and feel. This is not a hard and fast definition, but I think you can get the sense of what I am trying to say.

Bad sysadmins have exceptions for foo, you have to update your .cshrc to get the new version of bar, have no plans for uniform distributed environments, no backup schedules, no cohesive set of services, don't check their system logs, etc., etc. Users, however, unless they have had a good sysadmin, don't know the difference. In a society that tolerates (nay... expects) to reboot a Windoze machine multiple times a day, having exceptions for foo, or needing to type the full pathname to get the new version of bar seems like no big deal. It's extra pain that I (the user) must go through to do my real job; that's just the way it is. Users don't realize that it can be better.

But is this bad? If people don't realize that they have a sub-optimal arrangement, and just get used to dealing with the constant change, some things working and others not -- if they really don't know any better, what difference does it make? Probably little. However, I think this disturbs me philosophically at some level.

I have walked into 2 professional organizations where I worked as a system administrator (both, coincidentally, for the army). Both had horrendous (IMHO) sysadmins before me. Here's an example conversation that I had on my third day in the second organization (in a networked Unix environment):

Me: "I've installed the new version of Netscape; the one that was out there was a few versions back from the current release."

User: "Great! How do I access it?"

Me (puzzled): "What do you mean?"

User: "How do I bring up the new version?"

Me (still puzzled): "Well how do you bring up netscape now?"

User: "It's on one of the pulldown menus in my window manager."

Me: "Just access it the same way -- the next time you fire up netscape, it will be the new version."

The user literally sat there blinking at me for a few seconds. He had no concept of just doing the same thing and having updates automagically appear. This is one of many examples as to why I maintain that they had a bad sysadmin before me. Not that I'm self-aggrandizing, but doesn't it seem odd that when I announce the installation of a new version, the users assume that they'll have to do something different? ("You mean I don't have to reboot my unix machine multiple times a day? Why not? I think I'd still feel better if I rebooted it anyway." -- actual quote from a user when their desktop workstation was converted to unix)

The goal of good sysadmin is not only to keep everything working, but to hide as much of the work as possible from the user. The users have enough to worry about; they have their real jobs to do -- they shouldn't need to worry about fighting their computer to get their job done. It's the sysadmin's job to keep the computer running and make all of its services [relatively] easily accessible to its users. A sysadmin who does not make a "seamless" (i.e., as much as can be -- it cannot be 100% seamless) work environment for users is not doing their job, IMHO. A computer is supposed to be a labor-saving device -- this should be the sysadmin's mantra. More to the point, the technology itself should not make users' work harder than it already is (and I'm not talking about the evolution from hand-written foils that took 5 minutes to create to picture-perfect powerpoint presentations that take endless hours to create -- this is artificial demand that has been created by users; this is a different discussion).

Hence, the two organizations where I have done professional sysadmin (outside of ND that is) -- and again, I'm not trying to be self-aggrandizing -- now have a completely different view of sysadmin. They now expect a lot more from their sysadmin (as they should, IMHO). They don't want to fight the system anymore, to have to remember the three different ways to access netscape, etc., etc. They just want netscape, and they just want it "to work".

So how does this all tie in to how I'm annoyed with the hydra?

Well, to be blunt and arrogant, we've done a pretty darn good job with the hydra. Yes, we've screwed up a few times -- :-( -- but all in all, that system is pretty darn reliable and uniform. It "just works" for the most part, and users have had very few complaints. As such, our users have never had bad sysadmin. I dare say that we are under appreciated mainly because we set the initial level of service too high (most of the credit actually goes to our boss, Lummy, who infused me with many of the qualities of good sysadmin that I described above early in my graduate career, and I have done my best to pass these on to other grad students. i.e., these qualities don't just apply to sysadmin; they apply to research [and so on] in general. But that's a different conversation...).

I cite two reasons why we are "good" sysadmins:

1. Our users have no concept of how much work it takes to run the hydra (sure, once it's running, it pretty much runs itself, but, for example, this weekend's upgrade took the combined resources of two good sysadmins for multiple long-hour days to accomplish).

2. Multiple groups have come to us asking if we'd sysadmin their cluster for them.

The second point kills me; it's further proof of the first point. Being a sysadmin is not our business; being a grad student is our business. If I were being paid to be a sysadmin, I'd be happy to do it without complaining. But how many other research assistants have to put in double digit hours a week on keeping their own (and others'!) systems running? This is the job that someone should be paid to do, not a job that someone should have to do in their spare time, or at the expense of their real job.

Not that I'm faulting anyone here -- indeed, I have learned a lot as a sysadmin over the years, and I honestly think that it has made me a better computer scientist. And I do seem to recall that we volunteered to sysadmin the hydra, etc., not really realizing what a big job it would be. As such, it's probably our own fault for raising user expectation levels so high -- they've always gotten this service for free, and don't realize that sysadmins out in Silicon Valley get 6 digit salaries to do what we do.

Indeed, we have taken pretty much the same attitude with the LSC software trees for Solaris 7. That is, for Solaris 2.5.1 and 2.6, the LSC had extensive software trees out in AFS that many, many users at ND used because the OIT-provided software trees were inadequate. The OIT trees were out of date, didn't include all the software that we needed, etc., etc. Hence, we made our own trees, and maintained them fairly well. For Solaris 7, we have pretty much refused to do this because -- just like the hydra -- it is just a time sink. We end up supporting all kinds of people instead of just us, and this takes time away from our real jobs. So if we end up with software trees for Solaris 7 (we haven't really yet, because the OIT has some Quality people who are actively being good sysadmins), we might very well lock them to LSC personnel only. We're not in the sysadmin business... but people think that we are.

More to the point -- look at the investment/reward ratio for the hydra. We barely use the hydra. The main users of it are CHEGs and civil engineers. We use the hydra for some development work, but we don't consume the vast majority of cycles on it. But we're still investing huge amounts of time in the hydra when we get very little back out of it. This is time that could be spent elsewhere, doing things that are relevant to our own work, not others' work.

So at the end of this long ramble, I guess I have no one to blame but myself. When you provide a good service (which is rare in today's society), people expect it to keep going -- especially when it's free. Indeed, our users don't even have the concept that such work should cost money. The very aspects of what makes a good sysadmin create a self-perpetuating cycle of raising the level of service that is simply not sustainable by someone who is not a full time sysadmin. However, we never put down limits on what we would do as sysadmins, so we continually fed the fire of user demand without realizing that we were damning ourselves even more.

As such, I think it is time for us to gracefully back out of this business. We have provided a good service for several years to our users, but we just can't do it anymore. Indeed, Lummy has already initiated this process -- he assures me that we will not be sysadmining the hydra past 1 Jan, 2001. I do feel badly about this, because I feel a commitment to my users, but it is necessary for us to continue our real jobs. :-(

\end{rant}

Please keep these comments in context -- I have no ill feelings toward any of our users at all. Indeed, I find most of them to be very friendly people, and we have gotten along well with them for several years. And I don't think that I'm the best sysadmin in the world; in the field of computer science, there is always more to learn. I've met some highly accomplished sysadmins who make my sysadmin knowledge look like belly button lint. I'm not trying to make myself sound like the world's best sysadmin; I only mention these kinds of things to give my rant a frame of reference.

The above rant is just out of frustration because I'm trying to get my own work done, and can't because of artificial demands placed on me. Miles to code before I sleep...

### San Dimas High School football RULES!

I hit a problem the other day: how to tell when a machine is down? i.e., given a random IP name/address, how do you tell that it is down without a really lengthy timeout?

For example, try sshing to a machine that is down. Not one that doesn't have ssh enabled -- that rejection is almost immediate. And not to an IP that doesn't exist, or isn't reachable by you -- those rejections are also immediate. But ssh to a machine that is currently powered off, or not connected to the net.

It can take a long time to timeout.

A Solaris 7 machine takes almost 4 minutes (3:45) for a telnet to timeout to a host that isn't there. It takes 15 minutes for ssh to timeout (again, on Solaris). Quick testing showed that the majority of the 3:45 time was spent inside a single connect() call.

But a linux machine takes 3 seconds for telnet to timeout to a host that isn't there. What's it doing differently? How can it tell so quickly that the machine is not there?

Interestingly enough, the Solaris telnet reported "Connection timed out", whereas the Linux telnet reported "No route to host". So they're definitely doing something differently. Hmmm...

I ran my connect() test on both Solaris and Linux, and the results were identical to telnet -- Solaris sits for a long time on connect(), and then eventually times out. Linux only sits for a few seconds in connect() and then returns with a "no route to host" error.

Hmm. If connect() does not report the same error in the same way across multiple OS's, how do I do this? Indeed, Linux's behavior is great -- but what do I do on Solaris (and anyone else who doesn't return in 3 seconds)?

I got to thinking about the problem, and decided to look at some network and hacking tools. ping was my first stop. ping works in interesting ways. I didn't realize that it had its own protocol stack (like TCP and UDP). It works like this: you open an ICMP socket (you don't bind it to a port). From that socket, you send packets to the ping recipient. The ICMP stack on the other side will reply right back to you. Here's the catch: all ICMP replies come to a single point -- so if you have multiple ping programs running simultaneously, they'll see each other's ping replies (makes sense, if you think about it). Hence, you have to put some encoding in the payloads of the ping requests (which the remote ICMP stack will echo right back at you) to know which requests are yours and which you can discard.

Hence, here's a nice way that you can tell if a machine is up -- send it an ICMP packet. If you don't get one back in a relatively short timeout (probably even user-runtime-settable), rule it as "down". No problem.

Wait -- there's a catch. You have to run as root, 'cause the ICMP stuff is protected. Crap. We don't like setuid programs.

nmap was my next stop. They've got all kinds of goodies in there. SYN scans, FIN scans, etc., etc. They note, however, that many of these are not available to non-root users. Hence, they try the connect() thing as well when a non-root user tries to scan a machine. Again, Linux bails in 6 seconds saying "machine is not up" (this must be due to Linux's short connect() timeout). Solaris, however, takes much longer -- 1 minute. But it is significantly less than 3:45 that we saw in both telnet and the raw connect() call.

Some poking around in nmap revealed the following:

• It's actually pretty small; only a dozen or so .c files. For something as full featured as nmap, I would have guessed that it would have been larger. Who knew?

• It seems to be pretty well coded -- I could actually read the code pretty easily. They have good voodoo; color me impressed.

• The non-root ping scan tries a connect(), but does it in a non-blocking way, and repeatedly uses select() to check if the connect() has finished yet. A neat trick -- this allows them to set their own timeout (evidently somewhere around a minute; I didn't bother checking what it actually was).

So I'm going to have to try this -- code up my own non-blocking connect() and put it in my threaded booter and give it a whirl. Too tired right now, though -- this will be tomorrow's activity.

"I'd actually like to see a non-blocking MPI_WAIT."
- Shane Herbert, MPI-2 Forum.

## October 5, 2000

### Blueberry pineapples

Candles from Pier 1 seem to burn poorly. I will not buy any from there in the future. But then again, perhaps it is Louisville's great altitude above sea level...

Got all the nmap stuff working in my threaded booter. Cool stuff!

Tried to import boost into my project today so that I could start using GGCL and a cool progress meter class that they have, but I was sadly disappointed in the usability aspects of it. For one thing, it extracted itself in ".", not in a separate subdirectory. Then there are no README or INSTALL files, no Makefiles, no configure, no nothing. Just a bunch of files and you're left to figure out how and what to use. Disappointing.

I started a rant about this on the boost list, and one guy is being somewhat silly. I decided to wait a few hours before responding again just so that I don't really start slamming him; I am new on the list, after all.

I watched the Voyager season premiere tonight. Good stuff. Left some hooks for later in the season, too. Could be very interesting -- this is the last season, after all.

Brian reminded me that I totally forgot to put the XMPI hooks into LAM. Doh. So I spent an hour or two on that tonight. Adding a single function in LAM requires many things:

• A new file in share/mpi with the body of the function

• Modify share/mpi/Makefile.am to add the new file

• A new fortran binding for the function in its own file in share/mpi/f77

• Modify share/mpi/f77/Makefile.am to add the new file

• If adding profiling versions of the function, add entries in share/pmpi/Makefile.am and share/pmpi/f77/Makefile.am

• Add a new "block" type (essentially an enum for that function) in share/include/blktype.h; shift the hiwater block type up to accommodate the new function

• Add a new string for that enum in share/etc/blktype.c

• Add the appropriate prototype in share/include/mpi.h

• Add the four name #defines in share/include/MPISYS.F (eight if doing profiling versions of the functions)

• Write a man page for the function in its file in share/mpi

It's off to South Bend in the early AM tomorrow. Miles to drive after I sleep...

## October 6, 2000

### Enlightenment

It's 7:18pm, SBN time.

I'm in the LSC. Some time ago, DJ-Jazzy-Arun was spinning up the MP3s, and a hauntingly beautiful modern rock ballad kind of song came on. Entranced, I asked the DJ Man, "What song is this?"

He replied, "It's that song 'Slut' that I was raving about not long ago. I used to listen to it 50-60 times a day, but I'm much better now."

"Good," I said, "Put it on repeat."

That was a while ago (where "a while" >= 2 hours). It's still playing.

I don't know how many times I've heard the song now, but it's not enough. Amazing.

We need more speakers in here.

## October 7, 2000

### Caffeine-free Microsoft

Didn't get a lot done research-wise today, but it still seemed like a good day.

I made some progress in LAM; cleaned up a little code, made a fix that a helpful LAM user suggested, etc. We currently don't have a hope of compiling LAM with a C++ compiler -- it was originally written with pre-ANSI function declarations. As such, there are still billions of them throughout the code, and it would take a long time to convert them all to real ANSI declarations (which C++ compilers require). Don't quite know what to do here -- it doesn't seem like it would be easy to write a scripty-foo to automagically convert everything... Harumph.

Talked with Jeremy about boost things; reorganizing the directory tree, a potential build process, etc. I sent our ideas to a guy on the boost list who I was discussing this stuff with. He replied, but I haven't had time to look over what he said yet.

Talked with Arun about LAM progress. Seems like it is going well, but annoying mid-terms will halt its progress for about a week. Similarly with Brian and XMPI.

Went to Larry Augustin's talk today. No real shockers in his talk -- we've heard most of it before (open source will save the world, etc., etc.), but it wasn't a bad speech, I suppose. Others didn't like it at all. Oh well.

Had to make a command decision on the SC2000 paraphernalia today -- the company couldn't do beach balls in the time that we needed them. :-( So we opted for footballs; we'll see if they can do those in time.

Arun and I listened to "Slut" for several hours this evening. Wonderful. The song is not what you would expect at all -- it's quite hauntingly beautiful. I suppose that my image of the song would be shattered if I actually listened to the lyrics and found out that it's some kind of pig-worship satan song or something. It's amazing how I could listen to that song on repeat for hours on end and not be able to tell you a single word of what they were singing. It's that good.

I opened up the LAM/MPI CVS tree for anonymous read-only access tonight. We'll see if people actually check it out...

## October 11, 2000

### Fuzzy dice on a motorcycle

I've fallen behind on my journal entries. Cope.

Brief synopsis on dissertation stuff: got the threaded boot "fully" working. It still sometimes hangs in a very large boot (e.g., .helios scale boots, ~148 hosts or so) near the end. I suspect that someone drops out of the mesh before the boot finishes, but I haven't had a controlled failure yet to check the logs and see what is really going on. Additionally, sometimes a given node drops out when we do arbitrarily large numbers of children (e.g., at 12, foster.helios.nd.edu somehow decides that it doesn't want to boot). I don't know if this is an artifact of foster's parent screwing up [e.g., running out of file descriptors], or if foster itself is somehow legitimately getting hosed. It's hard to tell, too, because all of these machines are actively being used when I do my tests. :-)

I had to spend a good amount of time writing the jmesh algorithm into the code. I was using the Boost Graph Library, which was written by Rich Lee and Jeremy Siek here in the lab. However, as is always a danger with developing code, the APIs and concepts are continually changing, and the docs that I have (i.e., the book that Jeremy is writing on the BGL) are not consistent with what is available for public consumption at www.boost.org. Additionally, Jeremy's local CVS copy changes stuff even further. As a result, I spent a long time before I actually got it working. Arf.

However, I did come up with an iterative method to generate a list of edges in a jmesh that doesn't use any lookups at all -- it just generates pairs of vertex numbers, and then we smush those into the constructor of a BGL graph. As such, it's considerably faster than the version that I wrote before -- the prior version would go to each vertex, check how many edges it had, determine if it needed more, etc., etc. This new version just does bookkeeping as it goes along with a small number of integer variables, and All is Well.

Now that I've finally got that working, I can add the stuff for all the nodes to make the connectivity implied in the jmesh, drop the boot tree connectivity, and then sit there waiting for commands. Not for now...

Unfortunately, however, Bill from NIST sent around an e-mail today saying that we'll be having a conference call about the SC'2000 IMPI demo on Friday morning. Doh!!! I haven't done bupkis on IMPI yet. I've got to do the following:

• Finish implementing the attribute color stuff per IMPI section 2.5
• Implement MPI_BARRIER on IMPI communicators
• Make LAM/MPI compliant with the IMPI errata
• Get the pmandel demo code working with a few instances of LAM

It would be Really Good to get this all working by the call on Friday so that forward progress can be claimed...
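For what it's worth, the usual shape of a barrier that spans multiple MPI implementations (as IMPI requires) is two-level: each implementation barriers locally, the local leaders synchronize with each other, and then each implementation releases its local ranks. Here's a thread-based sketch of that shape only -- the names and structure are mine, not from the IMPI spec or LAM's code:

```python
import threading

def make_two_level_barrier(group_sizes):
    """Build a barrier over several 'implementations' (groups).

    Phase 1: all ranks in a group arrive at their local barrier.
    Phase 2: rank 0 of each group (the leader) syncs with the other
             leaders.
    Phase 3: the group's release barrier lets everyone through only
             after its leader has returned from phase 2.
    """
    leaders = threading.Barrier(len(group_sizes))
    arrive = [threading.Barrier(n) for n in group_sizes]
    release = [threading.Barrier(n) for n in group_sizes]

    def barrier(group, local_rank):
        arrive[group].wait()
        if local_rank == 0:
            leaders.wait()
        release[group].wait()

    return barrier
```

The key property is that no rank in any group gets past the barrier until every rank in every group has entered it.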

Sidenote: it's really been quite a while since I've worked on IMPI. I am finding out how much I've forgotten about how it works. Doh!

Saw a great quote in the IMPI code today:

Honk if you love silence.

I even remember putting that comment in there. Classic. :-)

Found two interesting python MPI projects:

I don't think that either project's main goal is formal python MPI bindings; instead, each has some main "real" project that is [at least partly] in python, and they wanted to use MPI. I conversed with the sourceforge project author (at Lawrence Livermore); they're actively using it. I asked if there will ever be a formal release (all that's on sourceforge is CVS, not a real distribution). Haven't heard back yet.

Tony Hagale got my journal up and running. Woo hoo! Not entirely pain-free, though. Had to upgrade his C++ compiler, etc. He had some initial problems with quoting, as well. Not quite sure if that was a local configuration issue or a bug in my code ('cause it doesn't happen to me :-).

Started running MojoNation on squyres.com. Speaking from a distributed/crypto standpoint, that's some really cool shit!

Much work to do to get IMPI into shape. Miles to code before I sleep. Rusty will be here all day tomorrow; he's giving a talk on MPICH's daemon, and then Lummy and I are going to the LaSalle grill with him for dinner (yummy). Should be quite interesting.

### A reddish green

I completely forgot to mention Stoopidcomputing things...

We're giving out cool freebies at SC'2000. The orders went in this morning:

• The pocketknives got nixed. With extreme prejudice.

• 500 LAM LED-light keychains. They'll be translucent blue and have a white LAM logo on them.

• 900 mouse pads (I don't know what the hell we're going to do with the extras -- having 900 mouse pads in one place just sounds like an inherently dangerous operation. Are there FCC rules against that?). They're all LAM/MPI mouse pads, with the LAM logo and URL in the top right, "Dept of CSE/ND" propaganda (phone, fax, URL) across the bottom (Kogge paid for it all, after all), and a bunch of MPI function bindings across the majority of the surface area.

The cool thing is that we've got three flavors of mouse pads (300 each):

• C
• Fortran
• C++

That is, they vary in the language of the bindings that are displayed on the mouse pad. We're actually predicting that the Fortran ones will go much faster than the C or C++ ones.

Anyway, it's all cool stuff (mainly working on the assumption that if it's free, it's gotta be cool). Should be a fun time down at SC'2000. A picture of our booth is available at http://www.indiana.edu/~rindiana/. A map of where we'll be located on the show floor is at http://www.sc2000.org/exhibits/floor.htm (scroll down to the bottom -- we're a purple booth, number R701).

## October 13, 2000

### Fuzzy ethernet

Some food for thought.

PBS is just plain sucking. It's unfortunately been flakey ever since we upgraded it. :-( I did find a bug in our AFS/PBS shepherding code a few days ago that resulted in tokens being allowed to expire during PBS jobs that ran longer than the lifetime of your initial token (which I think defaulted to 10 hours, regardless of what your real default is), but that was our fault, not PBS's.

Yesterday, there was one job that was "stuck" in the queue and wouldn't die. The job was long done and gone, but PBS thought that it was in an illegal state, and wouldn't let it leave the queue. Hence, the node that that job was on wasn't released. Today, there are many more jobs like that (but those jobs are still running). I have no idea what the problem is, and I'm kinda annoyed.

We asked again for PBSPro (i.e., the commercial version) -- we first asked about 3-4 weeks ago -- and the PBS guys replied that it was taking them longer than they thought to set up their online store (even though PBSPro is free for educational users). :-( I'm kinda hoping that PBSPro will fix some of this flakiness that we've been seeing. :-(

Rusty from Argonne was here yesterday. His talk was good; I'd seen most of the material before, but it was good stuff anyway. We had good chats with him about optimizing MPI collectives (there are some really cool algorithms for this out there..), the future of LAM and MPICH, MPICH's Abstract Device Interface (version 3), my threaded booter (I gave him a copy of it, too), MPICH's mpd, etc. We had dinner at the Lumsdaine Grill, because Someone forgot to get a babysitter so that we could go to the LaSalle Grill. Ah well -- it was a good home-cooked meal, so I shouldn't complain. :-)

I downloaded the ADI-3 document, and it's huge! Compared to the spartan RPI (request progression interface) approach in LAM, ADI-3 is gargantuan.

I just noticed a post on the Beowulf list -- someone posted LAM vs. MPI/Pro (a commercial MPI) vs. MPICH results. The TCP numbers are clearly in LAM's favor. This, obviously, is because LAM rocks. However, MPI/Pro and MPICH have VIA results (which are obviously better than TCP results)... we need a VIA device... You can see the results for yourself. LAM ROCKS!!!

I've been working on IMPI stuff this week. I got the IMPI attributes on communicators working (i.e., on MPI_COMM_WORLD -- since we don't do anything other than MPI_COMM_WORLD yet, we don't have to maintain these attributes on other communicators, which would take some additional bookkeeping, because relative rank order can change, etc., etc.). I also got MPI_Bcast working in fairly short order.

I noticed a good number of typos and one inconsistency in the IMPI standard. Hence, I am proud to say that I am personally responsible for every item in the IMPI errata document. Well, ok, I only helped discover the first one (an issue with the protocol hiwater/ackmark values), but I still had a hand in it.

This is all for the SC'2000 IMPI demo with HP and MPI/Pro -- we're going to run a GUI Mandelbrot program across all three MPI implementations. Should be pretty cool, actually. We had our second teleconference today, and things appear to be going well. We plan to test the stuff across the internet next week. HP and MPI/Pro have been using LAM to test their IMPI implementations. I gave them instructions for CVS access today, so that they can get the MPI_Bcast and color stuff.

I just can't help it -- LAM ROCKS!

Seriously, though, it is very cool to be working on a project that matters. That is, LAM is probably only used by a few thousand people around the world (at most), but there are many devoted fans who use it every day. Indeed, many people's software relies on ours to function properly -- much real-world code depends on what I do in LAM. It's very cool.

The level of responsibility can be a bit scary at times (indeed, I remember the first time that I noticed a .mil site downloading LAM; I told Lummy about it, and he just smiled and said, "sleep tight!"). Real world stuff uses my code. Hence, if I fuck up, Bad Things can happen. For example, I know for a fact that companies like GE and Exxon use LAM/MPI.

But isn't this the level of responsibility that a good engineer should embrace? I think so. Being Careful about what you do is not just a state of mind, it is a way of life.

Saw a talk from Vince's advisor today about link-time optimizations. Interesting stuff. Similar to things that are available in Solaris (e.g., -O5, where multiple runs generate profiling feedback data that speed up subsequent runs), but it was neat to hear how it works. He was using it in conjunction with MPICH, so I set him straight in his ways -- since they're using TCP/IP, if they really want asynchronous message passing, they should use LAM since we can do it (via the lamd mode, which has its own tradeoffs -- the asynchronous message passing mode isn't free, so to speak).

He sounded intrigued, and said that he would get the latest version of LAM and give it a whirl. And so we progress, one user at a time, towards world domination...

Well, ND's network is going to start shutting down for maintenance in about 15 minutes, so I'm outta here. Next journal entry will be from home.

## October 14, 2000

### Calamari airlines

ND vs. Navy -- finally a fun game to watch.

Aside from two big mistakes the defense made late in the game (and to be fair, it was our second- or third-string players, who don't have much experience), we dominated the game. Those are the games I like: boring and dominating. This is the whole reason that the wave cheer was invented -- the fans need something else to do to occupy their time.

But CBS's coverage of college football really sucks. They don't get good angles, their camera operators get faked out and don't follow the football, they rarely show replays (even on penalties). And their announcers talk more about anecdotes than about the game that is being played. They suck.

NBC's games take forever, but you get the whole nine yards (hah!) with them -- tons of replays, game strategy speculation, etc., etc.

In other news, ND's network seemed to come back up without much of a hitch. I was on briefly at about a quarter of one this morning and it was back. And the latency from squyres.com to nd.edu seems to be a lot better (granted, there are no students on campus right now, so traffic in and out of nd.edu is probably pretty low). But at least I'll probably have good connectivity for the next week. :-)

## October 15, 2000

### Clairvoyance and Corn Flakes: Coincidence or Fate?

Last night, Tracy and I went to see a local production of Dracula. I'm a big fan of theater, especially after having done a bunch of productions in stage crew in both high school and undergrad college. The production was actually quite good -- it was theater in the round, with a fully-functional single set.

The technical setup was actually quite impressive (being an engineer and an ex-stage crew type, I tend to notice these things). I couldn't find the control room, for example -- it was that well hidden. Or perhaps the control room was distant from the actual production area, and the techies watch by video (I'm guessing here, but that would be a pretty cool setup).

This production had a few extra twists that separated it from others that I have seen. For example, Lucy had a female friend, Nina, who died before she did. Nina came back as a vampire and started attacking children around London.

Props to a bunch of the special effects, too:

• Some various pyro, bangs, pops, flashes, etc.

• Using deep sustained bass noise, very hard to hear -- the kind of sound that you subtly feel rather than hear -- that created a feeling of dread and fear. Very cool.

• The professor killed Nina with a wooden stake through the heart while she was sleeping in her coffin. Since it was theater in the round, it actually happened right below me -- not 10 feet away. The stake actually appeared to go into Nina, and blood squirted everywhere. Again, very, very cool. That alone made the price of admission well worth it -- who wouldn't pay to see a beautiful vampire seductress screaming in the throes of death, with blood squirting everywhere?

• Once or twice, a character had a sudden moment of clarity and realization. The clock in the corner of the study suddenly got very loud (tick, tick, tick), as if the focus of the world suddenly got very narrow. And then the ticks got subtly farther apart -- creating the illusion of slowing down time, and heightening fear.

• Dracula "disappeared" at one point by means of what I assume was a hydraulic trapdoor in the floor of the stage (I caught a glimpse of it). He was surrounded by a cloak, which suddenly fell to the floor, and he was no longer in it (having been in theater for a while, I was proud of myself for anticipating the classic misdirection designed to make you look away from him for a second while his head disappeared downward -- no one else that I was with noticed it). Most excellent.

• In the final scene, where they drive a stake through Dracula's heart while he's sleeping in his coffin (more blood squirting everywhere -- yummy), they kill him, and then close the coffin. A few seconds later, his hand pops through the top of the coffin in a feeble attempt to strangle the professor, who successfully evades his grasp. Seconds later, they open the coffin again to really kill Dracula, but all that is there is a skeleton. Cool!

All in all, a good production. The actress who played the maid was a little weak, but the badass transformation of Count Dracula to a Vampire (multiple times, too!) made up for it.

The Director's Cut of the movie The Abyss was on TV tonight; I hadn't seen it in quite a while. Most people aren't aware that there is a 10-15 minute sequence at the end that was chopped from the version that was released. It was all about war and violence in the human race (a sort of commentary on today's society), and how the water people almost killed everyone on the planet with enormous tidal waves. With this sequence, much more of the movie makes sense.

I'd advise renting it to those who haven't seen it -- I'll give it a rating of 10 minutes.

We finally finished all of our thank-you notes from the wedding today. Woo hoo! We had gin and tonics in the excellent ND drink glasses that Brian/Arun gave us.

And speaking of alcohol... I think Arun's proclamation of not drinking until Momar's 40th anniversary is a sham!! He admitted in his journal that he had Kahlua pancakes, and later had One Enormous SuperPancake with some kind of flavored liqueur in it.

Hence, I think Arun's thin guise of "not drinking" has fallen away -- we now see him for the closet alcoholic that he is. Was it really "Sprite" that he was drinking all Sophomore year (by the gallon, I might add)? Does he really like "water" and "Dr. Pepper" that much? I think not, gentle readers. Yes, it's true -- Arun was even kicked out of the 1996 Olympics (Bulgarian all-around gymnastics team) for his excessive indulgence in what he called "pixie sticks" and "Mr. Pibb". Said Mr. Rodrigues at the time, "I just love pixie sticks and Mr. Pibb. Don't knock it until you've tried it! Now don't bother me -- I've got to go practice my Triple Lindy."

(...catch the rest of this exclusive story in a special expose section in this week's National Enquirer)

My fricken' router has frozen 5 times tonight. Destroyed a good uptime, too. It seems that one of the NICs is getting overloaded (I'm trying to ftp/scp/whatever 4.5GB from my router to my desktop, which hangs the machine after a while). Sucks!! I don't quite know what to do about this yet -- I need to get that data over to my desktop so that I can burn a CD of it. Arf!

In other linux woes, one of my router crashes this evening caused the xmms on my desktop machine to freeze. So I did a "ps" to kill it. I found no less than 662 copies of xmms running. No joke.

My desktop has an uptime of over 37 days, and I've been logged in to a single KDE session for probably over half of that time. I guess there's some kind of leak in xmms that's causing that to happen. Weirdness. For example, I see that there are already 11 copies on my desktop now.

Some testing shows that a new one appears every time a new song starts. I'll bet that they are terminated-but-not-reaped threads (remember: linux emulates threads with duplicated processes). <sigh> Open source software can suck sometimes. :-(
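The "terminated-but-not-reaped" failure mode is easy to reproduce outside of xmms: on Unix, a child process (and, under the old LinuxThreads process-per-thread model, a thread) that exits lingers as a zombie until someone wait()s on it. A small hypothetical Python sketch of the spawn-and-reap cycle that xmms is presumably skipping:

```python
import os
import time

def spawn_child():
    """Fork a child that exits immediately; return its pid.

    Until the parent wait()s on it, the dead child stays in the
    process table as a zombie -- exactly the kind of entry that
    piles up if a program never reaps what it spawns.
    """
    pid = os.fork()
    if pid == 0:
        os._exit(0)      # child: exit right away
    time.sleep(0.1)      # give the child time to die
    return pid

def reap(pid):
    """Collect the child's exit status, removing the zombie."""
    reaped_pid, status = os.waitpid(pid, 0)
    return reaped_pid
```

(Unix-only, of course -- os.fork doesn't exist elsewhere.)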

Did some LAM work today. Turns out that I was a bit sloppy and checked some crap back into CVS that didn't work. Oops. :-( Caused Arun a bit of pain, too. Double oops. :-(

But it's fixed now -- it compiles (and seems to work) with and without IMPI support. I also added some stuff for XMPI to drop communicator name traces during MPI_Init for MPI_COMM_WORLD, MPI_COMM_SELF, and MPI_COMM_PARENT (if it exists). I added man pages for MPI_*_set_name and MPI_*_get_name, too, just for good measure. I've got to finish the IMPI extensions to MPI_Reduce tomorrow.

Found a new "hauntingly beautiful" song today. It's not quite "Slut", but it might be close. It's Tori Amos' "Carnival", from the MI-2 soundtrack. I've put it on repeat, but my router (which streams my MP3s to me) has been rebooting, so I haven't heard it continuously enough yet. I'll keep you posted.

## October 16, 2000

### Pumpkin flavored telephones

Happy happy, joy joy!!

The amazing show News Radio is now showing down here in Louisville!

(or, more specifically, I just found out that it is showing down here in Louisville -- it may have been here for some time)

It's on A&E at 6:30pm and 12:30pm.

After fighting PBS all day, I am bounding with joy to be able to watch News Radio again (the floor of pi is 3).

Life is Good.

Happiness.

## October 18, 2000

I have a beef with Kentucky. "Where's the beef?" you ask. Right here.

I was out and about today, running errands, as the time approached noon. Feeling a little peckish (that's "hungry", you pervert), I was hit with a craving for one of those new Kentucky Fried Chicken wrap things -- a few chicken strips in a burrito wrap-thing, along with some lettuce, tomatoes, and some sauce. I had one when they first came out a month ago or so, and they were quite yummy.

So I decided that I'd like to have one today for lunch.

So I'm driving around, and driving around... and driving around...

I can't find any flippin' Kentucky Fried Chicken restaurants anywhere!!

What the hell? I mean, I'm in Kentucky -- you'd think that there'd be one on just about every corner (just like Waffle House -- man, those things are everywhere!). But no. I could not find a single one. I know of precisely one KFC restaurant around here, and it was a good distance from me (probably about 15-20 minutes driving time), and I wasn't about to drive all the way out there.

I was so flabbergasted that I couldn't find a KFC that I had to settle for Wendy's (yummy stuff, but not my first choice today).

How can you not find a KFC when you're actually in Kentucky? I'm beginning to think that KFC actually has nothing to do with Kentucky at all -- COL Sanders is just a sham. He was probably a Canadian (them's be shifty types; can't trust 'em). I can easily imagine that KFC is a Canadian plot designed to dupe the American public into thinking that "take off, eh" is a normal expression.

Just my opinion.

### Haiku for you

Some amusing haiku (haikus?) between myself and Kevin Barker today and yesterday:

From me:

To extern or not
Beep, crash. EOF

From Kevin:

Very nice haiku
Keep up the good work

From me:

-mt is good
You must use it everywhere
LAM/MPI, too

From me -- an old one, but I still love it:

MPI_RECV
MPI_SEND, oops CANCEL
MPI_ABORT

## October 23, 2000

### Is he related to Jack Pontiac?

I just watched News Radio (brilliant show) -- the "Cane" episode. Amazing, as always.

But what really blew me away was that A&E apparently has a totally different opening credits sequence than I have ever seen before. A brief recap for listeners just joining in...

I noticed a week or two ago that News Radio was on down here in Louisville on A&E (actually, Tracy pointed it out to me). I have happily been watching ever since, every night at 6:30pm.

Er... well... actually, it's usually around 6:32:07pm, when I look up from my coding, shout, "Gadzooks!!" and race off to the television to catch this fabulous show. Hence, I hadn't seen A&E's opening credits yet. Until tonight.

Tonight, I was actually a little hungry, so I emerged from The Coding State around 6:27pm. I heated up some soup (gas ranges are awesome -- I highly recommend them over conventional ranges. Buy GE, of course), and made it out to the TV by 6:30pm. I was delighted to see that the "Cane" episode was on (I've seen it several times before). But I almost spilled my soup when the opening spiel was over and the credits started.

It has shots of New York, cabs, busy people walking around... nothing like I'd ever seen when I watched it in South Bend (Philadelphia, too, I think). It just goes to show you -- even something that you have enjoyed for a long time can have new twists and turns to keep it exciting.

It's the little things in life.

That, and good parking spots. That's what makes life worth living.

### You insult me.. and of course, my cane.

It's been a few days since I did a journal entry, mainly because I've been traveling. Let's catch up...

Left on Friday night to go to Chicago. Tracy and I flew Southwest from Louisville to Midway. Flying Southwest is an interesting experience. It's a cross between the best of "People's Express" (where you sat on milk crates in the hold, but they were damn cheap tickets) and the Orient Express (there's some really shady people on there, and most people don't speak English). Got to Midway around 8pm, picked up our Avis car, and drove to Jill's.

Seeing Jill was great -- Jill owns her own condo on the north side, right near the lake. We had dinner and caught up with Jill, which was much fun. The next day, we walked along Lake Michigan (very cool) and went to the Hogshead Bar to watch the ND vs. West Virginia game. The game itself was kinda sloppy; we had moments of brilliance, but all told, the final score didn't tell the story of the game. We won, but save a few critical plays, WVa almost beat us.

I randomly ran into some people that I knew at the Hogshead -- two of my old roommates, Mike and Brian (it was good to catch up with them), and an old CS grad named Dan (journal policy not to put in last names to protect the not-so-innocent). He works at a .com in Chicago called www.ubid.com. We chatted about that for a while. His brother is a froshy at ND, and is thinking about CS. Good for him!

After the game, Jill and Tracy and I ran to Marshall Fields to pick up a wedding gift for the reception that Tracy and I were going to that night (stoopid Marshall Fields -- they don't have their wedding registry online yet!!). Tracy and I raced up to Lincolnshire for the reception (the wedding was about a month ago, in Italy) and made it pretty much just in time.

It was fun -- I didn't know anyone (it was one of Tracy's co-workers who got married), but we saw a bunch of GE people that Tracy knew, and they were nice folk. We had a good bunch of laughs, and a good time was had by all. Hell, the booze was free -- how can you go wrong?

By the end of the evening, however, my ears hurt from the music. They had a live band, and they were actually pretty good -- it was a Benny Goodman orchestra-style band, but played all kinds of music. Their singers were quite good, and very lively (dancing on the dance floor while singing, etc., etc.). They even had a mixer boy, but I came to hate him because I saw him keep edging up the "master volume" slider. Bastard. I hope that his MPI programs rot in hell.

We flew back Sunday morning and got back here around noon. I did e-mail but was otherwise uninspired to do any work, so I lazed around and watched TV. A good Sunday. :-)

Bandwidth to nd.edu is sucking again. Well, it's not sucking, but it's certainly not nearly as nice as it was during break last week. For example, streaming MP3s from nd.edu to squyres.com is pausing all the time. Icky.

After having been gone for the weekend, I am shocked to discover that my Mojo level has fallen to about 850,000 (it was about 980,000 when I left). This amazes me -- I left my mojo server running all weekend, but I personally did nothing with it all weekend, and yet somewhere in there I spent about 130,000 mojo. How could that happen?

That's not the whole story, of course -- I do have about 100,000 mojo "coming in" (when people spend mojo with you, it doesn't necessarily come in right away; there's a credit system for totals up to 10,000 mojo -- see http://www.mojonation.net/ for more details), so I actually didn't lose all that much -- but it still seems wrong. That is, I have mojo going out at a much higher rate than it is coming in!

I hope that it's just still bugs in the system. It doesn't take an accountant to realize that even though my consolidated total isn't much less than when I started, you can't spend what you don't have, so if mojo [actually] is going out faster than it is [actually] coming in, you're screwed!

Did some more research into DSL for my church. They want to get DSL for the following reasons:

• They have 3 separate computing resources right now that they want to consolidate into one bill:

  • The Youth Center, which is physically distant from the church's main administrative offices, uses e-mail, and has a $9.95/month Juno account.

  • The main admin offices have an AOL account at something like $21.99/month, with 7 e-mail accounts.

  • They have a web site that's hosted at a local company for something like $19.99/month.

  This comes to a total of something like $42/month. DSL will at least double that, but there are other factors as well...

• They only have a total of 8 e-mail accounts, but have at least 12-15 people who need e-mail. Hence, they're maxed out right now, and need to expand.

• They only have so many phone lines at the church; when people are on the phone for e-mail or web, that's one (or more) phone lines that can't be used for regular business.

And actually, the admin offices are already wired on a LAN, so they're pretty well setup. After some preliminary investigation, prices in this area for 192kbps SDSL (the church is technically considered a business, so they can't get the cheaper residential rates) are between $100-120/month. Still need to contact a few more vendors (I'm doing it during lengthy compiles and/or network transfers between nd.edu and squyres.com) to get some more options. It's not just the base bandwidth that they charge for -- they all have different services in terms of the number of mailboxes offered (for free), how much web space they offer, whether there's a dialup line (for the Youth Center), etc., etc.

WHOO HOOO!!! My boss from my army unit just e-mailed me -- he got me a tentative position in Army high performance computing; apparently I'll be in the "hacking" group. This could be interesting! This is just the result of a few preliminary meetings that he has had with a group (in Aberdeen, MD, I think). We'll see where it goes. But at least it looks like I won't be forced to go back to being a signal platoon leader somewhere. Whoo hoo!!

I've changed my "Dissertation" topic on the journal to "Technical", because I find that most of the "Dissertation" stuff that I send is only sometimes related to my dissertation work. Most often, it's just some technical stuff that may or may not be related to my dissertation, or anything at all, for that matter.

### There's enough bad vibes in here to run a Voodoo factory

I did much work on IMPI today. Lesson for the wise: never write/debug parallel programs with only two nodes. Always use at least three. Four is probably better than three, actually, if your program has to work for all general cases. I already knew this, but I discovered it again the hard way today.

I'm working with HP and MPI Software Technology on our IMPI demo for SC'2000; I thought that I pretty much had LAM ready to go on Friday.
Today, I tried it with three clients (instead of just two, up in nd.edu) -- i.e., two clients in nd.edu and a client down here in squyres.com for a local display (the demo is a GUI plot of the Mandelbrot set -- the plotting is calculated in parallel, and the results are sent to the display master to be shown on X). Everything worked great with two clients, but started barfing horribly with three clients. Ugh! I had to go around and fix all the places where I had made bad assumptions and whatnot. So, kids, please don't program in parallel with just two nodes -- always have adult supervision and use three, four, or two hundred nodes.

It didn't help that there were actually other bugs in the demo code that we're supposed to run (the parallel Mandelbrot stuff was originally written by the MPICH guys and then modified by the NIST folks for the specific purposes of the IMPI SC'2000 demo). I found at least two bugs today (remember: broadcasting pointer values across multiple architectures is meaningless) -- possibly more, but I think I've blocked them from my memory to prevent further trauma.

I also had a few bugs left in LAM -- the code for calculating host and client colors and sizes looked like a Darwinian experiment gone horribly wrong. I had to evolve that code into something better and greater -- to make it more than the sum of its parts. Now, it rocks with the rest of LAM. I just can't help it -- LAM rocks.

It all seems to be working now. It's happily checked back into CVS, and hopefully I'll be done with that for a while...

Conversed with a guy at GE Aircraft Engines today. They're using LAM for somethingorother. He asked for a good feature on Friday (see his post on the LAM list), so I moved our discussion off the list and we'll iterate through a few things trying to get it right.

In related news, GE acquired Honeywell today. And "Just Jack" will stay on as CEO for an additional several months (he was going to retire next April, IIRC) until the end of 2001.
You just can't go wrong with "Just Jack". Glory be to the Father, the Son, and GE's stock price, amen.

### Past present participle future improbably never tense

(this is a few days old -- I started it before last weekend. So take all present tense to be past tense)

Learned some wisdom today. It was painful, so I'm going to share in the hope that others may save some time...

On the eternal quest to have "proper" Makefiles, we had quite an elaborate setup for dependencies in LAM/MPI (the automake stuff for generating dependencies is broken for non-GNU make). The only problem was, it didn't work for VPATH builds. We were somehow under the mistaken impression that you didn't need "make depend" in VPATH builds.

Sidenote: For those of you unfamiliar with VPATH builds, it's a slight variation on the GNU standard "./configure ; make all install" Scheme of Doing Things. It allows you to use one source code tree to build multiple binary trees. i.e., you download a random tarball, expand it to its source tree, and then run "./configure ; make all install" multiple times simultaneously. What's the benefit? For building on multiple architectures, and/or with different configure options, of course! If you think about it, this is a really handy mechanism.

It works like this (I slightly lied above): you expand the tarball, and make a new directory to build in. And then run configure (and make) from that new directory. For example:

```shell
unix% gunzip -c foo-1.0.2.tar.gz | tar xf -   # ...makes foo-1.0.2/ directory...
unix% mkdir build
unix% cd build
unix% mkdir sparc-sun-solaris2.7
unix% cd sparc-sun-solaris2.7
unix% ../../foo-1.0.2/configure \
    --prefix=/yadda/yadda/yadda/sparc-sun-solaris2.7
# ...much output...
unix% gmake all install
```

(The final "gmake" is necessary because Sun's native make isn't VPATH enabled)

Hence, you can have multiple of these puppies running simultaneously, all from the same code tree.
This is really handy in development, too, when you need to test on multiple architectures simultaneously. But now I see the error of my ways (it took developing on Solaris and Linux simultaneously with the same code base to show me this piece of wisdom).

Hence, I set about to make our depend target work properly for VPATH and non-VPATH builds. Easier said than done. Although I already knew this, I have finally and firmly decided that make's rules for syntax (particularly quoting) SUCK.

We use the GNU tools automake and libtool to build LAM/MPI (the use of libtool doesn't actually matter here, I just wanted to use it to mention our sponsors -- buy GE products today). Now previous journal entries have shown how automake can be your friend, but automake can also be your enemy (very similar to power tools, in this respect). This journal entry has nothing to do with automake (buy GE appliances).

In our automake setup, we include a top-level Makefile.depend file that has our "depend" target. It was fairly lengthy and involved, and it applied to the whole tree, so this made sense. For an hour or two, I tried to make it do VPATH stuff properly. This involved the following:

• Getting the source file list

• Running makedepend on all of the source files

Sounds pretty simple, eh? Not so, gentle reader, not so. Here's why:

1. First off, GNU make sucks. I don't know if this is a documented "feature" or not, but it certainly makes no sense to me. When you have a list of source files (e.g., "BLAH = foo.c bar.c baz.c"), GNU make happily prefixes each of them with the VPATH for you. Whoo hoo! This saves a lot of the trouble of doing it manually. After all, none of the source code files are actually in this directory -- we have to add some kind of prefix to get to each of them. However -- closer examination reveals gmake's suckage. The last file in the list does not get the VPATH prefix applied! Why? I have no idea.
But it pretty much fucks up the whole scheme -- it's pretty useless to get all but the last one. It's not ok if you only get five chicken McNuggets when you order the six-piece combo at the drive through. Heck, no. You get all six or its throwdown time. As such, I had to write code to a) strip off the VPATH prefix from each entry (if it was there), and then b) add it back on to every entry. Not that this was extraordinarily difficult (but escaping the sed expressions in the Makefile was a bitch...), but I shouldn't have had to do this. 2. With the re-VPATH-prefixed list of source files, you can run makedepend. But oops, it barfs. It seems that it can't find the file lam_config.h. Arrgh -- that's the one that configure generated via autoheader. It seems that automake isn't smart enough to add -I$(top_builddir)/share/include to CFLAGS --
it adds -I$(srcdir)/share/include instead. What the hell is the point of that? (translation: automake is adding a -I for the source tree, not the build tree. But the config .h is always put in the build tree -- not the source tree. So I'm not quite sure what the logic is here) So we have to manually add the -I for the build tree. Not nice -- we shouldn't have to do this -- but very easy to do, so move on. 3. Whoo hoo! It seemed to work! Checking the generated Makefile... #@$%@#$%@#%@#$%!!!!!!!

All the dependency entries are for "VPATH/foo.o", and "VPATH/bar.o", etc. instead of "foo.o" and "bar.o". That is, we're building foo.o, not ../../foo-1.0.3/src/foo.o. Hence, the Makefile has to show the right dependency.

CRAP.

So we have to add some more sed mojo to post-parse the Makefile and strip out the VPATH prefixes from the generated dependencies.
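That post-parsing step looks something like this (a minimal sketch with made-up paths and filenames; the real Makefile version was much uglier because of make's quoting rules):

```shell
# makedepend emitted targets like "../../foo-1.0.3/src/foo.o: ...", but
# make actually builds plain "foo.o" in the build directory -- so strip
# the VPATH prefix off the target side of each generated dependency line.
# (The prefix and filenames here are invented for illustration.)
vpath_prefix='../../foo-1.0.3/src/'
printf '%s\n' \
  '../../foo-1.0.3/src/foo.o: foo.c foo.h' \
  '../../foo-1.0.3/src/bar.o: bar.c' |
  sed "s|^$vpath_prefix||"
# prints:
#   foo.o: foo.c foo.h
#   bar.o: bar.c
```

(Note that this naive version doesn't escape regexp metacharacters in the prefix; dots happen to match themselves here, which is good enough for a sketch.)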

4. Ok, run again. Seems to work this time. Let's try it on the whole source tree...

Barf-o-rama. One of the source directories in LAM has almost 250 source files in it. Adding "../../lam-6.3.3b44/share/mpi" to every entry in the list quickly overflowed the shell's buffer for a single variable. Hence, it just dropped all the additional filenames.

So I had to add a loop around the file list to only process about 20-25 at a time. <sigh> This really became painful at some point; I hurt.
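The batching loop was conceptually simple -- something like this sketch, where "run_tool" is an invented stand-in for the real makedepend invocation:

```shell
# Process the (VPATH-prefixed) source list ~20 files at a time so the
# accumulated batch never overflows a single shell variable.
# "run_tool" stands in for the real makedepend invocation.
run_tool() { echo "running on $# files"; }

batch=""
count=0
for f in $(seq 1 45); do            # pretend these are 45 source files
  batch="$batch src/file$f.c"
  count=$((count + 1))
  if [ "$count" -eq 20 ]; then
    run_tool $batch                 # unquoted on purpose: split into words
    batch=""
    count=0
  fi
done
[ -n "$batch" ] && run_tool $batch  # don't forget the final partial batch
```

With 45 files this runs the tool on 20, 20, and then 5 files.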

Trying once more... #@$%@#$%@#$%@#!!!!!

Since we're running makedepend multiple times, it only saves the output of the last run in the generated Makefile. Hence, it saves the dependencies of the last 20 or so files; all the previous dependencies are snipped each time makedepend runs.

Luckily, makedepend has a -f option to specify where to send the output, so we can save it in a temp file and tack on successive results to the end of the Makefile.
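So each batch's output goes to a scratch file and gets appended, roughly like this (printf stands in for makedepend here, and the filenames are invented):

```shell
# Each makedepend run clobbers its output file, so write each batch's
# dependencies to a scratch file ("makedepend -f $tmp ...") and append
# them to the real Makefile before the next batch runs.
# (printf stands in for makedepend in this sketch.)
tmp=/tmp/depend.$$
makefile=/tmp/Makefile.$$
: > "$makefile"
for batch in 'foo.o: foo.c foo.h' 'bar.o: bar.c'; do
  printf '%s\n' "$batch" > "$tmp"   # really: makedepend -f "$tmp" <files>
  cat "$tmp" >> "$makefile"         # tack results onto the Makefile
done
cat "$makefile"
rm -f "$tmp" "$makefile"
```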

5. Try again.... <sigh> Still no love.

Now it's not ditching the previous results at all. Since makedepend isn't running on the main Makefile, it doesn't snip the previous dependencies. Hence, we have to do it ourselves. Redirect some input to ed to snip out all lines after "# DO NOT DELETE" (seems pretty ironic, doesn't it?) and catenate the new results on after that.
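The snip-and-append step can be sketched like so (sed's "quit at pattern" does the same job as the ed script; filenames are invented):

```shell
# Keep everything up to and including the "# DO NOT DELETE" marker,
# throw away the stale dependencies after it, then catenate the fresh
# ones onto the end.  (sed '/pat/q' is used here in place of ed.)
mf=/tmp/Makefile.demo.$$
printf '%s\n' 'all: foo.o' '# DO NOT DELETE' 'foo.o: stale.h' > "$mf"
sed '/^# DO NOT DELETE/q' "$mf" > "$mf.new"  # marker survives, stale deps do not
printf '%s\n' 'foo.o: fresh.h' >> "$mf.new"  # append the fresh dependencies
cat "$mf.new"
rm -f "$mf" "$mf.new"
```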

6. Finally... it works.

That whole process actually took quite a while -- adding additional quoting for make (especially in the sed expressions) made it arbitrarily difficult. So somewhere near the end, I said fuck it, and moved the whole thing off to a Bourne shell script. It actually became much easier at that point -- I should have done that much earlier. The depend target actually became pretty small at that point; it just calls that script with a small number of arguments followed by the list of files (also as command line arguments to prevent single-shell-variable-overflows).

The moral of the story: it works now. It works for VPATH, it works for non-VPATH. If you want the script, grab it from LAM's anonymous CVS access -- it's config/run_makedepend. The depend target itself is in the top-level directory, a file named Makefile.depend.
Save the planet: reuse code. Feel free to steal/improve this depend target. Your country depends on it.

### Day of a 1,000 journal entries

Criminey, could I write any more journal entries in one day?

Apparently so.

Click, bing! You've got mail!
Jeff Squyres can write a lot
But he is now done
This is just to say
That I won't write more today
No more mail ('cept this)

## October 24, 2000

### xmms blows/It does not clean up its threads/Suckage, yea, suckage

xmms sucks.

It just barfed on me (seg faulted) after running for at least a week or two continuously on my desktop. I tried to start it again, and it "finished" immediately, but with no error messages.

Puzzled, I started looking around as to why.

As reported in previous journal entries, I found many, many xmms processes running in the background (935, to be exact). Assumedly, xmms is not cleaning up its threads properly (since Linux mimics threads with "cloned" processes).

On the bright side, though, I'd like to see any Windoze machine run upwards of 900 processes and continue functioning properly! (i.e., everything else in my system was functioning properly; I think xmms was bailing because it was trying to contact the other "running" xmms instances).

## October 25, 2000

### I love Kung Fu movies...

Some quickies...

• Dad got the "LoveLetter" virus on all his 'doze machines at the store yesterday (it spread itself via mounted drives and went rampant across three machines). Viruses suck; it automatically overwrote all .jpg and .vbs files on all three machines. It's not quite clear where it initially came from, either. Dad had up-to-date virus protection, but he had an older version of Norton AntiVirus, and it wasn't automatically checking e-mails, so I suspect that this is where it came from. <sigh>

• Possibly going to see "Rent" with Janna in December. That should be fun. I saw it in London, and laughed uproariously at "You can take the girl out of [New] Jersey, but you can't take the [New] Jersey out of the girl." Being from Philadelphia, this is enormously funny to me (we make fun of New Jerseyans all the time). But it's apparently an American joke, because no one else laughed.

• An old ROTC cadet of mine (Trent) is now out of the military and working at GE Appliances. Small world.

• Not sure if I'm going up to ND this weekend or not; should know by the end of the day.

• The HP guy (CQ) found some bugs in my IMPI code for synchronous sends. Ugh. This is proving troublesome to track down...

• The motherboard/PROM on the Airmics mail server is fried; it is crashing multiple times a day. Suckage. They're trying very hard to get the new server set up, but it just takes time...

## October 29, 2000

### There's no private property in the LSC!

Many days, no journal entry. The usual nemesis is at fault: traveling.

I've been up here at ND for the latter half of this week. Mainly for SC2000 coordination (the freebie mouse pads arrived way early. Yay!) and other miscellaneous tasks. I also made my famous "hockey puck" chocolate chip cookies this week for the efforts of the Engineering Graphics department (ok, mainly because Joanne from EG said that I owed them cookies for their efforts). For the uninitiated, it is widely known that I make the World's Best Chocolate Chip cookies. They're roughly the size and shape of hockey pucks (hence, the name); none of these twice-the-diameter-of-a-nickel and paper-thin kinds of chocolate chip cookies for me. Hell no. Soft and chewy in the middle with a 1 lb bag of chocolate chips in the mix just "so that there should be enough". One of these cookies can serve as a meal. A double batch made 12 cookies this week.

Anyway, that all went well, and we finished up our virtual posters for SC2000. I had to use some evil powerpoint animations in them, but they'll be ok. We still need a result graph from LAM/myrinet (more on this below) for the slides, but everything else is finished.

Sidenote: Myrinet is a proprietary network that runs at gigabit speeds. i.e., orders of magnitude faster than 100Mbps ethernet. You can run TCP/IP over Myrinet -- they provide a driver for it -- but it's at a significant cost in performance over "native" communication over the Myrinet hardware. "Native" communication is provided through a library called "gm". Hence, we're adding a "native gm driver" to LAM to utilize ultra-fast communication over Myrinet directly in LAM, rather than relying on TCP/IP over Myrinet. This is what Arun has been working on since the beginning of the summer. We want to have [at least] a beta of this stuff working to show off at SC2000.

Arun and I tried to make a result graph for LAM/gm -- just a basic one showing "TCP over Myrinet is good, but gm over Myrinet is better!", but unexpectedly got bad seg faults and couldn't produce anything. This consumed the rest of my Friday evening, and most of Saturday morning.

Before we could launch into extensive debugging, though, we had to finish up the slides. Got some good slides for LAM/gm (Arun), XMPI (Brian), and IMPI (me). After everyone else had left, Lummy wandered in (while I was still working on the slides; perhaps 6:30pm or so). Had a long chat with him about the future of LAM and whatnot. It was especially interesting with the prospect of MPICH going through an entire re-write (with the focus on their ADI-3 work now -- already a 70+ page document!). MPICH 1.2.1 is probably pretty near the end of the line for that code base; MPICH 2.0 will probably have some elements stolen from MPICH 1.2.x, but will likely be mostly from scratch. This is really cool stuff, actually.

I spent the rest of the night upgrading the version of GM that we had. We reported what appeared to be a bug in gm to Bob -- one of the authors (a very helpful guy, actually), and he said, "you're using a really old version of gm -- you should upgrade and see if the problem just goes away." Ugh. How embarrassing! Turns out we were using gm-1.1.3, and the latest is gm-1.2.3. Oops.

myri.com is apparently connected to the world through a 300 baud modem; it took about an hour to download the 1.2.3 tarball (only a few megs). It took a few tries to get it installed properly -- we have really old Myrinet hardware (probably a few generations behind current stuff). Myrinet utilizes a kernel module in Solaris, so you have to take some care to build and install it properly. And compiling on the Solaris 2.5.1 140MHz machines is just painfully slow. Ugh.

So I finally got everything up and running around 11pm or midnight. I ran some test programs, and finally decided that everything was working properly. Then I ran a simple test program through bcheck. Badness. Lots of "read from uninitialized" errors from within libgm itself. Crap!!

After a lot of source diving in libgm, I determined that the problem was a buffer that was supposedly being initialized by an ioctl() call into the gm kernel module. The upper libgm was providing the buffer and expecting the lower kernel module to fill it in. It took a lot of hacking around and source tracing in workshop to absolutely verify that the lower kernel module was, indeed, filling that buffer properly, but it remained a mystery to me as to why bcheck would think that the buffer was uninitialized. Worse yet, sometimes bcheck reported that everything was fine -- no read-from-uninitialized. Hmm.

Heisenbugs suck.

It didn't occur to me until Saturday morning that bcheck couldn't possibly know that the buffer was filled -- bcheck only monitors the process under debug; it doesn't monitor the kernel module at all. So it makes perfect sense that while the buffer is initialized by the kernel module, bcheck simply has no knowledge of it; hence, it reports it as uninitialized when upper libgm reads from it. Although this doesn't explain why bcheck sometimes reported that all was well, I'm 99% sure that this is what is happening. Bob later confirmed my suspicions, too.

Hence, I [effectively] added a memset() to the upper libgm code, and bcheck finally only reported Truly Bad Things --
similar to what we had to do in LAM when we know that uninitialized buffers are ok ("when you optimize code, all coding guidelines and rules are out the window, and painfully splattered on the ground below").

I then set about trying to debug a simpler example than NetPIPE (which is a de facto MPI latency/bandwidth benchmark program) -- the program that we were trying to use to get some result graphs for LAM/gm. I made a simple "hello world" ping pong MPI program, and tried to debug that. Arun came in around 11am or so, and we set about stepping through the internals of the gm progression engine inside LAM. Not for the meek.

It's good that Arun came, 'cause he wrote the stuff, and I wasn't completely familiar with it (indeed, I had only seen the internals once before -- when we had a code review about a month or so ago). So his explanations and rationale were quite helpful. We finally tracked down a repeatable kind of error in the simple ping-pong program, but then had to leave for the football game.

ND vs. Air Force. Wow. A real nail-biter, there. I can't believe that we won. It's horrible to say, but our offense really did not look good at all during the game. We had one decent drive, and it was full of 3rd and longs. The rest of our points were off lucky Big plays and the like. :-( Granted, I was in the stadium and didn't have the benefit of instant replay and the like, but it didn't look pretty from the student section.

Our defense was kinda shaky, too. We had some great stops a few times -- held them to 3 points at least twice, for example, and a blocked field goal (which put us in overtime -- and later gave us the game), etc. But they were able to throw all over us all day. Our pass defense was just not good.

But in overtime (!) we managed to win the game. Air Force went first and we held them to 3 points. We then came back and got a touchdown, putting the final score at 34-31. Amazing. It's our first overtime victory -- we were previously 0 for 3 in OT.

Some other random points about the game:

• Great flyby from 3 F-16s (or F-18s...?) during halftime. Well timed, and it was led by some 1LT who graduated from ND in '97.

• There was some woman behind us who was clearly visiting some friends here at ND. Whenever she opened her mouth, stupid came out. Some memorable quotes:

• (during the band's halftime tribute to the military, where there were various military people on the field with the band, the American flag and the flags of all four services were flying on the field) "Is this some kind of Halloween thing?"

• "I just love that Leprechaun guy! I just wanna scoop him up and hug him!"

• "So they're not really downs, are they? They're attempts at downs, right? So why does everyone call them downs?"

• Saw Tony and some other JeffJournal fans after the game. Felt kinda silly, because I didn't recognize Tony right away (it's the beard! I swear it!) -- duh. But then later, I realized that I really hadn't seen Tony since last spring, and I felt [somewhat] better. :-)

• Tracy and I went to see Pay it Forward afterwards. Not a bad flick. Not quite as complicated and intricate as I had hoped, but still not bad. So I think I give it an official vote of "sympathy".

This morning... back in the lab, and I think I've narrowed the problem down in LAM/gm to an unexpected receive. A pointer is not getting reset properly in the gm progression engine, and when an unexpected receive (definition below) comes in, the code tries to search a linked list for a request that is no longer valid (and has actually already been freed). Hence, sometimes it works, and sometimes it doesn't.

I suspect that this is just an error from the "translation" of the TCP engine to gm. i.e., we literally copied the TCP progression engine and gm-ized it; I suspect that this bug is just an error in the gm-ization process. Hopefully, this will be the last Big Bug...

An "unexpected message" is one of the Big Concepts for MPI implementors. It is possible that a user does a send from one rank before doing a receive on the target rank. Hence, the message may actually arrive at the target before the necessary bookkeeping has occurred to set up to receive that message. Hence, when the target gets such a message, it files the message in the "unexpected" queue. When the matching receive is finally posted, it first checks the unexpected queue to see if the message has actually already arrived before going to actually check the message passing hardware for the message. There's a lot more to it than that, of course, but that's the gist of it.

Hence, when LAM/gm receives an unexpected message, it's checking the list of outstanding receives improperly.

I'm off to go squash this fucker, then a visit to Chez Lummy, then back to Looieville. Rock on!