June 2001 Archives

June 3, 2001

I whipped Joe's ass

My kingdom for quickies!


Really able to simplify the Tuscon user interface. Now only need a "kernel" for input, worker, and output -- all buffer management (which is not trivial) is handled by the boilerplate input, worker, and output engines.

And it all works!!

Happy dance....
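For the curious, here's roughly what that kind of split looks like. This is a minimal sketch, not Tuscon's actual interface: the kernel typedefs, run_pipeline(), and the example string kernels are all hypothetical names. The point is that the user writes only three small callbacks, and the engine owns every buffer.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define ENGINE_BUF_SIZE 4096

/* Hypothetical kernel signatures: these are the only parts a user writes. */
typedef size_t (*input_kernel)(void *ctx, char *buf, size_t cap);   /* fill buf; 0 means EOF */
typedef size_t (*worker_kernel)(const char *in, size_t n, char *out, size_t cap);
typedef void (*output_kernel)(void *ctx, const char *buf, size_t n);

/* Boilerplate engine: owns the buffers and drives the kernels until EOF.
 * Returns the total number of bytes handed to the output kernel. */
static size_t run_pipeline(input_kernel in, void *in_ctx, worker_kernel work,
                           output_kernel out, void *out_ctx)
{
    char inbuf[ENGINE_BUF_SIZE], outbuf[ENGINE_BUF_SIZE];
    size_t total = 0, n;

    while ((n = in(in_ctx, inbuf, sizeof inbuf)) > 0) {
        size_t m = work(inbuf, n, outbuf, sizeof outbuf);
        out(out_ctx, outbuf, m);
        total += m;
    }
    return total;
}

/* Example kernels: read from a string, upper-case it, append to a buffer. */
struct str_src { const char *s; size_t pos; };
struct str_sink { char buf[256]; size_t len; };

static size_t str_input(void *ctx, char *buf, size_t cap)
{
    struct str_src *src = ctx;
    size_t left = strlen(src->s) - src->pos;
    size_t n = left < cap ? left : cap;

    memcpy(buf, src->s + src->pos, n);
    src->pos += n;
    return n;
}

static size_t upper_worker(const char *in, size_t n, char *out, size_t cap)
{
    size_t m = n < cap ? n : cap;

    for (size_t i = 0; i < m; i++)
        out[i] = (char)toupper((unsigned char)in[i]);
    return m;
}

static void str_output(void *ctx, const char *buf, size_t n)
{
    struct str_sink *sink = ctx;
    size_t room = sizeof sink->buf - 1 - sink->len;   /* keep space for '\0' */

    if (n > room)
        n = room;
    memcpy(sink->buf + sink->len, buf, n);
    sink->len += n;
    sink->buf[sink->len] = '\0';
}
```

A real engine would presumably double-buffer and run the three stages concurrently; this single-threaded loop only shows the division of labor.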


ps shows that I typically have between 60 and 100 processes running on queeg (i.e., just under the username jsquyres). Wow -- I guess I'm a busy guy!


Tuscon works in multi-level with no changes. Cool...


Army cushy job no more. I've been told that I must wear a uniform (class B) during my two weeks! Schnikies!! I suppose I shouldn't complain, though. They mentioned "distance learning" as what I might be doing. I guess I don't quite know what that means. We'll see...


Brian put in stuff to get LAM to compile pseudodaemons separately. Woo hoo!

Still having problems with myrinet RPI on Chiba City. Discovered that the head node doesn't work, but then again, there are some cases where the myrinet RPI doesn't work on the interactive nodes either.

Turns out that the tiny and short sizes were mixed up upon initial assignment, so short messages were effectively getting tagged, and therefore interpreted, as envelopes, which just led to Badness. Arrgh!

Side effect that we didn't think of before, though -- since the user can override the tiny and short message sizes, what happens if log2(tiny) == log2(short)? Hmm. Need to investigate the gm tagging mechanism a bit more...
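To make the log2 worry concrete: if the tag that gm puts on a message is effectively a power-of-two bucket of its length, then any tiny/short pair that lands in the same bucket can't be told apart on the receiving side. A minimal sketch, assuming size_class() (a hypothetical helper) is ceil(log2(len)); gm's real size/length mapping reserves some bytes, so the exact boundaries will differ:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for gm's size tag: smallest s such that 2^s >= len. */
static unsigned size_class(size_t len)
{
    unsigned s = 0;

    while (s < sizeof(size_t) * 8 - 1 && ((size_t)1 << s) < len)
        s++;
    return s;
}

/* 1 if tiny and short messages would carry different size tags, else 0. */
static int sizes_distinguishable(size_t tiny_len, size_t short_len)
{
    return size_class(tiny_len) != size_class(short_len);
}

/* Example: 600 and 1000 both round up to 1024 (size 10), so a user who
 * overrides the lengths to those values would get colliding tags. */
```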


Dog was here. He saw the house, went to a convention, and then we all went to party at Janna's. It was much fun -- it was to celebrate Jim's 30th and getting his MBA.

Dog and I talked a bit about LAM and his MS this morning before he left.

Two words, Joe, "Mon ney", and lots of it.

More LAM gm RPI work.

There's a degree of urgency to this because we're asking Myricom for some cheap/free equipment, and we kinda need a working Myrinet RPI to do this. Plus, the Sandia folks have a Myrinet cluster, and it's kinda in our interest to have a working version...

  • Added more to the README.myri file -- it was incorrect and
    incomplete. For example, it didn't have anything about changing the
    tiny/short message lengths.

  • Had to bullet-proof the tiny and short message lengths (both the
    defaults and the user-settable sizes) to ensure that they weren't
    the same, and that (tiny + sizeof(struct c2c_envl)) really was no
    larger than short -- it's a little confusing because the tiny size
    has to have the size of the envelope added to it. Hence, the tiny
    and short lengths must be at least sizeof(struct c2c_envl) apart.

  • Cleaned up a lot of confusing "size" vs. "length" misnamed
    variables and whatnot in the code (these two words mean something
    very different in Myrinet/gm).

  • Found some problems with user-overridden message lengths;
    initial buffers were being provided with the wrong size, so messages
    would never be received properly.
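The tiny/short sanity check from the second bullet boils down to something like the following. This is a sketch under assumptions: struct envl_stub is a hypothetical stand-in for struct c2c_envl (the real fields and size differ), and check_msg_lengths() and apply_lengths() are not LAM's actual function names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for LAM's struct c2c_envl (24 bytes here). */
struct envl_stub {
    int32_t len, tag, flags, rank, cid, seq;
};

/* Validate a (tiny, short) length pair, whether defaults or user overrides.
 * Returns 0 if usable, -1 otherwise. */
static int check_msg_lengths(size_t tiny_len, size_t short_len)
{
    /* The two lengths must differ... */
    if (tiny_len == short_len)
        return -1;
    /* ...and a tiny message travels with its envelope, so tiny + envelope
     * must still fit within the short length (at least one envelope apart). */
    if (tiny_len + sizeof(struct envl_stub) > short_len)
        return -1;
    return 0;
}

/* Apply user overrides only if they pass; otherwise keep the defaults. */
static void apply_lengths(size_t *tiny, size_t *shrt,
                          size_t user_tiny, size_t user_short)
{
    if (check_msg_lengths(user_tiny, user_short) == 0) {
        *tiny = user_tiny;
        *shrt = user_short;
    }
}
```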

All these took a long time to resolve because I had to go [re]learn how the Myrinet code worked. Plus, I made assumptions about things that were supposedly already tested which ended up being broken anyway. Ugh.

However, with these bug fixes, we might be darn close to the first beta release. We'll see.

Everything works on Chiba City now; need to do some cross-checks on the Hydra and on the babel cluster.

Life is pain

Forget that last entry.

Lots of Myrinet MPI tests are failing with seg faults or otherwise hanging on Chiba City.

Much frustration.

<sigh>

June 6, 2001

Facts cited by Matthew Brock are not necessarily facts

I think every long-running program should have a unix domain socket that you can use to communicate with it. It's an inherently useful capability; you can use it to query the current status of the program, change run-time parameters, etc.

Particularly in multi-threaded apps -- you can have a thread just sitting there waiting for connections, handling the requests, etc.
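A minimal sketch of the idea in C, assuming POSIX sockets. The command set ("status", "set verbosity N"), the state variables, and the function names are all made up for illustration; in a real program control_thread() would be started with pthread_create(), and the handler would need locking around any state shared with other threads.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical run-time state that the control socket can query or change. */
static int verbosity = 0;
static long items_processed = 0;

/* Create, bind, and listen on a unix domain socket at `path`.
 * Returns the listening fd, or -1 on error. */
static int open_control_socket(const char *path)
{
    struct sockaddr_un addr;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof addr.sun_path - 1);
    unlink(path);                 /* remove a stale socket from an earlier run */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 || listen(fd, 5) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Read one request line from a connected client and write one reply. */
static void serve_one(int conn)
{
    char req[128], reply[128];
    ssize_t n = read(conn, req, sizeof req - 1);

    if (n <= 0)
        return;
    req[n] = '\0';
    req[strcspn(req, "\r\n")] = '\0';          /* strip the trailing newline */
    if (strcmp(req, "status") == 0)
        snprintf(reply, sizeof reply, "processed=%ld verbosity=%d\n",
                 items_processed, verbosity);
    else if (sscanf(req, "set verbosity %d", &verbosity) == 1)
        snprintf(reply, sizeof reply, "ok\n");
    else
        snprintf(reply, sizeof reply, "unknown command\n");
    if (write(conn, reply, strlen(reply)) < 0)
        perror("write");
}

/* Body of a dedicated control thread: pass a pointer to the listening fd
 * as the pthread_create() argument. Handles one client at a time, forever. */
static void *control_thread(void *arg)
{
    int listen_fd = *(int *)arg;

    for (;;) {
        int conn = accept(listen_fd, NULL, NULL);
        if (conn < 0)
            continue;
        serve_one(conn);
        close(conn);
    }
    return NULL;
}
```

From a shell, any tool that can talk to a unix socket can then poke at the running program, which is most of the appeal.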

All the cool kids are doing it.


Been fighting with LAM's Myrinet RPI all week. Got some issues solved, but not all of them (any claims that I previously made about having it all working were later shown to be totally false. Doh). I seem to be having a problem with collectives that use long messages now. Hmm.

It seems to work in trivial cases, but if you stress test it at all (i.e., get a bunch of concurrency where multiple messages are going through the state machine simultaneously), barf-o-rama. I think it may have to do with the fact that we're using a slightly different state machine (vs. the TCP RPI that we stole it from) in the initial tcp_advmultiple() entry point, but I'm not 100% sure of that yet...


I find it very amusing that the action-packed trailers shown on TV for "Tomb Raider" feature a song by Fatboy Slim named "Michael Jackson". Notable lyrics in the song (not heard in the trailer, of course) are, "Michael Jackson -- that's a cute guy!"


Learned something yesterday by accident -- tcsh's pushd command, when executed with no argument, will swap the top two elements on the stack, and change to the directory that is now on the top of the stack. That's pretty cool -- and useful!
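A toy model of that stack behavior, for the record. The names are hypothetical and nothing here calls chdir(); it only shows that no-argument pushd is a swap of the top two entries, with entry 0 playing the role of the current directory:

```c
#include <assert.h>
#include <string.h>

#define MAX_DIRS 16

/* Toy directory stack; dirs[0] is the "current directory". */
struct dirstack {
    const char *dirs[MAX_DIRS];
    int depth;
};

/* pushd DIR: push DIR on top, making it the current directory. */
static int pushd_dir(struct dirstack *s, const char *dir)
{
    if (s->depth >= MAX_DIRS)
        return -1;
    memmove(&s->dirs[1], &s->dirs[0], (size_t)s->depth * sizeof s->dirs[0]);
    s->dirs[0] = dir;
    s->depth++;
    return 0;
}

/* pushd with no argument: swap the top two entries; the new top is
 * where a real shell would then chdir(). */
static int pushd_swap(struct dirstack *s)
{
    const char *tmp;

    if (s->depth < 2)
        return -1;               /* nothing to swap with */
    tmp = s->dirs[0];
    s->dirs[0] = s->dirs[1];
    s->dirs[1] = tmp;
    return 0;
}
```

Calling pushd_swap() twice in a row gets you right back where you started, which is what makes it handy for bouncing between two directories.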


Had to spend some quality time with RPM's yesterday to make the OSCAR LAM RPM. Learned a bit more about RPM's than I really wanted to, but I de-mystified a bunch of my prior knowledge about building RPM's.

I made the OSCAR RPM (main difference is that it is completely installed in /opt/lam-LAMVERSION rather than in /usr and friends), and set up the scripts for OSCAR to install it.


Brian is doing Great Things with C++-izing the lamd. We've been hashing out ideas (granted, he's doing all the work) via e-mail today and yesterday. That rocks.


Started using e-mail notifications of CVS commits recently. Seems to be working well. We started with a basic mail script, but I stole a perl script that the Vorbis group uses to mail CVS notifications that sends out much more information. We'll see how that works out.

It seems that this perl script was originally written by someone at Cisco. Small world.


Back to the Myrinet RPI...

June 10, 2001

He's got a billion dollars! He could hire Steve Forbes as his cleaning lady.

Here I am in Atlanta.

Warning: this entry is almost totally random. Weak-minded readers, consider yourselves warned.


I was sitting in the Louisville airport today, waiting for my flight to Atlanta (I'm so happy with direct flights. I'd like to thank the person who invented them. Props!). I saw three women in BDUs (Battle Dress Uniform, for those of you who are uninformed -- or Army fatigues). One of them was addressing a group of obviously college-aged kids. She met them at the end of the terminal and was directing them where to go to catch some bus.

It wasn't until I saw her yellow and black armband that said "CAMP CHALLENGE" that I realized that the kids were all ROTC cadets reporting to Ft. Knox for fun in the sun. The women in BDUs were all spec-4s. I remember being confused by that rank -- they're not privates, they're not corporals, and they're not sergeants. "What is that funny little thing on their collars?", I used to wonder.

It took me a while to catch on that it was a real rank, not just an unusually uniform (yes, excuse the pun) patch of black on so many different people's BDUs. I was always the slow one in my family.


Great line: "She makes coffee nervous."


I'm here in Atlanta for my Army two weeks. Don't know what I'll be doing yet; I got some vague mention of "setting up distance learning", but that's all they told me before I got there (did I mention that before in the journal? If so, cope).


Suzanne sent me the recipe for her mother's Lebanese casserole. It has to be the most yummy food on the planet, I think. She makes it not-infrequently when I go up to SBN. Someday, I'll actually try to make it myself.


I found an outdated PHP header file that caused the search functionality off the main LAM mail archives page to break. A helpful LAM user pointed it out to me today. Oops!


I got my hair cut in preparation for my Army time. I wish the barber lady had left a little more on the top. Ah well. It will grow back.


My hotel room has ethernet. Thank goodness. I probably won't use it every day, since I start work tomorrow and will have internet access from the lab, but I have to say that it was quite convenient today.


I fixed a bunch of dumb errors in the lamtests suite today that caused a bunch of tests to hang when you ran them on more than 2 nodes. Thanks to a helpful LAM user for pointing this out.

That same user also pointed out that the spawn tests failed in the non-uniform filesystem case. Oops!


I recently agreed to buy 3 CDs from BMG in order to get 9 free. One of the three I even already owned (it was a special offer -- 3 specific CDs). The thing that I forgot was that BMG's selection totally sucks. I had to browse their entire "Modern Rock" section until I could find 9 CDs that I sorta kinda would probably get if they were free. i.e., they definitely weren't on my "yeah, I'd like to get that" list.

I'll definitely quit BMG after I get these nine.


Another great line: "She's gone out to meet a bunch of bikers. Big ones. Full of sperm."


Time for sleep. Gotta report to duty tomorrow; I'm defending your country through distance learning!

June 13, 2001

Can that thing measure a New York Minute? 'cause Jimmy could walk through that door any minute. And this is New York City.

Defending your country...

I got my new Army black beret today. It was made in Canada, thankyouverymuch. The official start day for these berets is Friday. What a pain in the butt. You have to shave them, shape them, etc., etc. They're actually quite difficult to wear properly. That's why the French are so uptight, for example.

I think everyone in the Army is gonna look dumb for a while while we learn how to wear berets.


I was driving around Atlanta today taking care of administravia (getting an ID card, parking permit, etc.), and I heard "Weapon of Choice" by Fatboy Slim on the radio (it's probably a remix of something else -- it is Fatboy Slim, after all -- but I have no idea what the real song is). Even though I've heard this song many times, I never paid too much attention to the lyrics.

I was amazed to hear a Dune reference:

Walk without rhythm, and it won't attract the worm.

Trippy.


Played with qmail -- a replacement for sendmail. Seems quite powerful and flexible. I have to say that it's non-trivial to install; it requires additional users, groups, directories, etc. I guess I've grown accustomed to sendmail over the years, and am now quite comfortable with it.

qmail's strengths seem to be that it scales much better than sendmail (it claims to, anyway), is 100% secure (there's an open $5000 challenge for anyone to find a security problem with it), works nicely over NFS (when you use it in "maildir" mode), and lets users make their own arbitrary mailing addresses (e.g., jsquyres-foo@hostname).

I guess it makes me somewhat nervous, 'cause it wants to work in maildir mode, where -- if I understand this right -- it has a separate directory for each user's mail and puts each mail message in a separate file. This actually makes a lot of sense, and there are a lot of good reasons for it. But I'm not sure that pine can read the messages that way (although it might -- gotta dive into that a little more, I suppose). I dunno if imapd can, either (same disclaimer).
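For reference, the maildir delivery trick is what makes the one-file-per-message layout safe without locking: write the message into tmp/ under a unique name, then move it into new/ in one step, so a reader scanning new/ never sees a half-written file. A minimal sketch, assuming a POSIX filesystem; maildir_create() and maildir_deliver() are hypothetical names, the unique-name format is simplified, and rename() is used for the final move (some implementations use link() plus unlink() instead).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Create root/ plus its tmp/, new/ and cur/ subdirectories (existing ones are fine). */
static int maildir_create(const char *root)
{
    const char *subs[] = { "", "/tmp", "/new", "/cur" };
    char path[512];

    for (size_t i = 0; i < sizeof subs / sizeof subs[0]; i++) {
        snprintf(path, sizeof path, "%s%s", root, subs[i]);
        if (mkdir(path, 0700) != 0 && errno != EEXIST)
            return -1;
    }
    return 0;
}

/* Deliver `msg`: write it under a unique name in tmp/, flush it to disk,
 * then move it into new/. Copies the unique name to `name_out`.
 * Returns 0 on success, -1 on failure. */
static int maildir_deliver(const char *root, const char *msg,
                           char *name_out, size_t name_len)
{
    static unsigned long counter = 0;
    char host[64], name[160], tmp_path[512], new_path[512];
    size_t len = strlen(msg);
    int fd;

    if (gethostname(host, sizeof host) != 0)
        snprintf(host, sizeof host, "localhost");
    host[sizeof host - 1] = '\0';
    /* Simplified unique name: seconds.pid_counter.hostname */
    snprintf(name, sizeof name, "%ld.%ld_%lu.%s",
             (long)time(NULL), (long)getpid(), counter++, host);
    snprintf(tmp_path, sizeof tmp_path, "%s/tmp/%s", root, name);
    snprintf(new_path, sizeof new_path, "%s/new/%s", root, name);

    fd = open(tmp_path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, msg, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        unlink(tmp_path);
        return -1;
    }
    close(fd);
    if (rename(tmp_path, new_path) != 0) {   /* the atomic "publish" step */
        unlink(tmp_path);
        return -1;
    }
    snprintf(name_out, name_len, "%s", name);
    return 0;
}
```

A mail reader then only has to list new/ and cur/; there's no single mbox file to lock, which is also why it plays nicely over NFS.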

I guess I'm just uncomfortable 'cause it's a large new system -- no mail server can be small and simple to use; they're just too complicated. Maybe I'm getting old. ;-)

----

The next day...

----

I just noticed why I never heard the "Dune" reference in "Weapon of Choice" before. I just heard the song again on the radio. The voice is quite clear and easy to understand. I always listen to the song in MP3 format on my computer. And my computer actually has fairly good speakers.

But the voice on the MP3 is distinctly degraded and difficult to understand. Proof positive (to me, anyway), that MP3 really does degrade the sound quality. I'm tempted to encode this song in ogg/vorbis and see if it's any better. Hmm...


The last time I was here, I discovered that my hotel charges $18/night for parking. Wow! Yes, I can get reimbursed for that, but that ultimately comes out of all of our tax dollars, so I went looking for a different garage within walking distance of my hotel. I found one -- it's the parking garage for America's Mart. It's actually right across the street, but who's counting?

They have a max of $7/24 hours. And it's literally right down the street from my hotel. It's less than 1/3 of the cost! I'm not only defending our great country, I'm saving taxpayer money as well. So I parked there all last time I was here.

As an added bonus, if I left early enough in the morning, I would get there before the attendant was on duty, and therefore I could drive out without paying (the gate is up when there's no attendant on duty). I didn't pay close enough attention to when the attendant came on duty to figure out how to always avoid paying.

I was wondering if the same conditions would apply -- the last time I was here was about 1.5 years ago. And lo, the conditions are the same -- I parked in the garage last night and left this morning for $7. The attendant was there at 7:45. I'll have to see if I can leave before then tomorrow...


I've noticed a curious phenomenon with my hotel room.

When I come back at night, the volume is turned up 100% on the TV. I think the maids must be severely hearing-impaired.


I had dinner at Planet Hollywood last night.

I had dinner at the Hard Rock Cafe tonight (they're right across the street from each other).

Per diem rocks.

(I'm such a simple man...)


I now realize why I tend to stay late when I'm working down here (aside from not knowing anyone else in Atlanta, only having working clothes, and having free high-bandwidth internet connectivity at work) -- the traffic in downtown Atlanta between 5-7pm sucks. Stop-n-go traffic turned a normally 5-minute drive into 20-25 minutes.

Ugh!


On the way out tonight, I dropped my laptop. Aaahhh!!!

The CD drive cover broke off.

Where I previously had one vertical line on the screen, I now have about a dozen.

:-(

I think it's now definitely time for a new laptop.

:-(


I've started watching Witchblade -- the new series on TNT. Very bad -- I don't have time for it.

It has very cool photography. Lots of matrix-like effects. They had a very cool scene in the first five minutes of the show tonight with two dirt bikes jumping right at each other, mixed with several bullet-time spins at stopped time. Trez kewl. They play a lot of good music, too.

Indeed, TNT is headquartered here in Atlanta -- I don't know if it's the headquarters, but there's a TNT complex right next to the edge of the GA Tech campus where my building is. There was a multi-story Witchblade poster on the side of the building yesterday.

Your *what* hurts?

BTW, redirect all comments about my [lack of] math skills in the last entry to /dev/null.

June 15, 2001

On cell phones and swiss cheese

My laptop just suspended itself for no apparent reason.

Weird.

Is this another symptom of its impending death? The vertical lines on the screen are damned annoying. The keyboard is becoming less responsive. And now this.

Poor pokey. :-(

pokey has served me well over the years.

However, I just initiated the ordering process for a new laptop from Dell (through IU):

  • Inspiron 4000
  • PIII 900 MHz (the price difference between 900MHz and 1GHz seemed to be not worth it)
  • 14.1 inch SXGA TFT screen
  • 256MB RAM
  • 20GB HD
  • 24x internal CD
  • Pluggable internal floppy (will probably stay on the shelf in my office...)
  • Windows ME (but IU has a site license for Win2K, so I might format and install that)
  • No MS Office; IU has a site license for that as well

  • Port replicator
  • External speakers
  • External mouse
  • External keyboard

Lummy said that I can have one of the extra monitors that came with the machines that we got for the machine room, hence, it wasn't in the actual order. It's shipping directly to Bloomies; hopefully it'll be there next week sometime.


Have you ever had a surreal experience involving C++, three peacocks, a kumquat, and a 1977 $10 bill?

Neither have I.

Just curious...


Looks like Lummy and I are traveling to Sandia to see Brian's gig and talk to the folks down there. We'll give some kind of talks about LAM and what we want to do with it w.r.t. fault tolerance, etc. Don't know the exact composition of the talks, nor the exact travel days, but it's likely to be the early part of the week of the 25th.

I've had a bunch of really interesting phone conversations and e-mails w/ Brian about this FT stuff w.r.t. LAM. It's very cool stuff. He's doing Great Things with the lamd.


More LAMisms:

  • Dog has started doing LAM stuff. (Did I mention this in jeffjournal before? Short term... what?) He'll be adding TM stuff, which will only benefit PBS right now, but hey, perhaps others will implement it as well. He'll also be adding the ability to disable parameter checking at compile time and at run time, and measuring to see if that actually makes a difference or not.

  • I finally made a breakthrough in the Myrinet RPI last night, and I think that I may have found the last problem (that I'm currently aware of, anyway). The tag/size from a long message ACK was getting trashed before the actual body of the long message was sent back in an obscure race condition because the tag/size was stored in a temporary buffer. This race condition only exhibited itself during long message all-to-all tests. Woof. The solution was to save the tag/size immediately upon receipt -- not difficult at all, but it took forever for me to understand what was going on, what was going wrong, and exactly why the tag/size was getting trashed. I'll run all the tests on all three Myrinet systems that I've got access to and see if I can manage to get a beta out this weekend. Of course, as soon as I type this, I probably doom myself to finding other problems, but we'll see...

  • Network Solutions really sucks. We've copied the DNS zone files from nd.edu to cs.indiana.edu, and they're up and available. I tried to use the NSI web interface thingy to change their pointers for the two top-level DNS servers for lam-mpi.org (and .net and .com), but it wouldn't work. So I called them. I spent over 2 hours on the phone, and they still aren't changed. The first woman that I spoke to was... well, let's just say that she was unhelpful. She finally transferred me to "second level support", where I sat on hold for an hour before giving up and hanging up. That was 2 days ago. I haven't had the strength to call back yet. I feel bad because we continue to impose on the good will of Curt while this continues (it's his DNS server that we're using in nd.edu), and it's of no other fault than the fact that NSI sucks.
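The Myrinet fix in the second bullet is a general rule for receive buffers that get handed back to the network: copy out whatever you'll need later before the buffer is reposted. A minimal sketch, with hypothetical struct layouts and function names (not the actual LAM/gm ACK format). The buggy pattern would be keeping a pointer into the receive buffer instead of a copy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of a long-message ACK as it sits in a receive buffer. */
struct long_ack {
    int32_t tag;
    int32_t size;
};

/* Hypothetical per-request state that outlives any receive buffer. */
struct long_send_req {
    int32_t peer_tag;
    int32_t peer_size;
    int ack_seen;
};

/* Called as soon as an ACK lands in `rbuf`. Copy what we need out of the
 * buffer *before* it goes back to the pool, because the next incoming
 * message may overwrite it while the long body is still being sent. */
static void on_long_ack(struct long_send_req *req, const void *rbuf)
{
    struct long_ack ack;

    memcpy(&ack, rbuf, sizeof ack);   /* memcpy avoids alignment assumptions */
    req->peer_tag = ack.tag;
    req->peer_size = ack.size;
    req->ack_seen = 1;
}
```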


I'm still defending the USA down here in Atlanta. We're doing a massive IP number reorganization tonight -- the changes go live in DNS at 1630 EST. We have a whole class C network to ourselves, and are using less than half of the available IP addresses. So we're shifting all the IP addrs down to the lower 128 and letting GA Tech have the upper half back. So I'm going to go around to all the machines tonight and reset their IP numbers.
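Auditing which hosts still sit in the upper half is just a mask check against 255.255.255.128 (a /25). A minimal sketch, assuming IPv4 dotted-quad strings; in_lower_half() is a hypothetical helper, and the example addresses in the comments use the 192.0.2.0/24 documentation range, not our real network.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>

/* Is `addr` in the lower half (the first /25) of the class C containing `net`?
 * Returns 1 if yes, 0 if it is in the upper half or outside that /24,
 * and -1 if either string is not a valid dotted-quad IPv4 address.
 * e.g. in_lower_half("192.0.2.10", "192.0.2.0") is 1,
 *      in_lower_half("192.0.2.200", "192.0.2.0") is 0. */
static int in_lower_half(const char *addr, const char *net)
{
    struct in_addr a, n;
    uint32_t ha, hn;

    if (inet_pton(AF_INET, addr, &a) != 1 || inet_pton(AF_INET, net, &n) != 1)
        return -1;
    ha = ntohl(a.s_addr);                 /* host byte order for masking */
    hn = ntohl(n.s_addr);
    if ((ha & 0xFFFFFF00u) != (hn & 0xFFFFFF00u))
        return 0;                         /* not in this class C at all */
    return (ha & 0x80u) == 0;             /* last octet < 128, i.e. mask 255.255.255.128 */
}
```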

We're moving our servers around, too -- the old mail server is going to be retired (although it will stay on for the next few days, while the new DNS information propagates around).

I did a nessus scan the other day to determine what IPs were being used and which were not (DNS really didn't match what we had at all), and I found an unpatched IIS on one of the windows servers. Gulp. I immediately told my boss about it, and it started what can best be described as a political free-for-all brouhaha.

Suffice it to say that it took about 24 hours to get the machines powered off, and some people were very unhappy about that. They're going to need to be reinstalled, 'cause I'd be extremely surprised if no one has cracked them yet -- the IIS doors were wide open, with bright blinking neon lights, "Crack me! Crack me!".

Ah well. It's good to know that good technical sense finally prevailed over the political disputes. What will happen with those machines in the long run has yet to be determined, but at least they're off for now.

June 17, 2001

We're the pros from Dover

I heard Bjork on the radio on the way in this morning. You don't hear her stuff much on the radio anymore.


The movie MASH was on TV yesterday, and I watched part of it from my hotel room. MASH has to be one of the funniest movies of all time. Ever wonder where Hot Lips got her name? You gotta watch the movie to find out.

"Goddamn Army jeep!"

I think the football segment of the movie has got to be one of the best football movies ever. "I think their ringer just made our ringer."


I continue to have less and less hope in the current myrinet code. It seems to be a sinking morass of race conditions. I fix one, and another one appears. The next one inevitably shows up as a single innocuous test failing in the test suite. After tracing it down, it typically turns into a conglomeration of events that ends up in some memory location being used twice. <sigh>

This is completely the fault of stealing from the TCP RPI -- the myrinet RPI reflects many of the same assumptions that the TCP RPI reflects, most of which aren't necessary when using gm for communication. The central assumption that has caused the most Badness is that when you read() from or write() to a TCP socket, you may or may not transfer all the data that you expect to transfer.

Since most of the time you're not allowed to block, you have to have extensive bookkeeping to remember exactly where you were in reading/writing a given message. The next time you enter the state machine, you have to try to continue reading/writing from where you left off.

Hence, each socket has some state associated with it -- pointers for the current message being read/written, and how many bytes are left. This stuff is all redundant in myrinet, because it doesn't send/receive partial messages. Hence, when you send, it's sent. When you read, it's read in its entirety. However, the current code still uses these pointers that are associated with each "socket" (actually, we call it a "process"). It dawned on me while I was walking in that this could be the root of much Badness in the myrinet RPI. I'll spend the rest of today investigating getting rid of all of that stuff and see if I can send/receive directly from the MPI request in question rather than use all these temporary pointers/bookkeeping that is attached to each process.
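Here's the kind of per-connection bookkeeping the TCP RPI needs, and that gm should make unnecessary. A minimal sketch assuming a non-blocking POSIX socket; send_progress, advance_send(), and set_nonblocking() are hypothetical names, not LAM's code. With TCP, advance_send() may return 0 many times before a message is fully out, so the buf/len/done triple must live somewhere between calls. Since gm never sends or receives partial messages, that triple could collapse into the MPI request itself.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Per-connection state: which message is in flight and how far we got. */
struct send_progress {
    const char *buf;    /* message being sent */
    size_t len;         /* total bytes */
    size_t done;        /* bytes already written */
};

static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Push as much of the message as the socket will take without blocking.
 * Returns 1 when the whole message is out, 0 if the socket would block
 * (call again on the next pass through the state machine), -1 on error. */
static int advance_send(int fd, struct send_progress *p)
{
    while (p->done < p->len) {
        ssize_t n = write(fd, p->buf + p->done, p->len - p->done);
        if (n > 0)
            p->done += (size_t)n;      /* partial writes are normal on TCP */
        else if (n < 0 && errno == EINTR)
            continue;
        else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return 0;
        else
            return -1;
    }
    return 1;
}
```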


T-5 days left on my current army tour.

I have to spend some quality time with my beret tonight and get it into shape.


I spent an hour on hold with Network Solutions yesterday before I finally got someone. Fortunately, the guy that I got was actually fairly clueful. We managed to get the top-level DNS servers for lam-mpi.org|net|com changed to the IU servers.

The change apparently went in at 5am this morning; it'll take a day or three to propagate around the world. No domain that I have access to can see this change in DNS yet (LBL, MCS/ANL, ND, GATech, Telocity), so I hope it's propagating... I guess that was only about 6.5 hours ago, so it may not have been picked up by any of the respective local DNSs yet.

June 23, 2001

Can that thing measure a New York Minute? 'cause Jimmy could be walking through that door any minute. And this is New York City.

Much has happened.


The easiest format is quickies
[sound effect: mad crowd cheering]


  • I'm officially an IU post doc. Can't remember if this has been in the journal yet. Pay starts very soon. Real money -- woo hoo!

  • I'm back from my Army 2 weeks. Literally moments after I turned in my rental car and got in the shuttle to go to the airport, it started raining. Hard. So hard that it was darn near impossible to see out the windows. Could I have timed that any better? It is unlikely that I will return to Atlanta for future ATs; not only are there only 3 unix machines (part of the AT that I just completed involved ramping down their Unix side), there is also some question as to the future of that specific office. I've got some contacts that I'll be following up with to see if I can get another computer-type posting (as opposed to being a battalion signal officer somewhere). We'll see how that goes.

  • Unlike my last 2 ATs, I sent off my Army travel/pay paperwork immediately.

  • My trip home from Atlanta was otherwise uneventful. The plane was somewhat late in taking off from Atlanta, but that was no biggie.

  • Tracy picked me up at the airport and we went out to dinner. We ended up at a table right around the corner from Janna and some of Jim's MBA study friends. We ended up going back to Janna's and watching the new version of Charlie's Angels. Holy cow, did that movie suck! I give it 25 feet. It was so bad that parts of it were really funny, but it wasn't bad enough to be funny enough to be a worthwhile movie. It was just plain bad. They tried a whole bunch of Matrix-like special effects which were technically competent enough, but (for example) I don't think that the actors/actresses carried off the harness work well at all. They even left the door open for a sequel, but I highly doubt that that will happen.

  • (Editor's note: If you don't know the internals of LAM, skip the following item as it will make less sense to you than a one-eyed dog looking at a "hidden eye" picture that contains a Hindu translation of the Rosetta Stone) Brian has found a really troubling bug in the lamd w.r.t. some new code that I put in recently for sending back the routing table in a call to ldogetlinks(). I recently changed the code to split up the routing table to only send a portion of the table at a time because the nsend() glue in the lamd does not packetize -- it truncates over 8k. This was a problem when the routing table was over 8k (i.e., lots of nodes in the LAM universe). However, my changes were somehow causing failures sometimes on RedHat 7.1 (consistently in ldogetlinks() in tping, but not in ldogetlinks() in any other program). Much weirdness. We decided to punt on this for now, and put the old code back. Brian's work in the lamd will soon give us fully-packetizing nsend(), anyway, so the point will be moot.

  • Monty (of Ogg/Vorbis) just replied to me that the obstacles inside the Vorbis engine to making it parallel are only going to get worse, because of new things that they want to do, etc. (insert math mumbo-jumbo here). Bummer. He thinks it's still possible, though, but it will require some API work in libvorbis to support this. So the door is not closed yet; we'll see how it works out.

  • I spent today catching up on paperwork and snail mail that accrued while I was defending the country.

  • Target has a fairly nice and fully-functional web site. I just went there to buy a wedding present for Ken/Amanda (their names don't really combine well). I was pleasantly surprised.

  • queeg rebooted sometime while I was in Georgia because of a power blip. Bonk. His uptime is now only 9 days. Unfortunately, I don't have any records of how long he was up before that.

  • Some machine in indiana.edu got hacked this past week. Someone on the ND CERT list sent the press release around; this makes two hacks this year. Hmm... imagine if ND had a press release every time they got hacked! (the implication here is that it would overflow ND's PR department)

  • I downloaded the newest stable bladeenc and am re-ripping a few of my CDs. I re-ripped the Weapon of Choice song from Fatboy Slim (the one that I complained about a few days ago here in the jjc -- I never noticed how bad my MP3 was until I heard it on the radio, and the quality was much clearer).

  • I've been hearing a lot of Fatboy Slim on the radio and in movies these days. It's probably because of my support and the taglines that I've been giving him here in the jjc. You're welcome, Normy.

  • I'll be visiting IU this upcoming week sometime (gotta fill out paperwork) and then go visit Brian and crew down at Sandia in New Mexico next week.
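For the LAM-internals crowd, the chunking approach from the ldogetlinks() item (the one we backed out) amounts to splitting the payload into pieces no larger than the nsend() limit. A minimal sketch, assuming "8k" means 8192 bytes; send_cb stands in for nsend() and is hypothetical, as are the other names. A real protocol would also have to tell the receiver how many pieces to expect, which this sketch does not do.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MSG 8192   /* assumed truncation limit of the non-packetizing send */

/* Hypothetical send callback standing in for nsend(); returns 0 on success. */
typedef int (*send_fn)(void *ctx, const char *buf, size_t len);

/* Send `len` bytes as a sequence of messages of at most MAX_MSG bytes.
 * Returns the number of messages sent, or -1 if a send fails. */
static int send_chunked(send_fn send_cb, void *ctx, const char *buf, size_t len)
{
    int count = 0;
    size_t off = 0;

    while (off < len) {
        size_t n = len - off < MAX_MSG ? len - off : MAX_MSG;
        if (send_cb(ctx, buf + off, n) != 0)
            return -1;
        off += n;
        count++;
    }
    return count;
}

/* Test double: records the size of each chunk it is handed. */
struct recorder { size_t sizes[16]; int n; };

static int record_send(void *ctx, const char *buf, size_t len)
{
    struct recorder *r = ctx;

    (void)buf;
    if (r->n >= 16)
        return -1;
    r->sizes[r->n++] = len;
    return 0;
}
```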

That's all for now. More later.

June 25, 2001

I took Miller and Johnson and squished 'em together and picked 'em apart and got... "Monsoon".

And so it goes.

I'm reading Robert A. Heinlein's Stranger in a Strange Land. A good book. I think I grok it.

When I debarked from my plane to Louisville on Friday afternoon, I turned my cell phone on. I wondered how long it would take for new voice mail messages to show up. When I turned it on, it showed no new messages. About 20 seconds later, BEEP!, and my messages arrived. Even though I'm familiar with how the technology works, it's somehow amusing to me that the simple act of turning on my cell phone causes a database lookup on a voice mail server somewhere in the depths of the Verizon network.

Interestingly enough, the "you have voice mail" indicator lit up while I was in Atlanta last week. It used to only do that when I was in Louisville -- you could get voice mail from anywhere, but your phone would only alert you to new voice mails when you were in your "home" area. I wonder if that's nationwide now.

Jortney are now using squyres.com as a temporary home for their domain (and therefore e-mail) while they move into their new home. John is distressed because there's no broadband available where they're moving to. Sucks to be him. :-(

We bought patio furniture today. Woo hoo. (I didn't have too many opinions on this stuff; Tracy mostly picked it out) We also finally got blinds for our great room. I am actually pleased about that; it's the last window that we still had sheets hanging on.

I finally bought a headset for my cell phone today. It comes with a real headset-style over-the-head band thingy, but also converts to a clip-on-the-ear thingy. My C*'s called me today on my cell phone, so it proved to be an excellent opportunity to try it out. It works great, and is much more comfortable than holding the cell phone up to your ear, especially for those who are on the phone for non-trivial amounts of time. The only disadvantage is that it's a bit too big and fragile to shove in my back pocket with my cell phone if I want to go out, so it's really only useful in the car. Perhaps I'll just keep it in my laptop case; it's not too inconvenient to switch to the headset during the middle of a call. They do have the small plug-in-your-ear kind that has a separate clip-on mike that you could shove in your pocket with your cell phone, but I generally don't like those things.

I converted the LAM ldogetlinks fiasco to use a single nsend. After thinking about the problem some more, I'm not sure that multi-threading the lamd would have fixed this problem. Hence, I just changed the protocol outright to be simpler (albeit less efficient). Hopefully, this will fix all of our woes (RH 7.1 tping and MPI_COMM_SPAWN_MULTIPLE).


Night fell, and the sun rose again. A new day.

Must work on my OER today, and then continue to work on the Myrinet RPI (haven't been able to work on that since mid-last week or so). I started to re-write it from scratch, and was coming up with a much, much simpler model, but was forced to ditch all of that because it would break our compatibility with our shared memory RPIs. Arrgh...

June 27, 2001

Good thing we didn't do any therapy, Dave

Emacs "C-x v =" is your friend.

Went up to IU yesterday. I got all my paperwork sorted out, and got my accounts set up in indiana.edu. Sooner or later, jsquyres@indiana.edu will start working (I think it works now, but I'm not sure where it's forwarding to...). I saw several of our new machines (several sun blade 100's and some big Dells for linux and win2k). Jeremiah and Ron seem to be establishing themselves nicely at IU.

We discussed the LAM license issue for a while. Lummy wants (in order) one of the following: Clarified Artistic License, Apache, BSD/MIT. I'm not too personally fond of the CAL -- it seems to be a bit restrictive-sounding on the issue of distributing binaries. Apache is not GPL-compatible, so I think we need to ixnay that one because it might lock us out of some linux/BSD distributions. I would not mind a BSD/MIT license. We'll see how this plays out.

I found out that my Rasmus number is 2. It turns out that Todd went to both high school and university with Rasmus (the author of PHP). Todd even had an account on Rasmus' BBS back in their high school years. It's a small world.

I have switched to having my pine config on the IMAP server. I have found that there are three different places that I typically access my mail from: my desktop, my laptop, and one of the workstations at school (which share a common filesystem, so it doesn't matter which one it is). So whenever I update my pine configuration, I have to update it in three places. This has proved to be annoying, and I rarely remember to do it. End result: when I run pine on an nd.edu machine, I don't have much of the setup that I'm accustomed to. Bonk. So I uploaded it to the IMAP server and made an alias ("gpine") for the lengthy command line that is necessary to fire up pine and retrieve my config from the server. Seems to work quite nicely. Now, if only pine would support disconnected IMAP operation...

Still haven't figured out how to access whale.cs.indiana.edu (the CS IMAP server) -- it doesn't seem to accept my password.

More later...

June 29, 2001

You have nice hands, Dave

Ick. My last entry was an example of good formatting gone bad. jjc even warned me about it, but my fingers just acted by themselves (really, I swear, officer!) and submitted the entry anyway. This is the same entry, but with the formatting fixed.

I had to delete the last entry from the jjc database as well, so that it didn't show up on the web page. Urgghh...


Great quote from a text about the history of /bin vs. /usr/bin vs. /usr/local vs. /opt:

Manuals for these programs are present for one funny reason: Steve Bourne ran a cron script that checked /usr/bin for new/updated programs each night. If there was no manual or the manual had not been updated, the binary was removed by the daemon.

We finally got the project name "oscar" at SourceForge. Mike from IBM is filling up the site today and tomorrow. Finally!!

I was suddenly hit by the urge to hear "Echoes" by Pink Floyd. It's playing right now. Mmmm.....

My paper got rejected from SC2001. Bonk. From the reviews, it was apparently mainly because I didn't have any results in it (they only wanted an extended abstract -- full paper to be submitted later). I was right up against the word limit as it was, so I put a blurb in there about "results will be included in final paper". Both Lummy and I thought that would be ok for the extended abstract. Apparently not. <sigh>

Tracy and I built our new patio furniture in the rain yesterday. The furniture was delivered to our back patio during the day. Shortly after Tracy got home, it started raining. Oops -- all the furniture is still out there, and is in cardboard boxes! So we decided to just build it right then and there. When was the last time you built patio furniture in a thunderstorm?


Had more interesting discussions today with Brian about multi-threading LAM and the lamd (at least 2 hours worth). Good stuff, but very confusing. Wow. He did a good writeup of it in his journal.
Talked with Dog for a long time about what he's going to be doing in LAM, too. Very cool stuff. He's going to modularize some of the stuff in LAM that we use for bootup and various system services on different kinds of systems (regular rsh, scyld, tm, globus/grid, condor/grid, condor, etc.). We more or less figured out how to do it such that it can be entirely self-contained in its own module directory (e.g., modules/rsh or modules/scyld).

The most obvious example of where such things would be useful is for lambooting -- each different kind of system has different ways to launch executables on remote nodes. But there are other things as well -- Scyld's whole "there's little or no filesystem on the nodes" concept really threw Brian for a loop when he did the Scyld stuff, for example.

Here's the loose plan:

  • The idea is to aggressively build as many of the modules as possible. Hence, configure tries to configure all of the modules. If the configuration of a given module fails (e.g., libbproc can't be found -- so we must not be on a Scyld system), we don't build it.

  • Additionally, the overriding goal here is that a module is completely self-contained in its directory -- the addition of a new directory requires no changes to any other part of LAM.

  • LAM's top-level ./configure will traverse the directories in modules/ and look for a configure script. If a directory has one, the top-level ./configure will run it.

  • If the configure script in that directory succeeds, the top-level ./configure will add it to the "to be built" list. If the configure script in that directory fails, that directory will be ignored.

  • For all modules listed in the "to be built" list, the top-level ./configure script creates a .c file (perhaps share/etc/modules_init.c) that is part of LAM (in liblam.a somewhere) and that initializes the modules. This file's only purpose is to call the "init" function of each module. So some standard header is written out, followed by a list of "lam_module_NAME_init()" calls, where NAME is replaced with rsh, scyld, etc. (i.e., whatever the name of the module's directory is). This is because the function names cannot be the same, or we'll get linker errors. So instead, there is a naming convention so that we can build the function call list on the fly.

  • Indeed, the API that these modules will need to support will also not have function names (for the same linker error issues) -- the init function of each module will need to supply a struct full of function pointers of all the module functions.

  • Presumably, LAM will have one or more modules built at compile time. Later, at run time, LAM must determine which module to use. One of the module functions (perhaps it will be the init function itself) will be used to make this decision. That is, keep the decision about whether that module should be run in the module itself -- the module can do whatever test it wants to determine if it should be run. For example, if the tm module detects the environment variable PBS_ENVIRONMENT, then the tm module should be used.

  • However, one can imagine situations where multiple modules may report "yes, I'm the module to use". So each module should probably also have a command line flag that forces its use to resolve ambiguities. For example, say you're running in PBS, but also happen to be in a Globus environment. In this case, you'll probably want to use the Globus module, not the tm module. However, both modules would probably report that they could be used. So we'll have a flag such as "-Mglobus" to lamboot that would tell all the modules "if your name ain't 'globus', you ain't runnin'." But most of the time, there probably won't be an ambiguity, and the modules can just determine themselves which module to use (optimize the common case).
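To make the plan a bit more concrete, here is a minimal single-file sketch of how the pieces might fit together, assuming only two modules (tm and rsh) configured successfully. The lam_module_NAME_init() naming convention, the PBS_ENVIRONMENT test, and the idea of a "-Mname" override come from the plan above; the struct lam_boot_module layout, its select()/boot_node() fields, and lam_choose_module() are hypothetical names made up for illustration. In the real layout each module body would live in its own modules/NAME directory, and the generated modules_init.c would only hold the table of init functions.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical module API: each module hands back a table of function
   pointers instead of exporting fixed function names, so two modules
   never clash at link time. */
struct lam_boot_module {
    const char *name;                    /* same as the module's directory name */
    int (*select)(void);                 /* nonzero: "yes, I'm the module to use" */
    int (*boot_node)(const char *host);  /* launch the daemon on one node */
};

/* ---- would live in modules/rsh ---- */
static int rsh_select(void) { return 1; }  /* rsh is usable almost anywhere */
static int rsh_boot(const char *host) { printf("rsh boot %s\n", host); return 0; }
const struct lam_boot_module *lam_module_rsh_init(void)
{
    static const struct lam_boot_module m = { "rsh", rsh_select, rsh_boot };
    return &m;
}

/* ---- would live in modules/tm ---- */
static int tm_select(void) { return getenv("PBS_ENVIRONMENT") != NULL; }
static int tm_boot(const char *host) { printf("tm boot %s\n", host); return 0; }
const struct lam_boot_module *lam_module_tm_init(void)
{
    static const struct lam_boot_module m = { "tm", tm_select, tm_boot };
    return &m;
}

/* ---- roughly what configure might write out as modules_init.c ---- */
typedef const struct lam_boot_module *(*lam_module_init_fn)(void);
static const lam_module_init_fn lam_module_inits[] = {
    lam_module_tm_init,   /* one entry per module that configured successfully; */
    lam_module_rsh_init,  /* in this sketch, list order breaks select() ties    */
};
#define LAM_NUM_MODULES (sizeof(lam_module_inits) / sizeof(lam_module_inits[0]))

/* Run-time choice: a forced name (e.g. from "-Mglobus") wins outright;
   otherwise take the first module whose select() says yes. */
const struct lam_boot_module *lam_choose_module(const char *forced)
{
    for (size_t i = 0; i < LAM_NUM_MODULES; ++i) {
        const struct lam_boot_module *m = lam_module_inits[i]();
        if (forced != NULL) {
            if (strcmp(m->name, forced) == 0)
                return m;
        } else if (m->select()) {
            return m;
        }
    }
    return NULL;  /* nothing usable, or the forced module was never built */
}
```

With PBS_ENVIRONMENT set, lam_choose_module(NULL) picks tm; without it, rsh. Asking for "globus" returns NULL because that module was never built, which is where lamboot would print an error.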

This is actually quite a useful concept. There are a few other details that I didn't mention (e.g., for all the API functions, there will also be "default" versions such that if a given module supplies a NULL function pointer for a given API call, the default version will be used instead -- somewhat like C++ base/virtual functions).
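The NULL-means-default rule could look something like the following minimal sketch. The struct lam_sys_api fields, the default functions, and lam_fill_defaults() are hypothetical names; the Scyld-style "no filesystem on the nodes" override is just an illustration of a module replacing one behavior and inheriting the rest.

```c
#include <stdio.h>

/* Hypothetical API table; a module may leave any entry NULL. */
struct lam_sys_api {
    const char *name;
    int (*boot_node)(const char *host);
    int (*node_has_filesystem)(void);
};

/* Defaults, used wherever a module supplies NULL (roughly the role a
   C++ base-class implementation plays for a virtual function). */
static int default_boot_node(const char *host)
{
    printf("default boot of %s\n", host);
    return 0;
}

static int default_node_has_filesystem(void) { return 1; }

/* Fill every NULL slot with its default, once, right after the module's
   init function hands back its table. */
void lam_fill_defaults(struct lam_sys_api *api)
{
    if (api->boot_node == NULL)
        api->boot_node = default_boot_node;
    if (api->node_has_filesystem == NULL)
        api->node_has_filesystem = default_node_has_filesystem;
}

/* Example: a Scyld-like module overrides only the filesystem query
   and inherits the default boot behavior. */
static int scyld_no_filesystem(void) { return 0; }
struct lam_sys_api scyld_api = { "scyld", NULL, scyld_no_filesystem };
```

Filling the slots once at init time means the rest of LAM can call through the table without checking for NULL on every call.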


I love interpreted languages that have eval functionality. This allows you to effectively have self-generating code. I'm guessing that this is the entire premise of Spielberg's new "A.I." movie -- self-modifying php and perl scripts that went bad and ended up going to war with each other to prove language supremacy once and for all.

Let us not forget the following quote from the Field Guide to Your Unix Sysadmin:

Typical root .cshrc file:

TECHNICAL THUG: Longer than eight kilobytes. Sources the output of a perl script, rewrites itself.

June 30, 2001

Go to hell, Costas

This is an awesome post from Jack of the Ogg/Vorbis project:
(Think: Willy Wonka)

Um Pah Lum Pah, Du Pi Dee Doo,
Proprietary formats will make a slave of you.
Um Pah Lum Pah, Du Pi Dee Dee,
Best to be wise and not use M P 3.

What do you get when you make an M P 3?
Besides artifacts and patent royalties?
It's not to late to open your mind.
Use Ogg Vorbis Don't
Fall
Be
Hind.

Don't you pay those
Ger-er-mans.

You could live in
Happiness too! Like
the
Ogg
Vor
Bis
Programmers
Do!


I took some friends to the airport today where they're leaving for a vacation. I took a very sub-optimal route home, though -- I really need to learn the roads around here better. Doh!


Rich Murphy made some groovy points about my last journal entry about having multiple "modules" for system services (including booting) in LAM. His main point was that we should just use dynamically loadable modules and avoid what I was talking about. If I don't post something about this, I'm sure that Darrell will say the same thing. :-)

Here's part of his e-mail:

Let me make a suggestion. Get to know and love dlopen() (or equivalents on other platforms... Solaris = dlopen(), linux = dlopen(), IRIX, AIX, etc., I have no idea). Basically, you make this part of the code modular by loading a shared object. Each shared object has a function, like lam_boot_module_init(), and maybe a lam_boot_module_finish(). Then, you set up a handle into your lam boot module's functions, say you want each module to implement a generic open_remote_node() and close_remote_node() function. You have a structure like:

[code snipped for brevity]

The module loader uses dlopen(), and can be driven from some init script. Then you can use dlsym() to find your lam_boot_module_init function. Call it and get the handle to everything else you care about. Then you're done.

Also, you can require that other modules dynamically load the libraries they need...

The best part is you don't have to rebuild lam every time, you don't have to futz with finding unique names every time, and your interface is perfectly well defined.

You probably want static linking in liblam.a, but why???

His last question is exactly right -- I do want static linking. I have three main reasons why:

  1. When using dynamic libraries/objects, the user inevitably gets
    burned by a) using a wrong/old version -- I'm reminded of DLL
    russian roulette in windoze -- b) paths changing and therefore
    having to set LD_LIBRARY_PATH (or some equivalent), or c)
    doing a new installation of LAM can fuck currently-running MPI
    programs (e.g., MPI programs that run for weeks).

  2. Differences between the dynamic linker functions on different
    OSes create headaches for us, with [potentially] lots of #if
    kinds of statements. Ick.

  3. Creating .so's on different architectures is, at best, a
    nightmare. Libtool only partly solves the problem. Hence, we
    make shared libraries an option, not a requirement. Portability, portability, portability! Unix != Unix.

My end goals are:

  1. Maximum portability and reliability with minimum effort. If I can have configure write out a single .c file instead of changing my whole paradigm, I'd much rather do that -- the paradigm change (using dlopen(), adding environment variables to potentially specify alternate locations / version numbers of shared libraries, building shared objects, etc.) is a large effort.

  2. Minimum chance for the user to fuck it up, particularly after the
    installation (see #1, above) -- put it this way: we got a
    question on the LAM list the other day from a user asking how to
    set $PATH. Do I really want to explain the nuances of shared
    libraries to these kinds of users? No.

    Consider the target audience for MPI: scientists and
    engineers. NOT necessarily computer science folks. People who
    still write in Fortran. Why? Because it's simple and it works.
    They can chunk in their formulas in really shitty coding styles
    and rely on the compiler to spit out nice optimized code for
    them. They just want it to work -- they don't care how.

    This guy asking about $PATH is a typical example of that.
    So while we privately laugh at him, we'd be pretty hard pressed
    to explain the basics of how a particle beam accelerator works,
    and/or how to make adjustments to it. So one can see his
    viewpoint, at least.

    Granted, we'll probably never have to use a particle beam
    accelerator, but you get my point. :-)

    MPI is just a tool. And it should be darn easy to use the
    run-time environment that is required to run it. And by "darn
    easy", I mean adding one entry to your $PATH, if any. If
    you're very adventurous, you can also add something to your
    $MANPATH. More than that, and the users' eyes glaze over, we
    get bombarded with questions on the mailing list, and users think
    "this LAM is a piece of crap -- why do I have to do all of this
    just to run a job?" There might be damn good technical reasons to
    do the 20 different things to your environment before running a
    LAM job, but no "normal users" will do them. It's almost a PR
    issue. Know your audience. Target them. Make things easier for
    them so that they can concentrate on their real work, not the
    intricacies of how MPI/LAM/whatever works.

    Software needs to suck less, and unfortunately I can't make
    LAM suck less if I use C++ or shared libraries. Yet. :-(

There were some other interesting side issues in that e-mail conversation, but that's the gist of it.
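For comparison, here is roughly what the dlopen() route Rich describes would involve on the loading side. This is a minimal sketch, not his snipped code: the lam_boot_module_init, open_remote_node() and close_remote_node() names come from his e-mail, while the struct layout and load_boot_module() are hypothetical. Only the POSIX calls dlopen(), dlsym(), dlerror() and dlclose() are real APIs.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Handle into a boot module's functions, as the e-mail describes. */
struct lam_boot_module_handle {
    int (*open_remote_node)(const char *host);
    int (*close_remote_node)(const char *host);
};

typedef struct lam_boot_module_handle *(*lam_boot_module_init_fn)(void);

/* Load a boot module from a shared object and ask it for its function
   table. Returns NULL (after printing why) if the object or its init
   symbol is missing; on success, *dl_out receives the dlopen() handle
   so the caller can dlclose() it later. */
struct lam_boot_module_handle *load_boot_module(const char *path, void **dl_out)
{
    void *dl = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (dl == NULL) {
        fprintf(stderr, "dlopen(%s): %s\n", path, dlerror());
        return NULL;
    }

    /* POSIX requires that dlsym()'s void * result can be converted to
       a function pointer, even though ISO C alone does not promise it. */
    lam_boot_module_init_fn init =
        (lam_boot_module_init_fn)dlsym(dl, "lam_boot_module_init");
    if (init == NULL) {
        fprintf(stderr, "dlsym(lam_boot_module_init): %s\n", dlerror());
        dlclose(dl);
        return NULL;
    }

    *dl_out = dl;
    return init();
}
```

Each module would be built separately as a shared object (e.g., with gcc's -shared -fPIC), and on older glibc the loader also needs -ldl. That is exactly the per-platform build and path fiddling that reasons 2 and 3 above are trying to avoid.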


I'm getting to the end of Stranger in a Strange Land. It's actually getting disappointing. :-(

It started off well as typical SciFi with a human that had grown up with Martians and was returned to Earth. But towards the end of the book, it's just degenerated into discussions about sex and whatnot that seem somewhat frivolous. I understand the point that Heinlein is trying to make, but (IMHO) it could well have been made without descending into semi-porno.

But that's just my opinion...


Watched October Sky with Tracy last night. A good warm-fuzzy flick, with elements of "engineers rule!". I give it 15 minutes.

I also watched End of Days with Arnold Schwartezzenaggerama in it. I thought it was a good movie -- I've always enjoyed Christian-end-of-the-world / mysticism movies. However, I can see how it didn't do spectacularly well in the theaters 'cause Arnold portrays quite a different kind of character than the one his fans know and love. Even though he wins in the end, he's portrayed as a weak ex-cop. Plus he has none of the witty one-liner puns that he's famous for.

But I enjoyed it, and it had some really great special effects. 20 minutes.


If you're ever in an argument and you start losing, and perhaps realizing that your position is less than correct, you can abruptly win the argument by saying, "Yeah, that's just what Hitler said!".

Most everyone will recoil in horror at the thought of being compared to Hitler. Hence, by invoking a known abhorrent image that probably has absolutely nothing to do with the conversation, you win.

It works the other way, too. If you're arguing with a Neo-Nazi, just say, "Yeah, that's just what Jesus said!" The end result will be the same.


The Myrinet struggle goes on. I find bugs, I fix them. I got it to a point where all the tests that should pass on the Hydra did, and then took it out to LBL. There I found a few endian issues, and a minor seg fault in connect_all().

Now I've got some insidious problems in COMM_SPAWN that I think are actually symptoms of something else. <sigh>

About June 2001

This page contains all entries posted to JeffJournal in June 2001. They are listed from oldest to newest.
