It's been quite a while since I've done an entry, and I blame Arun. Without his daily (and sometimes more than that) journal entries, how can I be expected to remember to do my own?
How cool is this? Tracy's products are up on GE's web site.
Until about 3-4 weeks ago, I thought she was working on 2 models: 1 gas, and 1 electric. But she's really been working on about 90 different models! Yes, nine-zero.
Cool! (GE even did some kinda cool stoopid-browser-trix things on that website, too)
They just started going down the production line a few weeks ago (which is pretty cool in itself); they won't be available in stores for a little while yet.
My wife rocks.
I've been doing a lot of development work with poggenc. The first generation is essentially finished -- I'm currently working on plugging up the last few memory leaks. I have found at least 1 bug in the Sun Forte 6.1 STL implementation -- std::vector::resize() causes a read-from-uninitialized error. Doh.
poggenc is still threads-only (no MPI yet). I thought that I knew a lot about threading before I started this, only to discover that I didn't know jack about threading. My original design had many locking bottlenecks, such that encoding with multiple threads (or even one thread!) had so much overhead that it was slower than hell. I had to redesign a bunch of the interfaces and cut the number of locks way down in order to get the processing time to a reasonable level.
Still, however, it's less than linear speedup with multiple threads on SMPs. Of course, nothing can exhibit perfectly linear speedup, but this isn't close enough for my liking. I'll continue to investigate that.
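To make the locking problem concrete, here's a minimal sketch of the general idea (this is not poggenc's actual code). Everything in it is made up for illustration: `encode_item()` stands in for real encoding work, and the two function names are hypothetical. It also assumes a compiler with `std::thread` and `std::mutex`. The first version takes a shared lock for every single item; the second accumulates into a thread-local variable and takes the lock once per thread.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for per-item encoding work (not poggenc's real code).
static long encode_item(int item) { return static_cast<long>(item) * 2; }

// Contended version: every item acquires the shared lock.
long encode_lock_per_item(const std::vector<int>& items, unsigned num_threads) {
    assert(num_threads > 0);
    long total = 0;
    std::mutex m;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            // Each thread handles items t, t + num_threads, t + 2*num_threads, ...
            for (std::size_t i = t; i < items.size(); i += num_threads) {
                long r = encode_item(items[i]);
                std::lock_guard<std::mutex> guard(m);  // one lock per item
                total += r;
            }
        });
    }
    for (auto& w : workers) w.join();
    return total;
}

// Lower-contention version: accumulate locally, lock once per thread.
long encode_lock_per_thread(const std::vector<int>& items, unsigned num_threads) {
    assert(num_threads > 0);
    long total = 0;
    std::mutex m;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            long local = 0;  // private to this thread, so no lock needed
            for (std::size_t i = t; i < items.size(); i += num_threads) {
                local += encode_item(items[i]);
            }
            std::lock_guard<std::mutex> guard(m);  // one lock per thread
            total += local;
        });
    }
    for (auto& w : workers) w.join();
    return total;
}
```

Both versions compute the same answer; the difference is only in how often threads have to queue up on the mutex. Whether this particular change buys anything depends on how expensive the per-item work is relative to the lock.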
I started some web pages to explain how this works, with the idea that some of this text can be morphed into dissertation-quality text afterwards, i.e., the web pages are a dry run for a dissertation chapter.
Saw an old Army cadet of mine this past weekend: Brent, along with his wife Aimee (I hope I spelled her name correctly). He was one of my Airborne plebes; I beat up on him as part of his training (and he's a better person for it! :-). It's a small world -- he now works for GE Appliances here in Louisville. It was good to see him again, and to hear what he ended up doing in the Army, and what he's doing now. Ironically, he outranks me -- he finished as a Captain, while I'm still a 1st Lieutenant. Life is amusing that way...
He's working on an idea with an old commander of his who is at the Army War College. It's an overhaul of the Army's evaluation system. It's pretty cool, actually. There's a web and technology component (which is why he asked me). He asked if I could help, and I probably will throw a bit of advice their way (contributing to the open source/freeware cause, of course), but I don't have time to do any actual programming for them. Ah well. :-\
Over the past few days, we've (me-n-Andy) been coordinating a trip up to the University of Wisconsin/Madison for a visit with the Condor folks. We've got it all set on the first week of February, but I forgot my @#$#@$% dentist appointment that week. Arrghh!! Tomorrow, I've gotta see if I can get it rescheduled (my dentist isn't open on Mondays).
Other than that, it looks like it's going to be a great trip; I'm going to give a talk on LAM. After a little discussion (we've got a mailing list set up for the LAM and Condor folks for ongoing collaboration), we decided to split my talk into three parts:
- MPI vs. PVM: theoretical / practical reasons, with a few small code samples
- Talk about how the lower layers of LAM work (daemon-based stuff, etc.)
- An intro to what we're hoping to do with a Condor + LAM collaboration, what I've tentatively nicknamed "Lamdor" (like the name?)
It should be a good time.
Speaking of the Goodness of LAM, there's a Linux integrator company (Aspen Systems -- http://www.aspsys.com/) that wants to install an 800-node Beowulf with LAM and Myrinet 2000. How cool is that?!
LAM: Lust for Glory!
I visited ND last week for a few days. The lab is a total disaster with water damage and whatnot. However, I heard the most sensible idea for solving the problem that I've heard in years: instead of trying to fix the roof, they're going to essentially install an upside-down umbrella in the attic under the roof to catch all the water that seeps in. This water will be funneled to a new drain pipe that they're installing inside Cushing. That's right -- they drilled a hole through the floor of 325 Cushing, and also through the floor of the room below us, and will be installing a massive drain pipe from the attic all the way down to the ground floor and outside, so that the leaking water can flow safely all the way from the roof to the outside.
Engineering-wise, it's actually pretty cool.
While I was at ND, I managed to grab Dan from Scyld on the phone. We had a good chat. He's very pleased with the progress on poggenc, and we talked about LAM/Scyld as well. We think we came up with a hack for LAM/Scyld. It's not perfect, but it will [hypothetically] allow:
- LAM to work on Scyld machines.
- An RPM of LAM to be distributed that will work on both Scyld and non-Scyld machines (decision is made at run time).
We'll see how that works out.
Also while I was at ND, Dog and I met with Paul and Johanes to "turn over the keys" of the Hydra. Dog and I are no longer the primary caretakers of the Hydra -- Paul and Johanes are. Of course, we'll be in a transition mode for a while; Paul and Johanes will probably have to consult us about any problems with PBS for some time. But at least we've started the transition.
Two things I have to do before I am fully out of the loop:
- Integrate the Maui Scheduler and QBank software into PBS. This is because Rich Sudlow has finally decided to take us up on the CTC deal where ND HPCC users get 10% of the cycles of the hydra per month. To do this, we need an allocation-tracking program (QBank), and a scheduler that can interface with it (Maui). I'll install this stuff, and tell Paul/Johanes how it is set up when it is done. Hydra PIs and students will either get an unlimited monthly allocation, or an allocation so large that they cannot spend it all. All the HPCC users will share a common allocation that amounts to 10% of the hydra cycles per month.
  Interestingly enough, the Maui scheduler did not compile under Solaris. It was a handful of small items that were "wrong". I corrected them and sent a patch to the Maui scheduler list. The author was very grateful and promised to include the fixes in the next release. How cool is that?
- Finish the PRS once and for all -- there are some calls to popen() that need to be replaced with formal fork()/exec() stuff (for various technical reasons). This is of lower priority, but it does need to get done eventually.
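For the curious, here's roughly what a popen()-to-fork()/exec() swap can look like. This is a generic sketch, not the PRS code: `run_and_capture()` is a hypothetical helper name, and it assumes a POSIX system. One commonly cited difference is that popen() hands the command string to the shell, whereas here the argument vector goes straight to `execvp()`, and the caller gets the child's real exit status back from `waitpid()`.

```cpp
#include <cerrno>
#include <cstddef>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: run args[0] with the given arguments, capture its
// stdout into `output`, and return its exit status (or -1 on failure).
int run_and_capture(const std::vector<std::string>& args, std::string& output) {
    if (args.empty()) return -1;

    // Build argv before forking so the child does no allocation.
    std::vector<char*> argv;
    for (const std::string& a : args) argv.push_back(const_cast<char*>(a.c_str()));
    argv.push_back(nullptr);

    int fds[2];
    if (pipe(fds) == -1) return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        // Child: point stdout at the pipe's write end, then exec (no shell).
        close(fds[0]);
        if (fds[1] != STDOUT_FILENO) {
            if (dup2(fds[1], STDOUT_FILENO) == -1) _exit(127);
            close(fds[1]);
        }
        execvp(argv[0], argv.data());
        _exit(127);  // only reached if execvp() failed
    }

    // Parent: read the child's output until EOF, then reap the child.
    close(fds[1]);
    output.clear();
    char buf[4096];
    for (;;) {
        ssize_t n = read(fds[0], buf, sizeof buf);
        if (n > 0) {
            output.append(buf, static_cast<std::size_t>(n));
        } else if (n == 0 || errno != EINTR) {
            break;  // EOF or a real error
        }
    }
    close(fds[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) == -1) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A call like `run_and_capture({"echo", "hello"}, out)` should leave the command's output in `out` and return its exit status. A real replacement would also want to think about stderr, timeouts, and signal handling, which this sketch ignores.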
In Army news, after a weird sequence of events, it looks like I'll be heading down to ARL/STB (Army Research Lab / Software Technology Branch) Atlanta for one more 2-week stint before they get shut down. I have to do my annual 2-week tour before 1 March, so I could go down there imminently. It depends on the trip to Madison and my dentist appointment; we'll see what happens there.
Also, my PMO (personnel management officer... took a minute to remember that) at AR-PERSCOM (Army Reserve Personnel Command) sent me an e-mail at the end of the day saying that she's got a line on a new position for me in ARL since STB is being shut down. I'll be talking to her tomorrow about it. This likely means that I won't be heading back to be a BSO (Battalion Signal Officer) for some combat unit after these 2 weeks. Wooo hoo!
As of 8:47pm, I have 433 xmms's running on queeg.