« Midnight in the garden of good and evil | Main | All things being equal, LAM rocks »

It all started with parallel bladeenc...

Great quote today from HPC Wire:

"The truth is, great software comes from great programmers, not from a large number of people slaving away. Open source can be useful. It can speed things up. But it's not new, and it's not holy water."

-- William N. Joy,
chief scientist,
Sun Microsystems

There's really two separate thoughts in that quote, but they're both true. More people need to recognize this.


Spent the majority of the last 24 hours writing my talk for tomorrow's LSC lunch. The reason that I've spent 24+ hours on this as opposed to 2-3 is because it has turned out to be much more interesting than I originally thought. Lummy has also found it to be very interesting; it is even possible that this will end up in my dissertation.

So tomorrow's topic actually started many months ago. I was visiting here in Berkeley back in January of this year, and only had a few MP3s loaded on my laptop from my CDs. I was out here for quite a length of time (3 weeks or so), and I was getting sick of the same old MP3s every day. So I started looking at Lummy's CDs (he wanted to rip them, too). So I volunteered to help. :-)

The problem was that it was just too damn slow. The encoding took forever. So after a few nights of hacking, I got a preliminary version of the parallel bladeenc MP3 encoder working (see the "technical details" page on that site for all the nitty gritty of how parallel bladeenc works). This did vastly improve the MP3 encoding process; we could listen to songs faster this way (and before you ask, I only kept MP3s [that were generated from Lummy's CDs] that I already own the CD of). However, I never did get the parallel encoder Right -- it's just about right, but not quite right. Without going into Big Detail, suffice it to say that there's some MP3 framing issues that Jeremy Faller and I tried to figure out and gave up in light of the fact that we had no MP3 documentation.

This brings up an important point -- even though the parallelization of bladeenc was at the very top level, I had to dive down very deep into its tangled code in order to parallelize it. Indeed, there is an odd bit reservoir that has to be drained in order to make the whole scheme work. It took a long time to figure out, and like I said, there's still some MP3 framing issues that are unresolved. To end this already-lengthy explanation: you can't diff the results of parallel bladeenc with that of serial bladeenc. Bonk.

(Some of the following may have already been in a previous journal entry -- I don't remember -- so you'll cope if you've already read it)

So I basically let that go, and stumbled across the vorbis project (I think that I saw it on slashdot or something). Vorbis intrigued me for the following reasons:

  • Vorbis is a totally free encoder -- it does not have the same legal issues surrounding it as MP3 encoders do.

  • The vorbis algorithm supposedly has better sound quality than the MP3 algorithm. I won't argue this either way -- I don't know much about these kinds of things... math, ick -- but it does sound good. :-)

  • The vorbis stuff is all in a library --
    libvorbis.a. So writing an encoder is trivial. Indeed, they have a sample encoder in their distribution. So no diving into the source code to figure out how it works.

  • The vorbis library stuff specifically discretizes the steps in the encoding process; all the steps are fast, except one (hmm...).

  • There is an active vorbis development community. I got very little feedback from the bladeenc community. :-(

So with all this in mind, I posted to the vorbis development list asking "has anyone thought about parallelism?". After some informative discussions with the main developers, I started thinking about it (although not actively coding). After coming up with a suitable architecture, I spent a few hours one night and coded up a multi-threaded vorbis encoder. It seemed to work like a champ... but the output was not diffable with the output from the serial encoder. :-(

I queried the vorbis development list again, and found out that the library is not thread safe -- yet. There's apparently still one issue that makes it not thread safe, something that the developers eventually intend to fix. Oh well.

So then I started thinking about an MPI architecture for such a beast. Conceivably, it would be very similar to the architecture of parallel bladeenc. However, I would like to use threads if/when possible, so I started thinking about mixing MPI and threads (and not in a crass OpenMP/MPI kind of way), and how that would work, particularly in light of the fact that the 2 open source MPI implementations are not thread safe. Ugh.

However, since the vorbis stuff is neatly discretized, it really reduces down to a generic parallel master/slave problem (i.e., replace vorbis_analyze() with a generic calculate() function, and the framework is applicable to any master/slave problem). I realized this yesterday after I made an offhand remark to Lummy that I was thinking about talking about a potential parallel vorbis encoder for my LSC Friday talk.

I spent some time thinking about it (much scrap paper, test programs, etc.), and realize that this is actually quite a thorny issue. It's much more complicated (and interesting!) than one would initially think. Having a generalized master/slave solution for local and remote computation is something that no one has done yet. Yes, we all understand master/slave, but no one has published how to do this with threads and MPI. Indeed, if such a thing was coded up, it could certainly serve as a framework for any task farm kind of parallelism -- only a small number of interface functions would probably be required:

  • input / preprocess (on the master)
  • calculate (on the slaves)
  • postprocess / output (on the master)
  • registration for the marshalling/unmarshalling of data to be exchanged between the master and slave nodes

So I've spent about 24 hours of thinking and writing about this, and when I went to print it out for the first time, I was surprised to discover that I had written about 12-13 LaTeX fullpages about this topic (a lot of which is pseudocode). I've talked with Lummy about this; I only realized late this morning that this is an extension of the parallel image processing toolkit (PIPT) -- it's a few levels beyond what the PIPT is, but some of the ideas in it are definitely influenced by the PIPT. Sadly, I'll only be able to deliver some excerpt of this tomorrow, but I think Lummy and Bill and I might discuss this further this evening. Andy things that Jeremiah and Rich might follow up on this (if I don't) for GGCL and other things.

I'm not going to go into detail here about what I have already written (perhaps I'll fill in the details here later). I'm just struck by the irony that this all started by a personal pet project and may end up as a chapter in my dissertation (we'll see, but Lummy did mention it).

Threading, in general, can get quite complicated. Having multiple independent threads that don't interact with each other is easy --
anyone with a pthreads book (or man page) can do it. But having multiple threads that have to interact with each other and share data can get arbitrarily difficult. This whole scheme pretty much is in the latter domain -- threads need to share queues and data and whatnot; there's some thorny locking issues involved (which makes it so interesting). I haven't even solved all the issues yet -- I don't know how to give some slaves small amounts of work and give other slaves large amounts of work, for example (there's some convoluted locking issues involved).

Who knows -- this might go nowhere. But it seems pretty interesting, so I wanted to get it On The Record.

Post a comment

(If you haven't left a comment here before, you may need to be approved by the site owner before your comment will appear. Until then, it won't appear on the entry. Thanks for waiting.)

About

This page contains a single entry from the blog posted on August 31, 2000 9:59 AM.

The previous post in this blog was Midnight in the garden of good and evil.

The next post in this blog is All things being equal, LAM rocks.

Many more can be found on the main index page or by looking through the archives.

Powered by
Movable Type 3.34