« Stinkbutt | Main | 1-800-J-JAMES »

'morning sir. Are you going to introduce me to your bi-atch?

Tuscon. a.k.a., "I never new that queues could be so complicated!"

DSL went out while I was typing this. <sigh>

Looking through my log, I see that connections to ND were really spotty yesterday (indeed, I felt that heavily as I was working), including a full 20 minute outage around 2pm, a 10 minute outage on Wednesday, an outage from 4am to 7pm on Saturday (although that may well have been my router getting hosed -- when the power blinked recently, the router froze until I rebooted it), fairly crappy connectivity last Monday-Wednesday, and some sustained outages on the previous Saturday....

Overall, it's not as bad as it sounds. Sustained outages (like the one I'm having right now...) aren't too often, and seem to usually be the fault of Bell South (packets dropping in Atlanta). Spotty connectivity does happen not infrequently, but ND might well be to blame for that, because their external router is so overwhelmed, and the internal network is, well, less than perfect (oh for the days when Shawn was running the network...). Indeed, it's quite possible that my connectivity to IU will be better than my connectivity to ND if I only have to worry about the periodic sustained outages and not spotty connectivity.

We'll see how it works out; I don't have accounts a IU yet, but that paperwork is crunching through the vast papermills... I haven't decided yet on how to change my e-mail address. I might wait until after my defense; haven't decided yet.

  • Wow; the MPI queue turned out to be quite complicated; I mostly
    worked out the model in one days, but spent all the next day
    working on the engine itself, and a few details (bugs) in the
    outer parts.

  • everything seems to work except some MPI_Cancels at the end (I
    think the requests are already dead), and it's slow. Gonna
    have to revamp the enqueueing/dequeueing so that you can
    enqueue/dequeue lots of things at once, not just one at a time
    (why didn't I learn my lesson the first time?)

  • enqueueing dequeuing a list at a time really helps
    (std::list<>::splice() is very handy -- O(1), baby!)

  • BRAINSTORM: don't enqueue a list (complicates matters greatly,
    especially w.r.t. temporary buffers and whatnot) -- just get
    control of MPI from the event manager and therefore can do
    direct sends/receives. This also allows for arbitrary and
    potentially interactive send/recv protocols for user data. WOW
    -- that makes an AMAZING difference -- <1sec vs 45-60 seconds!
    (test case is particularly painful: 1024 short messages to each
    slave, each message contains 1 int)

  • Also had another thought -- this polling model in the MPI event manager is definitely sub-optimal (have to periodically steal cycles from the other threads to check for MPI progress). The main reason for the polling model is because they are always pending receiving (from children) -- the children may send to their parent at any time. So the parent always needs to have pending receiving posted. So we have to check periodically if any of the receives have finished --
    hence, the polling model. But what if there was a way to block and wait for such progress? I'm talking about using a mechanism outside of MPI. That is, open a secondary socket that is used just for signaling. When a child sends a message to its parent, it does the MPI_Send, and then tweaks the socket. The parent can be blocking on a select() of all the sockets from its children. When it select() indicates that one of them is ready, it knows to go complete the MPI_Recv. This is somewhat icky because we have to go outside MPI to do it, but it would work, and potentially could save a lot of time since the progressively-slower polling model can make receives wait an arbitrary length of time before they can actually complete. I may or may not pursue this, but I wanted to record the idea...

  • I seem to have gotten everything working now -- for single-level only (i.e., only one RelayCalc). The current scheme won't work with multiple levels because of the way RelayCalc distributes input data and expects to collect output data, and the way that RelayOut sends back data.

  • Had to overhaul the EOF/EOI progression through the queues and relays a bit to make them work and to ensure that there would be no memory leaks. Children now assign their own stream ID's to each input data set; they receive a chunk of input data from the parent, give it a unique stream ID, enqueue it all, and then immediately enqueue an EOF for that stream. The parent keeps track of the "real" stream ID by associating it with the child's ID; when the child returns output data, it uses the ID of the child to look up the stream ID of the data that it is returning. This scheme allows multiple things:

    • Children can completely clean up state after each chunk of data from their parents are processed (trust me on this one), because each chunk of data from a parent is a treated as a discrete, complete stream in itself.

    • The RelayOut can wait to send all of its output data to the parent until it gets the EOF on that stream. When it gets the EOF, it knows that the entire chunk of input data that was initially received from the parent has been processed, and it can send it all back en mass. This component (buffering output data) is needed to allow multiple levels to work, as mentioned above -- it hasn't been implemented yet.

  • Other things that are still needed:

    • Handling of faults -- children (and by induction, their children) will die when their parent dies. Parents need to mark a child down when it dies, and do the necessary bookkeeping to back out of any current transactions with that child, and ensure that that child will be ignored for the rest of the computation.

    • Startup "all at once" with no spawning model. This will be necessary for IMPI runs. This will be more software engineering than rocket science (although it won't be trivial :-\ ) -- the software has to support both models.

    • Support both MPI and non-MPI models. I had some preliminary infrastructure in there for that now (i.e., configure/compile without MPI -- just support a single SMP), but I've long since broken it --
      you can't compile without MPI. I likely won't fix this until after my defense...

There are 483 copies of xmms running, out of 562 total processes (85%).

Post a comment

(If you haven't left a comment here before, you may need to be approved by the site owner before your comment will appear. Until then, it won't appear on the entry. Thanks for waiting.)


This page contains a single entry from the blog posted on May 25, 2001 10:58 AM.

The previous post in this blog was Stinkbutt.

The next post in this blog is 1-800-J-JAMES.

Many more can be found on the main index page or by looking through the archives.

Powered by
Movable Type 3.34