Tuscon. a.k.a., "I never new that queues could be so complicated!"
DSL went out while I was typing this. <sigh>
Looking through my log, I see that connections to ND were really spotty yesterday (indeed, I felt that heavily as I was working), including a full 20 minute outage around 2pm, a 10 minute outage on Wednesday, an outage from 4am to 7pm on Saturday (although that may well have been my router getting hosed -- when the power blinked recently, the router froze until I rebooted it), fairly crappy connectivity last Monday-Wednesday, and some sustained outages on the previous Saturday....
Overall, it's not as bad as it sounds. Sustained outages (like the one I'm having right now...) aren't too often, and seem to usually be the fault of Bell South (packets dropping in Atlanta). Spotty connectivity does happen not infrequently, but ND might well be to blame for that, because their external router is so overwhelmed, and the internal network is, well, less than perfect (oh for the days when Shawn was running the network...). Indeed, it's quite possible that my connectivity to IU will be better than my connectivity to ND if I only have to worry about the periodic sustained outages and not spotty connectivity.
We'll see how it works out; I don't have accounts a IU yet, but that paperwork is crunching through the vast papermills... I haven't decided yet on how to change my e-mail address. I might wait until after my defense; haven't decided yet.
- Wow; the MPI queue turned out to be quite complicated; I mostly
worked out the model in one days, but spent all the next day
working on the engine itself, and a few details (bugs) in the
outer parts. - everything seems to work except some
MPI_Cancels at the end (I
think the requests are already dead), and it's slow. Gonna
have to revamp the enqueueing/dequeueing so that you can
enqueue/dequeue lots of things at once, not just one at a time
(why didn't I learn my lesson the first time?)
- enqueueing dequeuing a list at a time really helps
(std::list<>::splice()is very handy -- O(1), baby!) - BRAINSTORM: don't enqueue a list (complicates matters greatly,
especially w.r.t. temporary buffers and whatnot) -- just get
control of MPI from the event manager and therefore can do
direct sends/receives. This also allows for arbitrary and
potentially interactive send/recv protocols for user data. WOW
-- that makes an AMAZING difference -- <1sec vs 45-60 seconds!
(test case is particularly painful: 1024 short messages to each
slave, each message contains 1 int) - Also had another thought -- this polling model in the MPI event manager is definitely sub-optimal (have to periodically steal cycles from the other threads to check for MPI progress). The main reason for the polling model is because they are always pending receiving (from children) -- the children may send to their parent at any time. So the parent always needs to have pending receiving posted. So we have to check periodically if any of the receives have finished --
hence, the polling model. But what if there was a way to block and wait for such progress? I'm talking about using a mechanism outside of MPI. That is, open a secondary socket that is used just for signaling. When a child sends a message to its parent, it does theMPI_Send, and then tweaks the socket. The parent can be blocking on aselect()of all the sockets from its children. When itselect()indicates that one of them is ready, it knows to go complete theMPI_Recv. This is somewhat icky because we have to go outside MPI to do it, but it would work, and potentially could save a lot of time since the progressively-slower polling model can make receives wait an arbitrary length of time before they can actually complete. I may or may not pursue this, but I wanted to record the idea... - I seem to have gotten everything working now -- for single-level only (i.e., only one RelayCalc). The current scheme won't work with multiple levels because of the way RelayCalc distributes input data and expects to collect output data, and the way that RelayOut sends back data.
- Had to overhaul the EOF/EOI progression through the queues and relays a bit to make them work and to ensure that there would be no memory leaks. Children now assign their own stream ID's to each input data set; they receive a chunk of input data from the parent, give it a unique stream ID, enqueue it all, and then immediately enqueue an EOF for that stream. The parent keeps track of the "real" stream ID by associating it with the child's ID; when the child returns output data, it uses the ID of the child to look up the stream ID of the data that it is returning. This scheme allows multiple things:
- Children can completely clean up state after each chunk of data from their parents are processed (trust me on this one), because each chunk of data from a parent is a treated as a discrete, complete stream in itself.
- The RelayOut can wait to send all of its output data to the parent until it gets the EOF on that stream. When it gets the EOF, it knows that the entire chunk of input data that was initially received from the parent has been processed, and it can send it all back en mass. This component (buffering output data) is needed to allow multiple levels to work, as mentioned above -- it hasn't been implemented yet.
- Other things that are still needed:
- Handling of faults -- children (and by induction, their children) will die when their parent dies. Parents need to mark a child down when it dies, and do the necessary bookkeeping to back out of any current transactions with that child, and ensure that that child will be ignored for the rest of the computation.
- Startup "all at once" with no spawning model. This will be necessary for IMPI runs. This will be more software engineering than rocket science (although it won't be trivial :-\ ) -- the software has to support both models.
- Support both MPI and non-MPI models. I had some preliminary infrastructure in there for that now (i.e., configure/compile without MPI -- just support a single SMP), but I've long since broken it --
you can't compile without MPI. I likely won't fix this until after my defense...
There are 483 copies of xmms running, out of 562 total processes (85%).