
Look Dave, no strings.

Ugh. I've spent the past few days fighting the return semantics of rsh and ssh.
In trying to make the tree-based booter industrial strength by putting it into LAM, I found out that not all rsh implementations are created equal. Grrrr...

It seems that some versions of rsh pretend to close stderr, but will actually send things across it later. i.e., read() will return 0, but a later read() will return a positive number and have valid bytes in the buffer.
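
For reference, here's roughly the assumption that this breaks (a tiny sketch, not the actual LAM code): the monitoring loop treats a zero-byte read() as "this stream is closed" and stops watching that descriptor, which is exactly what goes wrong if the agent delivers more bytes later.

  /* Illustrative only: the usual "read() == 0 means the stream closed"
     assumption.  Some rsh implementations violate this -- a later read()
     on the same descriptor can return more bytes. */
  #include <sys/types.h>
  #include <unistd.h>

  static int drain_stderr(int fd)
  {
      char buf[1024];
      ssize_t n;

      while ((n = read(fd, buf, sizeof(buf))) > 0) {
          /* got stderr output; the heuristic says abort the launch */
      }

      /* n == 0: assume stderr closed for good and stop watching the
         descriptor -- the unsafe assumption with those rsh's. */
      return (n == 0) ? 0 : -1;
  }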

ARRGHHH!!

There are also some mysterious things happening that I don't fully understand yet (this only happens when you scale above 25 nodes or so). So I finally decided that if rsh cannot be trusted, the whole framework in LAM for generic remote-launching is wrong. i.e., the whole issue is about determining whether the remote program started successfully or not. How to do this in a programmatic fashion?

It currently goes like this (and rsh can be replaced with ssh or whatever):

  1. Open two pipes
  2. fork() a child process
  3. Close the respective pipe ends in the parent and child processes
  4. Tie the pipes to stdout and stderr in the child process
  5. The child exec()'s the rsh command
  6. The parent watches the pipes:
    • If something comes across stderr, our heuristic says to abort
    • If something comes across stdout, buffer it
    • When stderr and stdout close, the child is done, quit the loop
  7. The parent calls waitpid() to wait for the child to die
  8. If the return status of the child is not 0, abort
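
Here's a condensed sketch of that flow in C (simplified and hedged -- the real code does the pipe-watching with non-blocking I/O and a state machine, and there are more error checks than shown):

  #include <stdlib.h>
  #include <unistd.h>
  #include <sys/types.h>
  #include <sys/wait.h>

  /* Condensed sketch of the current scheme; argv is the rsh/ssh command. */
  int launch_via_agent(char *const argv[])
  {
      int out[2], err[2], status;
      pid_t pid;
      char buf[1024];
      ssize_t n;

      if (pipe(out) < 0 || pipe(err) < 0)        /* 1. open two pipes */
          return -1;

      if ((pid = fork()) == 0) {                 /* 2. fork a child */
          close(out[0]); close(err[0]);          /* 3. close parent's ends */
          dup2(out[1], STDOUT_FILENO);           /* 4. tie pipes to stdout/stderr */
          dup2(err[1], STDERR_FILENO);
          close(out[1]); close(err[1]);
          execvp(argv[0], argv);                 /* 5. exec the rsh command */
          _exit(1);
      }
      close(out[1]); close(err[1]);              /* 3. (parent side) */

      /* 6. watch the pipes: stderr output => abort (heuristic), buffer
         stdout, loop until both close.  Shown here as blocking reads for
         brevity; the real thing multiplexes them non-blockingly. */
      if ((n = read(err[0], buf, sizeof(buf))) > 0)
          return -1;
      while ((n = read(out[0], buf, sizeof(buf))) > 0)
          ;                                      /* buffer stdout (discarded here) */

      if (waitpid(pid, &status, 0) < 0)          /* 7. wait for the child */
          return -1;
      return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;  /* 8. */
  }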

If we incorrectly determine that a remote program failed to start (i.e., it actually did start, but the local node thinks it didn't), the remote program gets stranded, and is left running forever because no one will ever contact it again. Among other reasons why this is bad, this is anti-social behavior.

Plus, the code is complicated because of the state it has to maintain while checking multiple data sources in a non-blocking way. Ugh. And I didn't even mention how we have to check and see if the other side is running a Bourne or Korn shell...

The long and the short of it is that the remote agent (rsh, ssh, whatever) cannot be trusted to give reliable information. So the only thing to do is to disregard the information that it gives and determine whether the remote program started correctly by some other means. One way to do that is to have the remote process call the spawning process back with a TCP socket.

If the remote process doesn't call back within a timeout period, the spawner can reason that it failed and give up on it. If the remote process starts up properly and is unable to contact the spawner (perhaps it took a long time to start, and the spawner has timed out already), it will just abort. This prevents orphaned remote processes.
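
On the spawner side, the timeout could be as simple as a select() on the listening socket before accept() -- something like this (names and the timeout handling are just illustrative):

  #include <sys/types.h>
  #include <sys/select.h>
  #include <sys/socket.h>
  #include <sys/time.h>

  /* Illustrative sketch: wait up to 'seconds' for the remote process to
     call back on an already-listening socket.  Returns the connected fd,
     or -1 if the timeout expires and the spawner should give up. */
  int wait_for_callback(int listen_fd, int seconds)
  {
      fd_set readable;
      struct timeval tv;

      FD_ZERO(&readable);
      FD_SET(listen_fd, &readable);
      tv.tv_sec = seconds;
      tv.tv_usec = 0;

      if (select(listen_fd + 1, &readable, NULL, NULL, &tv) <= 0)
          return -1;                  /* timed out (or error): give up */
      return accept(listen_fd, NULL, NULL);
  }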

Specifically, I'm looking at something like:

  1. Parent creates listening socket for the callback
  2. Parent launches a thread to wait for the callback on that socket
  3. Parent makes three pipes (for stdin|out|err)
  4. Parent fork()s a child
  5. Parent closes appropriate ends of the pipes
  6. Parent launches two threads to monitor the pipes
  7. Parent launches a thread to block on waitpid()
  8. Child closes appropriate ends of the pipes, ties the other ends to stdin|out|err
  9. Child exec()'s the remote agent
  10. Parent blocks on a queue

    • When either of the pipe threads wakes up on a read, it buffers the data, puts it in an event, and queues it up for the parent
    • Closing either of the pipes is similar -- an event is queued up for the parent followed by the thread committing suicide
    • When waitpid() returns, the return status is queued up in an event for the parent, and the thread commits suicide
    • When the listening thread succeeds on accept(), it begins the authentication/connection protocol. Upon success, it queues up an event for the parent (including the open socket file descriptor) and commits suicide.

  11. When all the threads die, it means that the remote process has started up, the remote process has authenticated and indicated that it wants to run, a socket is still open to the remote process, the remote agent is now dead, and all threads/processes have been reaped, so the parent can now continue.
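
The glue that makes step 10 work is just a thread-safe event queue. A rough sketch of what I have in mind, assuming pthreads (all names are hypothetical, and authentication/error handling are omitted):

  #include <stdlib.h>
  #include <pthread.h>

  /* Hypothetical event types posted by the monitor threads. */
  enum ev_type { EV_STDOUT_DATA, EV_STDERR_DATA, EV_PIPE_CLOSED,
                 EV_CHILD_EXITED, EV_CALLBACK_OK };

  struct event {
      enum ev_type  type;
      int           value;      /* callback socket fd, waitpid status, etc. */
      struct event *next;
  };

  static struct event    *queue_head = NULL;
  static pthread_mutex_t  queue_lock = PTHREAD_MUTEX_INITIALIZER;
  static pthread_cond_t   queue_cond = PTHREAD_COND_INITIALIZER;

  /* Called by the pipe, waitpid(), and callback threads right before they
     commit suicide (or whenever they have data to hand up). */
  void post_event(struct event *ev)
  {
      pthread_mutex_lock(&queue_lock);
      ev->next = queue_head;
      queue_head = ev;
      pthread_cond_signal(&queue_cond);
      pthread_mutex_unlock(&queue_lock);
  }

  /* The parent blocks here (step 10), pulling events off the queue. */
  struct event *wait_event(void)
  {
      struct event *ev;

      pthread_mutex_lock(&queue_lock);
      while (queue_head == NULL)
          pthread_cond_wait(&queue_cond, &queue_lock);
      ev = queue_head;
      queue_head = ev->next;
      pthread_mutex_unlock(&queue_lock);
      return ev;
  }

The parent just loops on wait_event() until it has seen both EV_PIPE_CLOSED events, EV_CHILD_EXITED, and EV_CALLBACK_OK -- which is exactly the condition in step 11.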

In the previous scheme, the remote agent would launch the remote program. The remote program would immediately close stdin|out|err and then fork a child into the background as a user-level daemon, and then quit. This would allow the remote agent to finish normally (hah!). The child process would then continue on to do whatever it was launched to do.
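
For reference, that daemonization dance looks roughly like this (a sketch; the real remote program does more bookkeeping than shown):

  #include <stdlib.h>
  #include <unistd.h>

  /* Rough sketch of the old remote-side startup: detach so that the
     remote agent (rsh/ssh) can finish "normally". */
  void detach_from_agent(void)
  {
      /* Close stdin/stdout/stderr so the agent sees its pipes close
         (re-opening them on /dev/null would be the more careful variant). */
      close(STDIN_FILENO);
      close(STDOUT_FILENO);
      close(STDERR_FILENO);

      /* Fork a child into the background as a user-level daemon... */
      if (fork() > 0)
          exit(0);        /* ...and the original process quits */
      setsid();           /* the child carries on with the real work */
  }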

In the new scheme, there is no need to have the remote agent finish until the callback to the spawner has completed and there is no further gain to having the remote agent process around. i.e., in the previous (linear) scheme, it was necessary for the remote agent to quit before the next step (waiting for a callback) could proceed. In this scheme, they are independent events -- the remote agent quitting has little bearing on the callback since those are in different threads. Indeed, it may be advantageous to have the remote agent stick around until the callback occurs successfully, to give one more way to abort the remote process if something goes wrong. That is, if the callback gets mucked up, send a signal or some kind of message down the stdin pipe to the remote agent; it will get passed along to the remote process and cause the remote parent and child to abort.

Additionally, just like giving each remote process a thread to manage it, giving a thread to each of the stdout and stderr pipes eliminates the combined state machine and uses blocking reads. This makes the algorithm for monitoring the pipes much simpler. Hence, we can monitor the pipes, waitpid(), and the callback separately, and therefore greatly simplify the code (why didn't I think of this earlier?).
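
For example, each pipe-monitor thread boils down to something this small (hypothetical names; it would hand data up via the event queue sketched above):

  #include <unistd.h>
  #include <sys/types.h>

  /* Per-pipe monitor thread: blocking reads, no state machine.  Each chunk
     (and finally the close) would be posted to the parent's event queue. */
  void *pipe_monitor(void *arg)
  {
      int fd = *(int *) arg;
      char buf[1024];
      ssize_t n;

      while ((n = read(fd, buf, sizeof(buf))) > 0) {
          /* queue up an EV_STDOUT_DATA / EV_STDERR_DATA event with buf */
      }

      /* read() returned 0 or an error: the pipe closed, so queue up an
         EV_PIPE_CLOSED event and let the thread commit suicide. */
      return NULL;
  }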

Jeff's law of non-blocking:

Writing blocking algorithms is much simpler than writing non-blocking algorithms.

Jeff's law of blocking:

Writing concurrent blocking algorithms introduces its own problems, but generally only in terms of infrastructure, and they are typically problems that have already been solved.

What's even cooler is that the remote process can start up, call back the spawner, and give an "I'm ready to go" message, or a "things suck over here; I can't run so I'm going to abort" message. i.e., the remote process can decide whether it's going to run or not (e.g., check to see if the load is not too high) and send back a yay or nay to the spawner. Even cooler than that -- an integrated startup protocol allows for authentication instead of security through obscurity (security through obscurity isn't, for those of you who care!).
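
On the remote side, the yay/nay could literally be a single byte over the callback socket; something along these lines (purely illustrative -- the real protocol would also carry the authentication):

  #include <string.h>
  #include <unistd.h>
  #include <netdb.h>
  #include <sys/types.h>
  #include <sys/socket.h>

  /* Illustrative remote-side callback: connect back to the spawner and send
     a one-byte yay/nay verdict.  If the spawner can't be reached (maybe it
     already timed out on us), return -1 so the remote process aborts itself
     instead of lingering as an orphan. */
  int callback_spawner(const char *host, const char *port, int ok_to_run)
  {
      struct addrinfo hints, *res;
      char verdict = ok_to_run ? 'Y' : 'N';
      int fd;

      memset(&hints, 0, sizeof(hints));
      hints.ai_family   = AF_UNSPEC;
      hints.ai_socktype = SOCK_STREAM;
      if (getaddrinfo(host, port, &hints, &res) != 0)
          return -1;

      fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
      if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) < 0) {
          freeaddrinfo(res);
          return -1;                  /* can't reach the spawner: abort */
      }
      freeaddrinfo(res);

      write(fd, &verdict, 1);         /* "I'm ready to go" or "I can't run" */
      if (!ok_to_run) {
          close(fd);
          return -1;
      }
      return fd;                      /* keep the socket open for later use */
  }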

I'm currently in the middle of re-writing all this code (it takes time to set up the infrastructure and whatnot). The result should be much simpler and more reliable.


xmms currently has 619 of 703 processes on queeg (88%).
