ReFresco 04: Ordering in Caprice ================================ Caprice forms a distributed system, and as in all distributed systems, it's important to define ordering semantics -- the rules that constrain when messages will arrive, in what order, etc. The Model --------- So, here's the caprice network picture, a bunch of clients clustered around a hub: B C D \ | / A -- S -- ... A, B, etc. are clients, S is the server. When we say client, of course, we don't really mean that in the normal sense; we don't care if A is a single program, a collection of distributed programs, another caprice server doing something weird and clever, or if one program has multiple connections open at once, etc. None of that's visible to the network world. Really, we don't want to talk about clients, but about connections; and for these purposes, by connection we mean a bidirectional order-preserving stream of packets. The Problem ----------- There is no global order on events in the system; trying to impose one would be futile and purposeless. On the other hand, we do need to impose some constraints. Say I have a database handle of some sort, and I do ... ... send_msg(handle, "COMMIT") send_msg(handle, "CLOSE") It would be bad if these arrived out of order; I might end up closing my database connection before committing and thus lose all my work. So, one might think, the important thing is that two messages sent to the same cap by a single connection should arrive in order. But there are also more subtle worries. Suppose instead of closing my database connection above, I want to hand it over to Alice to do some other work. So I do ... ... send_msg(handle, "COMMIT") send_msg(Alice, handle) and then Alice does send_msg(handle, "do something") // Whoops, Alice realizes that was a mistake send_msg(handle, "ROLLBACK") Now, my two messages are sent to two different caps, so the above rule doesn't apply: Alice may get the handle before the handle gets my COMMIT. And then, my message to the handle and Alice's message to the handle are done on different connections, so again the rule doesn't apply and Alice's ROLLBACK may arrive before my COMMIT, causing my work to be rolled back along with hers. This is an example involving correctness, but there are security implications too; it gets even worse if Alice is actively malicious. More generally, this kind of issue can arise with any stateful cap. If I change a cap's state and then pass it to another party, I expect that party to receive a cap that acts like it has the new state, not the old. Or, as the E folks put it, I don't want to send Alice a reference, I want to send Alice a reference-that-has-seen-my-messages. (For another more complete look at these issues, see the E designers's discussion, "Partially-Ordered Message Delivery", at http://www.erights.org/elib/concurrency/partial-order.html Their solution is rather more complicated, both because they assume a full distributed topology (as opposed to our constrained hub-and-spokes model), and because they have reference equality as a primitive.) The Solution ------------ So, we need something secure and correct, that's still plausibly efficient, and it would sure be nice if we could find something that was also simple and intuitive to reason about. We can! Here's the rule for client authors: *) Packets sent down a connection to the server will be processed in strictly the order they were sent. I.e., none of the ordering problems above arise. It's not quite literally true in practice. Let's say I'm A, and I send packets to B and C. send_msg(B, msg1) send_msg(C, msg2) The rule (*) doesn't guarantee that msg1 will be received before msg2; what it guarantees is that as a client author, I can't tell the difference. (At least from within caprice -- if B and C decide to compare timestamps or something, that's their own affair.) More formally, let's analyze things from the point of view of client A. First, associate a counter with each client, and with each packet. The idea is that these counters will represent time as perceived by A. (Note that this counters are not actually stored anywhere, nor appear in the implementation at all; they are purely an analytical tool.) In particular, denote a client X's counter by Xc, and a packet P's counter by Pc. When X sends a packet P, Pc is set to Xc. When X receives a packet P, we check whether Pc > Xc, and if so, set Xc to Pc. All counter starts at 0. A follows different rules; A sends its first packet with counter 1, its second with counter 2, etc. The real formulation of (*) can now be stated: %) Whenever B receives a packet P directly from A, Bc < Pc. This will all be clearer, I think, with an example. Let's take the two I used above, say that the handle lives in Holly's address space, and Alice is still the third party. In the first: When the packets come in the right order, Holly bumps her counter on each incoming packet, and (%) is never violated: Me Holly Counter old Counter new send packet 1 0 0 send packet 2 1 1 receive packet 1 0 1 receive packet 2 1 2 If they come in the wrong order, though: Me Holly Counter old Counter new send packet 1 0 0 send packet 2 0 0 receive packet 2 0 2 receive packet 1 2 <--------------- !!! we see that (%) is violated on the marked line; the incoming packet's number is lower than Holly's current counter. In the second example, with three parties, the correct behavior is: Alice's counter Holly's counter Me Alice Holly old new old new send packet 1 0 0 0 0 send packet 2 0 0 0 0 get packet 1 0 0 0 1 get packet 2 0 2 1 1 send packet 3 2 2 1 1 get packet 3 2 2 1 2 My first packet carries the counter number 1 and bumps Holly's counter; then I send my second packet to Alice and bump her counter to 2. Alice's message to Holly then propagates the value 2, and Holly goes from 1 to 2. The incorrect behavior would be: Alice's counter Holly's counter Me Alice Holly old new old new send packet 1 0 0 0 0 send packet 2 0 0 0 0 get packet 2 0 2 0 0 send packet 3 2 2 0 0 get packet 3 2 2 0 2 get packet 1 2 2 2 <--- !!! Here Holly has already been influenced by my second packet when the first arrives; this is impossible by (%). Essentially, (%) says "from the perspective of each client, time must always flow forward". Implementation -------------- Having a nice invariant is good and all, but it's not much use if we can't implement it. Fortunately, there's a relatively simple implementation: the server must simply guarantee that for each packet: 1) if the server is not the final destination, the server must queue the packet for the real destination synchronously with respect to the originating connection. 2) if the server is the final destination, the server must handle that packet synchronously with respect to that connection (i.e., the server can't handle the next packet until the first one is handled). (Note that in the latter case, "handle synchronously" does not necessarily mean that the packet has to be handled to completion; it simply means that the message is delivered to the appropriate object, and the object has either finished the work _or_ made the decision to delay it. In either case, the network layer's job is done.) This works because: Case 1) when the server receives a packet P destined for client C, it is the first packet to have counter >= Pc. Therefore, Pc > Cc, and in for all packets Q queued for C, Pc > Qc. So when P arrives at C, Cc < Pc, and (%) is satisfied. Case 2) when the server receives a packet P destined for itself, it is the first packet to have counter >= Pc, therefore the server's counter is < Pc, therefore handling it immediately satisfies (%). This covers the basic situation; there are a few more places where wrinkles arise. Wrinkles -------- Fragments: A 'fragment stream' is a string of packets that together make up a larger message. Within a fragment stream, each packet is either the last packet, or not. Here, we ignore the non-last packets: for purposes of satisfying (%), the message as a whole is considered to have been sent exactly when the last packet in the stream is sent. Local call optimization: The above implementation analysis assumes that all packets will pass through the server. For efficiency reasons, however, clients may wish to bypass the server when sending packets whose destination is in their own address space. This is acceptable iff the client is consistent about it -- either all packets must go through the server, or none may. Of course, locally destined packets must be process synchronously in the sense described above. Proof: obviously, if all packets go through the server, the above proof applies. So let us assume that no locally destined packets go through the server. This means that whenever we increase our counter to n, we have already sent out a packet with counter n, and since we process local packets synchronously, we have already received all local packets with counter < n, therefore increasing our counter cannot cause (%) to be violated. Promises Promises are a bit funny. As far as the network layer is concerned, the promise object itself is the ultimate destination for a message, and delivering the message to the promise completes its responsibility. As far as caprice knows, any further forwarding creates an entirely independent message. This is actually what we want; consider a situation where I have an unfulfilled promise A and a cap B, and do send_msg(A, "foo") send_msg(B, "bar") // ... a long time later: send_msg(A, fulfill(B)) The last line will cause A to forward on my first message "foo" to B, and if promise forwarding was not considered to create a new message, (%) would here require "foo" to arrive before "bar" could. This, though, is generally undesirable, and impossible to implement besides. So (%) is too strict to apply to promises; we still need some sort of ordering semantics for them. The rule is the following: #) All messages sent to the _same_ promise will be eventually processed in an order consistent with the rules for other caps. But: there is no guarantee on the relative ordering of messages sent to different caps, even if those caps ultimately refer to the same object. Note that we do not distinguish between fulfilled and unfulfilled promises here. One might think from my example that one could provide stronger guarantees for fulfilled promises. This is wrong, though, because of the local call optimization: if we said that sending a message to a fulfilled promise was exactly the same as sending a message to the underlying cap it points to, then it would be difficult or impossible to meet the local call optimization's consistency requirement. This is why (#) does not care about fulfilledness.