Packets in the Search for Reliability

If connections aren't reliable, then what's a poor cryptoplumber to do?

There isn't a whole lot of choice. In practice, it's either connections or datagrams, so we are left with datagrams.

The Theoretical Underpinnings

Which leads to rather a fundamental question -- if connections are unreliable and unreliable datagrams are the alternative, whatever happened to reliability?

The answer to this lies in a paradox known variously as the Coordination Problem or the Two Generals Problem. This is a theoretical result of computer science that says it is impossible for two parties to coordinate reliably using messages that may themselves be lost. As a corollary, no matter what protocol you design, there will always be a weakness, a flaw. That weakness might be data that is lost, or simply that we don't know what the state is, lost or not.

And this is what we see in reliable connections, in that while they are reliable in one sense -- what is read from them is good data -- they are unreliable in another sense -- we don't always know precisely what got to the other end.

Hubristic Expansion in search of Perfection

This tendency to choose one sense of a word and expand its scope to cover all other senses is common in computing. The sales ethic says there must be no shortfall, no weakness, so the "partly-reliable-connection" finds no place in our world. It simply becomes the reliable connection, and generations of programmers are taught that, with nothing more to it.

The design pattern of hubristic expansion of claims is expansively popular -- PKI secures your connections from the MITM, grid computing is the answer to the need for massive computing, smart cards are secure holders of your money, ...

... the list is boringly long. The *why* of it is perhaps more interesting, and I speculate elsewhere that this occurs in the market for silver bullets, being one where neither the suppliers nor the consumers know enough to make informed decisions. Certainly if we agree that neither users of connections nor builders of connection protocol software are cognizant of their failings, the basic conditions of the pattern are met, if not the conclusion.

But philosophical whytherefores are out of scope for today's discussion. The question before us is much simpler: what should we do?

The Models

Let's review the models. There are essentially these models, in increasing triviality.

Datagrams. Send a packet, that's it.
Connection. Get a pipe, pour your data in, we'll get it there. And back.
Request-response. Send one message, get one reply back, thanks.
Stream. Send lots of messages, to same place. Nothing comes back.

Well, there might be more, but that's all I can think of for the moment. What is perhaps interesting is that all of them can be built from datagrams (such as UDP/IP), and that all of them can be layered over full connection oriented protocols (such as TCP/IP). UDP might then sit at the extreme left, being the simplest, and TCP at the extreme right, being the most complex. The models described above might all sit somewhere between the two extremes.

So for example, Streams can be simply built from datagrams, or layered over connections. Same with the Request-response model, but with one slight twist: the responder must somehow know where to send the reply.

Call Home ET

That same extra facility of needing a reply address also applies to the Connection, in that when the server end-point sends back acks and data packets, it needs to know where to send them. Setting up the context to do that is a major part of the handshaking, and this might be one reason for the popularity of TCP/IP: it handles the details of the connection between the two endpoints.

Except, datagrams carry the same information. If you program up a quick server to receive any datagram, and inspect that datagram, you find the IP and port number of the sender. It isn't as clearcut as all that.

|12.34.56.78| 1234|Call home ET|

You can send back a reply to that IP and port. This bears three comments.

Firstly, the subtleties of this are somewhat dismissed here. Your routers (all routers) do various tricks to translate the IP numbers in a protocol called NAT or Network Address Translation. We can ignore the heavy lifting here because it is much the same regardless of which model we are talking about. It's mostly done at the IP level, but it wasn't always that way.

Secondly, connections do this too, obviously, but the big difference does bear noting: Connections prove that we have an end-point because they do a little chit-chat with that remote end. That is, a SYN packet comes in, and the server sends out a SYN-ACK. If the next packet comes back acknowledging the server's sequence number (a cookie of sorts), then we are cool that we have two end-points with a shared understanding.

Thirdly, the notion of recalling the endpoints is what defines a connection, not what you do with the data. That is, the connection is the relationship of the endpoints in that they are both connected in some way. In this sense, IP is connection-less, but both UDP and TCP include connection information in them.

Yet, that's just confusing. Everyone knows that a connection is a way of passing reliable data back and forth, right? And here we see that the thing that is TCP has actually included several features, one of which is the connection. To save further confusion, this essay currently uses connection in the sense of TCP, being the all-encompassing stream, retransmissions and connection service. If I need to use the other end-point connected sense, I'll stick it in quotes like "connection."
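To make the "call home" point concrete, here is a minimal sketch in Python of a server that receives any datagram, inspects the sender's IP and port, and replies to them. The function name serve_once, the loopback address and the 2048-byte buffer are my own choices for illustration, and NAT is ignored entirely.

```python
import socket

def serve_once(sock: socket.socket) -> tuple:
    """Receive one datagram, reply to whoever sent it, report what we saw."""
    # recvfrom hands back the payload together with the sender's (ip, port).
    data, (ip, port) = sock.recvfrom(2048)
    sock.sendto(b"reply: " + data, (ip, port))
    return ip, port, data

if __name__ == "__main__":
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))       # port 0: let the OS pick a free port
    server.settimeout(2.0)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(2.0)
    client.sendto(b"Call home ET", server.getsockname())
    ip, port, data = serve_once(server)
    print(f"|{ip}|{port}|{data.decode()}|")
    print(client.recvfrom(2048)[0].decode())
```

Note what is missing: nothing here proves that the sender really sits at that address, which is exactly the chit-chat that the connection handshake adds.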

Reducing the Request-response

Let's now model this with the request-response model. If we draw out all the packets over a request-response layered over the classical (TCP) connection with setup and tear-down, then we can reduce the whole lot to one packet going out and one coming back. The only difference between the two approaches (full and reduced) is that with a connection setup, you know you are talking to someone, and with the minimal request-response model, you can't tell the difference between talking to someone, and talking to no-one.

I suggest without any more discussion that this is a difference that isn't worth paying for, if all you are interested in doing is the simple request-response model.
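The reduced model can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the name request, the half-second timeout and the three attempts are all hypothetical, and a real protocol would also match each reply to its request.

```python
import socket

def request(sock, server_addr, payload: bytes, timeout=0.5, attempts=3):
    """Send one datagram and wait for one reply; None means silence."""
    sock.settimeout(timeout)
    for _ in range(attempts):
        sock.sendto(payload, server_addr)
        try:
            reply, _sender = sock.recvfrom(2048)
            return reply
        except (socket.timeout, ConnectionError):
            # Lost request, lost reply, or nobody home: from here
            # these cases all look the same, so just try again.
            continue
    return None
```

A return of None is the whole cost of the reduction: the caller learns that no reply arrived, and nothing more.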

Packet Ordering, Reliability, Change

So what about all those other classical benefits of the connection? They are these, loosely lumped under the rubric of "integrity:"

Data changes
Same Data repeats
Data re-orders
Data streaming
"connection" end-point mapping.

Now we need to get into the nub of the application to answer this.

The Real Model For Applications

Let's say that we want to do reliability. Make that an assumption. So we want to send a piece of data, any piece of data.

It might be a byte, which is what happens with a keystroke. (Every time I pound these keys, connection oriented software such as SSH does a whole lot of work to get my character to an editor on a remote server somewhere.) Or it might be a file. (The whole file, where I tell my laptop to transfer this finished rant up onto a website.)

For most applications, the piece of data is discrete. In other words, we want an exact piece of data to be delivered. We want the piece of data to be delivered exactly as it is on our source machine, no changes, no fiddles.

This means that most deliveries are actually datagrams. And therefore most net software is actually datagram oriented, layered over connections. Consider:

File Copy - one file, exactly, being one huge datagram.
Terminal session - one character, one character echoed back. Note that the Unix terminal semantics of character echo are partly an assumption about extremely fast return channels, and partly an assumption about the approximate connection semantics of copper.
Web - send one URL in HTTP, get one HTML page back. Keep doing that...
Sound and video like VoIP - send lots of small sound packets. Don't worry if some get lost, as long as most get there. Drop any old ones. Keep the ordering close, but not slavishly.
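The sound and video case in the list above is the easiest to sketch. The following Python assumes a hypothetical 4-byte sequence number in front of each sound packet, and uses the crudest possible policy: play whatever is newer than the last packet played, drop everything else, and never ask for a resend. Real VoIP software keeps a small reordering buffer, which is left out here.

```python
import struct

HEADER = struct.Struct("!I")   # hypothetical header: 4-byte sequence number

def pack(seq: int, audio: bytes) -> bytes:
    """Prefix a chunk of sound with its sequence number."""
    return HEADER.pack(seq) + audio

class Receiver:
    """Plays packets as they arrive and drops anything that turns up late."""

    def __init__(self):
        self.latest = -1       # highest sequence number played so far
        self.played = []
        self.dropped = 0

    def on_packet(self, packet: bytes) -> None:
        (seq,) = HEADER.unpack_from(packet)
        if seq <= self.latest:
            self.dropped += 1  # old or repeated: not worth playing now
            return
        self.latest = seq      # gaps are tolerated, lost packets stay lost
        self.played.append(packet[HEADER.size:])
```

Feeding it sequence numbers 0, 1, 3, 2, 3, 4 plays four packets and drops two, and the application carries on regardless.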

The Real Practice For Reliability

What happens when each of the above fail? Dismissively, the user tries again.

File copies aren't totally reliable because (for one example) the connection only uses smallish checksums to check the data. Large files can get byte errors in them, which partly explains the popular habit of using an MD5 or SHA1 hash over the data when distributing large software downloads.
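As a sketch of that habit, the Python below hashes a file in chunks and compares the result against a published digest. The function names and the chunk size are my own; SHA1 is used only because it is one of the two hashes named above, and the algorithm parameter accepts any name that hashlib knows.

```python
import hashlib

def file_chunks(path: str, chunk_size: int = 65536):
    """Yield a file's contents piece by piece so big files need little memory."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return
            yield chunk

def digest_chunks(chunks, algorithm: str = "sha1") -> str:
    """Hash a sequence of byte chunks and return the hex digest."""
    h = hashlib.new(algorithm)
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()

def verify(chunks, published_hex: str, algorithm: str = "sha1") -> bool:
    """True if the data matches the digest published alongside the download."""
    return digest_chunks(chunks, algorithm) == published_hex.strip().lower()
```

Typical use would be verify(file_chunks("download.iso"), published), where both the file name and the published digest come from the user. That is rather the point: the final check is made above the connection, not by it.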

What happens with terminal sessions? Well, the user spots the garbled character. Or perhaps the 'space' inserted to result in 'rm * .html' ... Or, she doesn't; it happens, as anyone who recalls those old sub-1k modems will testify. Is this reasonable? Maybe not, but it is accepted in the world of typing mistakes...

Web pages sometimes glitch ... and people see the blank spot where there should be some text, or an image. If our user sees that funny blank image, she can hit Retry, and magically, the image appears!

VoIP is even easier; you can drop all packets, and the program is still working. It's just that the user can't hear anything, as a consequence of quality slipping down to 0%.

What have all the above got in common? In essence, there is another reliability layer, one hitherto secret and denied. It's the user -- she will check the results and rectify manually, or suffer the consequences.

The Real Need For Reliability

But wait, let's contrast this with somewhere where we really need reliability. The financial transaction! Surely we want good solid connection reliability there?

We certainly want reliability but consider what a transaction is. It's a single packet that says "do this transaction." It includes all the data, and it includes all the authority. This packet goes winding its way through the financial system until it gets to its destination, at which point it is proven as total and correct, and then enacted.

So what happens if it is dropped? Well, the user gets no confirmation, and retries. What happens if it is doubled up and sent twice?

Financial transactions are used to this too. Each transaction should bear a user-designated unique number. If the user can't do that, the software should do it for her. Then, each transaction request is checked for repetition against that number.
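A minimal sketch of that check, in Python. Everything named here (new_transaction, Ledger, the receipt format) is hypothetical, the unique number is simply a random UUID made by the software on the user's behalf, and a real system would keep the seen numbers in durable storage rather than in memory.

```python
import uuid

def new_transaction(payer: str, payee: str, amount_cents: int) -> dict:
    """Build a request carrying its own unique number, made once, up front."""
    return {"id": uuid.uuid4().hex, "payer": payer,
            "payee": payee, "amount_cents": amount_cents}

class Ledger:
    """Enacts each transaction number at most once, however often it arrives."""

    def __init__(self):
        self.receipts = {}     # transaction id -> receipt already issued
        self.moved = 0         # total cents actually moved

    def submit(self, tx: dict) -> str:
        if tx["id"] in self.receipts:
            # A repeat: hand back the original receipt and do nothing else.
            return self.receipts[tx["id"]]
        self.moved += tx["amount_cents"]
        receipt = "ok:" + tx["id"]
        self.receipts[tx["id"]] = receipt
        return receipt
```

A retry after a lost confirmation resends the same request with the same number, so it is safe; a genuinely new purchase gets a new number.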

Do you recall those old poorly designed web sites that say "don't hit the BUY button twice?" What's going on here is that neither the user nor the web site software has created a unique number. When BUY is hit twice, the software will dutifully send two transaction requests, and two widgets will be dutifully purchased. Worse, some software gets a bit confused about its responsibilities and resends the whole transaction request, due to spurious events. It's not even fair to blame the user for this.

This issue is so common that pretty much all Internet transaction software deals with resends automatically; it's part of the requirements. If there is a sense that a connection is relied upon to stop the resends of transaction requests, then that financial institution has a big problem to deal with; the basic infrastructure is not sound enough for the job and no connection will fix that.

By way of example, the credit card infrastructure has software that analyses and drops duplicates, because the people at the other end know that the sending software is unreliable. (Just to restate, that's hardly a reliable fix, the real fix is that the sending software should create a unique number at the beginning of the session and stick it in the transaction data. But credit card infrastructure is a copy-book case in how not to do it.)

The Credit Card ...

A financial transaction request starts with a datagram. It happens to be one we really want to not interfere with, so we need a decent checksum (a hash does nicely) and we need a decent authentication (maybe a PKI signature, maybe not) but these are all issues outside the basic datagram semantics. What happens when this is layered over a connection is simply more unreliability, as different concepts of reliability compete with each other and end up lowering each other's end delivery.

So what does that old workhorse, the credit card transaction over SSL, gain us? Not a lot. Over the connection it gains cryptographic protection against replay and other character manipulations, but these are lost as soon as the transaction is extracted and passed on. As by far the bulk of processing is done once the CC details leave the website, this is no small issue.

Some will point out how SSL includes strong authentication, but that's a chimera. Firstly, we are not interested in authentication, but authorisation. That exact transaction needs to be approved, and that's not possible using just an SSL connection because the data is co-mingled. Secondly, and further, the authentication is easily spoofed in practical systems (think phishing) and the emerging MITB threat means that good transactions are happily mixed with bad transactions.

Calling a financial transaction a datagram protocol doesn't solve these problems; but it does highlight that certain issues are done at a higher layer, not within the lower security layers. That is, replay protection is accomplished by a unique number at the application level, and reliance on good data delivery (so the number can't be changed). The lower layer can and should provide a correct data delivery.
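To show what doing it at the higher layer might look like, here is a sketch in Python of a self-contained transaction datagram. The assumptions are loud ones: an HMAC over a shared key stands in for whatever authorisation scheme is really used (a signature, say), the layout of a 32-byte tag followed by JSON is invented for this example, and seen_ids is an in-memory set where a real system would use durable storage.

```python
import hashlib
import hmac
import json

TAG_LEN = 32   # length of an HMAC-SHA256 tag

def seal(key: bytes, tx: dict) -> bytes:
    """Pack a transaction and its authenticating tag into one datagram."""
    body = json.dumps(tx, sort_keys=True).encode()
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return tag + body

def open_sealed(key: bytes, datagram: bytes, seen_ids: set):
    """Return the transaction if it is intact, authorised and new, else None."""
    tag, body = datagram[:TAG_LEN], datagram[TAG_LEN:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None            # changed in transit, or not made with our key
    tx = json.loads(body)
    if tx["id"] in seen_ids:
        return None            # replay: this unique number was already enacted
    seen_ids.add(tx["id"])
    return tx
```

Because the tag covers the unique number, the number cannot be changed without the datagram being rejected, and the protection travels with the transaction instead of being lost when it leaves the connection.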

Anything else?

If we look critically at all the communication applications out there, we discover that most of them are neither pure datagram nor pure connection. They are somewhere in between; datagrams but needing some higher layer features added by the full suite found in TCP.

We can choose to consider them datagrams, in which case we have to add in the specific features that we need ourselves. Or we do it all over connections, and put up with the one-size-fits-all approach.

Which to choose may come down to motives such as our original requirement -- reliability. If we really want reliability, connections don't cut the mustard. For real reliability, we must construct our own protocols using precisely the features required. For anything else, we might compromise on connections.

If our requirements are exacting, not compromising, then datagrams are the way to go. We get closer to the metal, we use more of the die, we actually get to tune the app to the net, not the other way around.

The Middle Ground - adding new protocols

UDP and TCP are only the most famous of the "transport-layer" protocols, and the construction of internetworking, or TCP/IP in general, was that it was relatively easy to add new protocols. In part due to the frustrations outlined herein, there are some other protocols [1]:

DCCP - Datagram Congestion Control Protocol - "is a transport protocol that provides bidirectional unicast connections of congestion-controlled unreliable datagrams. DCCP is suitable for applications that transfer fairly large amounts of data and that can benefit from control over the tradeoff between timeliness and reliability [2]."
SCTP - Stream Control Transmission Protocol [3].
TIPC - Transparent Inter Process Communication - "is designed for use in clustered computer environments ... [for] applications that can communicate quickly and reliably with other applications regardless of their location within the cluster [4]."

Are these helpful? Perhaps, but consider the common difficulty: they all require wide distribution of the code into the Internet infrastructure before you can reliably use it. Given that the rise of the Internet was practically determined by how fast we could spread TCP/IP through the world, this is no small issue.

As a practical here and now, that leaves us with the two earlier choices: use TCP or UDP.

Why didn't this happen before...

It is somewhat interesting to look at why we didn't go this path before. One reason is the above pattern of hubristic expansion of claims. Another is that connections are simply easier to work with in the beginning, and what you don't know about the costs of later phases won't hurt you.

Another reason is that once people get used to working with connections, they find it hard to go back; same with software, which deeply embeds connection-oriented thinking within.

Another answer is that we have already travelled this path in the world of VoIP. In that area, UDP is in widespread use.

So... how do I do it?

"I'm sold! Where do I sign?"

It's very expensive to rewire existing designs, so perhaps that is not the best starting point. It's also somewhat more expensive to use datagrams in early demos than it is to use connections; but it's much cheaper later on.

We can probably assist by documenting some patterns. That's really the subject of another essay.