Cannibalize your TTL
A SIGBOVIK paper I didn't writePosted on January 26, 2023 by Eric Hayden Campbell
These days, its rare to watch a network verification talk that doesn’t start with a cliche smattering of headlines describing “Facebook’s Latest Network Outage”, or “This One Crazy BGP Bug brought down Iran’s National Network”. This device is extremely powerful! It’s effective at convincing skeptics that Network Reliability(TM) is not already solved, It doesn’t convince people that formal methods are the right solution for network reliability.
Network Verification evangelists aren’t thinking big enough
Rather than focus on service level objectives like network reliability and availability, network engineers need to focus on the broad engineering goal of reducing packet size. The plain fact of the matter is that verification renders whole portions of the IEEE standard obsolete and network operators should feel empowered to reuse those fields for other purposes! It can even result in a 0.8% improvement in packet size efficiency!
Étude pour l’anachronisme: Time to Live
For example, in the IEEE standards, the IPv4 header maintains a so-called Time To Live (TTL) field that keeps track of how long the packet will stay in the network before it is unceremoniously thrown to the killing floor. Historically this field literally denoted wall-clock time, but in modern networks it is really a “number of hops” that gets decremented at each switch. This is reflected in the IEEE’s renaming of the field to Hop Count for IPv6. I’ll use “TTL” as a generic term for both Time To Live and Hop Count.
Perhaps the most important goal of the TTL field is to eliminate semantic forwarding loops. A semantic forwarding loop occurs when the exact same packet, with all the same header field values, arrives at the same switch (with no network configuration changes). If this occurs the packet will be forwarded identically, which will result in it being looped around to the same switch again.. and again.. and again. The TTL field prevents semantic forwarding loops by ensuring that the TTL field is always decremented. That way the same packet never traverses the same switch twice.
However, forwarding loops can be equally achieved via network verification. A plethora of tools (NetKAT, BagPIPE) can statically verify that your network is loop free. That is, no packet ever traverses the same switch twice. These tools use heavyweight (but efficient!) formal methods to guarantee(!!!) that none of the bad behaviors above can ever occur, which means that we don’t need to use or maintain the TTL field in our networks!
In fact, these formal tools do even better than the TTL field can ever hope to. Not only can they prevent semantic forwarding loops, they can also prevent topological forwarding loops, which are defined to be when the same packet reaches the same switch regardless of whether any field has changed. The TTL simply mitigates the effect of these loops by ensuring that the packet will not traverse such a loop forever. This certainly will not prevent the packet from becoming a so-called “Chernobyl packet”. A Chernobyl packet is one that triggers such an over-accumulation of broadcast and multicast traffic that switches and/or links begin to fail. Its perfectly feasible for such a traffic burst to occur before the TTL can be reduced to 0.
Formal verification tools, however, can prevent topological forwarding loops from ever happening. Even in situations where some topological loops are desired behavior, formal verification tools can permit the desired loops and prevent the undesired ones. We’ll call such a tool a “fine-grained topological loop detector”. This leads me to my first approximate thesis.
The TTL field is obsolete in Verified Networking
Specifically, I mean that the goals achieved by a the TTL field are fully subsumed by a fine-grained topological detector.
An early counterexample:
But of course I’m wrong. The
traceroute tool is an essential debugging tool in the network engineers tool kit.
traceroute well traces the route between one host and another. The way that it works is via the TTL field. It sends a collection of packets to the desired destination host with increasing TTLs. Then, when the TTL of each packet reaches 0, the current switch send an ICMP Time Exceeded error message back to the original host.
traceroute then can reconstruct the traversed packets from these error messages.
So clearly we need the TTL field for this essential task. But is it really that necessary? The clearest use case for
traceroute is when you don’t have good visibility into what the network is doing and how its forwarding your packets, for instance in a BGP network.
However, this use disappears in a software defined network, where the forwarding rules are computed in a logically centralized way, and can be easily logged and snooped on. The increasing popularity of dataplane programming languages like P4 and denotational semantics (Petr4) let you statically compute and verify routes (simulation fidelity not verified by the FDA), rendering the dynamic exploration of
traceroute obsolete in a deeply programmable SDN.
Now we can update our thesis with a few caveats
The TTL field is obsolete in Verified, Deeply-Programmable, Software-Defined Networking (VDPSDN)
(whew, good thing its still in the IEEE standard)
So how can we use this obsolescence?
On Cannibals: Use TTL for Great Good!
Since TTLs are useless for forwarding loop prevention. Lets use them to minimize our packets sizes!
Let’s consider a verified, deeply programmable, software-defined, campus network (VDPSDCN). We want to partition network traffic into three separate classes, Students, Staff and Visitors, each with ordered privilege sets. Students have at least the privileges of Visitors can and Staff has at least the privileges of Students.
To implement these traffic classes, we might use the VLAN header, which is a special header that lives between then Ethernet and IP Headers. We can simply assign different values to the 12 bit VLAN id value for each traffic class and implement our security policy by distinguishing these tags.
This is such a waste of 12 bits in a VDPSDCN! Instead, lets cannibalize the 8-bit TTL field to encode these traffic classes! This way we can save 12 bits of the 1500 bits allowed in a standard packet. This is is 0.8% savings! Wow!
This case study shows conclusively that verifying your DPSDCN will result in massively more performant networking.
TTLs are a great example of how verification and deep programmability can realize operational gains in VDPSDNs. Network verification evangelists should turn away from motivating their work with frivolous service level objectives like “uptime” or “availability”, and instead motivate their work on
NB: This is satire. The “creativity” of these “logical” steps should only be taken in a lighthearted manner. Formal specifications are extremely hard to get right and TTLs are very important as an additional line of defense when they fail