
traceroute breaks the time-space continuum!

A bit of head-scratching over traceroute Round Trip Times (RTTs)

What’s traceroute? #

traceroute is a UNIX tool[1] that attempts to trace packets… Hmm, the man page explains it beautifully:

This program attempts to trace the route an IP packet would follow to some internet host by launching probe packets with a small ttl (time to live) then listening for an ICMP “time exceeded” reply from a gateway. We start our probes with a ttl of one and increase by one until we get an ICMP “port unreachable” (or TCP reset), which means we got to the “host” […]

The ICMP Time Exceeded messages it refers to are:

| IP version | ICMP type | ICMP code | RFC |
|---|---|---|---|
| 4 (ICMP) | type=11 Time Exceeded Message | code=0 time to live exceeded in transit | RFC792 |
| 6 (ICMPv6) | type=3 Time Exceeded Message | code=0 Hop limit exceeded in transit | RFC4443 |

traceroute uses UDP packets to probe the network by default, but it can optionally use ICMP (-I) or TCP (-T) instead.
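To make the TTL trick concrete, here is a minimal simulation of the loop the man page describes. It sends no real packets: the path, the addresses (taken from the documentation ranges) and the function names are all made up for illustration, and this is not how traceroute itself is implemented.

```python
# Toy model of the traceroute loop. No real packets are sent.
# PATH is a hypothetical route: two routers followed by the destination host.
PATH = ["192.0.2.1", "198.51.100.1", "203.0.113.10"]


def send_probe(path, ttl):
    """Simulate one probe and return (replying address, ICMP reply)."""
    if not path:
        raise ValueError("path must contain at least the destination")
    for hop_index, address in enumerate(path, start=1):
        if hop_index == len(path):
            # The probe reached the host: a UDP probe to an unused port
            # is answered with "port unreachable".
            return address, "port unreachable"
        ttl -= 1  # every router decrements the TTL before forwarding
        if ttl == 0:
            # The TTL expired here, so this router answers instead.
            return address, "time exceeded"


def trace(path, max_hops=30):
    """Increase the TTL by one until the destination itself answers."""
    hops = []
    for ttl in range(1, max_hops + 1):
        address, reply = send_probe(path, ttl)
        hops.append((ttl, address, reply))
        if reply == "port unreachable":
            break
    return hops


for ttl, address, reply in trace(PATH):
    print(f"{ttl:2d}  {address}  ({reply})")
```

Each line of real traceroute output corresponds to one iteration of `trace`, with three probes per TTL instead of one.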

Tracing to google.com 🗺️ #

OK, let’s do it:

traceroute -n google.com

Results come shortly after:

traceroute to google.com (142.250.200.206), 30 hops max, 60 byte packets
 1  192.168.1.1  3.550 ms  3.424 ms  3.346 ms
 2  81.46.38.147  20.595 ms  20.536 ms  20.475 ms
 3  81.46.34.125  79.685 ms  79.602 ms  79.540 ms
 4  * * *
 5  * * *
 6  * * *
 7  81.173.106.65  26.469 ms  15.869 ms  15.664 ms
 8  74.125.245.171  17.053 ms 108.170.225.251  14.295 ms  17.687 ms
 9  108.170.255.26  27.739 ms 192.178.110.134  16.176 ms 142.251.76.110  13.844 ms
10  108.170.252.173  19.309 ms 64.233.175.63  19.250 ms 108.170.252.173  15.359 ms
11  172.253.65.53  40.944 ms  42.708 ms 142.251.79.237  42.634 ms
12  72.14.234.190  44.364 ms 142.251.254.74  43.244 ms 192.178.85.180  44.738 ms
13  192.178.105.215  43.094 ms  42.461 ms  44.523 ms
14  209.85.243.145  44.422 ms  42.834 ms 209.85.143.123  44.361 ms
15  142.250.200.206  44.262 ms  42.788 ms  40.920 ms

Wait… what’s going on with the RTTs? 🤔 #

Something’s wrong… How come the Round Trip Time (RTT) to 81.46.34.125, the 3rd hop, is ~4x higher than the one to 108.170.252.173, the 10th hop?

[Figure: RTTs. Packets to hop 10 time-travel! 🤯]

Is traceroute not reporting the real RTT? #

According to the man page, it should:

Three probes (by default) are sent at each ttl setting and a line is printed showing the ttl, address of the gateway and round trip time of each probe.

Are different probes following different paths? #

Ha! Maybe… let’s see.

When probing using UDP with the default options, traceroute sends each probe from an unbound (kernel-chosen) UDP source port to a monotonically increasing destination port:

23:31:27.128545 IP 192.168.1.25.55532 > 142.250.200.206.33434: UDP, length 32
23:31:27.128578 IP 192.168.1.25.59254 > 142.250.200.206.33435: UDP, length 32
23:31:27.128596 IP 192.168.1.25.50618 > 142.250.200.206.33436: UDP, length 32

If intermediate routers have multiple next hops for the route matching 142.250.200.206, known as Equal-Cost Multi-Path (ECMP), that could certainly be an issue: probes with different ports may be sent down different paths.

ECMP: after a successful lookup in its routing table, the router will choose one of the N possible next hops from the matched route. It will base its decision on a hash function applied to certain fields of the packet. The most common configuration is to use the 5-tuple hash (IP src, IP dst, protocol, L4 src port, L4 dst port). The router will do a modulo N operation over the hash to select the index of the next hop to forward the packet to. This ensures packets belonging to the same flow follow the same path towards a destination (in the absence of reconvergence), and are not reordered. The TCP state machine is no fan of reordering…
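A rough sketch of that selection, using the 5-tuples of the probes captured in this post. Real routers use vendor-specific hash functions; CRC32 is only a stand-in here, and the next-hop names are hypothetical.

```python
import zlib

# Hypothetical set of N equal-cost next hops for the matched route.
NEXT_HOPS = ["next-hop-A", "next-hop-B", "next-hop-C"]
UDP, TCP = 17, 6  # IP protocol numbers


def pick_next_hop(five_tuple, next_hops):
    """Hash the 5-tuple and use (hash modulo N) as the next-hop index.

    CRC32 is an assumption for illustration, not what any router runs.
    """
    key = "|".join(str(field) for field in five_tuple).encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]


# (IP src, IP dst, protocol, L4 src port, L4 dst port)
udp_probes = [
    ("192.168.1.25", "142.250.200.206", UDP, 55532, 33434),
    ("192.168.1.25", "142.250.200.206", UDP, 59254, 33435),
    ("192.168.1.25", "142.250.200.206", UDP, 50618, 33436),
]
# With fixed source and destination ports, every probe is the same flow.
tcp_probes = [("192.168.1.25", "142.250.200.206", TCP, 35000, 80)] * 3

for probe in udp_probes + tcp_probes:
    print(probe, "->", pick_next_hop(probe, NEXT_HOPS))
```

The UDP probes differ in both ports, so they may land on different next hops, whereas the three identical TCP 5-tuples always hash to the same one.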

traceroute actually tells us that there is ECMP in parts of the network. Probes to certain hops have more than one IP address, e.g. at hop 8 we see two different routers, 74.125.245.171 for probe #1, and 108.170.225.251 for probes #2 and #3:

 8  74.125.245.171  17.053 ms 108.170.225.251  14.295 ms  17.687 ms

OK, let’s use TCP probing instead, which allows us to fix both the destination[2] and the source port:

sudo traceroute -T --sport=35000 -n 142.250.200.206
23:53:10.410524 IP 192.168.1.25.35000 > 142.250.200.206.80: Flags [S], [...]
23:53:10.415993 IP 192.168.1.25.35000 > 142.250.200.206.80: Flags [S], [...]
23:53:10.418776 IP 192.168.1.25.35000 > 142.250.200.206.80: Flags [S], [...]

The results:

traceroute to 142.250.200.206 (142.250.200.206), 30 hops max, 60 byte packets
 1  192.168.1.1  5.419 ms  2.732 ms  2.773 ms
 2  81.46.38.147  43.087 ms  9.349 ms  24.899 ms
 3  81.46.34.125  15.319 ms  15.933 ms  14.927 ms
 4  * * *
 5  * * *
 6  * * *
 7  81.173.106.65  13.715 ms  37.831 ms  14.513 ms
 8  142.251.231.147  14.288 ms  13.896 ms  14.388 ms
 9  142.250.56.72  16.127 ms  16.108 ms  14.951 ms
10  209.85.242.91  15.183 ms  14.902 ms  14.363 ms
11  172.253.65.53  70.040 ms  43.351 ms  43.109 ms
12  * 142.251.254.74  43.128 ms  40.690 ms
13  192.178.105.161  40.901 ms  40.438 ms  40.415 ms
14  209.85.143.123  40.875 ms  40.392 ms  40.481 ms
15  142.250.200.206  40.234 ms  44.056 ms  41.273 ms

As you can see, the packets follow a different path. This (re)confirms that there is ECMP in the network (or that it has reconverged).

But the RTT to the 8th hop is still lower than, for instance, the RTT to the 3rd hop! Moreover, the RTT to 81.46.34.125 is now significantly lower than in the previous traceroute.

Asymmetric routing? #

IP routing (in general) is not symmetric. The path from A → B at a given point in time might not be the same as B → A. Could it be that the path back from 81.46.34.125 specifically was different and way slower than the rest?

It could, but the observations don’t seem to point to that. Doing a series of traceroutes to the same destination using UDP gives us a wide range of RTTs to 81.46.34.125, from “as low” as 24 ms all the way to… 144 ms! If the problem were a slow path back, RTTs should be high but reasonably stable.

Are we missing something fundamental here…? #

There is something else that is bothering me here: RTTs don’t seem to correlate with the distance packets travel.

I am based in 📍Barcelona (Catalonia, Spain). The IP 108.170.252.173, the 10th hop of the very first traceroute, is announced by AS15169 (Google). I couldn’t reliably geolocate this IP (three different websites gave me the east, center and west of the USA…), but assume for a moment that it is in the center of the USA, in 📍Wichita (Kansas, USA), one of the locations given. That is some 8,000 km in a straight line from Barcelona. The propagation time, neglecting (many) other factors, would be:

$$ RTT = \frac{2 \times 8{,}000\ \text{km}}{299{,}792\ \text{km/s}} \approx 53.4\ \text{ms} $$

(OK, that IP is probably closer to Europe than that 😅)

The 3rd hop, 81.46.34.125, which is reasonably well geolocated to the beautiful city of 📍Madrid (Spain), some 600 km away from Barcelona (or ~4 ms), gives almost 1.5x the theoretical RTT to the middle of the USA.

We seem to be measuring something more than purely the RTT, something that throws traceroute RTT measurements way, way off.
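The back-of-the-envelope numbers above can be reproduced with a few lines of Python. Like the formula, this assumes a straight line at the speed of light in vacuum and nothing else; light in fibre is slower, so real figures would be higher. The function name is made up for this sketch.

```python
SPEED_OF_LIGHT_KM_S = 299_792  # speed of light in vacuum, km/s


def propagation_rtt_ms(distance_km, speed_km_s=SPEED_OF_LIGHT_KM_S):
    """Round trip propagation time in ms: twice the distance over the speed."""
    return 2 * distance_km / speed_km_s * 1000


print(f"Barcelona <-> Wichita: {propagation_rtt_ms(8000):.1f} ms")
print(f"Barcelona <-> Madrid:  {propagation_rtt_ms(600):.1f} ms")
```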

And the answer is… #

We are also measuring the time the router takes to generate and send the ICMP packet back (processing time). And, due to the hardware architecture of many routers, this is usually NOT done in the Forwarding Plane, but rather in the Control Plane:

TTL=0 packets trigger an exception in the Forwarding Plane, and the offending packet is sent to the Control Plane. Packets destined to the Control Plane are heavily rate-limited and QoS enforced, and can even be lost. In fact, this is likely what happened to the first probe of the 12th hop using TCP:

12  * 142.251.254.74  43.128 ms  40.690 ms

At the same time, the Control Plane might postpone the low-priority task of generating the ICMP time exceeded packet if it has more urgent things to do.

So what we are seeing here is that the router 81.46.34.125 is much slower at generating ICMP messages than the routers at hops 7 to 10 (hops 4 to 6 did not reply at all).
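One way to spot such slow responders automatically: a hop whose best RTT is far above the best RTT of a hop further down the path cannot be explained by distance alone. A small sketch over the first traceroute of this post follows; the function names and the 2x threshold are arbitrary choices made for this example.

```python
import re

SAMPLE = """\
traceroute to google.com (142.250.200.206), 30 hops max, 60 byte packets
 1  192.168.1.1  3.550 ms  3.424 ms  3.346 ms
 2  81.46.38.147  20.595 ms  20.536 ms  20.475 ms
 3  81.46.34.125  79.685 ms  79.602 ms  79.540 ms
 4  * * *
 5  * * *
 6  * * *
 7  81.173.106.65  26.469 ms  15.869 ms  15.664 ms
 8  74.125.245.171  17.053 ms 108.170.225.251  14.295 ms  17.687 ms
 9  108.170.255.26  27.739 ms 192.178.110.134  16.176 ms 142.251.76.110  13.844 ms
10  108.170.252.173  19.309 ms 64.233.175.63  19.250 ms 108.170.252.173  15.359 ms
11  172.253.65.53  40.944 ms  42.708 ms 142.251.79.237  42.634 ms
12  72.14.234.190  44.364 ms 142.251.254.74  43.244 ms 192.178.85.180  44.738 ms
13  192.178.105.215  43.094 ms  42.461 ms  44.523 ms
14  209.85.243.145  44.422 ms  42.834 ms 209.85.143.123  44.361 ms
15  142.250.200.206  44.262 ms  42.788 ms  40.920 ms
"""

RTT_RE = re.compile(r"(\d+\.\d+)\s+ms")


def min_rtt_per_hop(output):
    """Map hop number -> lowest RTT (ms); hops with no replies are skipped."""
    hops = {}
    for line in output.splitlines():
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # header line or blank line
        rtts = [float(value) for value in RTT_RE.findall(line)]
        if rtts:
            hops[int(fields[0])] = min(rtts)
    return hops


def suspicious_hops(hops, factor=2.0):
    """Hops whose best RTT exceeds `factor` times the best RTT of a later hop."""
    ordered = sorted(hops)
    flagged = []
    for position, hop in enumerate(ordered):
        later = [hops[h] for h in ordered[position + 1:]]
        if later and hops[hop] > factor * min(later):
            flagged.append(hop)
    return flagged


print(suspicious_hops(min_rtt_per_hop(SAMPLE)))  # [3]
```

Using the minimum RTT per hop rather than the average filters out one-off spikes such as the first probe at hop 7.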

Case closed! 🥳

Bonus: is it even possible to measure RTTs without involving the Control Plane? #

Yes, sometimes.

A good example is Bidirectional Forwarding Detection (BFD). Under a set of very specific conditions, BFD can reliably estimate the RTT to an L3-adjacent neighbour at the dataplane level by using the Echo mode.

In this mode, the router injects BFD packets into the interfaces where it expects BFD peers, with the source and destination IPs set to its own IP address. This forces the directly connected peer router to hairpin the traffic back to the ingress interface. The packet is sent back directly from the Forwarding Plane, which proves that the link is working. In addition, a timestamp can be added when sending the packet, so that the RTT can be reliably calculated when the packet is echoed, without the need to keep clocks in tight synchronization between devices.

The echo mode only works for directly connected peers, though (one IP hop away).
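The timestamp trick is worth a tiny sketch. The contents of a BFD Echo packet are a local matter for the sender, so the 8-byte payload and the function names below are purely hypothetical; the point is only that both timestamps come from the sender’s own clock.

```python
import struct
import time


def build_echo_payload(now_ns):
    """Embed the sender's own send time (ns) in the packet payload."""
    return struct.pack("!Q", now_ns)


def rtt_from_echo(payload, now_ns):
    """On reception, subtract the embedded send time from the local clock."""
    (sent_ns,) = struct.unpack("!Q", payload)
    return now_ns - sent_ns


def peer_hairpin(payload):
    """Stand-in for the peer's Forwarding Plane: the packet comes back unchanged."""
    return payload


payload = build_echo_payload(time.monotonic_ns())
echoed = peer_hairpin(payload)
print(f"RTT: {rtt_from_echo(echoed, time.monotonic_ns())} ns")
```

Since the peer never reads or writes the timestamp, its clock plays no role in the measurement.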

Clever, isn’t it?

Next #

That… was fun! I hope you liked it too. 😀


  1. Windows has a similar tool called tracert.

  2. Why is that not possible with UDP probing 🤷..? Good question. Next one? (👋 Andy)