Getting started with pwru

An amazing (❤️) eBPF tool to trace network packets within the Linux Kernel.

What’s pwru? #

pwru logo (credit: Renee French, Vadim Shchekoldin).

pwru ("Packet, where are you?", pronounced ‘Peru’) is an 🐝 eBPF tool written in Go and C that traces network packets (skbs) as they traverse the Linux kernel networking stack. It uses kernel probes (kprobes) to attach to the relevant kernel functions and intercept packets.

pwru was originally developed by the Cilium project to help developers (and users) debug Cilium itself, but its utility goes far beyond Cilium.

What can I do with it? #

  • Debugging packet drops (e.g. iptables/nftables, checksums, MTU, routing, RPF…).
  • Debugging eBPF programs.
  • Troubleshooting complex networking setups (e.g. K8s CNIs, Docker networks, multiple network namespaces in general…).
  • Profiling / identifying bottlenecks (using --timestamp).

Installing it #

You can download a pre-packaged, self-contained Go binary here:

Download from GitHub

Tracing your first flow 🚀 #

pwru uses pcap filter syntax to determine which skbs to trace.

Let’s start by capturing all ICMP traffic towards 8.8.8.8:

sudo pwru 'host 8.8.8.8 and icmp'

Output:

2025/09/08 20:55:48 Attaching kprobes (via kprobe-multi)...
1475 / 1475 [--------------------------------------------------------------] 100.00% ? p/s
2025/09/08 20:55:48 Attached (ignored 0)
2025/09/08 20:55:48 Listening for events..
SKB                CPU PROCESS          NETNS      MARK/x        IFACE       PROTO  MTU   LEN   TUPLE FUNC

Sending a ping to 8.8.8.8:

ping 8.8.8.8 -c 1

should result in a trace similar to this one:

pwru output for a single ICMP req/reply
SKB                CPU PROCESS          NETNS      MARK/x        IFACE       PROTO  MTU   LEN   TUPLE                             FUNC
0xffff8b7f0331f800 0   ~/bin/ping:18515 4026531840 0               0         0x0000 1500  84    192.168.232.62:0->8.8.8.8:0(icmp) __ip_local_out
0xffff8b7f0331f800 0   ~/bin/ping:18515 4026531840 0               0         0x0800 1500  84    192.168.232.62:0->8.8.8.8:0(icmp) nf_hook_slow
0xffff8b7f0331f800 0   ~/bin/ping:18515 4026531840 0               0         0x0800 1500  84    192.168.232.62:0->8.8.8.8:0(icmp) ip_output
0xffff8b7f0331f800 0   ~/bin/ping:18515 4026531840 0          wlp0s20f3:3    0x0800 1500  84    192.168.232.62:0->8.8.8.8:0(icmp) nf_hook_slow
0xffff8b7f0331f800 0   ~/bin/ping:18515 4026531840 0          wlp0s20f3:3    0x0800 1500  84    192.168.232.62:0->8.8.8.8:0(icmp) apparmor_ip_postroute
0xffff8b7f0331f800 0   ~/bin/ping:18515 4026531840 0          wlp0s20f3:3    0x0800 1500  84    192.168.232.62:0->8.8.8.8:0(icmp) ip_finish_output
0xffff8b7f0331f800 0   ~/bin/ping:18515 4026531840 0          wlp0s20f3:3    0x0800 1500  84    192.168.232.62:0->8.8.8.8:0(icmp) __ip_finish_output
0xffff8b7f0331f800 0   ~/bin/ping:18515 4026531840 0          wlp0s20f3:3    0x0800 1500  84    192.168.232.62:0->8.8.8.8:0(icmp) ip_finish_output2
0xffff8b7f0331f800 0   ~/bin/ping:18515 4026531840 0          wlp0s20f3:3    0x0800 1500  98    192.168.232.62:0->8.8.8.8:0(icmp) __dev_queue_xmit
0xffff8b7f0331f800 0   ~/bin/ping:18515 4026531840 0          wlp0s20f3:3    0x0800 1500  98    192.168.232.62:0->8.8.8.8:0(icmp) netdev_core_pick_tx
0xffff8b7f0331f800 0   ~/bin/ping:18515 4026531840 0          wlp0s20f3:3    0x0800 1500  98    192.168.232.62:0->8.8.8.8:0(icmp) validate_xmit_skb
0xffff8b7f0331f800 0   ~/bin/ping:18515 4026531840 0          wlp0s20f3:3    0x0800 1500  98    192.168.232.62:0->8.8.8.8:0(icmp) netif_skb_features
0xffff8b7f0331f800 0   ~/bin/ping:18515 4026531840 0          wlp0s20f3:3    0x0800 1500  98    192.168.232.62:0->8.8.8.8:0(icmp) skb_network_protocol
0xffff8b7f0331f800 0   ~/bin/ping:18515 4026531840 0          wlp0s20f3:3    0x0800 1500  98    192.168.232.62:0->8.8.8.8:0(icmp) validate_xmit_xfrm
0xffff8b7f0331f800 0   ~/bin/ping:18515 4026531840 0          wlp0s20f3:3    0x0800 1500  98    192.168.232.62:0->8.8.8.8:0(icmp) dev_hard_start_xmit
0xffff8b7f0331f800 0   ~/bin/ping:18515 4026531840 0          wlp0s20f3:3    0x0800 1500  98    192.168.232.62:0->8.8.8.8:0(icmp) __skb_get_hash_net
0xffff8b7f0331f800 0   ~/bin/ping:18515 4026531840 0          wlp0s20f3:3    0x0800 1500  98    192.168.232.62:0->8.8.8.8:0(icmp) skb_push
0xffff8b7f0331f800 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 1500  118   192.168.232.62:0->8.8.8.8:0(icmp) sock_wfree
0xffff8b7f0331f800 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 1500  118   192.168.232.62:0->8.8.8.8:0(icmp) consume_skb
0xffff8b7f0331f800 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 1500  118   192.168.232.62:0->8.8.8.8:0(icmp) skb_release_head_state
0xffff8b7f0331f800 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 1500  118   192.168.232.62:0->8.8.8.8:0(icmp) skb_release_data
0xffff8b7f0331f800 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 1500  118   192.168.232.62:0->8.8.8.8:0(icmp) skb_free_head
0xffff8b7f0331f800 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 1500  118   192.168.232.62:0->8.8.8.8:0(icmp) kfree_skbmem
0xffff8b7f0331f800 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 1500  84    8.8.8.8:0->192.168.232.62:0(icmp) inet_gro_receive
0xffff8b7f0331f800 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 1500  84    8.8.8.8:0->192.168.232.62:0(icmp) skb_defer_rx_timestamp
0xffff8b7f0331f800 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 1500  84    8.8.8.8:0->192.168.232.62:0(icmp) ip_rcv_core
0xffff8b7f0331f800 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 1500  84    8.8.8.8:0->192.168.232.62:0(icmp) nf_hook_slow
0xffff8b7f0331f800 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 1500  84    8.8.8.8:0->192.168.232.62:0(icmp) nf_ip_checksum
0xffff8b7f0331f800 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 1500  84    8.8.8.8:0->192.168.232.62:0(icmp) __skb_checksum_complete
0xffff8b7f0331f800 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 1500  84    8.8.8.8:0->192.168.232.62:0(icmp) ip_route_input_noref
0xffff8b7f0331f800 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 1500  84    8.8.8.8:0->192.168.232.62:0(icmp) ip_route_input_slow
0xffff8b7f0331f800 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 1500  84    8.8.8.8:0->192.168.232.62:0(icmp) fib_validate_source
0xffff8b7f0331f800 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 1500  84    8.8.8.8:0->192.168.232.62:0(icmp) __fib_validate_source
0xffff8b7f0331f800 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 65536 84    8.8.8.8:0->192.168.232.62:0(icmp) ip_local_deliver
0xffff8b7f0331f800 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 65536 84    8.8.8.8:0->192.168.232.62:0(icmp) nf_hook_slow
0xffff8b7f0331f800 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 65536 84    8.8.8.8:0->192.168.232.62:0(icmp) ip_local_deliver_finish
0xffff8b7f0331f800 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 65536 64    8.8.8.8:0->192.168.232.62:0(icmp) ip_protocol_deliver_rcu
0xffff8b7f0331f800 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 65536 64    8.8.8.8:0->192.168.232.62:0(icmp) raw_local_deliver
0xffff8b7f0331f800 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 65536 64    8.8.8.8:0->192.168.232.62:0(icmp) icmp_rcv
0xffff8b7f0331f800 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 65536 56    8.8.8.8:0->192.168.232.62:0(icmp) ping_rcv
0xffff8b7f0331f800 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 65536 56    8.8.8.8:0->192.168.232.62:0(icmp) skb_push
0xffff8b7f0331f800 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 65536 64    8.8.8.8:0->192.168.232.62:0(icmp) skb_clone
0xffff8b7f0331f600 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 65536 64    8.8.8.8:0->192.168.232.62:0(icmp) __ping_queue_rcv_skb
0xffff8b7f0331f600 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 65536 64    8.8.8.8:0->192.168.232.62:0(icmp) sock_queue_rcv_skb_reason
0xffff8b7f0331f600 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 65536 64    8.8.8.8:0->192.168.232.62:0(icmp) sk_filter_trim_cap
0xffff8b7f0331f600 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 65536 64    8.8.8.8:0->192.168.232.62:0(icmp) security_sock_rcv_skb
0xffff8b7f0331f600 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 65536 64    8.8.8.8:0->192.168.232.62:0(icmp) apparmor_socket_sock_rcv_skb
0xffff8b7f0331f600 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 65536 64    8.8.8.8:0->192.168.232.62:0(icmp) bpf_lsm_socket_sock_rcv_skb
0xffff8b7f0331f600 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 65536 64    8.8.8.8:0->192.168.232.62:0(icmp) __sock_queue_rcv_skb
0xffff8b7f0331f800 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 65536 64    8.8.8.8:0->192.168.232.62:0(icmp) consume_skb
0xffff8b7f0331f800 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 65536 64    8.8.8.8:0->192.168.232.62:0(icmp) skb_release_head_state
0xffff8b7f0331f800 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 1500  64    8.8.8.8:0->192.168.232.62:0(icmp) skb_release_data
0xffff8b7f0331f800 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 1500  64    8.8.8.8:0->192.168.232.62:0(icmp) kfree_skbmem
0xffff8b7f0331f600 4   ~/bin/ping:18515 4026531840 0               0         0x0800 65536 64    8.8.8.8:0->192.168.232.62:0(icmp) __sock_recv_timestamp
0xffff8b7f0331f600 4   ~/bin/ping:18515 4026531840 0               0         0x0800 65536 64    8.8.8.8:0->192.168.232.62:0(icmp) ip_cmsg_recv_offset
0xffff8b7f0331f600 4   ~/bin/ping:18515 4026531840 0               0         0x0800 65536 64    8.8.8.8:0->192.168.232.62:0(icmp) skb_free_datagram
0xffff8b7f0331f600 4   ~/bin/ping:18515 4026531840 0               0         0x0800 65536 64    8.8.8.8:0->192.168.232.62:0(icmp) consume_skb
0xffff8b7f0331f600 4   ~/bin/ping:18515 4026531840 0               0         0x0800 65536 64    8.8.8.8:0->192.168.232.62:0(icmp) skb_release_head_state
0xffff8b7f0331f600 4   ~/bin/ping:18515 4026531840 0               0         0x0800 0     64    8.8.8.8:0->192.168.232.62:0(icmp) sock_rfree
0xffff8b7f0331f600 4   ~/bin/ping:18515 4026531840 0               0         0x0800 0     64    8.8.8.8:0->192.168.232.62:0(icmp) skb_release_data
0xffff8b7f0331f600 4   ~/bin/ping:18515 4026531840 0               0         0x0800 0     64    8.8.8.8:0->192.168.232.62:0(icmp) skb_free_head
0xffff8b7f0331f600 4   ~/bin/ping:18515 4026531840 0               0         0x0800 0     64    8.8.8.8:0->192.168.232.62:0(icmp) kfree_skbmem

Digesting the output… #

pwru prints the kernel functions the skb traverses, one at a time. __ip_local_out is the first function that receives an skb matching the filter:

SKB                         TUPLE                             FUNC
0xffff8b7f0331f800  [...]   192.168.232.62:0->8.8.8.8:0(icmp) __ip_local_out

The skb address is 0xffff8b7f0331f800, and the skb is a direct result of the sendto() syscall issued by the ping userspace program.

Note: With few exceptions (when traversing bridges, during packet replication such as multicast, or when the skb is cloned, as skb_clone does on the receive path of the trace above), skb addresses are preserved until the packet is either transmitted or dropped.
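
Because the skb address identifies a packet across rows, grouping rows by the first column makes it easier to follow one skb through a long trace. The following is a minimal Python sketch, assuming the default column layout shown above (address in the first column, function name in the last). group_by_skb is a hypothetical helper written for this tutorial, not part of pwru.

```python
def group_by_skb(trace_lines):
    """Group traced function names by skb address, preserving trace order."""
    groups = {}
    for line in trace_lines:
        fields = line.split()
        # Skip blank lines, the column header, and anything that is not a trace row.
        if len(fields) < 2 or not fields[0].startswith("0x"):
            continue
        skb, func = fields[0], fields[-1]
        groups.setdefault(skb, []).append(func)
    return groups


# Three rows copied from the trace above, plus the header line.
sample = [
    "SKB                CPU PROCESS          NETNS      MARK/x        IFACE       PROTO  MTU   LEN   TUPLE                             FUNC",
    "0xffff8b7f0331f800 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 65536 64    8.8.8.8:0->192.168.232.62:0(icmp) skb_clone",
    "0xffff8b7f0331f600 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 65536 64    8.8.8.8:0->192.168.232.62:0(icmp) __ping_queue_rcv_skb",
    "0xffff8b7f0331f800 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 65536 64    8.8.8.8:0->192.168.232.62:0(icmp) consume_skb",
]

for skb, funcs in group_by_skb(sample).items():
    print(skb, "->", ", ".join(funcs))
```

In this sample the clone (…f600) continues towards the socket while the original (…f800) is released, which matches what the full trace shows.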

The packet follows the Linux kernel “output path” (simplified):

The MTU and LEN columns #

Some readers will have spotted two interesting things:

  • MTU column values are sometimes large (65536). MTU has no meaning on the RX path. The large value is a result of Generic Receive Offload.
  • The length of the packet changes after ip_finish_output2(). neigh_output(), which ip_finish_output2() calls, increases the packet length by 14 bytes (the Ethernet header), as the next hop is directly connected (via a VLAN-untagged interface):
MTU   LEN   TUPLE                             FUNC
1500  84    192.168.232.62:0->8.8.8.8:0(icmp) ip_finish_output2
1500  98    192.168.232.62:0->8.8.8.8:0(icmp) __dev_queue_xmit
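
The same observation can be automated: scan consecutive rows and report where LEN changes. This is a small Python sketch that assumes the LEN and FUNC columns have already been extracted into (length, function) pairs; len_changes and ETH_HLEN are names chosen for this example.

```python
ETH_HLEN = 14  # size of an untagged Ethernet header, in bytes


def len_changes(rows):
    """Return (prev_func, func, delta) for each consecutive pair where LEN changes."""
    changes = []
    for (prev_len, prev_func), (cur_len, cur_func) in zip(rows, rows[1:]):
        if cur_len != prev_len:
            changes.append((prev_func, cur_func, cur_len - prev_len))
    return changes


# (LEN, FUNC) pairs taken from the first trace in this tutorial.
rows = [
    (84, "ip_finish_output"),
    (84, "__ip_finish_output"),
    (84, "ip_finish_output2"),
    (98, "__dev_queue_xmit"),
    (98, "netdev_core_pick_tx"),
]

for prev_func, func, delta in len_changes(rows):
    note = " (Ethernet header)" if delta == ETH_HLEN else ""
    print(f"{prev_func} -> {func}: {delta:+d} bytes{note}")
# prints: ip_finish_output2 -> __dev_queue_xmit: +14 bytes (Ethernet header)
```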

Detecting MTU / fragmentation issues #

Let’s now send a large packet, much bigger than the MTU of the egress device:

ping 8.8.8.8 -c 1 -s 1800
pwru output for a single ICMP req > MTU
SKB                CPU PROCESS          NETNS      MARK/x        IFACE       PROTO  MTU   LEN   TUPLE                           FUNC
0xffff8b7fe482d100 5   ~/bin/ping:94539 4026531840 0               0         0x0000 1500  1828  192.168.1.39:0->8.8.8.8:0(icmp) __ip_local_out
0xffff8b7fe482d100 5   ~/bin/ping:94539 4026531840 0               0         0x0800 1500  1828  192.168.1.39:0->8.8.8.8:0(icmp) nf_hook_slow
0xffff8b7fe482d100 5   ~/bin/ping:94539 4026531840 0               0         0x0800 1500  1828  192.168.1.39:0->8.8.8.8:0(icmp) ip_output
0xffff8b7fe482d100 5   ~/bin/ping:94539 4026531840 0          wlp0s20f3:3    0x0800 1500  1828  192.168.1.39:0->8.8.8.8:0(icmp) nf_hook_slow
0xffff8b7fe482d100 5   ~/bin/ping:94539 4026531840 0          wlp0s20f3:3    0x0800 1500  1828  192.168.1.39:0->8.8.8.8:0(icmp) apparmor_ip_postroute
0xffff8b7fe482d100 5   ~/bin/ping:94539 4026531840 0          wlp0s20f3:3    0x0800 1500  1828  192.168.1.39:0->8.8.8.8:0(icmp) ip_finish_output
0xffff8b7fe482d100 5   ~/bin/ping:94539 4026531840 0          wlp0s20f3:3    0x0800 1500  1828  192.168.1.39:0->8.8.8.8:0(icmp) __ip_finish_output
0xffff8b7fe482d100 5   ~/bin/ping:94539 4026531840 0          wlp0s20f3:3    0x0800 1500  1828  192.168.1.39:0->8.8.8.8:0(icmp) ip_do_fragment
0xffff8b7fe482d100 5   ~/bin/ping:94539 4026531840 0          wlp0s20f3:3    0x0800 1500  1828  192.168.1.39:0->8.8.8.8:0(icmp) ip_fraglist_init
0xffff8b7fe482d100 5   ~/bin/ping:94539 4026531840 0          wlp0s20f3:3    0x0800 1500  1500  192.168.1.39:0->8.8.8.8:0(icmp) ip_fraglist_prepare
0xffff8b7fe482c800 5   ~/bin/ping:94539 4026531840 0               0         0x0000 0     348   192.168.1.39:0->8.8.8.8:0(icmp) ip_copy_metadata
0xffff8b7fe482d100 5   ~/bin/ping:94539 4026531840 0          wlp0s20f3:3    0x0800 1500  1500  192.168.1.39:0->8.8.8.8:0(icmp) ip_finish_output2
0xffff8b7fe482d100 5   ~/bin/ping:94539 4026531840 0          wlp0s20f3:3    0x0800 1500  1514  192.168.1.39:0->8.8.8.8:0(icmp) __dev_queue_xmit
0xffff8b7fe482d100 5   ~/bin/ping:94539 4026531840 0          wlp0s20f3:3    0x0800 1500  1514  192.168.1.39:0->8.8.8.8:0(icmp) netdev_core_pick_tx
0xffff8b7fe482d100 5   ~/bin/ping:94539 4026531840 0          wlp0s20f3:3    0x0800 1500  1514  192.168.1.39:0->8.8.8.8:0(icmp) validate_xmit_skb
0xffff8b7fe482d100 5   ~/bin/ping:94539 4026531840 0          wlp0s20f3:3    0x0800 1500  1514  192.168.1.39:0->8.8.8.8:0(icmp) netif_skb_features
0xffff8b7fe482d100 5   ~/bin/ping:94539 4026531840 0          wlp0s20f3:3    0x0800 1500  1514  192.168.1.39:0->8.8.8.8:0(icmp) skb_network_protocol
0xffff8b7fe482d100 5   ~/bin/ping:94539 4026531840 0          wlp0s20f3:3    0x0800 1500  1514  192.168.1.39:0->8.8.8.8:0(icmp) validate_xmit_xfrm
0xffff8b7fe482d100 5   ~/bin/ping:94539 4026531840 0          wlp0s20f3:3    0x0800 1500  1514  192.168.1.39:0->8.8.8.8:0(icmp) dev_hard_start_xmit
0xffff8b7fe482d100 5   ~/bin/ping:94539 4026531840 0          wlp0s20f3:3    0x0800 1500  1514  192.168.1.39:0->8.8.8.8:0(icmp) __skb_get_hash_net
0xffff8b7fe482d100 5   ~/bin/ping:94539 4026531840 0          wlp0s20f3:3    0x0800 1500  1514  192.168.1.39:0->8.8.8.8:0(icmp) skb_push
0xffff8b7fe482c800 5   ~/bin/ping:94539 4026531840 0          wlp0s20f3:3    0x0800 1500  348   192.168.1.39:0->8.8.8.8:0(icmp) ip_finish_output2
0xffff8b7fe482c800 5   ~/bin/ping:94539 4026531840 0          wlp0s20f3:3    0x0800 1500  362   192.168.1.39:0->8.8.8.8:0(icmp) __dev_queue_xmit
0xffff8b7fe482c800 5   ~/bin/ping:94539 4026531840 0          wlp0s20f3:3    0x0800 1500  362   192.168.1.39:0->8.8.8.8:0(icmp) netdev_core_pick_tx
0xffff8b7fe482c800 5   ~/bin/ping:94539 4026531840 0          wlp0s20f3:3    0x0800 1500  362   192.168.1.39:0->8.8.8.8:0(icmp) validate_xmit_skb
0xffff8b7fe482c800 5   ~/bin/ping:94539 4026531840 0          wlp0s20f3:3    0x0800 1500  362   192.168.1.39:0->8.8.8.8:0(icmp) netif_skb_features
0xffff8b7fe482c800 5   ~/bin/ping:94539 4026531840 0          wlp0s20f3:3    0x0800 1500  362   192.168.1.39:0->8.8.8.8:0(icmp) skb_network_protocol
0xffff8b7fe482c800 5   ~/bin/ping:94539 4026531840 0          wlp0s20f3:3    0x0800 1500  362   192.168.1.39:0->8.8.8.8:0(icmp) validate_xmit_xfrm
0xffff8b7fe482c800 5   ~/bin/ping:94539 4026531840 0          wlp0s20f3:3    0x0800 1500  362   192.168.1.39:0->8.8.8.8:0(icmp) dev_hard_start_xmit
0xffff8b7fe482c800 5   ~/bin/ping:94539 4026531840 0          wlp0s20f3:3    0x0800 1500  362   192.168.1.39:0->8.8.8.8:0(icmp) __skb_get_hash_net
0xffff8b7fe482c800 5   ~/bin/ping:94539 4026531840 0          wlp0s20f3:3    0x0800 1500  362   192.168.1.39:0->8.8.8.8:0(icmp) skb_push
0xffff8b7fe482d100 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 1500  1542  192.168.1.39:0->8.8.8.8:0(icmp) sock_wfree
0xffff8b7fe482d100 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 1500  1542  192.168.1.39:0->8.8.8.8:0(icmp) consume_skb
0xffff8b7fe482d100 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 1500  1542  192.168.1.39:0->8.8.8.8:0(icmp) skb_release_head_state
0xffff8b7fe482d100 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 1500  1542  192.168.1.39:0->8.8.8.8:0(icmp) skb_release_data
0xffff8b7fe482d100 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 1500  1542  192.168.1.39:0->8.8.8.8:0(icmp) skb_free_head
0xffff8b7fe482d100 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 1500  1542  192.168.1.39:0->8.8.8.8:0(icmp) kfree_skbmem
0xffff8b7fe482c800 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 1500  390   192.168.1.39:0->8.8.8.8:0(icmp) sock_wfree
0xffff8b7fe482c800 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 1500  390   192.168.1.39:0->8.8.8.8:0(icmp) consume_skb
0xffff8b7fe482c800 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 1500  390   192.168.1.39:0->8.8.8.8:0(icmp) skb_release_head_state
0xffff8b7fe482c800 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 1500  390   192.168.1.39:0->8.8.8.8:0(icmp) skb_release_data
0xffff8b7fe482c800 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 1500  390   192.168.1.39:0->8.8.8.8:0(icmp) skb_free_head
0xffff8b7fe482c800 0   ~147-iwlwifi:667 4026531840 0          wlp0s20f3:3    0x0800 1500  390   192.168.1.39:0->8.8.8.8:0(icmp) kfree_skbmem

ping doesn’t set DF=1 and the L3 packet size (1828 bytes) is much bigger than the 1500-byte MTU, so ip_finish_output() fragments the datagram by invoking ip_do_fragment(), which clones the skb:

MTU   LEN   TUPLE                           FUNC
1500  1828  192.168.1.39:0->8.8.8.8:0(icmp) ip_do_fragment
1500  1828  192.168.1.39:0->8.8.8.8:0(icmp) ip_fraglist_init
1500  1500  192.168.1.39:0->8.8.8.8:0(icmp) ip_fraglist_prepare
0     348   192.168.1.39:0->8.8.8.8:0(icmp) ip_copy_metadata
  • 0xffff8b7fe482d100: original skb, with the first fragment of 1500 bytes (1514 with L2).
  • 0xffff8b7fe482c800: last fragment of 348 bytes, 20 bytes for the IP header, 328 of data (a total of 362 with L2).
Note: For security reasons 8.8.8.8 drops any received IP fragments, so ICMP requests are never answered.
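
The fragment sizes above follow from simple arithmetic, worked through here as a Python sketch. It assumes a 20-byte IPv4 header without options and an 8-byte ICMP echo header; fragment_sizes is a hypothetical helper, not a kernel or pwru function.

```python
IP_HEADER = 20    # IPv4 header without options
ICMP_HEADER = 8   # ICMP echo request header


def fragment_sizes(icmp_payload, mtu):
    """Return the L3 length of each IPv4 fragment for an ICMP echo request."""
    remaining = icmp_payload + ICMP_HEADER  # bytes carried after the IP header
    # Every fragment except the last carries a multiple of 8 bytes,
    # because the fragment offset field counts in 8-byte units.
    max_data = (mtu - IP_HEADER) // 8 * 8
    sizes = []
    while remaining > max_data:
        sizes.append(IP_HEADER + max_data)
        remaining -= max_data
    sizes.append(IP_HEADER + remaining)
    return sizes


print(fragment_sizes(1800, 1500))  # [1500, 348]
print(fragment_sizes(56, 1500))    # [84]
```

The first call reproduces the two LEN values in the trace (1500 and 348); the second matches the 84-byte packet from the first example, which fits in a single packet.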

Now let’s force DF=1:

ping 8.8.8.8 -M probe -c 1 -s 1800

Output:

PING 8.8.8.8 (8.8.8.8) 1800(1828) bytes of data.
ping: sendmsg: Message too long

--- 8.8.8.8 ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms

The result in pwru is… nothing! The kernel knows the effective MTU towards 8.8.8.8 is too low, so it returns -EMSGSIZE to the sendto() syscall.
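
The three outcomes seen so far (sent as is, fragmented, rejected) can be summarized in a deliberately simplified Python model. It only compares the L3 length with the MTU and looks at the DF bit; the real kernel logic has more inputs (path MTU cache, socket options, offloads), and local_output_decision is a name made up for this sketch.

```python
import errno


def local_output_decision(l3_len, mtu, df):
    """Simplified model of what happens to a locally generated IPv4 packet."""
    if l3_len <= mtu:
        return "send"
    if df:
        # Too big and fragmentation is not allowed: the syscall fails.
        return errno.errorcode[errno.EMSGSIZE]
    return "fragment"


print(local_output_decision(84, 1500, df=False))    # send
print(local_output_decision(1828, 1500, df=False))  # fragment
print(local_output_decision(1828, 1500, df=True))   # EMSGSIZE
```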

Filters and --filter-track-skb #

pwru will capture the packet as soon as the pcap filter matches the skb.

By default pwru will stop tracing the skb once the filter doesn’t match anymore. This may happen, for instance, as a result of a NAT transformation or a tunnel encapsulation.

--filter-track-skb instructs pwru to, once captured, continue to trace the skb until it’s returned to the pool, either transmitted or dropped/discarded.

Tip: as with any tracing or packet-capturing tool (e.g., tcpdump), pwru can impact performance. Try to keep your filter as narrow as possible to minimize the overhead.
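
To make the difference concrete, here is a conceptual Python model of the two behaviours. It is not how pwru is implemented; it only mimics the semantics described above. The event list, the addresses and the SNAT scenario are invented for illustration.

```python
def trace(events, matches_filter, track_skb=False):
    """Return the functions reported for a stream of (skb, tuple, func) events."""
    tracked = set()
    reported = []
    for skb, tuple_str, func in events:
        if matches_filter(tuple_str):
            if track_skb:
                tracked.add(skb)  # remember the skb once the filter has matched
            reported.append(func)
        elif skb in tracked:
            reported.append(func)  # filter no longer matches, but the skb is tracked
        if func == "kfree_skbmem":
            tracked.discard(skb)   # the skb has been freed, stop tracking it
    return reported


# Hypothetical flow: the source address is rewritten (SNAT) halfway through.
events = [
    ("0x1", "10.0.0.2:0->8.8.8.8:0(icmp)", "ip_output"),
    ("0x1", "192.0.2.1:0->8.8.8.8:0(icmp)", "ip_finish_output2"),
    ("0x1", "192.0.2.1:0->8.8.8.8:0(icmp)", "kfree_skbmem"),
]


def matches(tuple_str):
    """Stand-in for a pcap filter such as 'src host 10.0.0.2'."""
    return tuple_str.startswith("10.0.0.2:")


print(trace(events, matches))
# ['ip_output']
print(trace(events, matches, track_skb=True))
# ['ip_output', 'ip_finish_output2', 'kfree_skbmem']
```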

Debugging netfilter drops #

Let’s now add an nftables rule to drop this flow:

nft add table inet filter
nft add chain inet filter output { type filter hook output priority 0 \; }
nft add rule inet filter output ip daddr 8.8.8.8 icmp type echo-request drop
Equivalent iptables rule:
iptables -I OUTPUT -p icmp -d 8.8.8.8 -j DROP

The result is that pwru now shows that the skb is dropped at nf_hook_slow(), a part of the Netfilter subsystem, with the reason SKB_DROP_REASON_NETFILTER_DROP:

TUPLE FUNC
192.168.232.62:0->8.8.8.8:0(icmp) __ip_local_out
192.168.232.62:0->8.8.8.8:0(icmp) nf_hook_slow
192.168.232.62:0->8.8.8.8:0(icmp) sk_skb_reason_drop(SKB_DROP_REASON_NETFILTER_DROP)
192.168.232.62:0->8.8.8.8:0(icmp) skb_release_head_state
192.168.232.62:0->8.8.8.8:0(icmp) sock_wfree
192.168.232.62:0->8.8.8.8:0(icmp) skb_release_data
192.168.232.62:0->8.8.8.8:0(icmp) skb_free_head
192.168.232.62:0->8.8.8.8:0(icmp) kfree_skbmem
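
In long traces it can help to search for drop reasons instead of reading every row. This is a minimal Python sketch, assuming drop reasons appear as func(SKB_DROP_REASON_…) at the end of a row, as in the output above; find_drops is a hypothetical helper.

```python
import re

# Matches e.g. "sk_skb_reason_drop(SKB_DROP_REASON_NETFILTER_DROP)" at the end of a row.
DROP_RE = re.compile(r"(\w+)\((SKB_DROP_REASON_\w+)\)\s*$")


def find_drops(lines):
    """Return (previous_func, drop_func, reason) for each row carrying a drop reason."""
    drops = []
    prev_func = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        match = DROP_RE.search(line)
        if match:
            drops.append((prev_func, match.group(1), match.group(2)))
        prev_func = fields[-1]
    return drops


# Rows copied from the trace above (TUPLE and FUNC columns only).
output = [
    "192.168.232.62:0->8.8.8.8:0(icmp) __ip_local_out",
    "192.168.232.62:0->8.8.8.8:0(icmp) nf_hook_slow",
    "192.168.232.62:0->8.8.8.8:0(icmp) sk_skb_reason_drop(SKB_DROP_REASON_NETFILTER_DROP)",
    "192.168.232.62:0->8.8.8.8:0(icmp) skb_release_head_state",
]

for prev_func, drop_func, reason in find_drops(output):
    print(f"dropped after {prev_func}: {reason} (reported by {drop_func})")
# prints: dropped after nf_hook_slow: SKB_DROP_REASON_NETFILTER_DROP (reported by sk_skb_reason_drop)
```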

What’s next? #

In this tutorial we glanced over some of the features of pwru. Next, we will look into how to find bottlenecks and performance issues using pwru.

In the meantime:

But for now, another ☕.