Getting started with pwru
What’s pwru? #
pwru logo (credit: Renee French, Vadim Shchekoldin).
Packet, where are you? (pwru, pronounced ‘Peru’) is an 🐝 eBPF tool written in Go and C that traces network packets (skbs) as they traverse the Linux kernel networking stack. It uses kernel probes (kprobes) to attach to the relevant kernel functions and intercept packets.
pwru was originally developed by the Cilium project to help developers (and users) debug Cilium itself, but its utility goes far beyond Cilium.
What can I do with it? #
- Debugging packet drops (e.g. iptables/nftables, checksums, MTU, routing, RPF…).
- Debugging eBPF programs.
- Troubleshooting complex networking setups (e.g. K8s CNIs, Docker networks, or multiple network namespaces in general).
- Profiling / identifying bottlenecks (using --timestamp).
Installing it #
You can download a pre-packaged, self-contained Go binary here:
Download from GitHub
Tracing your first flow 🚀 #
pwru uses pcap filter syntax to determine which skbs to trace.
Let’s start by capturing all ICMP traffic towards 8.8.8.8:
sudo pwru 'host 8.8.8.8 and icmp'
Output:
2025/09/08 20:55:48 Attaching kprobes (via kprobe-multi)...
1475 / 1475 [--------------------------------------------------------------] 100.00% ? p/s
2025/09/08 20:55:48 Attached (ignored 0)
2025/09/08 20:55:48 Listening for events..
SKB CPU PROCESS NETNS MARK/x IFACE PROTO MTU LEN TUPLE FUNC
Sending a ping to 8.8.8.8:
ping 8.8.8.8 -c 1
should result in a trace similar to this one:
pwru output for a single ICMP req/reply
SKB CPU PROCESS NETNS MARK/x IFACE PROTO MTU LEN TUPLE FUNC
0xffff8b7f0331f800 0 ~/bin/ping:18515 4026531840 0 0 0x0000 1500 84 192.168.232.62:0->8.8.8.8:0(icmp) __ip_local_out
0xffff8b7f0331f800 0 ~/bin/ping:18515 4026531840 0 0 0x0800 1500 84 192.168.232.62:0->8.8.8.8:0(icmp) nf_hook_slow
0xffff8b7f0331f800 0 ~/bin/ping:18515 4026531840 0 0 0x0800 1500 84 192.168.232.62:0->8.8.8.8:0(icmp) ip_output
0xffff8b7f0331f800 0 ~/bin/ping:18515 4026531840 0 wlp0s20f3:3 0x0800 1500 84 192.168.232.62:0->8.8.8.8:0(icmp) nf_hook_slow
0xffff8b7f0331f800 0 ~/bin/ping:18515 4026531840 0 wlp0s20f3:3 0x0800 1500 84 192.168.232.62:0->8.8.8.8:0(icmp) apparmor_ip_postroute
0xffff8b7f0331f800 0 ~/bin/ping:18515 4026531840 0 wlp0s20f3:3 0x0800 1500 84 192.168.232.62:0->8.8.8.8:0(icmp) ip_finish_output
0xffff8b7f0331f800 0 ~/bin/ping:18515 4026531840 0 wlp0s20f3:3 0x0800 1500 84 192.168.232.62:0->8.8.8.8:0(icmp) __ip_finish_output
0xffff8b7f0331f800 0 ~/bin/ping:18515 4026531840 0 wlp0s20f3:3 0x0800 1500 84 192.168.232.62:0->8.8.8.8:0(icmp) ip_finish_output2
0xffff8b7f0331f800 0 ~/bin/ping:18515 4026531840 0 wlp0s20f3:3 0x0800 1500 98 192.168.232.62:0->8.8.8.8:0(icmp) __dev_queue_xmit
0xffff8b7f0331f800 0 ~/bin/ping:18515 4026531840 0 wlp0s20f3:3 0x0800 1500 98 192.168.232.62:0->8.8.8.8:0(icmp) netdev_core_pick_tx
0xffff8b7f0331f800 0 ~/bin/ping:18515 4026531840 0 wlp0s20f3:3 0x0800 1500 98 192.168.232.62:0->8.8.8.8:0(icmp) validate_xmit_skb
0xffff8b7f0331f800 0 ~/bin/ping:18515 4026531840 0 wlp0s20f3:3 0x0800 1500 98 192.168.232.62:0->8.8.8.8:0(icmp) netif_skb_features
0xffff8b7f0331f800 0 ~/bin/ping:18515 4026531840 0 wlp0s20f3:3 0x0800 1500 98 192.168.232.62:0->8.8.8.8:0(icmp) skb_network_protocol
0xffff8b7f0331f800 0 ~/bin/ping:18515 4026531840 0 wlp0s20f3:3 0x0800 1500 98 192.168.232.62:0->8.8.8.8:0(icmp) validate_xmit_xfrm
0xffff8b7f0331f800 0 ~/bin/ping:18515 4026531840 0 wlp0s20f3:3 0x0800 1500 98 192.168.232.62:0->8.8.8.8:0(icmp) dev_hard_start_xmit
0xffff8b7f0331f800 0 ~/bin/ping:18515 4026531840 0 wlp0s20f3:3 0x0800 1500 98 192.168.232.62:0->8.8.8.8:0(icmp) __skb_get_hash_net
0xffff8b7f0331f800 0 ~/bin/ping:18515 4026531840 0 wlp0s20f3:3 0x0800 1500 98 192.168.232.62:0->8.8.8.8:0(icmp) skb_push
0xffff8b7f0331f800 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 1500 118 192.168.232.62:0->8.8.8.8:0(icmp) sock_wfree
0xffff8b7f0331f800 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 1500 118 192.168.232.62:0->8.8.8.8:0(icmp) consume_skb
0xffff8b7f0331f800 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 1500 118 192.168.232.62:0->8.8.8.8:0(icmp) skb_release_head_state
0xffff8b7f0331f800 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 1500 118 192.168.232.62:0->8.8.8.8:0(icmp) skb_release_data
0xffff8b7f0331f800 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 1500 118 192.168.232.62:0->8.8.8.8:0(icmp) skb_free_head
0xffff8b7f0331f800 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 1500 118 192.168.232.62:0->8.8.8.8:0(icmp) kfree_skbmem
0xffff8b7f0331f800 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 1500 84 8.8.8.8:0->192.168.232.62:0(icmp) inet_gro_receive
0xffff8b7f0331f800 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 1500 84 8.8.8.8:0->192.168.232.62:0(icmp) skb_defer_rx_timestamp
0xffff8b7f0331f800 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 1500 84 8.8.8.8:0->192.168.232.62:0(icmp) ip_rcv_core
0xffff8b7f0331f800 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 1500 84 8.8.8.8:0->192.168.232.62:0(icmp) nf_hook_slow
0xffff8b7f0331f800 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 1500 84 8.8.8.8:0->192.168.232.62:0(icmp) nf_ip_checksum
0xffff8b7f0331f800 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 1500 84 8.8.8.8:0->192.168.232.62:0(icmp) __skb_checksum_complete
0xffff8b7f0331f800 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 1500 84 8.8.8.8:0->192.168.232.62:0(icmp) ip_route_input_noref
0xffff8b7f0331f800 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 1500 84 8.8.8.8:0->192.168.232.62:0(icmp) ip_route_input_slow
0xffff8b7f0331f800 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 1500 84 8.8.8.8:0->192.168.232.62:0(icmp) fib_validate_source
0xffff8b7f0331f800 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 1500 84 8.8.8.8:0->192.168.232.62:0(icmp) __fib_validate_source
0xffff8b7f0331f800 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 65536 84 8.8.8.8:0->192.168.232.62:0(icmp) ip_local_deliver
0xffff8b7f0331f800 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 65536 84 8.8.8.8:0->192.168.232.62:0(icmp) nf_hook_slow
0xffff8b7f0331f800 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 65536 84 8.8.8.8:0->192.168.232.62:0(icmp) ip_local_deliver_finish
0xffff8b7f0331f800 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 65536 64 8.8.8.8:0->192.168.232.62:0(icmp) ip_protocol_deliver_rcu
0xffff8b7f0331f800 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 65536 64 8.8.8.8:0->192.168.232.62:0(icmp) raw_local_deliver
0xffff8b7f0331f800 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 65536 64 8.8.8.8:0->192.168.232.62:0(icmp) icmp_rcv
0xffff8b7f0331f800 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 65536 56 8.8.8.8:0->192.168.232.62:0(icmp) ping_rcv
0xffff8b7f0331f800 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 65536 56 8.8.8.8:0->192.168.232.62:0(icmp) skb_push
0xffff8b7f0331f800 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 65536 64 8.8.8.8:0->192.168.232.62:0(icmp) skb_clone
0xffff8b7f0331f600 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 65536 64 8.8.8.8:0->192.168.232.62:0(icmp) __ping_queue_rcv_skb
0xffff8b7f0331f600 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 65536 64 8.8.8.8:0->192.168.232.62:0(icmp) sock_queue_rcv_skb_reason
0xffff8b7f0331f600 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 65536 64 8.8.8.8:0->192.168.232.62:0(icmp) sk_filter_trim_cap
0xffff8b7f0331f600 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 65536 64 8.8.8.8:0->192.168.232.62:0(icmp) security_sock_rcv_skb
0xffff8b7f0331f600 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 65536 64 8.8.8.8:0->192.168.232.62:0(icmp) apparmor_socket_sock_rcv_skb
0xffff8b7f0331f600 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 65536 64 8.8.8.8:0->192.168.232.62:0(icmp) bpf_lsm_socket_sock_rcv_skb
0xffff8b7f0331f600 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 65536 64 8.8.8.8:0->192.168.232.62:0(icmp) __sock_queue_rcv_skb
0xffff8b7f0331f800 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 65536 64 8.8.8.8:0->192.168.232.62:0(icmp) consume_skb
0xffff8b7f0331f800 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 65536 64 8.8.8.8:0->192.168.232.62:0(icmp) skb_release_head_state
0xffff8b7f0331f800 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 1500 64 8.8.8.8:0->192.168.232.62:0(icmp) skb_release_data
0xffff8b7f0331f800 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 1500 64 8.8.8.8:0->192.168.232.62:0(icmp) kfree_skbmem
0xffff8b7f0331f600 4 ~/bin/ping:18515 4026531840 0 0 0x0800 65536 64 8.8.8.8:0->192.168.232.62:0(icmp) __sock_recv_timestamp
0xffff8b7f0331f600 4 ~/bin/ping:18515 4026531840 0 0 0x0800 65536 64 8.8.8.8:0->192.168.232.62:0(icmp) ip_cmsg_recv_offset
0xffff8b7f0331f600 4 ~/bin/ping:18515 4026531840 0 0 0x0800 65536 64 8.8.8.8:0->192.168.232.62:0(icmp) skb_free_datagram
0xffff8b7f0331f600 4 ~/bin/ping:18515 4026531840 0 0 0x0800 65536 64 8.8.8.8:0->192.168.232.62:0(icmp) consume_skb
0xffff8b7f0331f600 4 ~/bin/ping:18515 4026531840 0 0 0x0800 65536 64 8.8.8.8:0->192.168.232.62:0(icmp) skb_release_head_state
0xffff8b7f0331f600 4 ~/bin/ping:18515 4026531840 0 0 0x0800 0 64 8.8.8.8:0->192.168.232.62:0(icmp) sock_rfree
0xffff8b7f0331f600 4 ~/bin/ping:18515 4026531840 0 0 0x0800 0 64 8.8.8.8:0->192.168.232.62:0(icmp) skb_release_data
0xffff8b7f0331f600 4 ~/bin/ping:18515 4026531840 0 0 0x0800 0 64 8.8.8.8:0->192.168.232.62:0(icmp) skb_free_head
0xffff8b7f0331f600 4 ~/bin/ping:18515 4026531840 0 0 0x0800 0 64 8.8.8.8:0->192.168.232.62:0(icmp) kfree_skbmem
Digesting the output… #
pwru traces the kernel functions traversed by each skb, one at a time. __ip_local_out is the first function that receives an skb matching the filter:
SKB TUPLE FUNC
0xffff8b7f0331f800 [...] 192.168.232.62:0->8.8.8.8:0(icmp) __ip_local_out
The skb address is 0xffff8b7f0331f800, and the skb is the direct result of a sendto() syscall made by the ping userspace program.
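When a trace gets long, it can help to group the traversed functions by skb address. The following is a minimal Python sketch, not part of pwru: the helper names (`parse_line`, `functions_per_skb`) are hypothetical, and it assumes the default column layout shown above with no spaces inside any field.

```python
# Column names, in the order printed by the pwru header shown above.
COLUMNS = ["skb", "cpu", "process", "netns", "mark", "iface",
           "proto", "mtu", "len", "tuple", "func"]


def parse_line(line):
    """Split one trace line into a dict keyed by column name.

    Assumption: columns are whitespace-separated and no field contains
    a space, which holds for the sample output in this post.
    """
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"unexpected number of columns: {line!r}")
    return dict(zip(COLUMNS, fields))


def functions_per_skb(lines):
    """Return {skb address: [functions, in order of appearance]}."""
    paths = {}
    for line in lines:
        event = parse_line(line)
        paths.setdefault(event["skb"], []).append(event["func"])
    return paths


# Three lines copied from the trace above.
SAMPLE = """\
0xffff8b7f0331f800 0 ~/bin/ping:18515 4026531840 0 0 0x0000 1500 84 192.168.232.62:0->8.8.8.8:0(icmp) __ip_local_out
0xffff8b7f0331f800 0 ~/bin/ping:18515 4026531840 0 0 0x0800 1500 84 192.168.232.62:0->8.8.8.8:0(icmp) nf_hook_slow
0xffff8b7f0331f600 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 65536 64 8.8.8.8:0->192.168.232.62:0(icmp) __ping_queue_rcv_skb
"""

for skb, funcs in functions_per_skb(SAMPLE.splitlines()).items():
    print(skb, "->", ", ".join(funcs))
```

Grouping by address makes it easier to tell the original skb (ending in f800) apart from the clone that appears later in the trace (ending in f600).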
The packet follows the Linux kernel “output path” (simplified):
- __ip_local_out(): at this point the skb only contains the IP header and above. The routing decision has already been taken and cached in skb->dst. In this function the IP tot_len and csum are adjusted. The skb is then passed to the netfilter subsystem.
- nf_hook(): the skb goes through the netfilter output hook (NF_INET_LOCAL_OUT) processing and ends up in nf_hook_slow(). The verdict is positive, so the okfn (dst_output()) is invoked.
- […]
- ip_finish_output2(): the output interface counters are incremented, and the L2 information of the directly connected next hop (GW) is retrieved (copied) in ip_neigh_for_gw4() and neigh_output(), which ends up in neigh_connected_output() (not seen in the trace). It is at this point that the L2 header(s) are prepended (including 802.1q/802.1ad etc.). It then invokes dev_queue_xmit().
- […]
- __dev_queue_xmit(): the skb is passed to the driver, which will transmit the frame.
The MTU and LEN columns #
Some readers will have spotted two interesting things:
- MTU column values are sometimes large (65536). MTU has no meaning on the RX path. The large value is a result of Generic Receive Offload.
- The length of the packet changes after ip_finish_output2(). As discussed above, neigh_output() increases the packet length by 14 bytes (the Ethernet header), as the next hop is directly connected (via a VLAN-untagged interface):
MTU LEN TUPLE FUNC
1500 84 192.168.232.62:0->8.8.8.8:0(icmp) ip_finish_output2
1500 98 192.168.232.62:0->8.8.8.8:0(icmp) __dev_queue_xmit
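The LEN values can be checked with simple arithmetic. The sketch below assumes an IPv4 header without options, the default ping payload of 56 bytes, and an untagged Ethernet header. The constant names are only illustrative.

```python
IP_HEADER = 20     # IPv4 header without options
ICMP_HEADER = 8    # ICMP echo request/reply header
PING_PAYLOAD = 56  # default ping payload size (no -s given)
ETH_HEADER = 14    # untagged Ethernet header

# Length seen from __ip_local_out() up to ip_finish_output2().
l3_len = IP_HEADER + ICMP_HEADER + PING_PAYLOAD
# Length seen from __dev_queue_xmit() onwards, with the L2 header prepended.
l2_len = l3_len + ETH_HEADER

print(l3_len)  # 84
print(l2_len)  # 98

# The RX path shrinks LEN in the opposite direction as headers are pulled.
after_ip_header = l3_len - IP_HEADER             # 64
after_icmp_header = after_ip_header - ICMP_HEADER  # 56
print(after_ip_header, after_icmp_header)
```

The last two values are consistent with the LEN of 64 reported at ip_protocol_deliver_rcu and of 56 at ping_rcv in the reply part of the trace.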
Detecting MTU / fragmentation issues #
Let’s now send a large packet, much bigger than the MTU of the egress device:
ping 8.8.8.8 -c 1 -s 1800
pwru output for a single ICMP req > MTU
SKB CPU PROCESS NETNS MARK/x IFACE PROTO MTU LEN TUPLE FUNC
0xffff8b7fe482d100 5 ~/bin/ping:94539 4026531840 0 0 0x0000 1500 1828 192.168.1.39:0->8.8.8.8:0(icmp) __ip_local_out
0xffff8b7fe482d100 5 ~/bin/ping:94539 4026531840 0 0 0x0800 1500 1828 192.168.1.39:0->8.8.8.8:0(icmp) nf_hook_slow
0xffff8b7fe482d100 5 ~/bin/ping:94539 4026531840 0 0 0x0800 1500 1828 192.168.1.39:0->8.8.8.8:0(icmp) ip_output
0xffff8b7fe482d100 5 ~/bin/ping:94539 4026531840 0 wlp0s20f3:3 0x0800 1500 1828 192.168.1.39:0->8.8.8.8:0(icmp) nf_hook_slow
0xffff8b7fe482d100 5 ~/bin/ping:94539 4026531840 0 wlp0s20f3:3 0x0800 1500 1828 192.168.1.39:0->8.8.8.8:0(icmp) apparmor_ip_postroute
0xffff8b7fe482d100 5 ~/bin/ping:94539 4026531840 0 wlp0s20f3:3 0x0800 1500 1828 192.168.1.39:0->8.8.8.8:0(icmp) ip_finish_output
0xffff8b7fe482d100 5 ~/bin/ping:94539 4026531840 0 wlp0s20f3:3 0x0800 1500 1828 192.168.1.39:0->8.8.8.8:0(icmp) __ip_finish_output
0xffff8b7fe482d100 5 ~/bin/ping:94539 4026531840 0 wlp0s20f3:3 0x0800 1500 1828 192.168.1.39:0->8.8.8.8:0(icmp) ip_do_fragment
0xffff8b7fe482d100 5 ~/bin/ping:94539 4026531840 0 wlp0s20f3:3 0x0800 1500 1828 192.168.1.39:0->8.8.8.8:0(icmp) ip_fraglist_init
0xffff8b7fe482d100 5 ~/bin/ping:94539 4026531840 0 wlp0s20f3:3 0x0800 1500 1500 192.168.1.39:0->8.8.8.8:0(icmp) ip_fraglist_prepare
0xffff8b7fe482c800 5 ~/bin/ping:94539 4026531840 0 0 0x0000 0 348 192.168.1.39:0->8.8.8.8:0(icmp) ip_copy_metadata
0xffff8b7fe482d100 5 ~/bin/ping:94539 4026531840 0 wlp0s20f3:3 0x0800 1500 1500 192.168.1.39:0->8.8.8.8:0(icmp) ip_finish_output2
0xffff8b7fe482d100 5 ~/bin/ping:94539 4026531840 0 wlp0s20f3:3 0x0800 1500 1514 192.168.1.39:0->8.8.8.8:0(icmp) __dev_queue_xmit
0xffff8b7fe482d100 5 ~/bin/ping:94539 4026531840 0 wlp0s20f3:3 0x0800 1500 1514 192.168.1.39:0->8.8.8.8:0(icmp) netdev_core_pick_tx
0xffff8b7fe482d100 5 ~/bin/ping:94539 4026531840 0 wlp0s20f3:3 0x0800 1500 1514 192.168.1.39:0->8.8.8.8:0(icmp) validate_xmit_skb
0xffff8b7fe482d100 5 ~/bin/ping:94539 4026531840 0 wlp0s20f3:3 0x0800 1500 1514 192.168.1.39:0->8.8.8.8:0(icmp) netif_skb_features
0xffff8b7fe482d100 5 ~/bin/ping:94539 4026531840 0 wlp0s20f3:3 0x0800 1500 1514 192.168.1.39:0->8.8.8.8:0(icmp) skb_network_protocol
0xffff8b7fe482d100 5 ~/bin/ping:94539 4026531840 0 wlp0s20f3:3 0x0800 1500 1514 192.168.1.39:0->8.8.8.8:0(icmp) validate_xmit_xfrm
0xffff8b7fe482d100 5 ~/bin/ping:94539 4026531840 0 wlp0s20f3:3 0x0800 1500 1514 192.168.1.39:0->8.8.8.8:0(icmp) dev_hard_start_xmit
0xffff8b7fe482d100 5 ~/bin/ping:94539 4026531840 0 wlp0s20f3:3 0x0800 1500 1514 192.168.1.39:0->8.8.8.8:0(icmp) __skb_get_hash_net
0xffff8b7fe482d100 5 ~/bin/ping:94539 4026531840 0 wlp0s20f3:3 0x0800 1500 1514 192.168.1.39:0->8.8.8.8:0(icmp) skb_push
0xffff8b7fe482c800 5 ~/bin/ping:94539 4026531840 0 wlp0s20f3:3 0x0800 1500 348 192.168.1.39:0->8.8.8.8:0(icmp) ip_finish_output2
0xffff8b7fe482c800 5 ~/bin/ping:94539 4026531840 0 wlp0s20f3:3 0x0800 1500 362 192.168.1.39:0->8.8.8.8:0(icmp) __dev_queue_xmit
0xffff8b7fe482c800 5 ~/bin/ping:94539 4026531840 0 wlp0s20f3:3 0x0800 1500 362 192.168.1.39:0->8.8.8.8:0(icmp) netdev_core_pick_tx
0xffff8b7fe482c800 5 ~/bin/ping:94539 4026531840 0 wlp0s20f3:3 0x0800 1500 362 192.168.1.39:0->8.8.8.8:0(icmp) validate_xmit_skb
0xffff8b7fe482c800 5 ~/bin/ping:94539 4026531840 0 wlp0s20f3:3 0x0800 1500 362 192.168.1.39:0->8.8.8.8:0(icmp) netif_skb_features
0xffff8b7fe482c800 5 ~/bin/ping:94539 4026531840 0 wlp0s20f3:3 0x0800 1500 362 192.168.1.39:0->8.8.8.8:0(icmp) skb_network_protocol
0xffff8b7fe482c800 5 ~/bin/ping:94539 4026531840 0 wlp0s20f3:3 0x0800 1500 362 192.168.1.39:0->8.8.8.8:0(icmp) validate_xmit_xfrm
0xffff8b7fe482c800 5 ~/bin/ping:94539 4026531840 0 wlp0s20f3:3 0x0800 1500 362 192.168.1.39:0->8.8.8.8:0(icmp) dev_hard_start_xmit
0xffff8b7fe482c800 5 ~/bin/ping:94539 4026531840 0 wlp0s20f3:3 0x0800 1500 362 192.168.1.39:0->8.8.8.8:0(icmp) __skb_get_hash_net
0xffff8b7fe482c800 5 ~/bin/ping:94539 4026531840 0 wlp0s20f3:3 0x0800 1500 362 192.168.1.39:0->8.8.8.8:0(icmp) skb_push
0xffff8b7fe482d100 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 1500 1542 192.168.1.39:0->8.8.8.8:0(icmp) sock_wfree
0xffff8b7fe482d100 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 1500 1542 192.168.1.39:0->8.8.8.8:0(icmp) consume_skb
0xffff8b7fe482d100 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 1500 1542 192.168.1.39:0->8.8.8.8:0(icmp) skb_release_head_state
0xffff8b7fe482d100 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 1500 1542 192.168.1.39:0->8.8.8.8:0(icmp) skb_release_data
0xffff8b7fe482d100 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 1500 1542 192.168.1.39:0->8.8.8.8:0(icmp) skb_free_head
0xffff8b7fe482d100 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 1500 1542 192.168.1.39:0->8.8.8.8:0(icmp) kfree_skbmem
0xffff8b7fe482c800 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 1500 390 192.168.1.39:0->8.8.8.8:0(icmp) sock_wfree
0xffff8b7fe482c800 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 1500 390 192.168.1.39:0->8.8.8.8:0(icmp) consume_skb
0xffff8b7fe482c800 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 1500 390 192.168.1.39:0->8.8.8.8:0(icmp) skb_release_head_state
0xffff8b7fe482c800 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 1500 390 192.168.1.39:0->8.8.8.8:0(icmp) skb_release_data
0xffff8b7fe482c800 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 1500 390 192.168.1.39:0->8.8.8.8:0(icmp) skb_free_head
0xffff8b7fe482c800 0 ~147-iwlwifi:667 4026531840 0 wlp0s20f3:3 0x0800 1500 390 192.168.1.39:0->8.8.8.8:0(icmp) kfree_skbmem
Since ping doesn’t set DF=1 and the L3 packet size is 1828, much bigger than 1500, ip_finish_output() fragments the datagram by invoking ip_do_fragment(), which clones the skb:
MTU LEN TUPLE FUNC
1500 1828 192.168.1.39:0->8.8.8.8:0(icmp) ip_do_fragment
1500 1828 192.168.1.39:0->8.8.8.8:0(icmp) ip_fraglist_init
1500 1500 192.168.1.39:0->8.8.8.8:0(icmp) ip_fraglist_prepare
0 348 192.168.1.39:0->8.8.8.8:0(icmp) ip_copy_metadata
- 0xffff8b7fe482d100: original skb, with the first fragment of 1500 bytes (1514 with L2).
- 0xffff8b7fe482c800: last fragment of 348 bytes: 20 bytes for the IP header and 328 of data (a total of 362 with L2).
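The fragment sizes follow directly from the IPv4 fragmentation rules. Here is a minimal sketch of the arithmetic, assuming a 20-byte IP header without options. `fragment_lengths` is a hypothetical helper, not a kernel function.

```python
def fragment_lengths(l3_len, mtu, ip_header=20):
    """Return the L3 length of each fragment of an IPv4 datagram.

    Every fragment carries its own IP header, and the data in every
    fragment except the last must be a multiple of 8 bytes.
    """
    payload = l3_len - ip_header
    max_data = (mtu - ip_header) // 8 * 8
    lengths = []
    while payload > 0:
        data = min(payload, max_data)
        lengths.append(ip_header + data)
        payload -= data
    return lengths


ETH_HEADER = 14  # untagged Ethernet header

# 1800 bytes of ping payload + 8 (ICMP header) + 20 (IP header) = 1828
fragments = fragment_lengths(1828, 1500)
print(fragments)                            # [1500, 348]
print([n + ETH_HEADER for n in fragments])  # [1514, 362]
```

These are the same values shown in the LEN column for ip_finish_output2 and __dev_queue_xmit on each of the two skbs.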
8.8.8.8 drops any received IP fragments, so the ICMP requests are never answered.
Now let’s force DF=1:
ping 8.8.8.8 -M probe -c 1 -s 1800
Output:
PING 8.8.8.8 (8.8.8.8) 1800(1828) bytes of data.
ping: sendmsg: Message too long
--- 8.8.8.8 ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms
The result is… nothing! The kernel knows the effective MTU towards 8.8.8.8 is too low, so it returns -EMSGSIZE to the sendto() syscall.
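The two outcomes seen so far (fragmentation without DF, an error with DF) can be summarized in a toy model. This is only a sketch of the behaviour described in this post for locally generated packets, not kernel code. `egress_action` is a hypothetical name.

```python
import errno
import os


def egress_action(l3_len, mtu, df):
    """Toy model of what happens to a locally generated IPv4 packet."""
    if l3_len <= mtu:
        return "send"
    if df:
        # The send fails and the syscall returns -EMSGSIZE.
        return errno.errorcode[errno.EMSGSIZE]
    return "fragment"


print(egress_action(84, 1500, df=False))    # send
print(egress_action(1828, 1500, df=False))  # fragment
print(egress_action(1828, 1500, df=True))   # EMSGSIZE

# The message ping printed is the C library's text for this errno;
# on Linux this is typically "Message too long".
print(os.strerror(errno.EMSGSIZE))
```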
Filters and --filter-track-skb #
pwru will capture the packet as soon as the pcap filter matches the skb. By default, pwru will stop tracing the skb once the filter no longer matches. This may happen, for instance, as a result of a NAT transformation or a tunnel encapsulation.
--filter-track-skb instructs pwru, once an skb has been captured, to keep tracing it until it’s returned to the pool, whether transmitted or dropped/discarded.
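The difference between the two modes can be illustrated with a toy model. This is a sketch of the semantics described above, not of pwru’s implementation. The events, addresses and the `trace` helper are all hypothetical, and removing an skb from the tracked set when it is freed is left out for brevity.

```python
def trace(events, matches, track_skb=False):
    """Return the functions that would be shown for a list of events.

    events:    iterable of (skb address, tuple string, function name)
    matches:   callable deciding whether a tuple matches the filter
    track_skb: keep showing an skb after its first match, even if the
               filter stops matching
    """
    tracked = set()
    shown = []
    for skb, tup, func in events:
        if matches(tup):
            if track_skb:
                tracked.add(skb)
            shown.append(func)
        elif skb in tracked:
            shown.append(func)
    return shown


# Hypothetical flow: a NAT rule rewrites the destination before ip_output.
EVENTS = [
    ("0x1", "10.0.0.2:0->8.8.8.8:0(icmp)", "__ip_local_out"),
    ("0x1", "10.0.0.2:0->8.8.8.8:0(icmp)", "nf_hook_slow"),
    ("0x1", "10.0.0.2:0->192.0.2.1:0(icmp)", "ip_output"),
]


def to_google_dns(tup):
    """Stand-in for the pcap filter 'host 8.8.8.8'."""
    return "8.8.8.8" in tup


print(trace(EVENTS, to_google_dns))
print(trace(EVENTS, to_google_dns, track_skb=True))
```

Without tracking, the model loses the skb as soon as its destination is rewritten. With tracking, ip_output is still reported.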
As with any tracing / capturing tool (e.g. tcpdump), pwru can affect performance. Try to keep your filter as narrow as possible to minimize the overhead.
Debugging netfilter drops #
Let’s now add an nftables rule to drop this flow:
nft add table inet filter
nft add chain inet filter output { type filter hook output priority 0 \; }
nft add rule inet filter output ip daddr 8.8.8.8 icmp type echo-request drop
Equivalent iptables rule
iptables -I OUTPUT -p icmp -d 8.8.8.8 -j DROP
The result is that pwru now shows that the skb is dropped at nf_hook_slow(), a part of the Netfilter subsystem, with the reason SKB_DROP_REASON_NETFILTER_DROP:
TUPLE FUNC
192.168.232.62:0->8.8.8.8:0(icmp) __ip_local_out
192.168.232.62:0->8.8.8.8:0(icmp) nf_hook_slow
192.168.232.62:0->8.8.8.8:0(icmp) sk_skb_reason_drop(SKB_DROP_REASON_NETFILTER_DROP)
192.168.232.62:0->8.8.8.8:0(icmp) skb_release_head_state
192.168.232.62:0->8.8.8.8:0(icmp) sock_wfree
192.168.232.62:0->8.8.8.8:0(icmp) skb_release_data
192.168.232.62:0->8.8.8.8:0(icmp) skb_free_head
192.168.232.62:0->8.8.8.8:0(icmp) kfree_skbmem
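In a longer trace, the drop can be located by searching the FUNC column for a drop reason. A minimal sketch, assuming the reason is printed in parentheses after the function name as in the output above (`find_drop` is a hypothetical helper):

```python
import re

# Matches the reason printed in parentheses, e.g. "(SKB_DROP_REASON_...)".
DROP_RE = re.compile(r"\((SKB_DROP_REASON_\w+)\)")


def find_drop(funcs):
    """Return (drop reason, function traced just before the drop).

    Returns None if no drop reason appears in the list.
    """
    previous = None
    for func in funcs:
        match = DROP_RE.search(func)
        if match:
            return match.group(1), previous
        previous = func
    return None


# FUNC column of the trace above.
FUNCS = [
    "__ip_local_out",
    "nf_hook_slow",
    "sk_skb_reason_drop(SKB_DROP_REASON_NETFILTER_DROP)",
    "skb_release_head_state",
    "sock_wfree",
    "skb_release_data",
    "skb_free_head",
    "kfree_skbmem",
]

print(find_drop(FUNCS))
```

For this trace the helper reports SKB_DROP_REASON_NETFILTER_DROP, preceded by nf_hook_slow.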
What’s next? #
In this tutorial we glanced over some of the features of pwru. Next, we will look into how to find bottlenecks and performance issues using pwru.
In the meantime:
But for now, another ☕.