# eBPF Tutorial: cgroup-based Policy Control

This tutorial shows how to use cgroup eBPF programs to enforce per-cgroup policies on network connections, device access, and sysctl access.

## What is cgroup eBPF?

**cgroup eBPF** lets you attach eBPF programs to a cgroup (control group) so that a policy is enforced according to process or container membership. Unlike XDP and tc programs, which attach to network interfaces, cgroup eBPF programs act at the process level:

- Policies affect only processes in the target cgroup (and its descendant cgroups)
- Well suited to container, multi-tenant, and sandbox isolation
- Covers network access control, socket options, sysctl access, and device access

When a cgroup eBPF program denies an operation, the syscall in userspace typically fails with `EPERM` ("Operation not permitted").
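
Membership is what scopes these policies. The test later in this tutorial joins the cgroup with `echo $$ > /sys/fs/cgroup/ebpf_demo/cgroup.procs`; the C sketch below does the same thing programmatically. It assumes cgroup v2 and a cgroup directory that already exists, and `join_cgroup()` is a hypothetical helper, not part of this tutorial's code. Processes forked after the write start in the same cgroup, so the policy covers them too.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: move the calling process into a cgroup v2 directory
 * by writing its PID to <cgroup_dir>/cgroup.procs.
 * Returns 0 on success, -1 on failure (bad path, missing cgroup, no permission). */
int join_cgroup(const char *cgroup_dir)
{
    char path[4096];
    char pid_buf[32];

    int n = snprintf(path, sizeof(path), "%s/cgroup.procs", cgroup_dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1; /* path did not fit */

    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    int len = snprintf(pid_buf, sizeof(pid_buf), "%d\n", (int)getpid());
    ssize_t written = write(fd, pid_buf, (size_t)len);
    close(fd);
    return written == len ? 0 : -1;
}
```

Run it as root before doing the work that should be policed; the eBPF programs then see every later `connect()`, device open, and sysctl access made by this process.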

## cgroup eBPF Hook Points

### 1. `BPF_PROG_TYPE_CGROUP_SOCK_ADDR` - Socket Address Hooks

Triggered by socket address syscalls (`bind()`, `connect()`, `sendmsg()`, `recvmsg()`):

| Hook | Section Name | Description |
|------|--------------|-------------|
| IPv4 bind | `cgroup/bind4` | Filter `bind()` calls |
| IPv6 bind | `cgroup/bind6` | Filter `bind()` calls |
| IPv4 connect | `cgroup/connect4` | Filter `connect()` calls |
| IPv6 connect | `cgroup/connect6` | Filter `connect()` calls |
| UDP sendmsg | `cgroup/sendmsg4`, `cgroup/sendmsg6` | Filter UDP sends |
| UDP recvmsg | `cgroup/recvmsg4`, `cgroup/recvmsg6` | Rewrite the source address reported to userspace (cannot deny) |
| Unix connect | `cgroup/connect_unix` | Filter Unix socket `connect()` (Linux 6.7+) |

**Context**: `struct bpf_sock_addr`, which includes `user_ip4`, `user_ip6`, and `user_port` (all in network byte order)

**Return semantics**: `return 1` = allow, `return 0` = deny (`EPERM`). The recvmsg hooks are the exception: they must return 1.

### 2. `BPF_PROG_TYPE_CGROUP_DEVICE` - Device Access Control

| Hook | Section Name | Description |
|------|--------------|-------------|
| Device access | `cgroup/dev` | Filter device open (read/write access) and `mknod()` |

**Context**: `struct bpf_cgroup_dev_ctx`, which contains `major`, `minor`, and `access_type` (device type in the low 16 bits, `BPF_DEVCG_ACC_*` access bits in the high 16 bits)

**Return semantics**: `return 0` = deny (`EPERM`), `return 1` = allow (the verifier accepts only 0 or 1)
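
The `access_type` packing is easy to get wrong, so here is a userspace C model of a `cgroup/dev` decision. It is a sketch under stated assumptions: `struct dev_ctx_model` copies the field layout of `struct bpf_cgroup_dev_ctx`, the `DEVCG_*` constants repeat the values of `BPF_DEVCG_*` from `<linux/bpf.h>` so it builds without kernel headers, and `device_decision()` with its `deny_mask` parameter is hypothetical. In a real BPF program the same body would read `ctx->access_type` directly.

```c
#include <assert.h>
#include <stdint.h>

/* Same values as BPF_DEVCG_ACC_* and BPF_DEVCG_DEV_* in <linux/bpf.h>. */
#define DEVCG_ACC_MKNOD (1u << 0)
#define DEVCG_ACC_READ  (1u << 1)
#define DEVCG_ACC_WRITE (1u << 2)
#define DEVCG_DEV_BLOCK (1u << 0)
#define DEVCG_DEV_CHAR  (1u << 1)

/* Field layout of struct bpf_cgroup_dev_ctx. */
struct dev_ctx_model {
    uint32_t access_type; /* (DEVCG_ACC_* << 16) | DEVCG_DEV_* */
    uint32_t major;
    uint32_t minor;
};

/* Returns 0 (deny -> EPERM) when the request targets the configured character
 * device with an access bit present in deny_mask, 1 (allow) otherwise. */
int device_decision(const struct dev_ctx_model *ctx, uint32_t deny_major,
                    uint32_t deny_minor, uint32_t deny_mask)
{
    uint32_t dev_type = ctx->access_type & 0xFFFF; /* low 16 bits: block or char */
    uint32_t access = ctx->access_type >> 16;      /* high 16 bits: mknod/read/write */

    if (dev_type == DEVCG_DEV_CHAR && ctx->major == deny_major &&
        ctx->minor == deny_minor && (access & deny_mask) != 0)
        return 0;
    return 1;
}
```

Passing only `DEVCG_ACC_WRITE` as `deny_mask` would turn the rule into a read-only device, a softer alternative to blocking every kind of access.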

### 3. `BPF_PROG_TYPE_CGROUP_SYSCTL` - Sysctl Access Control

| Hook | Section Name | Description |
|------|--------------|-------------|
| Sysctl access | `cgroup/sysctl` | Filter reads and writes under `/proc/sys` |

**Context**: `struct bpf_sysctl`; its `write` field distinguishes reads from writes, and `bpf_sysctl_get_name()` copies the sysctl name (e.g. `kernel/hostname`) into a buffer

**Return semantics**: `return 0` = reject (`EPERM`), `return 1` = proceed
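
BPF programs have no `strcmp()`, so matching the name returned by `bpf_sysctl_get_name()` usually needs a bounded comparison. The userspace C model below sketches the decision this tutorial describes (deny reads of one sysctl, let writes through). The 64-byte bound, `name_equals()`, and `sysctl_decision()` are assumptions for illustration; `is_write` stands in for `ctx->write`, and `name` for the buffer filled by `bpf_sysctl_get_name(ctx, buf, sizeof(buf), 0)`, which yields the full name such as `kernel/hostname`.

```c
#include <assert.h>
#include <stddef.h>

#define SYSCTL_NAME_MAX 64 /* assumed name buffer size */

/* Bounded equality check: examines at most SYSCTL_NAME_MAX bytes, the kind of
 * fixed-bound loop a BPF program needs to satisfy the verifier. */
static int name_equals(const char *a, const char *b)
{
    for (size_t i = 0; i < SYSCTL_NAME_MAX; i++) {
        if (a[i] != b[i])
            return 0;
        if (a[i] == '\0')
            return 1;
    }
    return 0; /* longer than the bound: treat as no match */
}

/* is_write plays the role of ctx->write; name is the full sysctl name.
 * Returns 0 (reject -> EPERM) for reads of deny_read, 1 (proceed) otherwise. */
int sysctl_decision(int is_write, const char *name, const char *deny_read)
{
    if (!is_write && name_equals(name, deny_read))
        return 0;
    return 1;
}
```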

### 4. Other cgroup Hooks

- `cgroup_skb/ingress`, `cgroup_skb/egress` - Packet-level filtering
- `cgroup/getsockopt`, `cgroup/setsockopt` - Socket option filtering
- `cgroup/sock_create`, `cgroup/sock_release` - Socket lifecycle
- `sockops` - TCP event callbacks and per-connection tuning (attach type `BPF_CGROUP_SOCK_OPS`)

## This Tutorial: cgroup Policy Guard

We implement a single eBPF object containing three programs:

1. **Network (TCP)**: Block IPv4 `connect()` to a specified destination port
2. **Device**: Block access to a specified `major:minor` device
3. **Sysctl**: Block reads of a specified sysctl (only reads are denied, which keeps testing free of side effects)

Each denial is also reported to userspace through a BPF ring buffer (`BPF_MAP_TYPE_RINGBUF`) for observability.

## Building

```bash
cd src/49-cgroup
make
```

## Running

### Terminal A: Start the loader

```bash
# Create the cgroup first (assumes cgroup v2 mounted at /sys/fs/cgroup; no-op if it exists)
sudo mkdir -p /sys/fs/cgroup/ebpf_demo

# Block: TCP port 9090, /dev/null (1:3), reading kernel/hostname
sudo ./cgroup_guard \
  --cgroup /sys/fs/cgroup/ebpf_demo \
  --block-port 9090 \
  --deny-device 1:3 \
  --deny-sysctl kernel/hostname
```

You should see:
```
Attached to cgroup: /sys/fs/cgroup/ebpf_demo
Config: block_port=9090, deny_device=1:3, deny_sysctl_read=kernel/hostname
Press Ctrl-C to stop.
```

### Terminal B: Start test servers (outside the cgroup)

```bash
# Start two HTTP servers
python3 -m http.server 8080 --bind 127.0.0.1 &
python3 -m http.server 9090 --bind 127.0.0.1 &
```

### Terminal C: Test from within the cgroup

```bash
sudo bash -c '
echo $$ > /sys/fs/cgroup/ebpf_demo/cgroup.procs

# curl output goes to a file, not /dev/null: device 1:3 is blocked in this
# cgroup, so a >/dev/null redirect would fail and skew the TCP result.
echo "== TCP test =="
curl -s -o /tmp/cgroup_guard_curl.out http://127.0.0.1:8080 && echo "8080 OK" || echo "8080 FAILED (unexpected)"
curl -s -o /tmp/cgroup_guard_curl.out http://127.0.0.1:9090 && echo "9090 OK (unexpected)" || echo "9090 BLOCKED (expected)"

echo
echo "== Device test =="
cat /dev/null && echo "/dev/null OK (unexpected)" || echo "/dev/null BLOCKED (expected)"

echo
echo "== Sysctl test =="
cat /proc/sys/kernel/hostname && echo "sysctl read OK (unexpected)" || echo "sysctl read BLOCKED (expected)"
'
```

Expected output:
- `8080 OK` - Port 8080 is allowed
- `9090 BLOCKED (expected)` - Port 9090 is blocked
- `/dev/null BLOCKED (expected)` - Device 1:3 is blocked
- `sysctl read BLOCKED (expected)` - Reading kernel/hostname is blocked

### Terminal A output (events)

```
[DENY connect4] pid=12345 comm=curl daddr=127.0.0.1 dport=9090 proto=6
[DENY device] pid=12346 comm=cat major=1 minor=3 access_type=0x...
[DENY sysctl] pid=12347 comm=cat write=0 name=kernel/hostname
```
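
These lines come from records that the BPF side pushes into the ring buffer and the loader prints. The real record layout lives in `cgroup_guard.h` and is not shown here, so `struct connect_event` and `format_connect_event()` below are hypothetical: a sketch of how a userspace callback (for example, the sample callback passed to libbpf's `ring_buffer__new()`) could turn a connect4 record into the `[DENY connect4]` line.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical record layout; the real one is defined in cgroup_guard.h. */
struct connect_event {
    uint32_t pid;
    char comm[16];  /* task name, not necessarily NUL-terminated */
    uint32_t daddr; /* IPv4 address, network byte order (as in user_ip4) */
    uint16_t dport; /* destination port, already in host byte order */
    uint8_t proto;  /* IPPROTO_* value, e.g. 6 for TCP */
};

/* Format one record in the style of the loader output shown above.
 * Returns snprintf()'s result, or -1 if the address cannot be converted. */
int format_connect_event(const struct connect_event *e, char *out, size_t len)
{
    char ip[INET_ADDRSTRLEN];
    struct in_addr addr = { .s_addr = e->daddr };

    if (inet_ntop(AF_INET, &addr, ip, sizeof(ip)) == NULL)
        return -1;
    return snprintf(out, len,
                    "[DENY connect4] pid=%u comm=%.16s daddr=%s dport=%u proto=%u",
                    (unsigned)e->pid, e->comm, ip, (unsigned)e->dport,
                    (unsigned)e->proto);
}
```

The `%.16s` precision keeps the formatter safe even when `comm` fills all 16 bytes without a terminating NUL.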

## Verifying with bpftool

```bash
sudo bpftool cgroup tree /sys/fs/cgroup/ebpf_demo
```

## Key Implementation Details

### 1. Network byte order for sock_addr

```c
#include <bpf/bpf_endian.h> // bpf_ntohs()

// user_port holds the port in network byte order; convert before comparing
__u16 dport = bpf_ntohs((__u16)ctx->user_port);
```
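
To see why the conversion matters, here is a userspace C model of the connect4 port check. `struct sock_addr_model` copies only the `user_port` field of `struct bpf_sock_addr` (32 bits wide, port stored in network byte order), `ntohs()` stands in for `bpf_ntohs()`, and treating a blocked port of 0 as "rule disabled" is an assumption that mirrors a zero default for the configured port.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdint.h>

/* Only the field this check needs; in struct bpf_sock_addr it is a __u32
 * whose low 16 bits carry the port in network byte order. */
struct sock_addr_model {
    uint32_t user_port;
};

/* connect4-style decision: 0 = deny (EPERM), 1 = allow.
 * blocked_port is in host byte order; 0 means "no port blocked". */
int connect4_decision(const struct sock_addr_model *ctx, uint16_t blocked_port)
{
    uint16_t dport = ntohs((uint16_t)ctx->user_port); /* network -> host order */

    if (blocked_port != 0 && dport == blocked_port)
        return 0;
    return 1;
}
```

Comparing `ctx->user_port` directly against `9090` would never match on little-endian hosts. Converting the configured port once with `bpf_htons()` and comparing in network order is an equivalent alternative.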

### 2. Return value semantics

```c
// For sock_addr (connect4/bind4/etc):
return 1; // allow
return 0; // deny -> EPERM

// For device:
return 0; // deny -> EPERM
return 1; // allow

// For sysctl:
return 0; // reject -> EPERM
return 1; // proceed
```

### 3. Configuration via .rodata

```c
// BPF side: const volatile globals are placed in .rodata; volatile stops the
// compiler from constant-folding the default, so userspace can override it
const volatile __u16 blocked_tcp_dport = 0;

// Userspace: set after the skeleton is opened and before it is loaded
// (libbpf freezes .rodata at load time)
skel->rodata->blocked_tcp_dport = (__u16)port;
```
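
The loader has to turn its command-line values into the numbers it writes into `.rodata` before load. How `cgroup_guard.c` actually does this is not shown here; `parse_major_minor()` and `parse_port()` below are a hypothetical sketch with strict validation for values such as `--deny-device 1:3` and `--block-port 9090`.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* strtoul() wrapper that rejects empty input, signs, and leading spaces. */
static int parse_ulong(const char *s, char **end, unsigned long *out)
{
    if (*s < '0' || *s > '9')
        return -1;
    errno = 0;
    *out = strtoul(s, end, 10);
    return errno ? -1 : 0;
}

/* Parse "MAJ:MIN" (e.g. "1:3"). Returns 0 on success, -1 on bad input. */
int parse_major_minor(const char *s, uint32_t *major, uint32_t *minor)
{
    unsigned long maj, min;
    char *end;

    if (parse_ulong(s, &end, &maj) || *end != ':')
        return -1;
    if (parse_ulong(end + 1, &end, &min) || *end != '\0')
        return -1;
    if (maj > UINT32_MAX || min > UINT32_MAX)
        return -1;
    *major = (uint32_t)maj;
    *minor = (uint32_t)min;
    return 0;
}

/* Parse a TCP port in 1..65535. Returns 0 on success, -1 on bad input. */
int parse_port(const char *s, uint16_t *port)
{
    unsigned long v;
    char *end;

    if (parse_ulong(s, &end, &v) || *end != '\0' || v == 0 || v > 65535)
        return -1;
    *port = (uint16_t)v;
    return 0;
}
```

On success the loader would assign the results to the skeleton's `.rodata` fields, as with `skel->rodata->blocked_tcp_dport` above; rejecting bad input here means a typo cannot silently load a policy that blocks nothing.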

## Files

- `cgroup_guard.h` - Shared data structures
- `cgroup_guard.bpf.c` - eBPF programs (connect4, device, sysctl hooks)
- `cgroup_guard.c` - Userspace loader
- `Makefile` - Build system

## References

- [Kernel docs: libbpf program types](https://docs.kernel.org/bpf/libbpf/program_types.html)
- [eBPF docs: CGROUP_SOCK_ADDR](https://docs.ebpf.io/linux/program-type/BPF_PROG_TYPE_CGROUP_SOCK_ADDR/)
- [eBPF docs: CGROUP_DEVICE](https://docs.ebpf.io/linux/program-type/BPF_PROG_TYPE_CGROUP_DEVICE/)