```mermaid
graph TD
    subgraph "Kernel Space"
        P["Perf Event / Timer"] -->|"Trigger"| H["handle_perf"]
        E["execve"] -->|"Trigger"| M["match_comm_on_exec"]
        H --> L1{"PID in filter?"}
        L1 -->|"No"| Exit["Exit"]
        L1 -->|"Yes"| R1["Reserve Ringbuf Space"]
        R1 -->|"Success"| S["Capture Kernel & User Stacks"]
        S -->|"Submit"| Sub["Submit Event"]
        R1 -->|"Failure"| Err["Log to Error Map"]
        M --> L2{"Comm in Trie?"}
        L2 -->|"Yes"| R2["Reserve Trigger Space"]
        R2 -->|"Success"| Sub2["Submit Match Event"]
    end
    subgraph "Maps & Buffers"
        Sub --> RB1[("stack_events Ringbuf")]
        Sub2 --> RB2[("trigger_events Ringbuf")]
        Err --> RB3[("errors Ringbuf")]
        T[("comm_lpm_trie")] -.-> L2
        F[("pid_filter_map")] -.-> L1
    end
    subgraph "User Space (Rust)"
        RB1 --> Reader1["Capture Orchestrator"]
        RB2 --> Reader2["Trigger Agent"]
        RB3 --> Reader3["Error Logger"]
        Control["Agent Logic"] -->|"Update"| F
        Control -->|"Update"| T
    end
```
4 The Heart of the Agent: eBPF Programs
4.1 eBPF in 5 Minutes
To understand Bistouri, one must first view eBPF not merely as “code that runs in the kernel,” but as a highly constrained, event-driven virtual machine. Our agent doesn’t “run” continuously; it sleeps until the kernel encounters a specific trigger—a timer interrupt, a system call, or a process execution.
The constraints of this environment dictate our architecture. Because the eBPF verifier must prove our program is safe, we cannot use loops with unknown bounds or allocate memory dynamically. This leads us to a design where all memory is pre-allocated in “maps.” Maps are the bridge between the transient world of kernel events and the persistent world of our Rust userspace agent.
4.2 The Profiler Program
The profiler consists of two primary entry points, each serving a distinct purpose in the lifecycle of a monitored application.
The first is handle_perf, attached to a perf_event. This is our sampling engine. It fires periodically—ideally at a prime frequency like 19Hz to avoid aliasing with periodic application tasks—to capture the state of the CPU. The design here is a “filter-first” approach. The very first thing we do is check pid_filter_map. If the current Process ID isn’t in that map, we exit immediately. This minimizes the overhead on the rest of the system—we only pay the full cost of stack walking for processes we actually care about.
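The filter-first control flow can be sketched as plain C. The map lookup is stubbed with a tiny array scan standing in for bpf_map_lookup_elem, and the PID values are illustrative; only the shape of the early exit mirrors handle_perf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FILTERED 4

/* Stand-in for pid_filter_map: an array-backed lookup replacing
   bpf_map_lookup_elem so the control flow runs as plain C. */
static uint32_t filtered_pids[MAX_FILTERED] = { 101, 202, 0, 0 };

static const uint32_t *pid_filter_lookup(uint32_t pid) {
    for (size_t i = 0; i < MAX_FILTERED; i++)
        if (filtered_pids[i] != 0 && filtered_pids[i] == pid)
            return &filtered_pids[i];
    return NULL;  /* PID not monitored */
}

/* Mirrors the entry of handle_perf: the filter check comes before any
   expensive work, so unmonitored processes pay almost nothing. */
int handle_perf_sketch(uint32_t pid) {
    if (!pid_filter_lookup(pid))
        return 0;  /* early exit: not our process */
    /* ... reserve ringbuf space, capture kernel & user stacks, submit ... */
    return 1;      /* an event would be captured here */
}
```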
A major design choice in handle_perf is the use of the BPF_F_USER_BUILD_ID flag. This instructs the kernel to perform the VMA walk and resolve the ELF Build ID for every user-space frame during the stack walk. This is significantly more efficient than capturing raw instruction pointers and trying to resolve them in userspace by parsing /proc/<pid>/maps, which is often stale or race-prone.
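The following sketch shows what a Build-ID stack capture looks like. The flag values are copied from the kernel UAPI (linux/bpf.h), and the frame layout mirrors struct bpf_stack_build_id; bpf_get_stack itself is stubbed with a fake that "resolves" one frame, so the fragment runs as ordinary C rather than as Bistouri's actual program.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Flag values copied from the kernel UAPI (<linux/bpf.h>). */
#define BPF_F_USER_STACK    (1ULL << 8)
#define BPF_F_USER_BUILD_ID (1ULL << 11)

#define BUILD_ID_SIZE   20
#define MAX_USER_FRAMES 127

/* Layout of one entry written by bpf_get_stack() when BPF_F_USER_BUILD_ID
   is set (mirrors struct bpf_stack_build_id). */
struct stack_build_id_frame {
    int32_t status;                        /* 1 = valid, 2 = ip fallback */
    unsigned char build_id[BUILD_ID_SIZE];
    union {
        uint64_t offset;                   /* file offset when valid */
        uint64_t ip;                       /* raw IP on fallback */
    };
};

/* Stub for bpf_get_stack(ctx, buf, size, flags): pretends the kernel
   resolved a single user frame with a valid Build ID. */
static long bpf_get_stack_stub(void *ctx, void *buf, uint32_t size, uint64_t flags) {
    (void)ctx;
    if (!(flags & BPF_F_USER_BUILD_ID) || size < sizeof(struct stack_build_id_frame))
        return -1;
    struct stack_build_id_frame *f = buf;
    memset(f, 0, sizeof(*f));
    f->status = 1;       /* BPF_STACK_BUILD_ID_VALID */
    f->offset = 0x1234;  /* offset into the mapped ELF file */
    return (long)sizeof(*f);
}

struct stack_build_id_frame g_frames[MAX_USER_FRAMES];

/* Mirrors the capture call in handle_perf: request user frames with
   Build IDs resolved in-kernel. */
long capture_user_stack(void *ctx) {
    return bpf_get_stack_stub(ctx, g_frames, sizeof(g_frames),
                              BPF_F_USER_STACK | BPF_F_USER_BUILD_ID);
}
```

The return value is the number of bytes written, so userspace divides by the frame size to recover the frame count.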
The second is match_comm_on_exec. Attached to the sched_process_exec tracepoint, this is a lifecycle hook. When a process calls execve, we check if its command name matches our watch list. This allows Bistouri to “discover” and start profiling new instances of a service the moment they start, rather than waiting for a userspace polling loop to find them.
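The decision made inside match_comm_on_exec can be sketched as follows. The comm_lpm_trie lookup is stubbed here as a linear prefix scan, and the watch-list names are purely illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for comm_lpm_trie: illustrative watch-list prefixes. */
static const char *watch_prefixes[] = { "nginx", "postgres" };

/* Mirrors the check in match_comm_on_exec: does the new process's command
   name start with any watched prefix? */
int comm_matches(const char *comm) {
    for (size_t i = 0; i < sizeof(watch_prefixes) / sizeof(*watch_prefixes); i++) {
        size_t n = strlen(watch_prefixes[i]);
        if (strncmp(comm, watch_prefixes[i], n) == 0)
            return 1;  /* would submit a match event to trigger_events */
    }
    return 0;          /* not on the watch list: exit quietly */
}
```

In the real program an LPM trie gives this same prefix semantics in O(key length) rather than a scan, which is why it is the natural map type for command-name matching.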
4.4 Satisfying the Verifier
The eBPF verifier is our most demanding “code reviewer.” To satisfy it, we adopt several patterns:
- Null Checks: Every bpf_map_lookup_elem and bpf_ringbuf_reserve call can return NULL. We must check the returned pointer explicitly before dereferencing it.
- Bounded Copies: When copying the process name (comm), we use __builtin_memcpy with a fixed size (TASK_COMM_LEN). This ensures the verifier can statically prove we aren’t overrunning the destination.
- Stack Depth Limits: We limit stack traces to 127 frames. This is a compromise between visibility and the eBPF program’s complexity limit. Each frame in the user stack is a 32-byte user_stack_frame struct; capturing 127 of these pushes the limits of what we can safely handle within the eBPF execution context.
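The first two patterns can be shown in a few lines of plain C. The lookup helper is a stub that may return NULL, standing in for bpf_map_lookup_elem or bpf_ringbuf_reserve; everything else is the shape the verifier wants to see.

```c
#include <assert.h>
#include <string.h>

#define TASK_COMM_LEN 16

/* Stub that can return NULL, exactly like bpf_map_lookup_elem or
   bpf_ringbuf_reserve. */
static int lookup_value = 7;
static int *maybe_lookup(int found) { return found ? &lookup_value : (int *)0; }

/* Null-check pattern: the verifier rejects any path that dereferences the
   pointer before this branch. */
int guarded_read(int found) {
    int *v = maybe_lookup(found);
    if (!v)
        return -1;  /* bail out, as the real programs do */
    return *v;      /* pointer proven non-NULL on this path */
}

/* Bounded copy: the fixed TASK_COMM_LEN size lets the verifier prove the
   destination is never overrun. Both arguments must be full comm buffers,
   as they always are in the kernel. */
void copy_comm(char dst[TASK_COMM_LEN], const char src[TASK_COMM_LEN]) {
    __builtin_memcpy(dst, src, TASK_COMM_LEN);
}
```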
4.5 Build Pipeline (libbpf-cargo)
Bistouri uses libbpf-cargo to bridge the C and Rust worlds. The build process is orchestrated by build.rs, which invokes the SkeletonBuilder.
This generates a “skeleton”—a Rust module that embeds the compiled BPF bytecode and provides a type-safe interface for loading the programs. A key component here is vmlinux.h, a header describing the kernel’s internal structures that is generated from BTF type information (typically with bpftool) instead of being pulled from local kernel headers. Together with CO-RE (Compile Once – Run Everywhere) relocations, this allows a single Bistouri binary to run across kernel versions, even if internal struct offsets have shifted.
4.6 Memory Layout & repr(C)
The data structures defined in profiler.h are the “contracts” between the kernel and userspace. Because C and Rust have different ideas about struct padding and alignment, these headers must be designed carefully.
We use explicit width types (__u32, __u64) and manual padding (like _pad in error_event) to ensure consistency. In Rust, we mirror these with #[repr(C)].
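The fields of error_event are not spelled out in this section, so the layout below is hypothetical, but it illustrates the pattern: explicit-width types, manual padding, and compile-time asserts that pin down exactly the layout the Rust #[repr(C)] mirror depends on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t __u32;
typedef uint64_t __u64;

/* Hypothetical error_event layout; the real fields live in profiler.h. */
struct error_event {
    __u32 pid;
    __u32 _pad;          /* explicit padding: keeps timestamp_ns 8-aligned */
    __u64 timestamp_ns;
};

/* Compile-time guarantees that C and the Rust #[repr(C)] mirror agree.
   If someone reorders a field, the build fails instead of events being
   silently misparsed. */
_Static_assert(offsetof(struct error_event, timestamp_ns) == 8,
               "manual padding keeps the u64 at offset 8");
_Static_assert(sizeof(struct error_event) == 16,
               "no compiler-inserted trailing padding");
```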
The user_stack_frame is particularly interesting. It contains a union:
```c
struct user_stack_frame {
    __s32 status;
    unsigned char build_id[BUILD_ID_SIZE];
    union {
        __u64 offset; // file offset when status=1 (Valid)
        __u64 ip;     // raw instruction pointer when status=2 (Fallback)
    };
};
```

This union allows the kernel to provide a resolved file offset when it successfully finds a Build ID, or a raw instruction pointer when it cannot (e.g., for vDSO or JITed code). Rust handles this by parsing the status field and interpreting the 8-byte union accordingly.
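The status-then-union discipline can be sketched in C. The struct is restated with fixed-width stdint types so the fragment stands alone, and frame_address is an illustrative helper, not Bistouri's actual API; it mirrors how the userspace side branches on status before touching the union.

```c
#include <assert.h>
#include <stdint.h>

#define BUILD_ID_SIZE 20

struct user_stack_frame {
    int32_t status;                        /* 1 = Valid, 2 = Fallback */
    unsigned char build_id[BUILD_ID_SIZE];
    union {
        uint64_t offset;
        uint64_t ip;
    };
};

/* Picks the address-like value a symbolizer should use for this frame.
   Reading the union without checking status first would conflate file
   offsets with raw instruction pointers. */
uint64_t frame_address(const struct user_stack_frame *f) {
    switch (f->status) {
    case 1:  return f->offset;  /* valid Build ID: file offset */
    case 2:  return f->ip;      /* fallback: raw instruction pointer */
    default: return 0;          /* empty or errored slot: nothing usable */
    }
}
```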
4.7 Program Flow
The diagram at the top of this page illustrates how an event flows from a kernel trigger through our eBPF logic and into the ring buffers for userspace consumption.
Auto-generated from commit 02320e5 by Gemini 3.1 Pro. Last updated: 2025-05-15