```mermaid
graph TD
    subgraph "Kernel Space"
        P["Perf Event / Timer"] -->|"Trigger"| H["handle_perf"]
        E["execve"] -->|"Trigger"| M["match_comm_on_exec"]
        H --> L1{"PID in filter?"}
        L1 -->|"No"| Exit["Exit"]
        L1 -->|"Yes"| R1["Reserve Ringbuf Space"]
        R1 -->|"Success"| S["Capture Kernel & User Stacks"]
        S -->|"Submit"| Sub["Submit Event"]
        R1 -->|"Failure"| Err["Log to Error Map"]
        M --> L2{"Comm in Trie?"}
        L2 -->|"Yes"| R2["Reserve Trigger Space"]
        R2 -->|"Success"| Sub2["Submit Match Event"]
    end
    subgraph "Maps & Buffers"
        Sub --> RB1[("stack_events Ringbuf")]
        Sub2 --> RB2[("trigger_events Ringbuf")]
        Err --> RB3[("errors Ringbuf")]
        T[("comm_lpm_trie")] -.-> L2
        F[("pid_filter_map")] -.-> L1
    end
    subgraph "User Space (Rust)"
        RB1 --> Reader1["Capture Orchestrator"]
        RB2 --> Reader2["Trigger Agent"]
        RB3 --> Reader3["Error Logger"]
        Control["Agent Logic"] -->|"Update"| F
        Control -->|"Update"| T
    end
```
4 The Heart of the Agent: eBPF Programs
4.1 eBPF in 5 Minutes
To understand Bistouri, one must first view eBPF not merely as “code that runs in the kernel,” but as a highly constrained, event-driven virtual machine. Our agent doesn’t “run” continuously; it sleeps until the kernel encounters a specific trigger—a timer interrupt, a system call, or a process execution.
The constraints of this environment dictate our architecture. Because the eBPF verifier must prove our program is safe, we cannot use loops with unknown bounds or allocate memory dynamically. This leads us to a design where all memory is pre-allocated in “maps.” Maps are the bridge between the transient world of kernel events and the persistent world of our Rust userspace agent.
4.2 The Profiler Program
The profiler consists of two primary entry points, each serving a distinct purpose in the lifecycle of a monitored application.
The first is handle_perf, attached to a perf_event. This is our sampling engine. It fires periodically—ideally at a prime frequency like 19Hz to avoid aliasing with periodic application tasks—to capture the state of the CPU. The design here is a “filter-first” approach. The very first thing we do is check pid_filter_map. If the current Process ID isn’t in that map, we exit immediately. This minimizes the overhead on the rest of the system—we only pay the full cost of stack walking for processes we actually care about.
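The filter-first control flow can be sketched as plain C. The map lookup is stubbed with a tiny array scan standing in for bpf_map_lookup_elem, and the PID values are illustrative; only the shape of the early exit mirrors handle_perf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FILTERED 4

/* Stand-in for pid_filter_map: an array-backed lookup replacing
   bpf_map_lookup_elem so the control flow runs as plain C. */
static uint32_t filtered_pids[MAX_FILTERED] = { 101, 202, 0, 0 };

static const uint32_t *pid_filter_lookup(uint32_t pid) {
    for (size_t i = 0; i < MAX_FILTERED; i++)
        if (filtered_pids[i] != 0 && filtered_pids[i] == pid)
            return &filtered_pids[i];
    return NULL;  /* PID not monitored */
}

/* Mirrors the entry of handle_perf: the filter check comes before any
   expensive work, so unmonitored processes pay almost nothing. */
int handle_perf_sketch(uint32_t pid) {
    if (!pid_filter_lookup(pid))
        return 0;  /* early exit: not our process */
    /* ... reserve ringbuf space, capture kernel & user stacks, submit ... */
    return 1;      /* an event would be captured here */
}
```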
A major design choice in handle_perf is the use of the BPF_F_USER_BUILD_ID flag. This instructs the kernel to perform the VMA walk and resolve the ELF Build ID for every user-space frame during the stack walk. This is significantly more efficient than capturing raw instruction pointers and trying to resolve them in userspace by parsing /proc/<pid>/maps, which is often stale or race-prone.
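The following sketch shows what a Build-ID stack capture looks like. The flag values are copied from the kernel UAPI (linux/bpf.h), and the frame layout mirrors struct bpf_stack_build_id; bpf_get_stack itself is stubbed with a fake that "resolves" one frame, so the fragment runs as ordinary C rather than as Bistouri's actual program.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Flag values copied from the kernel UAPI (<linux/bpf.h>). */
#define BPF_F_USER_STACK    (1ULL << 8)
#define BPF_F_USER_BUILD_ID (1ULL << 11)

#define BUILD_ID_SIZE   20
#define MAX_USER_FRAMES 127

/* Layout of one entry written by bpf_get_stack() when BPF_F_USER_BUILD_ID
   is set (mirrors struct bpf_stack_build_id). */
struct stack_build_id_frame {
    int32_t status;                        /* 1 = valid, 2 = ip fallback */
    unsigned char build_id[BUILD_ID_SIZE];
    union {
        uint64_t offset;                   /* file offset when valid */
        uint64_t ip;                       /* raw IP on fallback */
    };
};

/* Stub for bpf_get_stack(ctx, buf, size, flags): pretends the kernel
   resolved a single user frame with a valid Build ID. */
static long bpf_get_stack_stub(void *ctx, void *buf, uint32_t size, uint64_t flags) {
    (void)ctx;
    if (!(flags & BPF_F_USER_BUILD_ID) || size < sizeof(struct stack_build_id_frame))
        return -1;
    struct stack_build_id_frame *f = buf;
    memset(f, 0, sizeof(*f));
    f->status = 1;       /* BPF_STACK_BUILD_ID_VALID */
    f->offset = 0x1234;  /* offset into the mapped ELF file */
    return (long)sizeof(*f);
}

struct stack_build_id_frame g_frames[MAX_USER_FRAMES];

/* Mirrors the capture call in handle_perf: request user frames with
   Build IDs resolved in-kernel. */
long capture_user_stack(void *ctx) {
    return bpf_get_stack_stub(ctx, g_frames, sizeof(g_frames),
                              BPF_F_USER_STACK | BPF_F_USER_BUILD_ID);
}
```

The return value is the number of bytes written, so userspace divides by the frame size to recover the frame count.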
The second is match_comm_on_exec. Attached to the sched_process_exec tracepoint, this is a lifecycle hook. When a process calls execve, we check if its command name matches our watch list. This allows Bistouri to “discover” and start profiling new instances of a service the moment they start, rather than waiting for a userspace polling loop to find them.
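The decision made inside match_comm_on_exec can be sketched as follows. The comm_lpm_trie lookup is stubbed here as a linear prefix scan, and the watch-list names are purely illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for comm_lpm_trie: illustrative watch-list prefixes. */
static const char *watch_prefixes[] = { "nginx", "postgres" };

/* Mirrors the check in match_comm_on_exec: does the new process's command
   name start with any watched prefix? */
int comm_matches(const char *comm) {
    for (size_t i = 0; i < sizeof(watch_prefixes) / sizeof(*watch_prefixes); i++) {
        size_t n = strlen(watch_prefixes[i]);
        if (strncmp(comm, watch_prefixes[i], n) == 0)
            return 1;  /* would submit a match event to trigger_events */
    }
    return 0;          /* not on the watch list: exit quietly */
}
```

In the real program an LPM trie gives this same prefix semantics in O(key length) rather than a scan, which is why it is the natural map type for command-name matching.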
4.4 Satisfying the Verifier
The eBPF verifier is our most demanding “code reviewer.” To satisfy it, we adopt several patterns:
- Null Checks: Every bpf_map_lookup_elem and bpf_ringbuf_reserve call can return NULL. We must check the returned pointer explicitly before dereferencing it.
- Bounded Copies: When copying the process name (comm), we use __builtin_memcpy with a fixed size (TASK_COMM_LEN). This ensures the verifier can statically prove we aren’t overrunning the destination.
- Stack Depth Limits: We limit stack traces to 127 frames. This is a compromise between visibility and the eBPF program’s complexity limit. Each frame in the user stack is a 32-byte user_stack_frame struct; capturing 127 of these pushes the limits of what we can safely handle within the eBPF execution context.
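The first two patterns can be shown in a few lines of plain C. The lookup helper is a stub that may return NULL, standing in for bpf_map_lookup_elem or bpf_ringbuf_reserve; everything else is the shape the verifier wants to see.

```c
#include <assert.h>
#include <string.h>

#define TASK_COMM_LEN 16

/* Stub that can return NULL, exactly like bpf_map_lookup_elem or
   bpf_ringbuf_reserve. */
static int lookup_value = 7;
static int *maybe_lookup(int found) { return found ? &lookup_value : (int *)0; }

/* Null-check pattern: the verifier rejects any path that dereferences the
   pointer before this branch. */
int guarded_read(int found) {
    int *v = maybe_lookup(found);
    if (!v)
        return -1;  /* bail out, as the real programs do */
    return *v;      /* pointer proven non-NULL on this path */
}

/* Bounded copy: the fixed TASK_COMM_LEN size lets the verifier prove the
   destination is never overrun. Both arguments must be full comm buffers,
   as they always are in the kernel. */
void copy_comm(char dst[TASK_COMM_LEN], const char src[TASK_COMM_LEN]) {
    __builtin_memcpy(dst, src, TASK_COMM_LEN);
}
```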
4.5 Build Pipeline (libbpf-cargo)
Bistouri uses libbpf-cargo to bridge the C and Rust worlds. The build process is orchestrated by build.rs, which invokes the SkeletonBuilder.
This generates a “skeleton”—a Rust module that embeds the compiled BPF bytecode and provides a type-safe interface for loading the programs. A key component here is vmlinux.h, a header describing the kernel’s internal structures that is generated from BTF type information (typically with bpftool) instead of being pulled from local kernel headers. Together with CO-RE (Compile Once – Run Everywhere) relocations, this allows a single Bistouri binary to run across kernel versions, even if internal struct offsets have shifted.
4.6 Memory Layout & repr(C)
The data structures defined in profiler.h are the “contracts” between the kernel and userspace. Because C and Rust have different ideas about struct padding and alignment, these headers must be designed carefully.
We use explicit width types (__u32, __u64) and manual padding (like _pad in error_event) to ensure consistency. In Rust, we mirror these with #[repr(C)].
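The fields of error_event are not spelled out in this section, so the layout below is hypothetical, but it illustrates the pattern: explicit-width types, manual padding, and compile-time asserts that pin down exactly the layout the Rust #[repr(C)] mirror depends on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t __u32;
typedef uint64_t __u64;

/* Hypothetical error_event layout; the real fields live in profiler.h. */
struct error_event {
    __u32 pid;
    __u32 _pad;          /* explicit padding: keeps timestamp_ns 8-aligned */
    __u64 timestamp_ns;
};

/* Compile-time guarantees that C and the Rust #[repr(C)] mirror agree.
   If someone reorders a field, the build fails instead of events being
   silently misparsed. */
_Static_assert(offsetof(struct error_event, timestamp_ns) == 8,
               "manual padding keeps the u64 at offset 8");
_Static_assert(sizeof(struct error_event) == 16,
               "no compiler-inserted trailing padding");
```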
The user_stack_frame is particularly interesting. It contains a union:
```c
struct user_stack_frame {
    __s32 status;
    unsigned char build_id[BUILD_ID_SIZE];
    union {
        __u64 offset; // file offset when status=1 (Valid)
        __u64 ip;     // raw instruction pointer when status=2 (Fallback)
    };
};
```

This union allows the kernel to provide a resolved file offset when it successfully finds a Build ID, or a raw instruction pointer when it cannot (e.g., for vDSO or JITed code). Rust handles this by parsing the status field and interpreting the 8-byte union accordingly.
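The status-then-union discipline can be sketched in C. The struct is restated with fixed-width stdint types so the fragment stands alone, and frame_address is an illustrative helper, not Bistouri's actual API; it mirrors how the userspace side branches on status before touching the union.

```c
#include <assert.h>
#include <stdint.h>

#define BUILD_ID_SIZE 20

struct user_stack_frame {
    int32_t status;                        /* 1 = Valid, 2 = Fallback */
    unsigned char build_id[BUILD_ID_SIZE];
    union {
        uint64_t offset;
        uint64_t ip;
    };
};

/* Picks the address-like value a symbolizer should use for this frame.
   Reading the union without checking status first would conflate file
   offsets with raw instruction pointers. */
uint64_t frame_address(const struct user_stack_frame *f) {
    switch (f->status) {
    case 1:  return f->offset;  /* valid Build ID: file offset */
    case 2:  return f->ip;      /* fallback: raw instruction pointer */
    default: return 0;          /* empty or errored slot: nothing usable */
    }
}
```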
4.7 Program Flow
The diagram at the top of this page illustrates how an event flows from a kernel trigger through our eBPF logic and into the ring buffers for userspace consumption.
Auto-generated from commit 02320e5 by Gemini 3.1 Pro. Last updated: 2025-05-15