2  Triggering: Profiling on Demand

2.1 The Problem: When to Profile?

Continuous profiling is the “holy grail” of observability, but it introduces a classic observer’s paradox: the more frequently you profile to catch transient issues, the more CPU and memory you consume, potentially exacerbating the very performance problems you are trying to debug.

In Bistouri, we chose not to profile everything all the time. Instead, we treat profiling as a reactive diagnostic tool. The challenge is defining exactly when the agent should wake up and start recording. If we trigger based on simple CPU usage (e.g., “CPU > 90%”), we often catch healthy processes doing heavy lifting. We need a signal that indicates saturation—where a process wants to run but is being held back by the kernel or hardware.

2.2 PSI as a Trigger Mechanism

We solve the “when” problem using Linux Pressure Stall Information (PSI). Unlike simple utilization metrics, PSI tracks the amount of time tasks spend waiting for hardware resources (CPU, Memory, or IO). It provides a leading indicator of performance degradation: a process might only be using 10% of a CPU, but if it’s stalled 50% of the time waiting for memory pages to swap in, it’s in trouble.

Tip: Why PSI?

Utilization (CPU %) tells you how busy a resource is. Pressure (PSI) tells you how much that resource is delaying your application. For a profiler, pressure is a much higher-fidelity signal that the application is actually suffering.

Bistouri’s trigger system allows users to define thresholds in a configuration file (e.g., “Profile node if it spends more than 50ms per second stalled on memory”). We map these rules to specific cgroups. We open the /proc/pressure/ files (or their cgroup-specific counterparts), write the threshold into them, and poll() with the POLLPRI flag; the kernel then notifies us the moment the threshold is crossed. This allows Bistouri to remain nearly dormant until a specific process actually experiences distress.
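For illustration, here is a minimal sketch of such a watcher using the libc crate. The cgroup v2 path and the helper name are ours, not Bistouri's actual code; the write format and POLLPRI semantics come from the kernel's PSI interface.

use std::fs::OpenOptions;
use std::io::Write;
use std::os::unix::io::AsRawFd;

// Hypothetical helper: register a PSI trigger and block until it fires.
fn watch_memory_pressure(cgroup: &str) -> std::io::Result<()> {
    // cgroup v2 exposes per-group PSI files mirroring /proc/pressure/.
    let path = format!("/sys/fs/cgroup/{cgroup}/memory.pressure");
    let mut file = OpenOptions::new().read(true).write(true).open(path)?;

    // "some 50000 1000000": notify when tasks are stalled for more than
    // 50ms (50,000us) within any 1s (1,000,000us) window.
    file.write_all(b"some 50000 1000000")?;

    let mut fds = [libc::pollfd {
        fd: file.as_raw_fd(),
        events: libc::POLLPRI,
        revents: 0,
    }];
    loop {
        // Block until the kernel signals a threshold crossing via POLLPRI.
        if unsafe { libc::poll(fds.as_mut_ptr(), 1, -1) } < 0 {
            return Err(std::io::Error::last_os_error());
        }
        if fds[0].revents & libc::POLLERR != 0 {
            break; // the cgroup was removed; stop watching
        }
        if fds[0].revents & libc::POLLPRI != 0 {
            println!("memory pressure threshold crossed in {cgroup}");
            // Here Bistouri would wake the profiler.
        }
    }
    Ok(())
}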

2.3 Trie-Based Process Routing

Once we have a trigger rule, we need to find the processes it applies to. In a dynamic environment like Kubernetes, we cannot rely on PIDs, which are recycled and ephemeral. Instead, we rely on the process “comm” name (the filename of the executable).

To handle this efficiently, we use a dual-trie architecture:

  1. Userspace Radix Trie: A standard radix trie allows us to store arbitrary rules and perform \(O(L)\) lookups (where \(L\) is the string length) during our periodic system scans.
  2. BPF LPM Trie: We mirror these rules into a BPF map of type BPF_MAP_TYPE_LPM_TRIE.

The BPF trie is the “fast path.” When a new process is exec’d, our BPF programs check the trie to see if the process name matches any profiling rules. If it does, it immediately notifies userspace to set up a PSI watcher for that process’s cgroup.
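The key format for an LPM trie map is fixed by the kernel: a 32-bit prefix length in bits, followed by the data bytes. A rough Rust mirror of the kernel's struct bpf_lpm_trie_key, specialized for comm matching, might look like this (the struct name and comments are ours):

// Rough Rust mirror of the kernel's `struct bpf_lpm_trie_key`,
// specialized for comm names. The struct name is illustrative.
#[repr(C)]
struct CommLpmKey {
    prefixlen: u32,  // number of significant *bits* in `data`
    data: [u8; 16],  // TASK_COMM_LEN bytes: 15 chars + NUL, zero-padded
}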

2.4 Configuration Matching & Glob Semantics

Process naming is often messy. A pool of workers might be named worker-01, worker-02, etc. To handle this, Bistouri supports two types of matching: Exact and Prefix.

The design of the prefix match is constrained by the Linux kernel’s TASK_COMM_LEN, which limits process names to 15 characters (plus a null terminator). Our BpfTrie implementation must carefully translate these into Longest Prefix Match keys that the kernel understands.

// In src/trigger/trie.rs: Translating a rule to a BPF-compatible key
match rule {
    MatchRule::Exact { comm } => {
        let bytes = comm.as_bytes();
        // We include the NUL terminator in the bit-prefix to ensure
        // "node" matches "node" but NOT "nodejs".
        key.prefixlen = ((bytes.len() + 1) * 8) as u32;
    }
    MatchRule::Prefix { comm } => {
        let bytes = comm.as_bytes();
        // Pure prefix match: "worker-" matches "worker-1"
        key.prefixlen = (bytes.len() * 8) as u32;
    }
}

This translation ensures that our userspace logic and BPF logic remain perfectly synchronized, even though they use different trie implementations.

2.5 The proc_walk Loop

While BPF gives us high-performance, event-driven notifications for new processes, it has blind spots. It might miss processes that were already running before Bistouri started, or processes that fork() without calling exec() (common in some pre-forking web servers).

To guarantee completeness, we implement a proc_walk. Every 30 seconds, the agent scans /proc, matches every running process against our userspace trie, and ensures a PSI watcher is active for every matching cgroup. This “belt and braces” approach combines the low latency of BPF with the reliability of a periodic scan.
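In outline, the walk reads /proc/&lt;pid&gt;/comm for every numeric entry and feeds it to the matcher. The following is a simplified sketch, with a stubbed Matcher standing in for Bistouri's userspace radix trie:

use std::fs;

struct Matcher; // stub for illustration
impl Matcher {
    fn matches(&self, _comm: &str) -> Option<u64> {
        None // the real matcher returns the matching rule_id
    }
}

fn proc_walk(matcher: &Matcher) -> std::io::Result<()> {
    for entry in fs::read_dir("/proc")? {
        let entry = entry?;
        // Only numeric directory names are PIDs.
        let name = entry.file_name();
        let Some(pid) = name.to_str().and_then(|s| s.parse::<u32>().ok()) else {
            continue;
        };
        // /proc/<pid>/comm holds the 15-char executable name plus a newline.
        let Ok(comm) = fs::read_to_string(format!("/proc/{pid}/comm")) else {
            continue; // process exited mid-scan
        };
        if let Some(rule_id) = matcher.matches(comm.trim_end()) {
            // In Bistouri this would ensure a PSI watcher exists for the
            // process's cgroup; here we just report the match.
            println!("pid {pid} matches rule {rule_id}");
        }
    }
    Ok(())
}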

Additionally, the walk serves as a metadata collection phase. For every matched process, we read /proc/<pid>/auxv to resolve its vDSO address range. This information is stored in a shared LRU cache, allowing the profiler to classify stack frames without performing blocking I/O during the high-frequency sampling path.
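Resolving the vDSO range from auxv amounts to scanning (key, value) pairs for AT_SYSINFO_EHDR, which carries the vDSO load address. A sketch for a 64-bit target (the function name is ours):

use std::fs;

const AT_SYSINFO_EHDR: u64 = 33;

fn vdso_base(pid: u32) -> Option<u64> {
    let raw = fs::read(format!("/proc/{pid}/auxv")).ok()?;
    // auxv is a flat array of (key, value) u64 pairs, terminated by AT_NULL.
    for pair in raw.chunks_exact(16) {
        let key = u64::from_ne_bytes(pair[..8].try_into().ok()?);
        let val = u64::from_ne_bytes(pair[8..].try_into().ok()?);
        if key == AT_SYSINFO_EHDR {
            return Some(val);
        }
    }
    None
}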

2.6 Hot Reload Without Downtime

Bistouri is designed for long-running production use, meaning we must support configuration changes without restarting the agent. The TriggerAgent manages this through an internal event loop.

When a configuration file change is detected (via inotify), the agent:

  1. Signals the current proc_walk to cancel.
  2. Purges the PsiRegistry, effectively “unplugging” all current kernel watchers.
  3. Rebuilds the userspace matcher and repopulates the BPF trie.
  4. Immediately triggers a fresh proc_walk to re-establish watchers based on the new rules.

This transition is atomic from the perspective of the event loop, ensuring we never have “leaked” watchers from an old configuration running alongside the new one.
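Sketched as code, the sequence reads as follows; every type and method name below is a hypothetical stand-in for the real TriggerAgent internals, stubbed so the shape is visible:

struct TriggerAgent;

impl TriggerAgent {
    fn cancel_proc_walk(&mut self) { /* signal the walk task to stop */ }
    fn purge_psi_registry(&mut self) { /* close all watcher fds */ }
    fn rebuild_matcher(&mut self, _config: &str) { /* new radix trie */ }
    fn repopulate_bpf_trie(&mut self, _config: &str) { /* refill LPM trie */ }
    fn start_proc_walk(&mut self) { /* spawn a fresh scan */ }

    fn handle_config_change(&mut self, path: &str) -> std::io::Result<()> {
        let config = std::fs::read_to_string(path)?;
        self.cancel_proc_walk();           // 1. stop the in-flight scan
        self.purge_psi_registry();         // 2. unplug all kernel watchers
        self.rebuild_matcher(&config);     // 3a. rebuild userspace matcher
        self.repopulate_bpf_trie(&config); // 3b. mirror rules into BPF
        self.start_proc_walk();            // 4. re-establish watchers
        Ok(())
    }
}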

2.7 Eventual Consistency Model

Because the BPF trie and the userspace agent communicate asynchronously via a ring buffer, there is a tiny window for race conditions during a hot reload. An old BPF rule might fire just as the userspace configuration is being cleared.

We handle this by assigning a unique, monotonically increasing rule_id to every target in the config. When the TriggerAgent receives an event from BPF, it re-validates the rule_id against its current userspace matcher. If the ID is unknown or the process name no longer matches that ID, the event is dropped as “stale.” This ensures that the system eventually converges on the correct state defined in the latest config file.
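A minimal sketch of that re-validation, with Rule and BpfEvent as illustrative stand-ins for Bistouri's actual types:

use std::collections::HashMap;

// Hypothetical rule table keyed by the monotonically increasing rule_id.
struct Rule {
    comm: String,
    exact: bool,
}

struct BpfEvent {
    rule_id: u64,
    comm: String,
}

// An event survives only if its rule_id still exists in the *current*
// config and the comm still matches that rule.
fn is_stale(rules: &HashMap<u64, Rule>, ev: &BpfEvent) -> bool {
    match rules.get(&ev.rule_id) {
        Some(r) if r.exact => r.comm != ev.comm,
        Some(r) => !ev.comm.starts_with(&r.comm),
        None => true, // rule_id from a pre-reload config: drop as stale
    }
}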

2.8 Data Flow

The following diagram illustrates how a process discovery event flows from the kernel up to the point where a PSI watcher is registered.

graph TD
    subgraph "Kernel Space"
        K_EXEC["Process Exec/Fork"] -->|"Tracepoint"| BPF_PROG["BPF Probe"]
        BPF_PROG -->|"LPM Lookup"| BPF_TRIE["BPF LPM Trie"]
        BPF_PROG -->|"Match Found"| RING_BUF["BPF Ring Buffer"]
    end

    subgraph "Userspace - Trigger Agent"
        RING_BUF -->|"Event"| EV_LOOP["Event Loop"]
        PROC_SCAN["Periodic /proc Walk"] -->|"Match"| EV_LOOP

        INOTIFY["inotify Watcher"] -->|"Reload Config"| EV_LOOP

        EV_LOOP -->|"Validate Rule ID"| MATCHER["Comm Matcher"]
        MATCHER -->|"Valid"| PSI_REG["PSI Registry"]

        PSI_REG -->|"New Cgroup"| SPAWN["Spawn Watcher Task"]
        SPAWN -->|"poll POLLPRI"| PSI_FILE["/proc/pressure/..."]
    end

    PSI_FILE -->|"Threshold Crossed"| SPAWN
    SPAWN -->|"Trigger"| PROFILER["Profiler Agent"]

