Overview and Workflow

neutrino is implemented with three major components:

Hook Driver (neutrino/src/): Provides runtime support for assembly tracking and code caching; implemented in C and injected into client programs.
Probe Engine (neutrino/probe/): Instruments GPU kernels with assembly probes (.toml); implemented in Python and exposed as a CLI tool.
DSL Compiler (neutrino/language/): Compiles the platform-independent Tracing DSL (.py) into low-level assembly probes (.toml); implemented in Python and exposed as a CLI tool.
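Given this division of labor, a probe's file type decides which components it passes through before the kernels are instrumented. The sketch below is illustrative only: it assumes the two extensions named above (.py for the Tracing DSL, .toml for assembly probes), and the function name probe_pipeline is hypothetical, not part of neutrino's code.

```python
from pathlib import Path


def probe_pipeline(probe_path: str) -> list[str]:
    """Hypothetical helper: list the components a probe file goes through."""
    suffix = Path(probe_path).suffix.lower()
    if suffix == ".py":
        # Tracing DSL: compiled (and verified) into assembly probes first.
        return ["DSL Compiler", "Probe Engine"]
    if suffix == ".toml":
        # Already an assembly probe: handed straight to the Probe Engine.
        return ["Probe Engine"]
    raise ValueError(f"unsupported probe type: {probe_path!r}")
```

In both cases the Probe Engine is the component that actually rewrites kernels; the DSL Compiler only lowers the platform-independent form into the assembly form it consumes.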

These modules are integrated via fork/exec and exposed through a simple command-line interface similar to those of bpftrace and valgrind.
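The fork/exec integration on the CLI side can be sketched in Python. This is a minimal illustration under stated assumptions, not neutrino's actual CLI code: the command shape `neutrino -p PROBE CMD...`, the helper names, and the HOOK_DRIVER library path are hypothetical; only the -p/--probe option and the LD_PRELOAD injection come from this section. os.fork and os.execvpe are POSIX-only.

```python
import argparse
import os
import sys

# Hypothetical path to the compiled Hook Driver; the real library name may differ.
HOOK_DRIVER = "/path/to/neutrino_hook.so"


def parse_args(argv):
    """Parse `neutrino -p PROBE CMD...` (hypothetical bpftrace-like shape)."""
    parser = argparse.ArgumentParser(prog="neutrino")
    parser.add_argument("-p", "--probe", required=True,
                        help="probe file: Tracing DSL (.py) or assembly probe (.toml)")
    parser.add_argument("workload", nargs=argparse.REMAINDER,
                        help="workload command, e.g. python main.py")
    return parser.parse_args(argv)


def workload_env(hook_lib, base=None):
    """Copy the environment and prepend the hook driver to LD_PRELOAD."""
    env = dict(os.environ if base is None else base)
    existing = env.get("LD_PRELOAD")
    env["LD_PRELOAD"] = f"{hook_lib}:{existing}" if existing else hook_lib
    return env


def run_workload(cmd, env):
    """fork() a child, exec the workload in it, and return its exit code."""
    pid = os.fork()
    if pid == 0:
        try:
            # Child: replace this process image with the workload. The dynamic
            # loader loads LD_PRELOAD libraries ahead of the program's others.
            os.execvpe(cmd[0], cmd, env)
        finally:
            os._exit(127)  # reached only if exec failed
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)


def main(argv=None):
    args = parse_args(sys.argv[1:] if argv is None else argv)
    if not args.workload:
        sys.exit("neutrino: missing workload command")
    # Compiling a .py probe and handing the probe to the hook driver are
    # omitted; this sketch does not model how neutrino passes them along.
    return run_workload(args.workload, workload_env(HOOK_DRIVER))
```

With a real hook driver path filled in, `main(["-p", "probe.toml", "python", "main.py"])` would run `python main.py` with the driver preloaded and return the workload's exit code. Prepending to LD_PRELOAD, rather than overwriting it, keeps any libraries the user already preloads.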

The basic workflow breaks down into the following steps:

The CLI entry loads the probe given by the -p/--probe option.
If the probe is written in the DSL (.py), the DSL Compiler is invoked to compile and verify it into assembly probes (.asm) wrapped in TOML.
The CLI entry forks a subprocess that execs the workload (e.g., python main.py), with the hook driver injected via LD_PRELOAD.
The hook driver continuously captures the GPU workload, in particular every GPU kernel it launches. For each kernel:
1. The hook driver forks a subprocess to exec the probe engine.
2. The probe engine objdumps the kernel, inserts the probes, and reassembles it.
3. The hook driver waits for the probe engine, then loads the probed kernel (and its metadata) back.
4. The hook driver allocates (malloc) the probe maps on device and host, then launches the kernel and synchronizes.
5. After synchronization, the hook driver copies (memcpy) the probe maps from device to host, then writes them (fwrite) to the file system.
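The per-kernel loop in steps 1 to 5 can be sketched as follows. This is a hedged Python stand-in for what the C hook driver does, not its implementation: the probe engine is any command invoked as `CMD in_path out_path meta_path` (a hypothetical calling convention), the metadata is assumed to be JSON with a hypothetical `map_size` field, and `launch` is a callable standing in for a synchronous GPU launch, with plain bytearrays playing the device and host probe maps.

```python
import json
import subprocess
from pathlib import Path


def trace_kernel(name, kernel_bin, probe_engine_cmd, launch, workdir):
    """Illustrative stand-in for the hook driver's handling of one kernel."""
    workdir = Path(workdir)
    src = workdir / f"{name}.bin"
    probed = workdir / f"{name}.probed.bin"
    meta = workdir / f"{name}.json"
    src.write_bytes(kernel_bin)

    # Steps 1-3: run the probe engine in a child process and wait for it.
    # The engine itself objdumps, probes, and reassembles the kernel.
    subprocess.run([*probe_engine_cmd, str(src), str(probed), str(meta)], check=True)
    probed_kernel = probed.read_bytes()
    map_size = json.loads(meta.read_text())["map_size"]  # hypothetical key

    # Step 4: allocate the probe map on "device" and host, then launch.
    device_map = bytearray(map_size)
    host_map = bytearray(map_size)
    launch(probed_kernel, device_map)  # assumed to return after synchronization

    # Step 5: copy device -> host, then write the map to the file system.
    host_map[:] = device_map
    out = workdir / f"{name}.map"
    with open(out, "wb") as f:
        f.write(host_map)
    return out
```

Because the probe engine is only referenced through probe_engine_cmd, the same loop can be exercised with a trivial fake engine that copies the kernel unchanged and reports a map size.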