
Reading Traces

A trace in NEUTRINO is not a single file but a small system of folders and files.

Trace Folder Structure

NEUTRINO's traces are placed in folders, commonly named like trace/Apr24_231539_1860576/, consisting of:

  • A parent folder, defaulting to ./trace and configurable via the --tracedir option of the NEUTRINO CLI.
  • Traces of each process (run) are placed in a subfolder named <DATE>_<TIME>_<PID>. For example, Apr24_231539_1860576 means a trace recorded at 23:15:39 on April 24 (local time) from process 1860576; a parsing sketch follows this list.
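
For scripting, this naming convention can be parsed directly. The sketch below is our own helper (not part of NEUTRINO), assuming the default ./trace parent; note that the folder name does not record the year.

Locating the newest trace folder (sketch)
from datetime import datetime
from pathlib import Path

def parse_trace_dir(name: str):
    """Split e.g. 'Apr24_231539_1860576' into (timestamp, pid)."""
    date, time, pid = name.split("_")
    return datetime.strptime(f"{date}_{time}", "%b%d_%H%M%S"), int(pid)

# Sort run folders by their recorded timestamp and pick the latest one.
runs = sorted(Path("./trace").iterdir(), key=lambda p: parse_trace_dir(p.name)[0])
print("latest run:", runs[-1])  # e.g. trace/Apr24_231539_1860576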

Each trace folder is commonly structured as:

probe.toml
event.log
result/
kernel/

At minimum, there are two files: event.log, the Hook Driver's internal log for troubleshooting, and probe.toml, the compiled assembly probes for reproducing experiments. There are also two folders: result, containing the .bin trace results, and kernel, containing the kernels captured and JIT-compiled.

Trace Structure

Analysis Codegen Supported!

NEUTRINO now supports code generation for parsing traces into Python. You probably don't need to handle trace parsing manually!

Each trace (a .bin file in result/) is structured in an ELF-like manner, starting with a header,

Neutrino Trace Header
typedef struct {
    uint32_t gridDimX;
    uint32_t gridDimY;
    uint32_t gridDimZ;
    uint32_t blockDimX;
    uint32_t blockDimY;
    uint32_t blockDimZ;
    uint32_t sharedMemBytes;
    uint32_t numMaps; // number of traces exposed
} trace_header_t;

followed by several sections (one per map) and the raw binary records, whose offsets correspond to the section definitions:

Neutrino Trace Section
typedef struct {
    uint32_t size;    // size of record per thread/warp in bytes
    uint32_t warpDiv; // warpSize for warp-level, 1 for thread-level
    uint64_t offset;  // offset for fseek
} trace_section_t;

By fseek-ing to section.offset and fread-ing gridSize * blockSize / section.warpDiv * section.size bytes, one can safely read the traces out on the CPU for analysis (warpDiv is 1 for thread-level traces, in which case this reduces to gridSize * blockSize * section.size).
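
If the generated Python parser is unavailable, the two structs above can also be read by hand. The following is a minimal sketch under our own assumptions (little-endian fields, no struct padding, one section entry per map immediately after the header); it is not the official NEUTRINO reader.

Reading a trace (sketch)
import struct

HEADER_FMT  = "<8I"   # 8 x uint32_t -> 32 bytes (trace_header_t)
SECTION_FMT = "<2IQ"  # 2 x uint32_t + 1 x uint64_t -> 16 bytes (trace_section_t)

def read_trace(path: str):
    with open(path, "rb") as f:
        gx, gy, gz, bx, by, bz, shared, num_maps = struct.unpack(
            HEADER_FMT, f.read(struct.calcsize(HEADER_FMT)))
        grid_size, block_size = gx * gy * gz, bx * by * bz
        sections = [struct.unpack(SECTION_FMT, f.read(struct.calcsize(SECTION_FMT)))
                    for _ in range(num_maps)]
        records = []
        for size, warp_div, offset in sections:
            n_records = grid_size * block_size // warp_div  # per-thread or per-warp
            f.seek(offset)
            records.append(f.read(n_records * size))
    return sections, records

Each entry of records holds the raw bytes of one map; slicing it into size-byte chunks yields one record per thread (or per warp).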

Event Log Structure

event.log is the operation log of the Hook Driver; many functionalities, such as measuring probe overhead or troubleshooting, rely on parsing this log, like the following:

event.log
[init] pid 3183093
[init] cmd 33 python unittest/kernel/torch_.py
[init] dl 0x13f9690
[init] success
[mem] cuMemAlloc_v2 0 dptr 7f1292000000 bytesize 33554432
[mod] cuLibraryLoadData 0 lib 0x7ce5c60 code 0x7f1414279a28
[bin] type warpped_fatbin size 4962432
[mod] cuLibraryGetModule 0 mod 0x7861540 lib 0x7ce5c60
[mod] cuModuleGetFunction func 0x8209510 mod 0x7861540 name _ZN2at6native29vectorized_elementwise_kernelILi4ENS0_11FillFunctorIN3c104HalfEEENS_6detail5ArrayIPcLi1EEEEEviT0_T1_
[exec] funcmap-not-find 0x8209510
[probe] find 0x8209510 name _ZN2at6native29vectorized_elementwise_kernelILi4ENS0_11FillFunctorIN3c104HalfEEENS_6detail5ArrayIPcLi1EEEEEviT0_T1_ bin 0x7f1343b41010 size 4962432
[probe] rename _ZN2at6native29vectorized_elementwise_kernelILi4ENS0_11FillFunctorIN3c104HalfEEENS_6detail5ArrayIPcLi1EEEEEviT0_T1_ 0_bf524e8c48fdebc3962c3b36823d8d68b093ebb6
[probe] mkdir ./trace/Jun23_121812_3183093/kernel/0_bf524e8c48fdebc3962c3b36823d8d68b093ebb6
[probe] write ./trace/Jun23_121812_3183093/kernel/0_bf524e8c48fdebc3962c3b36823d8d68b093ebb6/original.bin
[probe] subproc /home/huangs0/anaconda3/bin/python /home/huangs0/anaconda3/lib/python3.11/site-packages/neutrino/probe/cuda.py ./trace/Jun23_121812_3183093/kernel/0_bf524e8c48fdebc3962c3b36823d8d68b093ebb6 _ZN2at6native29vectorized_elementwise_kernelILi4ENS0_11FillFunctorIN3c104HalfEEENS_6detail5ArrayIPcLi1EEEEEviT0_T1_
[probe] python succeed
[probe] read ./trace/Jun23_121812_3183093/kernel/0_bf524e8c48fdebc3962c3b36823d8d68b093ebb6/kernel.info name _ZN2at6native29vectorized_elementwise_kernelILi4ENS0_11FillFunctorIN3c104HalfEEENS_6detail5ArrayIPcLi1EEEEEviT0_T1_ n_param 3 n_probe 1 
[probe] finish 0x8209510 name _ZN2at6native29vectorized_elementwise_kernelILi4ENS0_11FillFunctorIN3c104HalfEEENS_6detail5ArrayIPcLi1EEEEEviT0_T1_ n_param 3
[exec] funcmap-find 0x8209510 success
[exec] 1750652293770568192 param 1000000 10000000000 7f1292000000 
[exec] grid 32768 1 1 block 128 1 1 shared 0
[exec] grid 32768 warp  4 probe 16 total 2097152
[exec] probe-mem 2097152 (bytes)
[exec] succeed 0
[exec] save ./trace/Jun23_121812_3183093/result/0.611403.bin size 2097200
[exec] prologue 234.087418 kernel 0.065536 epilogue 2.705856 ratio 3614.178711
[callback] subproc /home/huangs0/anaconda3/bin/python neutrino/tools/block_sched_callback.py ./trace/Jun23_121812_3183093/result/0.611403.bin
[callback] succeed

From top to bottom, there are several important sections:

  • [init]: messages from Hook Driver initialization, including the pid and cmd (important for reproducing experiments!), as well as the loaded dl address and the init success marker for troubleshooting.
  • [mod]: messages from module management, recording the CUmodule/hipModule objects loaded into the system.
  • [probe]: interactions with the Probe Engine, recording the subprocess created for the Probe Engine and the probed binaries loaded back.
  • [exec]: kernel execution logs, with the kernel pointer, parameters, probe map allocations, execution status, path of the saved trace, timing breakdown, etc.
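
Because every line carries a [tag] prefix and a fixed word layout, the log can be parsed with plain string handling. Below is a minimal sketch (the regular expression and field names are our own, not part of NEUTRINO) that extracts the per-launch timing line shown above, which is the basis for measuring probe overhead.

Parsing [exec] timings (sketch)
import re

# Matches the sample line:
# [exec] prologue 234.087418 kernel 0.065536 epilogue 2.705856 ratio 3614.178711
EXEC_TIME = re.compile(
    r"\[exec\] prologue (?P<prologue>[\d.]+) kernel (?P<kernel>[\d.]+)"
    r" epilogue (?P<epilogue>[\d.]+) ratio (?P<ratio>[\d.]+)"
)

def exec_timings(log_path: str):
    with open(log_path) as f:
        for line in f:
            m = EXEC_TIME.match(line.strip())
            if m:
                yield {k: float(v) for k, v in m.groupdict().items()}

for t in exec_timings("./trace/Jun23_121812_3183093/event.log"):
    print(t)  # one dict per kernel launch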

Kernel Folder Structure

Each folder inside kernel corresponds to a kernel, named by an auto-incremented index and the SHA1 of the kernel name (since demangled C++ template names can easily exceed the Linux directory name length limit); a sketch of the naming scheme follows this list. Within each folder, there are:

  • process.log: the operation log of the Probe Engine.
  • original.bin and original.asm: the raw machine code and assembly dumped by the Hook Driver. Be aware that they can be large!
  • pruned.bin and pruned.asm: the pruned machine code and assembly, useful for understanding the original code.
  • probed.bin and probed.asm: the probed machine code (to be loaded back and executed) and its assembly for troubleshooting.
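
For scripting over the kernel folders, the naming scheme can be reproduced as sketched below. This is our reading of the scheme; in particular, hashing the mangled (rather than demangled) kernel name is an assumption based on the [probe] rename line in the event log above.

Kernel folder name (sketch)
import hashlib

def kernel_dir_name(index: int, kernel_name: str) -> str:
    # <auto-incremented index>_<SHA1 hex digest of the kernel name>
    return f"{index}_{hashlib.sha1(kernel_name.encode()).hexdigest()}"

# e.g. kernel_dir_name(0, "<mangled kernel name>") -> "0_<40-hex-char sha1>"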