Reading Traces
As a comprehensive system, traces in Neutrino is not only a file but a system.
Trace Folder Structure
NEUTRINO's traces will be placed in folders, commonly like trace/Apr24_231539_1860576/
, of:
- A parent folder, default to be
./trace
and is configurable via--tracedir
option of NEUTRINO CLI. - Traces of each process or run will be placed in a subfolder whose name follows:
<DATE>_<TIME>_<PID>
. For example,Apr24_231539_1860576
means a trace recorded at 23:15:39 of April 24 (local timezone) from process 1860576.
Each trace folder are commonly structured as:
At minimum, there are two files, event.log
of Hook Driver internal logs for troubleshooting and probe.toml
of compiled assembly probes for reproducing experiments.
There are also two folders, result
containing .bin
s of trace results
and kernel
contains kernel captured and jitted.
Trace Structure
Analysis Codegen Supported!
NEUTRINO now supports code generation for parsing traces into Python. You probably don't need to handle trace parsing manually!
Each trace (.bin
in result/
) is structured in ELF-like manner, starting with a header,
typedef struct {
uint32_t gridDimX;
uint32_t gridDimY;
uint32_t gridDimZ;
uint32_t blockDimX;
uint32_t blockDimY;
uint32_t blockDimZ;
uint32_t sharedMemBytes;
uint32_t numMaps; // number of traces exposed
} trace_header_t;
followed by several sections (each for a map), and raw binaries whose offset corresponds to section definitions:
typedef struct {
uint32_t size; // size of record per thread/warp in byte
uint32_t warpDiv; // warpSize for warp-level, 1 for thread-level
uint64_t offset; // offset for fseek
} trace_section_t;
By fseek
to section.offset
and fread(gridSize * blockSize * section.size)
, one can safely read out the traces in CPU for analysis.
Event Log Structure
event.log
is the operation logs of the Hook Driver, and many functionalities, such as measuring probe overhead or troubleshooting, relies on parsing this log, like the following:
[init] pid 3183093
[init] cmd 33 python unittest/kernel/torch_.py �#
[init] dl 0x13f9690
[init] success
[mem] cuMemAlloc_v2 0 dptr 7f1292000000 bytesize 33554432
[mod] cuLibraryLoadData 0 lib 0x7ce5c60 code 0x7f1414279a28
[bin] type warpped_fatbin size 4962432
[mod] cuLibraryGetModule 0 mod 0x7861540 lib 0x7ce5c60
[mod] cuModuleGetFunction func 0x8209510 mod 0x7861540 name _ZN2at6native29vectorized_elementwise_kernelILi4ENS0_11FillFunctorIN3c104HalfEEENS_6detail5ArrayIPcLi1EEEEEviT0_T1_
[exec] funcmap-not-find 0x8209510
[probe] find 0x8209510 name _ZN2at6native29vectorized_elementwise_kernelILi4ENS0_11FillFunctorIN3c104HalfEEENS_6detail5ArrayIPcLi1EEEEEviT0_T1_ bin 0x7f1343b41010 size 4962432
[probe] rename _ZN2at6native29vectorized_elementwise_kernelILi4ENS0_11FillFunctorIN3c104HalfEEENS_6detail5ArrayIPcLi1EEEEEviT0_T1_ 0_bf524e8c48fdebc3962c3b36823d8d68b093ebb6
[probe] mkdir ./trace/Jun23_121812_3183093/kernel/0_bf524e8c48fdebc3962c3b36823d8d68b093ebb6
[probe] write ./trace/Jun23_121812_3183093/kernel/0_bf524e8c48fdebc3962c3b36823d8d68b093ebb6/original.bin
[probe] subproc /home/huangs0/anaconda3/bin/python /home/huangs0/anaconda3/lib/python3.11/site-packages/neutrino/probe/cuda.py ./trace/Jun23_121812_3183093/kernel/0_bf524e8c48fdebc3962c3b36823d8d68b093ebb6 _ZN2at6native29vectorized_elementwise_kernelILi4ENS0_11FillFunctorIN3c104HalfEEENS_6detail5ArrayIPcLi1EEEEEviT0_T1_
[probe] python succeed
[probe] read ./trace/Jun23_121812_3183093/kernel/0_bf524e8c48fdebc3962c3b36823d8d68b093ebb6/kernel.info name _ZN2at6native29vectorized_elementwise_kernelILi4ENS0_11FillFunctorIN3c104HalfEEENS_6detail5ArrayIPcLi1EEEEEviT0_T1_ n_param 3 n_probe 1
[probe] finish 0x8209510 name _ZN2at6native29vectorized_elementwise_kernelILi4ENS0_11FillFunctorIN3c104HalfEEENS_6detail5ArrayIPcLi1EEEEEviT0_T1_ n_param 3
[exec] funcmap-find 0x8209510 success
[exec] 1750652293770568192 param 1000000 10000000000 7f1292000000
[exec] grid 32768 1 1 block 128 1 1 shared 0
[exec] grid 32768 warp 4 probe 16 total 2097152
[exec] probe-mem 2097152 (bytes)
[exec] succeed 0
[exec] save ./trace/Jun23_121812_3183093/result/0.611403.bin size 2097200
[exec] prologue 234.087418 kernel 0.065536 epilogue 2.705856 ratio 3614.178711
[callback] subproc /home/huangs0/anaconda3/bin/python neutrino/tools/block_sched_callback.py ./trace/Jun23_121812_3183093/result/0.611403.bin
[callback] succeed
From top to down, there are several important section:
[init]
: messages from hook driver initialization, includes thepid
andcmd
(important for reproducing experiment!), as well asdl
address loaded and the init success for troubleshooting.[mod]
: messages from module management, recording theCU/hipModule
s loaded to the system.[probe]
: interaction with the probe engine, records the subproc creation for probe engine and loads back.[exec]
: kernel execution logs, with kernel pointer, parameters, probe map allocations, execution status, path of saved traces, time spending, etc.
Kernel Folder Structure
Each folder inside kernel
corresponds to a kernel, named by an auto-incremental index and the SHA1 of kernel name (since C++ template demangled name can easily exceeds Linux folder name limitation).
Within each folder, there are:
process.log
: the operation log of the Probe Engine.original.bin
andoriginal.asm
: the raw machine code and assembly dumped by the Hook Driver. Be aware, they can be large!pruned.bin
andpruned.asm
: the pruned machine code and assembly to understand the original code.probed.bin
andprobed.asm
: the probed machine code (to be loaded back and execute) and assembly for troubleshooting.