Limitations and Roadmap
We are calling for contributors! Please contact us via GitHub if you are interested in any of these features.
As an early-stage open-source project, Neutrino still faces many critical challenges and limitations compared with mature projects like eBPF, and we welcome contributors like you to help address them!
Limitation
Neutrino has two main inherent drawbacks:
- Neutrino cannot profile unprogrammable hardware events such as cache misses, although DMAT can be used to simulate them.
- Neutrino profiling is based on execution, so it is hard to profile stall cycles caused by instruction-level scheduling.
Roadmap
This section tracks the milestones already achieved and those still planned.
Hook Driver
- Platform Support
  - NVIDIA CUDA
  - AMD ROCm
  - Apple Metal
  - Intel oneAPI
- Features
  - Trace file system: event.log, kernel/, result/
  - Internal storage for binary and function
  - Benchmark mode to measure
  - Static Datamodel, e.g., warp:16:1, supporting parallel saving
  - Dynamic Datamodel, e.g., warp:16:count, supporting runtime-determined probe size
  - Callback: callback='dmat.py', supporting runtime trace analysis
  - Platform-independence in trace file system and trace format
  - Support --memusage to measure the maximum memory used in profiling
  - Support --benchmark to measure the probing effect on kernel performance
  - Support multi-threading via mutex for thread-safety
  - Isolation of storage and probe, supporting probes sharing one storage like an eBPF map
  - Structured kernel metadata via JSON
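The datamodel strings listed above (warp:16:1 and warp:16:count) can be pictured as three colon-separated fields. The following is only a minimal sketch of how such a string might be parsed. The field names `level`, `size`, and `count`, and the `parse_datamodel` helper, are hypothetical and not taken from Neutrino. The only behavior drawn from the list above is that a literal `count` in the last field marks a size decided at runtime.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataModel:
    """Hypothetical view of a datamodel string; field names are assumptions."""
    level: str            # first field, e.g. "warp"
    size: int             # second field, e.g. 16 (units are not stated in the roadmap)
    count: Optional[int]  # third field; None when it is the literal "count" (dynamic)


def parse_datamodel(spec: str) -> DataModel:
    """Split a 'level:size:count' string into its three fields."""
    parts = spec.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected three ':'-separated fields, got {spec!r}")
    level, size, count = parts
    # A literal "count" is treated as "determined at runtime" (dynamic datamodel).
    return DataModel(level, int(size), None if count == "count" else int(count))


static_model = parse_datamodel("warp:16:1")       # static: fixed count
dynamic_model = parse_datamodel("warp:16:count")  # dynamic: count resolved later
```

Under these assumptions, a static datamodel carries a fixed integer in the last field, while a dynamic one leaves it unset until the probe size is known at runtime.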
Probe Engine
- Platform Support
  - NVIDIA PTX
  - AMD GCNAsm CDNA
  - AMD GCNAsm RDNA
  - Apple AIR
  - Intel VISA
- Features
  - Reading runtime operands: out, in1, in2, in3, addr
  - Using device-side clock and time
  - Automatically directing buffer for thread/warp
  - Supporting count kernel for dynamic datamodel
  - Collecting assembler metadata like no.register
  - Supporting runtime security verification
  - Supporting kernel filtering (--kernel/--filter)
  - Making core modules platform-agnostic
DSL and JITTER
- Backend Support
  - NVIDIA PTX
  - AMD GCNAsm CDNA
  - AMD GCNAsm RDNA
  - Apple AIR
  - Intel VISA
- Features
  - Supporting Python frontend via ast
  - Supporting runtime operands and lowering: out, in1, in2, in3, addr
  - Supporting device-side clock and time: clock(), time(), cuid()
  - Formalizing eBPF-like ISA, separating out from dst
  - Supporting 32-bit registers (eBPF add32, etc.)
  - Migrating eBPF Verifier for Neutrino
  - Supporting if/else/elif
  - Supporting for/while under strict security verification
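To illustrate what a Python frontend built on the standard-library `ast` module can look like, here is a minimal sketch that parses probe source code and reports which runtime operands (out, in1, in2, in3, addr) and device-side helpers (clock(), time(), cuid()) it references. It is not Neutrino's actual frontend. The `scan_probe` function and the sample probe body are hypothetical, and only the operand and helper names come from the list above.

```python
import ast

# Names taken from the feature list above; everything else is illustrative.
OPERANDS = {"out", "in1", "in2", "in3", "addr"}
DEVICE_CALLS = {"clock", "time", "cuid"}


def scan_probe(source: str) -> dict:
    """Walk the AST of a probe and collect referenced operands and helper calls."""
    tree = ast.parse(source)
    operands, calls = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and node.id in OPERANDS:
            operands.add(node.id)
        elif (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in DEVICE_CALLS
        ):
            calls.add(node.func.id)
    return {"operands": sorted(operands), "calls": sorted(calls)}


# Hypothetical probe body, written only to exercise the scanner.
PROBE_SOURCE = """
def probe():
    start = clock()
    nbytes = addr + in1
"""

report = scan_probe(PROBE_SOURCE)
```

A real frontend would go on to lower each recognized node to the target backend. The sketch stops at recognition to show why `ast` suits the job: the probe stays ordinary Python syntax, and unsupported constructs can be rejected while walking the tree.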
Utilities and Extensions
- Getting trace directory handle for interoperability with hook driver
- Tensor Trace: getting tensor shape and name from PyTorch
- NVTX-like source annotation API
- Integration with CUPTI / ROCTracer
- Integration with PyTorch/JAX built-in profiler