Neutrino

Limitation and Roadmap

Call for contributors!

As a newly launched open-source project, Neutrino still faces many critical challenges and limitations, and we are looking for contributors like you! If you are interested, please open an issue or pull request on GitHub.

Roadmap

This section tracks the milestones that have been achieved and those still planned.

Hook Driver

  • Platform Support
    • NVIDIA CUDA
    • AMD ROCm
    • Intel oneAPI
  • Features
    • Trace file system: event.log, kernel/, result/, probe.toml
    • Internal storage for binary and function
    • Benchmark mode (--benchmark) to measure the performance impact of probes
    • Static Datamodel, e.g., warp:16:1, supporting parallel saving
    • Dynamic Datamodel, e.g., warp:16:count, supporting runtime-determined probe size
    • Callback, e.g., callback='dmat.py', supporting runtime trace analysis
    • Platform independence of the trace file system and trace format
    • --memusage option to measure the maximum memory used during profiling
    • Multi-threading support, using a mutex for thread safety
    • Isolation of storage and probes, allowing multiple probes to share one storage (like an eBPF map)
    • Structured kernel metadata via JSON
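The datamodel strings above can be read as a compact buffer-layout spec. The sketch below assumes the three fields mean `level:record_bytes:records` (so `warp:16:1` is one 16-byte record per warp) and that `count` marks a record count known only at runtime. These field meanings, and the names `Datamodel`, `parse_datamodel` and `buffer_bytes`, are hypothetical illustrations, not Neutrino's actual API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Datamodel:
    level: str              # "thread" or "warp" (assumed granularity)
    record_bytes: int       # bytes per record (assumed meaning of the 2nd field)
    records: Optional[int]  # records per unit; None = determined at runtime


def parse_datamodel(spec: str) -> Datamodel:
    """Parse a hypothetical 'level:record_bytes:records' string."""
    level, size, count = spec.split(":")
    if level not in ("thread", "warp"):
        raise ValueError(f"unknown level: {level}")
    records = None if count == "count" else int(count)
    return Datamodel(level, int(size), records)


def buffer_bytes(dm: Datamodel, units: int,
                 runtime_records: Optional[int] = None) -> int:
    """Total buffer size for `units` threads or warps.

    A static datamodel is fully sized before launch, so every unit owns a
    fixed-size slice and can save in parallel without coordination. For a
    dynamic one this sketch simplifies to a single uniform runtime count.
    """
    records = dm.records if dm.records is not None else runtime_records
    if records is None:
        raise ValueError("dynamic datamodel needs runtime_records")
    return units * dm.record_bytes * records
```

For example, `buffer_bytes(parse_datamodel("warp:16:1"), units=1024)` gives 16384 bytes, a size that is fixed before launch, which is what lets each warp write its own slice in parallel.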

Probe Engine

  • Platform Support
    • NVIDIA PTX
    • AMD GCNAsm CDNA
    • AMD GCNAsm RDNA
    • Intel VISA
  • Features
    • Reading runtime operands: out, in1, in2, in3, addr
    • Using device-side clock and time
    • Automatically directing each thread/warp to its own buffer
    • Supporting the count kernel for the dynamic datamodel
    • Collecting assembler metadata, such as the number of registers
    • Supporting runtime security verification
    • Supporting kernel filtering (--kernel/--filter)
    • Making core modules platform-agnostic
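One common way to realize a count kernel for a dynamic datamodel is a two-pass scheme: a first pass only counts how many records each warp will emit, an exclusive prefix sum over those counts gives each warp a disjoint start offset, and a second pass writes the records into a buffer sized exactly to the total. The host-side Python sketch below simulates that scheme. Whether Neutrino's probe engine lays out buffers this way is an assumption, and `plan_buffer` / `write_records` are hypothetical names.

```python
from itertools import accumulate
from typing import List, Tuple


def plan_buffer(counts: List[int], record_bytes: int) -> Tuple[List[int], int]:
    """Exclusive prefix sum of per-warp record counts -> byte offsets, total size."""
    # accumulate(..., initial=0) yields [0, c0, c0+c1, ...]; the last item is the total.
    prefix = list(accumulate(counts, initial=0))
    offsets = [p * record_bytes for p in prefix[:-1]]
    total = prefix[-1] * record_bytes
    return offsets, total


def write_records(per_warp: List[List[bytes]], record_bytes: int) -> bytearray:
    """Second pass: each warp copies its records into its own disjoint slice."""
    counts = [len(recs) for recs in per_warp]  # first pass: count only
    offsets, total = plan_buffer(counts, record_bytes)
    buf = bytearray(total)
    for warp_id, recs in enumerate(per_warp):
        pos = offsets[warp_id]
        for rec in recs:
            if len(rec) != record_bytes:
                raise ValueError("record size mismatch")
            buf[pos:pos + record_bytes] = rec
            pos += record_bytes
    return buf
```

Because the offsets are computed before any writes happen, no two warps ever touch the same bytes, so the second pass needs no atomics or locks.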

DSL and JITTER

  • Backend Support
    • NVIDIA PTX
    • AMD GCNAsm CDNA
    • AMD GCNAsm RDNA
    • Intel VISA
  • Features
    • Supporting Python frontend via ast
    • Supporting runtime operands and lowering: out, in1, in2, in3, addr
    • Supporting device-side clock and time: clock(), time(), cuid()
    • Formalizing an eBPF-like ISA
    • Supporting 32-bit registers (e.g., eBPF add32)
    • Migrating the eBPF verifier to Neutrino
    • Supporting if/elif/else
    • Supporting for/while under strict security verification
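A Python frontend of this kind can start from the standard `ast` module: parse the probe body, walk its assignments, and emit eBPF-like three-address instructions. The sketch below lowers only `+`, `-` and `*` over names and integer constants, and emits 32-bit forms such as `add32`; in eBPF, 32-bit ALU operations work on the low 32 bits and zero the upper half of the destination register, which `add32` mimics. The mnemonics, the `mov` spelling and the `lower` function are hypothetical illustrations, not Neutrino's actual DSL or ISA.

```python
import ast
from typing import List

# Hypothetical mnemonics for 32-bit ALU ops in an eBPF-like ISA.
OPS = {ast.Add: "add32", ast.Sub: "sub32", ast.Mult: "mul32"}


def add32(a: int, b: int) -> int:
    """eBPF-style ALU32 add: wrap to 32 bits, upper half of the register is zero."""
    return (a + b) & 0xFFFFFFFF


def lower(src: str) -> List[str]:
    """Lower straight-line assignments like `x = in1 + 4` to three-address code."""
    code: List[str] = []
    tmp = 0

    def operand(node: ast.expr) -> str:
        nonlocal tmp
        if isinstance(node, ast.Name):
            return node.id
        if (isinstance(node, ast.Constant) and isinstance(node.value, int)
                and not isinstance(node.value, bool)):
            return str(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            lhs, rhs = operand(node.left), operand(node.right)
            dst = f"t{tmp}"  # fresh temporary register
            tmp += 1
            code.append(f"{OPS[type(node.op)]} {dst}, {lhs}, {rhs}")
            return dst
        raise NotImplementedError(ast.dump(node))

    for stmt in ast.parse(src).body:
        if not (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            raise NotImplementedError(ast.dump(stmt))
        value = operand(stmt.value)
        code.append(f"mov {stmt.targets[0].id}, {value}")
    return code
```

For instance, `lower("x = in1 + in2 * 4")` yields `["mul32 t0, in2, 4", "add32 t1, in1, t0", "mov x, t1"]`. Rejecting every unsupported construct with an error keeps the emitted program small enough for a verifier-style check.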

Utilities and Extensions

  • Getting a trace directory handle for interoperability with the hook driver
  • Tensor Trace: getting tensor shapes and names from PyTorch
  • NVTX-like source annotation API
  • Integration with CUPTI / ROCTracer
  • Integration with PyTorch/JAX built-in profilers

Limitation

Neutrino has two main inherent drawbacks:

  1. Neutrino cannot profile hardware events that are not programmable, such as cache misses, although DMAT can be used to simulate them.
  2. Neutrino profiling is based on execution, so it is hard to profile stall cycles caused by instruction-level scheduling.