Highlights
Programmability
Fine-Granularity
Versatility
Easy to use for Performance Engineers.
1
Write the probe.
Define contextual registers, maps, and probes in the tracing DSL.
from neutrino import probe, Map
import neutrino.language as nl
# declare maps via decorated class for persistence
@Map(level="warp", type="array", size=8, cap=1)
class block_sched:
    start: nl.u64
# declare probe registers shared across probes
start: nl.u64 = 0 # starting clock
# declare probe via decorated function
@probe(pos="kernel", level="warp", before=True)
def thread_start():
    start = nl.clock()
2
Run it.
Apply probes to your workload with a simple CLI.
Terminal
neutrino -p probe.py python -c "import torch; torch.zeros((4096, 4096), dtype=torch.float16)"
◆-- Trace Saving --◆
[info] trace in ./trace/Apr24_231539_1860576
◆-- Trace Analysis --◆
vectorized_elementwise:
No.block:32768 Exec:680869 Sched:142674 (cycle/SM)
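The analysis summary can also be post-processed in a script. The following is a minimal sketch, assuming the summary line keeps the `Key:value` layout of the sample output; the helper `parse_summary` and the derived ratio are illustrative and not part of Neutrino.

```python
import re

# Summary line copied from the sample output.
SUMMARY = "No.block:32768 Exec:680869 Sched:142674 (cycle/SM)"

def parse_summary(line: str) -> dict:
    """Extract integer `Key:value` pairs from a summary line (hypothetical helper)."""
    return {key: int(value) for key, value in re.findall(r"([\w.]+):(\d+)", line)}

stats = parse_summary(SUMMARY)
# Scheduling cycles relative to execution cycles; an illustrative metric only.
sched_ratio = stats["Sched"] / stats["Exec"]
print(stats, f"{sched_ratio:.1%}")
```

A ratio like this is only a quick way to compare runs; what the counters mean exactly depends on the probes that produced them.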
3
Analyze the Trace.
Read traces easily with auto-generated analysis code.
import struct
from typing import List, NamedTuple
from neutrino import TraceHeader, TraceSection
class block_sched(NamedTuple):
    start: int
def parse(path: str):
    with open(path, "rb") as f:
        # the header is eight 32-bit integers (32 bytes)
        header: TraceHeader = TraceHeader(struct.unpack("iiiiiiii", f.read(32)))
        sections: List[TraceSection] = []
        for _ in range(header.numProbes):
            ...  # the rest of the generated parser is not shown
event.log
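The generated parser begins by unpacking a fixed 32-byte header of eight 32-bit integers. The sketch below isolates that step on an in-memory buffer. `DemoHeader`, `read_header`, and every field name except `numProbes` are placeholders invented for illustration, and the position of `numProbes` is chosen arbitrarily; the actual `TraceHeader` layout is defined by Neutrino and may differ.

```python
import io
import struct
from typing import BinaryIO, NamedTuple

HEADER_FMT = "iiiiiiii"                    # eight 32-bit ints, as in the parser above
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 32 bytes

class DemoHeader(NamedTuple):
    # Placeholder fields; only the name numProbes comes from the original parser.
    word0: int
    word1: int
    word2: int
    word3: int
    word4: int
    word5: int
    word6: int
    numProbes: int  # position chosen arbitrarily for this demo

def read_header(stream: BinaryIO) -> DemoHeader:
    """Read and unpack one fixed-size header, rejecting short reads."""
    raw = stream.read(HEADER_SIZE)
    if len(raw) != HEADER_SIZE:
        raise ValueError("truncated header")
    return DemoHeader(*struct.unpack(HEADER_FMT, raw))

# Build a fake header in memory and read it back.
buf = io.BytesIO(struct.pack(HEADER_FMT, 1, 2, 3, 4, 5, 6, 7, 2))
header = read_header(buf)
print(header.numProbes)
```

Checking the read length before unpacking turns a truncated trace file into a clear error instead of a `struct.error` deep inside the parser.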
4
Share it.
Share your probes with the community via GitHub Issues or Gists.
Compatible with Most Ecosystems.
Hardware Compatibility
Works smoothly on commonly used hardware.
Platform | Support |
---|---|
NVIDIA/CUDA | ✅ Fully Supported |
AMD/ROCm | ✅ Supported on CDNA |
Intel/oneAPI | 🚀 Planning |
More to Come! | Raise a GitHub issue if you need one! |
Software Compatibility
Integrates seamlessly with existing ecosystems.
Platform | Support |
---|---|
PyTorch (and everything on top) | ✅ Supported (with custom build) |
Triton | ✅ Supported |
JAX | ✅ Supported (with an environment variable) |
More to Come! | Raise a GitHub issue if you need one! |
Hackable for Your Needs.
Designed for Extensibility
An approachable framework.
Neutrino consists of three components: Entry & Compiler, Hook Driver, and the Probe Engine. All can be easily extended.
Hook Driver
The hook driver captures driver calls (load and launch) to provide runtime support, such as caching the loaded code (assembly).
Probe Engine
The probe engine extracts, prunes, probes, and reassembles the GPU assembly captured by the hook driver, using the probes supplied by the entry.
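To make the extract, prune, probe, and reassemble stages concrete, here is a toy model in plain Python that operates on a PTX-like string. It is not Neutrino's probe engine: the function names, the `TOY_ASM` kernel (standing in for code cached by the hook driver), and the inserted probe lines are all hypothetical, and a real engine works on actual GPU assembly with far more care.

```python
from typing import List

# Stand-in for assembly that a hook driver might have cached at load time.
TOY_ASM = """\
.visible .entry add_one()
{
    // load, increment, store
    ld.global.u32 %r1, [%rd1];
    add.u32 %r1, %r1, 1;
    st.global.u32 [%rd1], %r1;
    ret;
}
"""

def extract(asm: str) -> List[str]:
    """Extract: return the stripped lines between the kernel's braces."""
    lines = asm.splitlines()
    start, end = lines.index("{"), lines.index("}")
    return [line.strip() for line in lines[start + 1:end]]

def prune(body: List[str]) -> List[str]:
    """Prune: drop blank lines and comments that the probes do not need."""
    return [line for line in body if line and not line.startswith("//")]

def instrument(body: List[str], before: List[str], after: List[str]) -> List[str]:
    """Probe: place `before` at kernel entry and `after` ahead of every `ret;`."""
    out = list(before)
    for line in body:
        if line == "ret;":
            out.extend(after)
        out.append(line)
    return out

def reassemble(header: str, body: List[str]) -> str:
    """Reassemble: wrap the instrumented body back into a kernel definition."""
    return "\n".join([header, "{"] + ["    " + line for line in body] + ["}"]) + "\n"

header = TOY_ASM.splitlines()[0]
probed = instrument(
    prune(extract(TOY_ASM)),
    before=["mov.u64 %rd9, %clock64;  // hypothetical entry probe"],
    after=["// hypothetical exit probe would record %rd9 here"],
)
print(reassemble(header, probed))
```

Keeping each stage as a separate function mirrors the description above and shows where an extension could plug in, for example a different pruning rule or a new probe position.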
Demos Available in a Few Clicks.
Fully Open-Sourced and Evaluated
- Battery guaranteed. Actively maintained and open to contributions.
- Fully open-source. Open source and available on GitHub.
- Truly collaborative. Share your probes via Issues or Gists.
- Read the docs. Check the code.