Neutrino

Command-Line Interface

You can find the code in neutrino/cli.py.

Structure

The neutrino CLI is a standard Python argparse program that:

  1. Reads command-line arguments, in particular -p/--probe for probes and the REMAINDER for the user program.
  2. Invokes the DSL compiler if -p/--probe is a Pythonic Tracing DSL file (.py).
  3. Sets the environment variables for the Hook Driver and Probe Engine.
  4. Forks and execs the user program.
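
The four steps above can be sketched roughly as follows. This is a minimal illustration and not the actual contents of neutrino/cli.py: the function names build_parser, needs_dsl_compile and launch are hypothetical, and subprocess.run stands in for the fork-and-exec step.

```python
import argparse
import os
import subprocess
import sys


def build_parser() -> argparse.ArgumentParser:
    """Step 1: parse -p/--probe and capture the user program as a remainder."""
    parser = argparse.ArgumentParser(prog="neutrino")
    parser.add_argument("-p", "--probe", help="probe to use; may be a .py tracing DSL file")
    # REMAINDER collects everything after the first positional argument,
    # including flags that belong to the user program.
    parser.add_argument("command", nargs=argparse.REMAINDER,
                        help="user program and its arguments")
    return parser


def needs_dsl_compile(probe) -> bool:
    """Step 2: only probes written in the Python tracing DSL need compiling."""
    return probe is not None and probe.endswith(".py")


def launch(command, extra_env) -> int:
    """Steps 3 and 4: extend the environment, then run the user program."""
    env = dict(os.environ)   # copy so the parent environment is not modified
    env.update(extra_env)    # variables consumed by the child process
    # Assumption: subprocess.run is used here instead of a manual fork/exec.
    return subprocess.run(command, env=env).returncode
```

With this sketch, parsing `-p probe.py python train.py --lr 0.1` leaves `--lr 0.1` untouched inside `command`, so the user program receives its own flags unchanged.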

Environment variables set by the Neutrino CLI

Core functionality:

  • LD_PRELOAD: injects preload.so into the client program for dynamic loading.
  • LD_LIBRARY_PATH: injects the hook driver for static linking.
  • NEUTRINO_PROBES: probes in TOML format, serialized to a string.
  • NEUTRINO_REAL_DRIVER: path to the real driver.
  • NEUTRINO_DRIVER_NAME: name of the driver, used by preload.so.
  • NEUTRINO_HOOK_DRIVER: path to the hook driver.
  • NEUTRINO_PYTHON: path to the Python executable, set to avoid "exec not found" errors.
  • NEUTRINO_PROBING_PY: path to the Probe Engine (cuda.py or rocm.py).

Utilities:

  • NEUTRINO_FILTER and NEUTRINO_KERNEL: filter kernels out/in by name, see --filter and --kernel below.
  • NEUTRINO_TRACEDIR: parent folder of traces.
  • NEUTRINO_BENCHMARK: enable benchmarking mode.
  • NEUTRINO_MEMUSAGE: enable memory usage measuring mode.
  • NEUTRINO_DYNAMIC: enable dynamic mode.
  • TRITON_LIBHIP_PATH: a fix for Triton.
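
As a rough sketch of how a launcher might assemble some of these variables, consider the helper below. The variable names come from the lists above; the function names, the parameter names, and the encoding of the boolean modes as "1" are assumptions made for illustration and are not taken from neutrino/cli.py.

```python
def prepend(env, key, value, sep=":"):
    """Put value in front of any existing entry so that it takes priority."""
    existing = env.get(key)
    env[key] = f"{value}{sep}{existing}" if existing else value


def build_env(base, *, preload, probes, real_driver, python,
              tracedir="./trace", filter_out=None, kernel=None,
              benchmark=False, memusage=False):
    """Return a copy of base extended with Neutrino-related variables."""
    env = dict(base)  # copy, so the caller's mapping is left untouched
    # The dynamic linker accepts a colon-separated list in LD_PRELOAD.
    prepend(env, "LD_PRELOAD", preload)
    env["NEUTRINO_PROBES"] = probes            # TOML text as a single string
    env["NEUTRINO_REAL_DRIVER"] = real_driver
    env["NEUTRINO_PYTHON"] = python
    env["NEUTRINO_TRACEDIR"] = tracedir
    # Optional settings are only exported when the user asked for them.
    if filter_out:
        env["NEUTRINO_FILTER"] = filter_out
    if kernel:
        env["NEUTRINO_KERNEL"] = kernel
    if benchmark:
        env["NEUTRINO_BENCHMARK"] = "1"        # assumed encoding
    if memusage:
        env["NEUTRINO_MEMUSAGE"] = "1"         # assumed encoding
    return env
```

Building a fresh dictionary rather than mutating os.environ keeps the injected variables confined to the child process; the result can be passed as the env argument when starting the user program.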

Reference

The neutrino CLI exposes the following options:

  • --tracedir TRACEDIR: parent folder of traces (default: ./trace); see Hook Driver.
  • --driver DRIVER: path to the real CUDA/HIP driver shared library. The default value is in config.toml; see building.
  • --python PYTHON: path to the Python executable to use. The default value is in config.toml; see building.
  • --filter FILTER: filter OUT buggy kernels by (part of) name; separate multiple values with :
  • --kernel KERNEL: filter IN kernels by (part of) name; separate multiple values with :
  • --memusage: measure memory usage only, which helps prevent OOM during profiling (default: False)
  • --benchmark: enable benchmark mode to evaluate overhead w.r.t. the original kernel (default: False)

--memusage does not run the actual profiling. --benchmark neither saves the traces to disk nor analyzes them.
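
To illustrate the --filter and --kernel semantics described above (partial name match, multiple values separated by :), here is a small sketch. The function names are hypothetical, and the rule that an exclusion wins when both options match the same kernel is an assumption; the real matching is done by Neutrino itself and may differ.

```python
def split_patterns(value):
    """Split a colon-separated option value, dropping empty pieces."""
    return [part for part in (value or "").split(":") if part]


def should_probe(kernel_name, filter_value=None, kernel_value=None):
    """Decide whether a kernel is probed under --filter / --kernel."""
    excluded = split_patterns(filter_value)   # --filter: filter OUT
    included = split_patterns(kernel_value)   # --kernel: filter IN
    # A kernel is skipped if any excluded pattern is part of its name.
    if any(pattern in kernel_name for pattern in excluded):
        return False
    # With --kernel set, only kernels matching some pattern are kept.
    if included:
        return any(pattern in kernel_name for pattern in included)
    # Neither option restricts this kernel.
    return True
```

For example, under these assumptions `should_probe("flash_fwd_kernel", kernel_value="flash")` keeps the kernel, while adding `filter_value="fwd"` excludes it again.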