Tracing DSL and JIT
Neutrino IR
⚠️ Currently, operations are only supported on 64-bit registers; ONLY ld/st
support 32-bit saving, for more efficient memory operations.
⚠️ This is not finalized and is subject to further modification.
Neutrino IR's design is inspired by the eBPF ISA, but we separate outputs from inputs, as most target GPU ISAs have separate input and output operands.
For example, add in the eBPF ISA is add dst, src (i.e., dst += src), but in Neutrino IR it is add out, in1, in2 (i.e., out = in1 + in2). You can also write add dst, dst, src for semantics similar to the eBPF ISA.
Currently, as we work in Python, every instruction is encoded as a list[str] whose first item is the instruction name, followed by its operands; i.e., there is no binary format.
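As a minimal sketch of this encoding, an instruction written as text can be split into a list[str] whose first item is the instruction name. The helper name `encode` is hypothetical and not part of Neutrino:

```python
def encode(text: str) -> list[str]:
    """Split 'add out, in1, in2' into ['add', 'out', 'in1', 'in2'].

    Hypothetical helper for illustration only; not part of Neutrino.
    """
    name, _, rest = text.strip().partition(" ")
    operands = [op.strip() for op in rest.split(",") if op.strip()]
    return [name, *operands]


program = [
    encode("add out, in1, in2"),  # three-operand form
    encode("add dst, dst, src"),  # eBPF-like accumulate form
]
print(program)
```

A whole probe is then just a list of such lists, which keeps it easy to inspect and transform from Python.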
Special Operands
Other than standard registers, we plan to support the following operands for better value profiling:
Special Operand | Description | NVIDIA PTX | AMD GCNAsm |
---|---|---|---|
dst | will be replaced by destination | ✅ | ✅ |
src | will be replaced by source | ✅ | ✅ |
out | will be replaced by output (mostly 1st operand) | ✅ | ✅ |
in1 | will be replaced by 1st input | ✅ | ✅ |
in2 | will be replaced by 2nd input | ✅ | ✅ |
in3 | will be replaced by 3rd input | ✅ | ✅ |
bytes | will be replaced by instruction width | ✅ (only ld/st/cp) | |
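One way to picture the replacement is a lookup against the operands of the instruction being traced. The sketch below assumes a hypothetical `substitute` helper and a plain dict of bindings; the register names (`r0`, `%rd2`) are illustrative only, and this is not Neutrino's actual implementation:

```python
SPECIAL_OPERANDS = {"dst", "src", "out", "in1", "in2", "in3", "bytes"}


def substitute(inst: list[str], bindings: dict[str, str]) -> list[str]:
    """Replace special operands in one IR instruction with concrete values.

    `bindings` maps special operand names to operands of the traced
    instruction. Hypothetical helper for illustration only.
    """
    name, *operands = inst
    resolved = []
    for op in operands:
        if op in SPECIAL_OPERANDS:
            if op not in bindings:
                raise KeyError(f"special operand {op!r} is not available here")
            resolved.append(bindings[op])
        else:
            resolved.append(op)  # ordinary register, left untouched
    return [name, *resolved]


# Illustrative bindings for a traced three-operand instruction.
bindings = {"out": "%rd1", "in1": "%rd2", "in2": "%rd3"}
print(substitute(["mov", "r0", "in1"], bindings))
```

Raising on a missing binding mirrors the table above, where some special operands (such as bytes) are only meaningful for certain traced instructions.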
ALU Instructions
Instruction | Description | NVIDIA PTX | AMD GCNAsm |
---|---|---|---|
add, out, in1, in2 | out = in1 + in2 | ✅ | ✅ |
sub, out, in1, in2 | out = in1 - in2 | ✅ | ✅ |
mul, out, in1, in2 | out = in1 * in2 | ✅ | |
div, out, in1, in2 | out = in1 / in2 | ✅ | |
mod, out, in1, in2 | out = in1 % in2 | ✅ | |
lsh, out, in1, in2 | out = in1 << in2 | ✅ | |
rsh, out, in1, in2 | out = in1 >> in2 | ✅ | |
and, out, in1, in2 | out = in1 and in2 | | |
or, out, in1, in2 | out = in1 or in2 | | |
xor, out, in1, in2 | out = in1 ^ in2 | | |
TODO: Support 32-bit ALU instructions such as add32
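To make the three-operand semantics concrete, here is a small reference interpreter sketch. It assumes unsigned 64-bit wraparound, a bitwise meaning for and/or, floor division, and a dict-based register file with illustrative names such as `r0`. None of these details are confirmed by the table, and the function name `run_alu` is hypothetical:

```python
import operator

MASK64 = (1 << 64) - 1  # assumed: registers are unsigned 64-bit

ALU_OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "div": operator.floordiv,
    "mod": operator.mod,
    "lsh": operator.lshift,
    "rsh": operator.rshift,
    "and": operator.and_,
    "or": operator.or_,
    "xor": operator.xor,
}


def run_alu(program: list[list[str]], regs: dict[str, int]) -> dict[str, int]:
    """Execute ALU instructions of the form [name, out, in1, in2]."""
    regs = dict(regs)  # do not mutate the caller's registers
    for name, out, in1, in2 in program:
        result = ALU_OPS[name](regs[in1], regs[in2])
        regs[out] = result & MASK64  # wrap to 64 bits
    return regs


regs = run_alu(
    [["add", "r2", "r0", "r1"],   # r2 = r0 + r1
     ["add", "r0", "r0", "r1"]],  # eBPF-style r0 += r1
    {"r0": 5, "r1": 7},
)
print(regs)
```

Because both inputs are read before the output is written, reusing a register as output and input reproduces the eBPF accumulate form.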
Memory Instructions
Due to stricter alignment requirements on GPUs, we support a more limited set of memory instructions than standard eBPF:
Instruction | Description | NVIDIA PTX | AMD GCNAsm |
---|---|---|---|
stw, addr, reg | (u32)addr=reg | ✅ | ✅ |
stdw, addr, reg | (u64)addr=reg | ✅ | ✅ |
ldw, addr, reg | reg=(u32)addr | | |
lddw, addr, reg | reg=(u64)addr | | |
Vectorized loading may be applied automatically (and implicitly) if the backend finds an opportunity for contiguous saving.
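The width difference between stw and stdw can be sketched with a byte buffer. This assumes a little-endian layout, truncation to the low 32 bits for stw, and a natural-alignment check; the helper name `store` is hypothetical, and a real backend would emit GPU store instructions instead:

```python
import struct


def store(buf: bytearray, name: str, addr: int, value: int) -> None:
    """Emulate stw (32-bit) and stdw (64-bit) on a byte buffer."""
    width = {"stw": 4, "stdw": 8}[name]
    if addr % width != 0:
        # Assumed rule: an N-byte store needs an N-byte aligned address.
        raise ValueError(f"{name} requires {width}-byte alignment")
    fmt = "<I" if width == 4 else "<Q"
    mask = (1 << (8 * width)) - 1
    struct.pack_into(fmt, buf, addr, value & mask)


buf = bytearray(16)
store(buf, "stdw", 0, 0x1122334455667788)
store(buf, "stw", 8, 0x1AAAABBBB)  # only the low 32 bits are kept
print(buf.hex())
```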
Other Instructions
We also support several other instructions for profiling:
Instruction | Description | NVIDIA PTX | AMD GCNAsm |
---|---|---|---|
mov, out, in | out = in | ✅ | ✅ |
clock, out | out = current clock | ✅ | |
time, out | out = current time | ✅ | |
cuid, out | out = compute unit id | ✅(smid) | |
We may also add support for threadIdx and blockIdx.
Branch Instructions
Currently we don’t support branch instructions (like the early stage of eBPF), as the existing security verifier is not complete enough for safe branching.
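Until branching is supported, a front end could reject such instructions early. The sketch below is a hypothetical check against the instruction names listed in this section, not the project's verifier; the branch mnemonic `jeq` is borrowed from eBPF purely as an example:

```python
SUPPORTED = {
    "add", "sub", "mul", "div", "mod", "lsh", "rsh", "and", "or", "xor",
    "stw", "stdw", "ldw", "lddw",
    "mov", "clock", "time", "cuid",
}


def check_program(program: list[list[str]]) -> None:
    """Raise ValueError on any instruction outside the listed set."""
    for index, inst in enumerate(program):
        if not inst or inst[0] not in SUPPORTED:
            name = inst[0] if inst else "<empty>"
            raise ValueError(f"instruction {index}: {name!r} is not supported")


# A straight-line probe passes silently.
check_program([["clock", "out"], ["stdw", "addr", "out"]])

try:
    check_program([["jeq", "r0", "r1", "label"]])
except ValueError as err:
    print(err)
```

An allow-list like this keeps every accepted program straight-line, which is the property the text relies on while the verifier cannot yet reason about control flow.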