Tracing DSL and JIT

Neutrino IR

warning

⚠️ Currently only support operations in 64bit registers, ONLY ld/st supports 32bit saving for more efficient memory operation.

⚠️ This is not finalized and is up to further modfication

Neutrino IR's design is inspired by the eBPF ISA, but we take a separation of output/input as most target GPU ISA has separated input/output.

For example, add in eBPF ISA is add dst, src;; aka dst += src but in Neutrino IR it is add out, in1, in2 aka out = in1 + in2 (you can also use add dst, dst, src for a similar semantic like eBPF ISA).

Currently, as we works on Python, we encode every instruction to be list[str] where first item is instruction name and operands followed by, i.e., no binary format.

Special Operands

Other than standard registers, we plan to have following operands for better value profiling

Special Operand	Description	NVIDIA PTX	AMD GCNAsm
`dst`	will be replaced by destination	✅	✅
`src`	will be replaced by source	✅	✅
`out`	will be replaced by output (mostly 1st operand)	✅	✅
`in1`	will be replaced by 1st input	✅	✅
`in2`	will be replaced by 2nd input	✅	✅
`in3`	will be replaced by 3rd input	✅	✅
`bytes`	will be replaced by inst width	✅ (only ld/st/cp)

ALU Instructions

Instruction	Description	NVIDIA PTX	AMD GCNAsm
`add, out, in1, in2`	out = in1 + in2	✅	✅
`sub, out, in1, in2`	out = in1 - in2	✅	✅
`mul, out, in1, in2`	out = in1 * in2	✅
`div, out, in1, in2`	out = in1 / in2	✅
`mod, out, in1, in2`	out = in1 % in2	✅
`lsh, out, in1, in2`	`out = in1 << in2`	✅
`rsh, out, in1, in2`	`out = in1 >> in2`	✅
`and, out, in1, in2`	`out = in1 and in2`
`or, out, in1, in2`	`out = in1 or in2`
`xor, out, in1, in2`	`out = in1 ^ in2`

TODO: Support add32 kind of 32bit ALU instructions

Memory Instructions

Due to stricter alignment requirements on GPU, we support limited memory instructions compared with standard eBPF semantics:

Instruction	Description	NVIDIA PTX	AMD GCNAsm
`stw, addr, reg`	(u32)addr=reg	✅	✅
`stdw, addr, reg`	(u64)addr=reg	✅	✅
`ldw, addr, reg`	reg=(u64)addr
`lddw, addr, reg`	reg=(u64)addr

Vectorized loading may be automatically (and implicitly) applied if backend find continuous saving opportunity.

Other Instructions

We support many other kind of instructions for profiling usage:

Instruction	Description	NVIDIA PTX	AMD GCNAsm
`mov, out, in`	out = in	✅	✅
`clock, out`	out = current clock	✅
`time, out`	out = current time	✅
`cuid, out`	out = compute uint id	✅(smid)

We may also add supports for threadIdx and blockIdx.

Branch Instructions

Currently we don’t support branch instructions (like the early stage of eBPF) as existing security verifier is not complete enough for safe branching.

Neutrino IR​

Special Operands​

ALU Instructions​

Memory Instructions​

Other Instructions​

Branch Instructions​