WGSL Code Generation
The ringkernel-wgpu-codegen crate provides a Rust-to-WGSL transpiler that lets you write GPU kernels in a Rust DSL and generate equivalent WGSL (WebGPU Shading Language) code.
Overview
The transpiler provides full feature parity with ringkernel-cuda-codegen, supporting the same three kernel types:
- Global Kernels - Standard
@computeentry points - Stencil Kernels - Tile-based kernels with
GridPosabstraction - Ring Kernels - Persistent actor kernels (host-driven emulation)
The same Rust DSL code works with both CUDA and WGSL backends—just change the import.
Global Kernels
Generic compute shaders without stencil-specific patterns. Use DSL functions for thread/workgroup indices.
use ringkernel_wgpu_codegen::transpile_global_kernel;
use syn::parse_quote;
let kernel: syn::ItemFn = parse_quote! {
fn saxpy(x: &[f32], y: &mut [f32], a: f32, n: i32) {
let idx = block_idx_x() * block_dim_x() + thread_idx_x();
if idx >= n { return; }
y[idx as usize] = a * x[idx as usize] + y[idx as usize];
}
};
let wgsl_code = transpile_global_kernel(&kernel)?;
Generated WGSL:
@group(0) @binding(0) var<storage, read> x: array<f32>;
@group(0) @binding(1) var<storage, read_write> y: array<f32>;
@group(0) @binding(2) var<storage, read> a: f32;
@group(0) @binding(3) var<storage, read> n: i32;
@compute @workgroup_size(256, 1, 1)
fn saxpy(
@builtin(local_invocation_id) local_invocation_id: vec3<u32>,
@builtin(workgroup_id) workgroup_id: vec3<u32>,
@builtin(global_invocation_id) global_invocation_id: vec3<u32>,
@builtin(num_workgroups) num_workgroups: vec3<u32>
) {
var idx: i32 = i32(workgroup_id.x) * i32(256u) + i32(local_invocation_id.x);
if (idx >= n) { return; }
y[idx] = a * x[idx] + y[idx];
}
Stencil Kernels
For stencil computations (e.g., FDTD, convolutions), use the GridPos abstraction:
use ringkernel_wgpu_codegen::{transpile_stencil_kernel, StencilConfig};
let kernel: syn::ItemFn = parse_quote! {
fn fdtd(p: &[f32], p_prev: &mut [f32], c2: f32, pos: GridPos) {
let lap = pos.north(p) + pos.south(p) + pos.east(p) + pos.west(p)
- 4.0 * p[pos.idx()];
p_prev[pos.idx()] = 2.0 * p[pos.idx()] - p_prev[pos.idx()] + c2 * lap;
}
};
let config = StencilConfig::new("fdtd")
.with_tile_size(16, 16)
.with_halo(1);
let wgsl_code = transpile_stencil_kernel(&kernel, &config)?;
Stencil Intrinsics:
pos.idx()- Current cell linear indexpos.north(buf)- Access cell above (buf[idx - buffer_width])pos.south(buf)- Access cell below (buf[idx + buffer_width])pos.east(buf)- Access cell to the right (buf[idx + 1])pos.west(buf)- Access cell to the left (buf[idx - 1])pos.at(buf, dx, dy)- Access cell at relative offset
Ring Kernels
Persistent GPU kernels that process messages in a loop.
Important: WebGPU does not support true persistent kernels. Ring kernels are emulated using host-driven dispatch loops—the host re-dispatches the kernel until termination.
use ringkernel_wgpu_codegen::{transpile_ring_kernel, RingKernelConfig};
let handler: syn::ItemFn = parse_quote! {
fn process(ctx: &RingContext, msg: &Request) -> Response {
let tid = ctx.global_thread_id();
ctx.sync_threads();
let result = msg.value * 2.0;
Response { value: result, id: tid as u64 }
}
};
let config = RingKernelConfig::new("processor")
.with_workgroup_size(256)
.with_hlc(true);
// Note: K2K is NOT supported in WGPU
let wgsl_code = transpile_ring_kernel(&handler, &config)?;
RingKernelConfig Options
| Option | Default | Description |
|---|---|---|
workgroup_size |
256 | Threads per workgroup |
enable_hlc |
false | Enable Hybrid Logical Clocks |
enable_k2k |
false | NOT SUPPORTED - will return error |
max_messages_per_dispatch |
1024 | Messages processed per dispatch |
K2K Limitation
Kernel-to-kernel (K2K) messaging is not supported in WGPU due to WebGPU’s execution model. Attempting to enable K2K will return an error:
let config = RingKernelConfig::new("processor")
.with_k2k(true); // This will error!
let result = transpile_ring_kernel(&handler, &config);
assert!(result.is_err());
// Error: "Kernel-to-kernel (K2K) messaging is not supported in WGPU"
Use host-mediated messaging instead for cross-kernel communication.
DSL Features
Control Flow
// If/else
if condition {
do_something();
} else {
do_other();
}
// Match (transpiles to switch/case)
match value {
0 => handle_zero(),
1 => handle_one(),
_ => handle_default(),
}
// Early return
if idx >= n { return; }
Loops
// For loops (range patterns)
for i in 0..n {
data[i] = 0.0;
}
// Inclusive range
for i in 0..=n {
process(i);
}
// While loops
while condition {
iterate();
}
// Infinite loops
loop {
if done { break; }
continue;
}
Note: Loops are forbidden in Stencil validation mode but allowed in Generic and RingKernel modes.
Shared Memory (Workgroup Variables)
use ringkernel_wgpu_codegen::{SharedTile, SharedArray};
// 2D tile: var<workgroup> tile: array<array<f32, 16>, 16>;
let tile: SharedTile<f32, 16, 16>;
// 1D array: var<workgroup> cache: array<f32, 256>;
let cache: SharedArray<f32, 256>;
Struct Literals
// Rust struct literal
let response = Response { value: result, id: tid as u64 };
// Transpiles to WGSL constructor:
// Response(result, u64(tid))
Type Mapping
| Rust Type | WGSL Type | Notes |
|---|---|---|
f32 |
f32 |
Direct mapping |
f64 |
f32 |
Warning: Downcast (WGSL has no f64) |
i32 |
i32 |
Direct mapping |
u32 |
u32 |
Direct mapping |
i64 |
vec2<i32> |
Emulated as lo/hi pair |
u64 |
vec2<u32> |
Emulated as lo/hi pair |
bool |
bool |
Direct mapping |
usize |
u32 |
WGSL uses 32-bit addressing |
&[T] |
array<T> (storage, read) |
Storage buffer |
&mut [T] |
array<T> (storage, read_write) |
Storage buffer |
GPU Intrinsics
Thread/Workgroup Indices
| Rust DSL | WGSL |
|---|---|
thread_idx_x() |
local_invocation_id.x |
thread_idx_y() |
local_invocation_id.y |
thread_idx_z() |
local_invocation_id.z |
block_idx_x() |
workgroup_id.x |
block_idx_y() |
workgroup_id.y |
block_idx_z() |
workgroup_id.z |
block_dim_x() |
WORKGROUP_SIZE_X (constant) |
global_thread_id() |
global_invocation_id.x |
grid_dim_x() |
num_workgroups.x |
Synchronization
| Rust DSL | WGSL |
|---|---|
sync_threads() |
workgroupBarrier() |
thread_fence() |
storageBarrier() |
thread_fence_block() |
workgroupBarrier() |
Atomics
| Rust DSL | WGSL |
|---|---|
atomic_add(ptr, val) |
atomicAdd(ptr, val) |
atomic_sub(ptr, val) |
atomicSub(ptr, val) |
atomic_min(ptr, val) |
atomicMin(ptr, val) |
atomic_max(ptr, val) |
atomicMax(ptr, val) |
atomic_exchange(ptr, val) |
atomicExchange(ptr, val) |
atomic_cas(ptr, cmp, val) |
atomicCompareExchangeWeak(ptr, cmp, val) |
atomic_load(ptr) |
atomicLoad(ptr) |
atomic_store(ptr, val) |
atomicStore(ptr, val) |
Math Functions
| Rust DSL | WGSL |
|---|---|
sqrt(x) |
sqrt(x) |
rsqrt(x) |
inverseSqrt(x) |
abs(x) |
abs(x) |
floor(x) |
floor(x) |
ceil(x) |
ceil(x) |
round(x) |
round(x) |
sin(x) |
sin(x) |
cos(x) |
cos(x) |
tan(x) |
tan(x) |
exp(x) |
exp(x) |
log(x) |
log(x) |
powf(x, y) |
pow(x, y) |
min(a, b) |
min(a, b) |
max(a, b) |
max(a, b) |
clamp(x, lo, hi) |
clamp(x, lo, hi) |
fma(a, b, c) |
fma(a, b, c) |
mix(a, b, t) |
mix(a, b, t) |
Subgroup Operations (Warp Equivalents)
Note: These require the chromium_experimental_subgroups extension, which may not be available on all platforms.
| Rust DSL | WGSL |
|---|---|
warp_shuffle(val, lane) |
subgroupShuffle(val, lane) |
warp_shuffle_up(val, delta) |
subgroupShuffleUp(val, delta) |
warp_shuffle_down(val, delta) |
subgroupShuffleDown(val, delta) |
warp_shuffle_xor(val, mask) |
subgroupShuffleXor(val, mask) |
warp_ballot(pred) |
subgroupBallot(pred) |
warp_all(pred) |
subgroupAll(pred) |
warp_any(pred) |
subgroupAny(pred) |
lane_id() |
subgroup_invocation_id |
warp_size() |
subgroup_size |
64-bit Integer Emulation
WGSL 1.0 does not support 64-bit integers or atomics. The transpiler emulates them using lo/hi u32 pairs:
Type Representation
// u64 becomes vec2<u32> where:
// - x (lo) = lower 32 bits
// - y (hi) = upper 32 bits
Generated Helper Functions
The transpiler automatically includes these helpers for ring kernels:
// Read 64-bit value from atomic pair
fn read_u64(lo: ptr<storage, atomic<u32>, read_write>,
hi: ptr<storage, atomic<u32>, read_write>) -> vec2<u32> {
return vec2<u32>(atomicLoad(lo), atomicLoad(hi));
}
// Atomically increment 64-bit value
fn atomic_inc_u64(lo: ptr<storage, atomic<u32>, read_write>,
hi: ptr<storage, atomic<u32>, read_write>) {
let old_lo = atomicAdd(lo, 1u);
if (old_lo == 0xFFFFFFFFu) {
atomicAdd(hi, 1u);
}
}
// Compare two 64-bit values: returns -1, 0, or 1
fn compare_u64(a: vec2<u32>, b: vec2<u32>) -> i32 {
if (a.y > b.y) { return 1; }
if (a.y < b.y) { return -1; }
if (a.x > b.x) { return 1; }
if (a.x < b.x) { return -1; }
return 0;
}
Validation Modes
The transpiler uses validation modes to control what constructs are allowed:
| Mode | Loops | Description |
|---|---|---|
Stencil |
Forbidden | Classic stencil kernels (default for transpile_stencil_kernel) |
Generic |
Allowed | General-purpose kernels (default for transpile_global_kernel) |
RingKernel |
Required | Persistent actor kernels |
WGSL vs CUDA Differences
| Feature | CUDA | WGSL |
|---|---|---|
| Thread index | threadIdx.x |
local_invocation_id.x |
| Block/workgroup | blockIdx.x |
workgroup_id.x |
| Sync | __syncthreads() |
workgroupBarrier() |
| Shared memory | __shared__ |
var<workgroup> |
| 64-bit integers | Native | Emulated (vec2) |
| f64 | Native | Not supported |
| Persistent kernels | GPU loop | Host-driven dispatch |
| K2K messaging | Supported | Not supported |
Testing
The crate includes 50 tests covering all features:
cargo test -p ringkernel-wgpu-codegen
Test categories:
- Type mapping (primitives, slices, 64-bit emulation)
- Expression transpilation (binary ops, method calls, indexing)
- Intrinsics (thread indices, barriers, math)
- Loops (for/while/loop patterns)
- Validation (mode-specific constraints)