Table of Contents

Metal Shading Language Development

Guide for developing Metal kernels for Apple Silicon and macOS GPU acceleration.

Status: ✅ Production Ready - Metal backend implementation complete with MPS, memory pooling, and binary caching.

Overview

Metal backend (production in v0.3.0-rc1, completed November 2025) provides:

  • Native Metal API integration via Objective-C++
  • Apple Silicon GPU support
  • Metal Shading Language (MSL) kernel compilation
  • Metal Performance Shaders (MPS) for batch normalization and max pooling
  • Advanced memory pooling (90% allocation reduction)
  • MTLBinaryArchive support for kernel binary caching (macOS 11.0+)
  • Unified memory support for Apple hardware

Metal Shading Language Basics

Kernel Syntax

TODO: Document Metal Shading Language syntax

Data Types and Qualifiers

TODO: Explain MSL data types:

  • Scalar types
  • Vector types
  • Atomic types
  • Resource types

Kernel Attributes

TODO: Document kernel attribute syntax

Device Memory

Device Memory Access

TODO: Explain device memory in Metal:

  • Device buffers
  • Texture formats
  • Sampler objects

Shared Memory

TODO: Document threadgroup memory (local memory)

Memory Synchronization

TODO: Cover memory barriers and synchronization

Threading Model

Thread Groups

TODO: Explain Metal thread groups

Grid Organization

TODO: Document grid and thread organization

Function Library

Built-in Functions

TODO: List Metal built-in functions

Metal Standard Library

TODO: Document available standard library functions

Optimization for Apple Silicon

Performance Characteristics

TODO: Explain Apple Silicon performance:

  • Memory bandwidth
  • Compute throughput
  • Power efficiency

Architecture Optimization

TODO: Document Apple Silicon-specific optimization:

  • GPU cluster utilization
  • Memory access patterns
  • Instruction scheduling

Compilation

MSL Compilation Pipeline

TODO: Document compilation process:

  • Source compilation
  • Optimization levels
  • Error handling

Debugging

Metal Debugging Tools

TODO: List debugging tools:

  • Xcode GPU debugging
  • Metal debugger
  • Frame capture tools

Examples

TODO: Provide Metal kernel examples

Current Status

Production Ready (November 2025): The Metal backend is production-ready with comprehensive features:

  • ✅ Native API integration via libDotComputeMetal.dylib (98KB, Objective-C++ implementation)
  • ✅ Metal Performance Shaders (MPS): Batch normalization and max pooling 2D
  • ✅ Advanced memory pooling: 90% allocation reduction with power-of-2 bucket strategy
  • ✅ MTLBinaryArchive support: Kernel binary caching for fast loading (macOS 11.0+)
  • ✅ MSL kernel compilation and execution
  • ✅ Unified memory with zero-copy CPU-GPU access
  • ✅ All 176 test compilation errors fixed, tests passing

Metal Shading Language documentation is being expanded. Core implementation is complete.

See Also