Installation¶
This guide covers all installation options for PyDotCompute.
Requirements¶
- Python: 3.11 or higher
- Operating System: Linux, macOS, or Windows
- GPU (optional):
- NVIDIA GPU with CUDA 12.x for CUDA acceleration
- Apple Silicon (M1/M2/M3/M4) for Metal acceleration (macOS only)
Installation Options¶
Basic Installation (CPU only)¶
For development, testing, or systems without NVIDIA GPUs:
This installs the core package with CPU backend support. All features work, but GPU kernels run on CPU for simulation.
With Performance Optimizations (Recommended)¶
For optimal message latency on Linux/macOS:
This includes uvloop, which provides:
- 21μs message latency (p50)
- 20-40% faster event loop performance
- Automatic installation at import time
uvloop Auto-Installation
PyDotCompute automatically installs uvloop when you import pydotcompute.ring_kernels. No manual setup required!
With CUDA Support¶
For full GPU acceleration:
This includes:
- CuPy: GPU array library (NumPy-compatible)
- Numba: JIT compiler with CUDA support
- pynvml: NVIDIA GPU monitoring
CUDA Version
The cuda extra installs cupy-cuda12x. If you need a different CUDA version, install CuPy manually:
With Metal Support (macOS)¶
For Apple Silicon GPU acceleration on macOS:
This includes:
- MLX: Apple's machine learning framework for Metal
Apple Silicon Only
Metal support requires macOS with Apple Silicon (M1, M2, M3, M4 chips). Intel Macs are not supported for Metal acceleration.
With Cython Extensions (Maximum Performance)¶
For multi-process scenarios requiring ultimate performance:
This provides:
- 0.33μs queue operations (vs 1.8μs for pure Python)
- Lock-free SPSC queues
- Best for multi-process IPC scenarios
Combined Installation¶
For full GPU acceleration with performance optimizations:
Development Installation¶
For contributing to PyDotCompute:
Development dependencies include:
- pytest: Testing framework
- pytest-asyncio: Async test support
- mypy: Static type checking
- ruff: Linting and formatting
Documentation Build¶
To build the documentation locally:
Verifying Installation¶
Check Basic Installation¶
Check CUDA Availability¶
from pydotcompute.core.accelerator import cuda_available, get_accelerator
print(f"CUDA available: {cuda_available()}")
acc = get_accelerator()
print(f"Devices: {acc.device_count}")
for device in acc.devices:
print(f" - {device.name} ({device.device_type.name})")
Run a Quick Test¶
import asyncio
from pydotcompute import RingKernelRuntime, ring_kernel, message
@message
class Ping:
value: int = 0
@ring_kernel(kernel_id="echo", auto_register=False)
async def echo_actor(ctx):
while not ctx.should_terminate:
try:
msg = await ctx.receive(timeout=0.1)
await ctx.send(msg)
except:
continue
async def test():
async with RingKernelRuntime() as runtime:
await runtime.launch("echo", echo_actor)
await runtime.activate("echo")
await asyncio.sleep(0.1) # Let actor start
await runtime.send("echo", Ping(value=42))
response = await runtime.receive("echo", timeout=1.0)
print(f"Echo test: {response.value == 42}")
return response.value == 42
result = asyncio.run(test())
print(f"Installation verified: {result}")
Troubleshooting¶
ImportError: No module named 'cupy'¶
CuPy is not installed. Install it with:
CUDA driver version is insufficient¶
Your NVIDIA driver is too old. Update your GPU drivers from NVIDIA's website.
No CUDA-capable device is detected¶
- Verify GPU with
nvidia-smi - Check CUDA installation with
nvcc --version - Ensure the GPU is visible to Python
Tests fail with "CUDA not available"¶
This is expected on systems without NVIDIA GPUs. Tests automatically skip CUDA-specific tests.
Optional Dependencies¶
| Package | Purpose | Install |
|---|---|---|
uvloop |
Fast event loop (Linux/macOS) | pip install uvloop |
cupy-cuda12x |
GPU arrays | pip install cupy-cuda12x |
numba |
CUDA JIT compiler | pip install numba |
pynvml |
GPU monitoring | pip install pynvml |
cython |
Maximum performance queues | pip install cython |
pytest |
Testing | pip install pytest |
mypy |
Type checking | pip install mypy |
Disabling uvloop¶
If you need to disable uvloop auto-installation (e.g., for debugging or compatibility):
Or set it in your Python code before importing:
import os
os.environ["PYDOTCOMPUTE_NO_UVLOOP"] = "1"
from pydotcompute import RingKernelRuntime # uvloop NOT installed
Performance Tiers¶
PyDotCompute offers three performance tiers:
| Tier | Implementation | Latency (p50) | Use Case |
|---|---|---|---|
| 1 (Default) | uvloop + FastMessageQueue | 21μs | Async Python code |
| 2 | ThreadedRingKernel | ~100μs | Blocking I/O, C extensions |
| 3 | CythonRingKernel | 0.33μs queue ops | Multi-process IPC |
See the Performance Tiers Guide for detailed usage.
Next Steps¶
- Quick Start: Get up and running
- First Ring Kernel: Build your first actor
- Performance Tiers: Choose the right tier