Struct CheckpointManager
pub struct CheckpointManager {
config: CheckpointConfig,
storage: Box<dyn CheckpointStorage>,
actors: HashMap<u32, (String, String)>,
last_snapshot: HashMap<u32, Instant>,
pending: HashMap<u64, PendingSnapshot>,
next_request_id: u64,
checkpoint_history: HashMap<u32, Vec<String>>,
total_completed: u64,
total_failed: u64,
}Expand description
Manages periodic checkpointing for persistent GPU actors.
The CheckpointManager orchestrates the checkpoint lifecycle:
- Periodically determines when a snapshot is due
- Issues
SnapshotRequests (caller sends as H2K commands) - Processes
SnapshotResponses (caller feeds from K2H responses) - Persists completed checkpoints to storage
- Enforces retention policy (deletes old checkpoints)
§Usage
use ringkernel_core::checkpoint::{CheckpointConfig, CheckpointManager};
use std::time::Duration;
let config = CheckpointConfig::new(Duration::from_secs(10))
.with_max_snapshots(3)
.with_storage_path("/tmp/checkpoints");
let mut manager = CheckpointManager::new(config);
manager.register_actor(0, "wave_sim_0", "fdtd_3d");
// In your poll loop:
for request in manager.poll_due_snapshots() {
// Send as H2K SnapshotActor command
h2k_queue.send(H2KMessage::snapshot_actor(
request.request_id,
request.actor_slot,
request.buffer_offset,
));
}
// When K2H SnapshotComplete arrives:
manager.complete_snapshot(SnapshotResponse { ... })?;Fields§
§config: CheckpointConfig§storage: Box<dyn CheckpointStorage>§actors: HashMap<u32, (String, String)>§last_snapshot: HashMap<u32, Instant>§pending: HashMap<u64, PendingSnapshot>§next_request_id: u64§checkpoint_history: HashMap<u32, Vec<String>>§total_completed: u64§total_failed: u64Implementations§
§impl CheckpointManager
impl CheckpointManager
pub fn new(config: CheckpointConfig) -> CheckpointManager
pub fn new(config: CheckpointConfig) -> CheckpointManager
Create a new checkpoint manager with file storage at the configured path.
pub fn with_storage(
config: CheckpointConfig,
storage: Box<dyn CheckpointStorage>,
) -> CheckpointManager
pub fn with_storage( config: CheckpointConfig, storage: Box<dyn CheckpointStorage>, ) -> CheckpointManager
Create a checkpoint manager with a custom storage backend.
pub fn register_actor(
&mut self,
actor_slot: u32,
kernel_id: impl Into<String>,
kernel_type: impl Into<String>,
)
pub fn register_actor( &mut self, actor_slot: u32, kernel_id: impl Into<String>, kernel_type: impl Into<String>, )
Register an actor for periodic checkpointing.
pub fn unregister_actor(&mut self, actor_slot: u32)
pub fn unregister_actor(&mut self, actor_slot: u32)
Unregister an actor from checkpointing.
pub fn is_enabled(&self) -> bool
pub fn is_enabled(&self) -> bool
Check if checkpointing is enabled.
pub fn config(&self) -> &CheckpointConfig
pub fn config(&self) -> &CheckpointConfig
Get the checkpoint configuration.
pub fn pending_count(&self) -> usize
pub fn pending_count(&self) -> usize
Get the number of pending snapshot requests.
pub fn total_completed(&self) -> u64
pub fn total_completed(&self) -> u64
Get total completed snapshots.
pub fn total_failed(&self) -> u64
pub fn total_failed(&self) -> u64
Get total failed snapshots.
pub fn poll_due_snapshots(&mut self) -> Vec<SnapshotRequest>
pub fn poll_due_snapshots(&mut self) -> Vec<SnapshotRequest>
Poll for actors that are due for a snapshot.
Returns a list of SnapshotRequests that should be sent to the device
as H2K SnapshotActor commands.
Each actor is only requested once per interval, and only if no prior request for that actor is still pending.
pub fn complete_snapshot(
&mut self,
response: SnapshotResponse,
) -> Result<Option<String>, RingKernelError>
pub fn complete_snapshot( &mut self, response: SnapshotResponse, ) -> Result<Option<String>, RingKernelError>
Process a completed snapshot response from the device.
If the snapshot succeeded, the data is persisted to storage and the retention policy is enforced.
Returns the checkpoint name on success.
pub fn request_snapshot(&mut self, actor_slot: u32) -> Option<SnapshotRequest>
pub fn request_snapshot(&mut self, actor_slot: u32) -> Option<SnapshotRequest>
Manually request a snapshot for a specific actor, bypassing the interval timer.
This is useful for on-demand snapshots (e.g., before a risky operation)
or in tests. Returns None if the actor is not registered.
pub fn cancel_pending(&mut self, request_id: u64) -> bool
pub fn cancel_pending(&mut self, request_id: u64) -> bool
Cancel a pending snapshot request.
Returns true if the request was found and cancelled.
pub fn cancel_all_pending(&mut self)
pub fn cancel_all_pending(&mut self)
Cancel all pending snapshot requests.
pub fn load_latest(
&self,
actor_slot: u32,
) -> Result<Option<Checkpoint>, RingKernelError>
pub fn load_latest( &self, actor_slot: u32, ) -> Result<Option<Checkpoint>, RingKernelError>
Load the most recent checkpoint for an actor.
pub fn list_checkpoints(
&self,
actor_slot: u32,
) -> Result<Vec<String>, RingKernelError>
pub fn list_checkpoints( &self, actor_slot: u32, ) -> Result<Vec<String>, RingKernelError>
List all checkpoint names for an actor.
pub fn storage(&self) -> &dyn CheckpointStorage
pub fn storage(&self) -> &dyn CheckpointStorage
Get a reference to the storage backend.
Auto Trait Implementations§
impl Freeze for CheckpointManager
impl !RefUnwindSafe for CheckpointManager
impl Send for CheckpointManager
impl Sync for CheckpointManager
impl Unpin for CheckpointManager
impl !UnwindSafe for CheckpointManager
Blanket Implementations§
§impl<T> ArchivePointee for T
impl<T> ArchivePointee for T
§type ArchivedMetadata = ()
type ArchivedMetadata = ()
§fn pointer_metadata(
_: &<T as ArchivePointee>::ArchivedMetadata,
) -> <T as Pointee>::Metadata
fn pointer_metadata( _: &<T as ArchivePointee>::ArchivedMetadata, ) -> <T as Pointee>::Metadata
Source§impl<T> BorrowMut<T> for Twhere
T: ?Sized,
impl<T> BorrowMut<T> for Twhere
T: ?Sized,
Source§fn borrow_mut(&mut self) -> &mut T
fn borrow_mut(&mut self) -> &mut T
§impl<F, W, T, D> Deserialize<With<T, W>, D> for F
impl<F, W, T, D> Deserialize<With<T, W>, D> for F
§fn deserialize(
&self,
deserializer: &mut D,
) -> Result<With<T, W>, <D as Fallible>::Error>
fn deserialize( &self, deserializer: &mut D, ) -> Result<With<T, W>, <D as Fallible>::Error>
§impl<T> Instrument for T
impl<T> Instrument for T
§fn instrument(self, span: Span) -> Instrumented<Self>
fn instrument(self, span: Span) -> Instrumented<Self>
§fn in_current_span(self) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
Source§impl<T> IntoEither for T
impl<T> IntoEither for T
Source§fn into_either(self, into_left: bool) -> Either<Self, Self>
fn into_either(self, into_left: bool) -> Either<Self, Self>
self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise. Read moreSource§fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise. Read more