Interface IAdaptiveHealthMonitor
- Namespace
- DotCompute.Abstractions.Health
- Assembly
- DotCompute.Abstractions.dll
Adaptive health monitoring with HLC-based causal analysis and ML-powered failure prediction. Adjusts monitoring intervals dynamically based on system health trends.
public interface IAdaptiveHealthMonitor : IDisposable
- Inherited Members
- Extension Methods
Remarks
Key Features:
- HLC timestamping of all health events for causal analysis
- Adaptive monitoring intervals (faster when unhealthy, slower when stable)
- Historical trend analysis with sliding windows
- ML-powered failure prediction using simple heuristics
- Causal failure correlation via HLC happened-before relationships
Monitoring Intervals:
- Critical: 100ms - System in critical state, maximum monitoring
- Degraded: 500ms - System degraded, increased monitoring
- Healthy: 2s - System stable, normal monitoring
- Optimal: 10s - System optimal, reduced monitoring overhead
Properties
CurrentMonitoringInterval
Gets the current monitoring interval based on system health.
TimeSpan CurrentMonitoringInterval { get; }
Property Value
MonitorId
Gets the monitor identifier.
string MonitorId { get; }
Property Value
TotalSnapshotCount
Gets the total number of health snapshots collected.
long TotalSnapshotCount { get; }
Property Value
Methods
AnalyzeCausalFailuresAsync(HlcTimestamp, TimeSpan, CancellationToken)
Performs causal failure analysis using HLC happened-before relationships. Identifies which failures may have causally contributed to a given failure event.
Task<CausalFailureAnalysis> AnalyzeCausalFailuresAsync(HlcTimestamp failureTimestamp, TimeSpan lookbackWindow = default, CancellationToken cancellationToken = default)
Parameters
failureTimestampHlcTimestampHLC timestamp of the failure to analyze.
lookbackWindowTimeSpanHow far back to search for causal failures (default: 5 minutes).
cancellationTokenCancellationTokenCancellation token.
Returns
- Task<CausalFailureAnalysis>
Causal analysis results with happened-before failure chain.
AnalyzeTrendsAsync(HlcTimestamp, TimeSpan, CancellationToken)
Analyzes health trends and predicts potential failures.
Task<HealthAnalysisResult> AnalyzeTrendsAsync(HlcTimestamp currentTime, TimeSpan analysisWindow = default, CancellationToken cancellationToken = default)
Parameters
currentTimeHlcTimestampCurrent HLC timestamp for analysis window.
analysisWindowTimeSpanTime window for trend analysis (default: 1 minute).
cancellationTokenCancellationTokenCancellation token.
Returns
- Task<HealthAnalysisResult>
Health analysis results with failure predictions.
GetRecentHistoryAsync(int, CancellationToken)
Gets recent health history for manual inspection.
Task<IReadOnlyList<TimestampedHealthSnapshot>> GetRecentHistoryAsync(int count = 100, CancellationToken cancellationToken = default)
Parameters
countintNumber of recent snapshots to retrieve (default: 100, max: 1000).
cancellationTokenCancellationTokenCancellation token.
Returns
- Task<IReadOnlyList<TimestampedHealthSnapshot>>
List of recent health snapshots ordered by HLC timestamp.
RecordSnapshotAsync(DeviceHealthSnapshot, HlcTimestamp, CancellationToken)
Records a health snapshot with HLC timestamp for causal analysis.
Task RecordSnapshotAsync(DeviceHealthSnapshot snapshot, HlcTimestamp timestamp, CancellationToken cancellationToken = default)
Parameters
snapshotDeviceHealthSnapshotThe health snapshot to record.
timestampHlcTimestampHLC timestamp for causal ordering.
cancellationTokenCancellationTokenCancellation token.
Returns
- Task
Task representing the async operation.
ResetAsync()
Resets monitoring state and clears history (for testing or recovery scenarios).
Task ResetAsync()