Table of Contents

Interface IAdaptiveHealthMonitor

Namespace
DotCompute.Abstractions.Health
Assembly
DotCompute.Abstractions.dll

Adaptive health monitoring with HLC-based causal analysis and ML-powered failure prediction. Adjusts monitoring intervals dynamically based on system health trends.

public interface IAdaptiveHealthMonitor : IDisposable
Inherited Members
Extension Methods

Remarks

Key Features:

  • HLC timestamping of all health events for causal analysis
  • Adaptive monitoring intervals (faster when unhealthy, slower when stable)
  • Historical trend analysis with sliding windows
  • ML-powered failure prediction using simple heuristics
  • Causal failure correlation via HLC happened-before relationships

Monitoring Intervals:

  • Critical: 100ms - System in critical state, maximum monitoring
  • Degraded: 500ms - System degraded, increased monitoring
  • Healthy: 2s - System stable, normal monitoring
  • Optimal: 10s - System optimal, reduced monitoring overhead

Properties

CurrentMonitoringInterval

Gets the current monitoring interval based on system health.

TimeSpan CurrentMonitoringInterval { get; }

Property Value

TimeSpan

MonitorId

Gets the monitor identifier.

string MonitorId { get; }

Property Value

string

TotalSnapshotCount

Gets the total number of health snapshots collected.

long TotalSnapshotCount { get; }

Property Value

long

Methods

AnalyzeCausalFailuresAsync(HlcTimestamp, TimeSpan, CancellationToken)

Performs causal failure analysis using HLC happened-before relationships. Identifies which failures may have causally contributed to a given failure event.

Task<CausalFailureAnalysis> AnalyzeCausalFailuresAsync(HlcTimestamp failureTimestamp, TimeSpan lookbackWindow = default, CancellationToken cancellationToken = default)

Parameters

failureTimestamp HlcTimestamp

HLC timestamp of the failure to analyze.

lookbackWindow TimeSpan

How far back to search for causal failures (default: 5 minutes).

cancellationToken CancellationToken

Cancellation token.

Returns

Task<CausalFailureAnalysis>

Causal analysis results with happened-before failure chain.

AnalyzeTrendsAsync(HlcTimestamp, TimeSpan, CancellationToken)

Analyzes health trends and predicts potential failures.

Task<HealthAnalysisResult> AnalyzeTrendsAsync(HlcTimestamp currentTime, TimeSpan analysisWindow = default, CancellationToken cancellationToken = default)

Parameters

currentTime HlcTimestamp

Current HLC timestamp for analysis window.

analysisWindow TimeSpan

Time window for trend analysis (default: 1 minute).

cancellationToken CancellationToken

Cancellation token.

Returns

Task<HealthAnalysisResult>

Health analysis results with failure predictions.

GetRecentHistoryAsync(int, CancellationToken)

Gets recent health history for manual inspection.

Task<IReadOnlyList<TimestampedHealthSnapshot>> GetRecentHistoryAsync(int count = 100, CancellationToken cancellationToken = default)

Parameters

count int

Number of recent snapshots to retrieve (default: 100, max: 1000).

cancellationToken CancellationToken

Cancellation token.

Returns

Task<IReadOnlyList<TimestampedHealthSnapshot>>

List of recent health snapshots ordered by HLC timestamp.

RecordSnapshotAsync(DeviceHealthSnapshot, HlcTimestamp, CancellationToken)

Records a health snapshot with HLC timestamp for causal analysis.

Task RecordSnapshotAsync(DeviceHealthSnapshot snapshot, HlcTimestamp timestamp, CancellationToken cancellationToken = default)

Parameters

snapshot DeviceHealthSnapshot

The health snapshot to record.

timestamp HlcTimestamp

HLC timestamp for causal ordering.

cancellationToken CancellationToken

Cancellation token.

Returns

Task

Task representing the async operation.

ResetAsync()

Resets monitoring state and clears history (for testing or recovery scenarios).

Task ResetAsync()

Returns

Task