Class MetalAlertsManager
- Namespace: DotCompute.Backends.Metal.Telemetry
- Assembly: DotCompute.Backends.Metal.dll
Threshold monitoring and alerting system for Metal backend
public sealed class MetalAlertsManager : IDisposable
- Inheritance: object → MetalAlertsManager
- Implements: IDisposable
Constructors
MetalAlertsManager(ILogger<MetalAlertsManager>, MetalAlertsOptions)
public MetalAlertsManager(ILogger<MetalAlertsManager> logger, MetalAlertsOptions options)
Parameters
- logger (ILogger<MetalAlertsManager>)
- options (MetalAlertsOptions)
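A minimal sketch of constructing and disposing the manager, based only on the signatures on this page. It assumes that MetalAlertsOptions lives in the same namespace and has a public parameterless constructor, and that the Microsoft.Extensions.Logging.Abstractions package is referenced; neither assumption is confirmed here.

```csharp
using System;
using DotCompute.Backends.Metal.Telemetry;
using Microsoft.Extensions.Logging.Abstractions;

// Assumption: MetalAlertsOptions is default-constructible and in this namespace.
var options = new MetalAlertsOptions();

// NullLogger discards all log output; substitute a real ILogger in an application.
using var alerts = new MetalAlertsManager(NullLogger<MetalAlertsManager>.Instance, options);

// ActiveAlerts is a read-only list, so Count is available without knowing Alert's members.
Console.WriteLine($"Active alerts: {alerts.ActiveAlerts.Count}");

// The using declaration calls Dispose() when the enclosing scope ends.
```

The using declaration is chosen because the class implements IDisposable; an explicit try/finally that calls Dispose() would work equally well.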
Properties
ActiveAlerts
Gets all currently active alerts
public IReadOnlyList<Alert> ActiveAlerts { get; }
Property Value
- IReadOnlyList<Alert>
Methods
CheckErrorRate(MetalError)
Checks for high error rates and potentially triggers an alert
public void CheckErrorRate(MetalError error)
Parameters
- error (MetalError)
CheckHighGpuUtilization(double)
Checks for high GPU utilization and potentially triggers an alert
public void CheckHighGpuUtilization(double utilizationPercentage)
Parameters
- utilizationPercentage (double)
CheckHighMemoryPressure(MemoryPressureLevel, double)
Checks for high memory pressure and potentially triggers an alert
public void CheckHighMemoryPressure(MemoryPressureLevel level, double percentage)
Parameters
- level (MemoryPressureLevel)
- percentage (double)
CheckHighMemoryUtilization(double)
Checks for high memory utilization and potentially triggers an alert
public void CheckHighMemoryUtilization(double utilizationPercentage)
Parameters
- utilizationPercentage (double)
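The page does not state the scale of utilizationPercentage. The sketch below assumes a 0 to 100 scale and shows one way to derive such a value from byte counts before passing it to the check. UtilizationMath and ToPercentage are hypothetical helper names, not part of DotCompute.

```csharp
using System;

public static class UtilizationMath
{
    // Converts used/total byte counts into a percentage clamped to [0, 100].
    // Assumption: the alert check expects a 0-100 scale, not a 0-1 fraction.
    public static double ToPercentage(long usedBytes, long totalBytes)
    {
        if (totalBytes <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(totalBytes), "Total must be positive.");
        }

        double percentage = 100.0 * usedBytes / totalBytes;
        return Math.Clamp(percentage, 0.0, 100.0);
    }
}
```

Given a manager instance named alerts, a call might look like `alerts.CheckHighMemoryUtilization(UtilizationMath.ToPercentage(usedBytes, totalBytes));`.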
CheckHighResourceUtilization(ResourceType, double)
Checks for high resource utilization and potentially triggers an alert
public void CheckHighResourceUtilization(ResourceType resourceType, double utilizationPercentage)
Parameters
- resourceType (ResourceType)
- utilizationPercentage (double)
CheckKernelExecutionFailure(string, TimeSpan)
Checks for kernel execution failure and potentially triggers an alert
public void CheckKernelExecutionFailure(string kernelName, TimeSpan duration)
Parameters
- kernelName (string)
- duration (TimeSpan)
CheckMemoryAllocationFailure(long)
Checks for memory allocation failure and potentially triggers an alert
public void CheckMemoryAllocationFailure(long sizeBytes)
Parameters
- sizeBytes (long)
CheckSlowOperation(string, TimeSpan)
Checks for slow operations and potentially triggers an alert
public void CheckSlowOperation(string operationName, TimeSpan duration)
Parameters
- operationName (string)
- duration (TimeSpan)
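Because CheckSlowOperation takes a name and a measured duration, the caller has to time the operation itself. The sketch below shows one way to do that with Stopwatch. OperationTimer is a hypothetical helper, not part of DotCompute; it takes the reporting callback as a delegate so that it stays independent of the library.

```csharp
using System;
using System.Diagnostics;

public static class OperationTimer
{
    // Runs `body`, measures its wall-clock duration, and forwards the
    // measurement to `report`. The finally block ensures the duration is
    // reported even when `body` throws.
    public static T Run<T>(string operationName, Func<T> body, Action<string, TimeSpan> report)
    {
        var stopwatch = Stopwatch.StartNew();
        try
        {
            return body();
        }
        finally
        {
            stopwatch.Stop();
            report(operationName, stopwatch.Elapsed);
        }
    }
}
```

Given a manager instance named alerts, the method group can be passed directly, for example `OperationTimer.Run("matmul", () => RunKernel(), alerts.CheckSlowOperation);`, where RunKernel stands in for whatever work is being timed.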
Dispose()
Performs application-defined tasks associated with freeing, releasing, or resetting unmanaged resources.
public void Dispose()
EvaluateActiveAlerts(MetalTelemetrySnapshot)
Evaluates active alerts against current telemetry data
public void EvaluateActiveAlerts(MetalTelemetrySnapshot snapshot)
Parameters
- snapshot (MetalTelemetrySnapshot)