JBoss Community Archive (Read Only)

RHQ 4.9

Trends and proactive alarming

Currently we can only alert operations when a problem has already occurred. It is possible to alert when a metric crosses a threshold, for example when it comes within 10% of the upper or lower values of the metric, or when it crosses some threshold around the metric's baseline.

In order to proactively inform operations about upcoming issues, we should add trending functionality and the possibility to compute the time delta after which a critical situation will arise (with a certain probability). Let's have a look at the following graph:

images/author/download/attachments/73139329/Trends.png

Here we have a dynamic metric and a trend function. In addition we have a threshold value (the "SLA", Service Level Agreement). From the current value and the trend graph, we could compute a time value deltaT: the time remaining until the metric would hit the threshold value. DeltaT could then be fed into the alert subsystem, which would alert if deltaT is less than a given value.
Of course, this is not limited to dynamic metrics. It would work even better for trendsup or trendsdown metrics, as the extrapolation is easier in that case.
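The deltaT computation could look roughly like the following sketch. It assumes the trend function is a straight line fitted by least squares, which this page does not prescribe. The class TrendEstimator and its methods are hypothetical names and not part of RHQ. Time units are whatever the caller uses for the sample timestamps.

```java
import java.util.OptionalDouble;

/** Hypothetical helper (not an RHQ class): linear trend and time-to-threshold. */
final class TrendEstimator {

    private TrendEstimator() {
    }

    /**
     * Fits value = slope * time + intercept by ordinary least squares.
     * Returns {slope, intercept}. A linear trend is an assumption of this sketch.
     */
    static double[] fitLine(double[] times, double[] values) {
        if (times.length != values.length || times.length < 2) {
            throw new IllegalArgumentException("need at least two (time, value) samples");
        }
        int n = times.length;
        double meanT = 0.0;
        double meanV = 0.0;
        for (int i = 0; i < n; i++) {
            meanT += times[i];
            meanV += values[i];
        }
        meanT /= n;
        meanV /= n;

        double sxx = 0.0; // sum of squared time deviations
        double sxy = 0.0; // sum of time deviation * value deviation
        for (int i = 0; i < n; i++) {
            double dt = times[i] - meanT;
            sxx += dt * dt;
            sxy += dt * (values[i] - meanV);
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("all samples share the same timestamp");
        }
        double slope = sxy / sxx;
        return new double[] { slope, meanV - slope * meanT };
    }

    /**
     * deltaT: time from the last sample until the trend line reaches the threshold.
     * Empty if the trend is flat, moves away from the threshold, or the trend
     * line has already passed it.
     */
    static OptionalDouble timeToThreshold(double[] times, double[] values, double threshold) {
        double[] line = fitLine(times, values);
        double slope = line[0];
        double intercept = line[1];
        if (slope == 0.0) {
            return OptionalDouble.empty();
        }
        double now = times[times.length - 1];
        double crossingTime = (threshold - intercept) / slope;
        double deltaT = crossingTime - now;
        return deltaT >= 0.0 ? OptionalDouble.of(deltaT) : OptionalDouble.empty();
    }

    /** Alert condition from the text: fire if deltaT is less than a given value. */
    static boolean shouldAlert(OptionalDouble deltaT, double minLeadTime) {
        return deltaT.isPresent() && deltaT.getAsDouble() < minLeadTime;
    }
}
```

For example, with samples taken at times 0, 1, 2, 3 and values 10, 20, 30, 40, the fitted line rises by 10 per time unit, so a threshold of 100 is reached 6 time units after the last sample. An alert configured with a minimum lead time of 10 would fire, one with a minimum lead time of 5 would not. The "certain probability" mentioned above is not modelled here; a real implementation would probably also want some measure of how well the trend fits the data.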

This algorithm is targeted at metrics where operations is expected to react before a critical situation arises. An example would be the used capacity of a storage array: if it is known that the storage will be full within the next two days, operations still has time to proactively add an additional disk or replace an existing one with a bigger one.
An example where this makes no sense is CPU load. One could argue that if the load reaches some level, an additional machine could be brought into service, but CPU load changes too quickly for a trend function to trigger this effectively.

JBoss.org Content Archive (Read Only), exported from JBoss Community Documentation Editor at 2020-03-13 08:16:35 UTC, last content change 2013-09-18 19:40:59 UTC.