JBoss Community Archive (Read Only)

RHQ 4.5

Design-AlertsAuditTrail

Background

Deleted bits are used today for the alerts audit trail: when a user deletes an alert definition through the web console, the record is flagged as deleted instead of being removed from the database.

Use case:

  • create an alert definition

  • several alerts fire from this definition

  • user comes along and changes and/or deletes the alert definition

Expected/actual result:

  • corresponding alerts remain intact, and all "alert info" can still be shown
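The use case above can be sketched with a minimal in-memory model. This is only an illustration under stated assumptions: the classes `AlertDefinition`, `Alert` and `AlertStore` and their fields are hypothetical stand-ins, not RHQ's actual entities or session beans, and a `HashMap` stands in for the database table. A delete request only flips the deleted bit, so the definition disappears from the editable list while every fired alert can still resolve it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for an alert definition row.
class AlertDefinition {
    final int id;
    final String name;
    boolean deleted; // the "deleted bit": set instead of removing the row

    AlertDefinition(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Hypothetical stand-in for a fired alert; it keeps a reference to its definition.
class Alert {
    final int id;
    final int definitionId;

    Alert(int id, int definitionId) {
        this.id = id;
        this.definitionId = definitionId;
    }
}

// In-memory stand-in for the database (assumption: no real persistence layer).
class AlertStore {
    private final Map<Integer, AlertDefinition> definitions = new HashMap<>();
    private final List<Alert> alerts = new ArrayList<>();
    private int nextDefinitionId = 1;
    private int nextAlertId = 1;

    AlertDefinition createDefinition(String name) {
        AlertDefinition def = new AlertDefinition(nextDefinitionId++, name);
        definitions.put(def.id, def);
        return def;
    }

    Alert fire(AlertDefinition def) {
        Alert alert = new Alert(nextAlertId++, def.id);
        alerts.add(alert);
        return alert;
    }

    // A user-requested delete only flips the bit; the row stays in place.
    void deleteDefinition(int id) {
        AlertDefinition def = definitions.get(id);
        if (def != null) {
            def.deleted = true;
        }
    }

    // What the UI offers for editing: only definitions not flagged as deleted.
    List<AlertDefinition> activeDefinitions() {
        List<AlertDefinition> result = new ArrayList<>();
        for (AlertDefinition def : definitions.values()) {
            if (!def.deleted) {
                result.add(def);
            }
        }
        return result;
    }

    // The alert history, untouched by definition deletes.
    List<Alert> history() {
        return new ArrayList<>(alerts);
    }

    // Alert history lookups ignore the bit, so every fired alert still resolves.
    AlertDefinition definitionFor(Alert alert) {
        return definitions.get(alert.definitionId);
    }
}
```

The split between `activeDefinitions()` (filters on the bit) and `definitionFor()` (ignores it) is the whole trick: the console stops showing the definition, but the alert history keeps working.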

The "alert info" on the alert history details page shows everything about the alert: name, priority, condition logs (the value that triggered each condition to fire), notification logs (the notifications sent when the condition set and dampening rules became true), and so on. But the logs (in particular, the condition logs) aren't necessarily the only information you want to see. A condition log might say "metricA was 50", but that doesn't immediately tell (or remind) you what the metricA threshold was that made the condition trigger. So, instead of duplicating alert definition data, the alerts audit trail maps back to the alert definition entity.
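A minimal sketch of why that mapping matters, assuming hypothetical `AlertCondition`, `ConditionLog` and `AlertDetailsPage` classes (not RHQ's real entities) and a simple threshold condition: the log stores only the observed value, and the details page fills in the threshold by following the reference back to the condition, even if its definition has since been flagged as deleted.

```java
// Hypothetical condition belonging to an alert definition (possibly soft-deleted).
class AlertCondition {
    final String metric;
    final String comparator; // e.g. ">" or "<"
    final double threshold;

    AlertCondition(String metric, String comparator, double threshold) {
        this.metric = metric;
        this.comparator = comparator;
        this.threshold = threshold;
    }
}

// The audit trail entry: only the value that was observed, plus a reference
// back to the condition rather than a copy of its settings.
class ConditionLog {
    final AlertCondition condition;
    final double value;

    ConditionLog(AlertCondition condition, double value) {
        this.condition = condition;
        this.value = value;
    }
}

class AlertDetailsPage {
    // Combines the logged value with the threshold taken from the referenced condition.
    static String describe(ConditionLog log) {
        AlertCondition c = log.condition;
        return c.metric + " was " + log.value
                + " (condition: " + c.metric + " " + c.comparator + " " + c.threshold + ")";
    }
}
```

For a condition `metricA > 40` and a logged value of 50, `describe` yields `metricA was 50.0 (condition: metricA > 40.0)`. The denormalized alternative would be to copy `comparator` and `threshold` into `ConditionLog` itself, at the cost of duplicating definition data.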

The problem here is how to support editing a list of unnamed alert conditions, entities whose primary keys were used as identifiers in a cache construct. If the conditions are unnamed, how do we know whether the user just reordered them in the UI, only added a new one, or deleted a specific one (which shifts all the others below it in the list), and so on? All we could hope for is best-effort matching of the existing (already persisted) conditions to the ones coming back from the form submission in the UI. This didn't seem worth the effort (nor did it seem easy to produce perfectly reproducible results every time), so instead of calculating and applying delta updates, I implemented editing of alert definitions as a full delete/recreate. However, to satisfy the use case above, the delete wasn't a "true" delete, and thus the deleted bit was born.
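The edit-as-delete/recreate approach can be sketched as follows. This is a hypothetical model, not RHQ code: `Condition`, `Definition` and `DefinitionRepository` are illustrative names, a condition is reduced to an expression string, and ids come from simple counters rather than database sequences. The point is that `edit` never tries to match submitted conditions to stored ones; it soft-deletes the old definition and creates a new one whose conditions get fresh ids.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical condition row; its id is what the cache would key on.
class Condition {
    final int id;
    final String expression; // e.g. "metricA > 40" (illustrative only)

    Condition(int id, String expression) {
        this.id = id;
        this.expression = expression;
    }
}

// Hypothetical definition row carrying the deleted bit.
class Definition {
    final int id;
    final String name;
    final List<Condition> conditions;
    boolean deleted;

    Definition(int id, String name, List<Condition> conditions) {
        this.id = id;
        this.name = name;
        this.conditions = conditions;
    }
}

class DefinitionRepository {
    private final Map<Integer, Definition> rows = new HashMap<>();
    private int nextDefinitionId = 1;
    private int nextConditionId = 1;

    Definition create(String name, List<String> expressions) {
        List<Condition> conditions = new ArrayList<>();
        for (String expr : expressions) {
            conditions.add(new Condition(nextConditionId++, expr));
        }
        Definition def = new Definition(nextDefinitionId++, name, conditions);
        rows.put(def.id, def);
        return def;
    }

    // No attempt to work out whether the user reordered, inserted or removed
    // conditions: the old definition is soft-deleted and a new one is created.
    Definition edit(int oldId, String name, List<String> submittedExpressions) {
        Definition old = rows.get(oldId);
        if (old == null) {
            throw new IllegalArgumentException("no definition with id " + oldId);
        }
        old.deleted = true;
        return create(name, submittedExpressions);
    }

    Definition find(int id) {
        return rows.get(id);
    }
}
```

Because the old rows are only flagged, the previous condition ids (and their expressions) stay resolvable after the edit, while the recreated conditions carry new ids.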

Aside from letting the user see full alert definition info (as it was at the moment each already-fired alert fired), this bit also ensures that out-of-band processing of alertable data (availability, metrics, traits, operations, events, config changes) can reach a successful termination state. The root issue is that an alert condition can trigger (a cache hit) and put a message on the queue for out-of-band processing (keyed by the alert condition id), and then the alert definition (and its alert conditions) can be updated and/or deleted before that message is processed. If the conditions were truly deleted, the out-of-band processing would blow up with exceptions, which of course we could catch and ignore, but is that the right logic? Supporting the deleted bit lets condition processing proceed and complete as if nothing happened, because the original condition ids are still present in the database. Any new triggers out of the cache will use the newly created alert condition ids.
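The race described above can be sketched with an in-memory queue. All names here (`Condition`, `ConditionMatch`, `OutOfBandProcessor`) are hypothetical, and an `ArrayDeque` stands in for the real message queue. A match is enqueued keyed by condition id, the condition is then deleted, and only afterwards is the queue drained: with a soft delete the lookup still succeeds, while a hard delete leaves nothing to look up.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical condition row with the deleted bit.
class Condition {
    final int id;
    boolean deleted;

    Condition(int id) {
        this.id = id;
    }
}

// Hypothetical message placed on the queue on a cache hit, keyed by condition id.
class ConditionMatch {
    final int conditionId;
    final double value;

    ConditionMatch(int conditionId, double value) {
        this.conditionId = conditionId;
        this.value = value;
    }
}

class OutOfBandProcessor {
    private final Map<Integer, Condition> conditionTable = new HashMap<>();
    private final Queue<ConditionMatch> queue = new ArrayDeque<>();
    final List<String> conditionLogs = new ArrayList<>();

    void store(Condition c) {
        conditionTable.put(c.id, c);
    }

    void enqueue(ConditionMatch m) {
        queue.add(m);
    }

    // Deleted-bit approach: the row stays, only the flag changes.
    void softDelete(int conditionId) {
        Condition c = conditionTable.get(conditionId);
        if (c != null) {
            c.deleted = true;
        }
    }

    // "True" delete: the row is gone.
    void hardDelete(int conditionId) {
        conditionTable.remove(conditionId);
    }

    // Consumes queued matches; the lookup intentionally ignores the deleted bit.
    void drain() {
        ConditionMatch m;
        while ((m = queue.poll()) != null) {
            Condition c = conditionTable.get(m.conditionId);
            if (c == null) {
                throw new IllegalStateException("condition " + m.conditionId + " no longer exists");
            }
            conditionLogs.add("condition " + c.id + " matched value " + m.value);
        }
    }
}
```

In the soft-delete case, draining a match for condition 7 with value 50.0 records `condition 7 matched value 50.0`; in the hard-delete case `drain()` throws, which is exactly the "catch and ignore?" dilemma the deleted bit avoids.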

Recap

So, to recap, the deleted bits serve a dual purpose: 1) a rich audit trail (though this could have been solved in a better-isolated fashion by capturing more data in the trail itself), and 2) continuous, constructive processing of out-of-band alert-triggered data against "deprecated" condition ids (MUCH more difficult to solve without the deleted bit).

Summary

All that said, I have mixed feelings about using "deleted" flags. Granted, they can sometimes simplify logic that would otherwise be tedious and/or difficult to write. But, depending on the data profile, they can introduce considerable database bloat, because those entries are never removed by an explicit delete. So now you might have to write clean-up routines that look for entities no longer referenced by other subsystems, and that logic can itself be tedious and/or difficult to write. Generally, when it comes to audit trails, the data should be as isolated as possible (and yes, sometimes that means denormalizing by duplicating data in the audit trail).
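A clean-up routine of the kind mentioned above might look like the following sketch. It is deliberately simplified and hypothetical: `Definition`, `Alert` and `DeletedDefinitionPurger` are illustrative names, and the only "other subsystem" checked is the alert history. A real routine would also have to account for anything else still holding the ids (for example, unprocessed out-of-band messages), and against a real database it would more likely be expressed as a query than as an in-memory loop.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical definition row carrying the deleted bit.
class Definition {
    final int id;
    boolean deleted;

    Definition(int id, boolean deleted) {
        this.id = id;
        this.deleted = deleted;
    }
}

// Hypothetical fired alert referencing its definition.
class Alert {
    final int definitionId;

    Alert(int definitionId) {
        this.definitionId = definitionId;
    }
}

class DeletedDefinitionPurger {
    // Physically removes definitions that are flagged as deleted AND no longer
    // referenced by any alert. Returns how many rows were removed.
    static int purge(Map<Integer, Definition> definitions, List<Alert> alerts) {
        Set<Integer> referenced = new HashSet<>();
        for (Alert alert : alerts) {
            referenced.add(alert.definitionId);
        }
        int purged = 0;
        Iterator<Definition> it = definitions.values().iterator();
        while (it.hasNext()) {
            Definition def = it.next();
            if (def.deleted && !referenced.contains(def.id)) {
                it.remove(); // removes the map entry through the values view
                purged++;
            }
        }
        return purged;
    }
}
```

Note the two conditions in the `if`: active definitions are never touched, and flagged definitions survive as long as some alert still points at them, which is what keeps the audit trail intact.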

In this scenario, the use cases were compelling enough to warrant the use of a deleted bit.

JBoss.org Content Archive (Read Only), exported from JBoss Community Documentation Editor at 2020-03-12 12:43:49 UTC, last content change 2008-11-06 05:15:57 UTC.