RHQ currently has no notion of dependencies between resources beyond the implicit parent-child relationship of the resource tree. Even there, no real dependency is expressed, for example in the form that when the parent becomes unavailable, the unavailability of its children is hidden rather than reported explicitly.
The classic application is often depicted like this:
We have an Application consisting of a load balancer LB, two App servers and a database.
RHQ currently has no notion of the Application itself (which here is just a mixed group) and is thus not able to simply state one of the following:
- Application is running well
- Application is running with some issues (e.g. one of the two App servers down)
- Application is down
Currently the administrator needs to "sort this out by hand"
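
The three application states listed above could in principle be derived mechanically from the availability of the members. The following is a minimal sketch under stated assumptions, not an existing RHQ API: the class `ApplicationHealth`, the enum `AppState` and the grouping of members into named tiers (LB, App servers, DB) are all hypothetical and chosen only for illustration.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types exist in RHQ.
public class ApplicationHealth {

    /** Assumed three-valued state matching the three bullet points above. */
    public enum AppState { UP, DEGRADED, DOWN }

    /**
     * Each map entry is one tier of interchangeable members (e.g. the two
     * App servers); each Boolean is the availability of one member.
     * Assumption: losing every member of any tier takes the application down.
     * A tier with no members at all is therefore also treated as down.
     */
    public static AppState evaluate(Map<String, List<Boolean>> tiers) {
        boolean degraded = false;
        for (List<Boolean> members : tiers.values()) {
            long up = members.stream().filter(Boolean::booleanValue).count();
            if (up == 0) {
                return AppState.DOWN;      // a whole tier is gone
            }
            if (up < members.size()) {
                degraded = true;           // some, but not all, members are up
            }
        }
        return degraded ? AppState.DEGRADED : AppState.UP;
    }
}
```

With one of the two App servers down and the load balancer and database up, `evaluate` returns `DEGRADED`; with the single database down it returns `DOWN`.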
The situation gets worse when you add more resources to the mix, such as a JMS broker, a compute grid, or further applications, all of which have dependencies that form a directed graph.
I was recently on a call with a customer. The following comes from an email regarding that call. The customer has a strong requirement to be able to start and stop apps together with their dependencies. The following graphic illustrates this:
Before the application(s) on the AS can start, the compute grid must be started, as well as the DB and the load balancer. In addition, the compute grid and the AS both depend on an MQ broker. For app3 to start, the grid and MQ need to be present; for app4, only MQ.
For stopping, the order needs to be reversed (with checks about what may still be needed, but this is not the subject of this email).
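
The start and stop ordering described here amounts to a depth-first, dependencies-first walk of the graph. The sketch below assumes the edges as read from the paragraph above (AS needs grid, DB, LB and MQ; grid needs MQ; app3 needs grid and MQ; app4 needs MQ). The class `DependencyOrder` and its methods are hypothetical names, not part of RHQ, and the stop order is a plain reversal that deliberately omits the "what may still be needed" checks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: resource names are plain strings, not RHQ Resources.
public class DependencyOrder {
    private final Map<String, List<String>> dependsOn = new HashMap<>();

    /** Declares that 'resource' can only start after all 'prerequisites'. */
    public DependencyOrder require(String resource, String... prerequisites) {
        dependsOn.computeIfAbsent(resource, k -> new ArrayList<>())
                 .addAll(Arrays.asList(prerequisites));
        return this;
    }

    /** Everything 'target' needs, prerequisites first, 'target' last. */
    public List<String> startOrder(String target) {
        List<String> order = new ArrayList<>();
        visit(target, new HashSet<>(), new HashSet<>(), order);
        return order;
    }

    /** Plain reversal of the start order; no check for other dependents. */
    public List<String> stopOrder(String target) {
        List<String> order = startOrder(target);
        Collections.reverse(order);
        return order;
    }

    // Depth-first post-order walk; a node seen again while still in progress
    // means the graph has a cycle and no valid order exists.
    private void visit(String node, Set<String> done, Set<String> inProgress,
                       List<String> order) {
        if (done.contains(node)) {
            return;
        }
        if (!inProgress.add(node)) {
            throw new IllegalStateException("Dependency cycle at " + node);
        }
        for (String dep : dependsOn.getOrDefault(node, List.of())) {
            visit(dep, done, inProgress, order);
        }
        inProgress.remove(node);
        done.add(node);
        order.add(node);
    }
}
```

With the edges above registered via `require`, `startOrder("app3")` yields MQ, grid, app3 and `startOrder("app4")` yields MQ, app4.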
Veritas cluster manager is able to produce a graph like the one displayed above and also to handle these dependencies.
RHQ actually already provides the biggest part of this puzzle: triggering start + stop operations on the managed resources.
We came up with the idea that they could implement the triggering logic as a server plugin that reads an external file hosting the dependency graph and then triggers the start and stop operations in turn.
As server plugins have "controls", there are even "buttons" in the UI to trigger those start and stop "meta-operations". This approach has drawbacks, though:
- it is not intuitive to have to go to Admin->ServerPlugins->.... in order to start app4
- ops people (for the apps) may not even have access to the Admin section of RHQ
So we would need a way to associate such a server plugin operation with an actual resource. The operator would then go to App4, open the operations tab and issue a "start with dependencies" operation. That request goes to the server plugin, which does the work of scheduling all the needed start operations at plugin level (in turn, waiting for each result).
There are several options for how to represent the metadata for this:
- server plugin could indicate which operations it would offer for what resource types
- agent plugins would get a new <server-operation> section that contains the name of the server plugin + metadata about parameters etc.
In any case the name of the resource where the server-operation is triggered from needs to be passed to the server plugin, so that it can go to the right entry in the table and do its work.
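
The "in turn and with waiting for results" behaviour of such a server plugin could be sketched roughly as follows. This is illustrative only: `StartWithDependencies` and the `OperationInvoker` interface are hypothetical stand-ins for however the plugin would actually schedule an operation and wait for its outcome, and the start order is assumed to have been looked up already from the name of the triggering resource.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not an RHQ server plugin API.
public class StartWithDependencies {

    /** Assumed abstraction: runs one operation and blocks until it finishes. */
    public interface OperationInvoker {
        /** @return true if the operation succeeded on the given resource */
        boolean invokeAndWait(String resourceName, String operationName);
    }

    /**
     * Starts the resources one after another in the given order and stops at
     * the first failure, so nothing is started on top of a missing dependency.
     *
     * @return the resources that were started successfully; the caller can
     *         compare this with startOrder to see whether the run completed
     */
    public static List<String> run(List<String> startOrder, OperationInvoker invoker) {
        List<String> started = new ArrayList<>();
        for (String resource : startOrder) {
            if (!invoker.invokeAndWait(resource, "start")) {
                break;                     // do not continue past a failed start
            }
            started.add(resource);
        }
        return started;
    }
}
```

Returning the list of successfully started resources rather than throwing is a design choice of this sketch: it lets the caller report exactly how far the "start with dependencies" run got.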
Alerting needs to understand the concept of an Application, so that the user only needs to set up an alert at application level, and e.g. unavailability at the level of a dependency bubbles up to the level of the application.
Classifying a set of resources also makes it possible to create dashboards that aggregate system state, e.g. showing only a traffic light for the state of the whole application. If the user needs more detail, they can drill down into the application.
We need to provide graphical views like the ones at the top, where the user can see the dependencies and alert state, and can easily see relevant metrics via context / hover.
A design for a Relationship Service already exists that could be used to implement the graph here. That alone is not enough, though: merely following those links is insufficient, as a driver also needs to sit on top to trigger all the operations.
We should also consider adding a first-class concept of managed resource lifecycle management. The new plugin API could look something like:
If we had this, along with defined relationships, either the Server or the plugin container could orchestrate starting a managed Resource and all its dependency Resources in the correct order. I think this would allow us to provide generic support for dependency-aware Resource lifecycle management, except for the case where the user wants to define a set of Resources as runtime dependencies of their "application", where that application is not represented as an RHQ Resource. That case could require a server plugin and/or even further additions to the domain model.