This is the RHQ 4.4 release. It was released on May 9th, 2012.
If upgrading from RHQ 4.2 you must first make a manual change to your database. Apply this change only if upgrading from RHQ 4.2, not earlier versions. Execute the following SQL to update the schema version from 2.114 to 2.115:
After this update proceed with the upgrade normally.
New Features (since RHQ 4.3)
There have been several changes made to enhance availability collection, reporting and alerting. For more on all of the changes to availability see: Availability Improvements.
In addition to DOWN and UP availability types RHQ now has UNKNOWN and DISABLED as well.
The UNKNOWN availability allows RHQ to better represent resources whose current availability is not known. The best example of this is when an agent is down. Its managed resources may be up or down; we don't know.
RHQ now allows users to mark resources DISABLED. This is an availability type primarily assigned by users, not by agent reporting. DISABLED resources will ignore availability reported by the agent. This is useful for planned outages or for resources that are expected to be down or have been set administratively down. Since DISABLED resources are not DOWN, they are omitted from dashboard portlets and availability alerting scenarios.
|Behavioral Change: Agent Down Handling|
When an agent is down its platform resource will be marked DOWN but all of the platform children will now be marked UNKNOWN to represent that the RHQ server is not getting updated. In the past the children were also marked as DOWN. Note that DISABLED children will be left as DISABLED (see more below).
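The agent-down handling described above can be sketched as follows. This is only an illustrative model, not RHQ server code: the `Avail` enum and the `backfill_agent_down` function are hypothetical names introduced for this example.

```python
from enum import Enum


class Avail(Enum):
    """The four availability types named in these notes (illustrative model)."""
    UP = "UP"
    DOWN = "DOWN"
    UNKNOWN = "UNKNOWN"
    DISABLED = "DISABLED"


def backfill_agent_down(child_avails):
    """Model the back-fill applied when an agent is considered down.

    child_avails maps a (hypothetical) child resource name to its last
    known availability. Returns the new platform availability and the
    new availability of each child.
    """
    new_children = {
        # DISABLED children are left as DISABLED; everything else
        # becomes UNKNOWN because the server is no longer being updated.
        name: (avail if avail is Avail.DISABLED else Avail.UNKNOWN)
        for name, avail in child_avails.items()
    }
    # The platform resource itself is marked DOWN.
    return Avail.DOWN, new_children
```

For instance, children that were UP or DOWN both end up UNKNOWN, while a child that a user had marked DISABLED keeps that state.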
|Behavioral Change: Alerting|
Existing Goes DOWN alert conditions will not fire when a resource is set to UNKNOWN. So, the new availability assignment for down agents can affect existing alerting. The intent is to be more accurate and avoid false positives but if the prior behavior is desired the alert conditions should be updated to Goes NOT UP, which is a new option.
|Behavioral Change: Group Availability|
The introduction of new availability types forced changes to the way group availability is determined. Group availability is now determined with the following algorithm, evaluated top to bottom in the table below:
|Behavioral Change: Remote API|
The remote API method ResourceManagerRemote.getLiveResourceAvailability() no longer returns null for unknown; it now properly returns AvailabilityType.UNKNOWN. This may affect existing remote clients or CLI scripts.
Note that 'Goes DOWN' alert conditions remain unchanged and are unaffected by the upgrade. They are satisfied as before, when the resource's availability changes from NOT DOWN to DOWN. But note that resources moving to UNKNOWN or DISABLED will not meet the condition. There is now a 'Goes NOT UP' operator that will match when the state moves from UP to any other availability type.
In addition to Availability Change alert conditions, it is now possible to create Availability Duration conditions. 'Stays DOWN for Xm' will match if a resource goes from UP to DOWN and stays down for X minutes. 'Stays NOT UP' is similar, but applies to changes from UP to any other availability.
This is a major change. Previously, all resources were checked on every availability scan, by default every five minutes. This could cause 'peak and valley' issues with CPU and/or memory spikes. It also did not provide any way to favor checking of critical resources and lessen the priority of the many non-critical, service-level resources. With the changes:
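To make the difference between the two operators concrete, here is a minimal sketch of the matching rules as they are described above. The `Avail` enum and both function names are assumptions made for this illustration; they are not part of the RHQ alerting code.

```python
from enum import Enum


class Avail(Enum):
    """The four availability types named in these notes (illustrative model)."""
    UP = "UP"
    DOWN = "DOWN"
    UNKNOWN = "UNKNOWN"
    DISABLED = "DISABLED"


def goes_down(previous, current):
    """'Goes DOWN': availability changes from anything that is not DOWN to DOWN."""
    return previous is not Avail.DOWN and current is Avail.DOWN


def goes_not_up(previous, current):
    """'Goes NOT UP': availability changes from UP to any other type."""
    return previous is Avail.UP and current is not Avail.UP
```

With this model, a resource moving from UP to UNKNOWN (for example because its agent went down) satisfies `goes_not_up` but not `goes_down`, which is why existing 'Goes DOWN' conditions may need to be switched if the old behavior is wanted.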
- Provide resource-level granularity for collecting avail information.
- Every non-platform resource type will have a built-in metric called "AvailabilityType"
- The value is in seconds
The new Availability metric schedule will be added automatically to all types in updated plugins. So, for upgrades, new versions (updated MD5) of current plugins must be deployed. Custom plugins must be rebuilt and redeployed to get the new metric schedule.
|Behavioral Change: Availability Check Intervals|
Previously an availability check was performed on all resources with a 5 minute interval, and all resources were checked in one pass. Now, availability checking is performed based on the Availability metric schedule. If not set in the plugin descriptor the resource type's default availability check interval is based on its category:
This means that Availability collection intervals can be set, like other metric schedules, at the Template, Group and Resource levels, and can be changed at the user's discretion. If the metric is disabled then affected resources will defer to their parent's availability type.
|Behavioral Change: Agent Avail Prompt Command|
The Avail prompt command generated either a changes-only or full report, and that is still true. But it always performed an avail check on every resource. With the introduction of prioritized availability checking that is no longer true: the avail check will be performed only if there is no current availability for the resource, or its scheduled check time has passed. There is a new option, --force, that can be specified to force the availability checks. Note that this option will increase execution time.
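The decision described above can be summarized in a small sketch. It is a model of the stated rule only; the function name `should_check` and its parameters are hypothetical and do not correspond to agent internals.

```python
def should_check(current_avail, next_check_time_s, now_s, force=False):
    """Decide whether the avail command performs a live check for a resource.

    current_avail: last known availability, or None if there is none yet.
    next_check_time_s: when the next scheduled check is due (seconds).
    now_s: the current time (seconds, same clock as next_check_time_s).
    force: models the --force option, which checks every resource.
    """
    if force:
        return True
    if current_avail is None:
        # No current availability for the resource: it must be checked.
        return True
    # Otherwise check only if the scheduled time has been reached.
    return now_s >= next_check_time_s
```

A resource with a known availability whose next check is still in the future is skipped unless `force` is set, which is why forcing increases execution time.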
For best performance it is recommended that the collection interval for non-interesting resources be set to a large interval, or be disabled.
|Behavioral Change: Availability Check Approach|
Availability checking now happens incrementally. The availability job runs at 30 second intervals and not every resource is checked on each pass. Instead, checking is spread out, still respecting the desired intervals as much as possible, but in a fashion that avoids the 'peak and valley' issues of the past.
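One way to picture incremental checking is to stagger each resource's first check across the 30 second passes that fit into its interval. The sketch below is an assumption made for illustration: the notes do not describe the agent's actual spreading strategy, and the function names are hypothetical. It also assumes each interval is a multiple of the 30 second job period.

```python
JOB_PERIOD_S = 30  # the availability job period stated in these notes


def staggered_first_due(resource_ids, interval_s, period_s=JOB_PERIOD_S):
    """Assign each resource a first-check offset, spread round-robin
    over the job passes that fit into one collection interval."""
    slots = max(1, interval_s // period_s)
    return {
        rid: (index % slots) * period_s
        for index, rid in enumerate(resource_ids)
    }


def due_on_pass(first_due, interval_s, pass_time_s):
    """Return the resources whose check falls on the pass at pass_time_s."""
    return [
        rid
        for rid, start in first_due.items()
        if pass_time_s >= start and (pass_time_s - start) % interval_s == 0
    ]
```

With four resources on a 60 second interval, two are checked on the pass at 0s and the other two on the pass at 30s, so each pass handles half of the work while every resource is still checked once per interval.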
|Behavioral Change: Agent Max Quiet Time|
Previously, back-filling of an agent's platform resources was performed after a 15 minute period of no communication from the agent. This period is set as the AGENT_MAX_QUIET_TIME_ALLOWED system setting. This was true of an agent shut down gracefully or one that went down unexpectedly. The upgrade will now set this value to 5 minutes; it is being reduced due to architectural improvements. Also, agents shut down gracefully will be back-filled immediately.
|Behavioral Change: Operations that Affect Availability|
Operations now have the ability to request an immediate availability check after completion. All of the RHQ plugins have been updated for any Start/Stop/Restart operations. So, availability should typically be updated within 60s of the operation completing and can be reflected in the UI if it is refreshed.
The REST API has been enhanced. It is included to start the effort of building a REST interface into RHQ, so that the server is more accessible from other tools and languages.
|This API IS NOT STABLE. Do not rely on it. IT WILL CHANGE|
- RESTEasy has been updated to version 2.3.2.Final
- Updated Japanese translations by Fusayuki Minamoto
- Initial Russian translations by Denis Krusko
- The reports under /coregui/#Reports can now be exported in CSV format.
- Several of the reports offer filtering capabilities to generate a fixed data set.
- [bug 805987] The platform plugin now reports metrics for the actual free and actual used system memory.
- The JBoss AS 7 plugin has been renamed. If you install RHQ 4.4 into an existing RHQ 4.3 database and had the as7 plugin installed before, you should remove it before the upgrade.
There is now an RHQ samples project available on GitHub that contains additional sample code that works together with RHQ. It also contains examples of accessing the REST API from programming languages other than Java.
- The embedded agent may fail to find the server or register with it, which means that it will not be able to discover and manage any resources. Please use an external agent. BZ 819766
- For the group display, it may look as if resource counts are wrong when you have resources sitting in the autodiscovery queue. BZ 819897
The GWT part of the UI has been partially translated into German, Portuguese, Japanese, Chinese and Russian. The language should be selected automatically depending on your browser settings. You can explicitly access other translations by appending a locale specifier to the URL. For example, to select the German translation you would append ?locale=de to the base URL, e.g. http://localhost:7080/coregui/?locale=de.
Supported locales are:
- zh for Chinese
- de for German
- ja for Japanese
- pt for Portuguese
- ru for Russian
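If you generate links to the UI from a script, the locale parameter can be added programmatically. The helper below is a small sketch that uses only the Python standard library; the function name `with_locale` is made up for this example, and the set of locales is simply the list given above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Locale codes listed in these release notes.
SUPPORTED_LOCALES = {"zh", "de", "ja", "pt", "ru"}


def with_locale(base_url, locale):
    """Return base_url with its 'locale' query parameter set to locale."""
    if locale not in SUPPORTED_LOCALES:
        raise ValueError(f"locale not listed in the release notes: {locale}")
    parts = urlsplit(base_url)
    # Drop any existing locale parameter, keep all other parameters.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "locale"
    ]
    query.append(("locale", locale))
    # The query is placed before any #fragment used by the UI.
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Calling `with_locale("http://localhost:7080/coregui/", "de")` yields the German URL shown in the example above.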
Please ping us if you want to help translate the UI into your language. Translations are done via the translations project on GitHub, which also has some instructions on how to start.
Please report all bugs you find in Bugzilla. If you find a bug that has already been recorded in the above list, please leave a comment on it, especially if it needs special steps to reproduce.
You can download the release here.