JBoss Community Archive (Read Only)

Release 1.1.0

Having been around since the project's inception, I can say with confidence that the 1.1.0 release of RHQ is the most robust, stable, and scalable version we've delivered to date.

The major focus of this release was to eliminate the single point of failure on the server side. In previous versions, if the RHQ Server ever went down - whether from scheduled maintenance on the box, a network blip between the agents and the server, unanticipated firewall rule changes, hardware failures, power outages, etc. - the system would appear down. You wouldn't be able to access the web console, the data the agents were collecting couldn't get to the database, and you wouldn't receive any alerts that the RHQ system wasn't functioning (because the mechanism for sending alerts wasn't running).

All that changes in the 1.1.0 release. With it comes the high availability and failover feature set. The first part, high availability, provides redundancy on the server side - the layer that collects data from the agents, inserts it into the database, runs periodic jobs, triggers alerts, and provides the web console. The second part, failover, enables agents to switch which server they are communicating with so that collected data still makes it into the database in a timely manner.
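To make the failover idea concrete, here is a minimal sketch of an agent that holds an ordered list of servers and moves to the next one when the current one can't be reached. This is not RHQ's actual agent code: the class `FailoverList`, its methods, the `transport` callback, and the server names are all hypothetical, chosen only for illustration.

```java
import java.util.List;
import java.util.function.BiPredicate;

/** Hypothetical sketch of an agent-side failover list; not RHQ's real implementation. */
class FailoverList {
    private final List<String> servers; // ordered by preference
    private int current = 0;            // index of the server currently in use

    FailoverList(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("at least one server is required");
        }
        this.servers = List.copyOf(servers);
    }

    String currentServer() {
        return servers.get(current);
    }

    /**
     * Tries to deliver a report, starting with the current server and moving
     * through the list until one accepts it. The transport callback stands in
     * for the real network call and returns true on successful delivery.
     * Returns the server that accepted the report, or null if none did.
     */
    String send(String report, BiPredicate<String, String> transport) {
        for (int attempt = 0; attempt < servers.size(); attempt++) {
            String candidate = servers.get(current);
            if (transport.test(candidate, report)) {
                return candidate; // stay on this server for later reports
            }
            current = (current + 1) % servers.size(); // fail over to the next server
        }
        return null; // every server was tried; the caller would retry later
    }
}

class FailoverDemo {
    public static void main(String[] args) {
        FailoverList list = new FailoverList(List.of("server-a", "server-b"));
        // Pretend server-a is down: only server-b accepts the report.
        String used = list.send("cpu=42", (server, report) -> server.equals("server-b"));
        System.out.println(used); // prints "server-b"
    }
}
```

Once a server accepts a report, the sketch keeps using it for later reports, so a single outage doesn't force the agent to rediscover a working server on every send.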

Let me jump the gun and say that even though we weren't explicitly planning on better scale through increased throughput, that's serendipitously what happened. At the start of this release, we thought that the ability to monitor 100 agents simultaneously (with default metric collection intervals) would be an improvement we could be proud of. So it should come as no surprise that I couldn't stop smiling when we ramped the system up to ~350 agents...and it kept humming. And we weren't just collecting data and using the server as a pass-through to the database; the system had thousands of alert definitions set up, and every single report sent up from any agent first had to be inspected by the alerting engine to see if it should fire off any alerts.

So is that the only thing the team was working on these last 3 months? Hardly. When I take a step back and reflect on everything else that was accomplished this release, I couldn't be more proud to work with such a capable team of engineers, who knocked out a formidable amount of work in the time we had. We closed out more than 200 issues, which is nearly a quarter of all that had been opened to date!

To share with the community what 1.1.0 has to offer, I went through every single one of those issues today and put together a short list of the major points of this release.

Platform Improvements

Auditing

Plugin Enhancements

Notable Features

Performance Improvements

General UI Enhancements

Now that the platform is stable and, more importantly, now that we know we can scale to hundreds of agents (monitoring tens of thousands of managed resources), it's time to focus on how RHQ can ease the management of large environments. I wouldn't be surprised if that turned out to be a primary focus for the 1.2.0 release. Stay tuned!

Joseph Marques
Technical Lead, RHQ Project
Senior Software Engineer, Red Hat

JBoss.org Content Archive (Read Only), exported from JBoss Community Documentation Editor at 2020-03-13 07:54:54 UTC, last content change 2013-09-18 19:40:18 UTC.