The RHQ project is an abstraction- and plugin-based systems management suite that provides extensible, integrated systems management for multiple products and platforms across a set of core features. The project is designed with layered modules that provide a flexible deployment architecture. It delivers a core user interface that provides audited, historical management across an entire enterprise. A Server/Agent architecture provides remote management, and plugins implement all product-specific support for managed products.
- Hierarchical parent-child relationships between server and service resources
- Resource types are defined by the plugins themselves, which makes it easy to add new types of any kind to the system
- From the UI you can manually create resources (not the managed resources themselves, but rather the "RHQ resource" wrapper - see Content for the creation of managed resources)
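To make the parent-child inventory model concrete, here is a minimal Java sketch. `ResourceType`, `Resource` and the plugin and type names in the demo are hypothetical illustrations, not RHQ's actual domain classes; the point is only that types come from plugins and that each resource may have a parent and any number of children.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical: a resource type is identified by the plugin that defines it and a type name.
final class ResourceType {
    final String plugin;
    final String name;

    ResourceType(String plugin, String name) {
        this.plugin = plugin;
        this.name = name;
    }
}

// Hypothetical "RHQ resource" wrapper with a parent-child hierarchy.
final class Resource {
    final String name;
    final ResourceType type;
    private Resource parent;
    private final List<Resource> children = new ArrayList<>();

    Resource(String name, ResourceType type) {
        this.name = name;
        this.type = type;
    }

    // Attaches a child (e.g. a service under a server) and records its parent.
    Resource addChild(Resource child) {
        child.parent = this;
        children.add(child);
        return child;
    }

    Resource getParent() { return parent; }

    List<Resource> getChildren() { return Collections.unmodifiableList(children); }

    // Slash-separated path from the root of the hierarchy down to this resource.
    String path() {
        return parent == null ? name : parent.path() + "/" + name;
    }
}

class InventoryDemo {
    public static void main(String[] args) {
        ResourceType serverType = new ResourceType("ExamplePlugin", "Application Server");
        ResourceType serviceType = new ResourceType("ExamplePlugin", "Datasource");

        Resource server = new Resource("app-server-1", serverType);
        Resource ds = server.addChild(new Resource("DefaultDS", serviceType));

        System.out.println(ds.path()); // prints app-server-1/DefaultDS
    }
}
```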
- AAA (Authentication, Authorization, and Accounting or Audit)
- Supports basic username/password authentication
- Integrates with LDAP for authentication only; future integration with additional providers (e.g. IPA) is planned to support authorization
- Provides standard RBAC (role-based access control) for authorization when using the database to store the authz data
- Does not currently support Kerberos authentication
- Would need to be extended to support custom permissions defined by application and plugin developers
- Cannot plug in additional security modules; doing so would require new JAAS modules or new infrastructure to synchronize with an external security subsystem (e.g. IPA)
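Standard RBAC means a user is authorized for an action if any of the user's roles grants the corresponding permission. The sketch below shows that check in Java; the `Permission` values and the `Role`/`User` classes are hypothetical and do not reflect RHQ's actual permission names or authz schema.

```java
import java.util.EnumSet;
import java.util.HashSet;
import java.util.Set;

// Hypothetical permission set; RHQ's real permissions may be named and scoped differently.
enum Permission { VIEW_RESOURCE, MODIFY_CONFIGURATION, CONTROL_OPERATIONS, MANAGE_SECURITY }

// A role is a named bundle of permissions.
final class Role {
    final String name;
    final Set<Permission> permissions = EnumSet.noneOf(Permission.class);

    Role(String name, Set<Permission> granted) {
        this.name = name;
        this.permissions.addAll(granted); // copy so later changes to "granted" don't leak in
    }
}

// A user is authorized if at least one assigned role grants the permission.
final class User {
    final String username;
    private final Set<Role> roles = new HashSet<>();

    User(String username) { this.username = username; }

    void addRole(Role role) { roles.add(role); }

    boolean hasPermission(Permission p) {
        for (Role r : roles) {
            if (r.permissions.contains(p)) {
                return true;
            }
        }
        return false;
    }
}

class RbacDemo {
    public static void main(String[] args) {
        Role operator = new Role("operator",
                EnumSet.of(Permission.VIEW_RESOURCE, Permission.CONTROL_OPERATIONS));
        User alice = new User("alice");
        alice.addRole(operator);
        System.out.println(alice.hasPermission(Permission.CONTROL_OPERATIONS));   // true
        System.out.println(alice.hasPermission(Permission.MODIFY_CONFIGURATION)); // false
    }
}
```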
- Resource types define their configuration properties in the plugins
- The UI allows resource configuration updates to be applied immediately or scheduled for later
- An audit trail is maintained to track the who/what/when of configuration updates
- The UI is not pluggable: the configuration UI is generic so that it can view and edit any configuration defined in a plugin descriptor, but the UI itself cannot be extended to support wizard-style configuration updates
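Because the configuration UI is generic, it can only rely on the property definitions a plugin declares, not on any product-specific knowledge. A minimal Java sketch of that idea follows: a validator that checks submitted values purely against declared definitions. `PropertyType`, `PropertyDefinition` and `ConfigurationValidator` are hypothetical names, and real plugin descriptors support richer property types than these three.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simple property types a plugin descriptor might declare.
enum PropertyType { STRING, INTEGER, BOOLEAN }

final class PropertyDefinition {
    final String name;
    final PropertyType type;
    final boolean required;

    PropertyDefinition(String name, PropertyType type, boolean required) {
        this.name = name;
        this.type = type;
        this.required = required;
    }
}

// Generic validation: knows nothing about the managed product, only the definitions.
final class ConfigurationValidator {
    static List<String> validate(List<PropertyDefinition> defs, Map<String, String> values) {
        List<String> errors = new ArrayList<>();
        for (PropertyDefinition def : defs) {
            String v = values.get(def.name);
            if (v == null || v.isEmpty()) {
                if (def.required) errors.add(def.name + " is required");
                continue;
            }
            switch (def.type) {
                case INTEGER:
                    try {
                        Integer.parseInt(v);
                    } catch (NumberFormatException e) {
                        errors.add(def.name + " must be an integer");
                    }
                    break;
                case BOOLEAN:
                    if (!v.equals("true") && !v.equals("false")) {
                        errors.add(def.name + " must be true or false");
                    }
                    break;
                default:
                    break; // STRING accepts any non-empty value
            }
        }
        return errors;
    }
}

class ConfigDemo {
    public static void main(String[] args) {
        List<PropertyDefinition> defs = Arrays.asList(
                new PropertyDefinition("port", PropertyType.INTEGER, true),
                new PropertyDefinition("ssl", PropertyType.BOOLEAN, false));
        Map<String, String> values = new HashMap<>();
        values.put("port", "80a");
        System.out.println(ConfigurationValidator.validate(defs, values)); // [port must be an integer]
    }
}
```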
- Monitoring is a first-class concept in RHQ, although plugins are not forced to support it. That is to say, even though the RHQ core provides the monitoring infrastructure, plugins do not have to emit measurement data (e.g. when the managed product does not emit measurement data itself)
- You can schedule what metrics are collected and how often to collect them. You can also disable measurement collections for any or all metrics (i.e. even if a plugin supports monitoring, a user has the option to completely disable the collection of measurement data)
- Availability is an integral part of the monitoring subsystem: you can see which managed resources are up or down even if the plugin does not support emitting measurement data.
- Measurement data includes numeric data and "usually-fixed" string data that are called traits (traits are things like "number of CPUs" or "installed RAM" - values that can change, but rarely do over time)
- Ability to collect complex, tabular data for things like URL response time data or EJB call time statistics
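The scheduling model described above (a collection interval per metric, plus the option to disable collection entirely) can be sketched in Java as follows. `MeasurementSchedule` and `MeasurementScheduler` are hypothetical names; a real agent would presumably drive collection from background threads, whereas this sketch takes the current time as a parameter to stay deterministic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-resource metric schedule: a collection interval plus an enabled flag.
final class MeasurementSchedule {
    final String metric;
    final long intervalMillis;
    boolean enabled;
    private long nextCollection; // absolute time (ms) at which the metric is next due

    MeasurementSchedule(String metric, long intervalMillis, boolean enabled, long now) {
        if (intervalMillis <= 0) throw new IllegalArgumentException("interval must be positive");
        this.metric = metric;
        this.intervalMillis = intervalMillis;
        this.enabled = enabled;
        this.nextCollection = now;
    }

    // Returns true if the metric is enabled and due; advances the schedule when it fires.
    boolean pollDue(long now) {
        if (!enabled || now < nextCollection) return false;
        nextCollection = now + intervalMillis;
        return true;
    }
}

final class MeasurementScheduler {
    private final List<MeasurementSchedule> schedules = new ArrayList<>();

    void add(MeasurementSchedule s) { schedules.add(s); }

    // Names of the metrics a collector should read at time "now".
    List<String> dueMetrics(long now) {
        List<String> due = new ArrayList<>();
        for (MeasurementSchedule s : schedules) {
            if (s.pollDue(now)) due.add(s.metric);
        }
        return due;
    }
}

class SchedulerDemo {
    public static void main(String[] args) {
        MeasurementScheduler scheduler = new MeasurementScheduler();
        scheduler.add(new MeasurementSchedule("heapUsed", 60_000, true, 0));
        scheduler.add(new MeasurementSchedule("threadCount", 300_000, true, 0));
        scheduler.add(new MeasurementSchedule("cpuCount", 86_400_000, false, 0)); // disabled
        System.out.println(scheduler.dueMetrics(0));      // [heapUsed, threadCount]
        System.out.println(scheduler.dueMetrics(60_000)); // [heapUsed]
    }
}
```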
- RHQ has the ability to alert users based on certain criteria such as values of monitored measurement data or the results of an operation invocation.
- A plugin can define what operations are available to a resource - plugins can support zero, one or more operations
- An operation can accept any number of parameters and can return a generic result set
- Operations can be invoked immediately or scheduled
- An audit trail is maintained to track the who/what/when of operation invocations and what their results were (answers questions like "was it successful?" and "what were the results if it was successful?")
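The operation model (plugin-defined operations taking parameters and returning a generic result set, with every invocation recorded) can be sketched as below. `OperationManager` and `OperationHistory` are hypothetical names; the sketch covers only immediate invocation, not scheduling, and keeps the audit trail in memory rather than in a database.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical audit record: who ran what, on which resource, when, and with what outcome.
final class OperationHistory {
    final String user;
    final String resource;
    final String operation;
    final long timestamp;
    final boolean success;
    final Map<String, Object> results;

    OperationHistory(String user, String resource, String operation, long timestamp,
                     boolean success, Map<String, Object> results) {
        this.user = user;
        this.resource = resource;
        this.operation = operation;
        this.timestamp = timestamp;
        this.success = success;
        this.results = results;
    }
}

// Hypothetical registry: a plugin contributes zero or more named operations.
final class OperationManager {
    private final Map<String, Function<Map<String, String>, Map<String, Object>>> operations = new HashMap<>();
    private final List<OperationHistory> history = new ArrayList<>();

    void register(String name, Function<Map<String, String>, Map<String, Object>> impl) {
        operations.put(name, impl);
    }

    // Runs the operation immediately and always appends an audit entry, even on failure.
    OperationHistory invoke(String user, String resource, String name,
                            Map<String, String> params, long now) {
        Function<Map<String, String>, Map<String, Object>> impl = operations.get(name);
        boolean success;
        Map<String, Object> results;
        if (impl == null) {
            success = false;
            results = Collections.<String, Object>singletonMap("error", "unknown operation: " + name);
        } else {
            try {
                results = impl.apply(params);
                success = true;
            } catch (RuntimeException e) {
                success = false;
                results = Collections.<String, Object>singletonMap("error", String.valueOf(e.getMessage()));
            }
        }
        OperationHistory entry = new OperationHistory(user, resource, name, now, success, results);
        history.add(entry);
        return entry;
    }

    List<OperationHistory> getHistory() { return Collections.unmodifiableList(history); }
}

class OperationDemo {
    public static void main(String[] args) {
        OperationManager mgr = new OperationManager();
        mgr.register("restart", params -> Collections.<String, Object>singletonMap("restarted", Boolean.TRUE));
        OperationHistory h = mgr.invoke("admin", "app-server-1", "restart",
                new HashMap<String, String>(), System.currentTimeMillis());
        System.out.println(h.user + " ran " + h.operation + " on " + h.resource + ", success=" + h.success);
    }
}
```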
- RHQ can discover and inventory content on agents and push content to them.
- You can update content from the UI (e.g. if you update a script, you upload it to the server and push it to the agent(s))
- This is the basis for the middleware EAR/WAR/JAR deployment facilities (i.e. how you can deploy new or updated applications)
- RPMs are considered simply a piece of content that can be pushed to agents and later installed.
- You can create managed resources by pushing new artifacts to an agent (e.g. a new JBossAS datasource or a new Tomcat WAR application can be created by pushing a .xml or .war file, respectively)
- Software patching is a first-class concept. This allows "patches" or "updates" to be delivered to an agent, which then runs a series of steps to install the update following a jBPM-defined set of instructions and rules.
- Content Sources can be plugged into the server using content source server plugins. Content Source adapters pull content from remote repositories (such as YUM repos or RSS feeds). This content can then be pushed down to remote machines for installation.
RHQ's plugin system (called AMPS, for Advanced Management Plugin System) allows one to define the model for managing third-party products and to implement the necessary code to adapt to the abstraction model of RHQ.
RHQ is built on the JBoss AS EJB3 implementation, whose persistence layer is provided by Hibernate 3, and is designed to be database-agnostic. The installation and upgrade code currently supports Oracle and PostgreSQL but can be extended to support other databases. Some native SQL is used within the code base, so porting to a different database vendor would take some work, but it is possible should the community request additional database support such as MySQL or MS SQL Server.
The plugins support integrating new managed products via the abstraction model of RHQ and the AMPS API. AMPS will be a fully documented API by which developers and even customers can write their own plugins for deployment within the RHQ environment.
There is currently no model for server-side introduction of arbitrary application components beyond the standard EJB3 and Java EE component model. We do expect to solidify the API within the RHQ server application so that third-party components can be reliably integrated at this level. This API would be the same one used to build the RHQ user interface and would be exposed both as a local API and via web services to provide remote integration capabilities. The important thing is to complete these remote server APIs and keep them stable.
There is the concept of a pluggable server-side plugin for integrating remote content repositories. Developers can write their own content source adapters should they wish to pull down content from remote repositories such as FTP servers, SVN repositories or WebDAV folders.
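The shape of a content source adapter can be illustrated with a small Java sketch: the server iterates over registered adapters, tests each connection, and merges the packages each one reports into a catalog. `ContentSourceAdapter`, `PackageInfo`, `InMemoryContentSource` and `ContentSync` are hypothetical names; RHQ's actual content source plugin API is not reproduced here, and the in-memory adapter merely stands in for an FTP, SVN or WebDAV implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical description of one package found in a remote repository.
final class PackageInfo {
    final String name;
    final String version;
    final String location;

    PackageInfo(String name, String version, String location) {
        this.name = name;
        this.version = version;
        this.location = location;
    }

    @Override
    public String toString() { return name + "-" + version; }
}

// Hypothetical adapter contract that each content source plugin would implement.
interface ContentSourceAdapter {
    void testConnection() throws Exception;
    List<PackageInfo> listPackages() throws Exception;
}

// Toy adapter backed by an in-memory map of path -> version.
final class InMemoryContentSource implements ContentSourceAdapter {
    private final Map<String, String> files;

    InMemoryContentSource(Map<String, String> files) { this.files = new TreeMap<>(files); }

    public void testConnection() { /* always reachable */ }

    public List<PackageInfo> listPackages() {
        List<PackageInfo> result = new ArrayList<>();
        for (Map.Entry<String, String> e : files.entrySet()) {
            String path = e.getKey();
            String name = path.substring(path.lastIndexOf('/') + 1);
            result.add(new PackageInfo(name, e.getValue(), path));
        }
        return result;
    }
}

// Server side: pulls from every registered adapter into one catalog.
final class ContentSync {
    static List<PackageInfo> syncAll(List<ContentSourceAdapter> sources) {
        List<PackageInfo> catalog = new ArrayList<>();
        for (ContentSourceAdapter s : sources) {
            try {
                s.testConnection();
                catalog.addAll(s.listPackages());
            } catch (Exception e) {
                // Unreachable source: skipped here; a fuller version would log and retry later.
            }
        }
        return catalog;
    }
}

class ContentDemo {
    public static void main(String[] args) {
        Map<String, String> files = new HashMap<>();
        files.put("/repo/app.war", "1.0");
        List<ContentSourceAdapter> sources = Arrays.<ContentSourceAdapter>asList(new InMemoryContentSource(files));
        System.out.println(ContentSync.syncAll(sources)); // [app.war-1.0]
    }
}
```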
RHQ utilizes an install-in-place web installation system. This allows the user to simply start the product once downloaded and then answer a few questions in the UI. The system then runs the installation including database setup or upgrade as necessary and then completes deployment.
The agent can be installed without user intervention if the user installs it with a complete configuration file. By default, the agent will ask the user for configuration settings at initial startup time.
The one thing required prior to installation is an existing, installed database. Installation instructions on the wiki describe how to get a PostgreSQL database installed quickly, so developers won't have to spend a lot of time getting a database up and running.
The RHQ server tier exposes an API that is used by the Web UI. This API is also exposed via Web Services so external applications can access the RHQ core services.
The RHQ Server is pure Java and runs on any OS that supports Java 5. The SIGAR native libraries used by the RHQ Agent support many platforms, and a Java-only mode allows the agent to run on any platform with Java 5 support.
The RHQ Agent is designed to throttle messages to the server to avoid flooding it during a message storm. Conversely, the RHQ Server is designed to drop messages when it determines that it is being flooded with more messages than it can handle. The RHQ Agent is also designed to send only the minimal amount of data necessary. For example, by implementing Java's Externalizable interface we reduce the over-the-wire size of some objects, and we limit the amount of availability data sent to the server by including only the data that changed since the last availability report.
There needs to be a lot of testing in this area to ensure RHQ can scale at the level we need.
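The two data-reduction techniques mentioned above, sending only availability changes and hand-writing the wire format through `java.io.Externalizable`, can be combined in a short Java sketch. `AvailabilityReport`, `AvailabilityTracker` and `Wire` are hypothetical names, resources are keyed by a plain integer ID for simplicity, and the tracker assumes every report it builds is actually delivered.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical report carrying only changed entries (resource id -> up?).
final class AvailabilityReport implements Externalizable {
    private static final long serialVersionUID = 1L;
    final Map<Integer, Boolean> changes = new LinkedHashMap<>();

    public AvailabilityReport() { } // Externalizable requires a public no-arg constructor

    // Hand-written format: entry count, then (int id, boolean up) pairs with no field metadata.
    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(changes.size());
        for (Map.Entry<Integer, Boolean> e : changes.entrySet()) {
            out.writeInt(e.getKey());
            out.writeBoolean(e.getValue());
        }
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            changes.put(in.readInt(), in.readBoolean());
        }
    }
}

// Agent side: remembers the last reported state and emits only the differences.
final class AvailabilityTracker {
    private final Map<Integer, Boolean> lastReported = new HashMap<>();

    AvailabilityReport buildReport(Map<Integer, Boolean> current) {
        AvailabilityReport report = new AvailabilityReport();
        for (Map.Entry<Integer, Boolean> e : current.entrySet()) {
            if (!e.getValue().equals(lastReported.get(e.getKey()))) {
                report.changes.put(e.getKey(), e.getValue());
            }
        }
        lastReported.putAll(current); // assumes the report reaches the server
        return report;
    }
}

// Round-trips an object through standard Java serialization.
final class Wire {
    static byte[] toBytes(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    static Object fromBytes(byte[] b) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(b))) {
            return in.readObject();
        }
    }
}

class AvailabilityDemo {
    public static void main(String[] args) throws Exception {
        AvailabilityTracker tracker = new AvailabilityTracker();
        Map<Integer, Boolean> scan = new HashMap<>();
        scan.put(1, true);
        scan.put(2, true);
        tracker.buildReport(scan);  // first report contains both resources
        scan.put(2, false);         // resource 2 goes down
        AvailabilityReport delta = tracker.buildReport(scan);
        AvailabilityReport copy = (AvailabilityReport) Wire.fromBytes(Wire.toBytes(delta));
        System.out.println(copy.changes); // {2=false}
    }
}
```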
The server has the ability to upgrade the database schema and its data to newer versions as new RHQ releases are delivered (utilizing the dbupgrade utility). This means if we ship a new RHQ version that requires changes to the existing data model, the installer will be able to upgrade the database without causing the customer to lose data.
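The core idea behind an in-place schema upgrade is that each step moves the schema from version N-1 to N, and the installer applies every step between the current and target versions in order, refusing to proceed if a step is missing. A minimal Java sketch of that planning logic follows; `SchemaUpgrader`, the version numbers and the SQL strings are hypothetical, and the actual dbupgrade utility's step format is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical upgrader: each registered step upgrades the schema from version N-1 to N.
final class SchemaUpgrader {
    private final TreeMap<Integer, String> steps = new TreeMap<>(); // target version -> SQL

    void addStep(int targetVersion, String sql) { steps.put(targetVersion, sql); }

    // Returns the statements to run, in order, to go from currentVersion to targetVersion.
    List<String> plan(int currentVersion, int targetVersion) {
        if (targetVersion < currentVersion) {
            throw new IllegalArgumentException("downgrades are not supported");
        }
        List<String> toRun = new ArrayList<>();
        int expected = currentVersion + 1;
        for (Map.Entry<Integer, String> e : steps.subMap(currentVersion, false, targetVersion, true).entrySet()) {
            if (e.getKey() != expected) {
                throw new IllegalStateException("missing upgrade step for version " + expected);
            }
            toRun.add(e.getValue());
            expected++;
        }
        if (expected != targetVersion + 1) {
            throw new IllegalStateException("missing upgrade step for version " + expected);
        }
        return toRun;
    }
}

class UpgradeDemo {
    public static void main(String[] args) {
        SchemaUpgrader up = new SchemaUpgrader();
        up.addStep(2, "ALTER TABLE example_resource ADD COLUMN description VARCHAR(500)");
        up.addStep(3, "UPDATE example_resource SET description = '' WHERE description IS NULL");
        for (String sql : up.plan(1, 3)) {
            System.out.println(sql); // existing rows are altered in place, not dropped
        }
    }
}
```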
The agent can automatically pull down new or updated product plugins remotely - just deploy the plugins to the server and the agent (upon request or upon startup) will pull down the new plugins.
We could push down new files (jars, configuration files, scripts) to upgrade the agent core using the content subsystem (in effect eat our own dog food and manage the agent like any other manageable product). This hot-redeployment of the core agent pieces needs to be implemented, but it is possible with the core services already in place.
For important messages, the server and agent can send them with guaranteed delivery which means those messages will be queued and sent when the other end comes back up (e.g. if the server goes down, metric data is queued and will be sent when the server comes back up).
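Guaranteed delivery boils down to: enqueue first, send oldest-first, and keep anything that could not be sent until the other end is reachable again. The Java sketch below shows that ordering logic with hypothetical `Transport`, `GuaranteedSender` and `FlakyServer` types. Unlike the real agent, which persists its spool on the client side, this sketch keeps the queue in memory only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical transport: send returns false when the remote end is down.
interface Transport {
    boolean send(String message);
}

// Queues messages and flushes them in order once the peer is reachable again.
final class GuaranteedSender {
    private final Transport transport;
    private final Deque<String> spool = new ArrayDeque<>(); // in memory only in this sketch

    GuaranteedSender(Transport transport) { this.transport = transport; }

    void send(String message) {
        spool.addLast(message); // always enqueue first so ordering is preserved
        flush();
    }

    // Sends queued messages oldest-first; stops at the first failure and keeps the rest.
    void flush() {
        while (!spool.isEmpty()) {
            if (!transport.send(spool.peekFirst())) {
                return;
            }
            spool.pollFirst();
        }
    }

    int pending() { return spool.size(); }
}

// Fake server used to simulate an outage.
final class FlakyServer implements Transport {
    boolean up = false;
    final List<String> received = new ArrayList<>();

    public boolean send(String message) {
        if (!up) return false;
        received.add(message);
        return true;
    }
}

class SpoolDemo {
    public static void main(String[] args) {
        FlakyServer server = new FlakyServer();
        GuaranteedSender sender = new GuaranteedSender(server);
        sender.send("metric-1");
        sender.send("metric-2");
        System.out.println(sender.pending()); // 2
        server.up = true;                     // server comes back up
        sender.flush();
        System.out.println(server.received);  // [metric-1, metric-2]
    }
}
```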
We can run multiple servers, either in a clustered environment or as a hot backup. These configurations need to be tested.
Using JBossAS/EJB3 clustering features built into the infrastructure upon which RHQ was built, we can provide a high degree of server availability.
The agent is currently able to detect a downed server and wait for it to come back up, but there is no failover to a backup server. The agent could be modified to add this capability (some customers, in fact, have asked for the agent to be able to switch to a backup server if the primary goes down).
- The agent core is fully i18n-ready and localized to English today (with some German translations already existing), so it is easily localizable.
- The AMPS plugin system can be made i18n-ready (we have some designs in place), but this is not implemented yet. Once it is, custom plugins could be localized.
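Since agent failover is only a proposed enhancement, the following Java sketch is purely illustrative: the agent keeps an ordered list (primary first, then backups) and connects to the first reachable server. `FailoverList` and the host names are hypothetical, and reachability is supplied as a predicate so the logic can be exercised without a network. Scanning from the primary on every attempt means the agent falls back to the primary as soon as it returns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical failover list: primary first, then backups in priority order.
final class FailoverList {
    private final List<String> servers;
    private int current = 0;

    FailoverList(List<String> servers) {
        if (servers.isEmpty()) throw new IllegalArgumentException("need at least one server");
        this.servers = new ArrayList<>(servers);
    }

    String current() { return servers.get(current); }

    // Tries servers starting from the primary; returns the first reachable one, or null.
    String connect(Predicate<String> isReachable) {
        for (int i = 0; i < servers.size(); i++) {
            if (isReachable.test(servers.get(i))) {
                current = i;
                return servers.get(i);
            }
        }
        return null; // nothing reachable: wait and retry, as the agent already does today
    }
}

class FailoverDemo {
    public static void main(String[] args) {
        FailoverList list = new FailoverList(Arrays.asList("primary.example.com", "backup.example.com"));
        // Simulate the primary being down.
        System.out.println(list.connect(host -> !host.startsWith("primary"))); // backup.example.com
    }
}
```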
- The server on the other hand is going to require more specific effort to support i18n and l10n.
Today, the server can be installed non-interactively. When the admin user first connects to it, they are required to step through the installer setup UI to specify where the database is, which additional features to enable, etc. So some setup is needed, but it occurs after the software is installed. An enhancement could easily be made to allow a predefined configuration file.
The agent today can be installed non-interactively; it is designed for this. You simply copy/unpack the agent distribution onto the remote agent box and provide a custom configuration file that defines things like the IP address it listens on and the IP address of the server it talks to. The agent can then start up without requiring any user interaction.
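A non-interactive start works when every required setting is present in the supplied file and startup fails fast (rather than prompting) when one is missing. The Java sketch below shows that pattern using `java.util.Properties`. The key names (`agent.bind-address`, `server.address`, etc.), the `AgentConfig` class and the port values in the demo are hypothetical; the real RHQ agent configuration file uses its own format and property names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical agent settings; real RHQ agent property names and file format differ.
final class AgentConfig {
    final String bindAddress;
    final int bindPort;
    final String serverAddress;
    final int serverPort;

    private AgentConfig(String bindAddress, int bindPort, String serverAddress, int serverPort) {
        this.bindAddress = bindAddress;
        this.bindPort = bindPort;
        this.serverAddress = serverAddress;
        this.serverPort = serverPort;
    }

    // Loads every required setting up front so startup needs no user interaction.
    static AgentConfig load(Reader reader) throws IOException {
        Properties p = new Properties();
        p.load(reader);
        return new AgentConfig(
                require(p, "agent.bind-address"),
                Integer.parseInt(require(p, "agent.bind-port")),
                require(p, "server.address"),
                Integer.parseInt(require(p, "server.port")));
    }

    // A missing value is an error here; an interactive setup would prompt instead.
    private static String require(Properties p, String key) {
        String v = p.getProperty(key);
        if (v == null || v.trim().isEmpty()) {
            throw new IllegalStateException("missing setting: " + key);
        }
        return v.trim();
    }
}

class AgentConfigDemo {
    public static void main(String[] args) throws IOException {
        String file = "agent.bind-address=10.0.0.5\nagent.bind-port=9999\n"
                + "server.address=rhq.example.com\nserver.port=8080\n";
        AgentConfig cfg = AgentConfig.load(new StringReader(file));
        System.out.println(cfg.bindAddress + " -> " + cfg.serverAddress + ":" + cfg.serverPort);
    }
}
```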
- The server always guarantees that the user is shown the latest configuration before editing begins to ensure the configuration hasn't changed in the backing store. Additionally, we will be adding another check to ensure the configuration hasn't changed during the user-think time on the configuration edit page.
- Operations against the managed resource can be managed per resource to allow safe, multi-user access to the system.
- Operation invocations, artifact deployments, software updates and configuration changes have history associated with them - the audit trail will provide things like a) the user that did it b) when the user did it c) what the user did and d) what the results were (success or failure). Triggered alerts are also stored so you can view alert history.
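The planned check against changes made during user-think time is essentially optimistic concurrency: the edit page remembers the version it loaded, and a save is rejected if the stored version has moved on since then. A minimal Java sketch follows; `ConfigurationStore` and its `Snapshot` type are hypothetical names, and a real server would enforce this in the database (e.g. with a version column) rather than in memory.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical store that versions a resource's configuration for optimistic concurrency.
final class ConfigurationStore {
    // Copy handed to the edit page, tagged with the version it was read at.
    static final class Snapshot {
        final Map<String, String> values;
        final int version;

        Snapshot(Map<String, String> values, int version) {
            this.values = values;
            this.version = version;
        }
    }

    private Map<String, String> current = new HashMap<>();
    private int version = 0;

    // Loading the edit page always returns the latest stored configuration.
    synchronized Snapshot read() {
        return new Snapshot(new HashMap<>(current), version);
    }

    // Saves only if nobody else saved since "base" was read (i.e. during user think time).
    synchronized boolean save(Snapshot base, Map<String, String> edited) {
        if (base.version != version) {
            return false; // stale edit: the caller should reload and let the user retry
        }
        current = new HashMap<>(edited);
        version++;
        return true;
    }
}

class ConcurrencyDemo {
    public static void main(String[] args) {
        ConfigurationStore store = new ConfigurationStore();
        ConfigurationStore.Snapshot first = store.read();
        ConfigurationStore.Snapshot second = store.read(); // a second user opens the same page
        Map<String, String> edit = new HashMap<>(first.values);
        edit.put("maxConnections", "50");
        System.out.println(store.save(first, edit));  // true
        System.out.println(store.save(second, edit)); // false: the second copy is stale
    }
}
```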
The agent-server communications support includes reliable and client-persisted delivery of messages in both directions without complicating code in the core services. If the server is down, for example, the agent will be able to spool unsuccessfully sent messages until the connection is regained. Multicast detection of agents and servers is also supported. A future desired enhancement is the agent's automated failover between clustered servers.