
Frequently Asked Questions

Contents

General

What is the difference between RHQ, Jopr and JON?

RHQ is an extensible management platform. RHQ is licensed as a fully open-sourced project.

Jopr was an open source project that contained JBoss middleware specific plugins, such as the JBossAS plugin, Tomcat plugin, et al. Jopr followed the same licensing agreement as the RHQ project. The Jopr codebase has been rolled into the RHQ project and there is no longer a standalone "Jopr" project. All plugin code that used to be in the Jopr source code repository has been copied into the RHQ project and is being further developed there. Building RHQ today gives you both RHQ and what was termed "Jopr" combined. Therefore, you will find the development team now uses the term "RHQ", and rarely mentions Jopr except when talking in historical contexts (as this FAQ entry is).

JON (aka "JBoss Operations Network" or "JBoss ON") is a commercial product offered to Red Hat customers and is thus fully quality-tested and certified. JON 2.0 and later is based on RHQ; JON 1.x was a closed-source product.

Of the three projects mentioned here (RHQ, Jopr, JON), the JON product is the only one officially supported by Red Hat.

What does "RHQ" stand for?

RHQ is not an acronym - the letters do not stand for any specific words. The full and complete project name is simply "RHQ".

What documentation is available?

RHQ user documentation can be found here.
RHQ developer documentation can be found here.

Note: the commercially-available-only JBoss ON product has its documentation at http://www.redhat.com/docs/en-US/JBoss_ON/.

Is there a publicly available issue tracker system to search for bugs and submit enhancement requests?

Yes. If you would like to search for a bug, report a bug, or submit an enhancement request then use the RHQ Bugzilla located at https://bugzilla.redhat.com/browse.cgi?product=RHQ%20Project.

Is XXX database supported?

No. PostgreSQL and Oracle are the only supported databases for use as the RHQ backend data store.

Who governs RHQ?

Red Hat and Hyperic/SpringSource mutually terminated their collaboration agreement as of August 27, 2009, and agreed that Red Hat would take over governance and administration of the RHQ Project from that date forward. Since then, Red Hat has had complete administrative and governance control of the RHQ Project.

What is the syntax for regular expressions used within RHQ?

RHQ uses regular expressions in several places. When you encounter a place (user interface, configuration file, etc.) that requires you to enter a regular expression, consult Java's Javadoc documentation for the syntax rules: java.util.regex.Pattern documents the regular expression syntax, and java.text.SimpleDateFormat documents the date/time syntax.


Data Model

What is a "Measurement Definition" versus a "Measurement Schedule"?

In order to know what a schedule and a definition represent, you need to understand the underlying data model for the measurement data.

In RHQ, there are "resource types" and "resources". A "resource type" represents a kind of resource (like "JBossAS server" or "Apache Web Server"). A "resource" is an instance of a resource type (like "My JBossAS App Server" or "hostname foo Apache").

RHQ has analogous entities in the realm of the measurement subsystem, too. For each resource type, we have "measurement definitions", sometimes alternatively called "metric definitions" (because these are the <metric> definitions in the XML plugin descriptors) - these represent a "kind" of measurement. For each resource, there is an instance of each metric definition called a "measurement schedule". So, for example, a "Linux Platform resource type" has a "Free Memory metric definition". Each of your Linux boxes would therefore have a "Free Memory measurement schedule". Each schedule has its own collection interval associated with it - that's why you can collect Free Memory for platform A every 30 minutes but collect Free Memory for platform B every 15 minutes.

So, it is like this:

  • Resource Types have Metric Definitions.
  • Resources have Measurement Schedules.
  • A Resource is an instance of a Resource Type.
  • A Measurement Schedule is an instance of a Metric Definition.

Therefore, "measurement definition" refers to the kind of metric being collected for that resource's type (e.g. it refers to the "Free Memory" metric defined on the "Linux platform" resource type - it does not refer to any specific resource or any specific piece of data, rather it identifies the "kind" of metric). "Measurement schedule" refers to the specific measurement data that was collected for a specific resource (e.g. it refers to the Free Memory measurement data for the specific Linux platform resource named "myhost").


User Interface

How can I ignore an autodiscovered resource?

If your agent discovered a new platform and found a few resources that you do not want to take into inventory, you have to tell the RHQ Server to ignore those resources.

First, you can simply select the resources you want to import in the auto-discovery portlet and leave the unwanted resources deselected. As long as resources are shown in the portlet, they are not imported. Of course, this has the disadvantage that it can be confusing to always see them.

The other option is to select the resource you do not want to import and click on "Ignore", so it no longer shows up in the portlet. However, if you try this on a resource on a freshly discovered platform, it will fail. The reason is that the inventory is organized in a tree-like manner with the platform as the tree root; when a server or service is taken into the system (whether imported or ignored), it is attached below that root. When the platform is not yet imported into the inventory, there is no root to which the ignored resource can be attached.

So to ignore a server on a platform: first import the platform and leave that server unchecked. When the platform is successfully imported, select the server and click on "ignore".

From the above explanation you can see that it is not possible to ignore just a platform. If you want to ignore a platform, just do not run an agent on it.

How can I find what my user preferences are?

Execute this SQL:

select id, name, string_value
  from rhq_config_property
 where configuration_id = (select configuration_id
                             from rhq_subject
                            where name = 'your-user-name')

You can do this from your own database client, or the admin/test/sql.jsp page.

Errors and stack traces in the GWT Message Center are sometimes not helpful - how can I find out what the real problem is?

If you see errors in the browser client app (e.g. the GWT Message Center), and you see an exception with an error ID enclosed in square brackets, you can use that to track down the error and stack trace in your RHQ server's log file. For example, if you see something like this in the error message in the GWT GUI:

java.lang.RuntimeException:[1312480384219] ...

You can go to your RHQ server's log file, and find the exception with that same error ID number. That server-side log information will be more useful because these exceptions are actually happening server-side and are just forwarded over the wire to the GWT client (and much of the server-side stack trace information will not be available to the GWT client).


Server

How do I get debug messages from the RHQ Server?

You can change "rhq.server.log-level" to DEBUG in <server-install-dir>/bin/rhq-server.properties and restart the server.
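For reference, the relevant line in rhq-server.properties would look like this (the property name is taken from the answer above; DEBUG is the only change):

```
# <server-install-dir>/bin/rhq-server.properties
rhq.server.log-level=DEBUG
```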

You can edit the <server-install-dir>/jbossas/standalone/configuration/standalone-full.xml configuration file to enable debug messages out of non-RHQ classes and packages (see the loggers under the logging subsystem). But generally, you will want to just enable RHQ debugging (rhq.server.log-level as mentioned above) which is the org.rhq category - this will emit debug messages for all RHQ subsystems to the log file (but not the console).

Note that by default the console window will not show the debug messages. This is because the CONSOLE appender has a threshold at INFO. If you want your debug messages to also go to the console, you must change the CONSOLE appender's Threshold setting to DEBUG in the standalone-full.xml file.

In some cases, you will want to get debug messages from the RHQ Server launcher scripts. To do this, you need to set the environment variable RHQ_SERVER_DEBUG to "true". Now when you start, the launcher scripts will output debug messages.
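As a sketch, on a Bourne-style shell this amounts to setting and exporting the variable before invoking the launcher script:

```shell
# Enable debug output from the RHQ Server launcher scripts.
# The scripts check this environment variable at startup.
RHQ_SERVER_DEBUG=true
export RHQ_SERVER_DEBUG
# then start the server as usual, e.g.: ./rhq-server.sh start
```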

Log files emitted by the RHQ Server are found in <server-install-dir>/logs.

How does RHQ integrate with external LDAP user repositories?

RHQ uses passwords to authenticate users. Authentication information, comprising user names and passwords, can be stored in an internal database (the default) or in an external LDAP repository. It is important to note that support for LDAP currently does not include storing attributes other than user names and passwords. In particular, authorization information such as roles used to control access to RHQ resources is persisted in the internal database.

  • Configuring RHQ to use LDAP for authentication
    1. In order to configure RHQ to use LDAP for authentication, navigate to the Server Configuration page (Administration>System Configuration>Settings). The following configuration parameters can be specified:
      1. URL of the LDAP server: This defaults to ldap://localhost on port 389 (or port 636 if the SSL option is selected). Do not use "ldaps:"; use the "Use SSL" option for that.
      2. Username/Password: The username and password to connect to the LDAP server. The username is typically the full LDAP distinguished name (DN) of a manager user, e.g. "cn=Manager,o=JBoss,c=US".
      3. Search Base: Base of the directory tree to search for usernames and passwords while authenticating users, e.g., o=JBoss, c=US
      4. Search Filter: Any additional filters to apply when doing the LDAP search. This is useful if the population to authenticate can be identified via given LDAP property, e.g., RHQUser=true
      5. Login Property: The LDAP property that contains the user name. Defaults to cn. If multiple matches are found, the first entry found is used.
      6. Use SSL: Provides the option to use SSL while communicating with the LDAP server.
    2. The configuration settings are captured and stored in the internal database. No attempt is made to validate the information at this point: any misconfiguration would be detected when a user attempts to log in to the GUI console.
  • Authenticating users via LDAP
    1. Once the RHQ Server has been configured to use LDAP for authentication, subsequent attempts to login to the GUI console result in requests to the LDAP server to validate users' credentials. Communication with the LDAP server is handled by a class that implements a JAAS Login Module for LDAP. The login module first searches the set of base directories for a matching username applying any search filters. If a matching name is found, a bind request specifying both the username and password is sent to LDAP to validate the credentials. Authentication is deemed successful if the bind request returns normally.
    2. Irrespective of whether LDAP is selected for authentication, the credentials of the root user rhqadmin are stored in the internal database. Stacking login modules makes for seamless authentication: when the LDAP option is selected, normal users are authenticated in LDAP and the root user rhqadmin is authenticated in the database.
  • Impact of user administration on the LDAP repository
    1. As stated earlier, RHQ uses LDAP only to perform credential validation. Auxiliary information about a user such as first/last name, phone number, email address, and roles is stored in the RHQ internal database. Furthermore, user administration actions performed in RHQ do not impact the LDAP repository. For instance, the LDAP repository is not populated with the username and password when a user is registered in RHQ. The user must be defined in the LDAP repository independently of RHQ administration, and is assumed to have credentials populated in the LDAP repository when he/she is ready to access the GUI console. In other words, RHQ uses the LDAP repository in read-only mode.
    2. One of the interesting features of LDAP integration in the product is the support for self-registration in RHQ available to those who are identified as potential RHQ users in the LDAP repository. One way of identifying RHQ users in the LDAP repository is to define attributes that can be specified in a search filter in the RHQ configuration, e.g., RHQUser=true. When such a user accesses the GUI console for the first time, he/she is first authenticated in the LDAP repository, and then redirected to the RHQ registration page to capture auxiliary information such as first/last name and email address. This alleviates the task of user registration for RHQ administrators, and reduces the likelihood of errors as information is entered directly by the registrants.

RHQ does not currently check server certificates for LDAP over SSL, nor can it provide client side certificates to the LDAP server. However, developers should be able to customize RHQ to perform these tasks - please see https://bugzilla.redhat.com/show_bug.cgi?id=RHQ-2064 for more information.

How do I set up LDAP group authorization?

  1. LDAP authorization is set up in the Administration tab, under System Settings.
  2. First set up RHQ to allow LDAP users to authenticate using LDAP user accounts. (LDAP authentication isn't required, but it is recommended.)
  3. Then, configure RHQ to check for LDAP groups on the LDAP server. There are five elements in the LDAP server configuration that you need to know to configure LDAP group authorization:
    1. The information to connect to the LDAP server, in the form of an LDAP URL. For example, ldap://server.example.com:1389
    2. The username and password to use to connect to the server. This account should have read access to the subtrees being searched.
    3. The search base. This is the point in the directory tree to begin looking for entries. This should be high enough to include all entries that you want to include and low enough to improve performance and prevent unwanted access. For example, if you have ou=Web Team,dc=example,dc=com and ou=Engineering,dc=example,dc=com and you want to include groups in both subtrees in RHQ, then set the search base high up the tree, to dc=example,dc=com. If you only want the engineering groups to be used by JBoss ON, then set the search base to ou=Engineering,dc=example,dc=com.
    4. The group filter. This creates the search filter to use to search for group entries. This can use the group object class, which is particularly useful if there is a custom attribute for RHQ-related entries. This can also point to other elements — like the group name, a locality, or a string in the entry description — that are useful or meaningful to identify RHQ-related groups.
    5. The member attribute. There are different types of group object classes, and most use different attributes to identify group members. For example, the groupOfUniqueNames object class lists its members with the uniqueMember attribute.
  4. After LDAP authorization is enabled, then you can associate the roles in RHQ to the appropriate groups in the LDAP directory.
    1. Go to the Administration > Roles area.
    2. Click the name of the role to edit.
    3. Click the LDAP Groups tab.
    4. The list of the LDAP groups discovered in the search base will be listed. Select the groups to associate with the role from the list.

How can I specify command-line options for the Server JVM?

On UNIX

If you want to override the default max heap and permgen sizes, set them via the RHQ_SERVER_JAVA_OPTS environment variable, e.g.:
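A sketch of what this might look like in a Bourne-style shell - the sizes shown are illustrative examples, not recommendations:

```shell
# Override the RHQ Server JVM's heap and permgen sizes (example values).
RHQ_SERVER_JAVA_OPTS="-Xms1024M -Xmx1024M -XX:MaxPermSize=256M"
export RHQ_SERVER_JAVA_OPTS
```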

Set all other JVM options via the RHQ_SERVER_ADDITIONAL_JAVA_OPTS environment variable, e.g.:
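A sketch along the same lines - the JVM flag shown is just an example of an additional option:

```shell
# Any extra JVM options are appended via this variable (example option).
RHQ_SERVER_ADDITIONAL_JAVA_OPTS="-Djava.net.preferIPv4Stack=true"
export RHQ_SERVER_ADDITIONAL_JAVA_OPTS
```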

On Windows
Overriding Properties in the Java Service Wrapper

How can I confirm my server's email/SMTP settings are correct?

Each server is configured to talk to a particular SMTP server. This configuration
is found in the rhq-server.properties file:

# Email
rhq.server.email.smtp-host=localhost
rhq.server.email.smtp-port=25
rhq.server.email.from-address=rhqadmin@localhost

If you want to confirm that these settings are correct and the server can actually send emails successfully, log into the GUI as the "rhqadmin" user and go to the "test email" page located at http://<your-server>:7080/admin/test/email.jsp.

When do Baselines auto-calculate?

Go to the Administration>SystemConfiguration>Settings page of the RHQ GUI. You will see settings for Automatic Baseline Configuration Properties.

Baseline Frequency determines how often the baselines will be calculated. By default it is 3 days. This means that every 3 days a new set of baselines is calculated (except for those that were manually set by the user - those remain pinned to the baselines set by the user).

Baseline Dataset determines the minimum set of data that must have been collected for a measurement before a baseline for that measurement is calculated. The default is 7 days. For example, when it is determined that baselines should be calculated (every 3rd day by default), only those measurements that have data 7 days old or older will get a baseline calculated. Any measurements that do not yet have data from 7 days ago will be skipped. This ensures that when a measurement's baseline is calculated, you have a good representative set of data to include in the calculation (e.g. by default, you will have 7 days worth of data included in the baseline calculation).

I deleted a Platform from inventory. How do I get it to be rediscovered, so I can re-import it?

Just force an Agent discovery by issuing the following command at the Agent command prompt:
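The command itself appears to have been lost from this page; assuming the agent's standard prompt commands, a full discovery scan would be requested with something like the following (type "help discovery" at the agent prompt to confirm the exact options for your version):

```
discovery -f
```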

Alternatively, you can register a new Agent by restarting the agent, specifying the --fullcleanconfig (-L) option, on the machine corresponding to the Platform you deleted. The Platform will get rediscovered. That is, first quit the agent (using the 'quit' command), then run:
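The command line was lost from this page; based on the options described in this answer, it would be along these lines (on Windows, substitute .bat for .sh):

```
rhq-agent.sh --fullcleanconfig
```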

In previous versions of RHQ (RHQ 4.2 and earlier), use --cleanconfig (-l) instead. What you want to do is: after your platform and agent have been purged from the server-side database, start the agent without any previous security token. In RHQ 4.2 and earlier, you did this by starting the agent with the --cleanconfig option; in RHQ 4.3 and up, it is --fullcleanconfig (or -L).

My server machine does not have a writable directory called /var/run. How can I get my rhq-server.sh script to successfully write out its pidfile?

Set the environment variable RHQ_SERVER_PIDFILE_DIR to a full path of the directory where you want the pidfile to get stored. When you run the script, that variable's value will override the default location. If you have an older script (2.1 or older), directly edit rhq-server.sh and change /var/run to the directory that you want.

The default location for this pid file has changed in 2.2 and up - it is now written to the /bin directory of the server install directory.

When I try to start the server, I get an exception whose cause says "Exception creating identity" and the server fails to start. How can I fix this?

The message you are probably referring to looks something like this:

Caused by: java.lang.RuntimeException: Exception creating identity: my.host.name.com: my.host.name.com
| at org.jboss.remoting.ident.Identity.get(Identity.java:211)

This is not RHQ specific - it's JBoss/Remoting failing. See: https://jira.jboss.org/jira/browse/JBREM-769. The core issue (hidden from you, because JBoss/Remoting isn't bubbling up the real error message, as per that JIRA) is typically that your hostname is not resolvable. Make sure your hostname (as reported in that exception message, e.g. "my.host.name.com") is a valid hostname and make sure it is resolvable by your machine (i.e. is it in /etc/hosts? can you get an IP for it via nslookup?).

My server logs are showing the message "Have not heard from agent ... Will be backfilled since we suspect it is down". What does that mean?

When you see

[org.rhq.enterprise.server.core.AgentManagerBean] Have not heard from agent [<some agent name>]
since [<some date/time>]. Will be backfilled since we suspect it is down

it means that the agent did not send its availability report within the required amount of time (called the "agent quiet time" - 15 minutes by default, but configurable in the Administration>SystemConfiguration>Settings page). When this happens, the server suspects that the agent is down - at which point it "backfills" the availability of ALL resources managed by that agent to DOWN (you'll see the availabilities turn RED).

This can happen for a number of reasons:

  1. the agent really did shutdown or crash
  2. the machine the agent is running on completely shutdown or crashed
  3. the network between the agent and server went down, thus prohibiting the agent from connecting to the server and sending the availability report
  4. the machine the agent is running on is bogged down, slowing the agent and preventing it from sending reports fast enough.

What ports do I have to be concerned about when setting up a firewall between servers and agents?

See Prerequisites#FirewallConfiguration.

I installed the Server as a Windows Service, but it is failing to start with no error messages. How can I start the Server as a Windows Service?

You probably installed the Server to run as the "Local System Account" and that account probably doesn't have the proper permissions to run the Server. Perhaps your machine has been locked down due to security concerns and that Local System Account cannot access the network or run Java or any number of things. To solve this, create a user on your Windows box that can run the Server properly (you can test it, log in as the user and execute "rhq-server.bat console" to see if it can be run by that user). Then, install the Server as a Windows Service with the RHQ_SERVER_RUN_AS_ME environment variable set to "true":

rhq-server.bat remove
set RHQ_SERVER_RUN_AS_ME=true
rhq-server.bat install

For more information on installing the Server as a Windows Service, see Windows Installation.

How do I fix a 'ORA-12519, TNS:no appropriate service handler found' error when using Oracle XE?

Although not for production use, it is not uncommon to use Oracle XE for test or development environments. For Oracle XE 10g, the following setting should be applied in addition to any other settings, if it has not already been set to a non-default value:
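The setting itself was lost from this page. ORA-12519 on Oracle XE is commonly caused by the connection pool exhausting the database's PROCESSES limit, so the usual remedy is to raise that parameter - a sketch, run as a DBA in SQL*Plus (the value 150 is illustrative):

```sql
-- Raise the maximum number of server processes (example value).
ALTER SYSTEM SET PROCESSES=150 SCOPE=SPFILE;
```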

This setting requires a restart of Oracle XE.

When I try to create a bundle by uploading an Ant recipe XML directly, the XML content seems to get corrupted and tags are placed out of order.

If you upload an Ant script file as the recipe, you can't use self-closing XML notation like <property name="a" /> - you need to explicitly provide the ending tag, like this: <property name="a"></property>. If you don't want to be forced into using that notation, copy-and-paste the Ant script content directly into the text field instead of using the file upload mechanism.

How do I stop the server from periodically logging messages that say a plugin is "the same logical plugin" but has "different content" and "will be considered obsolete"?

This is a known issue and is documented as bugzilla 676073. There is a workaround that is documented in that issue, which basically tells you to shutdown the server, remove the plugin jars from the server's filesystem, and restart the server.

How do I restrict which agents are allowed to connect to the server?

You can change the RHQ Server's JBoss/Remoting servlet configuration to restrict the IPs agent requests can come from. Therefore, if you know your agents are on a specific subnet, you can restrict connections to only that subnet. Here's how. You need to create a file named <rhq-server-install-dir>/jbossas/server/default/deploy/rhq.ear/jboss-remoting-servlet-invoker-2x.r3040.jon.war/WEB-INF/context.xml with the following content:
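The file content was lost from this page; a sketch of what such a context.xml would look like, using Tomcat's standard RemoteAddrValve (the subnet pattern shown is an example):

```xml
<Context>
  <!-- Only accept agent connections from the example 192.168.1.* subnet. -->
  <Valve className="org.apache.catalina.valves.RemoteAddrValve"
         allow="192.168.1.*" />
</Context>
```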

where the "allow=" attribute specifies the IPs from which agents are allowed to connect. All other IPs will not be allowed to connect.

When I try to pass the --reconfig option to the rhq-installer.sh script, it tells me it is no longer a valid option.

In RHQ 4.6, the installer script had a deprecated --reconfig option, but versions thereafter no longer have that option.

Due to some limitations within JBoss AS 7.1.1.Final, there were some settings that would not take effect immediately upon restart. These were settings such as Tomcat security settings, the SMTP server hostname, and the database username and password. The installer's --reconfig option (which was automatically invoked for you by RHQ 4.6's rhq-server.[sh,bat] script whenever you started the server) attempted to reconfigure the server if changes were detected in rhq-server.properties. Because RHQ 4.7+ no longer supports AS 7.1.1.Final (and later versions of the app server fixed almost all of the limitations), those workarounds invoked by the installer's --reconfig option are no longer needed and were removed. Therefore, the --reconfig option itself was removed.

java.io.tmpdir is not accessible or does not exist

If the server fails to start and you see a message like...

  • Startup failed: java.io.tmpdir 'xxxx' does not exist
  • Startup failed: java.io.tmpdir 'xxxx' is not a directory
  • Startup failed: java.io.tmpdir 'xxxx' is not readable
  • Startup failed: java.io.tmpdir 'xxxx' is not writable

... make sure the temporary directory denoted by java.io.tmpdir exists and has correct permission settings.

On Windows

See The location of java.io.tmpdir.

On *NIX

Almost always, java.io.tmpdir will default to the system default temporary directory (e.g. /tmp), which should exist and have correct permissions. To use something else, you can add -Djava.io.tmpdir=xxxx to the RHQ_SERVER_JAVA_OPTS environment variable.
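Continuing the pattern above, a sketch of pointing the server JVM at a custom temporary directory (the path is an example):

```shell
# Use a custom temporary directory for the RHQ Server JVM (example path).
RHQ_SERVER_JAVA_OPTS="$RHQ_SERVER_JAVA_OPTS -Djava.io.tmpdir=/opt/rhq/tmp"
export RHQ_SERVER_JAVA_OPTS
```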


Agent

How do I get debug messages from the RHQ Agent?

The easiest and quickest way to get your agent to start logging debug messages is, before starting your RHQ Agent, to set the environment variable RHQ_AGENT_DEBUG to "true". Now when you start the agent, both the launcher scripts and the agent itself will output debug messages. When you use this environment variable, the agent will use an internal log4j configuration file called "log4j-debug.xml" which is located in the agent's main jar file.

If you want more fine-grained control over which log4j categories have DEBUG priority, you can directly edit the conf/log4j.xml file (modifying this file requires an agent restart in order to pick up the changes). You must not set RHQ_AGENT_DEBUG if you want the agent to use this log4j.xml file (setting that environment variable will cause the agent to override this log4j.xml with the internally configured log4j-debug.xml file, which enables all categories at the DEBUG level).

The log messages can be found in the log files located in the <agent-install-dir>/logs directory. If you are launching the RHQ Agent on Windows using the service wrapper, you must set RHQ_AGENT_DEBUG and then install the service via rhq-agent-wrapper.bat install.

If you want to enable or disable debug messages while the agent is still running, you can use the "debug" prompt command (type "help debug" at the agent prompt for more info).

You can write your own log4j.xml files, put them in the conf/ directory, and use them via the debug -f command. For example, debug -f custom-log4j.xml. This means that while the agent is running, you can switch between log4j.xml files simply by using debug -f and passing in the log4j.xml file you want to use. RHQ also ships with log4j-warn.xml in the agent jar - this can be used if you want the agent to be especially quiet (only WARN and above messages are logged, INFO and below are not).

For example, during runtime you can invoke debug -f log4j-debug.xml which will "turn on debugging" while the agent is still running. When you are done debugging, you can invoke debug -f log4j.xml which switches the agent to the default log4j configuration without having to shutdown and restart the agent. You can get fancy with your own log4j xml files - so if you want to just enable debug for your own plugin for example, you can write your own log4j.xml, put it in conf/ and switch between that log4j configuration and the default one all without having to recycle the agent.

How do I start the RHQ Agent fresh, as if newly installed?

If you want the agent to clean itself of all previous inventory and force itself to re-register with the server, shut down the agent and restart it with the --cleanconfig command line option. If you do this, you may also pass in the --config argument to have it start up with a configuration file you specify (otherwise, the default conf/agent-configuration.xml will be used). The -l option is an alias for --cleanconfig and -c is an alias for --config - therefore your command line can be similar to the examples below (if on Windows, replace .sh with .bat):

rhq-agent.sh -l

or

rhq-agent.sh -l -c my-agent-configuration.xml

where both will clean the old configuration, but the first loads the default agent-configuration.xml and the second loads your custom my-agent-configuration.xml (it will look in the conf/ directory, unless you specify a full path to a location other than conf/).

In RHQ 4.3, the semantics of the --cleanconfig option are slightly different from previous versions. --cleanconfig (-l for short) will now remove all old agent configuration settings except for the security token, which is used to identify the registration of this agent with the server. If you want to purge even that security token configuration setting, use --fullcleanconfig (or -L for short). RHQ 4.3's --fullcleanconfig is the same as RHQ 4.2's --cleanconfig option.

My resources went "red" after starting the agent with -u / --purgedata or -l / --cleanconfig

If you purge the persisted data that the Agent maintains, you must also reset the "connection properties" for each resource that the Agent is managing. If a resource had manually overridden connection properties (ones that you set using the web console), then you will need to set those again. To ease the burden of doing this, consider creating compatible groups for these resources; this will enable you to set the connection properties across all members of the group at the same time.

In RHQ 2.1 and later, this is no longer an issue, since connection properties are synchronized from the Server after an Agent's data files are purged.

How can I update the plugins on all my agents?

When you add a new plugin to your system, or you upgrade an existing plugin, you normally want to tell all of your agents to update their existing plugins with the new plugin versions. You can do this individually by executing the prompt command "plugins update" at any agent prompt, or by executing the operation "Update All Plugins" from the UI's Operation tab for each "RHQ Agent" resource. If you want to update all of your agents so they all download the latest plugins, you can use the DynaGroup feature along with the Group Operation feature. First, create a DynaGroup with the expression:

resource.type.plugin = RHQAgent
resource.type.name = RHQ Agent

This creates a compatible group that dynamically adds all RHQ Agents as members to that group. Note that if you already have a compatible group with your agents as members, you can skip this group creation step.

Next, traverse to that compatible group that contains all your agents. You should see an Operations tab. From there, just invoke the "Update All Plugins" operation on that group. This will tell all of the agents in that group to update their plugins. Once that group operation has completed, all of your agents will have the latest, most up-to-date versions of all plugins.

How can I change the agent name after it has already been registered?

When you start the agent for the first time, the first setup question asked is for the "agent name". This is a name that must be unique across all agents in your environment. Once registered, you cannot change this name. Anytime you attempt to re-register this agent, you must re-register it with the same name it was registered under before. To literally change the agent name, you must uninventory the platform managed by that agent (if it's not yet in inventory, you must commit it to inventory, then immediately uninventory it). This cleans the database of any remnants of that agent. You can then start the agent clean (using the --fullcleanconfig (or -L for short) option) and start over. In RHQ 4.2 and earlier, the option to use is --cleanconfig (or -l for short).

Note that this "agent name" is not the same as the "RHQ Agent resource name" that you see in the UI. If you import an RHQ Agent resource into inventory, that resource's name will be something like "agentname RHQ Agent" where "agentname" is agent name you provided at agent setup time. This RHQ Agent resource name can be changed by editing its value within the Inventory tab. Changing this name does not change the name that the agent is registered under. Your agent is still registered under its original agent name.

How can I run more than one agent on a single machine?

This should be rather easy. First, you must start each agent with its own set of preferences. You do this with the command line argument "-p" (aka "--pref"). Each agent must be given its own named preferences store. For example, to start two agents on the same box, these two commands can be issued:

rhq-agent.sh -p agent1
rhq-agent.sh -p agent2

In order to make sure your agents are uniquely identified within the system, you must ensure that they each have unique agent names (this is no different than starting a single agent - all agent names within the RHQ environment must be unique). So when you configure your agents, make sure they use different agent names.

Finally, you must ensure they have unique hostname/port combinations. Since the server will need to talk to your agents individually and separately, you must assign them different hostname/port combinations. You cannot have two agents with the same hostname/port combination.

Other than those caveats, running multiple agents on the same machine can be done rather easily.
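
If you prefer to preconfigure the second agent rather than answer its setup prompts, its unique name and port can be set in its own copy of agent-configuration.xml. This is a sketch, assuming the standard <entry key="..." value="..."/> preference syntax that file uses; the values shown are examples only:

```xml
<!-- Example preferences for a second agent (name and port values are examples) -->
<entry key="rhq.agent.name" value="agent2" />
<entry key="rhq.communications.connector.bind-port" value="16164" />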

I want to run agents on all my machines, but only one starts OK - the rest fail due to binding to a wrong address

If you want to run multiple agents, but many fail to start with this error:

FATAL [main] (org.jboss.on.agent.AgentMain)-
{AgentMain.startup-error}The agent encountered an error during startup
and must abort java.net.BindException: Cannot assign requested address

then there are a couple of things you need to consider.

First, if you changed your agent-configuration.xml manually (say, to change IP addresses), did you do that after you initially set up the agent? The agent's configuration XML file is not referenced after the agent is set up - it doesn't need to be, because the configuration is persisted using Java Preferences (this is so it can support agent updates or agent re-installs without losing its configuration). If you want to change the agent's configuration file and have those changes picked up, restart the agent and pass it the --config command line option (or -c, which is shorthand for --config). This tells the agent to re-read the configuration file and make that its configuration, overriding any old configuration it persisted before.

The other question to ask is - is your home directory stored on NFS? If so, then you are probably picking up the same Java Preferences across all your machines (see $HOME/.java - that is the default location where Java stores Java Preferences on UNIX - on Microsoft Windows, it goes in the registry so this might not be relevant if you are on Windows). If you are running the agents as the same user and your user's home directory is shared (via NFS or some other sharing technology) then one solution is to have your agents use different Java Preferences names.

Each time that you start your agents, you need to tell them where they can find their preferences. You tell the agent your new preference name via --pref (or its shorthand notation of -p). Each agent must have their own preference node name. On UNIX, you could use `hostname` as its value, for example.

Read the comments at the top of agent-configuration.xml; they contain some relevant info. You can also read the usage help: rhq-agent.sh --help.

If you are using RHQ v1.1 agents, you must edit your agents' agent-configuration.xml files and change their Java preferences node names from "default" to something else that makes them unique across all agents. For example, change:
<node name="default">

to

<node name="another-agent-default">

Since you changed the configuration file, don't forget that the first time you restart the agent you need to pass in -c too:

rhq-agent.sh -p another-agent-default -c agent-configuration.xml

Thereafter, the agent need only be passed -p every time you restart it (the new configuration will be persisted for you).

If you do not want to be forced to edit your configuration files or pass the -p option, the other alternative is to define the system property java.util.prefs.userRoot to point to some other, unique location (e.g. /etc/rhq-agent-prefs). When the agent starts, Java will use the value of that system property as the location where it stores its Java Preferences. You set this system property on the agent via the environment variable RHQ_AGENT_ADDITIONAL_JAVA_OPTS. When you set that environment variable, rhq-agent.sh will add its value to the default set of Java options when passing options to the agent's Java VM:

export RHQ_AGENT_ADDITIONAL_JAVA_OPTS="-Djava.util.prefs.userRoot=/etc/rhq-agent-prefs"
rhq-agent.sh

When starting the Agent via a Windows service, the Agent fails to start, and I see the error "java.lang.IllegalStateException: The name of this agent is not defined - you cannot start the agent until you give it a valid name" in the Agent wrapper log file. What does this mean?

The Agent cannot ask for its initial setup configuration when installing as a Windows service (because there is no console for the user to see and answer the prompts). This means that you need to either preconfigure the agent or run the agent in standard (non-service) mode once as the user that should run the service in order to answer the setup questions and configure it before installing it as a service.

My Agent setup is correct but my Agent is getting "Cause: org.jboss.remoting.CannotConnectException: Can not connect http client invoker."

Starting in RHQ 1.1, the Server information defined in your Agent setup is used only for initial contact with a RHQ Server (i.e. the server hostname/IP address you provide at the agent setup prompt is only used when initially registering with the server).

Since RHQ 1.1 supports a multi-Server "High Availability Cloud", the Agent may be serviced by any Server in your RHQ Server network. The Agent will try to connect to any Server in the cloud - and it does so via the Server endpoint as defined for the Server at install-time, or via the RHQ GUI's server details pages (Administration>Servers).

This error is typically seen when the Server's endpoint address is not set to something that can be resolved by the Agent. The Public Endpoint Address set for each Server must be resolvable by every RHQ Agent.

Check your Server endpoint information via the GUI's HA Administration page and update if necessary. After the update, restart your Agent.

My agent machine does not have a writable directory called /var/run. How can I get my rhq-agent-wrapper.sh script to successfully write out its pidfile?

Set the environment variable RHQ_AGENT_PIDFILE_DIR to a full path of the directory where you want the pidfile to get stored. When you run the script, that variable's value will override the default location. If you have an older script (2.1 or older), directly edit rhq-agent-wrapper.sh and change /var/run to the directory that you want.
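
For example (a minimal sketch - the wrapper invocation is shown commented out, since install paths vary by environment):

```shell
# Point the wrapper script at a writable pidfile directory:
export RHQ_AGENT_PIDFILE_DIR=/var/tmp
# rhq-agent-wrapper.sh start   # would now write its pidfile under /var/tmp
echo "$RHQ_AGENT_PIDFILE_DIR"
```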

Explain how the agent scans for resources

When the agent performs discovery, it does so using two different types of "scans" to try to find resources.

A "server scan" detects top-level servers that run on your platform - things like JBossAS servers and Postgres servers. These scans run by default every 15 minutes. The setting that controls this is "rhq.agent.plugins.server-discovery.period-secs".

A "service scan" detects lower-level, more fine-grained services that are running in already detected and imported top-level servers - things like EJBs running in JBossAS, tables in a Postgres database, or VHosts in Apache. These scans run by default every 24 hours (i.e. 1 day). You must have already imported the servers into inventory before services can be discovered! These types of scans are normally very "expensive" to perform, since they probe inside the managed resource, so we don't do them often (which is why the default is 24 hours). The setting that controls this is "rhq.agent.plugins.service-discovery.period-secs".

The above two types of scans are "discovery" scans - in other words, they attempt to discover new resources that the agent does not yet have in its inventory but that you might want to manage. There is also a third type of scan - an "availability scan". An availability scan is not a discovery scan; however, it is very important to understand what it is. When the agent performs an availability scan, it determines the availability of resources that are already discovered and committed to inventory (i.e. resources that were previously discovered by one of the two types of discovery scans mentioned above). These availability scans run by default every 5 minutes - the setting that controls this is "rhq.agent.plugins.availability-scan.period-secs". After an availability scan completes, the agent has an up-to-date status of which resources are UP or DOWN, and it sends an "availability report" to the server. This is how the server knows which resources should currently be displayed as UP or DOWN (aka "green" or "red").

Note that this availability report serves a second purpose - it informs the server of the agent's own availability! In other words, when the server receives an availability report from agent A, not only does the server now know the UP or DOWN status of that agent's managed resources, it also implicitly knows that agent A itself is UP. This resets the clock on that agent's "quiet time", which the server uses to determine when it should suspect that agent is DOWN. For example, if the "max agent quiet time" server setting is set to 10 minutes, and the server hasn't received an availability report from agent A in over 10 minutes, the server will suspect that agent A is DOWN (which has the side effect of causing the server to "backfill" all of agent A's managed resources with the availability status of DOWN).
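
The three scan periods map to these agent preferences, shown here with their default values expressed in seconds (a sketch assuming the standard <entry> preference syntax from agent-configuration.xml):

```xml
<!-- 15 minutes, 24 hours, and 5 minutes, expressed in seconds -->
<entry key="rhq.agent.plugins.server-discovery.period-secs" value="900" />
<entry key="rhq.agent.plugins.service-discovery.period-secs" value="86400" />
<entry key="rhq.agent.plugins.availability-scan.period-secs" value="300" />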

How can I see the agent persisted configuration?

The agent's configuration is initially read from agent-configuration.xml and overlaid with values you enter at the setup prompts at startup. After the agent is initially configured, it will persist that configuration and never look at agent-configuration.xml again (unless you clear the configuration). The actual location on the file system where the configuration is persisted is platform dependent - for example, on UNIX, it's typically "$HOME/.java" (see the Java Preferences API documentation for more information on how and where Java persists preferences). For more details, read the comments at the top of the agent-configuration.xml file. Configure the RHQ Agent and Preconfiguring the Agent also have more information on this.

There are several ways in which you can view the agent's persisted configuration.

  1. If the agent is in your RHQ inventory, simply go to your agent's Config tab to view its live configuration (this is the same configuration that is persisted)
  2. If the agent is currently running in non-daemon mode (i.e. you have the agent prompt on your console), you can use the "getconfig" or "config" prompt commands to view the live configuration. Type "help getconfig" or "help config" for more information.
  3. If the agent is in your RHQ inventory, you can execute the "Execute Prompt Command" operation and invoke the "getconfig" prompt command to view one or more preferences.
  4. Because the agent configuration is stored in the standard Java Preferences API backing store, you can use any tool that can examine Java Preferences. One such tool is the Java Preferences Tool. This is a GUI tool that can give you a "file system" like view into your Java Preferences. The agent preferences are stored in the "User" preferences node under the node name "rhq-agent". Depending on the -p option that is passed to the agent when it is started, the actual configuration settings are found under a sub-node under "rhq-agent". The default preferences node is called "default", so typically your agent's persisted configuration is found in the user preferences under "rhq-agent/default". WARNING! Do not attempt to change the values of the preferences using third-party tools like this unless you know what you are doing - you could render the agent useless if you change the wrong preference to the wrong value. Use this mechanism only to view your agent's configuration.

How can I do a "clean config" for an agent running as a background Windows Service?

This is similar to the FAQ above ("How can I see the agent persisted configuration"). The RHQ Agent Windows Service (like all Windows Services) runs as a specific user. You need to go to that user's Windows Registry and delete the RHQ Agent's configuration node. The RHQ Agent uses the standard Java Preferences API, so the agent's configuration is stored as a node under the normal Java Preferences location in the Windows Registry. This should be something like "HKEY_CURRENT_USER\Software\JavaSoft\Prefs\rhq-agent\default", where "default" is the name of the preferences node you are using. If you did not override this (via the --pref command line option), then this will be "default". Rarely will anyone need to change this preferences node name, so assume yours is named "default". You would just have to delete that "default" node.

Note, however, that deleting this will probably not be what you want! If you delete this node, the agent will not be preconfigured the next time it starts up - and if you start the service in the background, it will fail (because it hasn't been fully configured - it needs to ask you the setup questions). So start the agent in the foreground, as the same user that will run it as a Windows Service, answer the setup questions, then shut the agent down again. Now you can start the agent as a service once more. You can avoid this additional step by preconfiguring your agent ahead of time. All of this is documented in the pages RHQ Agent Installation and Running the RHQ Agent.

If you want your RHQ agent to re-read its configuration from agent-configuration.xml using a script, you won't be able to start it in the foreground, which makes re-configuring it a little more difficult. If you have your agent running and inventoried in the RHQ server, you could invoke the config --import agent-configuration.xml command from the RHQ server's UI using the "Execute Prompt Command" operation on the RHQ Agent resource (using an RHQ CLI script).

If, however, you don't have your agent running, or you don't have it in the RHQ server's inventory, another option is to update the rhq-agent-wrapper.conf file and add the following line after the wrapper.app.parameter.2 option in that file:

This will force the agent to re-read its configuration from agent-configuration.xml every time it is started as a service. In this case you have to make sure that agent-configuration.xml is preconfigured.
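
With the Java Service Wrapper's parameter syntax, the added line would look something like the following sketch. The parameter indexes and the exact option tokens here are assumptions - continue the wrapper.app.parameter.N numbering already present in your rhq-agent-wrapper.conf:

```
wrapper.app.parameter.3=--config
wrapper.app.parameter.4=agent-configuration.xml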

How can I get a dump of inventory information from an agent running on another machine?

The use-case here is that someone (call him "the customer") is running an agent in their environment and is having problems. You suspect the customer's agent inventory is corrupted somehow. As a developer, you would like to know exactly what the agent thinks is in its inventory so you can debug the problem.

To get this information, you must get the customer's agent "data/inventory.dat" file. Copy that file to your local machine (it doesn't matter what directory you put it in). Now, run your own agent on your own local machine - make sure you run that agent with the same plugins that the customer was running with. The agent doesn't necessarily have to be connected to a server, but the plugin container must be started (that means the agent has to have been registered). Now, execute this agent prompt command:

inventory --xml --export=/customer-inventory.xml /the/customer/inventory.dat

where /the/customer/inventory.dat is the full path to where you copied the customer's inventory.dat file. If you do not specify the --export option, the XML will simply be dumped to the stdout console window, otherwise, the XML is stored in the full path you specify.
Now you have an XML file that describes what the customer's agent thinks is its inventory.

I need to change the IP Address of my agent machine - how do I keep my server and agent up to date with that change?

The agent has a configuration preference named "rhq.communications.connector.bind-address" whose value is that of the IP address the agent binds to when it starts its server socket (the thing it listens to for incoming messages from the server).

If you change the agent's IP address (and invalidate the old agent IP address), you have to do a couple things:

  1. You have to change the agent's configuration so that preference value is the same as the new IP address. You can do this by issuing a setconfig prompt command on the agent prompt: setconfig rhq.communications.connector.bind-address=<the new IP address>. (NOTE: do not change agent-configuration.xml and think the change will take effect - please read and understand the comments at the top of agent-configuration.xml before you change that configuration file). If your agent is running in the background as a daemon process, you'll have to shut it down via rhq-agent-wrapper.sh/bat stop and re-start it via "rhq-agent.sh".
  2. Restart the agent once you change the IP address preference value.

Once the agent is restarted, it will use that new IP address.

When I shutdown the agent, the RHQ Server takes more than 14 minutes to detect the agent was down. Can I configure it to not take so long?

You are killing the agent entirely, so the agent is never reporting any availability data at all to the server.

To support cases like this (where the agent is completely down or unresponsive), periodically, the server needs to check to see what agents it hasn't heard from in a long time and then determine which of these "suspect" agents are really down.

Read this for background on this issue: https://bugzilla.redhat.com/show_bug.cgi?id=RHQ-1098. That issue tells you why we increased the default time.

Read this for more information - it talks about the new default time: https://bugzilla.redhat.com/show_bug.cgi?id=RHQ-2349. It states, in part, "We have a quiet time of 15m right now (recently changed to that)."

What does this mean? It means, by default, if we have not heard from an agent in 15 minutes (what we call the agent's "quiet time"), only then do we mark that agent and all of its resources down. This is why it takes more than 14 minutes to detect your agent was down.

If you do not like that and you want it to report "down" faster, then yes, you can change this - it's configurable in the GUI. Go to the main menu "Administration>System Configuration>Settings" and change the setting "Agent Max Quiet Time Allowed" to something shorter. Note: the shorter your allowed quiet time interval, the greater the possibility of a "false negative" - for example, if you set quiet time to 5 minutes and your server can't process all your agents' availability reports fast enough, it may think it hasn't heard from an agent when in fact it just hasn't had time to process the latest availability report. When an agent is determined to be down, the server has to "backfill" it - marking all of its resources down - and this is expensive, so you don't want this to happen often.

Do I have to run the agent as root?

You do not necessarily have to run the agent as root. It all depends on how much and how deep you want to manage your resources.

For example, there is a Postgres plugin that lets the agent probe the Postgres configuration file postgresql.conf. However, by default, Postgres installs itself with very strict file permissions on that file - and if you run the agent as a non-root, non-postgres-privileged user, it won't be able to read that file and manage it (and you'll see agent log messages saying so).

The same is true for lots of other plugins that try to manage things that touch privileged files (like iptables and things like that; even JBossAS app servers might be installed with strict file privileges that might cause this).

If you run the agent as root, you are giving the agent privileges to manage all those things - if you don't, you are giving the agent restricted views of your managed resources. This might be what you want, hence, you don't have to run the agent as root. But if you don't run the agent as root, you must be willing to accept that the agent will not be able to manage some things and will log messages saying so.

How can I find out what environment variables and Java system properties are set in my agent JVM process?

The prompt command "version" can give you a list of the agent process' environment variables and system properties. At the agent prompt, type "help version" for the syntax of that command. In short, "version --sysprops" will list all the system properties, and "version --env" will list all the environment variables.


Log messages

What are "Command failed to be authenticated" messages?

Agents are assigned security tokens when they first register with the server. The token is one way an agent identifies itself with the server. If an agent does not identify itself with any token, or if it identifies itself with a wrong token, the server will deny access to that agent - in other words, the server will reject commands that come from that agent until that agent has properly registered. If an agent is continually causing "failed to be authenticated" errors on the server similar to this:

02:31:33,095 WARN [CommandProcessor] {CommandProcessor.failed-authentication}
Command failed to be authenticated! This command will be ignored and not processed:
Command: type=[identify]; cmd-in-response=[false]; config=[{}]; params=[null]

then it usually means the agent has been misconfigured, or it is an unknown agent attempting to identify itself as another agent. Restart your agent with the "--cleanconfig" command line option to clean out its configuration and re-register.

PLEASE NOTE Do not rely on the security token mechanism as a way to protect your RHQ environment from intrusion. If you require secure communications between servers and agents, see the Securing Communications section to learn how to set up SSL for authentication and encryption.

What are "fail-safe cleanup" messages?

You'll often see messages in your logs that look like:

13:43:10,781 WARN [LoadContexts] fail-safe cleanup (collections) :
org.hibernate.engine.loading.CollectionLoadContext@103583b
<rs=org.postgresql.jdbc3.Jdbc3ResultSet@d16f5b>

Please ignore these messages as they are normal and expected. The messages deal with the underlying ORM technology used (Hibernate) and how it automatically cleans up after itself to prevent memory leaks.

I am seeing this error in my server logs or stack trace: "WARN  [QueryTranslatorImpl] firstResult/maxResults specified with collection fetch; applying in memory." What does that mean and what is causing it?

This warning is issued by Hibernate and can be triggered for a number of different reasons. It can be safely ignored.


Plugins

Platform Plugin

How can I collect syslog messages as RHQ Events?

The Linux platform plugin can monitor syslog messages by emitting them as events. Syslog messages can be collected by the plugin by either reading syslog message files or by receiving them over a socket listener.

In either case, syslog must be configured to format the messages in a way that RHQ can parse. You can either tell RHQ (in the platform's plugin configuration - aka connection properties) what regular expressions can parse your syslog messages, or, in your syslog config file (e.g. /etc/rsyslog.conf), you can format your messages in a way that RHQ understands out of the box. In the latter case, if you define the syslog message format as below, the Linux platform plugin can parse it:

$template RHQfmt,"%timegenerated:::date-rfc3339%,%syslogpriority-text%,%syslogfacility-text%:%msg%\n"

If you then use "RHQfmt" in your syslog configuration so it writes messages out in that format, you'll be able to have RHQ understand the log messages fully. For example:

$template RHQfmt,"%timegenerated:::date-rfc3339%,%syslogpriority-text%,%syslogfacility-text%:%msg%\n"
*.* /var/log/messages-for-rhq;RHQfmt
*.* @@127.0.0.1:5514;RHQfmt

That will both write syslog messages to /var/log/messages-for-rhq and will send the messages over TCP to a listener on port 5514 (you would configure the platform's connection properties to listen to this port).
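
As an illustration of the resulting format (the sample line below is hypothetical), the fields up to the message are comma-separated, so the syslog priority is simply the second field:

```shell
# A message as rsyslog would emit it with the RHQfmt template above:
line='2013-04-02T13:43:10.781-04:00,err,daemon: disk failure on /dev/sda'
# Extract the second comma-separated field (the priority):
printf '%s\n' "$line" | cut -d, -f2
```

Here cut prints "err", the syslog priority text that RHQ parses out of each event.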

JBossAS Plugin

Why does only 1 JBossAS server show "green" availability and all the rest show "red" even though I made sure all of my JNP credentials are configured properly in my resources' connection properties?

There is a problem in the way the JBossAS JNP client works. See RHQ-1030 for the full description of the problem, but in short, if you are managing multiple JBossAS servers on a single box, all of your security credentials for those servers must be the same (i.e. the JNP username and password must be the same).

Is it possible to monitor JBoss AS 5.1?

If you are trying to monitor JBossAS 5.1 using RHQ but cannot get the agent to discover it, it is because JBossAS 5.1 is not manageable by RHQ. There were some problems with JBossAS's Profile Service, specifically with its remote interface, which were not fixed until after JBossAS 5.1 was released. However, you can monitor JBoss EAP 5.0 or later, or JBossAS 6.0.

JMX Plugin

When I import a server like JBoss EAP 5 or Tomcat, I see its child JVM resource in inventory, but it is red (DOWN). Why?

You probably started your server with JMX remoting enabled and secured. For example, you probably set something like the following system properties in your server:

-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=5222
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=true
-Dcom.sun.management.jmxremote.password.file=/jmxremote.password
-Dcom.sun.management.jmxremote.access.file=/jmxremote.access

When this happens, the JMX plugin's code examines the command line of your server's process, sees that the JMX server is remoted and secured, and tries to set up its secure, remote JMX connector. When it does, it will not have the appropriate credentials and thus can't connect to the remote JMX MBeanServer, so it will consider it to be in a DOWN state. You must go into your new child JVM resource's Connection Properties page in the GUI and enter the valid credentials (username and password) that you set in your JMX remote access files.

If you don't want RHQ to go through that remote JMX endpoint, you can try to unset all the Connection Properties for that JVM resource except the Type property - set that to "Parent". This tells RHQ to use the connection to the parent resource in order to communicate with the JVM resource. If your JVM is a child of, say, a JBossAS 4 or a JBoss EAP 5 resource, this is valid - the parent of the JVM (which is the JBossAS server resource) should be able to provide the connection information for the JVM resource.

Postgres Plugin

Why is the agent showing an error in my postgres discovery about authentication failed for user "postgres"?

The Postgres plugin attempts to log into the database server using the username and password of "postgres". In many installations, this is a default superuser and will work. However, it is also possible that this login could fail for a number of reasons:

  • The "postgres" user has been deleted.
  • The password for the "postgres" user has been changed.
  • On Linux, the administrative login has been set to "ident sameuser".

In many cases, this can be alleviated as follows:

  • Inventory the discovered Postgres resource. Its availability will show as down and it will not find any child resources.
  • Navigate to the inventory tab for the Postgres resource.
  • Under Connection Properties, click the Edit button.
  • Change the "role name" and "role password" fields to reflect a valid super user account on the Postgres instance.

Additionally, Postgres may need to be changed on Linux systems to allow password-based logins (i.e. "md5" vs. "ident sameuser" settings in the pg_hba.conf file). Consult the Postgres documentation for more details.

Why are most of the metrics for my Postgres resource showing up as NaN?

In many installations, Postgres will not start its statistics collector by default. To enable statistics collection, add (or change) the following line in the postgresql.conf file:
stats_start_collector = on
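
Note that on Postgres 8.3 and later, the stats_start_collector setting was removed; statistics collection there is controlled by the track_counts setting instead. A sketch of the equivalent postgresql.conf line (verify against your Postgres version's documentation):

```
track_counts = on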

How many database connections are necessary to monitor a Postgres database?

Each Postgres database inventoried in RHQ requires 1 connection.

Why can't I drop my database that is inventoried in RHQ?

With the frequency of availability and statistics monitoring, the Postgres plugin keeps an open connection to the database. As such, when attempting to drop a database currently inventoried in RHQ, an error will be thrown about the database being in use. In order to drop the database, the RHQ Agent monitoring the database must be shut down, or the database resource should be removed from RHQ. This will close the Postgres plugin's connection to the Postgres server and thus allow you to drop the database.

Apache Plugin

Where can I get the connectors?

The Apache plugin monitors an Apache Web Server via custom modules like the SNMP connector. You can download the open-source versions of these connectors and install them in your Apache Web Server.

Script Plugin

The script plugin can be used to capture metric data. Explain how this is done.

Because the script plugin is "generic" in nature, it doesn't know what script you are managing - it could be anything. Thus, there is no way to know what metrics you want to collect (or even how to collect them). This is why there are no metrics defined in the script plugin by default.

However, the script plugin is made to be extensible. Probably the easiest way to do this is to just edit the descriptor and customize it for your own needs. Keep the other jar content intact (you still need the Java classes that come with it) but do things like:

  1. Change the name of the jar to some custom file name you want
  2. Change the name of the plugin as defined in the descriptor (<plugin> element, "name" attribute) so it's your own custom name
  3. Change the rest of the descriptor to match your custom needs (add <metric> definitions, for example).

Deploy the plugin jar like any other plugin and have the agents update their plugins; your new plugin will then be available to all your agents.

Here's an example of adding a metric. Suppose you have a script that takes the argument "--getElephantCount", and suppose the script prints the text "Elephant Count Is ###" to stdout when you pass that option. Running the script from the console would look something like:

> myscript.sh --getElephantCount
Elephant Count Is 204

Suppose further that you want to track the number of elephants you have throughout the day - in other words, "elephant count" is your metric. Here's what your custom plugin's <metric> definition could look like in your new plugin descriptor:

<metric property="{--getElephantCount}|.*Elephant Count Is ([0-9]+).*"
        dataType="measurement"
        displayName="Total Number of Elephants"
        description="The total count of elephants that exist."
        defaultOn="true"
        units="none"
        defaultInterval="300000"
        displayType="summary" />

When you deploy this, you'll see a metric "Total Number of Elephants" being collected. Note that the regex has one capture group in it - the "([0-9]+)" in the "property" attribute value. If your regex defines a capture group, the plugin matches the script's stdout against the regex and takes the value of the capture group as the value of the metric. If you don't define a regex, the plugin assumes the entire output is the metric value. You wouldn't need a regex if the script's output were simply a number like "3421" - but our script prints a human-readable string, so we have to tell the plugin to pick the metric value out of the sentence "Elephant Count Is 204", and we do that via a regex with a capture group.

Note that if the exitcode is the metric value itself (i.e. the metric you want collected is the status code of the script process when it exited), just set the "regex" to the string "exitcode" (<metric property="{--getElephantCount}|exitcode").
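
To make the capture-group behavior concrete, here is a small Python sketch that mimics the logic described above (this is illustrative only - it is not the plugin's actual code, and the function name is made up):

```python
import re

# The same pattern used in the <metric> definition above, with one capture group.
PATTERN = re.compile(r".*Elephant Count Is ([0-9]+).*")

def extract_metric(stdout_text):
    """Mimic the plugin's behavior: if the regex matches and has a capture
    group, the group's value is the metric value; otherwise, assume the
    entire output is the numeric metric value."""
    match = PATTERN.match(stdout_text)
    if match:
        return float(match.group(1))
    # No regex match: treat the whole output as the number.
    return float(stdout_text.strip())

print(extract_metric("Elephant Count Is 204"))  # -> 204.0
print(extract_metric("3421"))                   # -> 3421.0
```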

Augeas-based Plugins

What is this augeas plugin?

The augeas plugin is an "abstract" plugin that exists solely as an extension point for other plugins to extend. The augeas plugin provides the Java JNI classes necessary for other dependent plugins to use to access the Augeas native library. For example, the opensshd plugin depends on the augeas plugin because it uses the Augeas library to access the OpenSSH daemon configuration. The other RHQ plugins known to use this augeas plugin are: hosts, grub and apt.

Why does my agent log have this in it: "java.lang.UnsatisfiedLinkError: Unable to load library 'augeas': libaugeas.so: cannot open shared object file: No such file or directory"

This occurs when you have deployed one or more augeas-based plugins but your Linux machine does not have the augeas native library installed. See http://augeas.net for more information on Augeas and how you can install it on your machine.


Troubleshooting

Installer fails on PostgreSQL with "Relation RHQ_Principal does not exist"

First make sure that the RHQ server / installer is allowed to connect to PostgreSQL. You should look at the PostgreSQL configuration file pg_hba.conf where the permissions are configured. If this is OK, and the installer is able to connect to the database, please check the PostgreSQL page for a workaround.

RHQ 1.0 has trouble starting on Java 6

Java 6 is not supported on RHQ versions prior to 1.1 - please use Java 5 with those versions. RHQ 1.1 and later support Java 6.

RHQ 4.0 has trouble starting on Java 5

Support for Java 5 was dropped in RHQ 4.0. For RHQ 4.0 and later, please use Java 6.

The execution of a Script-resource fails on Unix

When I invoke the "Execute" operation on a Script resource, it immediately fails with an error saying that the script cannot be executed.

Make sure that the execute bit is set on the script file. You can set it via chmod +x scriptname

Install fails on Oracle with ORA-01843

This issue happens when Oracle runs in a locale where the abbreviation for April is not 'APR', as it is in EN or DE locales. There are currently two workarounds:

  • Put Oracle in a different locale (usually not desirable).
  • Edit one of the server distribution files before running the installer:
    • Remove the old server directory and unzip the install package again.
    • Go to $SERVER/jbossas/server/default/deploy/rhq-installer.war/WEB-INF/classes.
    • Edit db-data-combined.xml: change the few dates of the form 01-APR-08 to use your locale's month abbreviation.
    • Save the file.
    • Re-run the installer and choose to overwrite the database.
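
The date edit can also be scripted; here is a sketch in Python, assuming purely for illustration a French locale where Oracle abbreviates April as 'AVR' (check your own locale's abbreviation first; the XML line shown is hypothetical, not the actual db-data-combined.xml content):

```python
import re

# Hypothetical line from db-data-combined.xml containing an EN-locale date.
line = '<column name="CTIME" value="01-APR-08"/>'

# Replace the English month abbreviation with the locale-specific one
# ("AVR" is assumed here for a French locale).
fixed = re.sub(r"\bAPR\b", "AVR", line)
print(fixed)  # -> <column name="CTIME" value="01-AVR-08"/>
```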

When trying to monitor a JBoss EAP instance, I get the error "Connection failure Failed to authenticate principal=null, securityDomain=jmx-console"

As explained in the JBoss EAP documentation, the jmx-console is secured by default. Follow the instructions in the EAP Installation Guide to define a username/password. Then, in the RHQ GUI, go to the Inventory > Connection tab of the JBoss EAP Resource and set the username and password properties to the same values.

Also note that when starting a JBoss EAP instance without specifying a configuration parameter (-c), it will be started with the "production" configuration, as described in JBPAPP-198.

Why does my Apache SNMP module fail to start with the error ...?

"Syntax error on line 1376 of /etc/httpd/conf/httpd.conf: Unable to write to SNMPvar directory" (on stderr)

Please ensure the directory specified via the "SNMPVar" directive exists and is writable by the user that owns the Apache process.

"init_master_agent: Invalid local port (Permission denied)" (in the error_log file)

See if your Apache error_log contains a log message similar to "[notice] SELinux policy enabled; httpd running as context user_u:system_r:httpd_t:s0". If so, the SELinux (Security-Enhanced Linux) policy is preventing the httpd process from binding to the SNMP agent port (1610 by default). The easiest solution is to put SELinux in permissive mode by running the command "/usr/bin/setenforce 0" and then restart Apache. You should then see a message similar to "[notice] SELinux policy enabled; httpd running as context user_u:system_r:unconfined_t" in your error_log; note the "unconfined_t" portion, which indicates SELinux is no longer restricting the process.

When monitoring a JBAS instance, why am I not seeing any JVM resources beneath it?

In order for RHQ to discover JVM resources for a JBAS resource, the corresponding JBAS instance needs to be running on Java 5 or later, and it needs to have been started with the jboss.platform.mbeanserver system property set. For example, in UNIX-type environments, you can specify the following in the ${JBOSS_HOME}/bin/run.conf file:

JAVA_OPTS="$JAVA_OPTS -Djboss.platform.mbeanserver"

Note: With RHQ 1.0, if the system property com.sun.management.jmxremote is also specified, it will prevent the JVM resources from being discovered by RHQ; removing this property will allow those resources to be found. In RHQ 1.0.1 and later, this restriction is lifted, and JVM resources should be added to the RHQ inventory even if com.sun.management.jmxremote is specified.

I get the error "This resource's configuration has not yet been initialized" when I go to the Configuration tab on Tomcat. Why?

This means that configuration management has not been enabled for the Tomcat resource. This can be done by going to the Tomcat server's Inventory tab, opening the Connections subtab, and enabling configuration management explicitly.

How can I debug JDBC access and trace SQL?

Use log4jdbc.
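
A common way to wire log4jdbc in is to wrap the JDBC driver class and URL. The sketch below shows the general shape (the driver class and URL prefix are from the log4jdbc distribution; the property names, host, and database name are illustrative placeholders, and exactly where you set them depends on your datasource configuration):

```
# Original connection settings:
#   driver: org.postgresql.Driver
#   url:    jdbc:postgresql://localhost:5432/rhq
# With log4jdbc wrapped around them:
driver=net.sf.log4jdbc.DriverSpy
url=jdbc:log4jdbc:postgresql://localhost:5432/rhq
```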

How can I stop my agent from thinking the server keeps going up and down when the server has remained running the whole time?

If you see information like this in your agent logs:

INFO (org.rhq.enterprise.agent.AgentAutoDiscoveryListener)- {AgentAutoDiscoveryListener.server-offline}
The Agent has auto-detected the Server going offline [InvokerLocator
[servlet://server:7080/jboss-remoting-servlet-invoker
/ServerInvokerServlet?rhq.communications.connector.rhqtype=server]] -
the agent will stop sending new messages
...
INFO (org.rhq.enterprise.agent.AgentAutoDiscoveryListener)- {AgentAutoDiscoveryListener.server-online}
The Agent has auto-detected the Server coming online [InvokerLocator
[servlet://server:7080/jboss-remoting-servlet-invoker
/ServerInvokerServlet?rhq.communications.connector.rhqtype=server]] -
the agent will be able to start sending messages now

it means the agent has auto-detected its server going down and back up. This auto-detection was done through the multicast detector (which is different from detection-via-polling, the second way the agent attempts to detect the server's status).

If you think the agent is erroneously detecting the server going up or down, it is possible your network does not support multicast traffic or the multicast network is acting abnormally. In either case, you should disable the agent multicast detector and just have the agent rely on polling to detect changes in the server status. To turn off the multicast detection, set the following agent preferences to false:

rhq.agent.server-auto-detection
rhq.communications.multicast-detector.enabled

Those are the actual Java Preference names; you may often see these in the user interface as the following:

Auto-Detect RHQ Server?
Multicast Detector Enabled?

Since you are disabling multicast detection, make sure the polling detection feature remains enabled (i.e. rhq.agent.client.server-polling-interval-msecs should be larger than 0, typically 60000); otherwise, the agent will never know when the server goes down.
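
As a sketch, the relevant agent preference values after disabling multicast detection would be (the properties-style rendering here is illustrative; set these through whatever mechanism you normally use to configure the agent):

```
rhq.agent.server-auto-detection=false
rhq.communications.multicast-detector.enabled=false
# Keep polling enabled so server up/down can still be detected:
rhq.agent.client.server-polling-interval-msecs=60000
```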

Once you reconfigure the agent, you need to restart it so the communications subsystem can pick up the changes.

My Agent fails to start with "[: 207: ==: unexpected operator".

This is a known bug in RHQ 1.2/Jopr 2.2. There is a syntax error in rhq-agent.sh that causes the script to fail when executed by non-bash shells (e.g. /bin/sh on Solaris, HP-UX, or AIX). To fix the issue, edit rhq-agent.sh and change the "==" on line 207 to "=".

Why are the graphs and charts on the Monitor tab in the GUI not displayed?

If you see font-related errors in the RHQ Server log, it is probably because you are missing some system fonts needed by Java to generate the text in the graphs/charts. If you are on Linux, make sure you have the urw-fonts package installed. On Fedora or RHEL, use:

yum install urw-fonts

If you are on another OS, make sure you have all the default fonts installed.

To help debug Out Of Memory conditions, how do I get the agent or server to dump heap when it runs out of memory or on demand?

Pass these JVM arguments to the server or agent, e.g. via RHQ_AGENT_ADDITIONAL_JAVA_OPTS or RHQ_SERVER_ADDITIONAL_JAVA_OPTS:

-XX:+HeapDumpOnOutOfMemoryError
If you want the heap dump file to be dropped in a particular location, additionally specify:

-XX:HeapDumpPath=<where you want the hprof file>
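
For the "on demand" part of the question, a heap dump can also be triggered at runtime with the JDK's jmap tool on HotSpot JVMs (the pid and file path below are placeholders):

```shell
jmap -dump:format=b,file=/tmp/rhq-heap.hprof <pid>
```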

See SUN JVM Debugging Options for more info.

Why do I see alerts triggered on different metric values on different alert definition conditions when they are using the same metric?

This can occur because of how alert conditions are processed as measurement data arrives from the agent. It happens when a single alert definition has multiple conditions that use the same metric and uses the "ALL" conjunction (that is, all conditions must be true for the alert to fire). For example, do not create an alert definition that says "alert if ALL conditions are true: if metric X > 5 and if metric X < 10". Note, however, that RHQ 4 added support for range conditions (which is usually why people create multiple conditions on the same metric with the ALL conjunction in the first place); for more information, see https://bugzilla.redhat.com/show_bug.cgi?id=735262.

I created an alert definition and I know immediately thereafter my agent should have reported data that should have triggered the alert - but my alert did not fire. Where did my alert go?

After you create an alert definition, a short amount of time passes before it makes its way into the RHQ Servers' alert caches throughout the RHQ HA Server cloud; only once it does can it fire alerts.

This window between the creation of your alert definition and when it can start firing alerts is roughly 30 seconds, give or take. To be safe, wait at least a minute.

You will know the alert definition made it into the cache and is ready to fire alerts when the RHQ Server logs messages like this:

INFO  [CacheConsistencyManagerBean] localhost took [51]ms to reload global cache
INFO  [CacheConsistencyManagerBean] localhost took [49]ms to reload cache for 1 agents

Can I uninventory a platform while the agent is still running?

This is a rare use case, and it is recommended that you do not uninventory a platform while the agent managing that platform is still running. You usually uninventory a platform when you no longer want to manage it, so it normally doesn't make sense to keep the agent running anyway.

If you want to uninventory a platform but later manage it again (that is, put it back into inventory), it is recommended you shut down the agent, uninventory the platform, and then restart the agent when you want to bring the platform back into inventory.

If you want to uninventory the platform while the agent is still running, you must ensure the agent re-registers with the server after the platform has been completely uninventoried. This means that if the agent is running in the foreground, you need to execute the agent prompt command "register"; if the agent is running in the background as a service, you will need to restart it. There is no way to re-register the agent other than the "register" prompt command or an agent restart.
