RHQ uses the Sigar native library to gather information about systems and processes. This page lists important things to know about Sigar and the RHQ classes that work with it.
The Sigar Java class tries to load the Sigar native library from java.library.path or, if that fails, from the folder where the Sigar JAR file is located.
The Sigar Java class is not thread-safe. Create one instance of the Sigar Java class per thread, or synchronize access to shared instances appropriately.
Sigar class instances hold internal resources which may not be released if the Sigar object is simply garbage collected. Before getting rid of any Sigar class instance, always call its close method.
Once a process has died, calling the getProcState method twice on the same Sigar instance within two seconds returns the last ProcState value known to Sigar (from when the process was still alive). See http://communities.vmware.com/message/2187972
As a workaround, RHQ's ProcessInfo#refresh internally calls Sigar only if the previous refresh reported the process as running.
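The guard described above can be sketched roughly as follows. This is a minimal illustration, not RHQ's actual ProcessInfo code: the `CachedProcess` class, the `State` enum and the `nativeLookup` function (a stand-in for the Sigar call) are hypothetical names, and the assumption that a new entry starts in the running state is ours.

```java
import java.util.function.LongFunction;

final class RefreshGuardSketch {

    enum State { RUNNING, STOPPED }

    /** Hypothetical stand-in for a cached process entry; not RHQ's ProcessInfo. */
    static final class CachedProcess {
        private final long pid;
        // Stands in for the native Sigar query (assumption for this sketch).
        private final LongFunction<State> nativeLookup;
        // Assumption: a freshly created entry is considered running.
        private State lastState = State.RUNNING;
        private int nativeCalls;

        CachedProcess(long pid, LongFunction<State> nativeLookup) {
            this.pid = pid;
            this.nativeLookup = nativeLookup;
        }

        /** Queries the native layer only if the previous refresh saw a running process. */
        void refresh() {
            if (lastState != State.RUNNING) {
                return; // already seen as not running: a stale cached value cannot revive it
            }
            nativeCalls++;
            lastState = nativeLookup.apply(pid);
        }

        State lastState() { return lastState; }

        int nativeCalls() { return nativeCalls; }
    }

    public static void main(String[] args) {
        // Fake lookup: RUNNING on the first call, STOPPED on every later call.
        int[] calls = {0};
        CachedProcess p = new CachedProcess(
                42L, pid -> calls[0]++ == 0 ? State.RUNNING : State.STOPPED);
        p.refresh(); // native call, RUNNING
        p.refresh(); // native call, STOPPED
        p.refresh(); // skipped
        System.out.println(p.lastState() + " after " + p.nativeCalls() + " native calls");
    }
}
```

Once the entry has recorded a non-running state, later refreshes never reach the native layer, so a stale "alive" value cannot overwrite it.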
So far, the problem has only been found on a 64-bit RHEL system, for a user defined in an external database (LDAP). It has also been detected by Hyperic (system type/version not provided). Solaris platforms may be affected as well. See https://jira.hyperic.com/browse/SIGAR-231
1. Failure to discover/connect to JMX servers
Workaround: enable JMX remoting to avoid using Sun's attach API
Note: many external plugins depend on the JMX plugin discovery feature (Infinispan, for instance).
2. Failure to discover AS7 instances
Workaround: run AS7 instances with the same user and group as the RHQ agent
3. Failure when checking Hadoop Server availability
Workaround: disable events on Hadoop Server resource
Here is an excerpt of Sigar source:
getpwuid_r returns ERANGE (numerical result out of range), indicating that the buffer was not large enough.
Hyperic's fix consists of increasing the value of _R_SIZE_MAX_ to 2048. The RHQ team has tested the fix and it solves our particular case. Still, it is not understood why the unpatched code, extracted into a simple C test case, works:
A bug has been opened against glibc to find out why getpwuid_r may not consistently return ERANGE errors.
The appropriate way to call getpwuid_r is also being discussed. According to the glibc manual, this type of lookup function should be called in a loop, where each ERANGE error leads to reallocating the buffer with a larger size:
SigarAccess is an RHQ utility class. It creates:
- a unique instance of Sigar
- a proxy to this instance, implementing the SigarProxy interface
- an invocation handler which serializes calls to the shared Sigar instance
Any RHQ plugin class that needs system or process information should get the SigarProxy from SigarAccess (the SigarAccess.getSigar method). This guarantees that the RHQ agent will not waste or leak resources and that two threads will never call the same Sigar instance concurrently.
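The pattern described above (one shared instance, a proxy in front of it, and an invocation handler that serializes calls) can be sketched with a JDK dynamic proxy. This is an illustration under stated assumptions, not SigarAccess itself: `SystemInfo` is a hypothetical interface standing in for SigarProxy, `UnsafeSystemInfo` stands in for the non-thread-safe Sigar class, and `getShared` plays the role of the accessor.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

final class SerializedProxySketch {

    /** Hypothetical interface standing in for SigarProxy. */
    interface SystemInfo {
        long nextSample();
    }

    /** Deliberately not thread-safe, as a stand-in for the Sigar Java class. */
    static final class UnsafeSystemInfo implements SystemInfo {
        private long counter;

        @Override
        public long nextSample() {
            return ++counter;
        }
    }

    /** Serializes every call made through the proxy to the single target. */
    static final class SerializingHandler implements InvocationHandler {
        private final Object target;
        private final Object lock = new Object();

        SerializingHandler(Object target) {
            this.target = target;
        }

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            synchronized (lock) {
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // rethrow the target's own exception
                }
            }
        }
    }

    // The unique shared instance, reachable only through the proxy.
    private static final SystemInfo SHARED = (SystemInfo) Proxy.newProxyInstance(
            SystemInfo.class.getClassLoader(),
            new Class<?>[] { SystemInfo.class },
            new SerializingHandler(new UnsafeSystemInfo()));

    static SystemInfo getShared() {
        return SHARED;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 10_000; j++) {
                    getShared().nextSample();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        // No increment is lost, because calls never overlap.
        System.out.println(getShared().nextSample()); // prints 40001
    }
}
```

Because callers only ever see the proxy, they cannot bypass the lock, and the single underlying instance is created once and never duplicated.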
ProcessInfo encapsulates information about a known process and behaves like a cache which can be refreshed.
A few process properties (e.g. PID, command line) never change during the lifetime of the process and can be read directly with ProcessInfo accessors. Other process properties (e.g. state, CPU usage) vary over time, and their values are grouped in ProcessInfoSnapshot class instances.
New snapshots of the changing data are taken when calling the ProcessInfo.freshSnapshot or ProcessInfo.refresh methods.
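The split between fixed properties and refreshable snapshots could look roughly like the sketch below. It is not RHQ's implementation: `CachedProcessInfo`, `Snapshot`, `lastSnapshot` and the `source` supplier (a stand-in for the native query) are hypothetical names chosen for illustration; only the names refresh and freshSnapshot come from the text above.

```java
import java.util.function.Supplier;

final class SnapshotCacheSketch {

    /** Immutable group of values that vary over the lifetime of a process. */
    static final class Snapshot {
        final String state;
        final double cpuUsage;

        Snapshot(String state, double cpuUsage) {
            this.state = state;
            this.cpuUsage = cpuUsage;
        }
    }

    /** Hypothetical stand-in for a refreshable process cache. */
    static final class CachedProcessInfo {
        private final long pid;                  // fixed for the process lifetime
        private final String commandLine;        // fixed for the process lifetime
        private final Supplier<Snapshot> source; // stands in for the native query
        private volatile Snapshot last;

        CachedProcessInfo(long pid, String commandLine, Supplier<Snapshot> source) {
            this.pid = pid;
            this.commandLine = commandLine;
            this.source = source;
            this.last = source.get(); // initial snapshot
        }

        long getPid() { return pid; }

        String getCommandLine() { return commandLine; }

        /** Returns the cached snapshot without querying the source. */
        Snapshot lastSnapshot() { return last; }

        /** Replaces the cached snapshot with a newly taken one. */
        void refresh() { last = source.get(); }

        /** Takes a new snapshot, caches it and returns it. */
        Snapshot freshSnapshot() {
            Snapshot s = source.get();
            last = s;
            return s;
        }
    }

    public static void main(String[] args) {
        double[] cpu = {0.10, 0.25};
        int[] next = {0};
        CachedProcessInfo info = new CachedProcessInfo(42L, "java -jar agent.jar",
                () -> new Snapshot("RUNNING", cpu[Math.min(next[0]++, cpu.length - 1)]));
        System.out.println(info.lastSnapshot().cpuUsage);  // cached value from construction
        System.out.println(info.freshSnapshot().cpuUsage); // newly taken value
    }
}
```

Keeping each snapshot immutable means a caller can hold on to one while another thread refreshes the cache, without seeing half-updated values.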