RHQ uses the Sigar native library to gather information on systems and processes. This page lists important things to know about Sigar and the RHQ objects which work with it.
The Sigar Java class tries to load the Sigar native library from java.library.path or, if that fails, from the folder where the Sigar JAR file is located.
The Sigar Java class is not thread-safe. Create an instance of the Sigar Java class for each thread, or synchronize access to shared instances appropriately.
Sigar class instances hold resources internally which may not be released if the Sigar object is simply garbage collected. Before getting rid of any Sigar class instance, always call its close method.
Once a process has died, if you call the getProcState method twice on the same Sigar instance within two seconds, you will get the last ProcState value known by Sigar (from when the process was still alive). See http://communities.vmware.com/message/2187972
As a workaround, RHQ ProcessInfo#refresh will internally call Sigar only if the last execution reported the process was in running state.
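The guard behind this workaround can be sketched as follows. This is a minimal illustration, not RHQ's actual code: `RefreshGuard` is a hypothetical stand-in for ProcessInfo, and the `BooleanSupplier` stands in for the underlying Sigar query.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch: RefreshGuard stands in for ProcessInfo, and the
// BooleanSupplier stands in for the Sigar query "is the process running?".
final class RefreshGuard {
    private final BooleanSupplier runningQuery;
    private boolean running = true; // assumption: the process was alive when first seen

    RefreshGuard(BooleanSupplier runningQuery) {
        this.runningQuery = runningQuery;
    }

    // Queries the native layer only while the last known state was "running".
    // Once the process has been seen dead, a stale "alive" value returned by
    // a later query can no longer overwrite that result.
    boolean refresh() {
        if (running) {
            running = runningQuery.getAsBoolean();
        }
        return running;
    }
}
```

The trade-off of this design is that a dead process is never reported as running again, so a restarted process has to be represented by a new object.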
So far, the problem has only been found on a 64-bit RHEL system for a user defined in an external database (LDAP). It has already been detected by Hyperic as well (system type/version not provided). Solaris platforms may also be affected. See https://jira.hyperic.com/browse/SIGAR-231
1. Failure to discover/connect to JMX servers
Workaround: enable JMX remoting to avoid using Sun's attach API
Note: many external plugins depend on the JMX plugin discovery feature (Infinispan, for instance).
2. Failure to discover AS7 instances
Workaround: run AS7 instances with same user and group as RHQ agent
3. Failure when checking Hadoop Server availability
Workaround: disable events on Hadoop Server resource
Here is an excerpt of Sigar source:
getpwuid_r returns ERANGE (numerical result out of range) indicating that the buffer size was not large enough.
The Hyperic fix consists of increasing the value of _R_SIZE_MAX_ to 2048. The RHQ team has tested the fix and it solves our particular case. Still, it is not understood why the unpatched code, extracted into a simple C test case, works:
A bug has been opened on the glibc side to find out why getpwuid_r may not consistently return ERANGE errors.
The appropriate way to call getpwuid_r is also under discussion. According to the glibc manual, this type of lookup function should be called in a loop where an ERANGE error leads to reallocating the buffer with a larger size:
SigarAccess is an RHQ utility class. It creates:
- a unique instance of Sigar
- a proxy to this instance, implementing the SigarProxy interface
- an invocation handler which serializes calls to the shared Sigar instance
Any RHQ plugin class which needs system or process information should get the SigarProxy from SigarAccess (SigarAccess.getSigar method). This guarantees that the RHQ agent will not waste or leak resources and that two threads will not call the same Sigar instance concurrently.
The invocation handler uses a lock to serialize calls. If a thread waits more than sharedSigarLockMaxWait seconds, it will be given a new Sigar instance, which will be destroyed at the end of the call. Every 5 minutes, a background task checks that localSigarInstancesWarningThreshold has not been exceeded. If it has, a warning message will be logged, optionally with a thread dump.
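The locking behavior described above can be sketched with a JDK dynamic proxy. This is only an illustration under stated assumptions, not the RHQ implementation: `SystemInfo` is a hypothetical two-method stand-in for the SigarProxy interface, `SerializingHandler` and its factory argument are invented names, and the background warning task and instance limit are left out.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical stand-in for SigarProxy (the real interface has many more methods).
interface SystemInfo {
    long getPid();
    void close();
}

// Serializes calls to a shared instance; on lock timeout, serves the call
// from a short-lived local instance that is closed when the call ends.
final class SerializingHandler implements InvocationHandler {
    private final ReentrantLock lock = new ReentrantLock();
    private final SystemInfo shared;
    private final Supplier<SystemInfo> localFactory;
    private final long maxWaitSeconds;
    final AtomicInteger localInstances = new AtomicInteger(); // currently living local instances

    SerializingHandler(SystemInfo shared, Supplier<SystemInfo> localFactory, long maxWaitSeconds) {
        this.shared = shared;
        this.localFactory = localFactory;
        this.maxWaitSeconds = maxWaitSeconds;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        if (lock.tryLock(maxWaitSeconds, TimeUnit.SECONDS)) {
            try {
                return call(shared, method, args);
            } finally {
                lock.unlock();
            }
        }
        // Lock wait timed out: use a private instance and destroy it afterwards.
        SystemInfo local = localFactory.get();
        localInstances.incrementAndGet();
        try {
            return call(local, method, args);
        } finally {
            local.close();
            localInstances.decrementAndGet();
        }
    }

    private static Object call(SystemInfo target, Method method, Object[] args) throws Throwable {
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // rethrow the target's own exception
        }
    }

    static SystemInfo newProxy(SystemInfo shared, Supplier<SystemInfo> localFactory, long maxWaitSeconds) {
        return (SystemInfo) Proxy.newProxyInstance(
                SystemInfo.class.getClassLoader(),
                new Class<?>[] { SystemInfo.class },
                new SerializingHandler(shared, localFactory, maxWaitSeconds));
    }
}
```

Callers only ever see the proxy, so they cannot bypass the lock. A caller that times out pays the cost of creating a local instance instead of blocking indefinitely.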
The invocation handler behavior is configurable with System properties:
- sharedSigarLockMaxWait: maximum time in seconds a thread will wait for the shared Sigar lock acquisition; defaults to 2 seconds
- localSigarInstancesWarningThreshold: threshold of currently living Sigar instances at which the background task will print warning messages; defaults to 50
- maxLocalSigarInstances: maximum number of local Sigar instances which can be created, zero and negative values being interpreted as 'no limit'; defaults to 50
- threadDumpOnlocalSigarInstancesWarningThreshold: if set to true (case insensitive), the background task will also log a thread dump when localSigarInstancesWarningThreshold is met
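As an illustration of how these settings and their defaults could be read, here is a minimal sketch. `SigarAccessSettings` is a hypothetical class, and the property keys are copied from the list above as written; the keys used by the RHQ agent may carry a prefix, so treat them as an assumption.

```java
// Hypothetical holder class; property names and defaults come from the list above.
// The real RHQ keys may differ (for example by a prefix), which this sketch does not guess.
final class SigarAccessSettings {
    final int sharedSigarLockMaxWait;              // seconds
    final int localSigarInstancesWarningThreshold; // number of instances
    final int maxLocalSigarInstances;              // zero or negative means "no limit"
    final boolean threadDumpOnWarning;

    private SigarAccessSettings() {
        // Integer.getInteger falls back to the default when the property is unset or not a number.
        sharedSigarLockMaxWait = Integer.getInteger("sharedSigarLockMaxWait", 2);
        localSigarInstancesWarningThreshold = Integer.getInteger("localSigarInstancesWarningThreshold", 50);
        maxLocalSigarInstances = Integer.getInteger("maxLocalSigarInstances", 50);
        // Boolean.getBoolean is true only if the property equals "true", ignoring case.
        threadDumpOnWarning = Boolean.getBoolean("threadDumpOnlocalSigarInstancesWarningThreshold");
    }

    static SigarAccessSettings fromSystemProperties() {
        return new SigarAccessSettings();
    }

    // Applies the "zero and negative values mean no limit" rule.
    boolean mayCreateLocalInstance(int currentlyAlive) {
        return maxLocalSigarInstances <= 0 || currentlyAlive < maxLocalSigarInstances;
    }
}
```

With such keys, the values would be passed on the JVM command line, for example `-DsharedSigarLockMaxWait=5`.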
ProcessInfo encapsulates information about a known process and behaves like a cache which can be refreshed.
A few process properties (e.g. PID, command line) will never change during the lifetime of the process and can be read directly with ProcessInfo accessors. Other process properties (e.g. state, CPU usage) will vary, and their values are grouped in ProcessInfoSnapshot class instances.
New snapshots of changing data will be taken when calling ProcessInfo.freshSnapshot or ProcessInfo.refresh methods.