JBoss Community Archive (Read Only)

RHQ 4.10

Sigar usage

RHQ uses the Sigar native library to gather information about systems and processes. This page lists important things to know about Sigar and the RHQ objects which work with it.

Sigar

General information

The Sigar Java class tries to load the Sigar native library from the java.library.path or, if that fails, from the folder where the Sigar JAR file is located.

The Sigar Java class is not thread-safe. Create one Sigar instance per thread, or synchronize access to shared instances appropriately.

Sigar instances hold resources internally which may not be released if the Sigar object is simply garbage collected. Before getting rid of any Sigar instance, always call its close method.

Known Issues

getProcState may return wrong value on consecutive calls

Once a process has died, if you call the getProcState method twice on the same Sigar instance in less than two seconds, you will get the last ProcState value known to Sigar (from when the process was still alive). See http://communities.vmware.com/message/2187972

As a workaround, RHQ's ProcessInfo#refresh calls Sigar internally only if the previous refresh reported the process as running.
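The idea behind the workaround is that a "not running" observation is treated as final, so a stale "running" answer can never be read afterwards. The following is a minimal sketch of that idea only, not the RHQ implementation: the class StickyProcessState and its Supplier-based probe are hypothetical stand-ins for ProcessInfo and the underlying Sigar call.

```java
import java.util.function.Supplier;

/** Hypothetical illustration only; this is not the RHQ ProcessInfo class. */
class StickyProcessState {
    // Stand-in for the Sigar lookup: returns true while the process is alive.
    private final Supplier<Boolean> probe;
    // Assume the process is running until a probe says otherwise.
    private boolean running = true;

    StickyProcessState(Supplier<Boolean> probe) {
        this.probe = probe;
    }

    /** Queries the probe only if the previous refresh saw a running process. */
    boolean refresh() {
        if (running) {
            running = probe.get();
        }
        return running;
    }
}
```

With this shape, the probe is never consulted again once it has reported a dead process, which sidesteps the two-second window described above.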

getProcCredName may fail to resolve process owner name

So far, the problem has only been found on a 64-bit RHEL system, for a user defined in an external database (LDAP). Hyperic has also detected it (system type and version not provided). Solaris platforms may also be affected. See https://jira.hyperic.com/browse/SIGAR-231

Possible consequences

1. Failure to discover/connect to JMX servers
Workaround: enable JMX remoting to avoid using Sun's attach API

Many external plugins (Infinispan, for instance) depend on the JMX plugin's discovery feature.

2. Failure to discover AS7 instances
Workaround: run AS7 instances with the same user and group as the RHQ agent

3. Failure when checking Hadoop Server availability
Workaround: disable events on Hadoop Server resource

Problem details

Here is an excerpt from the Sigar source:

sigar_format.c excerpt
/* sysconf(_SC_GET{PW,GR}_R_SIZE_MAX) */
#define R_SIZE_MAX 1024

int sigar_user_name_get(sigar_t *sigar, int uid, char *buf, int buflen)
{
    struct passwd *pw = NULL;
    /* XXX cache lookup */

# ifdef HAVE_GETPWUID_R
    struct passwd pwbuf;
    char buffer[R_SIZE_MAX];

    if (getpwuid_r(uid, &pwbuf, buffer, sizeof(buffer), &pw) != 0) {
        return errno;
    }
    if (!pw) {
        return ENOENT;
    }
# else
    if ((pw = getpwuid(uid)) == NULL) {
        return errno;
    }
# endif

    strncpy(buf, pw->pw_name, buflen);
    buf[buflen-1] = '\0';

    return SIGAR_OK;
}

In the failing case, getpwuid_r returns ERANGE (numerical result out of range), which indicates that the supplied buffer was not large enough.

The Hyperic fix increases the value of R_SIZE_MAX to 2048. The RHQ team has tested the fix and it solves our particular case. Still, it is not understood why the unpatched code, extracted into a simple C test case, works:

Extracted C test case
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <pwd.h>
#include <grp.h>

#define R_SIZE_MAX 1024

int main(void) {

    struct passwd *pw = NULL;
    struct passwd pwbuf;
    char buffer[R_SIZE_MAX];

    // 600 is a userid
    if (getpwuid_r(600, &pwbuf, buffer, sizeof(buffer), &pw) != 0) {
        return errno;
    }
    if (!pw) {
        return ENOENT;
    }

    puts (pw->pw_name);

    return EXIT_SUCCESS;
}

A bug has been opened against glibc to find out why getpwuid_r may not consistently return ERANGE errors.
See http://sourceware.org/bugzilla/show_bug.cgi?id=15139

The appropriate way to call getpwuid_r is also discussed there. According to the glibc manual, this type of lookup function should be called in a loop where ERANGE errors lead to reallocating the buffer with a larger size:

Example from the glibc manual
struct hostent *
gethostname (char *host)
{
  struct hostent hostbuf, *hp;
  size_t hstbuflen;
  char *tmphstbuf;
  int res;
  int herr;

  hstbuflen = 1024;
  /* Allocate buffer, remember to free it to avoid memory leakage.  */
  tmphstbuf = malloc (hstbuflen);

  while ((res = gethostbyname_r (host, &hostbuf, tmphstbuf, hstbuflen,
                                 &hp, &herr)) == ERANGE)
    {
      /* Enlarge the buffer.  */
      hstbuflen *= 2;
      tmphstbuf = realloc (tmphstbuf, hstbuflen);
    }
  /*  Check for errors.  */
  if (res || hp == NULL)
    return NULL;
  return hp;
}

// Copyright 2013 Free Software Foundation, Inc.
// Verbatim copying and distribution of this entire article is permitted in any medium, provided this notice is preserved.

SigarAccess

SigarAccess is an RHQ utility class. It creates:

  • a unique instance of Sigar

  • a proxy to this instance, implementing the SigarProxy interface

  • an invocation handler which serializes calls to the shared Sigar instance

Any RHQ plugin class which needs system or process information should get the SigarProxy from SigarAccess (SigarAccess.getSigar method). This guarantees that the RHQ agent does not waste or leak resources and that two threads never call the same Sigar instance concurrently.

The invocation handler uses a lock to serialize calls. If a thread waits more than sharedSigarLockMaxWait seconds, it is given a new Sigar instance, which is destroyed at the end of the call. Every 5 minutes, a background task checks that localSigarInstancesWarningThreshold has not been exceeded. If it has, a warning message is logged, optionally with a thread dump.

The invocation handler behavior is configurable with System properties:

  • sharedSigarLockMaxWait: maximum time in seconds a thread will wait for the shared Sigar lock acquisition; defaults to 2 seconds

  • localSigarInstancesWarningThreshold: threshold of currently living Sigar instances at which the background task will print warning messages; defaults to 50

  • maxLocalSigarInstances: maximum number of local Sigar instances which can be created, zero and negative values being interpreted as 'no limit'; defaults to 50

  • threadDumpOnlocalSigarInstancesWarningThreshold: if set to true (case insensitive), the background task will also log a thread dump when localSigarInstancesWarningThreshold is met
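The locking behavior described above can be sketched with a dynamic proxy and a timed lock. This is an illustration under stated assumptions, not the SigarAccess source: SerializingProxySketch, Handler, the reduced SystemInfo interface (a stand-in for SigarProxy) and the localFactory parameter are all hypothetical names, and the system property key is used exactly as written on this page, which may differ from the key the agent actually reads.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Hypothetical illustration only; this is not the RHQ SigarAccess code. */
class SerializingProxySketch {

    /** Stand-in for the SigarProxy interface, reduced to two methods. */
    interface SystemInfo {
        long getPid();
        void close();
    }

    static final class Handler implements InvocationHandler {
        private final ReentrantLock lock = new ReentrantLock();
        private final SystemInfo shared;
        private final Supplier<SystemInfo> localFactory;
        private final long maxWaitSeconds;

        Handler(SystemInfo shared, Supplier<SystemInfo> localFactory, long maxWaitSeconds) {
            this.shared = shared;
            this.localFactory = localFactory;
            this.maxWaitSeconds = maxWaitSeconds;
        }

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            // Normal path: wait for the shared instance, one caller at a time.
            if (lock.tryLock(maxWaitSeconds, TimeUnit.SECONDS)) {
                try {
                    return call(shared, method, args);
                } finally {
                    lock.unlock();
                }
            }
            // Timed out: use a short-lived local instance and always release it.
            SystemInfo local = localFactory.get();
            try {
                return call(local, method, args);
            } finally {
                local.close();
            }
        }

        private static Object call(Object target, Method method, Object[] args) throws Throwable {
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the target's own exception
            }
        }
    }

    static SystemInfo newProxy(SystemInfo shared, Supplier<SystemInfo> localFactory) {
        // Property name as written on this page; 2 seconds is the documented default.
        long maxWait = Long.getLong("sharedSigarLockMaxWait", 2L);
        return (SystemInfo) Proxy.newProxyInstance(
                SystemInfo.class.getClassLoader(),
                new Class<?>[] { SystemInfo.class },
                new Handler(shared, localFactory, maxWait));
    }
}
```

The sketch leaves out the maxLocalSigarInstances limit and the periodic warning task; both would need a counter of live local instances that is incremented before localFactory is called and decremented after close.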

ProcessInfo

ProcessInfo encapsulates information about a known process and behaves like a cache which can be refreshed.

A few process properties (e.g. PID, command line) never change during the lifetime of the process and can be read directly with ProcessInfo accessors. Other process properties (e.g. state, CPU usage) vary, and their values are grouped in ProcessInfoSnapshot class instances.

New snapshots of changing data will be taken when calling ProcessInfo.freshSnapshot or ProcessInfo.refresh methods.

JBoss.org Content Archive (Read Only), exported from JBoss Community Documentation Editor at 2020-03-11 14:01:03 UTC, last content change 2013-07-15 18:32:56 UTC.