Hibernate Performance Monitor

Summary

HibernatePerformanceMonitor is not the be-all end-all to performance monitoring, but it has been integrated with several aspects of RHQ to date and thus provides a quick and easy way to determine whether bottlenecks or other performance-related issues exist.

Logging Details

As of this writing, it leverages HibernateStatisticsStopWatch to log the following statistics by default:

Type	Description
queryExecutions	number of executed queries
transations	number of completed transactions
entityLoads	number of entity loads
connects	number of connections asked by the session
time	wall clock duration of work

For the following entities:

Type	Integration Point	Data Capture Explanation
Facelets (URLs)	FaceletRedirectionViewHandler (configured in portar-war's web.xml)	All work needed to render the viewId/URL in question
GWT RPC	AbstractGWTServiceImpl (GWT service baseclass in coregui)	All work needed to carry out the end-to-end service
SLSB	PerformanceMonitoringInterceptor (server/jar's ejb-jar.xml)	Sum of work done by this SLSB method and any other SLSB methods it calls along the way

Below is an example of the log snippet for each integration point:

Type	Message
Facelet	21:18:43,227 DEBUG [ HibernatePerformanceMonitor ] HibernateStats[ queries=43, xactions=39, loads=106, connects=70, time=222 ](perf: too many xactions?) for URL:/rhq/common/monitor/tables-plain.xhtml?...
GWT RPC	00:35:30,550 DEBUG [ HibernatePerformanceMonitor ] HibernateStats[ queries=13, xactions=8, loads=34, connects=15, time=13691 ](perf: slowness?) for GWT:AlertGWTService.findAlertsByCriteria
SLSB	13:49:18,296 DEBUG [ HibernatePerformanceMonitor ] HibernateStats[ queries=2, xactions=0, loads=2, connects=2, time=4 ] for SLSB:SavedSearchManagerBean.findSavedSearchesByCriteria

Usage

In order to turn on debug logging, you need to uncomment the following line in the jboss-log4j.xml file in the jbossas/server/default/conf directory:

<category name="org.rhq.enterprise.server.util.HibernatePerformanceMonitor">
  <priority value="DEBUG"/>
</category>

Make sure you change the logging threshold for the appender in question. That will give you statistics about the high-level GWT service calls as well as any JSF-based URL hit. If you want detailed metrics consisting of method-by-method tracking at the SLSB layer, then you need to uncomment the following line in the ejb-jar.xml file inside server/jar (and restart the server if it's currently running):

<interceptor-binding>
   <ejb-name>*</ejb-name>
   <interceptor-class>...PerformanceMonitorInterceptor</interceptor-class>
</interceptor-binding>

Findings

Below are some preliminary findings from running with full debugging on...

ResourceBossBean.getInventorySummary
- called on dashboard (and then refresh continuously) as well as used to render the counts inside the table headers on the inventory browser
- total impact is queries 13, connects 13
- it does 10 queries to get resource/group counts - this can be rewritten as 1 query
ServerPluginsBean.getLastConfigurationChangeTimestamp
- this is called every 30 seconds (in dev mode) for each individual plugin installed in the system, which means a separate db roundtrip per plugin – this can be done in a single query for all plugins at once, and then loop over the results on the server for processing
Resource{Group}ManagerBean.findResource{Group}CompositesByCriteria
- total number of DB roundtrips is on order with the number of results pulled across the wire; the row-wise resource permissions look-up is N+1 issue, we need to include that information in the generated composite query itself so that there is only 1 query/roundtrip to the DB
- these methods are called on the inventory browser and inventory tab of resources/groups
GroupDefinitionManagerBean.calculateGroupMembership
- for "groupby" expression, this could be extremely expensive (for example, queries=835, xactions=914, loads=5010, connects=3667). problem is that each dynagroup is its own xaction, which has multiple sub-actions for additonal, removal, update, renaming, etc.
MeasurementChartsManager.getMetricDisplaySummariesForResource
- total cost is cN+1 queries where N is number of metrics being displayed on monitor>graphs separate query to load each metric, then other queries ('c' of them for each row result) to find related data to put into the MetricDisplaySummary object
AvailabilityManagerBean.mergeAvailabilityReport
- separate DB roundtrip to test whether each availability record has changed and several more roundtrips for each that has. can be rewritten to make one query to get current availabilities of all resources in the report, then loop over that in memory, queue up changes then persist in one fell swoop. lookup of the latest rhq_availability row for each resource is still slow, but could be significantly improved if rhq_reosurce_avail had a FK reference back to the row in rhq_availability
resource import
- total cost is cN+1 queries where N is number of resources being imported into inventory and 'c' is related data that needs to be inserted into the database for each row. below is a snippet of what happens for firing up an agent on a new installation and importing the discovered platform. it shows that 'c' is approximately 7 here, meaning that for an import of approx 900 resources we're going to perform approx 6300 transactions and take nearly 2 minutes of wall clock time to complete that :

// fire up new agent
14:37:43,876 ...[ queries=20, xactions=0, loads=31, connects=163, time=409 ] for DiscoveryBossBean.mergeInventoryReport
14:37:48,901 ...[ queries=2, xactions=0, loads=14, connects=16, time=14 ] for DiscoveryBossBean.mergeInventoryReport
// initial platform, top-level servers, platform-level services done
// click import
14:38:45,581 ...[ queries=5, xactions=0, loads=20, connects=35, time=125 ] for DiscoveryBossBean.getQueuedPlatformsAndServers
14:38:58,021 ...[ queries=10, xactions=0, loads=60, connects=79, time=270 ] for DiscoveryBossBean.updateInventoryStatus
14:39:00,597 ...[ queries=98, xactions=0, loads=131, connects=349, time=558 ] for DiscoveryBossBean.mergeInventoryReport
14:39:00,734 ...[ queries=171, xactions=290, loads=442, connects=608, time=3084 ](perf: slowness?)  for DiscoveryBossBean.importResources
14:39:01,191 ...[ queries=5, xactions=0, loads=6, connects=11, time=17 ] for DiscoveryBossBean.getQueuedPlatformsAndServers
14:39:08,027 ...[ queries=825, xactions=4, loads=1348, connects=3529, time=3828 ](perf: slowness?)  for DiscoveryBossBean.mergeInventoryReport
14:39:25,824 ...[ queries=5130, xactions=25, loads=9324, connects=18669, time=15008 ](perf: slowness?)  for DiscoveryBossBean.mergeInventoryReport
14:39:32,677 ...[ queries=116, xactions=0, loads=1111, connects=1360, time=534 ] for DiscoveryBossBean.mergeInventoryReport
// steady state
14:39:35,234 ...[ queries=2, xactions=0, loads=909, connects=910, time=241 ] for DiscoveryBossBean.mergeInventoryReport
14:39:38,992 ...[ queries=2, xactions=0, loads=909, connects=910, time=233 ] for DiscoveryBossBean.mergeInventoryReport

Other

There are a couple other URLs / SLSBs in the app that appear slow (high number of DB roundtrips for the data they are displaying), but they haven't been analyzed in detail yet:

group graphs, ~60 rountrips
group traits, ~100 roundtrips
group schedules, ~35 roundtrips
resource overview, 65 roundtrips (each spark is multiple roundtrips, and they are queried one at a time)

Caveat

Since this tool is pulling metrics from the single Hibernate Statistics object for the entire RHQ.ear application, it can not distinguish between work done in different transactions, let alone different threads. In other words, this tool is better suited to be used in a single-user environment where load can be applied from a single source. This will give you the most accurate representation of the load an individual (JSF-based) URL, GWT service method, or SLSB puts on the database.

JBoss Community Archive (Read Only)