<category name="org.rhq.enterprise.server.util.HibernatePerformanceMonitor"> <priority value="DEBUG"/> </category>
HibernatePerformanceMonitor is not the be-all end-all to performance monitoring, but it has been integrated with several aspects of RHQ to date and thus provides a quick and easy way to determine whether bottlenecks or other performance-related issues exist.
As of this writing, it leverages HibernateStatisticsStopWatch to log the following statistics by default:
Type |
Description |
queryExecutions |
number of executed queries |
transations |
number of completed transactions |
entityLoads |
number of entity loads |
connects |
number of connections asked by the session |
time |
wall clock duration of work |
For the following entities:
Type |
Integration Point |
Data Capture Explanation |
Facelets (URLs) |
FaceletRedirectionViewHandler |
All work needed to render the viewId/URL in question |
GWT RPC |
AbstractGWTServiceImpl |
All work needed to carry out the end-to-end service |
SLSB |
PerformanceMonitoringInterceptor |
Sum of work done by this SLSB method and any other SLSB methods it calls along the way |
Below is an example of the log snippet for each integration point:
Type |
Message |
Facelet |
21:18:43,227 DEBUG [ HibernatePerformanceMonitor ] HibernateStats[ queries=43, xactions=39, loads=106, connects=70, time=222 ](perf: too many xactions?) for URL:/rhq/common/monitor/tables-plain.xhtml?... |
GWT RPC |
00:35:30,550 DEBUG [ HibernatePerformanceMonitor ] HibernateStats[ queries=13, xactions=8, loads=34, connects=15, time=13691 ](perf: slowness?) for GWT:AlertGWTService.findAlertsByCriteria |
SLSB |
13:49:18,296 DEBUG [ HibernatePerformanceMonitor ] HibernateStats[ queries=2, xactions=0, loads=2, connects=2, time=4 ] for SLSB:SavedSearchManagerBean.findSavedSearchesByCriteria |
In order to turn on debug logging, you need to uncomment the following line in the jboss-log4j.xml file in the jbossas/server/default/conf directory:
<category name="org.rhq.enterprise.server.util.HibernatePerformanceMonitor"> <priority value="DEBUG"/> </category>
Make sure you change the logging threshold for the appender in question. That will give you statistics about the high-level GWT service calls as well as any JSF-based URL hit. If you want detailed metrics consisting of method-by-method tracking at the SLSB layer, then you need to uncomment the following line in the ejb-jar.xml file inside server/jar (and restart the server if it's currently running):
<interceptor-binding> <ejb-name>*</ejb-name> <interceptor-class>...PerformanceMonitorInterceptor</interceptor-class> </interceptor-binding>
Below are some preliminary findings from running with full debugging on...
ResourceBossBean.getInventorySummary
called on dashboard (and then refresh continuously) as well as used to render the counts inside the table headers on the inventory browser
total impact is queries 13, connects 13
it does 10 queries to get resource/group counts - this can be rewritten as 1 query
ServerPluginsBean.getLastConfigurationChangeTimestamp
this is called every 30 seconds (in dev mode) for each individual plugin installed in the system, which means a separate db roundtrip per plugin – this can be done in a single query for all plugins at once, and then loop over the results on the server for processing
Resource{Group}ManagerBean.findResource{Group}CompositesByCriteria
total number of DB roundtrips is on order with the number of results pulled across the wire; the row-wise resource permissions look-up is N+1 issue, we need to include that information in the generated composite query itself so that there is only 1 query/roundtrip to the DB
these methods are called on the inventory browser and inventory tab of resources/groups
GroupDefinitionManagerBean.calculateGroupMembership
for "groupby" expression, this could be extremely expensive (for example, queries=835, xactions=914, loads=5010, connects=3667). problem is that each dynagroup is its own xaction, which has multiple sub-actions for additonal, removal, update, renaming, etc.
MeasurementChartsManager.getMetricDisplaySummariesForResource
total cost is cN+1 queries where N is number of metrics being displayed on monitor>graphs separate query to load each metric, then other queries ('c' of them for each row result) to find related data to put into the MetricDisplaySummary object
AvailabilityManagerBean.mergeAvailabilityReport
separate DB roundtrip to test whether each availability record has changed and several more roundtrips for each that has. can be rewritten to make one query to get current availabilities of all resources in the report, then loop over that in memory, queue up changes then persist in one fell swoop. lookup of the latest rhq_availability row for each resource is still slow, but could be significantly improved if rhq_reosurce_avail had a FK reference back to the row in rhq_availability
resource import
total cost is cN+1 queries where N is number of resources being imported into inventory and 'c' is related data that needs to be inserted into the database for each row. below is a snippet of what happens for firing up an agent on a new installation and importing the discovered platform. it shows that 'c' is approximately 7 here, meaning that for an import of approx 900 resources we're going to perform approx 6300 transactions and take nearly 2 minutes of wall clock time to complete that :
// fire up new agent 14:37:43,876 ...[ queries=20, xactions=0, loads=31, connects=163, time=409 ] for DiscoveryBossBean.mergeInventoryReport 14:37:48,901 ...[ queries=2, xactions=0, loads=14, connects=16, time=14 ] for DiscoveryBossBean.mergeInventoryReport // initial platform, top-level servers, platform-level services done // click import 14:38:45,581 ...[ queries=5, xactions=0, loads=20, connects=35, time=125 ] for DiscoveryBossBean.getQueuedPlatformsAndServers 14:38:58,021 ...[ queries=10, xactions=0, loads=60, connects=79, time=270 ] for DiscoveryBossBean.updateInventoryStatus 14:39:00,597 ...[ queries=98, xactions=0, loads=131, connects=349, time=558 ] for DiscoveryBossBean.mergeInventoryReport 14:39:00,734 ...[ queries=171, xactions=290, loads=442, connects=608, time=3084 ](perf: slowness?) for DiscoveryBossBean.importResources 14:39:01,191 ...[ queries=5, xactions=0, loads=6, connects=11, time=17 ] for DiscoveryBossBean.getQueuedPlatformsAndServers 14:39:08,027 ...[ queries=825, xactions=4, loads=1348, connects=3529, time=3828 ](perf: slowness?) for DiscoveryBossBean.mergeInventoryReport 14:39:25,824 ...[ queries=5130, xactions=25, loads=9324, connects=18669, time=15008 ](perf: slowness?) for DiscoveryBossBean.mergeInventoryReport 14:39:32,677 ...[ queries=116, xactions=0, loads=1111, connects=1360, time=534 ] for DiscoveryBossBean.mergeInventoryReport // steady state 14:39:35,234 ...[ queries=2, xactions=0, loads=909, connects=910, time=241 ] for DiscoveryBossBean.mergeInventoryReport 14:39:38,992 ...[ queries=2, xactions=0, loads=909, connects=910, time=233 ] for DiscoveryBossBean.mergeInventoryReport
There are a couple other URLs / SLSBs in the app that appear slow (high number of DB roundtrips for the data they are displaying), but they haven't been analyzed in detail yet:
group graphs, ~60 rountrips
group traits, ~100 roundtrips
group schedules, ~35 roundtrips
resource overview, 65 roundtrips (each spark is multiple roundtrips, and they are queried one at a time)
Since this tool is pulling metrics from the single Hibernate Statistics object for the entire RHQ.ear application, it can not distinguish between work done in different transactions, let alone different threads. In other words, this tool is better suited to be used in a single-user environment where load can be applied from a single source. This will give you the most accurate representation of the load an individual (JSF-based) URL, GWT service method, or SLSB puts on the database.