JBoss Community Archive (Read Only)

RHQ 4.9

RHQ Metrics Simulator

Overview

RHQ Metrics Simulator, or rhq-ms, is a command line tool for testing the new Cassandra-based metrics backend. The purpose of the tool is twofold. First, it facilitates performance and load testing. Second, it assists with understanding how Cassandra behaves under different scenarios. The simulator complements the perftest plugin; it does not replace it. Testing with the perftest plugin requires a full environment that includes the RHQ server, database, at least one agent, and Cassandra. rhq-ms is a stand-alone program that is run from the command line, and its only dependency is Cassandra, which it can deploy and configure for you. The tool aims to be easily accessible and usable in a dev environment as well as suitable for a more dedicated performance/load testing environment. The remainder of this document discusses how to use and configure the tool.

Components

The simulator consists of a few different components. A summary of them is provided here to better understand how to configure and tune the simulator.

MeasurementCollector

A task that runs periodically to collect and store metrics in Cassandra. It uses a priority queue, as the agent does, to determine which schedules are ready for collection. You can think of this task as simulating both the agent and the server: it collects values and then writes those values to Cassandra. Note that it uses the same code that the server does to store metrics. Multiple instances of this task can run concurrently.

MeasurementAggregator

A task that runs periodically to aggregate metric data. This task executes the same code that runs on the server during the DataPurgeJob. Only one instance of this task can be running at any given time, because the server only ever executes a single instance of the DataPurgeJob.

StatsCollector

A task that runs periodically to collect and report statistics about the simulation. Only a single instance of this task runs at any given time. Statistics are reported via logging, which can be configured in RHQ_METRICS_SIMULATOR_HOME/conf/log4j.properties. The task reports statistics for:

  • Number of raw metrics inserted per minute

  • Total number of metrics inserted

  • Summary aggregates for raw insertion times

Installation

The metrics simulator lives in the RHQ source tree in the modules/helpers/metrics-simulator Maven module. After building that module you will find rhq-metrics-simulator-${RHQ_VERSION}.zip in the target directory. Simply unzip that file to a location of your choosing.

Running

The rhq-ms.sh script found in the bin directory is used to run the simulator. It requires the JAVA_HOME environment variable to be set. Run the script without any arguments to see a list of supported options. The output should look similar to this:

$ rhq-ms.sh
usage: rhq-ms [options]
 -h,--help               Display this message.
 -s,--simulation <arg>   The simulation to run. Expected to be a JSON file.

Next we will look at running the simulator with some example simulations.

Examples

This section provides examples that aim to help you understand how to create a simulation.

Example 1

$ rhq-ms.sh -s example1.json

The simulator is passed as input a file named example1.json. This input file specifies details about the simulation to be run. Here is the file:

example1.json
{
  "simulationTime": 20,
  "collectionInterval": 500,
  "aggregationInterval": 1000,
  "schedules": {"count": 1000, "interval": 500}
}

Let's look at each property one by one. simulationTime specifies the duration of the entire simulation run. Its value is interpreted in minutes.

collectionInterval specifies the frequency of metric collections. Its value is interpreted in milliseconds. This property is analogous to how often the agent performs its measurement collections task. This value does not specify the collection interval for a schedule. It specifies how frequently the task of collecting measurements should be done.

aggregationInterval specifies the frequency of how often to perform aggregation, i.e., compression. The value is interpreted in milliseconds. This property is analogous to how often the RHQ server's data purge job runs and executes the compression task.

schedules specifies details about the metrics to be collected. The value of this property is an object that contains two properties. count specifies the number of schedules, and interval is the collection interval for those schedules. In this example, the simulator will generate 1000 unique schedule ids and then collect and store metrics for them every 500 milliseconds.

All properties are optional. Default values will be used for any properties that you omit. You still have to define a top-level object which can be empty.
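Since every property has a default, the smallest valid simulation file is simply an empty object (the defaults listed in the Configuration Reference below then apply):

```json
{}
```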

Example 2

$ rhq-ms.sh -s example2.json

This example is slightly more involved in that it specifies multiple schedule sets. Properties that have already been covered will be omitted from discussion.

example2.json
{
  "simulationTime": 20,
  "collectionInterval": 500,
  "aggregationInterval": 1000,
  "schedules": [
    {
      "count": 1000,
      "interval": 500
    },
    {
      "count": 1000,
      "interval": 1000
    }
  ]
}

The difference from the previous example is that schedules takes an array of objects as its value. The supported properties for each object are the same, namely the count and interval properties. Why would you want to specify multiple sets of schedules? This can be useful for testing and analyzing the effects of varying data distributions in the cluster. Schedules with smaller intervals (i.e., higher collection frequencies) will wind up with wider rows than schedules having larger intervals. Staggering intervals can also be used to cause spikes in write traffic.

Example 3

$ rhq-ms.sh -s example3.json
example3.json
{
  "simulationTime": 20,
  "collectionInterval": 500,
  "aggregationInterval": 1000,
  "threadPoolSize": 10,
  "numMeasurementCollectors": 8,
  "schedules": [
    {
      "count": 1000,
      "interval": 500
    },
    {
      "count": 1000,
      "interval": 1000
    }
  ],
  "ttl": [
    {
      "table": "raw_metrics",
      "value": 180
    },
    {
      "table": "one_hour_metrics",
      "value": 360
    },
    {
      "table": "six_hour_metrics",
      "value": 540
    },
    {
      "table": "twenty_four_hour_metrics",
      "value": 720
    }
  ]
}

Three new properties are introduced in this example. threadPoolSize specifies the total number of threads that the simulator uses. Keep in mind the following when setting this property. The simulator runs multiple tasks concurrently using a thread pool. There are tasks for

  • Collecting and storing raw metrics

  • Performing aggregation

  • Collecting and reporting statistics
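Only one MeasurementAggregator task and one StatsCollector task run at a time, so a reasonable rule of thumb (an inference from the examples in this document, not a documented requirement) is to set threadPoolSize to at least numMeasurementCollectors plus two, as this example does:

```json
{
  "threadPoolSize": 10,
  "numMeasurementCollectors": 8
}
```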

numMeasurementCollectors specifies the number of threads to use for performing measurement collections. Currently the measurement collector tasks use a fixed batch size of 500. The batch size caps the number of schedules for which metrics are collected in a single run of the measurement collector task. Suppose numMeasurementCollectors is set to 2 and that 2,000 schedules are ready for collection. Because two collectors are running, 1,000 schedules can be processed concurrently. If we increase numMeasurementCollectors to 4, then all 2,000 schedules can be processed concurrently.

Using a larger value for numMeasurementCollectors can increase write throughput which will in turn increase the overall load on Cassandra.

ttl specifies the time-to-live for metric data stored in Cassandra. Cassandra allows a TTL, in seconds, to be set on a column; when that time has passed, the column is marked as deleted. The value of ttl is an array where each object corresponds to one of the metric data tables. Two properties are recognized for each nested object - table and value. table specifies the table name, and value specifies the TTL in seconds.
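The TTLs in this example are deliberately tiny so that data expires within a short run. For a longer simulation you might scale them up; the values below (roughly 7, 14, 31, and 365 days) are illustrative assumptions, not simulator defaults:

```json
{
  "ttl": [
    {"table": "raw_metrics", "value": 604800},
    {"table": "one_hour_metrics", "value": 1209600},
    {"table": "six_hour_metrics", "value": 2678400},
    {"table": "twenty_four_hour_metrics", "value": 31536000}
  ]
}
```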

Example 4

$ rhq-ms.sh -s example4.json
example4.json
{
  "simulationTime": 20,
  "collectionInterval": 500,
  "aggregationInterval": 1000,
  "threadPoolSize": 10,
  "numMeasurementCollectors": 8,
  "schedules": [
    {
      "count": 1000,
      "interval": 500
    },
    {
      "count": 1000,
      "interval": 1000
    }
  ],
  "ttl": [
    {
      "table": "raw_metrics",
      "value": 180
    },
    {
      "table": "one_hour_metrics",
      "value": 360
    },
    {
      "table": "six_hour_metrics",
      "value": 540
    },
    {
      "table": "twenty_four_hour_metrics",
      "value": 720
    }
  ],
  "timeSliceDuration": {
    "units": "minutes",
    "values": [
      {
        "table": "raw_metrics",
        "value": 1
      },
      {
        "table": "one_hour_metrics",
        "value": 6
      },
      {
        "table": "six_hour_metrics",
        "value": 24
      }
    ]
  }
}

This example introduces the timeSliceDuration property, which requires a bit of explanation. When the RHQ server performs aggregation over a given metrics table, it does so for the last time slice. The time slices get increasingly large for each table. The raw_metrics table, for example, has a time slice duration, or size, of one hour in production use. This means the aggregation job aggregates all raw metrics inserted during the previous hour. The one_hour_metrics table has a time slice duration of six hours in production use. When the current time slice has elapsed, the aggregation job aggregates all of the one hour metrics in that six hour time slice, storing the computed aggregates in the six_hour_metrics table.

The nested units property specifies how to interpret the value for each table. In this example, values are interpreted as minutes.

values takes an array of objects where each object specifies the duration for a given table. Two properties are recognized for objects stored in that array - table and value. The former specifies the table name while the latter specifies the value for the table's time slice duration.

The time slice duration for the raw_metrics table should not be smaller than aggregationInterval.
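For instance, with the example's aggregationInterval of 1000 ms, a one minute time slice for the raw_metrics table (60,000 ms) comfortably satisfies this constraint:

```json
{
  "aggregationInterval": 1000,
  "timeSliceDuration": {
    "units": "minutes",
    "values": [{"table": "raw_metrics", "value": 1}]
  }
}
```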

Example 5

$ rhq-ms.sh -s example5.json
example5.json
{
  "simulationTime": 20,
  "collectionInterval": 500,
  "aggregationInterval": 1000,
  "threadPoolSize": 10,
  "numMeasurementCollectors": 8,
  "cluster": {
    "embedded": true,
    "clusterDir": "/var/lib/rhq-metrics-simulator",
    "numNodes": 2,
    "heapSize": "256M",
    "heapNewSize": "64M",
    "stackSize": "400k"
  }
}

This example introduces the cluster property, which is used to specify cluster configuration. The simulator will deploy a two-node, embedded cluster in /var/lib/rhq-metrics-simulator/cassandra. heapSize sets the min/max heap size for each Cassandra JVM. heapNewSize sets the maximum size of the new generation of the heap. stackSize specifies the thread stack size; the value is passed directly to the -Xss option of the java executable.

Example 6

$ rhq-ms.sh -s example6.json
example6.json
{
  "simulationTime": 20,
  "collectionInterval": 500,
  "aggregationInterval": 1000,
  "threadPoolSize": 10,
  "numMeasurementCollectors": 8,
  "clientCompression": "none"
}

This example introduces the clientCompression property. Cassandra's binary protocol for CQL supports compression, and the DataStax driver can use it. Compression can also be disabled, as is done in this example. Valid values for the clientCompression property are none and snappy.
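To enable compression instead of disabling it, set the property to snappy, the only compression codec listed as valid:

```json
{
  "clientCompression": "snappy"
}
```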

The simulator currently only supports working with an embedded cluster. There are plans to support using non-embedded clusters that span multiple machines.

Configuration Reference

| Property | Type | Default Value | Description |
|----------|------|---------------|-------------|
| simulationTime | integer | 10 | The duration of the entire simulation, in minutes. |
| collectionInterval | integer | 500 | The frequency of metric collections, in milliseconds. Analogous to how often the agent performs its measurement collections task. This value does not specify the collection interval for a schedule. |
| aggregationInterval | integer | 1000 | The frequency at which to perform aggregation, i.e., compression, in milliseconds. |
| numMeasurementCollectors | integer | 5 | The number of threads to use for performing measurement collections. This directly corresponds to the number of MeasurementCollector tasks that execute concurrently. |
| threadPoolSize | integer | 7 | The total number of threads that the simulator uses. |
| clientCompression | string | none | Specifies whether or not to use compression. Valid values are none and snappy. |
| schedules | object or array | {"count": 2500, "interval": 500} | Details about the metrics to be collected. The value can be an object or an array of objects, where each object specifies details about a set of schedules. |
| ttl | array of objects | | The time-to-live for metric data stored in Cassandra. The array element properties are described in ttl[n].table and ttl[n].value. |
| ttl[n].table | string | | The name of the table for which the TTL is being set. One of raw_metrics, one_hour_metrics, six_hour_metrics, twenty_four_hour_metrics. |
| ttl[n].value | integer | see description | The TTL value in seconds. The defaults per table are: raw_metrics - 180, one_hour_metrics - 360, six_hour_metrics - 540, twenty_four_hour_metrics - 720. |
| timeSliceDuration | object | | The size or duration of a time slice for a metrics table. Nested properties are described below. |
| timeSliceDuration.units | string | minutes | The unit of time to use for the durations. Acceptable values are seconds, minutes, hours, days. |
| timeSliceDuration.values | array | | An array of objects where each object specifies the time slice duration for a metrics table. The array element properties are described in timeSliceDuration.values[n].table and timeSliceDuration.values[n].value. |
| timeSliceDuration.values[n].table | string | | The name of the table for which the time slice duration is being set. One of raw_metrics, one_hour_metrics, six_hour_metrics. |
| timeSliceDuration.values[n].value | integer | see description | The time slice duration. The defaults per table are: raw_metrics - 1, one_hour_metrics - 6, six_hour_metrics - 24. |
| cluster | object | | Specifies cluster configuration. |
| cluster.embedded | boolean | true | Specifies whether or not the cluster is embedded. |
| cluster.clusterDir | string | RHQ_METRICS_SIMULATOR_HOME/cassandra | The directory in which cluster nodes are deployed. |
| cluster.numNodes | integer | 2 | The number of nodes to deploy. |
| cluster.heapSize | string | 256M | The max and min heap size for the Cassandra JVMs. |
| cluster.heapNewSize | string | 64M | The maximum size of the new generation space of the JVM heap. |
| cluster.stackSize | string | 180k | The JVM thread stack size, passed to the -Xss option of the java executable. |

JBoss.org Content Archive (Read Only), exported from JBoss Community Documentation Editor at 2020-03-13 08:09:55 UTC, last content change 2013-09-18 19:41:50 UTC.