Metrics - RHQ

RHQ-Metrics, or Metrics as a Service, or Charting as a Service

The idea is to create a "Metrics subsystem" that can be used by RHQ / JON as well as by other systems such as EAP, Fabric8 and so on.

It is composed of two general parts:

* Storage
* Display

as seen in this drawing:

image::images/author/download/attachments/78906545/RHQ-metrics.png[RHQ-Metrics architecture drawing]

The drawing does not (yet) show the final architecture: some parts may need to be tied more closely to individual Cassandra (C*) cluster instances, while others are better provided by a more central/common service.

== Display

Here we will create AngularJS directives that talk to the backend service (and to other services that use the same API) and graph the data for one or more metrics over a given period of time. If no time range is supplied, a default of the last 8 hours is used.
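The default time range can be illustrated with a small sketch. It is written in Python purely to show the logic; the real directives would be AngularJS code. The function name `resolve_time_range`, its parameters, and the use of epoch milliseconds are assumptions made for this example, not part of the design.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

# Default from the text: the last 8 hours.
DEFAULT_RANGE = timedelta(hours=8)


def resolve_time_range(
    start_ms: Optional[int] = None,
    end_ms: Optional[int] = None,
    now_ms: Optional[int] = None,
) -> Tuple[int, int]:
    """Return (start, end) in epoch milliseconds, filling in missing bounds.

    Hypothetical helper: a missing end defaults to "now", a missing start
    defaults to 8 hours before the end.
    """
    if now_ms is None:
        now_ms = int(datetime.now(timezone.utc).timestamp() * 1000)
    if end_ms is None:
        end_ms = now_ms
    if start_ms is None:
        start_ms = end_ms - int(DEFAULT_RANGE.total_seconds() * 1000)
    if start_ms > end_ms:
        raise ValueError("start must not be after end")
    return start_ms, end_ms
```

Passing `now_ms` explicitly keeps the helper deterministic, which makes it easy to test without touching the system clock.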

== Storage

The storage system itself consists of a few parts:

* Core: the central backbone, marked in green in the drawing above
* Storage backends (mutually exclusive)
- Internal Cassandra cluster: a Cassandra cluster that is managed by the storage system and used only for storage
- External Cassandra cluster: as above, but the storage system (re)uses an already existing Cassandra cluster
- Memory: used for cases where RHQ-Metrics is fully embedded into other projects, to provide rapid prototyping / response
* REST API: allows storing and retrieving data
- An important detail here is the id-translator, which lets users register strings as metric names instead of the numeric ids that are normally used. Users can, for example, send data under the id host1:cpu4:idle, which is then converted to an (internal) id such as 1234
* Aggregation: data compaction that reduces the space needed for long-term storage. Only available with C* backends
* Processing extensions (orange, only with C* backends)
- Baselining: calculates baselines of the stored metrics
- ... more extensions that the user can load into the system
* Communication extensions (bordeaux): allow other systems to upload and retrieve data in their native dialect
* Access control (optional in embedded mode): allows specifying who may access which metrics. Granularity to be defined.
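The id-translator mentioned under the REST API could look roughly like the following. This is a minimal in-memory sketch; the class name `IdTranslator`, its methods, and the sequential id allocation are assumptions made for illustration. A real implementation would presumably persist the mapping in the storage backend.

```python
from typing import Dict


class IdTranslator:
    """Hypothetical sketch: maps string metric names to internal numeric ids."""

    def __init__(self, first_id: int = 1) -> None:
        self._next_id = first_id
        self._by_name: Dict[str, int] = {}
        self._by_id: Dict[int, str] = {}

    def register(self, name: str) -> int:
        """Return the id for name, allocating a new one on first use."""
        existing = self._by_name.get(name)
        if existing is not None:
            return existing
        metric_id = self._next_id
        self._next_id += 1
        self._by_name[name] = metric_id
        self._by_id[metric_id] = name
        return metric_id

    def name_of(self, metric_id: int) -> str:
        """Reverse lookup; raises KeyError for an unknown id."""
        return self._by_id[metric_id]


# Start at 1234 only to mirror the example id used in the text.
translator = IdTranslator(first_id=1234)
cpu_idle = translator.register("host1:cpu4:idle")
```

Registering the same name twice returns the same id, so clients can send the string form on every request without first checking whether it is already known.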

There will be several ways to use the service:

* Embedded inside another process (e.g. WildFlyAS)
- with a single 'embedded' C* instance
- backed by a memory-based structure
* Standalone with its own C* cluster
* Standalone with a remote (= already existing) C* cluster

Internally, each stored data point is a triple: (metric-id, long timestamp, double value).
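To make the triple concrete, here is a minimal sketch of how a memory-based backend might hold such data points and answer time-range queries. The names `DataPoint` and `MemoryBackend` are hypothetical, the timestamp is assumed to be epoch milliseconds, and the integer metric id is only one of the options still under discussion.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class DataPoint(NamedTuple):
    metric_id: int   # internal numeric id (exact type is still an open question)
    timestamp: int   # assumed: epoch milliseconds
    value: float     # a Python float is an IEEE 754 double


class MemoryBackend:
    """Hypothetical in-memory store: one time-sorted series per metric id."""

    def __init__(self) -> None:
        # Per metric: list of (timestamp, value) kept sorted by timestamp.
        self._series: Dict[int, List[Tuple[int, float]]] = defaultdict(list)

    def put(self, point: DataPoint) -> None:
        bisect.insort(self._series[point.metric_id], (point.timestamp, point.value))

    def get(self, metric_id: int, start: int, end: int) -> List[DataPoint]:
        """Return points with start <= timestamp <= end, oldest first."""
        series = self._series.get(metric_id, [])
        lo = bisect.bisect_left(series, (start,))
        hi = bisect.bisect_right(series, (end, float("inf")))
        return [DataPoint(metric_id, ts, v) for ts, v in series[lo:hi]]
```

Keeping each series sorted on insert makes range queries a pair of binary searches, which suits the read pattern of a charting frontend.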

Open questions:

* Is double the right data type?
* How do we want to represent the metric-id internally? int?

=== REST API

By default, the REST API will need endpoints that support the "native encoding" of metrics. The transport encoding is JSON; other encodings such as XML or YAML should be supported as well.

* PUT metrics
* GET metrics as a function of a time range
* GET additional data (baseline data, ...)
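As an illustration of the JSON transport encoding, the sketch below serializes and parses a list of data points. The field names `id`, `timestamp` and `value` are assumptions chosen for this example; the document does not define the wire format.

```python
import json
from typing import List, Tuple

# (metric name, timestamp in epoch milliseconds, value); layout is assumed.
Triple = Tuple[str, int, float]


def encode_points(points: List[Triple]) -> str:
    """Serialize data points as a JSON array (hypothetical field names)."""
    return json.dumps(
        [{"id": name, "timestamp": ts, "value": value} for name, ts, value in points]
    )


def decode_points(body: str) -> List[Triple]:
    """Parse a request body back into triples.

    Raises KeyError or ValueError if an entry is missing a field or has
    a value that cannot be converted.
    """
    result: List[Triple] = []
    for item in json.loads(body):
        result.append((str(item["id"]), int(item["timestamp"]), float(item["value"])))
    return result
```

A PUT handler could call `decode_points` on the request body and pass each triple through the id-translator before storing it; a GET handler could use `encode_points` on the query result.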