Acceptance Criteria

Overview

This page provides a compilation of questions that need to be adequately answered in order to move forward with a database as the next generation metrics storage solution. Answers will be focused around Cassandra.

Acceptance Criteria Questions

What are the minimum requirements for a single box install?

How do we manage and monitor disk usage?

There are several things we may want to monitor including:

The amount of used and free space on the partition or drive where backups are stored
The amount of used and free space on the partition or drive where the Cassandra data directory is stored
- Note that data files are grouped by column family into their own directories making it possible and easier to store individual column family data directories on partitions or drives having hardware that address specific performance demands of those column families.
The amount of used and free space on the partition or drive where the commit log is stored.
The read performance of the disk where data files are stored
The write performance of the disk where data files are stored
The write performance of the disk where the commit log is stored

Let's first consider backups. As backups start filling up available disk space, we need to fire an alert, notifying the user that disk space is running low. We should also offer couses of action to remedy the situation. This could include purging older backups, keeping fewer backups, increasing available disk space, or moving backups (including future backups) to a different partition or drive that has more space. We should be able to automate all of these tasks. While this functionality is needed for our metrics database, it comes down to basic file system operations. This functionality should therefore be provided by the File System type in the Platforms plugin.

When the data directory or when one of the column family data directories starts filling up, we need to fire an alert notifying the user that the partition or drive where the data directory resides is running low on space. As with backups we need to provide some possible solutions. The first and potentially easiest is to run a compaction. This will merge multiple SSTables into one. After the new SSTable is created and after old ones are no longer referenced, they will be deleted. Another option which may or may not be feasible is to increase storage capacity. If LVM is being used, then this could actually be easier and/or faster than doing a compaction. Lastly, we could add nodes to the cluster, thereby dividing up and redistributing the data.

Another, application specific solution we can pursue to deal with a lack of space in the data directory is to reduce the retention period for data that we store. Take raw metric data as an example. Today we store 14 days worth of raw data. If we find that raw data is the culprit, then we can reduce the retention period to 10 days (note that 10 was arbitrarily chosen here for illustration). We then proceed to purge any raw data older than 10 days. All future raw daw data will be stored with a TTL of 10 days. Because we are dealing with distributed deletes, Cassandra does not immediately delete data. In order for the deletes to propagate through the cluster and to be purged as soon as possible, we (may) want to run a repair on each node and then run compaction on each node.

This brings up the topic of data distribution, which is a subject in its own right, but I will discuss it briefly here. If only a single node or nodes (i.e., replicas) are suffering from high disk usage in the data directory, this might signal uneven data distribution. It could be the case that the nodes experiencing high disk usage store keys that have much larger amounts of data associated with them relative to keys on other nodes. Using or changing key strategies may be a better solution in this scenario as opposed to reducing retention periods.

The commitlog_segment_size_in_mb property determines the size of each commit log file segment. Once all of the data in a segment has been flushed to SSTables, that file segment can be deleted. The default size is 32 MB. Scheduling a job to periodically purge old file segments should be sufficient for managing the disk space where the commit log resides.

If we detect poor read performance on the partition or drive where the data files reside, we can increase the row and/or key cache sizes to reduce trips to disk.

TODO

Does changing cache sizes in Cassandra require a restart to take effect?

The key with the read performance will be doing some intelligent analysis to determine which column families are responsible for the performance degradation. If the cache tuning does not alleviate the problems, then we can suggest that the user consider moving the column family directories to faster physical media.

As for slow writes to the partition or drive where Cassandra's data directory lives, we need to determine which column families are receiving the bulk of the writes. We could move those column family directories to faster media; otherwise the next best option would probably be to add nodes to the cluster to better distribute writes and improve overall throughput.

The Cassandra documentation strongly recommends putting the commit log on a partition or drive separate from the data directory. The commit log is only read at start up; so, read performance is not an issue. Cassandra only performs sequential IO with the commit log. There is no random IO. There are several configuration parameters that affect things like how frequently the commit log is synced to disk and when writes should be acked. Testing will need to be done to determine if and how tuning these types of settings can alleviate write performance problems with the commit log.

Are there any OS specific requirements?

I am not aware of any OS specific requirements other than Java being supported on the platform.

Are there any Java/JRE specific requirements?

The community docs say Cassandra should be run on the latest, stable release of Java 6. There are several JIRA issues related to Java 7:

How many concurrent writes should we be able to handle with 1 node, 2 nodes, 3 nodes, etc.?

How do we deploy the database during installation?

Let's start with the simple case involving a single node. During the RHQ installation process, the user will have to provide several pieces of information so that we can install and configure the node. For basic configuration settings that require values this includes:

The host name of the machine on which Cassandra will be installed
A writable directory where Cassandra will store the commit log and data files
User credentials (more on this in the security section)
Location for backups (more on this in the backup section)

For advanced settings for which we will supply default values, this includes:

The various ports including rcp_port which is the port Cassandra listens on for Thrift clients
key and row cache settings like the cache sizes and retention periods for cache entries
Settings for the commit log which control when writes are acked and how frequently it is flushed to disk
The JMX port which is the port on which Cassandra listens for JMX connections

Nearly all of the user-supplied configuration values will go into cassandra.yaml. The JMX port will be written into cassandra-env.sh. We will need to generate these files with the values supplied by the user. The location of backups is unique in that it pertains to how RHQ will manage the Cassandra node. That information will need to be stored in the RHQ database.

Users are not required to install and run an agent on the same machine on which the RHQ server runs. The user will however have to install an agent onto the machine on which the Cassandra node is installed so that we can fully manage it. This brings up the order of installation with respect to the Cassandra node, the RHQ server, and the RHQ agent. The key thing is that when the RHQ installer finishes, the Cassandra node should be running and fully configured for management. By that I mean that the Cassandra node should already be imported into the server's inventory, maintenance operations should already be scheduled, alert definitions should already be created, event logging configured, and so on.

If deploying multiple nodes, there are two additional properties that must be set - seeds and initial_token. seeds is a list of host names or IP addresses where nodes will be running. This list is used at start up so that a node can learn the topology of the cluster. initial_token specifies the range of the token ring for which the node is responsible. There is a simple function to use for calculating tokens. It is possible to run multiple nodes on a single machine for production use, but each node has to have its own IP address.

What does the set up look like for dev environments?

With PostgreSQL or Oracle, we provide instructions for installing the database, and then you can install and manage the schema via dbsetup and dbupgrade. With Cassandra we will typically want to run multiple nodes, even if they are running on the same machine. We need to automate tasks like creating a cluster, adding nodes to the cluster, removing nodes from the cluster, and starting/stopping nodes. ccm is a handy tool/library that automates these and other tasks. One thing we will want do is put Cassandra start up scripts and configuration files in git. ccm downloads and builds Cassandra from source. It then uses that source distribution as the basis for creating clusters and nodes. I do not think we can have it instead for example pull cassandra.yaml from the RHQ git repo. For the long term we need a solution that will enable us to create or alter a development cluster that uses the start up scripts and configuration files we store in git.

We could easily enough add something to the build to generate an N node development cluster that runs locally. The build already generates our dev container. I am not sure if this is the way to go though because we need to support the ability for other people, who might not be actively developing and building RHQ, to quickly and easily set up a cluster as well. QE engineers for example do not build from source when they do their testing. They may instead work with a build from Jenkins. We also need to consider support engineers who to set up environments for their work. We might be able to put some sort of wrapper around whatever solution that we come up with so that it can easily be incorporated into the build.

For the relational database, the build generates test and dev schemas. test is for running unit tests, and dev is for use with the dev container. We will do the same with Cassandra.

How do we perform database upgrades?

How do we perform backups?

Cassandra performs backups via the snapshot operation. You can take a snapshot of all keyspaces, a single keyspace, or a specified set of columns families. Creating a snapshot does not actually copy files. Instead hard links to the SSTables are created. By default snapshots are stored in <cassandra_data_dir>/<keyspace_name>/<column_family_name>/snapshots. Let's say we use default settings, we have the keyspace named rhq, and a column family named raw_metric_data. Then we create a snapshot named backup1. It will be located at /var/lib/cassandra/data/rhq/raw_metric_data/snapshots/backup1. After the snapshot is generated, the contents of the snapshot directory can be copied and archived. Then after the snapshot files have been copied to the backup destination, the original snapshot directory should be deleted because the snapshot files will prevent obsolete data files from being deleted. The first thing that happens when a snapshot is taken is that the MemTable is flushed to disk.

TODO

Can taking a snapshot trigger a compaction?

Once a snapshot has been generated, incremental backups can be performed. They are disabled by default but can be enabled by setting the incremental_backups property in cassandra.yaml to true. When a MemTable has in-memory mutations and is flushed to disk, hard links are created for the newly created SSTable on disk. Files from incremental backups are stored in <cassandra_data_dir>/<keyspace_name>/<column_family_name>/backups. When incremental backups are enabled, both the snapshot directory and the backups directory should be copied to the backup destination. And then both directories should be deleted.

We want the process of performing and managing backups to be as transparent as the user wants. For a given Cassandra node, the user will have to set up how backups should be managed. He will have to answer some questions including:

Where backups should be stored?
How frequently should backups be run?
How many backups should kept?
Should incremental backups be used?
When should backups be done?

RHQ will need to monitor disk usage as well. If backup files and/or data files exhaust available disk space, problems will ensue. Critical maintenance operations like backups and compaction will likely fail as well as writes to the commit log or SSTables being flushed. See How do we manage and monitor disk usage for more details.

How is security handled?

Cassandra provides a pluggable security model. You have to implement two classes, IAuthenticator and IAuthority. The former performs authentication while the latter performs authorization. Cassandra ships with no security out of the box. In the examples subdirectory of the Cassandra source tree, there is SimpleAuthenticator and SimpleAuthority which provide properties files-based authentication and authorization. Those classes are not included in the packaged, binary distribution. We will need to build and package those ourselves as well any other implementations we use.

SimpleAuthenticator looks for usernames and passwords in a file called passwd.properties. SimpleAuthority looks a properties file called access.properties. Permissions can be specified for an entire keyspace or for individual column families within a keyspace. Here is an example of what this file might look like:

# This is a sample access file for SimpleAuthority. The format of this file
# is KEYSPACE[.COLUMNFAMILY].PERMISSION=USERS, where:
#
# * KEYSPACE is the keyspace name.
# * COLUMNFAMILY is the column family name.
# * PERMISSION is one of <ro> or <rw> for read-only or read-write respectively.
# * USERS is a comma delimited list of users from passwd.properties.

# Grant read/write access to everything in the rhq keyspace
rhq.<rw>=rhqadmin

# Grant read-only access to some_column_family
rhq.some_column_family=<ro>

This may be sufficient for our security needs. One drawback though is that we would be storing security credentials in the RHQ database as well as in these properties files. These properties files would have to be copied onto each machine where a node will run. If this is not adequate then we may want to consider an integrated solution where RHQ security is used. Credentials would be accessed via RHQ's remote APIs.

How do we handle RHQ server and database starts/stops?

This question involves whether or not database start up/shutdown should be part of the RHQ server start up/shutdown. In other words, when a user starts his RHQ server should that also take of starting up the database. And should stopping the RHQ server take care of stopping the database. This could get complicated with multiple Cassandra nodes running on different machines. I do not think we want to couple the start up/shutdown operations in this way not only because of the added complexity but also because there are good reasons why a user may want to leave Cassandra nodes running even if the RHQ server is shut down. The Cassandra nodes can continue performing maintenance operations while the RHQ server is down. In fact, that would be the perfect time to do something like a full compaction which is both CPU and IO intensive.

We should however provide scripts to start/stop a single node as well as multiple nodes. We need to make it easy as running a single script, once from one machine, to start up or shut down all or some specified set nodes in the cluster. There are different parallel, remote shell utilities we can look to use like pdsh or dsh.

JBoss Community Archive (Read Only)

RHQ

Acceptance Criteria

Overview