There are a couple of issues that you have to take into consideration when running RHQ in EC2. The first involves agent/server communication. The second involves resource keys. This document describes these issues, the challenges they present, and solutions to better adapt RHQ to EC2 as well as other cloud environments.
When an agent registers with the server, the agent sends its IP address to the server so that the server knows how to contact the agent. The server sends to the agent a failover list of server host names. When a machine is restarted in EC2 it acquires a new IP address and new host name. If an agent machine is restarted, the server no longer has a valid endpoint address for the agent. Likewise, if the server machine is restarted, agents will no longer be able to reach the server.
Agent/server communication is bi-directional; consequently, there are two configuration properties that need special attention - the agent endpoint address and the server endpoint address.
When an agent registers with the server, it sends to the server its endpoint address. The server sends requests to the agent via this address. The value of the agent endpoint address comes from the rhq.communications.connector.bind-address agent configuration property. The agent needs to be configured to bind to whatever the current address of the machine is. This is done simply by not setting rhq.communications.connector.bind-address. During the interactive setup this property is set when you are prompted with the following,
Leaving it blank is all that needs to be done. When that preference is set, the agent will always try to use the value that you specify. When no value is specified, the agent uses whatever the current IP address happens to be.
If you automate agent startup, as is the case when the agent runs as a service, the agent is started and stopped with the <RHQ_AGENT>/bin/rhq-agent-wrapper.sh script. This script derives values for agent configuration properties, including those values that you set interactively, from <RHQ_AGENT>/conf/agent-configuration.xml. By default rhq.communications.connector.bind-address is commented out, which produces the desired behavior of the agent binding to the current address of the machine.
The agent registers with the server using the server endpoint address. This address comes from the rhq.agent.server.bind-address agent configuration property. During interactive setup this property is set when you are prompted with the following,
If you automate agent startup, as is the case when the agent runs as a service, you could modify the value of rhq.agent.server.bind-address directly in <RHQ_AGENT>/conf/agent-configuration.xml. Alternatively, and possibly more easily, you can set the value through the RHQ_AGENT_CMDLINE_OPTS environment variable as follows,
You can set RHQ_AGENT_CMDLINE_OPTS in <RHQ_AGENT>/bin/rhq-agent-env.sh or from a bash shell for example as follows,
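A minimal sketch of setting the variable from a bash shell is shown here. It assumes the agent accepts a -D<name>=<value> command line option to override a configuration preference; the host name rhq-server.example.com and the RHQ_SERVER_ADDRESS variable are hypothetical placeholders rather than values from this document.

```shell
# Hypothetical server address; substitute the address your agents should use.
RHQ_SERVER_ADDRESS="rhq-server.example.com"

# Assumption: the agent honors -D<name>=<value> on its command line as an
# override for the named configuration preference.
RHQ_AGENT_CMDLINE_OPTS="-Drhq.agent.server.bind-address=${RHQ_SERVER_ADDRESS}"
export RHQ_AGENT_CMDLINE_OPTS
```

The same two assignments could be placed in <RHQ_AGENT>/bin/rhq-agent-env.sh instead. Setting this variable may replace options that the wrapper script would otherwise pass by default, so it is worth checking rhq-agent-env.sh for its defaults before overriding them.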
So far we have looked at some of the different ways you can configure the server endpoint address, but we have not discussed how to handle a server restart that invalidates the current server endpoint address. One approach might be to use an elastic IP address which is discussed next.
Some people may choose to use elastic IP addresses to deal with the agent and server endpoint addresses; however, this is not a robust solution for a couple of reasons. First, Amazon restricts each user account to a very limited number of elastic IP addresses; so, unless you plan to have a very small deployment with only a handful of machines, you will wind up with machines that do not have elastic IP addresses. Second, a user can assign or unassign an elastic IP address to a machine at any time. Whether intentionally or accidentally, a user can too easily change the IP address, resulting in lost communication between agent and server.
If you do choose to use elastic IP addresses, first allocate any elastic IP addresses that you have to your RHQ server machines. This way, agents have a known address for the server endpoint that does not change across server restarts. To make sure your server uses the elastic IP address, follow one of the approaches from the preceding section to configure the server endpoint address as follows,
In the next section, we look at some server configuration changes that can be made to help agents maintain valid server endpoint addresses without using elastic IP addresses.
When an agent registers with a server, it downloads a failover list that contains server endpoint addresses. If the agent is unable to connect to the first server in the list, it will continue trying subsequent addresses in the list until it is able to connect. When the server restarts in EC2 (that is, when the machine on which the server is running is restarted), its endpoint address becomes invalid. If you are running multiple servers, the agent should be able to connect to another server, provided that server still has a valid endpoint address. But if you are running a single server, then your agent will not be able to reconnect without manual intervention. There are a couple of things that you can do to reduce the chance of an agent not being able to connect to a server.
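The failover behavior described above can be illustrated with a small shell sketch. This is not the agent's implementation; it only mirrors the idea of walking a list in order until one address answers. The probe function, the REACHABLE variable, and the example host names are all hypothetical, and a real check would attempt an actual connection.

```shell
# Hypothetical stand-in for a real connectivity check. It treats an address
# as reachable only if it appears in the space-separated REACHABLE list.
probe() {
    case " ${REACHABLE} " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the first address in the argument list that the probe accepts.
# Returns non-zero if none of the addresses is reachable.
first_reachable() {
    for addr in "$@"; do
        if probe "$addr"; then
            printf '%s\n' "$addr"
            return 0
        fi
    done
    return 1
}

# Example: the first server's address has gone stale after a restart.
REACHABLE="server2.example.com server3.example.com"
first_reachable server1.example.com server2.example.com server3.example.com
# prints server2.example.com
```

With a single-entry list whose only address is stale, first_reachable returns non-zero, which corresponds to the single-server case where the agent cannot reconnect on its own.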
The server endpoint is stored in the RHQ database. If the server endpoint becomes invalid as a result of a machine restart, you will have to update the endpoint address. This can be automated by setting a property in rhq-server.properties.
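As a rough sketch, the property could be set with a small helper that rewrites a single key in a properties file. The helper name set_property and the RHQ_SERVER_PROPERTIES variable are hypothetical, and the value true is an assumption: this document names the rhq.sync.endpoint-address property but does not state which value enables the behavior.

```shell
# Set key=value in a Java-style properties file, replacing any existing
# assignment for the same key. Assumes entries are written as key=value
# with no surrounding whitespace.
set_property() {
    file="$1"; key="$2"; value="$3"
    tmp="$(mktemp)"
    # Keep every line that does not start with the literal text "key=".
    awk -v prefix="${key}=" 'index($0, prefix) != 1' "$file" > "$tmp"
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    # Write back through the original file so its permissions are preserved.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Usage (RHQ_SERVER_PROPERTIES is a hypothetical variable holding the path
# to your rhq-server.properties; the value "true" is an assumption):
# set_property "$RHQ_SERVER_PROPERTIES" rhq.sync.endpoint-address true
```

Editing the file by hand achieves the same result; the helper is only useful if you script the setup of server machines.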
Setting rhq.sync.endpoint-address ensures that the database has valid endpoint addresses for each server, but agent failover lists need to be updated as well. The cloud server plugin will do this for you. The plugin runs as a scheduled job that monitors RHQ servers for endpoint address changes. When a change is detected, the plugin notifies each of the agents connected to that server to switch over to the new endpoint address. When an agent connects on the new endpoint address, it downloads an up-to-date failover list. Note that the agent does not have to be connected to the server for this operation to succeed, because the agent listens for requests from the server on its own endpoint address.
Note: To use the cloud server plugin, you must ensure that all of your agents have been committed into inventory. Any agents not in inventory will not get their failover lists updated.
Note: The cloud server plugin is not currently packaged in the RHQ distribution. To use it, you will need to build it from source. The plugin may be released in a future RHQ version.