Design-AgentUniquePrefsLocation

Overview of Problem

This is tracked via BZ 768706

Currently when an Agent starts up, unless a preferences node name is specified via the -p/--pref option, the Agent will read its preferences from the node named "default".

If someone wants to preconfigure their agent and provide a fully configured agent-configuration.xml, they have the option of putting their own preference node name directly in the .xml file:
<node name="your-custom-node-name">

And unless an alternate user preferences location is specified via -Djava.util.prefs.userRoot, the preferences are read from $HOME/.java/.userPrefs/. This means that if a user wants to do one of the following:

1) run multiple Agents on a single machine
2) run Agents as the same user on multiple machines that share NIS users and NFS home dirs

then they need to remember to specify a unique prefs node name and/or a unique user prefs location for each Agent; otherwise the Agents will clobber each other's preferences and chaos will ensue. The problem is that most customers do not realize this until it's too late.
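To make the clash concrete, here is a minimal Java sketch of how a node name maps to a location in the user preferences store. The class and method names are hypothetical, and the "rhq-agent" parent node is taken from the pseudocode later on this page, so treat the exact path as an assumption rather than the Agent's actual code.

```java
import java.util.prefs.Preferences;

final class PrefsNodeLocation {
    // Parent node name as used in this page's pseudocode; an assumption, not verified against the Agent source.
    static final String AGENT_ROOT = "rhq-agent";

    /** Returns the node path an Agent would use; falls back to "default" when no name is given. */
    static String nodePath(String prefsNodeName) {
        boolean missing = prefsNodeName == null || prefsNodeName.trim().isEmpty();
        return AGENT_ROOT + "/" + (missing ? "default" : prefsNodeName.trim());
    }

    /**
     * Opens (creating it if needed) the node under the current user's preferences root.
     * On Unix that root lives under $HOME/.java/.userPrefs unless
     * -Djava.util.prefs.userRoot points somewhere else.
     */
    static Preferences open(String prefsNodeName) {
        return Preferences.userRoot().node(nodePath(prefsNodeName));
    }
}
```

Two Agents started without -p by the same user both resolve to rhq-agent/default, which is exactly the collision described above.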

Proposed Solution

Agent Setup

At initial setup time, if the -p/--pref option is specified on the command line, use that as the prefs node name. Otherwise, once the user has specified the agent name (or it has been read from agent-configuration.xml), use that as the prefs node name.

If a prefs node with the determined name already exists in the user preferences store, warn the user, e.g. "An RHQ Agent preferences node named 'foo' already exists in your user preferences - are you sure you want to overwrite it with a new set of preferences? (y|n)"
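The existence check and the prompt could be sketched roughly as follows. This is only an illustration: the class and its methods are hypothetical, and the only library call relied on is Preferences.nodeExists(String).

```java
import java.util.prefs.BackingStoreException;
import java.util.prefs.Preferences;

final class OverwriteCheck {
    /** True if a child node with the given name already exists under the Agent's parent node. */
    static boolean nodeAlreadyExists(Preferences agentRoot, String nodeName) throws BackingStoreException {
        return agentRoot.nodeExists(nodeName);
    }

    /** Builds the warning text proposed on this page. */
    static String overwriteWarning(String nodeName) {
        return "An RHQ Agent preferences node named '" + nodeName + "' already exists in your user preferences"
            + " - are you sure you want to overwrite it with a new set of preferences? (y|n)";
    }

    /** Interprets the console answer; anything other than y/yes counts as "no". */
    static boolean isYes(String answer) {
        if (answer == null) {
            return false;
        }
        String normalized = answer.trim().toLowerCase();
        return normalized.equals("y") || normalized.equals("yes");
    }
}
```

Treating every unrecognized answer as "no" is a deliberate choice in this sketch, so that a stray keypress never overwrites an existing node.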

Write the prefs node name out to one of the following locations:

a) $RHQ_AGENT_HOME/bin/rhq-agent-env.sh:

RHQ_AGENT_CMDLINE_OPTS="${RHQ_AGENT_CMDLINE_OPTS} --pref=foo"

mazz: Should we consider a new env var? like RHQ_AGENT_PREF=foo

mazz: Remember that we have two -env scripts - one for windows and one for unix. Anything we support for one we have to make sure we can support for both. I have code that lets us add new lines in .bat/.sh code in the agent plugin. That's one reason why I like making this a separate env var - because it means in the agent resource's Environment Script child service resource, we'll see this new setting in the list-o-maps for that resource's configuration.

b) $RHQ_AGENT_HOME/conf/.prefs-node-name

foo

mazz: for some reason that I can't quite articulate, I think I like this option better than a). We do have to make sure we copy this over in the agent update ant script (we might already copy over files that we find in the conf/ directory, so there might not be much, if anything, to do here). But we do have to make sure agent upgrade will work. Shouldn't be hard to fix this if needed.

This way, when it's restarted, the Agent will know what its prefs node name is, without the user having to remember to pass the --pref option every time.

Note, we would not store the prefs node name anywhere under $RHQ_AGENT_HOME/data/, because if the user decides to purge their data dir, we want the Agent to still remember where its preferences are stored.

Make .prefs-node-name a hidden file and set its perms to 400 to make it hard for the user to accidentally delete or modify it.

mazz: any perm requirements also must make it into agent update script (i.e. the auto-update stuff has to maintain the proper perms once it copies, backs up, restores, or lays down this file)
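A minimal sketch of option b), assuming Java 7 NIO is available to the Agent. The class name is hypothetical. Permissions are only adjusted on file systems that support POSIX attributes, so on Windows this sketch just writes the file and the "hidden" and read-only aspects would need separate handling.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

final class PrefsNodeNameFile {
    static final String FILE_NAME = ".prefs-node-name";

    private static boolean posix(Path path) {
        return path.getFileSystem().supportedFileAttributeViews().contains("posix");
    }

    /** Writes the node name to conf/.prefs-node-name and leaves the file with mode 400 where supported. */
    static Path write(Path confDir, String nodeName) throws IOException {
        Path file = confDir.resolve(FILE_NAME);
        if (Files.exists(file) && posix(file)) {
            // An existing file is read-only, so make it writable before replacing its content.
            Files.setPosixFilePermissions(file, PosixFilePermissions.fromString("rw-------"));
        }
        Files.write(file, nodeName.getBytes(StandardCharsets.UTF_8));
        if (posix(file)) {
            Files.setPosixFilePermissions(file, PosixFilePermissions.fromString("r--------"));
        }
        return file;
    }

    /** Returns the stored node name, or null if the file is missing or empty. */
    static String read(Path confDir) throws IOException {
        Path file = confDir.resolve(FILE_NAME);
        if (!Files.isRegularFile(file)) {
            return null;
        }
        String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
        return content.isEmpty() ? null : content;
    }
}
```

The agent update script would need the same "make writable, replace, restore 400" sequence whenever it copies or restores this file.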

Advantages of a) are that we're reusing the existing mechanism for specifying the prefs node name and that we already copy over rhq-agent-env.sh during agent upgrades. A disadvantage of a) is that rhq-agent-env.sh would be trickier to update, since it's a shell script that the user may have edited, particularly in the case of an Agent being upgraded.

mazz: We have code in the agent plugin that does this. This is how we can have a list-o-map <resource-configuration> for the Environment Script resource

Subsequent Agent Starts

On subsequent starts of the Agent, if -p/--pref is specified, use that as the prefs node name. If we go with option b) above, then -p would trump .prefs-node-name, but the Agent would ask the user to confirm, e.g. "Are you sure you wish to override the prefs node name 'foo' that this Agent was previously using? (y|n)".

mazz: must make sure we don't prohibit or make it harder to support automated installs. Asking questions outside of the normal setup questions means an admin must be present (either remotely or on site at the keyboard) to respond to the prompts. I think spitting out a warning both in the log and on the console would provide information without forcing the install to require a person to enter "y" or "n" to get the install to continue. Can we look to see if we are in daemon mode (-d,--daemon)? If we are, don't ask the question. If we are not, we can ask the question.

If -p is not specified and we went with option b) above, then read the prefs node name from .prefs-node-name; if .prefs-node-name does not exist, assume this is an initial setup (however, if we see files in data/, plugins/, and/or logs/, we could print a warning, e.g. "This is not a brand new Agent install. Are you sure you want to proceed with initial setup, rather than specifying a user preferences node name where the Agent's configuration can be found?" We could even check what prefs nodes exist, and inform the user "There are 2 existing RHQ Agent preference nodes named 'foo' and 'bar'".)
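Putting the rules for subsequent starts together (including the daemon-mode suggestion), the decision could be sketched like this. All names here are hypothetical, and the choice to warn rather than prompt in daemon mode follows mazz's comment rather than a settled design.

```java
final class PrefsNodeNameResolver {
    enum Action { USE, WARN_AND_USE, CONFIRM_THEN_USE, INITIAL_SETUP }

    static final class Decision {
        final Action action;
        final String nodeName; // null when action is INITIAL_SETUP

        Decision(Action action, String nodeName) {
            this.action = action;
            this.nodeName = nodeName;
        }
    }

    /**
     * @param prefOption value of -p/--pref, or null if not given
     * @param storedName content of conf/.prefs-node-name, or null if the file does not exist
     * @param daemonMode true if the Agent was started with -d/--daemon
     */
    static Decision resolve(String prefOption, String storedName, boolean daemonMode) {
        if (prefOption != null) {
            if (storedName == null || storedName.equals(prefOption)) {
                return new Decision(Action.USE, prefOption);
            }
            // -p trumps the stored name: warn when nobody can answer, otherwise ask for confirmation.
            Action action = daemonMode ? Action.WARN_AND_USE : Action.CONFIRM_THEN_USE;
            return new Decision(action, prefOption);
        }
        if (storedName != null) {
            return new Decision(Action.USE, storedName);
        }
        // No -p and no stored name: treat this as an initial setup.
        return new Decision(Action.INITIAL_SETUP, null);
    }
}
```

The extra warning for a non-empty data/, plugins/ or logs/ directory would hang off the INITIAL_SETUP branch and is left out of the sketch.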

Backward Compatibility Issues

Currently, RHQ Agents assume the preferences node name is "default" unless -p is specified. When we introduce this new feature, we need to support the case when the new upgraded agent runs but there is no prefs-node-name file in the conf/ directory. Here's how we propose to do this:

IF agent does not have prefs-node-name THEN
   IF agent specified -p the-node-name THEN
      write the-node-name to $RHQ_AGENT_HOME/conf/prefs-node-name (where agent-configuration.xml is)
   ELSE
      // no prefs-node-name and no -p
      get the 'rhq-agent' root user pref node
      see if there are any nodes under the 'rhq-agent' node
      IF there is only one node (either 'default' or whatever) THEN
          write the node name to prefs-node-name file
      ELSE IF there is more than one node THEN
          pick the first one and write it to prefs-node-name
          // TODO is this the right thing? perhaps we should ask, if in daemon mode, abort startup
      ELSE IF there are no nodes THEN
          ask the user for one?
          // TODO what if in daemon mode?
      END IF
   END IF
END IF
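The same logic as a small Java sketch, kept free of I/O so the open TODOs stay visible. The class name is hypothetical. In a real Agent the list of existing nodes would presumably come from Preferences.userRoot().node("rhq-agent").childrenNames(); as far as I know that call makes no promise about ordering, so "the first one" is not well defined without sorting.

```java
import java.util.List;

final class LegacyPrefsNodeMigration {
    /**
     * Decides which node name to write to conf/prefs-node-name.
     *
     * @param storedName    current content of prefs-node-name, or null if the file is missing
     * @param prefOption    value of -p, or null if not given
     * @param existingNodes names of the child nodes found under the 'rhq-agent' node
     * @return the node name to use, or null if the user has to be asked
     */
    static String chooseNodeName(String storedName, String prefOption, List<String> existingNodes) {
        if (storedName != null) {
            return storedName; // file already exists, nothing to migrate
        }
        if (prefOption != null) {
            return prefOption;
        }
        if (existingNodes.isEmpty()) {
            return null; // open question on this page: what to do in daemon mode
        }
        // Exactly one node: use it. More than one: pick the first (marked TODO on this page).
        return existingNodes.get(0);
    }
}
```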

New Requirement

The difficulty is that the AgentMain constructor processes the command line arguments and loads the configuration, while the main() method launches the setup questions when required - this means the agent name needs to be known in the constructor but it potentially isn't known until the setup questions are answered in main() (there are other ways the setup questions are asked, all outside the constructor). This catch-22 is a problem.

To get around this, I think we may have to make a change here to the agent startup. We need a way for the agent name to be defined during construction time - either as a command line argument, an environment variable or some other mechanism. We need to make the agent name NOT a question a user can set during setup questions - we have to know the agent name first thing.

Some potential solutions:

RHQ_AGENT_NAME environment variable
-a / --agentname command line argument
If -p / --pref is specified, assume it is not only the preference node name but ALSO the agent name
Have a pre-setup question - asking for the agent name
If the user doesn't explicitly say what the agent name is, we use the same default that we use today (that is the canonical hostname)

We can then do the rest of what this page recommends we do - that is, we write out the agent name and use it as the preference node name.
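One way the candidate sources above could be combined is sketched below. The precedence order (command line argument, then environment variable, then --pref, then the canonical hostname) is purely an assumption made for illustration, and the class is hypothetical; neither --agentname nor RHQ_AGENT_NAME exists today.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class AgentNameResolver {
    /** Returns the first non-blank candidate, or the fallback if none is set. */
    static String resolve(String agentNameArg, String envValue, String prefOption, String fallback) {
        for (String candidate : new String[] { agentNameArg, envValue, prefOption }) {
            if (candidate != null && !candidate.trim().isEmpty()) {
                return candidate.trim();
            }
        }
        return fallback;
    }

    /** Today's default agent name according to this page: the canonical hostname. */
    static String canonicalHostName() throws UnknownHostException {
        return InetAddress.getLocalHost().getCanonicalHostName();
    }
}
```

A caller in the AgentMain constructor might pass System.getenv("RHQ_AGENT_NAME") as the second argument and canonicalHostName() as the fallback, which would make the name available before any setup question is asked.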

Another alternative is to NOT use the agent name as the unique preference node name. Instead, we can use some other unique (perhaps random?) node name. We could use the same default that we use for the agent name (that is, the canonical hostname), but it wouldn't be the agent name itself. For example, I could start the agent on machine X and set my agent name to "foo" - the preference node would be X even though the agent is named "foo". I could make the agent name X as well, but this would be independent of the preference node name (even though they happen to be identical).

Problems and Miscellaneous Thoughts

Here are all the problems I can think of when trying to come up with a solution.

1) Use agent name as the preference node name

The preferences are where the agent name is stored. Catch-22 - can't get the preferences until you have the agent name; can't get the agent name until you have the preference node name

1a) Just use the local host's DNS hostname as the agent name - you can use that as the preference name

This goes against the whole reason why we don't use the DNS hostname for the agent name in the first place: an agent's DNS hostname may change during the lifetime of the agent, and we don't necessarily want the agent to be considered a different agent when that happens (IT departments rename machines for any number of reasons - reallocating names in namespaces, renaming a sub-domain, etc). Well, if we make the hostname the preference node name, the same thing happens. It means the agent will lose its configuration by the simple fact of its DNS name changing. Thus, artificially linking the name of the agent or the preference node to the DNS hostname could bring in just as many problems as it solves. I suppose we can use this in conjunction with conf/pref-node-name (that is, use the hostname as the pref node name and write that in the conf/ file). Then the only time the config will be lost (in the event of a hostname change) is if you do a clean install and expect the config to survive. But of course, if hostnames were just shuffled around (meaning, your hostname was reassigned to another agent box), then you are really screwed because you'll be getting that other agent's config and not realize you just swapped config with another agent. And this would be really hard to figure out if it happens to you.

2) Store preference node name in conf/pref-node-name

How do you know what the preference node name is going to be the first time we start, when we don't have this file yet? We can force the user to pass in --pref; but we ask users to do that today and that's not changing anything. If we say, "just use the agent name as the pref name" or "use the hostname", see problem #1 and #1a above.

3) Ask the user up front what the agent name is, then continue with startup using the agent name as the preference node.

This breaks the ability to have the agent fully pre-configured. It means you have to ALWAYS run the agent at least one time in a console window to answer the question "what is the agent name".

4) Allow for a cmdline arg --agentname to set the agent name - to avoid asking the user at the console for it, thus solving #3 problem

This means that all agents cannot be started with the same command line. You have to configure each machine to run the agent with a specific cmdline option. The same is true if you don't use a cmdline option but put it in the -env.sh script. This means ALL of your agents have to be individually configured to run on their machine. This breaks the ability to have a standard startup workflow of "unzip agent, run rhq-agent.sh". To be honest, I don't see how this is much different from requesting them to use --pref. The one difference is that --pref doesn't have to be the unique agent name - it need only be unique to that machine (or more specifically, unique to the user running the agent, since $HOME/.java is where the prefs are stored and the node names have to be unique under there). So the difference is that --pref can be pre-canned in the startup routine by simply using some rule like "agent1" ... "agentN", where N is the number of agents sharing the same $HOME/.java location, and those names are not related to the actual agent names.
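The "agent1" ... "agentN" rule could be as simple as picking the first unused name. The sketch below is hypothetical and takes the set of existing node names as input; it does nothing about two agents picking a name at the same moment.

```java
import java.util.Set;

final class PrefsNodeNamer {
    /** Returns the first name of the form agent1, agent2, ... that is not already taken. */
    static String nextFreeNodeName(Set<String> existingNodeNames) {
        for (int i = 1; ; i++) {
            String candidate = "agent" + i;
            if (!existingNodeNames.contains(candidate)) {
                return candidate;
            }
        }
    }
}
```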

5) Backward compatibility.

Shut down the old agent. Unzip the new agent (or auto-upgrade). Start the new agent. The agent has configuration that we want to maintain/keep. We can say, look at the "default" preference node and if it's there, just copy it (put it in the preference node named the same as the agent name as found in the preference node "default"). But suppose I want to now run two agents (now that we have this great new feature we are supposed to be implementing). Both will see the same "default" node name - so both will attempt to re-use it - both will attempt to reuse the same agent name. It's probably best for that first agent to use the "default" node and then immediately delete it when it is finished with it (first one wins). But make sure you don't run the agents simultaneously :) as they might both see the default node at the same time.
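The "copy, then delete, first one wins" idea might look roughly like this with the java.util.prefs API. The class is hypothetical, only the node's direct keys are copied (child nodes are ignored), and nothing here protects against two agents running it at the same time.

```java
import java.util.prefs.BackingStoreException;
import java.util.prefs.Preferences;

final class LegacyNodeMigration {
    /**
     * Copies the legacy node's keys into the new node and removes the legacy node.
     *
     * @return true if a legacy node was found (and migrated), false if there was nothing to migrate
     */
    static boolean migrate(Preferences agentRoot, String legacyName, String newName) throws BackingStoreException {
        if (!agentRoot.nodeExists(legacyName)) {
            return false; // another agent got there first, or this is a clean machine
        }
        Preferences legacy = agentRoot.node(legacyName);
        if (!legacyName.equals(newName)) {
            Preferences target = agentRoot.node(newName);
            for (String key : legacy.keys()) {
                target.put(key, legacy.get(key, ""));
            }
            legacy.removeNode(); // first one wins: later agents no longer see the legacy node
        }
        agentRoot.flush();
        return true;
    }
}
```

A call such as migrate(Preferences.userRoot().node("rhq-agent"), "default", agentName) would cover the upgrade case described above, assuming the agent name has already been read from the "default" node.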

But now do a clean install of the agent (removing the conf/pref-node-name file). Now what? The default node is gone (because we said we should delete it after loading the old legacy config). We would have to scan the rhq-agent preference node parent, and if there is one node, assume that's the one (is that a good assumption?). If there are two or more nodes, we're screwed - we don't know which one to use. We would have to ask - thus causing more questions to be prompted on the user and thus not allowing a remote clean install to happen without requiring manual intervention. It would mean the user would have to specify --pref on the command line - but we can do that today without any changes!

What If We Just Abort If We Detect a Problem?

What if we just not start the agent if we see a clash of preferences and force the user to pass in --pref to correct the clash? For example...

Start clean. Install agent #1. It sees it has no preferences so it creates "default" node AND it writes "default" to conf/pref-node. Install agent #2. It sees there is already a default node BUT it does not have conf/pref-node with the value "default" in it. This means either a) an agent already took the default preference node OR b) this is a clean agent install. In the case of a) we need to abort the startup because we don't want this agent to steal that pref node - the user should do something (like pass in --pref). In the case of b), if we are in daemon mode, we need to abort but if we are in console mode, we can ask if we should reuse the preference node. To support daemon mode not aborting, perhaps we can support a cmdline option like -P to mean "reuse the preference node if it already exists". This just seems like a lot more added complexity though.
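As a sketch, the abort policy reduces to a small decision function. Cases a) and b) present the same observable state (a "default" node exists but there is no conf/pref-node), so this hypothetical sketch treats them alike: abort in daemon mode, ask in console mode, and reuse without asking only when the proposed -P option is given.

```java
final class PrefsClashPolicy {
    enum Outcome { CREATE_DEFAULT, USE_STORED, REUSE, ASK_TO_REUSE, ABORT }

    /**
     * @param defaultNodeExists true if a "default" node already exists in the user preferences
     * @param storedName        content of conf/pref-node, or null if the file does not exist
     * @param daemonMode        true if nobody is at the console to answer a question
     * @param reuseFlag         true if the hypothetical -P option was passed
     */
    static Outcome decide(boolean defaultNodeExists, String storedName, boolean daemonMode, boolean reuseFlag) {
        if (storedName != null) {
            return Outcome.USE_STORED; // this agent already owns a node
        }
        if (!defaultNodeExists) {
            return Outcome.CREATE_DEFAULT; // clean machine: create "default" and record it in conf/pref-node
        }
        if (reuseFlag) {
            return Outcome.REUSE;
        }
        return daemonMode ? Outcome.ABORT : Outcome.ASK_TO_REUSE;
    }
}
```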

What about just storing config in agent-config.xml?

One way we can really fix this is to revamp the way we do the setup of the configuration (we talked about this in the call - how we don't put Preferences under the setup questions impl). Today, we do everything with Java Preferences. We can still use Java Preferences to store the config, but if there is no initial set of preferences, we can ask those setup questions (which can include the agent name AND a preference node name - where the pref node name can be the agent name by default) and only after the setup questions are answered do we write the config out to the Java Preferences store. See, today, we do everything with the Preferences as the backing store - which is why we need them to exist already - even before we start the setup questioning (this is because we get all the defaults from agent-configuration.xml). The entire way the setup question stuff works is with Preferences as the underlying storage/retrieval mechanism. Perhaps we can refactor this to use simple in-memory java.util.Properties. This would allow us to, say, ask for the agent name and preference name and THEN look for the Java Preferences - if there are none, keep asking the questions. We would STILL need to store the agent name and/or preference node name in the agent conf/ because at subsequent agent start ups, we need to know where to find the agent config.
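A rough sketch of that refactoring: collect the answers in an in-memory java.util.Properties, default the preference node name to the agent name, and only then write to the Preferences store. The class is hypothetical and both key names are illustrative assumptions, not necessarily the keys the Agent uses.

```java
import java.util.Properties;
import java.util.prefs.BackingStoreException;
import java.util.prefs.Preferences;

final class InMemorySetup {
    static final String AGENT_NAME_KEY = "rhq.agent.name";        // assumed key name
    static final String PREFS_NODE_KEY = "setup.prefs-node-name"; // hypothetical, setup-only key

    /** Returns a copy of the answers with the agent name and prefs node name filled in. */
    static Properties withDefaults(Properties answers, String defaultAgentName) {
        Properties result = new Properties();
        result.putAll(answers);
        if (result.getProperty(AGENT_NAME_KEY) == null) {
            result.setProperty(AGENT_NAME_KEY, defaultAgentName);
        }
        if (result.getProperty(PREFS_NODE_KEY) == null) {
            // By default the prefs node is named after the agent.
            result.setProperty(PREFS_NODE_KEY, result.getProperty(AGENT_NAME_KEY));
        }
        return result;
    }

    /** Writes the answers to the chosen node; call withDefaults first so the node name is set. */
    static void persist(Properties answers, Preferences agentRoot) throws BackingStoreException {
        Preferences node = agentRoot.node(answers.getProperty(PREFS_NODE_KEY));
        for (String key : answers.stringPropertyNames()) {
            if (!key.equals(PREFS_NODE_KEY)) {
                node.put(key, answers.getProperty(key));
            }
        }
        node.flush();
    }
}
```

Nothing touches the Preferences store until persist is called, which is what would let the setup questions run before any node exists.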

We could even just use agent-configuration.xml as the backing store. When we do the agent auto-update, we can just copy the old agent-configuration.xml as part of the upgrade. HOWEVER, this breaks down when you do a clean install but want to maintain a previous config (e.g. you can't just "rm -rf <agent-install>" and then unzip a clean agent distro and expect it to be pre-configured, since the rm will remove the agent-configuration.xml - this is true for any model we implement that stores config under the agent install dir).

But one nice thing we have with our Preferences impl is we can upgrade our Preferences settings. For example, if going from RHQ version X to version Y we add or remove a preference setting, we have a nice upgrade mechanism that does this. If, for example, a preference setting changes its semantics, or changes its default settings, this upgrade mechanism can also be used to fix up those old preferences to mesh with the new preferences (we have already used this in the past).

I also don't know how easy it would be to make sure auto-upgrade will work with just having the config in agent-configuration.xml - it could be as easy as copying the file over, but I'd have to look to see if the old agent files are accessible during auto-upgrade (I think they are, but not 100% sure).

JBoss Community Archive (Read Only)

RHQ 4.9
