Bundles vs. Content - Proposal 1

Unfinished

AS8 Patching Without API Changes?

Yes! Implement it using the ContentFacet! Then concentrate on bringing content into bundles as outlined below.

Executive Summary

The high-level end goal of this effort is to end up with a single provisioning solution for RHQ, i.e. to somehow merge or bridge the two current solutions: the bundle and content subsystems. An important milestone in this effort is to give one or the other subsystem the ability to handle the upcoming patching capability of JBoss AS.
What is outlined below is arguably the more complex approach, but one which, IMHO, will provide us with a more consistent and future-proof design than the approach discussed so far, namely adding new capabilities to the bundle subsystem that would mimic those already present in the content subsystem. In that case, a lot of code would essentially have to be duplicated and we would be stuck supporting multiple versions of essentially the same interfaces.
This document discusses an approach where a minimal set of changes is applied to the existing agent-side APIs (leaving them largely backwards compatible), which enables the reuse of a large body of existing agent plugin code, while the server-side APIs are modified to accommodate the merger of the two subsystems.
This approach would kill two birds with one stone: it would enable AS patching as a consequence of merging the subsystems (while introducing hardly anything new on the agent side). This is in direct contrast with the previously discussed approach, which would only enable AS patching using the bundle subsystem (by introducing a new agent-side API) but would leave the overall goal untouched (or even make it slightly more complicated, with one more API to care about).

In a nutshell, this document proposes six things:

Keep both the bundle and content agent APIs as backwards compatible as possible (which the proposal actually achieves, barring one small change in the domain model that may theoretically affect some agent plugins)
Adopt bundle deployment workflows in content subsystem
Adopt content storage model for bundles
Unify auditing of bundles and content
Get rid of content sources and rely on user-supplied automation scripts, which can do a much better job
Get rid of over-engineered concepts like "pre-parsed installation steps" and packages applicable only to certain product versions. These, IMHO, are much better handled by the users themselves through scripted automation (or even just common sense) than by us manipulating an abstract model that tries to capture every possible nuance of users' deployment needs.

Data storage discrepancy

Apart from the fundamental difference between content and bundles, which is that content is resource-bound while bundles are filesystem location bound, another major difference that hinders the interoperability between the subsystems is the difference in their storage mechanisms.

The content subsystem is built around the idea that the deployable unit is a PackageVersion, which maps to a single (versioned) file. These files exist in a big pool on which one can create views in the form of Repos. One can also push packages into the pool from content sources and from the resources themselves.

Bundles, on the other hand, have (conceptually) two deployable units: the bundle itself and the individual files the bundle consists of. These files are stored, surprisingly or not, as package versions. A bundle is then stored as a repo containing the files of all the bundle versions it contains. Theoretically it would be possible to change individual files of a bundle and "beam them down" to agents, but currently there is no UI or API support for that (nor do I think it would be a smart idea to allow changing a single bundle file - the whole concept of versioning the bundle would then become problematic).

Why do we need to unify?

The fundamental requirement for the two subsystems to start merging is to unify their data storage "philosophy". As said above, content considers a package version a single deployable unit, while bundles consider a bundle (which consists of several package versions) a deployable unit. If we were able to represent a bundle as a single package version, we could keep and reuse the existing APIs, only adding new abilities to "re-purpose" them. Concretely, if a bundle version were stored as a single package version (not as a set of them), it would be possible to have a single piece of server-side code handling all kinds of content (both bundles and content packages). This in turn could enable us to re-purpose the ContentFacet on the agent as a vehicle for delivering and deploying "typed bundles" (in other words, AS patching would become much easier - the Content APIs are well equipped for that task). Moreover, if a bundle were stored as a single package version, it would be possible to organize bundles into repos (currently there is a single bundle per repo, which causes confusion when repos are also used for content-subsystem purposes).

What are these "typed bundles" (which I already mentioned in the initial discussion of the problem) and why would we need them? The term is actually a bit of a misnomer, as bundles are already typed - we have Ant bundles and filetemplate bundles. By "typed bundles" I meant bundles tied to a single resource type. In the meantime, this idea has also come up in our internal design discussions, where we came up with the concept of a resource-bound bundle destination that would be handled by the resource components themselves. Even during those discussions we found that this concept is very similar to CreateChildResourceFacet, namely to creating a child resource by uploading some binary data. In addition, there is another similar method in the ContentFacet: deployPackages().

Imagine a world where a bundle is a single package version (i.e. a single file). Now let's take a closer look at the interfaces mentioned above, their methods, expectations and intended usage, and also at how we would be able to re-purpose the ContentFacet methods to do "bundly" things:

`BundleFacet`: `BundleDeployResult deployBundle(BundleDeployRequest request)`
This method takes in a request to deploy a single bundle to a certain concrete file system location. The bundle files are already downloaded and the implementer can ask for the downloaded location of each `PackageVersion` that corresponds to a bundle file in the bundle. Notice that there is not a single notion of a "resource" in here, only files and locations on a file system. The fact that those file system locations are computed from information stored in the resources is actually of no concern to this method.

`CreateChildResourceFacet`: `CreateResourceReport createResource(CreateResourceReport report)`
Let's ignore the fact that a resource can also be created from configuration data and concentrate only on the other way - using binary data. In that case the input parameter of the `createResource` method will contain a reference to a `ResourcePackageDetails` object, which is essentially a fancy reference to a package version, i.e. the file with the data. The method is supposed to take that data, put it in an appropriate location determined by its type, and report back whether the "deployment" was successful and what new resource key the server should expect in the next discoveries.

`ContentFacet`: `DeployPackagesResponse deployPackages(Set<ResourcePackageDetails> packages, ContentServices contentServices)`
This method takes a set of packages not because it should consider them a single deployable unit, but basically because of the possibility of batching such requests before they are passed to the component. Therefore the set may contain packages of different kinds in any order. This looks very similar to (multiple calls to) `BundleFacet.deployBundle`, but there is an important difference here: the packages are understood as "constituent parts" of the resource and deploying them should not result in any new child resource being discovered.

So what are the fundamental differences/common points here?

All methods deploy binary data "somewhere" (be it a filesystem, an API, whatever)
Bundle deployment may or may not result in new resources being discovered
Creating a child should result in exactly one new resource being discovered
Deploying content to a resource should result in no new resource being discovered

If we changed expectations of all the methods so that:

Deployment may or may not result in new resources being discovered, and the method reports the resource keys of those "candidate" resources

The methods would become semantically identical. Note that the change to the ContentFacet and BundleFacet would even be backwards compatible (we could let CreateChildResourceFacet return just one child and deprecate its usage for content-based resources - creating a single child using just configuration entries seems quite logical)!
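To make the relaxed expectation concrete, here is a minimal Java sketch. It assumes a hypothetical `ProvisioningResult` type (not an existing RHQ class) that all three methods could return: a success flag plus the resource keys of the "candidate" resources. The three current expectations then become mere checks on the number of reported keys.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical unified outcome of "deploy binary data somewhere".
// NOT an existing RHQ class; it only illustrates the relaxed contract.
final class ProvisioningResult {
    private final boolean success;
    private final String errorMessage;
    // Resource keys of the "candidate" resources the server should expect
    // in the next discovery scans; may be empty.
    private final Set<String> candidateResourceKeys;

    private ProvisioningResult(boolean success, String errorMessage, Set<String> keys) {
        this.success = success;
        this.errorMessage = errorMessage;
        this.candidateResourceKeys = Collections.unmodifiableSet(new LinkedHashSet<>(keys));
    }

    static ProvisioningResult success(Set<String> candidateResourceKeys) {
        return new ProvisioningResult(true, null, candidateResourceKeys);
    }

    static ProvisioningResult failure(String errorMessage) {
        return new ProvisioningResult(false, errorMessage, Collections.<String>emptySet());
    }

    boolean isSuccess() { return success; }

    String getErrorMessage() { return errorMessage; }

    Set<String> getCandidateResourceKeys() { return candidateResourceKeys; }
}

// Today's three expectations expressed as checks against the unified result.
final class ContractChecks {
    // ContentFacet.deployPackages today: no new resources expected.
    static boolean meetsContentExpectation(ProvisioningResult r) {
        return r.isSuccess() && r.getCandidateResourceKeys().isEmpty();
    }

    // CreateChildResourceFacet.createResource today: exactly one new resource.
    static boolean meetsCreateChildExpectation(ProvisioningResult r) {
        return r.isSuccess() && r.getCandidateResourceKeys().size() == 1;
    }

    // Proposed relaxed expectation: any number of candidate resources.
    static boolean meetsUnifiedExpectation(ProvisioningResult r) {
        return r.isSuccess();
    }
}
```

Since a result with zero keys also passes the relaxed check, existing ContentFacet implementations would not need to change, which is the backwards-compatibility argument made above.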

What's even more exciting about this is that if the bundles were single package versions, we could suddenly view all the content files as potentially deployable as bundles (to compatible groups of resource types corresponding to the package types of the content). I.e. we could apply the (almost unchanged) bundle workflow on data originating from the content subsystem.

`BundleFacet`: `BundlePurgeResult purgeBundle(BundlePurgeRequest request)`
This method is given an absolute file location of the deployment and a description of the deployment to be purged. It then merely returns a success/failure message. Again, while this may result in zero or more resources becoming unavailable, that is not checked for.

`DeleteResourceFacet`: `void deleteResource()`
This method instructs a resource component to delete its underlying managed resource (i.e. this is NOT invoked from the parent, but rather from the resource itself). The resource is then automatically removed from the inventory.

`ContentFacet`: `RemovePackagesResponse removePackages(Set<ResourcePackageDetails> packages)`
Removes a set of packages "from" a resource. This act is NOT meant to cause the disappearance of the resource itself - as above, the packages are understood to be just "constituent parts" of the resource.

As with the content deployment discussed above, we can modify the BundleFacet and ContentFacet to report the potential disappearance of child resources as a result of content removal, at which point they become functionally equivalent.
DeleteResourceFacet does not fit too nicely here, because it operates from the POV of the removed resource itself, not from the POV of its parent. At the same time, it does not conflict with the above changes.

The remaining `ContentFacet` methods:

`List<DeployPackageStep> generateInstallationSteps(ResourcePackageDetails packageDetails)`
This is somewhat similar to the audit messages of a bundle deployment, the biggest difference being that the content installation steps are generated ahead of the actual deployment, while the bundle audit messages are only generated during the deployment itself. While I can see the benefit of seeing everything a deployment wants to do before it actually happens, bundles have done quite nicely without this ability so far, so I would be inclined to deprecate it and actually ignore it for "content-based" bundles.

`Set<ResourcePackageDetails> discoverDeployedPackages(PackageType type)`
This would be a new concept, not currently present in bundles: discovery of already installed bundles. The difficulty is that this method is meant for discovering content "inside" a resource, i.e. a WAR resource is supposed to discover its own content. This would not work with the bundle subsystem, because the bundle containing a WAR file should logically be deployed from the parent. On the other hand, this might not be too much of a problem - "the WAR file discovered its contents, let's deploy that as a bundle to the other application server" doesn't sound too illogical.

`InputStream retrievePackageBits(ResourcePackageDetails packageDetails)`
Used to retrieve the bits of the discovered packages. This is needed for component-discovered content/bundles, but would never be called for generic bundles like Ant ones - these wouldn't have the ability to discover already deployed bundles (they don't now and would not gain that ability in the future).

How would we unify?

IMHO, the simplest way of representing a bundle as a single package version, while at the same time keeping the possibility of accessing its constituent parts, is to make package versions hierarchical (i.e. a package version could "contain" other package versions). This would enable a couple of things:

bundles would become normal packages and could therefore be "normally" processed by content workflows, e.g. grouped in repos
current bundle APIs would (hopefully) not have to change drastically - they could still be supplied with the set of package versions that make up a bundle, which would just be the subpackages of the bundle package
current code consuming the content API would still work - up to this point packages were not hierarchical, so no existing code can rely on them being hierarchical
new content/bundles code could exploit that new ability
we'd be in a much better place for the unification of the subsystems because we would only work with one type of data
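A minimal sketch of the idea, assuming a heavily simplified, hypothetical `HierarchicalPackageVersion` class in place of the real `PackageVersion` entity: the bundle version is one package version, and the set of bundle files handed to the bundle code is just its children.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Simplified, hypothetical stand-in for the RHQ PackageVersion entity,
// reduced to what is needed to show the proposed parent/child association.
final class HierarchicalPackageVersion {
    private final String displayName;
    private final String displayVersion;
    // Proposed new association: a package version may "contain" other ones.
    private final Set<HierarchicalPackageVersion> childPackageVersions = new LinkedHashSet<>();

    HierarchicalPackageVersion(String displayName, String displayVersion) {
        this.displayName = displayName;
        this.displayVersion = displayVersion;
    }

    String getDisplayName() { return displayName; }

    String getDisplayVersion() { return displayVersion; }

    Set<HierarchicalPackageVersion> getChildPackageVersions() {
        return Collections.unmodifiableSet(childPackageVersions);
    }

    void addChildPackageVersion(HierarchicalPackageVersion child) {
        childPackageVersions.add(child);
    }
}

final class BundleAdapter {
    // The bundle code would keep receiving a set of package versions: the bundle
    // files are simply the direct children of the single "bundle" package version.
    static Set<HierarchicalPackageVersion> bundleFiles(HierarchicalPackageVersion bundleVersion) {
        return bundleVersion.getChildPackageVersions();
    }
}
```

Code that never asks for children, which today is all of the content code, sees a plain package version exactly as before.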

Details

The devil is here, obviously. While unification at the plugin API level is quite possible, as shown above, the domain model of bundles is quite elaborate and doesn't lend itself easily to modification. Let's explore the options we have for updating the bundle domain model so that it considers a single package version to be a bundle version, and so that it supports resource-type-specific bundles.

BundleType

Make this a @Deprecated NON_ENTITY generated from a PackageType. An Ant bundle would be a package of type "application/vnd.rhq.ant-bundle" (did I mention I would LOVE to have MIME types as our content type specs?). The resourceType field of the PackageType denotes the resource type that can accept packages of that type, which is different from the "resourceType" as the BundleType understands it. See the PackageType changes discussed below.

mazz

Does it seem like we should be able to support multiple resource types per bundle type now?

Bundle

@Deprecated NON_ENTITY computed from Package
bundleType => packageType, see above
repo => always null
bundleVersions => compute this from packageVersions inherited from Package
destinations => computed from destinations inherited from Package (this is a new thing in packages)
tags - move to Package
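As an illustration of what "@Deprecated NON_ENTITY computed from Package" could mean, here is a sketch of a read-only `BundleView` over a simplified, hypothetical `PackageStub`. All class names are made up for the example, versions are reduced to strings, and destinations are left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Simplified, hypothetical stand-ins for the persisted content entities.
final class PackageTypeStub {
    final String name;

    PackageTypeStub(String name) { this.name = name; }
}

final class PackageStub {
    final String name;
    final PackageTypeStub packageType;
    final List<String> versions = new ArrayList<>();  // display versions of its package versions
    final Set<String> tags = new LinkedHashSet<>();    // tags moved here from Bundle

    PackageStub(String name, PackageTypeStub packageType) {
        this.name = name;
        this.packageType = packageType;
    }
}

// The old Bundle reduced to a read-only, non-persistent view over a package.
@Deprecated
final class BundleView {
    private final PackageStub pkg;

    BundleView(PackageStub pkg) { this.pkg = pkg; }

    String getName() { return pkg.name; }

    // bundleType => packageType
    String getBundleTypeName() { return pkg.packageType.name; }

    // repo => always null; the repos containing the package take over that role
    Object getRepo() { return null; }

    // bundleVersions => computed from the package's versions
    List<String> getBundleVersions() { return Collections.unmodifiableList(pkg.versions); }

    // tags => delegated to the package
    Set<String> getTags() { return Collections.unmodifiableSet(pkg.tags); }
}
```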

BundleVersion

@Deprecated NON_ENTITY computed from PackageVersion
description - provided by PackageVersion
name => PackageVersion.displayName, provide deprecated delegating (get|set)ters
version => PackageVersion.displayVersion, provide deprecated delegating (get|set)ters
bundleFiles => PackageVersion.childPackageVersions (the new thing we're enabling)
recipe => reuse byte[] metadata of PackageVersion
configurationDefinition => move to PackageVersion as "versionSpecificConfigurationDefinition" (there's also a PackageType-wide config definition, which, from the POV of bundles, would be used for configuring the bundle handler itself; configuring the bundle would be the job of the "versionSpecificConfigurationDefinition")
tags - move to PackageVersion
versionOrder => need to understand why it is even there - why not just use the in-memory OSGi version comparator, which we seem to be using in the SLSB code anyway...

mazz

This is here to precompute the ordering so we can do DB sorting, e.g. sorting on version (for listing bundle versions in version order) and determining the latest version of a bundle. Otherwise, we'd have to load everything into memory and sort it there using the OSGi comparator code (the DB can't do OSGi version sorting).
bundleDeployments => new class PackageVersionDeployment - this will basically replace the InstalledPackage concept with the more elaborate one taken over from BundleDeployment
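To illustrate mazz's point: the ordering can be computed once, in memory, and persisted so that the DB can sort on a plain integer column. The sketch below uses a simplified, hand-written comparator in the spirit of OSGi versions (numeric major.minor.micro, then the qualifier compared as a string); the class names are hypothetical and this is not the comparator RHQ actually uses.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified OSGi-style version: major.minor.micro.qualifier, missing parts
// default to 0 / "". Non-numeric major/minor/micro parts throw NumberFormatException.
final class OsgiLikeVersion implements Comparable<OsgiLikeVersion> {
    final int major;
    final int minor;
    final int micro;
    final String qualifier;

    OsgiLikeVersion(String text) {
        String[] parts = text.split("\\.", 4);
        major = Integer.parseInt(parts[0]);
        minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        micro = parts.length > 2 ? Integer.parseInt(parts[2]) : 0;
        qualifier = parts.length > 3 ? parts[3] : "";
    }

    @Override
    public int compareTo(OsgiLikeVersion o) {
        if (major != o.major) return Integer.compare(major, o.major);
        if (minor != o.minor) return Integer.compare(minor, o.minor);
        if (micro != o.micro) return Integer.compare(micro, o.micro);
        return qualifier.compareTo(o.qualifier);  // empty qualifier sorts first
    }
}

final class VersionOrdering {
    // Returns version string -> versionOrder (0 = oldest). Stored in a column,
    // this lets queries ORDER BY versionOrder or pick MAX(versionOrder).
    static Map<String, Integer> computeVersionOrder(List<String> versions) {
        List<String> sorted = new ArrayList<>(versions);
        sorted.sort(Comparator.comparing(OsgiLikeVersion::new));
        Map<String, Integer> order = new LinkedHashMap<>();
        for (int i = 0; i < sorted.size(); i++) {
            order.put(sorted.get(i), i);
        }
        return order;
    }
}
```

With plain string sorting in the DB, "1.10" would come before "1.9"; the precomputed integer avoids that while still letting the DB find the latest version.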

BundleFile

@Deprecated NON_ENTITY computed from PackageVersion

PackageVersionDeployment NEW CLASS

exact copy of BundleDeployment, which becomes @Deprecated NON_ENTITY

PackageDestination NEW CLASS

copy of BundleDestination, which becomes @Deprecated NON_ENTITY
destinationBaseDirName, deployDir => is this generic enough, or should we take over the content subsystem's approach of specifying the config via PackageType.deploymentConfigurationDefinition/packageExtraPropertiesDefinition? I lean towards adopting the content subsystem's approach (just to future-proof the design).
we should probably think about supporting deployment to single resources (without creating single-member groups, because bundles will replace "create child resource")

PackageResourceDeployment NEW CLASS

exact copy of BundleResourceDeployment, which becomes @Deprecated NON_ENTITY

BundleResourceDeploymentHistory

@Deprecated NON_ENTITY generated from PackageVersionResourceOperationAuditTrail

PackageType

resourceType => resourceTypes - we need a single package type deployable to zero or multiple resource types - e.g. WAR into tomcat, AS, ... This will be an API breakage!!

mazz

Ah! OK, this answers my earlier comment above when I ask "Does it seem like we should be able to support multiple resource types per bundle type now?" I see the answer is "yes"
deployerResourceType => new thing to model the "bundle handler" resource type. All the original "bundle types" would be converted to new package types with resourceTypes empty (meaning all) and deployerResourceType equal to the "resourceType" field of the bundle type. The existing package types would just have deployerResourceType set to null, which would mean the resource types themselves are responsible for deploying.
destinationConfigurationDefinition => new thing to model the "deployDir + destinationName" combo hardcoded for bundle types.
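A sketch of how the two new PackageType fields could be interpreted when deciding who performs a deployment. `ProposedPackageType` is a hypothetical simplification with resource types reduced to names: an empty `resourceTypes` set means "any resource type", a non-null `deployerResourceType` means a bundle handler deploys the package, and null means the target resource deploys it itself.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical simplification of the proposed PackageType fields.
final class ProposedPackageType {
    final String name;                   // e.g. a MIME type
    final Set<String> resourceTypes;     // empty = deployable to all resource types
    final String deployerResourceType;   // null = target resources deploy it themselves

    ProposedPackageType(String name, Set<String> resourceTypes, String deployerResourceType) {
        this.name = name;
        this.resourceTypes = Collections.unmodifiableSet(new LinkedHashSet<>(resourceTypes));
        this.deployerResourceType = deployerResourceType;
    }

    boolean isDeployableTo(String resourceType) {
        return resourceTypes.isEmpty() || resourceTypes.contains(resourceType);
    }

    // Returns the resource type whose component performs the deployment.
    String resolveDeployer(String targetResourceType) {
        if (!isDeployableTo(targetResourceType)) {
            throw new IllegalArgumentException(name + " cannot be deployed to " + targetResourceType);
        }
        // Converted bundle types: the bundle handler deploys.
        // Existing package types: the target resource deploys.
        return deployerResourceType != null ? deployerResourceType : targetResourceType;
    }
}
```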

Package

No changes required apart from obtaining "tags" from Bundle.

PackageVersion

see BundleVersion discussion above
Set<PackageVersion> childPackageVersions - note that having a set of concrete package versions sidesteps the problem of version updates by simply ignoring it. I.e. we don't even consider the question of what happens when a new version of a package gets uploaded to the system: all the "folders" still contain the package versions they contained before.
packageBits - this becomes problematic for package versions with children: how do we define or work with such package versions? My gut opinion is that a package version either has content or children and can never have both (similar to a file vs. a directory). But what should happen if the user wants to retrieve the bits of a "folder" package version? Should we return null, or zip up the contents of the children? I lean towards returning null, because that way we remain consistent with the original meaning of the package bits (previous versions had no "folder" package versions and assumed that a package version has contents). This would remain true, and no special treatment (apart from a null check) would be required for "folder" package versions should old code ever have to deal with such packages (which it shouldn't).
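The "file vs. directory" rule I lean towards can be sketched as follows. `PackageVersionNode` is a hypothetical simplification, not the RHQ entity: a package version has either bits or children, never both, and asking a "folder" for its bits returns null.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model: a package version is either a "file" (has bits)
// or a "folder" (has children), never both.
final class PackageVersionNode {
    private final String displayName;
    private final byte[] packageBits;                 // null for folders
    private final Set<PackageVersionNode> children;   // empty for files

    private PackageVersionNode(String displayName, byte[] bits, Set<PackageVersionNode> children) {
        this.displayName = displayName;
        this.packageBits = bits;
        this.children = children;
    }

    static PackageVersionNode file(String displayName, byte[] bits) {
        if (bits == null) {
            throw new IllegalArgumentException("a file package version needs bits");
        }
        return new PackageVersionNode(displayName, bits.clone(), Collections.<PackageVersionNode>emptySet());
    }

    static PackageVersionNode folder(String displayName, Set<PackageVersionNode> children) {
        return new PackageVersionNode(displayName, null,
            Collections.unmodifiableSet(new LinkedHashSet<>(children)));
    }

    boolean isFolder() { return packageBits == null; }

    // Old code only needs a null check: folders simply have no bits.
    byte[] getPackageBits() { return packageBits == null ? null : packageBits.clone(); }

    Set<PackageVersionNode> getChildren() { return children; }

    String getDisplayName() { return displayName; }
}
```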

PackageVersionResourceOperationRequest NEW CLASS

This serves as the base for the audit trail of operations on a resource. This class is introduced to provide a single point of history entry where multiple messages, modelled by PackageVersionResourceOperationAuditTrail, can be grouped together (think multiple messages generated during bundle deployment).
This class closely resembles the ContentServiceRequest class that serves the same purpose for content subsystem.

PackageVersionResourceOperationAuditTrail NEW CLASS

This class is an amalgamation of InstalledPackageHistory and BundleResourceDeploymentHistory, providing an audit trail for package operations on a resource.
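A minimal sketch of the intended grouping: one request per operation on a resource, holding many audit trail messages. The two class names come from this proposal; their fields, the `Status` values and the `audit` method are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// One message of the audit trail; the fields are hypothetical.
final class PackageVersionResourceOperationAuditTrail {
    enum Status { SUCCESS, WARN, FAILURE }

    final long timestamp;
    final String action;   // e.g. "Deploy", "Purge" (illustrative values)
    final Status status;
    final String message;

    PackageVersionResourceOperationAuditTrail(long timestamp, String action, Status status, String message) {
        this.timestamp = timestamp;
        this.action = action;
        this.status = status;
        this.message = message;
    }
}

// A single history entry grouping all messages produced by one operation
// (e.g. one bundle deployment) of one package version on one resource.
final class PackageVersionResourceOperationRequest {
    private final int packageVersionId;
    private final int resourceId;
    private final List<PackageVersionResourceOperationAuditTrail> auditTrail = new ArrayList<>();

    PackageVersionResourceOperationRequest(int packageVersionId, int resourceId) {
        this.packageVersionId = packageVersionId;
        this.resourceId = resourceId;
    }

    void audit(String action, PackageVersionResourceOperationAuditTrail.Status status, String message) {
        auditTrail.add(new PackageVersionResourceOperationAuditTrail(
            System.currentTimeMillis(), action, status, message));
    }

    List<PackageVersionResourceOperationAuditTrail> getAuditTrail() {
        return Collections.unmodifiableList(auditTrail);
    }

    // The operation as a whole failed if any of its messages reports a failure.
    boolean isSuccessful() {
        for (PackageVersionResourceOperationAuditTrail entry : auditTrail) {
            if (entry.status == PackageVersionResourceOperationAuditTrail.Status.FAILURE) {
                return false;
            }
        }
        return true;
    }

    int getPackageVersionId() { return packageVersionId; }

    int getResourceId() { return resourceId; }
}
```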

InstalledPackage

@Deprecated NON_ENTITY generated from PackageVersionDeployment (aka BundleDeployment)

ContentServiceRequest

@Deprecated NON_ENTITY generated from PackageVersionResourceOperationRequest

InstalledPackageHistory

@Deprecated NON_ENTITY generated from PackageVersionResourceOperationAuditTrail.

Consequences of unification

Data migration: existing bundles converted to the new elaborate packages and shoved into an autogenerated "Bundles" repo.
Bundles accessible through repos, we inherit the slightly more granular authz on repos
UI: the topmost level would now be repos, not the bundles themselves. Resource-discovered bundles - where to put them?
Hierarchical package versions lend themselves more nicely (I suppose) to swapping the storage backend from DB to some JCR impl - repo would become a "connection" to a JCR server and package structure would map the contents of the JCR repo?
ContentManagerRemote.getBackingContentForResource() and, generally, the problem of "content-backed" resources: if we loosen the relationship between a resource and its backing content in the way outlined above (i.e. any number of new resources may be expected as a result of deploying a bundle, be it through BundleFacet, ContentFacet or the (deprecated) CreateChildResourceFacet), we will introduce a kind of duplicate content in the DB (which kind of exists even today). For example, an Ant bundle creates a WAR deployment and we store the bundle and its constituent parts in the DB. Once the WAR is discovered, its backing content is discovered too (because its creation type is "CONTENT") and may be uploaded to the server as well. This potentially leaves us with two copies of the WAR file lying in the database. Notice that this can happen even today, but the unification of workflows will make it more frequent.
- Btw. I think there is a bug in here: BZ 902823
On the server side I think the best idea would be to implement a brand new interface - ProvisioningManager(Local|Remote), to which the current interfaces (BundleManager(Local|Remote), ContentManager(Local|Remote)) would merely delegate. This way we would end up in a situation where:
- on the agent side, we'd still have 2 interfaces (BundleFacet and ContentFacet) but their usage pattern on the agent is so different that I think this is still warranted.
- on the server side we'd have a brand new remote/local API that we'd start using and the old APIs can be @Deprecated and just delegate to the new API.
- UI will see quite big changes (most of which I haven't yet even started to think about), but we knew that we needed changes anyway.
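The delegation idea can be sketched like this. `ProvisioningManagerLocal`, `scheduleDeployment` and the two facade classes are hypothetical and heavily simplified (no Subject, no EJB annotations, no real RHQ signatures); the point is only that the old managers keep their method names for existing callers while forwarding to the new API.

```java
// Hypothetical new server-side API; the method name and parameters are
// illustrative only and do not exist in RHQ.
interface ProvisioningManagerLocal {
    // Schedules deployment of a package version (a plain package or a "folder"
    // package version representing a bundle) to a destination and returns the
    // id of the created deployment.
    int scheduleDeployment(int packageVersionId, int destinationId);
}

// Heavily simplified stand-in for the legacy bundle manager: it only delegates.
@Deprecated
final class LegacyBundleManagerFacade {
    private final ProvisioningManagerLocal provisioning;

    LegacyBundleManagerFacade(ProvisioningManagerLocal provisioning) {
        this.provisioning = provisioning;
    }

    // Assumes the data migration keeps bundle version ids equal to package
    // version ids and bundle destination ids equal to package destination ids.
    int scheduleBundleDeployment(int bundleVersionId, int bundleDestinationId) {
        return provisioning.scheduleDeployment(bundleVersionId, bundleDestinationId);
    }
}

// Heavily simplified stand-in for the legacy content manager: it only delegates.
@Deprecated
final class LegacyContentManagerFacade {
    private final ProvisioningManagerLocal provisioning;

    LegacyContentManagerFacade(ProvisioningManagerLocal provisioning) {
        this.provisioning = provisioning;
    }

    // Deploying a package to a resource becomes a deployment to a destination
    // that targets that single resource.
    int deployPackage(int packageVersionId, int singleResourceDestinationId) {
        return provisioning.scheduleDeployment(packageVersionId, singleResourceDestinationId);
    }
}
```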

Package Types

This is an OPTIONAL (but great to have) change that would bring about some backwards incompatibilities!

At the moment, RHQ has 3 sources of package types:

Agent plugins
Bundles (each bundle gets its own "artificial" package type)
Package Type server plugins

The names of the package types are completely arbitrary and when a content source for example fetches the content, it needs to create packages of types that exactly correspond to those names. Also note that a package type is tied to a single resource type (which kinda makes sense in the current setup, where the resource type is the source of the package type). But this makes things rather complicated when you want for example to fetch "WAR files" from a content source and then distribute them in your environment, where there's a mixture of Tomcats and ASes. Because a package type is bound to a single resource type, how is the content source supposed to know which WAR the admin will want to deploy to a Tomcat instance and which one to JBoss AS?

My proposal here is simple:

Package type names would be unique: a MIME type (e.g. application/zip)
Content specifications in the agent plugins would optionally specify a file extension (to accommodate the lack of a specific MIME type for e.g. WAR files)
- <content name="library" displayName="Java Library" mime-types="application/zip,application/x-java-archive" extensions="jar"/> would create, or just acknowledge the existence of, two package types: application/zip and application/x-java-archive. The "name" attribute would become unused and obsolete.
On the server side we already have package type plugins that are just a hack right now to enable custom version-format checking. These could be enhanced to actually detect the MIME types of content.
Multiple resource types could be attached to the same package type
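A sketch of how the proposed `mime-types` and `extensions` attributes could be interpreted on the server. XML parsing is omitted (the attribute values are passed in as strings) and `ContentSpec` / `MimePackageTypeRegistry` are hypothetical names. It shows unique package type names, extensions narrowing a generic MIME type such as application/zip down to WAR files, and several resource types accepting the same package type.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// One <content> element of a plugin descriptor, using the proposed
// (hypothetical) mime-types and extensions attributes.
final class ContentSpec {
    final String resourceType;
    final Set<String> mimeTypes;
    final Set<String> extensions; // empty = any file extension

    ContentSpec(String resourceType, String mimeTypesAttr, String extensionsAttr) {
        this.resourceType = resourceType;
        this.mimeTypes = splitList(mimeTypesAttr);
        this.extensions = splitList(extensionsAttr);
    }

    // A file matches if its MIME type is listed and, when extensions are given,
    // its extension is one of them (this is how a WAR is told apart from any ZIP).
    boolean accepts(String mimeType, String fileName) {
        if (!mimeTypes.contains(mimeType.toLowerCase(Locale.ROOT))) {
            return false;
        }
        if (extensions.isEmpty()) {
            return true;
        }
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return extensions.contains(ext);
    }

    private static Set<String> splitList(String attr) {
        Set<String> out = new LinkedHashSet<>();
        if (attr != null) {
            for (String part : attr.split(",")) {
                String trimmed = part.trim().toLowerCase(Locale.ROOT);
                if (!trimmed.isEmpty()) {
                    out.add(trimmed);
                }
            }
        }
        return Collections.unmodifiableSet(out);
    }
}

final class MimePackageTypeRegistry {
    // Package type names are the MIME types themselves, so they are unique.
    private final Set<String> packageTypeNames = new LinkedHashSet<>();
    private final List<ContentSpec> specs = new ArrayList<>();

    void register(ContentSpec spec) {
        // "Create or just acknowledge existence": adding to a set does both.
        packageTypeNames.addAll(spec.mimeTypes);
        specs.add(spec);
    }

    Set<String> packageTypes() {
        return Collections.unmodifiableSet(packageTypeNames);
    }

    // Several resource types may accept the same package type.
    List<String> resourceTypesAccepting(String mimeType, String fileName) {
        Set<String> result = new LinkedHashSet<>();
        for (ContentSpec spec : specs) {
            if (spec.accepts(mimeType, fileName)) {
                result.add(spec.resourceType);
            }
        }
        return new ArrayList<>(result);
    }
}
```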

Areas Untouched By This Proposal

Content Sources

If we went ahead with the package type changes proposed above, changes to content source plugins would be needed, but I have not investigated this thoroughly. In my mind, content sources, while nice conceptually, don't offer a nice and easy way of pulling content from various sources. It would be much more extensible and approachable from the users' perspective if things like downloading new packages from some remote location could be done using a simple, periodically run CLI script executed on the server or even remotely.

Other ideas for discussion

Hierarchical repos instead of package versions - a nicer fit with the existing bundles impl, but makes it difficult to integrate with the current content APIs
Don't make package versions hierarchical, but store the "children" as zipped contents, with the list of children stored as metadata on the package version - perf. problems with updates (the "upload recipe first, files later" approach)? How would we store a recipe if the metadata is occupied by the list of contents?

Proposed Domain Model Changes Graphically

The current domain model of the content and bundle subsystems can be represented by this (slightly simplified) class diagram:

images/author/download/attachments/73139617/ContentAndBundleClassDiagram.png

The proposed changes are represented in the following class diagram (the changes are represented by the <<NEW>> stereotype on the classes, attributes and associations):

images/author/download/attachments/73139617/ProposedClassDiagramChanges.png

Modifications of workflows

Upload a bundle

The only thing we need to change here is to select a repo in which to store the bundle package. There will be implementation differences, of course, because adding a file to a bundle would mean adding a child package version, but the workflow would remain the same.

Upload a package

Today, if a user wants to upload a package, s/he needs to manually select the package type for that package (with all the wonderfulness of "pick one of EAR File:JBoss AS and EAR File:JBoss AS5"). I imagine this becoming a two-step process: 1) just upload the file and let the package type plugins try to figure out the package type; 2) let the user confirm it, change it, or create a new package type (which is an advanced option because it requires knowing the MIME type of the file).
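Step 1 could look roughly like the sketch below. `PackageTypeDetector` is a hypothetical plugin contract (today's package type plugins only check version formats), and the single detector shown only recognizes the ZIP file signature.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical contract an enhanced package type server plugin could implement.
interface PackageTypeDetector {
    Optional<String> detectMimeType(String fileName, byte[] header);
}

// Example detector: recognizes the ZIP local file header signature "PK\3\4".
final class ZipSignatureDetector implements PackageTypeDetector {
    @Override
    public Optional<String> detectMimeType(String fileName, byte[] header) {
        boolean zip = header.length >= 4
            && header[0] == 0x50 && header[1] == 0x4B && header[2] == 0x03 && header[3] == 0x04;
        return zip ? Optional.of("application/zip") : Optional.<String>empty();
    }
}

final class PackageUpload {
    // Step 1: ask every detector and collect distinct suggestions in order.
    // Step 2 (UI, not shown): the user confirms one, picks another, or creates
    // a new package type; an empty list means "ask the user".
    static List<String> suggestPackageTypes(List<PackageTypeDetector> detectors,
                                            String fileName, byte[] header) {
        List<String> suggestions = new ArrayList<>();
        for (PackageTypeDetector detector : detectors) {
            Optional<String> mime = detector.detectMimeType(fileName, header);
            if (mime.isPresent() && !suggestions.contains(mime.get())) {
                suggestions.add(mime.get());
            }
        }
        return suggestions;
    }
}
```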

Create Child Resource

Creating a child using configuration would probably see no changes. Creating a child using content would follow the "upload package" workflow and then change to bundle deployment workflow.

Connect a resource to a repo

As it stands right now, this workflow wouldn't be available. Rather than having this abstract concept, I think it would be easier for the user to just have a nice "package picker" that would let them search through the repos for packages compatible with given resource type.
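A rough sketch of such a package picker, assuming hypothetical `PickerPackage` and `PackagePicker` classes. It searches all repos for packages whose package type accepts the given resource type (an empty set meaning any type, as proposed for converted bundle types) and whose name matches the user's query.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified package as seen by the picker.
final class PickerPackage {
    final String name;
    final Set<String> acceptingResourceTypes; // from its package type; empty = any

    PickerPackage(String name, Set<String> acceptingResourceTypes) {
        this.name = name;
        this.acceptingResourceTypes = Collections.unmodifiableSet(new LinkedHashSet<>(acceptingResourceTypes));
    }
}

final class PackagePicker {
    // repos: repo name -> packages in that repo. Returns "repo/package" hits
    // compatible with the resource type whose name contains the query.
    static List<String> search(Map<String, List<PickerPackage>> repos, String resourceType, String query) {
        String q = query.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, List<PickerPackage>> repo : repos.entrySet()) {
            for (PickerPackage p : repo.getValue()) {
                boolean compatible = p.acceptingResourceTypes.isEmpty()
                    || p.acceptingResourceTypes.contains(resourceType);
                if (compatible && p.name.toLowerCase(Locale.ROOT).contains(q)) {
                    hits.add(repo.getKey() + "/" + p.name);
                }
            }
        }
        return hits;
    }
}
```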

Install a new package to a resource

Because we'd be ditching content sources, this workflow wouldn't be available.

Install a package

This is available in two forms today:

On a resource, pick a package from one of the subscribed repos
On a repo, pick a package to install to all subscribed resources

Since resource subscription wouldn't be available, the first workflow could be replaced by a simple "package picker" as mentioned above, while the second workflow is basically equal to today's bundle deployment.

Bundle deployment

This would see no changes as the new implementation adopts the bundle deployment workflow.

Define a content source

This would be replaced by a user CLI script that could do much more than just download a package from somewhere. The script could, in one go, download the package, put it in the relevant repos, update existing bundles/packages with the newly downloaded versions and straight away install it. Before, we needed a whole elaborate system of screens, workflows, schedules and whatnot, which I think is one of the main reasons for the low adoption of the content subsystem.

RHQ 4.9
