JBoss.orgCommunity Documentation
Teiid supports materialized views. Materialized views are just like other views, but their transformations are pre-computed and stored just like a regular table. When queries are issued against the views through the Teiid Server, the cached results are used. This saves the cost of accessing all the underlying data sources and re-computing the view transforms each time a query is executed.
Materialized views are appropriate when the underlying data does not change rapidly, or when it is acceptable to retrieve data that is "stale" within some period of time, or when it is preferred for end-user queries to access staged data rather than placing additional query load on operational sources.
Caching of relational table or view records (pre-computing all transformations)
Model-based definition of virtual groups to cache (requires Teiid Designer)
User ability to override use of materialized view cache for
specific queries through OPTION NOCACHE
Similar to cached procedures, materialized view tables are used automatically when a query accesses the corresponding view.
Usage of the cached results may be bypassed with an OPTION NOCACHE
clause. See the OPTION NOCACHE section for more on its usage.
Materialized views are defined in Teiid Designer by setting the materialized property on a table or view in a virtual (view) relational model. Setting this property's value to true (the default is false) allows the data generated for this virtual table to be treated as a materialized view.
It is important to ensure that all key/index information is present as these will be used by the materialization process to enhance the performance of the materialized table.
The target materialized table may also be set in the properties. If the value is left blank, the default, then internal materialization will be used. Otherwise for external materialization, the value should reference the fully qualified name of a table (or possibly view) with the same columns as the materialized view. For most basic scenarios the simplicity of internal materialization makes it the more appealing option.
Reasons to use external materialization
The cached data needs to be fully durable. Internal materialization should not survive a cluster restart.
Full control is needed of loading and refresh. Internal materialization does offer several system supported methods for refreshing, but does not give full access to the materialized table.
Control is needed over the materialized table definition. Internal materialization does support secondary indexes, but they cannot be directly controlled. Constraints or other database features cannot be added to internal materialization tables.
The data volume is large. Internal materialization (and temp tables in general) have memory overhead for each page. A rough guideline is that there can be 100 million rows in all materialized tables across all VDBs for every gigabyte of heap.
Materialized view tables are always scoped to the VDB. If a materialized view definition directly or transitively contains a non-deterministic function call, such as random or hasRole, the resulting table will contain only the initially evaluated values. In most instances you should consider nesting a materialized view without the deterministic results that is joined with relevant non-deterministic values in a parent view.
External materialized views cache their data in an external database system. External materialized views give the administrator full control over the loading and refresh strategies.
Since the actual physical cache for materialized views is maintained external to the Teiid system, there is no predefined policy for clearing and managing the cache. These policies will be defined and enforced by administrators of the Teiid system.
Typical Usage Steps
Create materialized views and corresponding physical materialized target tables in Designer. This can be done through setting the materialized and target table manually, or by selecting the desired views, right clicking, then selecting Modeling->"Create Materialized Views"
Generate the DDL for your physical model materialization target tables. This can be done by selecting the model, right clicking, then choosing Export->"Metadata Modeling"->"Data Definition Language (DDL) File". This script can be used to create the desired schema for your materialization target on whatever source you choose.
Determine a load and refresh strategy. With the schema created the most simplistic approach is to just load the data.
The load can even be done through Teiid with insert into target_table select * from matview option nocache
.
That however may be too simplistic because your index creation may be more performant if deferred until after the table has been created.
Also full snapshot refreshes are best done to a staging table then swapping it for the existing physical table to ensure that the refresh
does not impact user queries and to ensure that the table is valid prior to use.
Internal materialization creates Teiid temporary tables to hold the materialized table. While these tables are not fully durable, they perform well in most circumstances and the data is present at each Teiid instance which removes the single point of failure and network overhead of an external database. Internal materialization also provides more built-in facilities for refreshing and monitoring.
The cache hint, when used in the context of an internal materialized view transformation query, provides the ability to fine tune the materialized table. The pref_mem option also applies to internal materialized views. Internal table index pages already have a memory preference, so the perf_mem option indicates that the data pages should prefer memory as well.
An internal materialized view table is initially in an invalid state (there is no data). The first user query will trigger an implicit loading of the data.
All other queries against the materialized view will block until the load completes.
In some situations administrators may wish to better control when the cache is loaded with a call to SYSADMIN.refreshMatView
. The initial load may itself trigger the initial load
of dependent materialized views. After the initial load user queries against the materialized view table will only block if it is in an invalid state.
The valid state may also be controlled through the SYSADMIN.refreshMatView
procedure.
Example 3.1. Invalidating Refresh
CALL SYSADMIN.refreshMatView(viewname=>'schema.matview', invalidate=>true)
matview will be refreshed and user queries will block until the refresh is complete (or fails).
While the initial load may trigger a transitive loading of dependent materialized views,
subsequent refreshes performed with refreshMatView
will use dependent materialized view tables if they exist. Only one load may occur at a time. If a load is already in progress when
the SYSADMIN.refreshMatView
procedure is called, it will return -1 immediately rather than preempting the current load.
When Teiid is running clustered mode, after of loading of internal materialized view contents, an event will be sent to all the other participating nodes in the cluster to refresh the contents from the original node in asynchronus fashion. During this remote node loading process, if the node that is reading the contents gets a request from user to serve the results of that view, then the current results in cache will be served. If no results were available at that node, then request will be blocked until load process is finished.
When a Teiid node joins the cluster, at the end of start-up cycle, an asynchronus job will be started to fetch all the previously cached internal materialized views at other nodes for the deployed VDBs. The query request behaviour during this load process is same as above.
In the clustered mode, the "invalidate=true" flag in the "SYSADMIN.refreshMatView" procedure will only apply to the node that is refreshing the contents from source. All other nodes, will still serve the old contents during the refresh process.
The cache hint may be used to automatically trigger a full snapshot refresh after a specified time to live (ttl).
The ttl starts from the time the table is finished loading.
The refresh is equivalent to CALL SYSADMIN.refreshMatView('view name', false)
, but performed asynchronously so that user queries do not block on the load.
Example 3.2. Auto-refresh Transformation Query
/*+ cache(ttl:3600000) */ select t.col, t1.col from t, t1 where t.id = t1.id
The resulting materialized view will be reloaded every hour (3600000 milliseconds).
The automatic ttl refresh is not intended for complex loading scenarios, as nested materialized views will be used by the refresh query.
The automatic ttl refresh is performed lazily, that is it is only trigger by using the table after the ttl has expired. For infrequently used tables with long load times, this means that data may be used well past the intended ttl.
In advanced use-cases the cache hint may also be used to mark an internal materialized view as updatable.
An updatable internal materialized view may use the SYSADMIN.refreshMatViewRow
procedure to update a single row in the materialized table.
If the source row exists, the materialized view table row will be updated. If the source row does not exist, the correpsonding materialized row will be deleted.
To be updatable the materialized view must have a single column primary key. Composite keys are not yet supported by SYSADMIN.refreshMatViewRow
.
Example 3.3. Updatable Transformation Query
Transformation Query:
/*+ cache(updatable) */ select t.col, t1.col from t, t1 where t.id = t1.id
Update SQL:
CALL SYSADMIN.updateMatViewRow(viewname=>'schema.matview', key=>5)
Given that the schema.matview defines an integer column col as its primary key, the update will check the live source(s) for the row values.
The update query will not use dependent materialized view tables, so care should be taken to ensure that getting a single
row from this transformation query performs well. See the Reference Guide for information on controlling dependent joins, which may be applicable to increasing the performance of retrieving a single row.
The refresh query does use nested caches, so this refresh method should be used with caution.
When the updatable option is not specified, accessing the materialized view table is more efficient because modifications do not need to be considered. Therefore, only specify the updatable option if row based incremental updates are needed. Even when performing row updates, full snapshot refreshes may be needed to ensure consistency.
Internal materialized view tables will automatically create non-unique indexes for each unique constraint and index defined on the materialized view. These indexes are created as non-unique even for unique constraints since the materialized table is not intended as an enforcement point for data integrity and when updatable the table may not be consistent with underlying values and thus unable to satisfy constraints. The primary key (if it exists) of the view will automatically be part of the covered columns for the index.
The secondary indexes are always created as trees - bitmap or hash indexes are not supported. Teiid's metadata for indexes is currently limited. We are not currently able to capture additional information, such as specifying the evaluated expressions, sort direction, additional columns to cover, etc. You may workaround some of these limitations though.
If a function based index is needed, consider adding another column to the view that projects the function expression, then place an index on that new column. Queries to the view will need to be modified as appropriate though to make use of the new column/index.
If additional covered columns are needed, they may simply be added to the index columns. This however is only applicable to comparable types. Adding additional columns will increase the amount of space used by the index, but may allow its usage to result in higher performance when only the covered columns are used and the main table is not consulted.
Each member in a cluster maintains its own copy of each materialized table and associated indexes. With cache clustering enabled, an additional snapshot copy of the table is maintained for loading by other members. An attempt is made to ensure each member receives the same full refresh events as the others. Full consistency for updatable materialized views however is not guaranteed. Periodic full refreshes of updatable materialized view tables helps ensure consistency among members.
Loads of materialized tables are not coordinated across the cluster. It is possible for the same ttl expiration to trigger a load at each member.
In many clustered scenarios using external materialization is advantageous to fully control the loading of the tables and to have materialized data that is durable.