JBoss.orgCommunity Documentation
Teiid supports SQL for issuing queries and for defining view transformations; see also Procedure Language for how SQL is used in virtual procedures and update procedures.
Teiid provides nearly all of the functionality of SQL-92 DML. SQL-99 and later features are constantly being added based upon community need. The following does not attempt to cover SQL exhaustively, but rather highlights SQL's usage within Teiid. See the grammar for the exact form of SQL accepted by Teiid.
SQL commands contain references to tables and columns. These references are in the form of identifiers, which uniquely identify the tables and columns in the context of the command. All queries are processed in the context of a virtual database, or VDB. Because information can be federated across multiple sources, tables and columns must be scoped in some manner to avoid conflicts. This scoping is provided by models, which contain the information for each data source or set of views.
Fully-qualified table and column names are of the following form, where the separate 'parts' of the identifier are delimited by periods.
TABLE: <model_name>.<table_spec>
COLUMN: <model_name>.<table_spec>.<column_name>
Syntax Rules:
Identifiers can consist of alphanumeric characters, or the underscore (_) character, and must begin with an alphabetic character. Any Unicode character may be used in an identifier.
Because different data sources organize tables in different ways, some prepending catalog or schema or user information, Teiid allows the 'table_spec' to be a dot-delimited construct.
Identifiers are not case-sensitive in Teiid.
The separate parts of an identifier can be quoted, with double quotes. This is not required, but some tools do this automatically. Quotes establish another level of grouping, in addition to the dot delimiters. Quotes should not be used in such a way that the table specification, which may itself have multiple parts, is split between two quoted sections.
Some examples of valid fully-qualified table identifiers are:
MyModel.MySchema.Portfolios
"MyModel"."MySchema.Portfolios"
MyModel.Inventory
MyModel.MyCatalog.dbo.Authors
Some examples of valid fully-qualified column identifiers are:
MyModel.MySchema.Portfolios.portfolioID
"MyModel"."MySchema.Portfolios"."portfolioID"
MyModel.Inventory.totalPallets
MyModel.MyCatalog.dbo.Authors.lastName
Fully-qualified identifiers can always be used in SQL commands. Partially- or unqualified forms can also be used, as long as the resulting names are unambiguous in the context of the command. Different forms of qualification can be mixed in the same query.
Identifiers, literals, and functions can be combined into expressions. Expressions can be used almost anywhere in a query -- SELECT, FROM (if specifying join criteria, WHERE, GROUP BY, HAVING. However you currently cannot use expressions in an ORDER BY clause.
Teiid supports the following types of expressions:
Column identifiers are used to specify the output columns in SELECT statements, the columns and their values for INSERT and UPDATE statements, and criteria used in WHERE and FROM clauses. They are also used in GROUP BY, HAVING, and ORDER BY clauses. The syntax for column identifiers was defined in the Identifiers section above.
Literal values represent fixed values. These can any of the 'standard' data types.
Syntax Rules:
Integer values will be assigned an integral data type big enough to hold the value (integer, long, or biginteger).
Floating point values will always be parsed as a double.
The keyword 'null' is used to represent an absent or unknown value and is inherently untyped. In many cases, a null literal value will be assigned an implied type based on context. For example, in the function '5 + null', the null value will be assigned the type 'integer' to match the type of the value '5'. A null literal used in the SELECT clause of a query with no implied context will be assigned to type 'string'.
Some examples of simple literal values are:
‘abc’
‘isn’’t true’
- use an extra single tick to escape a tick in a string with single ticks
5
-37.75e01
- scientific notation
100.0
- parsed as double
true
false
'\u0027'
- unicode character
Aggregate functions take sets of values from a group produced by an explicit or implicit GROUP BY and return a single scalar value computed from the group.
Teiid supports the following aggregate functions:
COUNT(*) – count the number of values (including nulls and duplicates) in a group
COUNT(expression) – count the number of values (excluding nulls) in a group
SUM(expression) – sum of the values (excluding nulls) in a group
AVG(expression) – average of the values (excluding nulls) in a group
MIN(expression) – minimum value in a group (excluding null)
MAX(expression) – maximum value in a group (excluding null)
Syntax Rules:
Some aggregate functions may contain a keyword 'DISTINCT' before the expression, indicating that duplicate expression values should be ignored. DISTINCT is not allowed in COUNT(*) and is not meaningful in MIN or MAX (result would be unchanged), so it can be used in COUNT, SUM, and AVG.
Aggregate functions may only be used in the HAVING or SELECT clauses and may not be nested within another aggregate function.
Aggregate functions may be nested inside other functions.
For more information on aggregates, see the sections on GROUP BY or HAVING.
Teiid supports two forms of the CASE expression which allows conditional logic in a scalar expression.
Supported forms:
CASE <expr> ( WHEN <expr> THEN <expr>)+ [ELSE expr] END
CASE ( WHEN <criteria> THEN <expr>)+ [ELSE expr] END
Each form allows for an output based on conditional logic. The first form starts with an initial expression and evaluates WHEN expressions until the values match, and outputs the THEN expression. If no WHEN is matched, the ELSE expression is output. If no WHEN is matched and no ELSE is specified, a null literal value is output. The second form (the searched case expression) searches the WHEN clauses, which specify an arbitrary criteria to evaluate. If any criteria evaluates to true, the THEN expression is evaluated and output. If no WHEN is true, the ELSE is evaluated or NULL is output if none exists.
Subqueries can be used to produce a single scalar value in the SELECT, WHERE, or HAVING clauses only. A scalar subquery must have a single column in the SELECT clause and should return either 0 or 1 row. If no rows are returned, null will be returned as the scalar subquery value. For other types of subqueries, see the Subqueries section below.
Criteria are of two basic forms:
Predicates that evaluate to true or false
Logical criteria that combine predicates (AND, OR, NOT)
Syntax Rules:
expression (=|<>|!=|<|>|<=|>=) (expression|((ANY|ALL|SOME) subquery))
expression [NOT] IS NULL
expression [NOT] IN (expression[,expression]*)|subquery
expression [NOT] LIKE expression [ESCAPE char]
EXISTS(subquery)
expression BETWEEN minExpression AND maxExpression
criteria AND|OR criteria
NOT criteria
Criteria may be nested using parenthesis.
Some examples of valid criteria are:
(balance > 2500.0)
100*(50 - x)/(25 - y) > z
concat(areaCode,concat('-',phone)) LIKE '314%1'
Null values represent an unknown value. Comparison with a null value will evaluate to 'unknown', which can never be true even if 'not' is used.
There are 4 basic commands for manipulating data in SQL, corresponding to the CRUD create, read, update, and delete operations: INSERT, SELECT, UPDATE, and DELETE. In addition, procedures can be executed using the EXECUTE command or through a procedural relational command.
The SELECT command is used to retrieve records any number of relations.
A SELECT command has a number of clauses:
All of these clauses other than OPTION are defined by the SQL specification. The specification also specifies the order that these clauses will be logically processed. Below is the processing order where each stage passes a set of rows to the following stage. Note that this processing model is logical and does not represent the way any actual database engine performs the processing, although it is a useful model for understanding questions about SQL.
FROM stage - gathers all rows from all tables involved in the query and logically joins them with a Cartesian product, producing a single large table with all columns from all tables. Joins and join criteria are then applied to filter rows that do not match the join structure.
WHERE stage - applies a criteria to every output row from the FROM stage, further reducing the number of rows.
GROUP BY stage - groups sets of rows with matching values in the group by columns.
HAVING stage - applies criteria to each group of rows. Criteria can only be applied to columns that will have constant values within a group (those in the grouping columns or aggregate functions applied across the group).
SELECT stage - specifies the column expressions that should be returned from the query. Expressions are evaluated, including aggregate functions based on the groups of rows, which will no longer exist after this point. The output columns are named using either column aliases or an implicit name determined by the engine. If SELECT DISTINCT is specified, duplicate removal will be performed on the rows being returned from the SELECT stage.
ORDER BY stage - sorts the rows returned from the SELECT stage as desired. Supports sorting on multiple columns in specified order, ascending or descending. The output columns will be identical to those columns returned from the SELECT stage and will have the same name.
LIMIT stage - returns only the specified rows (with skip and limit values).
This model can be used to understand many questions about SQL. For example, columns aliased in the SELECT clause can only be referenced by alias in the ORDER BY clause. Without knowledge of the processing model, this can be somewhat confusing. Seen in light of the model, it is clear that the ORDER BY stage is the only stage occurring after the SELECT stage, which is where the columns are named. Because the WHERE clause is processed before the SELECT, the columns have not yet been named and the aliases are not yet known.
The INSERT command is used to add a record to a table.
Example Syntax
INSERT INTO table (column,...) VALUES (value,...)
INSERT INTO table (column,...) query
The UPDATE command is used to modify records in a table. The operation may result in 1 or more records being updated, or in no records being updated if none match the criteria.
Example Syntax
UPDATE table SET (column=value,...) [WHERE criteria]
The DELETE command is used to remove records from a table. The operation may result in 1 or more records being deleted, or in no records being deleted if none match the criteria.
Example Syntax
DELETE FROM table [WHERE criteria]
The EXECUTE command is used to execute a procedure, such as a virtual procedure or a stored procedure. Procedures may have zero or more scalar input parameters. The return value from a procedure is a result set, the same as is returned from a SELECT. Note that EXEC can be used as a short form of this command.
Example Syntax
EXECUTE proc()
EXECUTE proc(value, ...)
EXECUTE proc(name1=value1,name4=param4, ...) - named parameter syntax
Syntax Rules:
The default order of parameter specification is the same as how they are defined in the procedure definition.
You can specify the parameters in any order by name. Parameters that are have default values and/or are nullable in the metadata, can be omitted from the named parameter call and will have the appropriate value passed at runtime.
If the procedure does not return a result set, the values from the RETURN, OUT, and IN_OUT parameters will be returned as a single row.
Procedural relational commands use the syntax of a SELECT to emulate an EXEC. In a procedural relational command a procedure group names is used in a FROM clause in place of a table. That procedure will be executed in place of a normal table access if all of the necessary input values can be found in criteria against the procedure. Each combination of input values found in the criteria results in an execution of the procedure.
Example Syntax
select * from proc
select output_param1, output_param2 from proc where input_param1 = 'x'
select output_param1, output_param2 from proc, table where input_param1 = table.col1 and input_param2 = table.col2
Syntax Rules:
The procedure as a table projects the same columns as an exec with the addition of the input parameters. For procedures that do not return a result set, IN_OUT columns will be projected as two columns, one that represents the output value and one named {column name}_IN that represents the input of the parameter.
Input values are passed via criteria. Values can be passed by '=','is null', or 'in' predicates.
The procedure virtual group automatically has an access pattern on its IN and IN_OUT parameters which allows it to be planned correctly as a dependent join when necessary or fail when sufficient criteria cannot be found.
Procedures containing duplicate names between the parameters (IN, IN_OUT, OUT, RETURN) and result set columns cannot be used in a procedural relational command.
Default values for IN, IN_OUT parameters are not used if there is no criteria present for a given input. Default values are only valid for named procedure syntax.
The usage of 'in' or join criteria can result in the procedure being executed multiple times.
Teiid supports creating temporary,or "temp", tables. Temp tables are dynamically created, but are treated as any other physical table.
Temp tables can be defined implicitly by referencing them in a SELECT INTO or in an INSERT statement or explicitly with a CREATE TABLE statement. Implicitly created temp tables must have a name that starts with '#'.
Creation syntax:
CREATE LOCAL TEMPORARY TABLE<temporary table name> (<column name> <data type>,...)
SELECT <element name>,...INTO <temporary table name> FROM <table name>
INSERT INTO <temporary table name> ((<column name>,...)VALUES (<value>,...)
Drop syntax:
DROP TABLE <temporary table name>
Limitations:
With the CREATE TABLE syntax only basic table definition (column name and type information) is supported.
The "ON COMMIT" clause is not supported in the CREATE TABLE statement.
"drop behavior" option is not supported in the drop statement.
Only local temporary tables are supported. This implies that the scope of temp table will be either to the sesssion or the block of a virtual procedure that creates it.
Session level temp tables are not fail-over safe.
temp tables are non-transactional.
Temp tables do not support update or delete operations.
The following example is a series of statements that loads a temporary table with data from 2 sources, and with a manually inserted record, and then uses that temp table in a subsequent query.
... CREATE LOCAL TEMPORARY TABLE TEMP (a integer, b integer, c integer); SELECT * INTO temp FROM Src1; SELECT * INTO temp FROM Src2; INSERT INTO temp VALUES (1,2,3); SELECT a,b,c FROM Src3, temp WHERE Src3.a = temp.b; ...
See virtual procedures for more on temp table usage.
This section describes the clauses that are used in the various SQL commands described in the previous section. Nearly all these features follow standard SQL syntax and functionality, so any SQL reference can be used for more information.
SQL queries start with the SELECT keyword and are often referred to as "SELECT statements". Teiid supports most of the standard SQL query constructs.
Usage:
SELECT [DISTINCT|ALL] ((expression [[AS] name])|(group identifier.STAR))*|STAR ...
Syntax Rules:
Aliased expressions are only used as the output column names and in the ORDER BY clause. They cannot be used in other clauses of the query.
DISTINCT may only be specified if the SELECT symbols are comparable.
The FROM clause specifies the target table(s) for SELECT, UPDATE, and DELETE statements.
Example Syntax:
FROM {table [AS alias]}
FROM table1 [INNER|LEFT OUTER|RIGHT OUTER|FULL OUTER] JOIN table1 ON join-criteria
FROM table1 CROSS JOIN table1
FROM (subquery) [AS alias]
FROM table1 JOIN table2 MAKEDEP ON join-criteria
FROM table1 JOIN table2 MAKENOTDEP ON join-criteria
FROM table1 left outer join /* optional */ table2 ON join-criteria
MAKEDEP and MAKENOTDEP are hints used to control dependent join behavior. They should only be used in situations where the optimizer does not chose the most optimal plan based upon query structure, metadata, and costing information.
The WHERE clause defines the criteria to limit the records affected by SELECT, UPDATE, and DELETE statements.
The general form of the WHERE is:
WHERE criteria
The GROUP BY clause denotes that rows should be grouped according to the specified expression values. One row will be returned for each group, after optionally filtering those aggregate rows based on a HAVING clause.
The general form of the GROUP BY is:
GROUP BY expression (,expression)*
Syntax Rules:
Column references in the group by clause must by to unaliased output columns.
Expressions used in the group by must appear in the select clause.
Column references and expessions in the select clause that are not used in the group by clause must appear in aggregate functions.
If an aggregate function is used in the SELECT clause and no GROUP BY is specified, an implicit GROUP BY will be performed with the entire result set as a single group. In this case, every column in the SELECT must be an aggregate function as no other column value will be fixed across the entire group.
The group by columns must be of a comparable type.
The HAVING clause operates exactly as a WHERE clause although it operates on the output of a GROUP BY. It supports the same syntax as the WHERE clause.
Syntax Rules:
Expressions used in the group by clause must either contain an aggregate function: COUNT, AVG, SUM, MIN, MAX. or be one of the grouping expressions.
The ORDER BY clause specifies how the returned records from a SELECT should be sorted. The options are ASC (ascending) and DESC (descending).
Usage:
ORDER BY column1 [ASC|DESC], ...
Syntax Rules:
Sort columns may be specified positionally by a 1 based integer or string literal, by SELECT clause alias name, or by a column reference.
Column references may appear in the SELECT clause as the expression for an aliased column or may reference columns from tables in the FROM clause. If the column reference is not in the SELECT clause the query must not be a set operation, specify SELECT DISTINCT, or contain a GROUP BY clause.
The ORDER BY columns must be of a comparable type.
If an ORDER BY is used in an inline view or view definition without a limit clause, it will be removed by the Teiid optimizer.
The use of positional ordering is no longer supported by the ANSI SQL standard and is a deprecated feature in Teiid. It is preferable to use alias names in the order by clause.
The LIMIT clause specifies a limit on the number of records returned from the SELECT command. An optional offset (the number of rows to skip) can be specified.
Usage:
LIMIT [offset,] limit
Examples:
LIMIT 100 - returns the first 100 records (rows 1-100)
LIMIT 500, 100 - skips 500 records and returns the next 100 records (rows 501-600)
Usage of the INTO Clause for inserting into a table has been been deprecated. An INSERT with a query command should be used instead.
When the into clause is specified with a SELECT, the results of the query are inserted into the specified table. This is often used to insert records into a temporary table. The INTO clause immediately precedes the FROM clause.
Usage:
INTO table FROM ...
Syntax Rules:
The INTO clause is logically applied last in processing, after the ORDER BY and LIMIT clauses.
Teiid's support for SELECT INTO is similar to MS SQL Server. The target of the INTO clause is a table where the result of the rest select command will be inserted. SELECT INTO should not be used UNION query.
The OPTION keyword denotes options the user can pass in with the command. These options are Teiid-specific and not covered by any SQL specification.
Usage:
OPTION option, (option)*
Supported options:
SHOWPLAN - returns the query plan along with the results
PLANONLY - returns the query plan, but does not execute the command
MAKEDEP [table, (table)*] - specifies source tables that should be made dependent in the join
MAKENOTDEP [table, (table)*] - prevents a dependent join from being used
DEBUG - prints query planner debug information in the log and returns it through the JDBC API
Examples:
LIMIT 100 - returns the first 100 records (rows 1-100)
LIMIT 500, 100 - skips 500 records and returns the next 100 records (rows 501-600)
Teiid supports the UNION, UNION ALL, INTERSECT, EXCEPT set operation as a way of combining the results of commands.
Usage:
command (UNION|INTERSECT|EXCEPT) [ALL] command [ORDER BY...]
Syntax Rules:
The output columns will be named by the output columns of the first set operation branch.
Each SELECT must have the same number of output columns and compatible data types for each relative column. Data type conversion will be performed if data types are inconsistent and implicit conversions exist.
If UNION, INTERSECT, or EXCEPT is specified without all, then the output columns must be comparable types.
INTERSECT ALL, and EXCEPT ALL are currently not supported.
A subquery is a SQL query embedded within another SQL query. The query containing the subquery is the outer query.
Supported subquery types:
Scalar subquery - a subquery that returns only a single column with a single value. Scalar subqueries are a type of expression and can be used where single valued expressions are expected.
Correlated subquery - a subquery that contains a column reference to from the outer query.
Uncorrelated subquery - a subquery that contains no references to the outer sub-query.
Supported subquery locations:
Subqueries may be used in any expression or CASE CRITERIA in the SELECT clasue.
Subqueries in the FROM clause of the outer query (also known as "inline views") can return any number of rows and columns. This type of subquery must always be given an alias.
Example 1.1. Example Subquery in FROM Clause (Inline View)
SELECT a FROM (SELECT Y.b, Y.c FROM Y WHERE Y.d = ‘3’) AS X WHERE a = X.c AND b = X.b
Subqueries supported in the criteria of the outer query include subqueries in an IN clause, subqueries using the ANY/SOME or ALL predicate quantifier, and subqueries using the EXISTS predicate.
Example 1.2. Example Subquery in WHERE Using EXISTS
SELECT a FROM X WHERE EXISTS (SELECT b, c FROM Y WHERE c=3)
The following usages of subqueries must each select only one column, but can return any number of rows.
Example 1.3. Example Subqueries in WHERE Clause
SELECT a FROM X WHERE a IN (SELECT b FROM Y WHERE c=3) SELECT a FROM X WHERE a >= ANY (SELECT b FROM Y WHERE c=3) SELECT a FROM X WHERE a < SOME (SELECT b FROM Y WHERE c=4) SELECT a FROM X WHERE a = ALL (SELECT b FROM Y WHERE c=2)