Chapter 1. DML Support

Teiid supports SQL for issuing queries and for defining view transformations; see also Procedure Language for how SQL is used in virtual procedures and update procedures.

Teiid provides nearly all of the functionality of SQL-92 DML. SQL-99 and later features are constantly being added based upon community need. The following does not attempt to cover SQL exhaustively, but rather highlights SQL's usage within Teiid. See the grammar for the exact form of SQL accepted by Teiid.

1.1. Identifiers

SQL commands contain references to tables and columns. These references are in the form of identifiers, which uniquely identify the tables and columns in the context of the command. All queries are processed in the context of a virtual database, or VDB. Because information can be federated across multiple sources, tables and columns must be scoped in some manner to avoid conflicts. This scoping is provided by schemas, which contain the information for each data source or set of views.

Fully-qualified table and column names are of the following form, where the separate 'parts' of the identifier are delimited by periods.

TABLE: <schema_name>.<table_spec>
COLUMN: <schema_name>.<table_spec>.<column_name>

Syntax Rules:

Identifiers can consist of alphanumeric characters or the underscore (_) character, and must begin with an alphabetic character. Any Unicode character may be used in an identifier.
Identifiers in double quotes can have any contents. The double quote character can itself be escaped with an additional double quote, e.g. "some "" id".
Because different data sources organize tables in different ways, some prepending catalog or schema or user information, Teiid allows table specification to be a dot-delimited construct.
Note
When a table specification contains a dot, resolution allows a partial name to match any number of the end segments in the name, e.g. a table with the fully-qualified name vdbname."sourceschema.sourcetable" would match the partial name sourcetable.
Column, schema, and alias identifiers cannot contain a dot.
Identifiers, even when quoted, are not case-sensitive in Teiid.

Some examples of valid fully-qualified table identifiers are:

MySchema.Portfolios
"MySchema.Portfolios"
MySchema.MyCatalog.dbo.Authors

Some examples of valid fully-qualified column identifiers are:

MySchema.Portfolios.portfolioID
"MySchema.Portfolios"."portfolioID"
MySchema.MyCatalog.dbo.Authors.lastName

Fully-qualified identifiers can always be used in SQL commands. Partially- or unqualified forms can also be used, as long as the resulting names are unambiguous in the context of the command. Different forms of qualification can be mixed in the same query.
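
The end-segment matching described in the note above can be sketched in a few lines of Python. This is only an illustration of the rule as stated, not Teiid's resolver: the function name matches_table_spec is hypothetical, quoted segments are not parsed, and the ambiguity check that a real resolver needs is omitted.

```python
def matches_table_spec(table_spec, partial_name):
    """Return True if partial_name equals one or more trailing
    dot-delimited segments of table_spec, ignoring case."""
    spec = table_spec.lower().split(".")
    partial = partial_name.lower().split(".")
    # Compare the partial name against the same number of end segments.
    return len(partial) <= len(spec) and spec[-len(partial):] == partial


# The table specification from the examples above, without its schema.
spec = "MyCatalog.dbo.Authors"
print(matches_table_spec(spec, "Authors"))      # last segment only
print(matches_table_spec(spec, "dbo.authors"))  # last two segments, any case
print(matches_table_spec(spec, "MyCatalog"))    # a leading segment alone
```

Only trailing segments match, so the first two calls succeed and the third does not.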

1.1.1. Reserved Words

Teiid's reserved words include the standard SQL 2003 Foundation, SQL/MED, and SQL/XML reserved words, as well as Teiid-specific words such as BIGINTEGER, BIGDECIMAL, or MAKEDEP. See the TOKENS section of Appendix A, BNF for SQL Grammar, for all reserved words. They appear in the form 'SMALLINT: "smallint"', where the quoted string is the actual lexical form.

1.2. Expressions

Identifiers, literals, and functions can be combined into expressions. Expressions can be used almost anywhere in a query -- SELECT, FROM (if specifying join criteria), WHERE, GROUP BY, HAVING, or ORDER BY.

Teiid supports the following types of expressions:

1.2.1. Column Identifiers

Column identifiers are used to specify the output columns in SELECT statements, the columns and their values for INSERT and UPDATE statements, and criteria used in WHERE and FROM clauses. They are also used in GROUP BY, HAVING, and ORDER BY clauses. The syntax for column identifiers was defined in the Identifiers section above.

1.2.2. Literals

Literal values represent fixed values. These can be any of the 'standard' data types.

Syntax Rules:

Integer values will be assigned an integral data type big enough to hold the value (integer, long, or biginteger).
Floating point values will always be parsed as a double.
The keyword 'null' is used to represent an absent or unknown value and is inherently untyped. In many cases, a null literal value will be assigned an implied type based on context. For example, in the expression '5 + null', the null value will be assigned the type 'integer' to match the type of the value '5'. A null literal used in the SELECT clause of a query with no implied context will be assigned the type 'string'.

Some examples of simple literal values are:

'abc'
'isn''t true' - use an extra single tick to escape a tick in a string with single ticks
5
-37.75e01 - scientific notation
100.0 - parsed as double
true
false
'\u0027' - unicode character
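
As a rough illustration of two of the rules above, the following Python sketch builds a string literal by doubling embedded single ticks and picks the smallest integral type that can hold a value. The helper names are hypothetical, and the 32-bit and 64-bit ranges for integer and long are an assumption based on the corresponding Java types rather than something stated in this section.

```python
def sql_string_literal(text):
    """Quote text as a SQL string literal, doubling embedded single ticks."""
    return "'" + text.replace("'", "''") + "'"


# Assumed ranges: 'integer' as a 32-bit and 'long' as a 64-bit signed value.
INT_MIN, INT_MAX = -2**31, 2**31 - 1
LONG_MIN, LONG_MAX = -2**63, 2**63 - 1


def integral_literal_type(value):
    """Return the smallest integral type name able to hold value."""
    if INT_MIN <= value <= INT_MAX:
        return "integer"
    if LONG_MIN <= value <= LONG_MAX:
        return "long"
    return "biginteger"


print(sql_string_literal("isn't true"))
print(integral_literal_type(5))
print(integral_literal_type(2**31))
print(integral_literal_type(2**63))
```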

1.2.3. Aggregate Functions

Aggregate functions take sets of values from a group produced by an explicit or implicit GROUP BY and return a single scalar value computed from the group.

Teiid supports the following aggregate functions:

COUNT(*) – count the number of values (including nulls and duplicates) in a group
COUNT(x) – count the number of values (excluding nulls) in a group
SUM(x) – sum of the values (excluding nulls) in a group
AVG(x) – average of the values (excluding nulls) in a group
MIN(x) – minimum value in a group (excluding null)
MAX(x) – maximum value in a group (excluding null)
ANY(x)/SOME(x) – returns TRUE if any value in the group is TRUE (excluding null)
EVERY(x) – returns TRUE if every value in the group is TRUE (excluding null)
VAR_POP(x) – biased variance (excluding null) logically equals (sum(x^2) - sum(x)^2/count(x))/count(x); returns a double; null if count = 0
VAR_SAMP(x) – sample variance (excluding null) logically equals (sum(x^2) - sum(x)^2/count(x))/(count(x) - 1); returns a double; null if count < 2
STDDEV_POP(x) – standard deviation (excluding null) logically equals SQRT(VAR_POP(x))
STDDEV_SAMP(x) – sample standard deviation (excluding null) logically equals SQRT(VAR_SAMP(x))
TEXTAGG(FOR expression [as name], ... [DELIMITER char] [QUOTE char] [HEADER] [ENCODING id] [ORDER BY ...]) – CSV text aggregation of all expressions in each row of a group. When DELIMITER is not specified, a comma (,) is used as the delimiter. The double quote (") is the default quote character. Use QUOTE to specify a different value. All non-null values will be quoted. If HEADER is specified, the result contains the header row as the first line; the header line will be present even if there are no rows in a group. This aggregation returns a blob.
XMLAGG(xml_expr [ORDER BY ...]) – XML concatenation of all XML expressions in a group (excluding null). The ORDER BY clause cannot reference alias names or use positional ordering.

Syntax Rules:

Some aggregate functions may contain a keyword 'DISTINCT' before the expression, indicating that duplicate expression values should be ignored. DISTINCT is not allowed in COUNT(*) and is not meaningful in MIN or MAX (result would be unchanged), so it can be used in COUNT, SUM, and AVG.
Aggregate functions cannot be used in FROM, GROUP BY, or WHERE clauses without an intervening query expression.
Aggregate functions cannot be nested within another aggregate function without an intervening query expression.
Aggregate functions may be nested inside other functions.
Any aggregate function may take an optional FILTER clause of the form:
```
FILTER ( WHERE condition )
```
The condition may be any boolean value expression that does not contain a subquery or a correlated variable. The filter will logically be evaluated for each row prior to the grouping operation. If the condition is false, the aggregate function will not accumulate a value for the given row.

For more information on aggregates, see the sections on GROUP BY or HAVING.
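
The logical definitions given for VAR_POP, VAR_SAMP, and the two standard deviation functions can be checked with a small Python sketch. The function names mirror the SQL names for readability only; this models the formulas as written above, with None standing in for null, and says nothing about how Teiid actually evaluates them.

```python
import math
import statistics


def var_pop(values):
    """(sum(x^2) - sum(x)^2/count(x)) / count(x); None if count = 0."""
    xs = [x for x in values if x is not None]  # nulls are excluded
    n = len(xs)
    if n == 0:
        return None
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / n


def var_samp(values):
    """(sum(x^2) - sum(x)^2/count(x)) / (count(x) - 1); None if count < 2."""
    xs = [x for x in values if x is not None]
    n = len(xs)
    if n < 2:
        return None
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)


def stddev_pop(values):
    v = var_pop(values)
    return None if v is None else math.sqrt(v)


def stddev_samp(values):
    v = var_samp(values)
    return None if v is None else math.sqrt(v)


group = [2.0, 4.0, None, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
non_null = [x for x in group if x is not None]

# The sum-of-squares forms agree with the textbook definitions.
print(var_pop(group), statistics.pvariance(non_null))
print(var_samp(group), statistics.variance(non_null))
print(stddev_pop(group))
```

For this group the population variance is 4.0 and the population standard deviation is 2.0; a group with a single non-null value has a population variance of 0 but no sample variance.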

1.2.4. Window functions

Teiid supports ANSI SQL 2003 window functions. A window function allows an aggregate function to be applied to a subset of the result set, without the need for a GROUP BY clause. A window function is similar to an aggregate function, but requires the use of an OVER clause or window specification.

Usage:

aggregate|ranking OVER ([PARTITION BY expression [, expression]*] [ORDER BY ...])

aggregate can be any of the functions in Section 1.2.3, “Aggregate Functions”. ranking can be one of ROW_NUMBER(), RANK(), DENSE_RANK().

Syntax Rules:

Window functions can only appear in the SELECT and ORDER BY clauses of a query expression.
Window functions cannot be nested in one another.
Partitioning and order by expressions cannot contain subqueries or outer references.
The ranking (ROW_NUMBER, RANK, DENSE_RANK) functions require the use of the window specification ORDER BY clause.
An XMLAGG ORDER BY clause cannot be used when windowed.
The window specification ORDER BY clause cannot reference alias names or use positional ordering.
Windowed aggregates may not use DISTINCT if the window specification is ordered.

1.2.4.1. Function Definitions

ROW_NUMBER() – functionally the same as COUNT(*) with the same window specification. Assigns a number to each row in a partition starting at 1.
RANK() – Assigns a number to each unique ordering value within each partition starting at 1, such that the next rank is equal to the count of prior rows plus one.
DENSE_RANK() – Assigns a number to each unique ordering value within each partition starting at 1, such that the next rank is sequential.

1.2.4.2. Processing

Window functions are logically processed just before creating the output from the SELECT clause. Window functions can use nested aggregates if a GROUP BY clause is present. There is no guaranteed effect on the output ordering from the presence of window functions. The SELECT statement must have an ORDER BY clause to have a predictable ordering.

Teiid will process all window functions with the same window specification together. In general, a full pass over the row values coming into the SELECT clause will be required for each unique window specification. For each window specification the values will be grouped according to the PARTITION BY clause. If no PARTITION BY clause is specified, then the entire input is treated as a single partition. The output value is determined based upon the current row value, its peers (that is, rows that are the same with respect to their ordering), and all prior row values based upon ordering in the partition. The ROW_NUMBER function will assign a unique value to every row regardless of the number of peers.

Example 1.1. Example Windowed Results

SELECT name, salary, max(salary) over (partition by name) as max_sal,
          rank() over (order by salary desc) as rank, dense_rank() over (order by salary desc) as dense_rank,
          row_number() over (order by salary desc) as row_num FROM employees

name	salary	max_sal	rank	dense_rank	row_num
John	100000	100000	2	2	2
Henry	50000	50000	5	4	5
John	60000	100000	3	3	3
Suzie	60000	150000	3	3	4
Suzie	150000	150000	1	1	1
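
The three ranking columns in the example can be reproduced with a short Python sketch. It assumes a single partition ordered by salary from highest to lowest, which is the ordering the rank values in the result table reflect, and it breaks ties for the row number by input order. Teiid makes no such promise about the order of peers; the function name window_ranks is hypothetical.

```python
employees = [
    ("John", 100000),
    ("Henry", 50000),
    ("John", 60000),
    ("Suzie", 60000),
    ("Suzie", 150000),
]


def window_ranks(rows, key, descending=True):
    """Return {row index: (rank, dense_rank, row_number)} for one partition."""
    # sorted() is stable, so peers keep their input order.
    order = sorted(range(len(rows)), key=lambda i: key(rows[i]),
                   reverse=descending)
    result = {}
    rank = dense = 0
    previous = object()  # sentinel that equals no ordering value
    for position, i in enumerate(order, start=1):
        value = key(rows[i])
        if value != previous:
            rank = position  # count of prior rows plus one
            dense += 1       # next sequential rank
            previous = value
        result[i] = (rank, dense, position)
    return result


ranks = window_ranks(employees, key=lambda row: row[1])
for i, (name, salary) in enumerate(employees):
    print(name, salary, *ranks[i])
```

The two rows with a salary of 60000 are peers: they share rank 3 and dense rank 3 but receive distinct row numbers, and the row after them gets rank 5 but dense rank 4.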

1.3. Criteria

Criteria may be:

Predicates that evaluate to true or false
Logical criteria that combine criteria (AND, OR, NOT)
A value expression with type boolean

Usage:

```
criteria AND|OR criteria
```
```
NOT criteria
```
```
(criteria)
```

expression (=|<>|!=|<|>|<=|>=) (expression|((ANY|ALL|SOME) subquery))

```
expression [NOT] IS NULL
```

expression [NOT] IN (expression[,expression]*)|subquery

```
expression [NOT] LIKE pattern [ESCAPE char]
```
Matches the string expression against the given string pattern. The pattern may contain % to match any number of characters and _ to match any single character. The escape character can be used to escape the match characters % and _.
```
expression [NOT] SIMILAR TO pattern [ESCAPE char]
```
SIMILAR TO is a cross between LIKE and standard regular expression syntax. % and _ are still used, rather than .* and . respectively.
Note
Teiid does not exhaustively validate SIMILAR TO pattern values. Rather the pattern is converted to an equivalent regular expression. Care should be taken not to rely on general regular expression features when using SIMILAR TO. If additional features are needed, then LIKE_REGEX should be used. Usage of a non-literal pattern is discouraged as pushdown support is limited.
```
expression [NOT] LIKE_REGEX pattern
```
LIKE_REGEX allows for standard regular expression syntax to be used for matching. This differs from SIMILAR TO and LIKE in that the escape character is no longer used (\ is already the standard escape mechanism in regular expressions), and % and _ have no special meaning. The runtime engine uses the JRE implementation of regular expressions - see the java.util.regex.Pattern class for details.
Note
Teiid does not exhaustively validate LIKE_REGEX pattern values. It is possible to use JRE-only regular expression features that are not specified by the SQL specification. Additionally, not all sources support the same regular expression flavor or extensions. Care should be taken in pushdown situations to ensure that the pattern used will have the same meaning in Teiid and across all applicable sources.
```
EXISTS(subquery)
```
```
expression [NOT] BETWEEN minExpression AND maxExpression
```
Teiid converts BETWEEN into the equivalent form expression >= minExpression AND expression <= maxExpression
```
expression
```
Where expression has type boolean.

Syntax Rules:

The precedence ordering from highest to lowest is comparison, NOT, AND, OR.
Criteria nested in parentheses will be logically evaluated prior to evaluating the parent criteria.
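
In standard SQL, comparisons bind most tightly, then NOT, then AND, then OR. The Python sketch below shows the effect using the standard library's sqlite3 module as a stand-in SQL engine. It is not run against Teiid; it only illustrates how the same criteria group differently with and without parentheses. The evaluate helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def evaluate(criteria):
    """Evaluate boolean criteria; SQLite returns 1 for true and 0 for false."""
    return conn.execute("SELECT " + criteria).fetchone()[0]


# AND binds more tightly than OR: read as (1 = 1) OR ((1 = 0) AND (1 = 0)).
implicit_or = evaluate("1 = 1 OR 1 = 0 AND 1 = 0")
explicit_or = evaluate("(1 = 1 OR 1 = 0) AND 1 = 0")

# NOT binds more tightly than AND: read as (NOT (1 = 0)) AND (1 = 0).
implicit_not = evaluate("NOT 1 = 0 AND 1 = 0")
explicit_not = evaluate("NOT (1 = 0 AND 1 = 0)")

print(implicit_or, explicit_or, implicit_not, explicit_not)
```

Adding parentheses changes the result in both pairs, which is why explicit grouping is worth using whenever AND, OR, and NOT are mixed.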

Some examples of valid criteria are:

(balance > 2500.0)
100*(50 - x)/(25 - y) > z
concat(areaCode,concat('-',phone)) LIKE '314%1'
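
To make the LIKE matching rules concrete, here is a minimal Python sketch that translates a LIKE pattern into a regular expression: % becomes "any number of characters", _ becomes "any single character", and the optional escape character makes the following character literal. The function names are hypothetical and this is not Teiid's implementation.

```python
import re


def like_to_regex(pattern, escape=None):
    """Translate a SQL LIKE pattern into a compiled regular expression."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if escape is not None and ch == escape and i + 1 < len(pattern):
            i += 1
            out.append(re.escape(pattern[i]))  # escaped character is literal
        elif ch == "%":
            out.append(".*")                   # any number of characters
        elif ch == "_":
            out.append(".")                    # exactly one character
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out), re.DOTALL)


def like(value, pattern, escape=None):
    """Return True if the whole value matches the LIKE pattern."""
    return like_to_regex(pattern, escape).fullmatch(value) is not None


# The third criteria example above: a phone string starting 314 and ending 1.
print(like("314-5551", "314%1"))
print(like("314-5552", "314%1"))
# With an escape character, % is matched literally.
print(like("50%", "50\\%", escape="\\"))
```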

Comparing null Values

Null values represent an unknown value. Comparison with a null value will evaluate to 'unknown', which can never be true even if 'not' is used.
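
The practical consequence is that a row whose compared value is null is returned neither by a predicate nor by its negation. The Python sketch below demonstrates this with the standard library's sqlite3 module, used here only as a stand-in for standard SQL three-valued logic; the accounts table and its contents are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER, balance REAL)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [(1, 3000.0), (2, 1000.0), (3, None)],  # account 3 has a null balance
)


def ids(criteria):
    """Return the ids of the rows for which the criteria evaluate to true."""
    sql = "SELECT id FROM accounts WHERE " + criteria + " ORDER BY id"
    return [row[0] for row in conn.execute(sql)]


over = ids("balance > 2500.0")            # true only for account 1
not_over = ids("NOT (balance > 2500.0)")  # true only for account 2
unknown = ids("balance IS NULL")          # IS NULL is how to find account 3

print(over, not_over, unknown)
```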

1.4. SQL Commands

There are 4 basic commands for manipulating data in SQL, corresponding to the CRUD (create, read, update, and delete) operations: INSERT, SELECT, UPDATE, and DELETE. In addition, procedures can be executed using the EXECUTE command or through a procedural relational command.

1.4.1. SELECT Command

The SELECT command is used to retrieve records from any number of relations.

A SELECT command has a number of clauses: WITH, SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT, and OPTION.

All of these clauses other than OPTION are defined by the SQL specification. The specification also specifies the order that these clauses will be logically processed. Below is the processing order where each stage passes a set of rows to the following stage. Note that this processing model is logical and does not represent the way any actual database engine performs the processing, although it is a useful model for understanding questions about SQL.

WITH stage - gathers all rows from all with items in the order listed. Subsequent with items and the main query can reference a with item as if it is a table.
FROM stage - gathers all rows from all tables involved in the query and logically joins them with a Cartesian product, producing a single large table with all columns from all tables. Joins and join criteria are then applied to filter rows that do not match the join structure.
WHERE stage - applies criteria to every output row from the FROM stage, further reducing the number of rows.
GROUP BY stage - groups sets of rows with matching values in the group by columns.
HAVING stage - applies criteria to each group of rows. Criteria can only be applied to columns that will have constant values within a group (those in the grouping columns or aggregate functions applied across the group).
SELECT stage - specifies the column expressions that should be returned from the query. Expressions are evaluated, including aggregate functions based on the groups of rows, which will no longer exist after this point. The output columns are named using either column aliases or an implicit name determined by the engine. If SELECT DISTINCT is specified, duplicate removal will be performed on the rows being returned from the SELECT stage.
ORDER BY stage - sorts the rows returned from the SELECT stage as desired. Supports sorting on multiple columns in specified order, ascending or descending. The output columns will be identical to those columns returned from the SELECT stage and will have the same name.
LIMIT stage - returns only the specified rows (with skip and limit values).

This model can be used to understand many questions about SQL. For example, columns aliased in the SELECT clause can only be referenced by alias in the ORDER BY clause. Without knowledge of the processing model, this can be somewhat confusing. Seen in light of the model, it is clear that the ORDER BY stage is the only stage occurring after the SELECT stage, which is where the columns are named. Because the WHERE clause is processed before the SELECT, the columns have not yet been named and the aliases are not yet known.
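
The logical stages can be walked through in plain Python for a hypothetical query, SELECT dept, count(*) AS headcount FROM employees WHERE salary > 40000 GROUP BY dept HAVING count(*) >= 2 ORDER BY headcount DESC LIMIT 1. The table contents are invented for the illustration, and the sketch mirrors only the logical model, not how any engine executes the query.

```python
employees = [
    {"name": "John", "dept": "eng", "salary": 100000},
    {"name": "Henry", "dept": "ops", "salary": 50000},
    {"name": "Ann", "dept": "eng", "salary": 60000},
    {"name": "Suzie", "dept": "ops", "salary": 30000},
    {"name": "Raj", "dept": "sales", "salary": 70000},
    {"name": "Mia", "dept": "sales", "salary": 45000},
    {"name": "Lee", "dept": "eng", "salary": 90000},
]

# FROM stage: gather all rows of the single table.
rows = list(employees)

# WHERE stage: filter individual rows; no aliases exist yet.
rows = [r for r in rows if r["salary"] > 40000]

# GROUP BY stage: collect rows with matching grouping values.
groups = {}
for r in rows:
    groups.setdefault(r["dept"], []).append(r)

# HAVING stage: filter whole groups using an aggregate.
groups = {dept: members for dept, members in groups.items()
          if len(members) >= 2}

# SELECT stage: evaluate output expressions and name the columns.
selected = [{"dept": dept, "headcount": len(members)}
            for dept, members in groups.items()]

# ORDER BY stage: the alias headcount is visible only from here on.
selected.sort(key=lambda r: r["headcount"], reverse=True)

# LIMIT stage: keep only the requested number of rows.
result = selected[:1]
print(result)
```

Because the headcount column comes into existence only at the SELECT stage, it can be used for sorting but could not have been used in the WHERE filter.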

Note

The explicit table syntax TABLE x may be used as a shortcut for SELECT * FROM x.

1.4.2. INSERT Command

The INSERT command is used to add a record to a table.

Example Syntax

INSERT INTO table (column,...) VALUES (value,...)
INSERT INTO table (column,...) query

1.4.3. UPDATE Command

The UPDATE command is used to modify records in a table. The operation may result in 1 or more records being updated, or in no records being updated if none match the criteria.

Example Syntax

UPDATE table SET (column=value,...) [WHERE criteria]

1.4.4. DELETE Command

The DELETE command is used to remove records from a table. The operation may result in 1 or more records being deleted, or in no records being deleted if none match the criteria.

Example Syntax

DELETE FROM table [WHERE criteria]

1.4.5. EXECUTE Command

The EXECUTE command is used to execute a procedure, such as a virtual procedure or a stored procedure. Procedures may have zero or more scalar input parameters. The return value from a procedure is a result set, the same as is returned from a SELECT. Note that EXEC or CALL can be used as a short form of this command.

Example Syntax

EXECUTE proc()
EXECUTE proc(value, ...)
EXECUTE proc(name1=>value1,name4=>param4, ...) - named parameter syntax

Syntax Rules:

The default order of parameter specification is the same as how they are defined in the procedure definition.
You can specify the parameters in any order by name. Parameters that have default values and/or are nullable in the metadata can be omitted from the named parameter call and will have the appropriate value passed at runtime.
If the procedure does not return a result set, the values from the RETURN, OUT, and IN_OUT parameters will be returned as a single row when used as an inline view query.

1.4.6. Procedural Relational Command

Procedural relational commands use the syntax of a SELECT to emulate an EXEC. In a procedural relational command a procedure group name is used in a FROM clause in place of a table. That procedure will be executed in place of a normal table access if all of the necessary input values can be found in criteria against the procedure. Each combination of input values found in the criteria results in an execution of the procedure.

Example Syntax

select * from proc
select output_param1, output_param2 from proc where input_param1 = 'x'
select output_param1, output_param2 from proc, table where input_param1 = table.col1 and input_param2 = table.col2

Syntax Rules:

The procedure as a table projects the same columns as an exec with the addition of the input parameters. For procedures that do not return a result set, IN_OUT columns will be projected as two columns, one that represents the output value and one named {column name}_IN that represents the input of the parameter.
Input values are passed via criteria. Values can be passed by '=','is null', or 'in' predicates. Disjuncts are not allowed. It is also not possible to pass the value of a non-comparable column through an equality predicate.
The procedure view automatically has an access pattern on its IN and IN_OUT parameters which allows it to be planned correctly as a dependent join when necessary or fail when sufficient criteria cannot be found.
Procedures containing duplicate names between the parameters (IN, IN_OUT, OUT, RETURN) and result set columns cannot be used in a procedural relational command.
Default values for IN and IN_OUT parameters are not used if no criteria are present for a given input. Default values are only valid for named procedure syntax.

Multiple Execution

The usage of 'in' or join criteria can result in the procedure being executed multiple times.
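
As a rough model of this behavior, the Python sketch below treats criteria such as input_param1 in ('x', 'y') and input_param2 = 1 as a set of candidate values per input and invokes a stand-in procedure once for each combination. The proc function, its output, and the run_procedural_relational helper are all hypothetical; the sketch says nothing about how Teiid plans or batches the executions.

```python
from itertools import product

calls = []


def proc(input_param1, input_param2):
    """Hypothetical procedure: records the call and returns one row."""
    calls.append((input_param1, input_param2))
    return [(input_param1, input_param2, f"{input_param1}-{input_param2}")]


def run_procedural_relational(criteria):
    """criteria maps each input parameter name to the values supplied for it
    (one value for '=', several for 'in')."""
    names = sorted(criteria)
    rows = []
    # One execution per combination of input values.
    for combo in product(*(criteria[name] for name in names)):
        rows.extend(proc(**dict(zip(names, combo))))
    return rows


rows = run_procedural_relational(
    {"input_param1": ["x", "y"], "input_param2": [1]}
)
print(len(calls), rows)
```

Two values for one input and one value for the other give two executions; adding a second value for input_param2 would give four.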

Alternative Syntax

None of the issues listed in the syntax rules above exist if a nested table reference is used.

1.5. Set Operations

Teiid supports the UNION, UNION ALL, INTERSECT, and EXCEPT set operations as a way of combining the results of query expressions.

Usage:

queryExpression (UNION|INTERSECT|EXCEPT) [ALL] queryExpression [ORDER BY...]

Syntax Rules:

The output columns will be named by the output columns of the first set operation branch.
Each SELECT must have the same number of output columns and compatible data types for each relative column. Data type conversion will be performed if data types are inconsistent and implicit conversions exist.
If UNION, INTERSECT, or EXCEPT is specified without ALL, then the output columns must be comparable types.
INTERSECT ALL and EXCEPT ALL are currently not supported.
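
The difference between the four supported operations, and the rule that output columns take their names from the first branch, can be seen in a small Python sketch. It uses the standard library's sqlite3 module as a stand-in for standard SQL behavior rather than Teiid itself; the tables t1 and t2 and the alias first_col are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (a INTEGER)")
conn.execute("CREATE TABLE t2 (b INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?)", [(1,), (2,), (2,), (3,)])
conn.executemany("INSERT INTO t2 VALUES (?)", [(2,), (3,), (4,)])


def run(set_operation):
    """Combine the two branches and return (column names, values)."""
    cursor = conn.execute(
        f"SELECT a AS first_col FROM t1 {set_operation} "
        "SELECT b FROM t2 ORDER BY 1"
    )
    names = [column[0] for column in cursor.description]
    return names, [row[0] for row in cursor.fetchall()]


union_cols, union_rows = run("UNION")      # duplicates removed
_, union_all_rows = run("UNION ALL")       # duplicates kept
_, intersect_rows = run("INTERSECT")       # values present in both branches
_, except_rows = run("EXCEPT")             # values only in the first branch

print(union_cols, union_rows, union_all_rows, intersect_rows, except_rows)
```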

1.6. Subqueries

A subquery is a SQL query embedded within another SQL query. The query containing the subquery is the outer query.

Supported subquery types:

Scalar subquery - a subquery that returns only a single column with a single value. Scalar subqueries are a type of expression and can be used where single valued expressions are expected.
Correlated subquery - a subquery that contains a column reference to the outer query.
Uncorrelated subquery - a subquery that contains no references to the outer query.

1.6.1. Inline views

Subqueries in the FROM clause of the outer query (also known as "inline views") can return any number of rows and columns. This type of subquery must always be given an alias. An inline view is nearly identical to a traditional view. See also Section 2.1, “WITH Clause”.

Example 1.2. Example Subquery in FROM Clause (Inline View)

SELECT a FROM (SELECT Y.b, Y.c FROM Y WHERE Y.d = '3') AS X WHERE a = X.c AND b = X.b

1.6.2. Subqueries can appear anywhere where an expression or criteria is expected.

Subqueries are supported in quantified criteria, the EXISTS predicate, the IN predicate, and as Section 1.2.6, “Scalar subqueries”.

Example 1.3. Example Subquery in WHERE Using EXISTS

SELECT a FROM X WHERE EXISTS (SELECT 1 FROM Y WHERE c=X.a)

Example 1.4. Example Quantified Comparison Subqueries

SELECT a FROM X WHERE a >= ANY (SELECT b FROM Y WHERE c=3) 
SELECT a FROM X WHERE a < SOME (SELECT b FROM Y WHERE c=4) 
SELECT a FROM X WHERE a = ALL (SELECT b FROM Y WHERE c=2)

Example 1.5. Example IN Subquery

SELECT a FROM X WHERE a IN (SELECT b FROM Y WHERE c=3)
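
The quantified comparisons in Example 1.4 can be modeled in Python with any() and all(): ANY and SOME are true if the comparison holds for at least one subquery value, and ALL is true if it holds for every value. The quantified helper is hypothetical, and the sketch deliberately ignores null values, which would introduce the 'unknown' result discussed under criteria.

```python
import operator

COMPARISONS = {
    "=": operator.eq, "<>": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


def quantified(value, op, quantifier, subquery_values):
    """Evaluate 'value op quantifier (subquery)' for non-null values."""
    compare = COMPARISONS[op]
    results = (compare(value, v) for v in subquery_values)
    if quantifier in ("ANY", "SOME"):
        return any(results)
    if quantifier == "ALL":
        return all(results)
    raise ValueError("unknown quantifier: " + quantifier)


print(quantified(5, ">=", "ANY", [7, 3]))   # 5 >= 3 holds
print(quantified(2, "<", "SOME", [1, 2]))   # 2 is less than neither value
print(quantified(2, "=", "ALL", [2, 2]))    # equal to every value
print(quantified(2, "=", "ALL", []))        # vacuously true for no rows
```

Under this model the IN predicate of Example 1.5 behaves like = ANY over the same subquery.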