CREATE INDEX
Synopsis
Use the CREATE INDEX
statement to create a new index on a table. It defines the index name, index columns, and additional columns to include.
Syntax
Diagram
create_index
partition_key_columns
clustering_key_columns
clustering_key_column_ordering
index_column
jsonb_attribute
covering_columns
index_predicate
Grammar
create_index ::= CREATE INDEX [ IF NOT EXISTS ] index_name
ON table_name ( partition_key_columns [ clustering_key_columns ] )
[ clustering_key_column_ordering ] [ covering_columns ] [ 'WHERE' index_predicate ]
partition_key_columns ::= index_column | ( index_column [ , ... ] )
clustering_key_columns ::= index_column [ , ... ]
clustering_key_column_ordering ::= WITH CLUSTERING ORDER BY ( { index_column [ ASC | DESC ] } [ , ... ] )
index_column ::= column_name | jsonb_attribute
jsonb_attribute ::= column_name [ -> 'attribute_name' [ ... ] ] ->> 'attribute_name'
covering_columns ::= { COVERING | INCLUDE } ( column_name [ , ... ] )
index_predicate ::= where_expression
Where
index_name
,table_name
, andcolumn_name
are identifiers.table_name
may be qualified with a keyspace name.index_name
cannot be qualified with a keyspace name because an index must be created in the table's keyspace.
Semantics
- An error is raised if transactions have not be enabled using the
WITH transactions = { 'enabled' : true }
clause on the table to be indexed. This is because secondary indexes internally use distributed transactions to ensure ACID guarantees in the updates to the secondary index and the associated primary key. More details here. - An error is raised if
index_name
already exists in the associated keyspace unless theIF NOT EXISTS
option is used. - Indexes do not support TTL. An error is raised if data is inserted with TTL into a table with indexes.
Note
When an index is created on an existing table, YugabyteDB will automatically backfill existing data into the index in an online manner (that is, while continuing to serve other concurrent writes and traffic). For more details on how this is done, see Online Index Backfill.PARTITION KEY
- Partition key is required and defines a split of the index into partitions.
CLUSTERING KEY
- Clustering key is optional and defines an ordering for index rows within a partition.
- Default ordering is ascending (
ASC
) but can be set for each clustering column as ascending or descending using theCLUSTERING ORDER BY
property. - Any primary key column of the table not indexed explicitly in
index_columns
is added as a clustering column to the index implicitly. This is necessary so that the whole primary key of the table is indexed.
INCLUDED COLUMNS
- Included columns are optional table columns whose values are copied into the index in addition to storing them in the table. When additional columns are included in the index, they can be used to respond to queries directly from the index without querying the table.
UNIQUE INDEX
- A unique index disallows duplicate values from being inserted into the indexed columns. It can be used to ensure uniqueness of index column values.
PARTIAL INDEX
-
If a
WHERE
clause is specified, only rows which satisfy theindex_predicate
are indexed. -
An
index_predicate
can have sub-expressions on columns of these data types:TINYINT
,SMALLINT
,INT/INTEGER
,BIGINT
,VARINT
,BOOLEAN
andTEXT
along with these operators (when applicable):=, !=, >, <, >=, <=
. -
Partial indexes can be
UNIQUE
. A UNIQUE partial index enforces the constraint that for each possible tuple of indexed columns, only one row that satisfies theindex_predicate
is allowed in the table. -
SELECT
queries can use a partial index for scanning if theSELECT
statement'swhere_expression
=> (logically implies)index_predicate
.Note
-
A partial index might not be chosen even if the implication holds in case there are better query plans.
-
The logical implication holds if all sub-expressions of the
index_predicate
are present as is in thewhere_expression
. For example, assumewhere_expression = A AND B AND C
,index_predicate_1 = A AND B
,index_predicate_2 = A AND B AND D
,index_predicate_3 = A AND B AND C AND D
. Thenwhere_expression
only impliesindex_predicate_1
-
Currently, valid mathematical implications are not taken into account when checking for logical implication. For example, even if
where_expression = x > 5
andindex_predicate = x > 4
, theSELECT
query will not use the index for scanning. This is because the two sub-expressionsx > 5
andx > 4
differ.
-
-
When using a prepared statement, the logical implication check (to decide if a partial index is usable), will only consider those sub-expressions of
where_expression
that don't have a bind variable. This is because the query plan is decided before execution (i.e., when a statement is prepared).
ycqlsh:example> CREATE TABLE orders (customer_id INT,
order_date TIMESTAMP,
product JSONB,
warehouse_id INT,
amount DOUBLE,
PRIMARY KEY ((customer_id), order_date))
WITH transactions = { 'enabled' : true };
ycqlsh:example> CREATE INDEX idx ON orders (warehouse_id)
WHERE warehouse_id < 100;
ycqlsh:example> EXPLAIN SELECT product FROM orders
WHERE warehouse_id < 100 AND order_date >= ?; // Idx can be used
QUERY PLAN
------------------------------------------
Index Scan using temp.idx on temp.orders
Filter: (order_date >= :order_date)
ycqlsh:example> EXPLAIN SELECT product FROM orders
WHERE warehouse_id < ? and order_date >= ?; // Idx cannot be used
QUERY PLAN
--------------------------------------------------------------------------
Seq Scan on temp.orders
Filter: (warehouse_id < :warehouse_id) AND (order_date >= :order_date)
- Without partial indexes, we do not allow many combinations of operators together on the same column in a
SELECT
's where expression e.g.:WHERE v1 != NULL and v1 = 5
. But if there was a partial index that subsumes some clauses of theSELECT
's where expression, two or more operators otherwise not supported together, might be supported.
ycqlsh:example> EXPLAIN SELECT product FROM orders
WHERE warehouse_id != NULL AND warehouse_id = ?;
SyntaxException: Invalid CQL Statement. Illogical condition for where clause
EXPLAIN SELECT product from orders where warehouse_id != NULL and warehouse_id = ?;
^^^^^^^^^^^^
(ql error -12)
ycqlsh:example> CREATE INDEX warehouse_idx ON orders (warehouse_id)
WHERE warehouse_id != NULL;
ycqlsh:example> EXPLAIN SELECT product FROM orders
WHERE warehouse_id != NULL AND warehouse_id = ?; // warehouse_idx can be used
QUERY PLAN
----------------------------------------------------
Index Scan using temp.warehouse_idx on temp.orders
Key Conditions: (warehouse_id = :warehouse_id)
Examples
Create a table to be indexed
'customer_id' is the partitioning column and 'order_date' is the clustering column.
ycqlsh:example> CREATE TABLE orders (customer_id INT,
order_date TIMESTAMP,
product JSONB,
warehouse_id INT,
amount DOUBLE,
PRIMARY KEY ((customer_id), order_date))
WITH transactions = { 'enabled' : true };
Create an index for query by the order_date
column
ycqlsh:example> CREATE INDEX orders_by_date ON orders (order_date) INCLUDE (amount);
Create an index for query by the JSONB attribute product->>'name'
ycqlsh:example> CREATE INDEX product_name
ON orders (product->>'name') INCLUDE (amount);
Create an index for query by the warehouse_id
column
ycqlsh:example> CREATE INDEX orders_by_warehouse
ON orders (warehouse_id, order_date) INCLUDE (amount);
Insert some data
ycqlsh:example> INSERT INTO orders (customer_id, order_date, product, warehouse_id, amount)
VALUES (1001, '2018-01-10', '{ "name":"desk" }', 107, 100.30);
ycqlsh:example> INSERT INTO orders (customer_id, order_date, product, warehouse_id, amount)
VALUES (1002, '2018-01-11', '{ "name":"chair" }', 102, 50.45);
ycqlsh:example> INSERT INTO orders (customer_id, order_date, product, warehouse_id, amount)
VALUES (1001, '2018-04-09', '{ "name":"pen" }', 102, 20.25);
ycqlsh:example> INSERT INTO orders (customer_id, order_date, product, warehouse_id, amount)
VALUES (1003, '2018-04-09', '{ "name":"pencil" }', 108, 200.80);
Query by the partition column customer_id
in the table
ycqlsh:example> SELECT SUM(amount) FROM orders
WHERE customer_id = 1001 AND order_date >= '2018-01-01';
sum(amount)
-------------
120.55
Query by the partition column order_date
in the index orders_by_date
ycqlsh:example> SELECT SUM(amount) FROM orders
WHERE order_date = '2018-04-09';
sum(amount)
-------------
221.05
Query by the partition column product->>'name'
in the index product_name
ycqlsh:example> SELECT SUM(amount) FROM orders
WHERE product->>'name' = 'desk';
sum(amount)
-------------
100.30
Query by the partition column warehouse_id
column in the index orders_by_warehouse
ycqlsh:example> SELECT SUM(amount) FROM orders
WHERE warehouse_id = 102 AND order_date >= '2018-01-01';
sum(amount)
-------------
70.7
Create a table with a unique index
You can do this as shown below.
ycqlsh:example> CREATE TABLE emp (enum INT primary key,
lastname VARCHAR,
firstname VARCHAR,
userid VARCHAR)
WITH transactions = { 'enabled' : true };
ycqlsh:example> CREATE UNIQUE INDEX emp_by_userid ON emp (userid);
Insert values into the table and verify no duplicate userid
is inserted
ycqlsh:example> INSERT INTO emp (enum, lastname, firstname, userid)
VALUES (1001, 'Smith', 'John', 'jsmith');
ycqlsh:example> INSERT INTO emp (enum, lastname, firstname, userid)
VALUES (1002, 'Smith', 'Jason', 'jsmith');
InvalidRequest: Error from server: code=2200 [Invalid query] message="SQL error: Execution Error. Duplicate value disallowed by unique index emp_by_userid
INSERT INTO emp (enum, lastname, firstname, userid)
^^^^
VALUES (1002, 'Smith', 'Jason', 'jsmith');
(error -300)"
ycqlsh:example> INSERT INTO emp (enum, lastname, firstname, userid)
VALUES (1002, 'Smith', 'Jason', 'jasmith');
ycqlsh:example> SELECT * FROM emp;
enum | lastname | firstname | userid
------+----------+-----------+---------
1002 | Smith | Jason | jasmith
1001 | Smith | John | jsmith