Join us on YugabyteDB Community Slack
Star us on
Get Started
Slack
GitHub
Get Started
v2.13 (latest) v2.12 (stable) v2.8 (earlier version) v2.6 (earlier version) v2.4 (earlier version) v2.2 (earlier version) v2.1 (earlier version) v2.0 (earlier version) v1.3 (earlier version)
  • YUGABYTEDB CORE
    • Quick start
      • 1. Install YugabyteDB
      • 2. Create a local cluster
      • 3. Explore distributed SQL
      • 4. Build an application
        • Java
        • Node.js
        • Go
        • Python
        • Ruby
        • C#
        • PHP
        • C++
        • C
        • Scala
    • Explore
      • SQL features
        • Schemas and Tables
        • Data Types
        • Data Manipulation
        • Queries and Joins
        • Expressions and Operators
        • Stored Procedures
        • Triggers
        • Advanced features
          • Cursors
          • Table Partitioning
          • Views
          • Savepoints
          • Collations
          • Extensions
        • Going beyond SQL
          • Follower reads
          • Tablespaces
      • Fault tolerance
      • Horizontal scalability
        • Scaling Transactions
        • Sharding Data
      • Transactions
        • Distributed Transactions
        • Isolation Levels
        • Explicit Locking
      • Indexes and Constraints
        • Overview
        • Unique Indexes
        • Partial Indexes
        • Expression Indexes
        • Generalized Inverted Indexes
        • Primary Key
        • Foreign Key
        • Other Constraints
      • JSON support
      • Multi-region deployments
        • Sync replication (3+ regions)
        • Async Replication (2+ regions)
        • Row-Level Geo-Partitioning
        • Read replicas
      • Query tuning
        • Introduction
        • Get query statistics using pg_stat_statements
        • Viewing live queries with pg_stat_activity
        • Analyzing queries with EXPLAIN
        • Optimizing YSQL queries using pg_hint_plan
      • Cluster management
        • Point-in-time recovery
      • Security
      • Observability
        • Prometheus Integration
        • Grafana Dashboard
    • Develop
      • Learn app development
        • 1. SQL vs NoSQL
        • 2. Data modeling
        • 3. Data types
        • 4. ACID transactions
        • 5. Aggregations
        • 6. Batch operations
        • 7. Date and time
        • 8. Strings and text
        • 9. TTL for data expiration
      • Real-world examples
        • E-Commerce app
        • IoT fleet management
      • Explore sample apps
      • Best practices
      • Cloud-native development
        • Codespaces
        • Gitpod
    • Migrate
      • Migration process overview
      • Migrate from PostgreSQL
        • Convert a PostgreSQL schema
        • Migrate a PostgreSQL application
        • Export PostgreSQL data
        • Prepare a cluster
        • Import PostgreSQL data
        • Verify Migration
    • Deploy
      • Deployment checklist
      • Manual deployment
        • 1. System configuration
        • 2. Install software
        • 3. Start YB-Masters
        • 4. Start YB-TServers
        • 5. Verify deployment
      • Kubernetes
        • Single-zone
          • Open Source
          • Amazon EKS
          • Google Kubernetes Engine
          • Azure Kubernetes Service
        • Multi-zone
          • Amazon EKS
          • Google Kubernetes Engine
        • Multi-cluster
          • Google Kubernetes Engine
        • Best practices
        • Connect Clients
      • Docker
      • Public clouds
        • Amazon Web Services
        • Google Cloud Platform
        • Microsoft Azure
      • Multi-DC deployments
        • Three+ data center (3DC)
        • Asynchronous Replication
        • Read replica clusters
    • Benchmark
      • TPC-C
      • sysbench
      • YCSB
      • Key-value workload
      • Large datasets
      • Scalability
        • Scaling queries
      • Resilience
        • Jepsen testing
      • Performance Troubleshooting
    • Secure
      • Security checklist
      • Enable Authentication
        • Enable User Authentication
        • Configure ysql_hba_conf_csv
      • Authentication Methods
        • Password Authentication
        • LDAP Authentication
        • Host-Based Authentication
        • Trust Authentication
      • Role-Based Access Control
        • Overview
        • Manage Users and Roles
        • Grant Privileges
        • Row-Level Security (RLS)
        • Column-Level Security
      • Encryption in Transit
        • Create server certificates
        • Enable server-to-server encryption
        • Enable client-to-server encryption
        • Connect to Clusters
        • TLS and authentication
      • Encryption at rest
      • Column-level encryption
      • Audit Logging
        • Configure Audit Logging
        • Session-Level Audit Logging
        • Object-Level Audit Logging
      • Vulnerability disclosure policy
    • Manage
      • Back up and restore
        • Back up data
        • Restore data
        • Point-in-time recovery
        • Snapshot and restore data
      • Migrate data
        • Bulk import
        • Bulk export
      • Change cluster configuration
      • Diagnostics reporting
      • Upgrade a deployment
      • Grow cluster
    • Troubleshoot
      • Troubleshooting
      • Cluster level issues
        • YCQL connection issues
        • YEDIS connection Issues
        • Recover tserver/master
        • Replace a failed YB-TServer
        • Replace a failed YB-Master
        • Manual remote bootstrap when a majority of peers fail
      • Node level issues
        • Check servers
        • Inspect logs
        • System statistics
        • Disk failure
        • Common error messages
    • Contribute
      • Core database
        • Contribution checklist
        • Build the source
        • Configure a CLion project
        • Run the tests
        • Coding style
  • YUGABYTE PLATFORM
    • Overview
      • Install
      • Configure
    • Install Yugabyte Platform
      • Prerequisites
      • Prepare the environment
      • Install software
      • Prepare nodes (on-prem)
      • Uninstall software
    • Configure Yugabyte Platform
      • Create admin user
      • Configure the cloud provider
      • Configure the backup target
      • Configure alerts
    • Create deployments
      • Multi-zone universe
      • Multi-region universe
      • Multi-cloud universe
      • Read replica cluster
      • Asynchronous replication
    • Manage deployments
      • Start and stop processes
      • Add a node
      • Eliminate an unresponsive node
      • Enable high availability
      • Edit configuration flags
      • Edit a universe
      • Delete a universe
      • Configure instance tags
      • Upgrade YugabyteDB software
      • Migrate to Helm 3
    • Back up universes
      • Configure backup storage
      • Back up universe data
      • Restore universe data
      • Schedule data backups
    • Security
      • Security checklist
      • Customize ports
      • LDAP authentication
      • Authorization platform
      • Create a KMS configuration
      • Enable encryption at rest
      • Enable encryption in transit (TLS)
      • Network security
    • Alerts and monitoring
      • Alerts
      • Live Queries dashboard
      • Slow Queries dashboard
    • Troubleshoot
      • Install and upgrade issues
      • Universe issues
    • Administer Yugabyte Platform
      • Back Up Yugabyte Platform
      • Authenticate with LDAP
    • Upgrade Yugabyte Platform
      • Upgrade using Replicated
  • YUGABYTE CLOUD
    • Overview
    • Quick start
      • Create a free cluster
      • Connect to the cluster
      • Create a database
      • Explore distributed SQL
      • Build an application
        • Before you begin
        • Java
        • Go
        • Python
        • Node.js
        • C
        • C++
        • C#
        • Ruby
        • Rust
        • PHP
    • Deploy clusters
      • Planning a cluster
      • Create a free cluster
      • Create a standard cluster
      • VPC network
        • Overview
        • Set up a VPC network
        • VPCs
        • Peering Connections
    • Secure clusters
      • IP allow lists
      • Database authorization
      • Add database users
      • Encryption in transit
      • Audit cloud activity
    • Connect to clusters
      • Cloud Shell
      • Client shell
      • Connect applications
    • Alerts and monitoring
      • Alerts
      • Performance metrics
      • Live queries
      • Slow YSQL queries
      • Cluster activity
    • Manage clusters
      • Backup and restore
      • Scale and configure clusters
      • Create extensions
    • Administer Yugabyte Cloud
      • Manage cloud users
      • Manage billing
      • Cluster costs
    • Example applications
      • Connect a Spring application
      • Connect a YCQL Java application
      • Hasura Cloud
      • Deploy a GraphQL application
    • Security architecture
      • Security architecture
      • Shared responsibility model
    • Troubleshoot
    • Yugabyte Cloud FAQ
    • What's new
  • INTEGRATIONS
    • Apache Kafka
    • Apache Spark
    • JanusGraph
    • KairosDB
    • Presto
    • Metabase
    • WSO2 Identity Server
    • YSQL Loader
    • Yugabyte JDBC Driver
    • Prisma
    • Hasura
      • Application Development
      • Benchmarking
    • Spring Framework
      • Spring Data YugabyteDB
      • Spring Data Cassandra
    • Flyway
    • GORM
    • Liquibase
    • Sequelize
    • SQLAlchemy
    • Entity Framework
    • Django REST framework
  • REFERENCE
    • Architecture
      • Design goals
      • Key concepts
        • Universe
        • YB-TServer Service
        • YB-Master Service
      • Core functions
        • Universe creation
        • Table creation
        • Write IO path
        • Read IO path
        • High availability
      • Layered architecture
      • Query layer
        • Overview
      • DocDB transactions layer
        • Transactions overview
        • Transaction isolation levels
        • Explicit locking
        • Read Committed
        • Single-row transactions
        • Distributed transactions
        • Transactional IO path
      • DocDB sharding layer
        • Hash & range sharding
        • Tablet splitting
        • Colocated tables
      • DocDB replication layer
        • Replication
        • xCluster replication
        • Read replicas
        • Change data capture (CDC)
      • DocDB storage layer
        • Persistence
        • Performance
    • APIs
      • YSQL
        • The SQL language
          • SQL statements
            • ABORT
            • ALTER DATABASE
            • ALTER DEFAULT PRIVILEGES
            • ALTER DOMAIN
            • ALTER GROUP
            • ALTER POLICY
            • ALTER ROLE
            • ALTER SEQUENCE
            • ALTER TABLE
            • ALTER USER
            • ANALYZE
            • BEGIN
            • CALL
            • COMMENT
            • COMMIT
            • COPY
            • CREATE AGGREGATE
            • CREATE CAST
            • CREATE DATABASE
            • CREATE DOMAIN
            • CREATE EXTENSION
            • CREATE FUNCTION
            • CREATE GROUP
            • CREATE INDEX
            • CREATE MATERIALIZED VIEW
            • CREATE OPERATOR
            • CREATE OPERATOR CLASS
            • CREATE POLICY
            • CREATE PROCEDURE
            • CREATE ROLE
            • CREATE RULE
            • CREATE SCHEMA
            • CREATE SEQUENCE
            • CREATE TABLE
            • CREATE TABLE AS
            • CREATE TRIGGER
            • CREATE TYPE
            • CREATE USER
            • CREATE VIEW
            • DEALLOCATE
            • DELETE
            • DO
            • DROP AGGREGATE
            • DROP CAST
            • DROP DATABASE
            • DROP DOMAIN
            • DROP EXTENSION
            • DROP FUNCTION
            • DROP GROUP
            • DROP MATERIALIZED VIEW
            • DROP OPERATOR
            • DROP OPERATOR CLASS
            • DROP OWNED
            • DROP POLICY
            • DROP PROCEDURE
            • DROP ROLE
            • DROP RULE
            • DROP SEQUENCE
            • DROP TABLE
            • DROP TRIGGER
            • DROP TYPE
            • DROP USER
            • END
            • EXECUTE
            • EXPLAIN
            • GRANT
            • INSERT
            • LOCK
            • PREPARE
            • REASSIGN OWNED
            • REFRESH MATERIALIZED VIEW
            • RELEASE SAVEPOINT
            • RESET
            • REVOKE
            • ROLLBACK
            • ROLLBACK TO SAVEPOINT
            • SAVEPOINT
            • SELECT
            • SET
            • SET CONSTRAINTS
            • SET ROLE
            • SET SESSION AUTHORIZATION
            • SET TRANSACTION
            • SHOW
            • SHOW TRANSACTION
            • TRUNCATE
            • UPDATE
            • VALUES
          • WITH clause
            • WITH clause—SQL syntax and semantics
            • recursive CTE
            • case study—traversing an employee hierarchy
            • traversing general graphs
              • graph representation
              • common code
              • undirected cyclic graph
              • directed cyclic graph
              • directed acyclic graph
              • rooted tree
              • Unique containing paths
              • Stress testing find_paths()
            • case study—Bacon Numbers from IMDb
              • Bacon numbers for synthetic data
              • Bacon numbers for IMDb data
        • Data types
          • Array
            • array[] constructor
            • Literals
              • Text typecasting and literals
              • Array of primitive values
              • Row
              • Array of rows
            • FOREACH loop (PL/pgSQL)
            • array of DOMAINs
            • Functions and operators
              • ANY and ALL
              • Array comparison
              • Array slice operator
              • Array concatenation
              • Array properties
              • array_agg(), unnest(), generate_subscripts()
              • array_fill()
              • array_position(), array_positions()
              • array_remove()
              • array_replace() / set value
              • array_to_string()
              • string_to_array()
          • Binary
          • Boolean
          • Character
          • Date and time
            • Conceptual background
            • Timezones and UTC offsets
              • Catalog views
              • Extended_timezone_names
                • Unrestricted full projection
                • Real timezones with DST
                • Real timezones no DST
                • Synthetic timezones no DST
              • Offset/timezone-sensitive operations
                • Timestamptz to/from timestamp conversion
                • Pure 'day' interval arithmetic
              • Four ways to specify offset
                • Name-resolution rules
                  • 1 case-insensitive resolution
                  • 2 ~names.abbrev never searched
                  • 3 'set timezone' string not resolved in ~abbrevs.abbrev
                  • 4 ~abbrevs.abbrev before ~names.name
                  • Helper functions
              • Syntax contexts for offset
              • Recommended practice
            • Typecasting between date-time and text-values
            • Semantics of the date-time data types
              • Date data type
              • Time data type
              • Plain timestamp and timestamptz
              • Interval data type
                • Interval representation
                  • Ad hoc examples
                  • Representation model
                • Interval value limits
                • Declaring intervals
                • Justify() and extract(epoch...)
                • Interval arithmetic
                  • Interval-interval comparison
                  • Interval-interval addition and subtraction
                  • Interval-number multiplication
                  • Moment-moment overloads of "-"
                  • Moment-interval overloads of "+" and "-"
                • Custom interval domains
                • Interval utility functions
            • Typecasting between date-time datatypes
            • Operators
              • Test comparison overloads
              • Test addition overloads
              • Test subtraction overloads
              • Test multiplication overloads
              • Test division overloads
            • General-purpose functions
              • Creating date-time values
              • Manipulating date-time values
              • Current date-time moment
              • Delaying execution
              • Miscellaneous
                • Function age()
                • Function extract() | date_part()
                • Implementations that model the overlaps operator
            • Formatting functions
            • Case study—SQL stopwatch
            • Download & install the date-time utilities
            • ToC
          • JSON
            • JSON literals
            • Primitive and compound data types
            • Code example conventions
            • Indexes and check constraints
            • Functions & operators
              • ::jsonb, ::json, ::text (typecast)
              • ->, ->>, #>, #>> (JSON subvalues)
              • - and #- (remove)
              • || (concatenation)
              • = (equality)
              • @> and <@ (containment)
              • ? and ?| and ?& (key or value existence)
              • array_to_json()
              • jsonb_agg()
              • jsonb_array_elements()
              • jsonb_array_elements_text()
              • jsonb_array_length()
              • jsonb_build_object()
              • jsonb_build_array()
              • jsonb_each()
              • jsonb_each_text()
              • jsonb_extract_path()
              • jsonb_extract_path_text() and json_extract_path_text()
              • jsonb_object()
              • jsonb_object_agg()
              • jsonb_object_keys()
              • jsonb_populate_record()
              • jsonb_populate_recordset()
              • jsonb_pretty()
              • jsonb_set() and jsonb_insert()
              • jsonb_strip_nulls()
              • jsonb_to_record()
              • jsonb_to_recordset()
              • jsonb_typeof()
              • row_to_json()
              • to_jsonb()
          • Money
          • Numeric
          • Range
          • Serial
          • UUID
        • Functions and operators
          • Aggregate functions
            • Informal functionality overview
            • Invocation syntax and semantics
            • grouping sets, rollup, cube
            • Per function signature and purpose
              • avg(), count(), max(), min(), sum()
              • array_agg(), string_agg(), jsonb_agg(), jsonb_object_agg()
              • bit_and(), bit_or(), bool_and(), bool_or()
              • variance(), var_pop(), var_samp(), stddev(), stddev_pop(), stddev_samp()
              • linear regression
                • covar_pop(), covar_samp(), corr()
                • regr_%()
              • mode(), percentile_disc(), percentile_cont()
              • rank(), dense_rank(), percent_rank(), cume_dist()
            • case study—percentile_cont() and the "68–95–99.7" rule
            • case study—linear regression on COVID data
              • Download the COVIDcast data
              • Ingest the COVIDcast data
                • Inspect the COVIDcast data
                • Copy the .csv files to staging tables
                • Check staged data conforms to the rules
                • Join the staged data into a single table
                • SQL scripts
                  • Create cr_staging_tables()
                  • Create cr_copy_from_scripts()
                  • Create assert_assumptions_ok()
                  • Create xform_to_covidcast_fb_survey_results()
                  • ingest-the-data.sql
              • Analyze the COVIDcast data
                • symptoms vs mask-wearing by day
                • Data for scatter-plot for 21-Oct-2020
                • Scatter-plot for 21-Oct-2020
                • SQL scripts
                  • analysis-queries.sql
                  • synthetic-data.sql
          • currval()
          • lastval()
          • nextval()
          • Window functions
            • Informal functionality overview
            • Invocation syntax and semantics
            • Per function signature and purpose
              • row_number(), rank() and dense_rank()
              • percent_rank(), cume_dist() and ntile()
              • first_value(), nth_value(), last_value()
              • lag(), lead()
              • Tables for the code examples
                • table t1
                • table t2
                • table t3
                • table t4
            • case study—analyzing a normal distribution
              • Bucket allocation scheme
              • do_clean_start.sql
              • cr_show_t4.sql
              • cr_dp_views.sql
              • cr_int_views.sql
              • cr_pr_cd_equality_report.sql
              • cr_bucket_using_width_bucket.sql
              • cr_bucket_dedicated_code.sql
              • do_assert_bucket_ok
              • cr_histogram.sql
              • cr_do_ntile.sql
              • cr_do_percent_rank.sql
              • cr_do_cume_dist.sql
              • do_populate_results.sql
              • do_report_results.sql
              • do_compare_dp_results.sql
              • do_demo.sql
              • Reports
                • Histogram report
                • dp-results
                • compare-dp-results
                • int-results
          • yb_hash_code()
        • Extensions
        • Keywords
        • Reserved names
      • YCQL
        • ALTER KEYSPACE
        • ALTER ROLE
        • ALTER TABLE
        • CREATE INDEX
        • CREATE KEYSPACE
        • CREATE ROLE
        • CREATE TABLE
        • CREATE TYPE
        • DROP INDEX
        • DROP KEYSPACE
        • DROP ROLE
        • DROP TABLE
        • DROP TYPE
        • GRANT PERMISSION
        • GRANT ROLE
        • REVOKE PERMISSION
        • REVOKE ROLE
        • USE
        • INSERT
        • SELECT
        • EXPLAIN
        • UPDATE
        • DELETE
        • TRANSACTION
        • TRUNCATE
        • Simple expressions
        • Subscripted expressions
        • Function call
        • Operators
        • BLOB
        • BOOLEAN
        • Collection
        • FROZEN
        • INET
        • Integer and counter
        • Non-integer
        • TEXT
        • DATE, TIME, and TIMESTAMP
        • UUID and TIMEUUID
        • JSONB
        • Date and time
        • BATCH
    • CLIs
      • yb-ctl
      • yb-docker-ctl
      • ysqlsh
      • ycqlsh
      • yb-admin
      • yb-ts-cli
      • ysql_dump
      • ysql_dumpall
    • Configuration
      • yb-tserver
      • yb-master
      • yugabyted
      • Default ports
    • Drivers
      • Client drivers for YSQL
      • Client drivers for YCQL
    • Connectors
      • Kafka Connect YugabyteDB
    • Third party tools
      • Arctype
      • DBeaver
      • DbSchema
      • pgAdmin
      • SQL Workbench/J
      • TablePlus
      • Visual Studio Code
    • Sample datasets
      • Chinook
      • Northwind
      • PgExercises
      • SportsDB
      • Retail Analytics
  • RELEASES
    • Releases overview
      • v2.13 series (latest)
      • v2.12 series (stable)
      • v2.11 series
      • v2.9 series
      • v2.8 series
      • v2.7 series
      • v2.6 series
      • v2.5 series
      • v2.4 series
      • v2.3 series
      • v2.2 series
      • v2.1 series
      • v2.0 series
      • v1.3 series
      • v1.2 series
    • Release versioning
  • FAQ
    • Comparisons
      • Amazon Aurora
      • Google Cloud Spanner
      • CockroachDB
      • TiDB
      • Vitess
      • MongoDB
      • FoundationDB
      • Amazon DynamoDB
      • Azure Cosmos DB
      • Apache Cassandra
      • PostgreSQL
      • Redis in-memory store
      • Apache HBase
    • General FAQ
    • Operations FAQ
    • API compatibility FAQ
    • Yugabyte Platform FAQ
  • MISC
    • YEDIS
      • Quick start
      • Develop
        • Build an application
        • C#
        • C++
        • Go
        • Java
        • NodeJS
        • Python
      • API reference
        • APPEND
        • AUTH
        • CONFIG
        • CREATEDB
        • DELETEDB
        • LISTDB
        • SELECT
        • DEL
        • ECHO
        • EXISTS
        • EXPIRE
        • EXPIREAT
        • FLUSHALL
        • FLUSHDB
        • GET
        • GETRANGE
        • GETSET
        • HDEL
        • HEXISTS
        • HGET
        • HGETALL
        • HINCRBY
        • HKEYS
        • HLEN
        • HMGET
        • HMSET
        • HSET
        • HSTRLEN
        • HVALS
        • INCR
        • INCRBY
        • KEYS
        • MONITOR
        • PEXPIRE
        • PEXPIREAT
        • PTTL
        • ROLE
        • SADD
        • SCARD
        • RENAME
        • SET
        • SETEX
        • PSETEX
        • SETRANGE
        • SISMEMBER
        • SMEMBERS
        • SREM
        • STRLEN
        • ZRANGE
        • TSADD
        • TSCARD
        • TSGET
        • TSLASTN
        • TSRANGEBYTIME
        • TSREM
        • TSREVRANGEBYTIME
        • TTL
        • ZADD
        • ZCARD
        • ZRANGEBYSCORE
        • ZREM
        • ZREVRANGE
        • ZSCORE
        • PUBSUB
        • PUBLISH
        • SUBSCRIBE
        • UNSUBSCRIBE
        • PSUBSCRIBE
        • PUNSUBSCRIBE
    • Legal
      • Third party software
> Explore > Indexes and Constraints >

Generalized inverted indexes

Report a doc issue Suggest new content
  • Compatible types
  • Grammar
    • CREATE INDEX
    • DELETE, INSERT, UPDATE
    • SELECT
  • Changes from PostgreSQL
  • Examples
    • Timing
    • Using opclass jsonb_path_ops
    • Presplitting
  • Roadmap
  • Limitations
    • Unsupported queries
  • YSQL

In YugabyteDB, tables and secondary indexes are both key-value stores internally. Loosely speaking:

  • A table's internal key-value store maps primary keys to the remaining columns.
  • A secondary index's internal key-value store maps index keys to primary keys.

Regular indexes index columns. This makes queries with conditions on the columns more efficient. For example, if you had a regular index on a single int array column (currently not possible in YSQL), queries like WHERE myintarray = '{1,3,6}' would be more efficient when using the index. However, queries like WHERE myintarray @> '{3}' (meaning "is 3 an element of myintarray?") would not benefit from the regular index.

Generalized inverted indexes (GIN indexes) index elements inside container columns. This makes queries with conditions on elements inside the columns more efficient. The above example would benefit from a GIN index since we can look up the key 3 in the gin index.

Compatible types

GIN indexes can only be created over a few types:

  • A GIN index on a tsvector column indexes text elements
  • A GIN index on an array column indexes array elements
  • A GIN index on a jsonb column indexes keys/values

You can use extensions to support more types. However, extension support is still in progress:

  • btree_gin: unsupported
  • hstore: supported
  • intarray: unsupported
  • pg_trgm: supported

Grammar

CREATE INDEX

Create the index using USING ybgin to specify the index access method:

CREATE INDEX ON mytable USING ybgin (mycol);

The gin access method is reserved for temporary relations while ybgin is for Yugabyte-backed relations. You can still specify USING gin, and, if mytable is not a temporary table, it will be automatically substituted for ybgin.

GIN indexes can't be unique, so CREATE UNIQUE INDEX is not allowed.

DELETE, INSERT, UPDATE

Writes are fully supported.

SELECT

Only certain SELECTs use the GIN index.

Changes from PostgreSQL

YugabyteDB GIN indexes are somewhat different from PostgreSQL GIN indexes:

  • PostgreSQL uses bitmap index scan, while YugabyteDB uses index scan.
  • In YugabyteDB, deletes to the index are written explicitly. This is due to storage-layer architecture differences and is also true for regular indexes.
  • YugabyteDB doesn't support fast update, as it isn't practical for a distributed, log-structured database.
  • YugabyteDB doesn't yet support fuzzy search limit.

Examples

  1. To begin, set up a YugabyteDB cluster. For instance, using yugabyted,

    ./bin/yugabyted start --base_dir /tmp/gindemo/1 --listen 127.0.0.201
    ./bin/yugabyted start --base_dir /tmp/gindemo/2 --listen 127.0.0.202 \
      --join 127.0.0.201
    ./bin/yugabyted start --base_dir /tmp/gindemo/3 --listen 127.0.0.203 \
      --join 127.0.0.201
    ./bin/ysqlsh --host 127.0.0.201
    
  2. Set up the tables, indexes, and data.

    CREATE TABLE vectors (v tsvector, k serial PRIMARY KEY);
    CREATE TABLE arrays (a int[], k serial PRIMARY KEY);
    CREATE TABLE jsonbs (j jsonb, k serial PRIMARY KEY);
    
    -- Use NONCONCURRENTLY since there is no risk of online ops.
    CREATE INDEX NONCONCURRENTLY ON vectors USING ybgin (v);
    CREATE INDEX NONCONCURRENTLY ON arrays USING ybgin (a);
    CREATE INDEX NONCONCURRENTLY ON jsonbs USING ybgin (j);
    
    INSERT INTO vectors VALUES
        (to_tsvector('simple', 'the quick brown fox')),
        (to_tsvector('simple', 'jumps over the')),
        (to_tsvector('simple', 'lazy dog'));
    -- Add some filler rows to make sequential scan more costly.
    INSERT INTO vectors SELECT to_tsvector('simple', 'filler') FROM generate_series(1, 1000);
    
    INSERT INTO arrays VALUES
        ('{1,1,6}'),
        ('{1,6,1}'),
        ('{2,3,6}'),
        ('{2,5,8}'),
        ('{null}'),
        ('{}'),
        (null);
    INSERT INTO arrays SELECT '{0}' FROM generate_series(1, 1000);
    
    INSERT INTO jsonbs VALUES
        ('{"a":{"number":5}}'),
        ('{"some":"body"}'),
        ('{"some":"one"}'),
        ('{"some":"thing"}'),
        ('{"some":["where","how"]}'),
        ('{"some":{"nested":"jsonb"}, "and":["another","element","not","a","number"]}');
    INSERT INTO jsonbs SELECT '"filler"' FROM generate_series(1, 1000);
    

Timing

Here are some examples to show the speed improvement of queries using GIN index. GIN indexes currently support IndexScan only, not IndexOnlyScan. The difference is that IndexScan uses the results of a scan to the index for filtering on the indexed table whereas an IndexOnlyScan need not go to the indexed table since the results from the index are sufficient. Therefore, a GIN index scan can be more costly than a sequential scan straight to the main table if the index scan does not filter out many rows. Since cost estimates currently aren't very accurate, the more costly index scan may be chosen in some cases.

The assumption in the following examples is that the user is using the GIN index in ways that take advantage of it.

  1. First, enable timing for future queries.

    \timing on
    
  2. Test GIN index on tsvector:

    SET enable_indexscan = off;
    SELECT * FROM vectors WHERE v @@ to_tsquery('simple', 'the');
    -- Run it once more to reduce cache bias.
    SELECT * FROM vectors WHERE v @@ to_tsquery('simple', 'the');
    
                      v                  | k
    -------------------------------------+---
    'brown':3 'fox':4 'quick':2 'the':1 | 1
    'jumps':1 'over':2 'the':3          | 2
    (2 rows)
    
    Time: 11.141 ms
    
    SET enable_indexscan = on;
    SELECT * FROM vectors WHERE v @@ to_tsquery('simple', 'the');
    SELECT * FROM vectors WHERE v @@ to_tsquery('simple', 'the');
    
    ...
    Time: 2.838 ms
    


    Notice the over 3x timing improvement when using GIN index. This is on a relatively small table: a little over 1000 rows. With more and/or bigger rows, the timing improvement should get better.

  3. Next, try on an int array:

    SET enable_indexscan = off;
    SELECT * FROM arrays WHERE a @> '{6}';
    SELECT * FROM arrays WHERE a @> '{6}';
    
        a    | k
    ---------+---
    {1,1,6} | 1
    {1,6,1} | 2
    {2,3,6} | 3
    (3 rows)
    
    Time: 9.501 ms
    
    SET enable_indexscan = on;
    SELECT * FROM arrays WHERE a @> '{6}';
    SELECT * FROM arrays WHERE a @> '{6}';
    
    ...
    Time: 2.989 ms
    
  4. Finally, try with jsonb:

    SET enable_indexscan = off;
    SELECT * FROM jsonbs WHERE j ? 'some';
    SELECT * FROM jsonbs WHERE j ? 'some';
    
                                            j                                          | k
    ------------------------------------------------------------------------------------+---
    {"some": ["where", "how"]}                                                         | 5
    {"and": ["another", "element", "not", "a", "number"], "some": {"nested": "jsonb"}} | 6
    {"some": "thing"}                                                                  | 4
    {"some": "body"}                                                                   | 2
    {"some": "one"}                                                                    | 3
    (5 rows)
    
    Time: 13.451 ms
    
    SET enable_indexscan = on;
    SELECT * FROM jsonbs WHERE j ? 'some';
    SELECT * FROM jsonbs WHERE j ? 'some';
    
    ...
    Time: 2.115 ms
    

Using opclass jsonb_path_ops

By default, jsonb GIN indexes use the opclass jsonb_ops. There is another opclass jsonb_path_ops that can be used instead.

The difference is the way they extract elements out of a jsonb. jsonb_ops extracts keys and values and encodes them as <flag_byte><value>. For example, '{"abc":[123,true]}' maps to three GIN keys: \001abc, \004123, \003t. The flag bytes here indicate the types key, numeric, and boolean, respectively.

On the other hand, jsonb_path_ops extracts hashed paths. Using the above example, there are two paths: "abc" -> 123 and "abc" -> true. Then, there are two GIN keys based on those paths using an internal hashing mechanism: -1570777299, -1227915239.

jsonb_path_ops is better suited for queries involving paths, such as the jsonb @> jsonb operator. However, it doesn't support as many operators as jsonb_ops. If write performance and storage aren't an issue, it may be worth creating a GIN index of each jsonb opclass so that reads can choose the faster one.

Presplitting

By default, ybgin indexes use a single range-partitioned tablet. Like regular tables and indexes, it is possible to presplit a ybgin index to multiple tablets at specified split points. These split points are for the index, so they need to be represented in the index key format. This is simple for tsvector and array types, but it gets complicated for jsonb and text (pg_trgm). jsonb_path_ops especially should use hash partitioning since the index key is itself a hash, but hash partitioning ybgin indexes is currently unsupported.

CREATE INDEX NONCONCURRENTLY vectors_split_idx ON vectors USING ybgin (v) SPLIT AT VALUES (('j'), ('o'));
CREATE INDEX NONCONCURRENTLY arrays_split_idx ON arrays USING ybgin (a) SPLIT AT VALUES ((2), (3), (5));
CREATE INDEX NONCONCURRENTLY jsonbs_split_idx1 ON jsonbs USING ybgin (j) SPLIT AT VALUES ((E'\001some'), (E'\005jsonb'));
CREATE INDEX NONCONCURRENTLY jsonbs_split_idx2 ON jsonbs USING ybgin (j jsonb_path_ops) SPLIT AT VALUES ((-1000000000), (0), (1000000000));

Let's focus on just one index for the remainder of this example: jsonbs_split_idx1.

First, check how the index is partitioned.

bin/yb-admin \
  --master_addresses 127.0.0.201,127.0.0.202,127.0.0.203 \
  list_tablets ysql.yugabyte jsonbs_split_idx1
Tablet-UUID                       Range                                                                               Leader-IP         Leader-UUID
43b2a0f0dac44018b60eebeee489e391  partition_key_start: "" partition_key_end: "S\001some\000\000!"                     127.0.0.201:9100  2702ace451fe46bd81dd2a19ea539163
c32e1066cefb449cb191ff23d626125f  partition_key_start: "S\001some\000\000!" partition_key_end: "S\005jsonb\000\000!"  127.0.0.203:9100  3a80acb8df5d45e38b388ffdc17a59e0
ba23b657eb5b4bc891ca794bcad06db7  partition_key_start: "S\005jsonb\000\000!" partition_key_end: ""                    127.0.0.202:9100  e24423119e734860bb0c3516df948b5c

Then, check the data in each partition. Flush it to SST files so that we can read them with sst_dump. Ignore lines with "filler" because there are too many of them. !! refers to the previous list_tablets command. Adjust it if it doesn't work in your shell.

bin/yb-admin \
  --master_addresses 127.0.0.201,127.0.0.202,127.0.0.203 \
  flush_table ysql.yugabyte jsonbs_split_idx1
while read -r tablet_id; do
  bin/sst_dump \
    --command=scan \
    --output_format=decoded_regulardb \
    --file=$(find /tmp/gindemo/1/data/yb-data/tserver/data \
               -name tablet-"$tablet_id") \
  | grep -v filler
done <<(!! \
        | tail -n +2 \
        | awk '{print$1}')
from [] to []
Process /tmp/gindemo/1/data/yb-data/tserver/data/rocksdb/table-000033c000003000800000000000401d/tablet-43b2a0f0dac44018b60eebeee489e391/000010.sst
Sst file format: block-based
SubDocKey(DocKey([], ["\x01a", EncodedSubDocKey(DocKey(0x1210, [1], []), [])]), [SystemColumnId(0); HT{ physical: 1636678107997627 w: 73 }]) -> null; intent doc ht: HT{ physical: 1636678107937571 w: 73 }
SubDocKey(DocKey([], ["\x01a", EncodedSubDocKey(DocKey(0x4e58, [6], []), [])]), [SystemColumnId(0); HT{ physical: 1636678107997627 w: 315 }]) -> null; intent doc ht: HT{ physical: 1636678107947022 w: 142 }
SubDocKey(DocKey([], ["\x01and", EncodedSubDocKey(DocKey(0x4e58, [6], []), [])]), [SystemColumnId(0); HT{ physical: 1636678107997627 w: 316 }]) -> null; intent doc ht: HT{ physical: 1636678107947022 w: 143 }
SubDocKey(DocKey([], ["\x01another", EncodedSubDocKey(DocKey(0x4e58, [6], []), [])]), [SystemColumnId(0); HT{ physical: 1636678107997627 w: 317 }]) -> null; intent doc ht: HT{ physical: 1636678107947022 w: 144 }
SubDocKey(DocKey([], ["\x01element", EncodedSubDocKey(DocKey(0x4e58, [6], []), [])]), [SystemColumnId(0); HT{ physical: 1636678107997627 w: 318 }]) -> null; intent doc ht: HT{ physical: 1636678107947022 w: 145 }
SubDocKey(DocKey([], ["\x01how", EncodedSubDocKey(DocKey(0x0a73, [5], []), [])]), [SystemColumnId(0); HT{ physical: 1636678107997627 w: 46 }]) -> null; intent doc ht: HT{ physical: 1636678107937571 w: 46 }
SubDocKey(DocKey([], ["\x01nested", EncodedSubDocKey(DocKey(0x4e58, [6], []), [])]), [SystemColumnId(0); HT{ physical: 1636678107997627 w: 319 }]) -> null; intent doc ht: HT{ physical: 1636678107947022 w: 146 }
SubDocKey(DocKey([], ["\x01not", EncodedSubDocKey(DocKey(0x4e58, [6], []), [])]), [SystemColumnId(0); HT{ physical: 1636678107997627 w: 320 }]) -> null; intent doc ht: HT{ physical: 1636678107947022 w: 147 }
SubDocKey(DocKey([], ["\x01number", EncodedSubDocKey(DocKey(0x1210, [1], []), [])]), [SystemColumnId(0); HT{ physical: 1636678107997627 w: 74 }]) -> null; intent doc ht: HT{ physical: 1636678107937571 w: 74 }
SubDocKey(DocKey([], ["\x01number", EncodedSubDocKey(DocKey(0x4e58, [6], []), [])]), [SystemColumnId(0); HT{ physical: 1636678107997627 w: 321 }]) -> null; intent doc ht: HT{ physical: 1636678107947022 w: 148 }
from [] to []
Process /tmp/gindemo/1/data/yb-data/tserver/data/rocksdb/table-000033c000003000800000000000401d/tablet-c32e1066cefb449cb191ff23d626125f/000010.sst
Sst file format: block-based
SubDocKey(DocKey([], ["\x01some", EncodedSubDocKey(DocKey(0x0a73, [5], []), [])]), [SystemColumnId(0); HT{ physical: 1636678107997627 }]) -> null; intent doc ht: HT{ physical: 1636678107935594 }
SubDocKey(DocKey([], ["\x01some", EncodedSubDocKey(DocKey(0x4e58, [6], []), [])]), [SystemColumnId(0); HT{ physical: 1636678107997627 w: 3 }]) -> null; intent doc ht: HT{ physical: 1636678107944604 }
SubDocKey(DocKey([], ["\x01some", EncodedSubDocKey(DocKey(0x9eaf, [4], []), [])]), [SystemColumnId(0); HT{ physical: 1636678107997627 w: 4 }]) -> null; intent doc ht: HT{ physical: 1636678107961642 }
SubDocKey(DocKey([], ["\x01some", EncodedSubDocKey(DocKey(0xc0c4, [2], []), [])]), [SystemColumnId(0); HT{ physical: 1636678107997627 w: 5 }]) -> null; intent doc ht: HT{ physical: 1636678107973196 }
SubDocKey(DocKey([], ["\x01some", EncodedSubDocKey(DocKey(0xfca0, [3], []), [])]), [SystemColumnId(0); HT{ physical: 1636678107997627 w: 7 }]) -> null; intent doc ht: HT{ physical: 1636678107973196 w: 2 }
SubDocKey(DocKey([], ["\x01where", EncodedSubDocKey(DocKey(0x0a73, [5], []), [])]), [SystemColumnId(0); HT{ physical: 1636678107997627 w: 1 }]) -> null; intent doc ht: HT{ physical: 1636678107935594 w: 1 }
SubDocKey(DocKey([], ["\x045", EncodedSubDocKey(DocKey(0x1210, [1], []), [])]), [SystemColumnId(0); HT{ physical: 1636678107997627 w: 2 }]) -> null; intent doc ht: HT{ physical: 1636678107935594 w: 2 }
SubDocKey(DocKey([], ["\x05body", EncodedSubDocKey(DocKey(0xc0c4, [2], []), [])]), [SystemColumnId(0); HT{ physical: 1636678107997627 w: 6 }]) -> null; intent doc ht: HT{ physical: 1636678107973196 w: 1 }
from [] to []
Process /tmp/gindemo/1/data/yb-data/tserver/data/rocksdb/table-000033c000003000800000000000401d/tablet-ba23b657eb5b4bc891ca794bcad06db7/000010.sst
Sst file format: block-based
SubDocKey(DocKey([], ["\x05jsonb", EncodedSubDocKey(DocKey(0x4e58, [6], []), [])]), [SystemColumnId(0); HT{ physical: 1636678107997627 }]) -> null; intent doc ht: HT{ physical: 1636678107944677 }
SubDocKey(DocKey([], ["\x05one", EncodedSubDocKey(DocKey(0xfca0, [3], []), [])]), [SystemColumnId(0); HT{ physical: 1636678107997627 w: 2 }]) -> null; intent doc ht: HT{ physical: 1636678107974363 }
SubDocKey(DocKey([], ["\x05thing", EncodedSubDocKey(DocKey(0x9eaf, [4], []), [])]), [SystemColumnId(0); HT{ physical: 1636678107997627 w: 1 }]) -> null; intent doc ht: HT{ physical: 1636678107961628 }

Roadmap

GIN indexes are still in progress: see the GIN roadmap tracking GitHub issue for more details.

Limitations

GIN indexes are in active development, and currently have the following limitations:

  • Multi-column GIN indexes are not currently supported. (#10652)
  • You can't yet specify ASC, DESC, or HASH sort methods. The default is ASC for prefix match purposes, so this can be relaxed in the future. (#10653)
  • UPDATEs may be expensive, since they're currently implemented as DELETE + INSERT.
  • SELECT operations have the following limitations:
    • Scans with non-default search mode aren't currently supported.
    • Scans can't ask for more than one index key.
      For example, a request for all rows whose array contains elements 1 or 3 will fail, but one that asks for elements 1 and 3 can succeed by choosing one of the elements for index scan and rechecking the entire condition later.
      However, the choice between 1 and 3 is currently unoptimized, so 3 may be chosen even though 1 corresponds to less rows.
    • Recheck is always done rather than on a case-by-case basis, meaning there can be an unnecessary performance penalty.

If a query is unsupported, you can disable index scan to avoid an error message (SET enable_indexscan TO off;) before the query, and re-enable it (SET enable_indexscan TO on;) afterwards. In the near future, cost estimates should route such queries to sequential scan.

Unsupported queries

Sometimes, an unsupported query may be encountered by getting an ERROR. Here are some workarounds for some of these cases.

More than one required scan entry

Perhaps the most common issue would be "cannot use more than one required scan entry". It means that the GIN index scan internally tries to fetch more than one index key. Since this is currently not supported, it throws an ERROR.

RESET enable_indexscan;

SELECT * FROM vectors WHERE v @@ to_tsquery('simple', 'quick | lazy');
ERROR:  unsupported ybgin index scan
DETAIL:  ybgin index method cannot use more than one required scan entry: got 2.
Time: 2.885 ms

One way to get around this is to use OR outside the tsquery:

SELECT * FROM vectors WHERE v @@ to_tsquery('simple', 'quick')
                      OR v @@ to_tsquery('simple', 'lazy');
                  v                  | k
-------------------------------------+---
 'brown':3 'fox':4 'quick':2 'the':1 | 1
 'dog':2 'lazy':1                    | 3
(2 rows)

Time: 10.169 ms

However, this doesn't use the index:

EXPLAIN
SELECT * FROM vectors WHERE v @@ to_tsquery('simple', 'quick')
                      OR v @@ to_tsquery('simple', 'lazy');
                              QUERY PLAN
-----------------------------------------------------------------------
 Seq Scan on vectors  (cost=0.00..105.00 rows=1000 width=36)
   Filter: ((v @@ '''quick'''::tsquery) OR (v @@ '''lazy'''::tsquery))
(2 rows)

Time: 1.050 ms

Another way that does use the index is UNION:

EXPLAIN
SELECT * FROM vectors WHERE v @@ to_tsquery('simple', 'quick') UNION
SELECT * FROM vectors WHERE v @@ to_tsquery('simple', 'lazy');
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Unique  (cost=163.76..178.76 rows=2000 width=36)
   ->  Sort  (cost=163.76..168.76 rows=2000 width=36)
         Sort Key: vectors.v, vectors.k
         ->  Append  (cost=4.00..54.10 rows=2000 width=36)
               ->  Index Scan using vectors_v_idx on vectors  (cost=4.00..12.05 rows=1000 width=36)
                     Index Cond: (v @@ '''quick'''::tsquery)
               ->  Index Scan using vectors_v_idx on vectors vectors_1  (cost=4.00..12.05 rows=1000 width=36)
                     Index Cond: (v @@ '''lazy'''::tsquery)
(8 rows)

Time: 1.143 ms
SELECT * FROM vectors WHERE v @@ to_tsquery('simple', 'quick') UNION
SELECT * FROM vectors WHERE v @@ to_tsquery('simple', 'lazy');
                  v                  | k
-------------------------------------+---
 'dog':2 'lazy':1                    | 3
 'brown':3 'fox':4 'quick':2 'the':1 | 1
(2 rows)

Time: 5.559 ms

If performance doesn't matter, the universal fix is to disable index scan so that sequential scan is used. For sequential scan to be chosen, make sure that sequential scan is not also disabled.

SET enable_indexscan = off;
SELECT * FROM vectors WHERE v @@ to_tsquery('simple', 'quick | lazy');
                  v                  | k
-------------------------------------+---
 'brown':3 'fox':4 'quick':2 'the':1 | 1
 'dog':2 'lazy':1                    | 3
(2 rows)

Time: 11.188 ms

Notice that the modified query using the index is still 2x faster than the original query to the main table.

Non-default search mode

All search modes/strategies besides the default one are currently unsupported. Many of these are best off using a sequential scan. In fact, the query planner avoids many of these types of index scans by increasing the cost, leading to sequential scan being chosen as the better alternative. Nevertheless, here are some cases that hit the ERROR.

RESET enable_indexscan;
\timing off

SELECT * FROM arrays WHERE a = '{}';
ERROR:  unsupported ybgin index scan
DETAIL:  ybgin index method does not support non-default search mode: include-empty.
SELECT * FROM arrays WHERE a <@ '{6,1,1,null}';
ERROR:  unsupported ybgin index scan
DETAIL:  ybgin index method does not support non-default search mode: include-empty.

There's currently no choice but to use a sequential scan on these.

  • Compatible types
  • Grammar
    • CREATE INDEX
    • DELETE, INSERT, UPDATE
    • SELECT
  • Changes from PostgreSQL
  • Examples
    • Timing
    • Using opclass jsonb_path_ops
    • Presplitting
  • Roadmap
  • Limitations
    • Unsupported queries
Ask our community
  • Slack
  • Github
  • Forum
  • StackOverflow
Yugabyte
Contact Us
Copyright © 2017-2022 Yugabyte, Inc. All rights reserved.