Within-group ordered-set aggregate functions

  • mode()
    • Simple example without GROUP BY
    • Example that uses GROUP BY
  • percentile_disc(), percentile_cont()
    • Basic example using the scalar overloads
    • Basic example using the scalar overloads with GROUP BY
    • Determining the median value of a set of values
    • Exhaustive example using the array overloads

This section describes the use of mode(), percentile_disc(), and percentile_cont() as "within-group ordered-set" aggregate functions.

The functionality of the mode() aggregate function differs from that of the closely related pair percentile_disc() and percentile_cont(). All three are described in the same section because they share the same invocation syntax:

select
  grouping_column_1, grouping_column_2,...
  the_function(function-specific arg list) within group (order by ordering_expression)
from t
group by grouping_column_1, grouping_column_2,...
order by grouping_column_1, grouping_column_2,...

The syntax gives this kind of aggregate function its name.

You might wonder why this particular syntax is required rather than the usual syntax that, for example, the max() and min() aggregate functions use. There is no underlying semantic reason; YugabyteDB simply inherits the syntax from PostgreSQL. Some other RDBMSs express the "mode()" functionality with the same syntax that YugabyteDB uses for max() and min(), though they might spell the name of the function differently. Don't worry about this difference: the within-group ordered-set syntax is sufficient to implement the required functionality.

mode()

Signature:

input value:       <none>
                   together with "WITHIN GROUP" and "ORDER BY anyelement" invocation syntax
return value:      anyelement

Purpose: Return the most frequent value of the "ordering expression". If there's more than one equally frequent value, then one of these is silently chosen arbitrarily.
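As an aside, this behavior is easy to model client-side. The sketch below is illustrative Python, not anything that ships with YugabyteDB: it counts the non-null values and returns one of the most frequent, using a deterministic smallest-value tie-break that the server itself does not promise.

```python
from collections import Counter

def mode_model(vals):
    # Count only the non-null values, as the aggregate ignores nulls.
    counts = Counter(v for v in vals if v is not None)
    best = max(counts.values())
    # Among equally frequent values, pick the smallest. This is one
    # arbitrary-but-deterministic choice; mode() makes no such promise.
    return min(v for v, n in counts.items() if n == best)

print(mode_model(['zebrastripe', 'catflap', 'zebrastripe', 'dogfood', None]))
# zebrastripe
```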

Simple example without GROUP BY

In this first contrived example, the "ordering expression" uses the lower() function and refers to two columns. It also evaluates to null for one row.

drop table if exists t cascade;
create table t(
  k     int   primary key,
  v1    text,
  v2    text);

insert into t(k, v1, v2) values
  (1, 'dog',   'food'),
  (2, 'cat',   'flap'),
  (3, null,    null),
  (4, 'zebra', 'stripe'),
  (5, 'frog',  'man'),
  (6, 'fish',  'soup'),
  (7, 'ZEB',   'RASTRIPE');

First, use count(*) ordinarily to determine how often each value produced by the chosen expression occurs.

\pset null <null>

select
  lower(v1||v2) as "ordering expr",
  count(*) as n
from t
group by lower(v1||v2)
order by n desc, lower(v1||v2) asc nulls last;

This is the result:

 ordering expr | n 
---------------+---
 zebrastripe   | 2
 catflap       | 1
 dogfood       | 1
 fishsoup      | 1
 frogman       | 1
 <null>        | 1

Some RDBMSs don't implement a "mode()" function but require you to use a query like this together with further syntax to pick out the top hit. Here is the YugabyteDB approach:

select
  mode() within group (order by lower(v1||v2)) as "most frequent ordering expr"
from t;

This is the result:

 most frequent ordering expr 
-----------------------------
 zebrastripe

Example that uses GROUP BY

In this second example, the table has a grouping column to show how the WITHIN GROUP ORDER BY syntax is used together with the GROUP BY clause. The data is contrived so that the value "v = 37" occurs twice for "class = 1" and so that the value "v = 42" occurs twice for "class = 2". Otherwise each distinct value of "v" occurs just once.

drop table if exists t cascade;
create table t(
  k     int  primary key,
  class int  not null,
  v     int  not null);

insert into t(k, class, v)
select
  s.v,
  1,
  case s.v between 5 and 6
    when true then 37
              else s.v
  end
from generate_series(1, 10) as s(v)
union all
select
  s.v,
  2,
  case s.v between 15 and 17
    when true then 42
              else s.v
  end
from generate_series(11, 20) as s(v);

The dense_rank() window function provides an effective way to inspect the result:

with
  a1 as (
  select class, v, count(*) as n
  from t
  group by class, v)

select
  class,
  n,
  v,
  dense_rank() over (partition by class order by n desc) as r
from a1
order by class, r;

This is the result:

 class | n | v  | r 
-------+---+----+---
     1 | 2 | 37 | 1
     1 | 1 |  2 | 2
     1 | 1 | 10 | 2
     1 | 1 |  1 | 2
     1 | 1 |  7 | 2
     1 | 1 |  9 | 2
     1 | 1 |  8 | 2
     1 | 1 |  4 | 2
     1 | 1 |  3 | 2
     
     2 | 3 | 42 | 1
     2 | 1 | 20 | 2
     2 | 1 | 13 | 2
     2 | 1 | 12 | 2
     2 | 1 | 18 | 2
     2 | 1 | 14 | 2
     2 | 1 | 19 | 2
     2 | 1 | 11 | 2

The blank line was added manually to improve readability.

It's easy, now, to elaborate the query with a second WITH clause view to pick out the row from each class whose dense rank is one—in other words the modal value. (If the data had more than one row with a dense rank of one, then they would all be picked out.)

with
  a1 as (
    select class, v, count(*) as n
    from t
    group by class, v),

  a2 as (
    select
      class,
      n,
      v,
      dense_rank() over (partition by class order by n desc) as r
    from a1)

select class, n, v
from a2
where r = 1
order by class;

This is the result:

 class | n | v  
-------+---+----
     1 | 2 | 37
     2 | 3 | 42

Here's how to use the mode() within-group ordered-set aggregate function to get the modal value in each class.

select
  class,
  mode() within group (order by v)
from t
group by class
order by class;

This is the result:

 class | mode 
-------+------
     1 |   37
     2 |   42

The query that gets just the modal value for each class using the mode() function is very much simpler than the one that gets both the modal value and its frequency of occurrence using the dense_rank() window function. You could wrap the basic use of mode() in a WITH clause construct whose other subqueries recover the frequency of occurrence too. But this would bring the complexity up to about the level of the query that uses dense_rank() to the same end.
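To make the comparison concrete, here is an illustrative Python sketch (the helper name is invented, and the data is a hypothetical miniature shaped like the example above, with 37 repeated in class 1 and 42 repeated in class 2) that recovers both the modal value and its frequency per class in one pass:

```python
from collections import Counter

def modal_with_frequency(rows):
    # rows: (class, v) pairs. Returns {class: (frequency, modal_value)},
    # i.e. both pieces of information that the dense_rank() query reports.
    by_class = {}
    for cls, v in rows:
        by_class.setdefault(cls, Counter())[v] += 1
    return {cls: (c.most_common(1)[0][1], c.most_common(1)[0][0])
            for cls, c in by_class.items()}

rows = [(1, 37), (1, 37), (1, 1), (1, 2),
        (2, 42), (2, 42), (2, 42), (2, 11)]
print(modal_with_frequency(rows))  # {1: (2, 37), 2: (3, 42)}
```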

percentile_disc(), percentile_cont()

These two aggregate functions are closely related. Briefly, they implement the inverse of the functionality that the percent_rank() window function implements. For this reason, they are sometimes referred to as “inverse distribution” functions.

Signature:

-- First overload.
input value:       double precision
                   together with "WITHIN GROUP" and "ORDER BY anyelement" invocation syntax
return value:      anyelement

-- Second overload.
input value:       double precision[]
                   together with "WITHIN GROUP" and "ORDER BY anyelement" invocation syntax
return value:      anyarray

Purpose: Each scalar overload of percentile_disc() and percentile_cont() takes a percentile rank value as input and returns the value, within the specified window, that would produce this. The window is specified by WITHIN GROUP in conjunction with the ORDER BY clause, and by the optional GROUP BY clause, in the subquery that invokes the aggregate function in its SELECT list.

  • percentile_disc() (the discrete variant) simply returns the first actual value (in the ordered set of candidate values in the window) that is equal to, or greater than, the value that would produce the specified percentile rank value.

  • percentile_cont() (the continuous variant) does just the same as percentile_disc() if there happens to be an exact match; otherwise it finds the immediately smaller value and the immediately bigger value and interpolates between them.

Each array overload takes an array of percentile rank values as input and returns the array of values that would produce these according to the rules for the discrete and continuous variants specified above.
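The two rules can be sketched outside SQL. This is a hedged Python illustration, assuming the PostgreSQL definitions: percentile_disc(f) returns the first ordered value whose cumulative position k/n equals or exceeds f, and percentile_cont(f) interpolates linearly at zero-based position f*(n - 1) in the ordered list.

```python
import math

# Sketch of the discrete rule: first value at 1-based index ceil(f*n).
def percentile_disc(sorted_vals, f):
    n = len(sorted_vals)
    idx = max(math.ceil(f * n) - 1, 0)   # convert to zero-based
    return sorted_vals[idx]

# Sketch of the continuous rule: interpolate at position f*(n - 1).
def percentile_cont(sorted_vals, f):
    pos = f * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + frac * (sorted_vals[hi] - sorted_vals[lo])

primes = [47, 53, 59, 61, 67, 71, 73, 83, 89, 97]
percentile_disc(primes, 0.61111111111112)   # 73: first value at or past f
percentile_cont(primes, 0.61111111111112)   # ≈ 72.0: between 71 and 73
```

Running these with the inputs used in the SQL examples below reproduces the same discrete (73.00) and continuous (72.00) answers.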

Basic example using the scalar overloads

First, create a table with a set of ten successive prime numbers starting from an arbitrary prime.

drop table if exists t cascade;
create table t(v double precision primary key);

insert into t(v)
values (47), (53), (59), (61), (67), (71), (73), (83), (89), (97);

Now, note what percent_rank() produces:

with a as (
  select
    v,
    percent_rank() over (order by v) as pr
  from t) 
select
  to_char(v,  '90.99')              as v,
  to_char(pr, '90.999999999999999') as pr
from a;

This is the result:

   v    |         pr         
--------+--------------------
  47.00 |   0.00000000000000
  53.00 |   0.11111111111111
  59.00 |   0.22222222222222
  61.00 |   0.33333333333333
  67.00 |   0.44444444444444
  71.00 |   0.55555555555556
  73.00 |   0.66666666666667
  83.00 |   0.77777777777778
  89.00 |   0.88888888888889
  97.00 |   1.00000000000000

Next, choose the value 0.66666666666667 from the "pr" column. Notice that this corresponds to the value 73.00 from the "v" column. Use the chosen "pr" value as input to the scalar overloads of percentile_disc() and percentile_cont().

with a as (
  select
    percentile_disc(0.66666666666667) within group (order by v) as p_disc,
    percentile_cont(0.66666666666667) within group (order by v) as p_cont
  from t)
select
  to_char(a.p_disc, '90.99') as pd,
  to_char(a.p_cont, '90.99') as pc
from a;

This is the result:

   pd   |   pc   
--------+--------
  73.00 |  73.00

Notice that the value 73.00 from the "v" column has been recovered—and that the discrete and continuous variants each produce the same result in this example.

Next, choose the value 0.61111111111112. This does not occur in the "pr" column but lies mid-way between 0.55555555555556 and 0.66666666666667, produced respectively by 71.00 and 73.00 in the "v" column.

with a as (
  select
    percentile_disc(0.61111111111112) within group (order by v) as p_disc,
    percentile_cont(0.61111111111112) within group (order by v) as p_cont
  from t)
select
  to_char(a.p_disc, '90.99') as pd,
  to_char(a.p_cont, '90.99') as pc
from a;

This is the result:

   pd   |   pc   
--------+--------
  73.00 |  72.00

Notice that the "pd" value, 73.00, is the first "v" value that is greater than 72.00—and that the "pc" value, 72.00, has been interpolated between 71.00 and 73.00.

Basic example using the scalar overloads with GROUP BY

Create a table "t2" with a "class" column for grouping and display its contents together with each row's percent_rank() value:

drop table if exists t2 cascade;
create table t2(
  class  int,
  v      double precision,
  constraint t2_pk primary key(class, v));

insert into t2(class, v) values
  (1, 47), (1, 53), (1, 59), (1, 61), (1, 67),
  (2, 47), (2, 53), (2, 59), (2, 61), (2, 67), (2, 71);

with a as (
  select
    class,
    v,
    percent_rank() over (partition by class order by v) as pr
  from t2)
select
                          class,
  to_char(v,  '90.99') as v,
  to_char(pr, '90.99') as pr
from a
order by class, v;

This is the result:

 class |   v    |   pr   
-------+--------+--------
     1 |  47.00 |   0.00
     1 |  53.00 |   0.25
     1 |  59.00 |   0.50
     1 |  61.00 |   0.75
     1 |  67.00 |   1.00
     2 |  47.00 |   0.00
     2 |  53.00 |   0.20
     2 |  59.00 |   0.40
     2 |  61.00 |   0.60
     2 |  67.00 |   0.80
     2 |  71.00 |   1.00

Notice that there are five rows for "class = 1" and six rows for "class = 2".

Next, choose the value 0.50 from the "pr" column. Notice that this corresponds to the value 59.00 from the "v" column when "class = 1" and to a value between 59.00 and 61.00 from the "v" column when "class = 2".

Use the chosen "pr" value as input to the scalar overloads of percentile_disc() and percentile_cont().

with a as (
  select
    class,
    percentile_disc(0.50) within group (order by v) as p_disc,
    percentile_cont(0.50) within group (order by v) as p_cont
  from t2
  group by class)
select
  a.class,
  to_char(a.p_disc, '90.99') as pd,
  to_char(a.p_cont, '90.99') as pc
from a
order by a.class;

This is the result:

 class |   pd   |   pc   
-------+--------+--------
     1 |  59.00 |  59.00
     2 |  59.00 |  60.00

percentile_disc() returned the exact "v" value when this was found and the immediately next larger value when it wasn't found. And percentile_cont() returned the exact "v" value when this was found and it interpolated between the immediately enclosing smaller and larger values when it wasn't found.
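The per-class interpolation can be cross-checked with a minimal Python sketch (illustration only; the percentile_cont helper implements the standard continuous rule of interpolating at zero-based position f*(n - 1) in the ordered list):

```python
# Sketch of percentile_cont(0.5) applied per class, mirroring the
# grouped query above: class 1 has five values, class 2 has six.
def percentile_cont(sorted_vals, f):
    pos = f * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + frac * (sorted_vals[hi] - sorted_vals[lo])

groups = {
    1: [47.0, 53.0, 59.0, 61.0, 67.0],
    2: [47.0, 53.0, 59.0, 61.0, 67.0, 71.0],
}
medians = {c: percentile_cont(vs, 0.5) for c, vs in groups.items()}
# class 1: 59.0 (the exact middle value); class 2: 60.0 (interpolated)
```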

Determining the median value of a set of values

See this Wikipedia entry on median.

The median of a set of values is defined, naïvely, as the value that divides the set into two same-sized subsets when the values are sorted in size order. Loosely stated, it's the "middle" value in the ordered set. The values on the smaller side are all less than, or equal to, the median value. And the values on the larger side are all greater than, or equal to, the median value. Edge cases (like, for example, when the set has an even number of unique values) are accommodated by a refinement of the naïve rule statement.

The advantage brought by using the median in describing data rather than using the mean (implemented by the avg() aggregate function) is that it is not skewed so much by a small proportion of extremely large or small values; it therefore probably gives a better idea of a "typical" value. For example, household income or asset ownership is usually characterized by the median value because just a few multi-billionaires can skew the mean.

Some SQL database systems have a built-in median() aggregate function. But PostgreSQL, and therefore YugabyteDB, do not. However, this is of no practical consequence because percentile_cont(0.5), by definition, returns the median value.

Here are some examples and counter-examples.

drop function  if exists  median_test_result()  cascade;
drop type      if exists  rt             cascade; 
drop table     if exists  t              cascade;

create table t(v double precision primary key);
insert into t(v)
select s.v from generate_series(11, 19) as s(v);

create type rt as (
  "count"             bigint,
  "avg"               text,
  "median"            text ,
  "count l.e. median" bigint,
  "count g.e. median" bigint);

create or replace function median_test_result()
  returns rt
  language sql
as $body$
with a as (
select
  avg(v)   as avg,
  percentile_cont(0.5) within group (order by v) as median
from t)
select
  (select count(*) from t),
  to_char(a.avg,    '90.99'),
  to_char(a.median, '90.99'),
  (select count(*) from t where v <= a.median),
  (select count(*) from t where v >= a.median)
from a;
$body$;

select * from median_test_result();

This is the result:

 count |  avg   | median | count l.e. median | count g.e. median 
-------+--------+--------+-------------------+-------------------
     9 |  15.00 |  15.00 |                 5 |                 5

Here, there is an odd number of unique input values and the median is among these values. Now insert one extra unique value and repeat the test:

insert into t(v) values(20);
select * from median_test_result();

This is the new result:

 count |  avg   | median | count l.e. median | count g.e. median 
-------+--------+--------+-------------------+-------------------
    10 |  15.50 |  15.50 |                 5 |                 5

Here, there is an even number of unique input values and the median is not among these values. Rather, it lies between the two existing values that are immediately less than, and immediately greater than, the median value.
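Python's statistics.median applies the same rule as percentile_cont(0.5), averaging the two middle values when the count is even, so it can serve as an independent cross-check of the two results above (this is an illustration, not part of the YSQL example):

```python
import statistics

# Odd count: the median is the middle value itself.
odd_vals = list(range(11, 20))        # 11..19, nine values
statistics.median(odd_vals)           # 15

# Even count: the median is interpolated between the two middle values.
even_vals = list(range(11, 21))       # 11..20, ten values
statistics.median(even_vals)          # 15.5
```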

In these first two examples, the median and the mean are the same. But here's an extreme edge case where, as promised, the median differs significantly from the mean because of the skewing effect of the single outlier value 1000.00:

drop table if exists t cascade;
create table t(k serial primary key, v double precision not null);
insert into t(v)
select 1 from generate_series(1, 3)
union all
select 2 from generate_series(1, 10)
union all
select 3 from generate_series(1, 5)
union all
select 1000;

select * from median_test_result();

select to_char(v, '9990.00') as v from t order by v;

Here is the result:

 count |  avg   | median | count l.e. median | count g.e. median 
-------+--------+--------+-------------------+-------------------
    19 |  54.63 |   2.00 |                13 |                16

This shows that the naïve rule statement is insufficiently precise and that the design of the median_test_result() function is too crude. Here are the contents of the table, where white space has been introduced manually:

    v     
----------
     1.00
     1.00
     1.00
     2.00
     2.00
     2.00
     2.00
     2.00
     2.00
     
     2.00
     
     2.00
     2.00
     2.00
     3.00
     3.00
     3.00
     3.00
     3.00
  1000.00

The number of rows (nine) on the lower side of the middle-most row with the median value, 2.00, is indeed the same as the number of rows on the higher side of it. But the number of rows with a value less than or equal to the median value (thirteen) is different from the number of rows with a value greater than or equal to the median value (sixteen).
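The same skew, and the same two different row counts, can be reproduced in a few lines of Python as a cross-check (illustration only):

```python
import statistics

# Reproduce the skewed data set: three 1s, ten 2s, five 3s, one outlier.
vals = [1.0] * 3 + [2.0] * 10 + [3.0] * 5 + [1000.0]

mean = sum(vals) / len(vals)                 # ≈ 54.63: dragged up by 1000
median = statistics.median(vals)             # 2.0: the 10th of 19 ordered values
low = sum(1 for v in vals if v <= median)    # 13 rows at or below the median
high = sum(1 for v in vals if v >= median)   # 16 rows at or above the median
```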

It isn't useful, here, to find unassailably precise wording to define the semantics of the median. The code illustrations are sufficient.

Exhaustive example using the array overloads

Recreate table "t" as it was defined and populated for the "Basic example using the scalar overloads".

drop table if exists t cascade;
create table t(v double precision primary key);

insert into t(v)
values (47), (53), (59), (61), (67), (71), (73), (83), (89), (97);

Now create a function that returns an array of all the values from the "pr" column for the demonstration of percent_rank() shown in that section.

create or replace function pr_vals()
  returns double precision[]
  language plpgsql
as $body$
declare
  a constant double precision[] := (
    with a as (
      select percent_rank() over (order by v) as v from t) 
    select array_agg(v order by v) from a);
begin
  return a;
end;
$body$;

Now define three relations in a WITH clause that correspond to "pr" values and the outputs of percentile_disc() and percentile_cont() from these inputs. Then inner-join these three relations. These comments are key:

  • Think of the original table "t(v double precision)" as an array "v[]".
  • Define the relation "i, percent_rank(v[i])".
  • Define the relation "i, percentile_disc(percent_rank(v[i]))".
  • Define the relation "i, percentile_cont(percent_rank(v[i]))".
  • Inner join them all using "i".
with
  -- Think of the original table "t(v double precision)" as an array "v[]".
  -- Define the relation "i, percent_rank(v[i])".
  pr_vals as (
    select
      g.i,
      (pr_vals())[g.i] as v
    from generate_subscripts(pr_vals(), 1) as g(i)),

  -- Define the relation "i, percentile_disc(percent_rank(v[i]))".
  pd_arr as (
    select
      percentile_disc(pr_vals()) within group (order by v) as v
    from t),
  pd_vals as (
    select
      g.i,
      (select v from pd_arr)[i] as v
    from generate_subscripts((select v from pd_arr), 1) as g(i)),

  -- Define the relation "i, percentile_cont(percent_rank(v[i]))".
  pc_arr as (
    select
      percentile_cont(pr_vals()) within group (order by v) as v
    from t),
  pc_vals as (
    select
      g.i,
      (select v from pc_arr)[i] as v
    from generate_subscripts((select v from pc_arr), 1) as g(i))

-- Inner join them all using "i".
select
  to_char(pr_vals.v, '90.999999999999999')  as pr,
  to_char(pd_vals.v, '90.99')               as "pd(pr)",
  to_char(pc_vals.v, '90.99')               as "pc(pr)"
from
  pr_vals
  inner join
  pd_vals using(i)
  inner join
  pc_vals using(i);

This is the result:

         pr         | pd(pr) | pc(pr) 
--------------------+--------+--------
   0.00000000000000 |  47.00 |  47.00
   0.11111111111111 |  53.00 |  53.00
   0.22222222222222 |  59.00 |  59.00
   0.33333333333333 |  61.00 |  61.00
   0.44444444444444 |  67.00 |  67.00
   0.55555555555556 |  71.00 |  71.00
   0.66666666666667 |  73.00 |  73.00
   0.77777777777778 |  83.00 |  83.00
   0.88888888888889 |  89.00 |  89.00
   1.00000000000000 |  97.00 |  97.00

Because the exact set of the ten "pr" values from the "pr" column for the demonstration of percent_rank() shown above is used as input for both the discrete and continuous array overloads, each produces the same result: the exact set of "v" values that produced the input set of "pr" values.
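The round trip can also be sketched in Python, assuming the standard definitions: percent_rank() assigns the fraction (k - 1)/(n - 1) to the kth ordered value, and feeding those fractions back through the continuous inverse recovers the original values (up to floating-point rounding).

```python
# Sketch of the round trip demonstrated above.
def percent_ranks(n):
    # The fraction (k - 1)/(n - 1) for each of the n ordered values.
    return [i / (n - 1) for i in range(n)]

def percentile_cont(sorted_vals, f):
    # Continuous rule: interpolate at zero-based position f*(n - 1).
    pos = f * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + frac * (sorted_vals[hi] - sorted_vals[lo])

vals = [47.0, 53.0, 59.0, 61.0, 67.0, 71.0, 73.0, 83.0, 89.0, 97.0]
recovered = [percentile_cont(vals, f) for f in percent_ranks(len(vals))]
# recovered agrees with vals, element by element
```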
