
Case study—using a recursive CTE to traverse an employee hierarchy

  • Create and populate the "emps" table
  • List the employees top-down with their immediate managers in breadth-first order
  • List the path top-down from the ultimate manager to each employee in breadth-first order
  • List the path top-down from the ultimate manager to each employee in depth-first order
  • Pretty-printing the top-down depth-first report of paths
  • List the path bottom-up from a chosen employee to the ultimate manager

A hierarchy is a specialization of the general notion of a graph—and, as such, it's the simplest kind of graph that still deserves that name. The taxonomy of successive specializations starts with the most general (the undirected cyclic graph) and successively descends to the most restricted, a hierarchy. The taxonomy refers to a hierarchy as a rooted tree. All this is explained in the section Using a recursive CTE to traverse graphs of all kinds.

The representation of a general graph requires an explicit, distinct, representation of the nodes and the edges. Of course, a hierarchy can be represented in this way. But because of how it's restricted, it allows a simpler representation in a SQL database where only the nodes are explicitly represented, in a single table, and where the edges are inferred using a self-referential foreign key:

  • A "parent ID" column [list] references the table's primary key—the "ID" column [list] enforced by a foreign key constraint.

This is referred to as a one-to-many recursive relationship, or one-to-many "pig's ear", in the jargon of entity-relationship modeling. The ultimate, unique, root of the hierarchy has the "parent ID" set to NULL.

The SQL that this section presents uses the simpler representation. But the section Using a recursive CTE to traverse graphs of all kinds shows, for completeness, that the general SQL that you need to follow edges in a general graph works when the general representation happens to describe a hierarchy. See the section Finding the paths in a rooted tree.

Download a zip of scripts containing all the code examples that implement this case study

All of the .sql scripts that this case study presents for copy-and-paste at the ysqlsh prompt are included for download in a zip-file.

Download recursive-cte-code-examples.zip.

After unzipping it in a convenient new directory, you'll see a README.txt. It tells you how to start the master script. Simply start it in ysqlsh. You can run it time and again; it always finishes silently. It writes its report to a dedicated spool directory, so you can confirm that your report is identical to the reference copy that is delivered in the zip-file.

Create and populate the "emps" table

The following code creates a table of employees that records each employee's manager using a self-referential foreign key as explained above. This is a stylized example where "name" serves as the primary key column, "mgr_name" serves as the self-referential foreign key column, and there are no other columns.

cr-table.sql
set client_min_messages = warning;
drop table if exists emps cascade;
create table emps(
  name     text primary key,
  mgr_name text);

-- The order of insertion is arbitrary
insert into emps(name, mgr_name) values
  ('mary',   null),
  ('fred',  'mary'),
  ('susan', 'mary'),
  ('john',  'mary'),
  ('doris', 'fred'),
  ('alice', 'john'),
  ('bill',  'john'),
  ('joan',  'bill'),
  ('george', 'mary'),
  ('edgar', 'john'),
  ('alfie', 'fred'),
  ('dick',  'fred');

-- Implement the one-to-many "pig's ear".
alter table emps
add constraint emps_mgr_name_fk
foreign key(mgr_name) references emps(name)
on delete restrict;

-- The ultimate manager has no manager.
-- Enforce the business rule "Maximum one ultimate manager".
create unique index t_mgr_name on emps((mgr_name is null)) where mgr_name is null;

Inspect the contents:

select name, mgr_name from emps order by 2 nulls first, 1;

This is the result:

  name  | mgr_name 
--------+----------
 mary   | 

 joan   | bill

 alfie  | fred
 dick   | fred
 doris  | fred

 alice  | john
 bill   | john
 edgar  | john

 fred   | mary
 george | mary
 john   | mary
 susan  | mary

The blank lines were added by hand to make the results easier to read.

Stress the constraints with this attempt to insert a second ultimate manager:

insert into emps(name, mgr_name) values ('bad', null);

It causes this error:

23505: duplicate key value violates unique constraint "t_mgr_name"

Stress the constraints with this attempt to delete an employee with reports:

delete from emps where name = 'fred';

It causes this error:

23503: update or delete on table "emps" violates foreign key constraint "emps_mgr_name_fk" on table "emps"
Key (name)=(fred) is still referenced from table "emps".

The foreign key constraint, together with the unique index and the fact that the "mgr_name" column is nullable, guarantees the invariant that there is exactly one ultimate manager—in other words, that the employee graph is a rooted tree (a.k.a. a hierarchy). Check it thus:

do-assert-is-hierarchy.sql
do $body$
begin
  assert (select count(*) from emps where mgr_name is null) = 1,
    'Rule "Employee graph is a hierarchy" does not hold';
end;
$body$;
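The same two integrity rules can be exercised outside a cluster. The following sketch (an illustration only, not part of the case study's scripts) uses SQLite via Python's standard-library sqlite3 module, which also supports self-referential foreign keys and partial unique indexes on expressions:

```python
# Illustration of the two integrity rules from the YSQL scripts above,
# using SQLite via the standard-library sqlite3 module. The YSQL
# scripts are authoritative; this is just a portable sketch.
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('pragma foreign_keys = on')  # SQLite enforces FKs only when asked
conn.execute('''
    create table emps(
      name     text primary key,
      mgr_name text references emps(name) on delete restrict)''')
conn.executemany('insert into emps(name, mgr_name) values (?, ?)', [
    ('mary', None), ('fred', 'mary'), ('john', 'mary'), ('bill', 'john')])

# Rule "Maximum one ultimate manager": a partial unique index on the
# expression (mgr_name is null), just as in cr-table.sql.
conn.execute('''
    create unique index t_mgr_name on emps((mgr_name is null))
    where mgr_name is null''')

# A second NULL-manager row violates the unique index...
try:
    conn.execute("insert into emps(name, mgr_name) values ('bad', null)")
except sqlite3.IntegrityError as e:
    print('rejected:', e)

# ...and deleting a manager who has reports violates the foreign key.
try:
    conn.execute("delete from emps where name = 'john'")
except sqlite3.IntegrityError as e:
    print('rejected:', e)
```

Both offending statements are rejected, leaving the four original rows intact.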

List the employees top-down with their immediate managers in breadth-first order

The naïve formulation of the query to list the employees with their immediate managers uses a WITH clause that has a recursive CTE like this:

cr-view-top-down-simple.sql
drop view if exists top_down_simple cascade;
create view top_down_simple(depth, mgr_name, name) as
with
  recursive hierarchy_of_emps(depth, mgr_name, name) as (

    -- Non-recursive term.
    -- Select the exactly one ultimate manager.
    -- Define this emp to be at depth 1.
    (
      select
        1,
        '-',
        name
      from emps             
      where mgr_name is null
    )

    union all

    -- Recursive term.
    -- Treat the emps from the previous iteration as managers.
    -- Join these with their reports, if they have any.
    -- Increase the emergent depth by 1 with each step.
    -- Stop when none of the current putative managers has a report.
    -- Each successive iteration goes one level deeper in the hierarchy.
    (
      select
        h.depth + 1,
        e.mgr_name,
        e.name
      from
      emps as e
      inner join
      hierarchy_of_emps as h on e.mgr_name = h.name 
    )
  )
select
  depth,
  mgr_name,
  name
from hierarchy_of_emps;

Each successive iteration of the recursive term accumulates the direct reports, where any exist, of the employees produced by the previous step: first those of the non-recursive term (i.e. of the ultimate manager), and thereafter those of the employees produced by the previous iteration of the recursive term. The iteration stops when none of the employees produced by the previous iteration has a report.
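The mechanics of this iteration can be sketched procedurally. Here is a minimal, hypothetical Python sketch (not part of any YSQL implementation) that hard-codes the example "emps" rows as a name-to-manager mapping and accumulates (depth, mgr_name, name) tuples the way the recursive CTE does:

```python
# Hard-coded copy of the example "emps" rows: employee -> manager
# (None marks the exactly one ultimate manager).
emps = {
    'mary':  None,
    'fred':  'mary', 'george': 'mary', 'john': 'mary', 'susan': 'mary',
    'alfie': 'fred', 'dick':   'fred', 'doris': 'fred',
    'alice': 'john', 'bill':   'john', 'edgar': 'john',
    'joan':  'bill',
}

def hierarchy_of_emps(emps):
    # Non-recursive term: the exactly one ultimate manager, at depth 1.
    previous = [(1, '-', name) for name, mgr in emps.items() if mgr is None]
    rows = list(previous)
    # Recursive term: treat the previous iteration's employees as
    # managers and join them with their reports. Stop when an
    # iteration produces no rows.
    while previous:
        previous = [(depth + 1, mgr, name)
                    for (depth, _, mgr) in previous
                    for name, m in emps.items()
                    if m == mgr]
        rows.extend(previous)
    return rows

rows = hierarchy_of_emps(emps)
```

The sketch makes the stopping rule explicit: the loop condition is simply "the previous iteration produced at least one row".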

Notice that the choice to define the ultimate manager to be at depth 1 is just that: a design choice. You might prefer to define the ultimate manager to be at depth 0, arguing that it's better to interpret this measure as the number of managers up the chain that the present employee has.

The result set that this view represents is usually ordered first by the calculated "depth" column and then by usefully orderable actual columns—the manager name and then the employee name in the case of the "emps" example table:

do-breadth-first-query.sql
select
  depth,
  mgr_name,
  name
from top_down_simple
order by
  depth,
  mgr_name nulls first,
  name;

This produces a so-called "breadth first" order. This is the result:

 depth | mgr_name |  name  
-------+----------+--------
     1 | -        | mary

     2 | mary     | fred
     2 | mary     | george
     2 | mary     | john
     2 | mary     | susan

     3 | fred     | alfie
     3 | fred     | dick
     3 | fred     | doris

     3 | john     | alice
     3 | john     | bill
     3 | john     | edgar

     4 | bill     | joan

The blank lines were added by hand to make the results easier to read.

List the path top-down from the ultimate manager to each employee in breadth-first order

The term of art "path" denotes the list of managers from the ultimate manager through each next direct report down to the current employee. It is easily calculated by using array concatenation as described in the || operator subsection of the Array data types and functionality major section. Yet again, "array" functionality comes to the rescue.

cr-view-top-down-paths.sql
drop view if exists top_down_paths cascade;
create view top_down_paths(path) as
with
  recursive paths(path) as (
    select array[name]
    from emps
    where mgr_name is null

    union all

    select p.path||e.name
    from emps as e
    inner join paths as p on e.mgr_name = p.path[cardinality(path)] 
  )
select path from paths;

The cardinality of the path represents the depth. The result set that this view represents can be easily ordered first by the emergent "depth" and then by the employee name:

do-breadth-first-path-query.sql
select cardinality(path) as depth, path
from top_down_paths
order by
  depth,
  path[2] asc nulls first,
  path[3] asc nulls first,
  path[4] asc nulls first,
  path[5] asc nulls first;

The design of the ORDER BY clause relies on the following fact—documented in the Array data types and functionality major section:

If you read a within-array value with a tuple of index values that puts it outside of the array bounds, then you silently get NULL.

This is the result:

 depth |         path          
-------+-----------------------
     1 | {mary}

     2 | {mary,fred}
     2 | {mary,george}
     2 | {mary,john}
     2 | {mary,susan}

     3 | {mary,fred,alfie}
     3 | {mary,fred,dick}
     3 | {mary,fred,doris}
     3 | {mary,john,alice}
     3 | {mary,john,bill}
     3 | {mary,john,edgar}

     4 | {mary,john,bill,joan}

The blank lines were added by hand to make the results easier to read.
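The silent-NULL convention that the ORDER BY clause relies on can be mimicked when prototyping the ordering logic outside SQL. Here is a hypothetical Python sketch (Python lists raise IndexError out of bounds, so the None-for-missing behavior has to be supplied explicitly, and None has to be made to sort ahead of any name):

```python
def elem(path, i):
    # SQL-style 1-based array read: silently yield None (i.e. NULL)
    # when the index is outside the array bounds.
    return path[i - 1] if 1 <= i <= len(path) else None

def nulls_first_key(path):
    # Mimic "path[2] asc nulls first, ... path[5] asc nulls first":
    # a missing element (None) sorts ahead of any employee name.
    return [(e is not None, e or '')
            for e in (elem(path, i) for i in range(2, 6))]

paths = [['mary', 'john', 'bill'], ['mary'], ['mary', 'john'], ['mary', 'fred']]

# Breadth-first: order by depth first, then by the padded elements.
breadth_first = sorted(paths, key=lambda p: (len(p), nulls_first_key(p)))
```

Dropping the `len(p)` component from the key gives the depth-first order discussed below.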

Notice that the "top_down_paths" view has the same information content as the "top_down_simple" view, and the queries used against each listed the information in the same order. The results from querying the "top_down_paths" view show, in the last-but-one and last array elements, values identical to the "mgr_name" and "name" columns in the output from querying the "top_down_simple" view. But the results from the "top_down_paths" view are easier to read because you don't need to construct each path mentally by looking, recursively, for the row whose "name" matches the "mgr_name" of the row of interest until you reach the ultimate manager.

This assert confirms the conclusion:

do-assert-top-down-path-view-and-top-down-simple-view-same-info.sql
do $body$
declare
  c constant int not null :=
    (
      with
        a1(depth, mgr_name, name) as (
          select depth, mgr_name, name from top_down_simple),

        a2(depth, mgr_name, name) as (
          select
            cardinality(path),
            case cardinality(path)
              when 1 then '-'
              else        path[cardinality(path) - 1]
            end,
            path[cardinality(path)]
          from top_down_paths),

        a1_except_a2(depth, mgr_name, name) as (
          select depth, mgr_name, name from a1
          except
          select depth, mgr_name, name from a2),

        a2_except_a1(depth, mgr_name, name) as (
          select depth, mgr_name, name from a2
          except
          select depth, mgr_name, name from a1)

      select count(*) from
      (
        select depth, mgr_name, name from a1_except_a2
        union all
        select depth, mgr_name, name from a2_except_a1
      ) as n
    );
begin
 assert c = 0, 'Unexpected';
end;
$body$;

This "UNION ALL of two complementary EXCEPT queries" is the standard pattern for checking if two different relations have the same content. Notice how the use of a WITH clause with ordinary (non-recursive) CTEs lets you express the logic in a maximally readable way.

List the path top-down from the ultimate manager to each employee in depth-first order

Do this:

do-depth-first-query.sql
select cardinality(path) as depth, path
from top_down_paths
order by
  path[2] asc nulls first,
  path[3] asc nulls first,
  path[4] asc nulls first,
  path[5] asc nulls first;

This query is almost identical to the query shown at do-breadth-first-path-query.sql. The only difference is that the first ORDER BY rule (to order by the depth) has been omitted.

This is the result:

 depth |         path          
-------+-----------------------
     1 | {mary}
     2 | {mary,fred}
     3 | {mary,fred,alfie}
     3 | {mary,fred,dick}
     3 | {mary,fred,doris}
     2 | {mary,george}
     2 | {mary,john}
     3 | {mary,john,alice}
     3 | {mary,john,bill}
     4 | {mary,john,bill,joan}
     3 | {mary,john,edgar}
     2 | {mary,susan}

You can see that the results are sorted in depth-first order with the employees at the same depth ordered by "name".

Notice that this:

select max(cardinality(path)) from top_down_paths;

returns the value 4 for the present example. This means that path[5] returns NULL, as would, for example, path[6], path[17], and path[42]. When such a query is issued programmatically, you can determine the maximum path cardinality first and then build the ORDER BY clause with just the necessary and sufficient number of terms. Alternatively, for simpler code, you could write the clause with a number of terms that comfortably exceeds the maximum cardinality of the arrays that your program will have to deal with, ensuring safety with a straightforward test of the actual maximum cardinality.
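For example, a program could compose the clause like this (a hypothetical Python sketch; `max_depth` stands for the value that the max(cardinality(path)) query returns):

```python
def order_by_terms(max_depth):
    # path[1] names the ultimate manager on every row, so the
    # ordering terms start at path[2].
    return ',\n  '.join(
        f'path[{i}] asc nulls first' for i in range(2, max_depth + 1))

# max(cardinality(path)) is 4 for the example data.
clause = 'order by\n  ' + order_by_terms(4)
```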

Pretty-printing the top-down depth-first report of paths

Users often like to use indentation to visualize hierarchical depth. This is easy to achieve, thus:

do-indented-depth-first-query.sql
select
  rpad(' ', 2*cardinality(path) - 2, ' ')||path[cardinality(path)] as "emps hierarchy"
from top_down_paths
order by
  path[2] asc nulls first,
  path[3] asc nulls first,
  path[4] asc nulls first,
  path[5] asc nulls first;

This is the result:

 emps hierarchy 
----------------
 mary
   fred
     alfie
     dick
     doris
   george
   john
     alice
     bill
       joan
     edgar
   susan

You've probably seen how the Unix tree command presents the hierarchy that it computes using the long vertical bar, the long horizontal bar, the sideways "T", and the "L" symbols: │, ─, ├, and └. It's easy to output an approximation to this, one that omits the long vertical bar, with a single SQL statement. The trick is to use the lead() window function to calculate the "next_depth" for each row as well as the "depth". If the "next_depth" value is equal to the "depth" value, output the sideways "T"; otherwise, output the "L" (at the appropriate indentation level).

do-unix-tree-query.sql
with
  a1 as (
    select
      cardinality(path) as depth,
      path
    from top_down_paths),

  a2 as (
    select
      depth,
      lead(depth, 1, 0) over w as next_depth,
      path
    from a1
    window w as (
      order by
        path[2] asc nulls first,
        path[3] asc nulls first,
        path[4] asc nulls first,
        path[5] asc nulls first))

select
  case depth = next_depth
    when true then
      lpad(' ├── ', (depth - 1)*5, ' ')
    else
      lpad(' └── ', (depth - 1)*5, ' ')
  end
  ||
  path[depth] as "Approx. 'Unix tree'"
from a2
order by
  path[2] asc nulls first,
  path[3] asc nulls first,
  path[4] asc nulls first,
  path[5] asc nulls first;

Using 0 as the actual argument for the third (optional) lead() parameter means that the function returns 0 for the last row in the sorted order rather than NULL. This allows a simple equality test to be used reliably.

This is the result:

 mary
  └── fred
       ├── alfie
       ├── dick
       └── doris
  ├── george
  └── john
       ├── alice
       └── bill
            └── joan
       └── edgar
  └── susan

It's easy to transform this result manually into the format that Unix uses by filling in vertical bars as appropriate, thus:

 mary
  ├── fred
  │    ├── alfie
  │    ├── dick
  │    └── doris
  ├── george
  ├── john
  │    ├── alice
  │    ├── bill
  │    │    └── joan
  │    └── edgar
  └── susan

Of course, when you do this, you're intuitively following a rule. So it could certainly be implemented in procedural code, using this harness:

cr-function-unix-tree.sql
drop function if exists unix_tree() cascade;

create function unix_tree()
  returns table(t text)
  language plpgsql
as $body$
<<b>>declare
  results text[] not null := '{}';
begin
  with
    a1 as (
      select
        cardinality(path) as depth,
        path
      from top_down_paths),

    a2 as (
      select
        depth,
        lead(depth, 1, 99) over w as next_depth,
        path
      from a1
      window w as (
        order by
          path[2] asc nulls first,
          path[3] asc nulls first,
          path[4] asc nulls first,
          path[5] asc nulls first)),

    results as (
      select
        case depth = next_depth
          when true then
            lpad(' ├── ', (depth - 1)*5, ' ')
          else
            lpad(' └── ', (depth - 1)*5, ' ')
        end
        ||
        path[depth] as result
      from a2
      order by
        path[2] asc nulls first,
        path[3] asc nulls first,
        path[4] asc nulls first,
        path[5] asc nulls first)

  select array_agg(r.result)
  into b.results
  from results as r;

  foreach t in array results loop
    -- Implement the logic to transform the present results
    -- into the Unix "tree" format by adding vertical bars
    -- as appropriate.
  end loop;

  foreach t in array results loop
    return next;
  end loop;
end b;
$body$;

select t from unix_tree();

This is a sketch of the required logic, expressed crudely and somewhat approximately:

  • Scan each result row looking for an "L" in any of the character positions where one might occur.
  • When an "L" is found, look forward over as many result rows as it takes to find the first non-space character in the character position where the "L" was found.
    • If it is found on the immediately next row, then do nothing; move to the next row, and start again.
    • Otherwise, if that first non-space character is an "L" or a "T", substitute a "T" for the starting "L" and substitute a vertical bar for each of the intervening spaces. If it is not an "L" or a "T", leave the spaces as they are.
  • Repeat this for each character position where an "L" is found.

Implementing, and testing, the logic is left as an exercise for the reader.
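As a cross-check of the rule itself, here is one possible realization in Python (a hypothetical sketch, not derived from any YugabyteDB code, operating on the approximate output rows shown above; the PL/pgSQL version remains the exercise):

```python
def add_vertical_bars(rows):
    # Pad the rows into a rectangular grid of characters.
    grid = [list(r) for r in rows]
    width = max(len(r) for r in grid)
    for r in grid:
        r.extend(' ' * (width - len(r)))
    for i, row in enumerate(grid):
        for c, ch in enumerate(row):
            if ch not in '└├':
                continue
            # Look forward for the first non-space in this column.
            for j in range(i + 1, len(grid)):
                below = grid[j][c]
                if below != ' ':
                    if below in '└├':
                        # A sibling follows: turn the "L" into a "T"
                        # and fill the gap with vertical bars.
                        row[c] = '├'
                        for k in range(i + 1, j):
                            grid[k][c] = '│'
                    break
    return [''.join(r).rstrip() for r in grid]

# The approximate rows that the single SQL statement produces.
approx = [
    'mary',
    ' └── fred',
    '      ├── alfie',
    '      ├── dick',
    '      └── doris',
    ' ├── george',
    ' └── john',
    '      ├── alice',
    '      └── bill',
    '           └── joan',
    '      └── edgar',
    ' └── susan',
]
tree = add_vertical_bars(approx)
```

Note that a name character found in the scanned column (the "e" of "george" below "doris", say) correctly leaves the "L" and the spaces untouched, just as the rule prescribes.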

List the path bottom-up from a chosen employee to the ultimate manager

The essential property of a hierarchy is that each successive upwards step defines exactly one parent (in this example, exactly one manager) by traversing the foreign key reference in the "many" to "one" direction. It's this property that distinguishes a hierarchy from a more general graph. This means that there is no distinction between breadth-first and depth-first when traversing a hierarchy upwards.

Here is the query. It's presented using the prepare-and-execute paradigm so that you can supply the starting employee of interest as a parameter. Notice that, because the traversal starts at the chosen employee rather than at the ultimate manager, the natural design for the "depth" notion here is different from what was used in the top-down approach: the design chosen has it starting at 0 and increasing by 1 with each step up the hierarchy.

do-bottom-up-simple-query.sql
deallocate all;

prepare bottom_up_simple(text) as
with
  recursive hierarchy_of_emps(depth, name, mgr_name) as (

    -- Non-recursive term.
    -- Select the exactly one employee of interest.
    -- Define the depth to be zero.
    (
      select
        0,
        name,
        mgr_name
      from emps             
      where name = $1
    )

    union all

    -- Recursive term.
    -- Treat the employee from the previous iteration as a report.
    -- Join this with its manager.
    -- Increase the depth with each step upwards.
    -- Stop when the current putative report has no manager, i.e. is
    -- the ultimate manager.
    -- Each successive iteration goes one level higher in the hierarchy.
    (
      select
        h.depth + 1,
        e.name,
        e.mgr_name
      from
      emps as e
      inner join
      hierarchy_of_emps as h on h.mgr_name = e.name
    )
  )
select
  depth,
  name,
  coalesce(mgr_name, '-') as mgr_name
from hierarchy_of_emps;

execute bottom_up_simple('joan');

This is the result:

 depth | name | mgr_name 
-------+------+----------
     0 | joan | bill
     1 | bill | john
     2 | john | mary
     3 | mary | -

Alternatively, you could encapsulate the query in a function to deliver the same parameterization benefit. In this scheme, the result is a single array that represents the path from bottom to top.

cr-function-bottom-up-path.sql
drop function if exists bottom_up_path(text) cascade;

create function bottom_up_path(start_name in text)
  returns text[]
  language sql
as $body$
  with
    recursive hierarchy_of_emps(mgr_name, path) as (
      (
        select mgr_name, array[name]
        from emps             
        where name = start_name
      )
      union all
      (
        select e.mgr_name, h.path||e.name
        from
        emps as e
        inner join
        hierarchy_of_emps as h on e.name = h.mgr_name
      )
    )
  select path
  from hierarchy_of_emps
  order by cardinality(path) desc
  limit 1;
$body$;

select bottom_up_path('joan');

This is the result:

    bottom_up_path     
-----------------------
 {joan,bill,john,mary}
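The upward walk that the recursive CTE performs can also be sketched procedurally. A minimal, hypothetical Python version, with the example data hard-coded as a name-to-manager mapping:

```python
# Hard-coded copy of the example "emps" rows: employee -> manager
# (None marks the ultimate manager).
emps = {
    'mary':  None,
    'fred':  'mary', 'george': 'mary', 'john': 'mary', 'susan': 'mary',
    'alfie': 'fred', 'dick':   'fred', 'doris': 'fred',
    'alice': 'john', 'bill':   'john', 'edgar': 'john',
    'joan':  'bill',
}

def bottom_up_path(start_name):
    # Follow the exactly-one-manager reference until the ultimate
    # manager (whose manager is None) is reached.
    path, name = [], start_name
    while name is not None:
        path.append(name)
        name = emps[name]
    return path

path = bottom_up_path('joan')
```

Because each upward step has exactly one destination, no join, ordering, or LIMIT is needed in this procedural form: the loop simply cannot branch.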

You can produce a prettier output like this:

cr-function-bottom-up-path-display.sql
drop function if exists bottom_up_path_display(text) cascade;
create function bottom_up_path_display(start_name in text)
  returns text
  language plpgsql
as $body$
declare
  path constant text[] not null := (bottom_up_path(start_name));
  t text not null := path[1];
begin
  for j in 2..cardinality(path) loop
    t := t||' > '||path[j];
  end loop;
  return t;
end;
$body$;

select bottom_up_path_display('joan');

This is the result:

  bottom_up_path_display   
---------------------------
 joan > bill > john > mary