
Stress testing different find_paths() implementations on maximally connected graphs

  • Definition of "maximally connected graph"
  • The number of paths in a maximally connected graph grows very much faster than linearly with the number of nodes
  • The stress test experiment
    • Create some helpers
    • Create a script to do the stress-test kernel
    • Implement scripts that run the stress-test kernel for each method at each number of nodes in the range of interest
  • Implication of the stress-test experiment for the implementation of the computation of Bacon Numbers

Before trying the code in this section, make sure that you have created the "edges" table (see cr-edges.sql) and installed all the code shown in the section Common code for traversing all kinds of graph. Make sure that you start by choosing the option that adds a tracing column and a trigger to the "raw_paths" table. (You will later re-create the "raw_paths" table without the tracing code before you do some timing tests.)

Definition of "maximally connected graph"

Look at this undirected cyclic graph:

[Figure: a six-node maximally connected graph]

Each of its six nodes has an edge to each of the other five nodes—in other words, it's maximally connected.

A "maximally connected graph" is a connected undirected graph where every node has an edge to every other node. Such a graph is inevitably cyclic.

("Connected" means that there exists a path from every node to every other node in the graph.)

The following code automates the creation of such a maximally connected undirected cyclic graph for the specified number of nodes. First create a helper procedure:

cr-populate-maximum-connectvity-edges.sql
drop procedure if exists populate_maximum_connectvity_edges(int) cascade;

create procedure populate_maximum_connectvity_edges(nr_nodes in int)
  language plpgsql
as $body$
declare
  o constant text := '(''n';
  p constant text := ''', ''n';
  c constant text := '''),';
  stmt text not null := 'insert into edges(node_1, node_2) values';
begin
  for j in 1..nr_nodes loop
    for k in j..(nr_nodes - 1) loop
      stmt := stmt||
        o||ltrim(to_char(j,     '009'))||
        p||ltrim(to_char(k + 1, '009'))||c;
    end loop;
  end loop;

  stmt := rtrim(stmt, ',');
  execute stmt;
end;
$body$;

Now invoke it to populate the "edges" table and inspect the outcome:

delete from edges;
alter table edges drop constraint if exists edges_chk cascade;
alter table edges add constraint edges_chk check(node_1 <> node_2);
call populate_maximum_connectvity_edges(nr_nodes=>6);

-- Implement the denormalization.
insert into edges(node_1, node_2)
select node_2 as node_1, node_1 as node_2
from edges;

select node_1, node_2 from edges order by node_1, node_2;

This is the result:

 node_1 | node_2 
--------+--------
 n001   | n002
 n001   | n003
 n001   | n004
 n001   | n005
 n001   | n006
 
 n002   | n001
 n002   | n003
 n002   | n004
 n002   | n005
 n002   | n006
 
 n003   | n001
 n003   | n002
 n003   | n004
 n003   | n005
 n003   | n006
 
 n004   | n001
 n004   | n002
 n004   | n003
 n004   | n005
 n004   | n006
 
 n005   | n001
 n005   | n002
 n005   | n003
 n005   | n004
 n005   | n006
 
 n006   | n001
 n006   | n002
 n006   | n003
 n006   | n004
 n006   | n005

The whitespace was added by hand to group the results into the sets of five edges that connect to each of the six nodes.
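As a cross-check on the edge-generation logic, the same denormalized edge set can be sketched in a few lines of Python. (This is an illustration only; the function name is invented, and the real docs generate the rows with the PL/pgSQL procedure shown above.)

```python
# Sketch: generate the denormalized edge set of a maximally connected
# graph, mirroring populate_maximum_connectvity_edges() followed by the
# denormalizing INSERT ... SELECT above.
def maximally_connected_edges(nr_nodes):
    nodes = [f"n{j:03d}" for j in range(1, nr_nodes + 1)]
    # Every ordered pair of distinct nodes — the table's content after
    # the denormalization step.
    return [(a, b) for a in nodes for b in nodes if a != b]

edges = maximally_connected_edges(6)
# Six nodes, each with an edge to the other five: 6 * 5 = 30 rows.
print(len(edges))  # 30
```

The check constraint `edges_chk` corresponds to the `a != b` filter: no self-edges are generated.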

The number of paths in a maximally connected graph grows very much faster than linearly with the number of nodes

Recall how the logic of the recursive CTE from cr-find-paths-with-nocycle-check.sql is expressed:

with
  recursive paths(path) as (
    select array[start_node, node_2]
    from edges
    where
    node_1 = start_node

    union all

    select p.path||e.node_2
    from edges e, paths p
    where
    e.node_1 = terminal(p.path)
    and not e.node_2 = any(p.path) -- <<<<< Prevent cycles.
    )

Using node "n001" as the start node for the six-node example maximally connected graph, the non-recursive term produces five paths. The first repeat of the recursive term then produces four longer paths for each of those five input paths, because the cycle-prevention logic excludes, for each input path in turn, the one reachable node that the path has already visited. The pattern continues: the next repeat produces three longer paths per input path, then two, then one; and the repeat after that finds no new paths, so the repetition ends. The number of paths found at each repetition therefore grows like this:

Repeat number    Number of paths Found
            0                        5  -- non-recursive term
            1                5*4 =  20  -- 1st recursive term
            2              5*4*3 =  60  -- 2nd recursive term
            3            5*4*3*2 = 120  -- 3rd recursive term
            4          5*4*3*2*1 = 120  -- 4th recursive term
                                   ---
Final total number of paths:       325            
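The per-repetition arithmetic in the table can be sketched as follows (a Python cross-check of the counts above; the function name is invented for illustration):

```python
# Number of new paths found at each repetition of the recursive CTE,
# for a maximally connected graph with n nodes and any start node.
def paths_per_repeat(n):
    counts = [n - 1]           # non-recursive term: one path per neighbor
    for r in range(1, n - 1):  # recursive-term repeats
        # Each input path can be extended to every node it hasn't visited.
        counts.append(counts[-1] * (n - 1 - r))
    return counts

print(paths_per_repeat(6))       # [5, 20, 60, 120, 120]
print(sum(paths_per_repeat(6)))  # 325
```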

You can see this in action by re-creating the "find_paths()" implementation shown in cr-find-paths-with-pruning.sql and invoking it like this:

call find_paths(start_node => 'n001', prune => false);

Now inspect the repetition history:

select repeat_nr, count(*) as n
from raw_paths
group by repeat_nr
order by 1;

This is the result:

 repeat_nr |  n  
-----------+-----
         0 |   5
         1 |  20
         2 |  60
         3 | 120
         4 | 120

This agrees with what the analysis presented above predicts.

The following function implements the rule described above to compute, for a maximally connected graph, the total number of paths from an arbitrary start node as a function of the number of nodes in the graph. It follows from the definition that, for a particular graph, the number of paths is the same for every possible start node.

cr-total-paths-versus-number-of-nodes.sql
drop function if exists total_paths_versus_number_of_nodes(int) cascade;

create function total_paths_versus_number_of_nodes(nr_nodes in int)
  returns table(t text)
  language plpgsql
as $body$
begin
  t := 'no. of nodes   no. of paths';  return next;
  t := '------------   ------------';  return next;
  for j in 4..nr_nodes loop
    declare
      -- Non-recursive term result
      new_paths numeric not null := j - 1;
      total numeric not null := new_paths;
    begin
      -- recursive term repetitions.
      for k in 1..(j - 2) loop
        new_paths := new_paths * (j - k - 1)::bigint;
        total := total + new_paths;
      end loop;

      t := case total < 10000000
             when true then
                lpad(to_char(j, '9999'), 12)||lpad(to_char(total, '9,999,999'), 15) 
             else
                lpad(to_char(j, '9999'), 12)||lpad(to_char(total, '9.9EEEE'), 15)
           end;
      return next;
    end;
  end loop;
end;
$body$;

Invoke it like this:

\t on
select t from total_paths_versus_number_of_nodes(20);
\t off

This is the result:

 no. of nodes   no. of paths
 ------------   ------------
            4             15
            5             64
            6            325
            7          1,956
            8         13,699
            9        109,600
           10        986,409
           11      9,864,100
           12        1.1e+08
           13        1.3e+09
           14        1.7e+10
           15        2.4e+11
           16        3.6e+12
           17        5.7e+13
           18        9.7e+14
           19        1.7e+16
           20        3.3e+17

The growth rate is astonishing. You can see immediately that there must be a limit to the viability of using a "find_paths()" implementation that doesn't do early path pruning as the size of the maximally connected graph increases. The following thought experiment makes this clear.

Suppose that a clever, highly compressive, representation scheme were invented to store paths with an average space consumption per path of, say, just 10 bytes. And suppose that you invest in a 1 TB external SSD drive for your laptop to hold paths represented using this clever scheme. Your SSD's capacity is 2^40 bytes—about 10^12 bytes. This isn't enough to store the 2.4*10^11 paths that a maximally connected graph with just 15 nodes has.
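A back-of-the-envelope calculation confirms this (illustrative figures only, taken from the thought experiment above):

```python
# Illustrative arithmetic for the thought experiment: would the paths
# of a 15-node maximally connected graph fit on the 1 TB SSD?
ssd_bytes = 2**40              # 1 TB, about 1.1e12 bytes
bytes_per_path = 10            # hypothetical, highly compressed size
paths_15_nodes = 2.4e11        # from the table above

needed_bytes = paths_15_nodes * bytes_per_path
print(needed_bytes > ssd_bytes)  # True: the SSD is too small
```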

Consider trying this test again and again with increasing values for :nr_nodes.

-- Sketch of stress-test kernel.
delete from edges;
call populate_maximum_connectivity_edges(nr_nodes=>:nr_nodes);

insert into edges(node_1, node_2)
select node_2 as node_1, node_1 as node_2
from edges;

call find_paths(start_node => 'n001', prune => false);

Even if you did create a single-node YugabyteDB cluster so that it used your 1 TB SSD for its data files, you doubtless wouldn't have the capacity to store the paths of even a 14-node maximally connected graph, because the hypothetical, dedicated, highly compressive path representation scheme isn't in use. This can only mean that the "find_paths()" execution would crash for some run with fewer than 14 nodes.

The stress test experiment

The thought experiment, above, was described to show that the "find_paths()" scheme that uses schema-level tables to implement the intermediate and final results for the recursive CTE algorithm, without pruning, will inevitably reach a limit and crash when the target graph has too many paths. Similar reasoning shows that the ordinary, explicit, use of the recursive CTE will inevitably crash, too, under similar circumstances. There's no point in trying to make more realistic estimates of the various sizes that will jointly conspire to bring eventual failure. Rather, an empirical test better serves the purpose.

Before doing the stress-test experiment, make sure that you have created the "edges" table (see cr-edges.sql) and installed all the code shown in the section Common code for traversing all kinds of graph. But this time, make sure that you start by choosing the option that re-creates the "raw_paths" table without the tracing code before you do some timing tests.

Note: The stress-test is implemented by a few scripts where all but one call other script(s). In order to run the whole test, you should create a directory on your computer and, when told to, save each script there with the name that's given. The downloadable code zip arranges all of the scripts it includes in a directory tree that implements a useful classification scheme. This means that, in the files from the zip, the arguments of the \i, \ir, and \o metacommands are spelled differently than they are here to include directory names.

Create some helpers

A scripting scheme is needed to call the basic test (see the code, above, that starts with the comment "Sketch of stress-test kernel") repeatedly for each value of interest of :nr_nodes and after installing, in turn, each of the three methods. It will also be useful if the stress-test kernel times each invocation of "find_paths()".

The "find_paths()" invocation syntax depends on the present method. A procedure manages this nicely:

cr-invoke-find-paths.sql
drop procedure if exists invoke_find_paths(text) cascade;

create procedure invoke_find_paths(method in text)
  language plpgsql
as $body$
begin
  case method
    when 'pure-with'   then call find_paths(start_node => 'n001');
    when 'prune-false' then call find_paths(start_node => 'n001', prune => false);
    when 'prune-true'  then call find_paths(start_node => 'n001', prune => true);
  end case;
end;
$body$;

The stress-test kernel must spool the output to a file whose name encodes the present number of nodes and method name. A well-known pattern comes to the rescue: use a table function to write the script that turns on spooling and invoke that script. This is the function:

cr-start-spooling-script.sql
drop function if exists start_spooling_script(int, text) cascade;

create function start_spooling_script(nr_nodes in int, method in text)
  returns text
  language plpgsql
as $body$
begin
  return '\o '||ltrim(nr_nodes::text)||'-nodes--'||method||'.txt';
end;
$body$;

Create a script to do the stress-test kernel

With the two building blocks, the procedure "invoke_find_paths()" and the function "start_spooling_script()", in place, the following script implements the stress-test kernel:

do-stress-test-kernel.sql

Save this script.

delete from edges;
call populate_maximum_connectivity_edges(nr_nodes=>:nr_nodes);

-- Implement the denormalization.
insert into edges(node_1, node_2)
select node_2 as node_1, node_1 as node_2
from edges;

\t on

\o start_spooling.sql
select start_spooling_script(:nr_nodes, :method);
\o

\i start_spooling.sql
select :method||' -- '||ltrim(:nr_nodes::text)||' nodes';
call start_stopwatch();
call invoke_find_paths(:method);
select 'elapsed time: '||stopwatch_reading() as t;
select 'count(*) from raw_paths: '||ltrim(to_char(count(*), '9,999,999')) from raw_paths;
\o

\t off

You can manually test a single use of it. First decide on the method. These are the choices:

  • "pure-with" — cr-find-paths-with-nocycle-check.sql

  • "prune-false" — cr-find-paths-with-pruning.sql invoked with prune => false

  • "prune-true" — cr-find-paths-with-pruning.sql invoked with prune => true

Decide on the number of nodes, for example 7. And choose the method, for example "prune-false". (Remember to install cr-find-paths-with-pruning.sql.) Then do this:

\set nr_nodes 7
\set method '\''prune-false'\''
\i do-stress-test-kernel.sql

This will produce the 7-nodes--prune-false.txt spool file. It will look like this:

 prune-false -- 7 nodes

 elapsed time: 40 ms.

 count(*) from raw_paths: 1,956

Of course, the reported elapsed time that you see will doubtless differ from "40 ms".

Implement scripts to execute the stress-test kernel for each method for each of the numbers of nodes in the range of interest.

First, implement a script to invoke each of the three methods for a particular, pre-set, value of :nr_nodes. Get a clean slate for the "paths" tables before each timing test by dropping and re-creating each of them.

do-stress-test-for-all-methods.sql

Save this script. You'll also need to save the scripts that it calls: cr-raw-paths-no-tracing.sql, cr-supporting-path-tables.sql, cr-find-paths-with-nocycle-check.sql, and cr-find-paths-with-pruning.sql.

\set method '\''pure-with'\''
\i cr-raw-paths-no-tracing.sql
\i cr-supporting-path-tables.sql
\i cr-find-paths-with-nocycle-check.sql
\i do-stress-test-kernel.sql

\set method '\''prune-false'\''
\i cr-raw-paths-no-tracing.sql
\i cr-supporting-path-tables.sql
\i cr-find-paths-with-pruning.sql
\i do-stress-test-kernel.sql

\set method '\''prune-true'\''
\i cr-raw-paths-no-tracing.sql
\i cr-supporting-path-tables.sql
\i cr-find-paths-with-pruning.sql
\i do-stress-test-kernel.sql

Finally, create a script to invoke do-stress-test-for-all-methods.sql for each of the numbers of nodes in the range of interest.

do-stress-test.sql

Save this script.

\set nr_nodes 4
\i do-stress-test-for-all-methods.sql

\set nr_nodes 5
\i do-stress-test-for-all-methods.sql

\set nr_nodes 6
\i do-stress-test-for-all-methods.sql

\set nr_nodes 7
\i do-stress-test-for-all-methods.sql

\set nr_nodes 8
\i do-stress-test-for-all-methods.sql

\set nr_nodes 9
\i do-stress-test-for-all-methods.sql

\set nr_nodes 10
\i do-stress-test-for-all-methods.sql

Then invoke do-stress-test-kernel.sql by hand for each of the three methods, setting the number of nodes to 11. You'll see that each of "pure-with" and "prune-false" fails, with error messages that reflect the fact that the implementation can't handle as many as 9,864,100 paths. But "prune-true" completes without error very quickly.

Finally, invoke do-stress-test-kernel.sql by hand using the "prune-true" method and 100 nodes. You'll see that it completes without error in about the same time as for any smaller number of nodes.

Here are the results:

The elapsed times are in seconds.

 no. of nodes   no. of paths    pure-with   prune-false   prune-true
            4             15          0.1           0.1          0.2
            5             64          0.1           0.1          0.2
            6            325          0.1           0.2          0.2
            7          1,956          0.2           0.7          0.2
            8         13,699          0.5           3.8          0.2
            9        109,600          4.1          31.3          0.2
           10        986,409        42.51        297.01          0.2
           11      9,864,100    crash [1]     crash [2]          0.2
          100       2.5e+156                                     0.3

[1] Failed after 12 min. "Timed out: Read RPC (request call id 42549) to 127.0.0.1:9100 timed out after 1.117s"

[2] Failed after 48:52 min. with errors like this: "Timed out: Read RPC (request call id 9911242) to 127.0.0.1:9100 timed out after 0.983s."

The experiment shows that the "prune-false" approach is never preferable to the "pure-with" approach. Each fails beyond a 10-node graph; and the "prune-false" method is slower. This is very much to be expected. It was implemented only for pedagogic purposes and, more importantly, to provide the framework that allows the implementation of early pruning—the "prune-true" approach.

The experiment also shows that the "prune-true" approach is viable and quick even for graphs with unimaginably huge numbers of possible paths.

This leads to two conclusions:

  • It simply isn't feasible to use the native recursive CTE features of YugabyteDB, those of PostgreSQL, or indeed those of any SQL database, to discover all the paths in any typically highly connected graph that's likely to be interesting in practice—especially, for example, the IMDb data.

  • It is straightforward to use ordinary SQL and stored procedure features to discover the shortest paths in even very large and highly connected graphs. As noted, you cannot use the recursive CTE for this purpose because it doesn't let you implement early pruning. Rather, you must use a table-based approach that implements the recursive CTE's algorithm explicitly by hand.

Implication of the stress-test experiment for the implementation of the computation of Bacon Numbers

The introduction to the section Using a recursive CTE to compute Bacon Numbers for actors listed in the IMDb mentions this data set:

  • imdb.small.txt: a... file with just a handful of performers (161), fully connected

It seems very likely that a fully connected undirected cyclic graph with 161 nodes, even if it isn't maximally connected, will be highly connected: very many of the nodes will have edges to very many of the other nodes. It's therefore likely that the implementation of the computation of Bacon Numbers will have to use the "prune-true" approach.
