Tuning queries in YugabyteDB
A YugabyteDB cluster consists of multiple nodes running the YB-TServer server process. Due to its distributed nature, tuning a YugabyteDB database requires keeping in mind that, unlike most common databases where only a local machine influences a database query, data is stored across all the nodes of the cluster. This section provides an introduction to tuning YugabyteDB and the tools available.
Before you begin
Before trying to optimize individual statements, first make sure the YugabyteDB cluster is running optimally:
- Verify that all nodes that are supposed to be running are indeed running.
- Verify that the YB-TServer and, where applicable, YB-Master processes are running on the nodes.
To view the nodes and servers that make up your cluster, use the yb-admin
command to request the master and tablet servers. For example:
$ ./bin/yb-admin -init_master_addrs=$(hostname):7100 list_all_masters
Master UUID RPC Host/Port State Role
a637b88dfc0c4476862ca79872d763d7 172.158.22.229:7100 ALIVE LEADER
968eb4a7a2e24c3ab95929d3a888ed05 172.158.39.23:7100 ALIVE FOLLOWER
44e13e096955412a88c6ffd65a1e241c 172.158.50.212:7100 ALIVE FOLLOWER
$ ./bin/yb-admin -init_master_addrs=$(hostname):7100 list_all_tablet_servers
Tablet Server UUID RPC Host/Port Heartbeat delay Status Reads/s
d80150e0eeda4477a231968440dd89a9 172.158.50.212:9100 0.20s ALIVE ..
721661fd5d2044179e03707a862aa578 172.158.22.229:9100 0.39s ALIVE ..
54e17c914c0f4679aa7b07a9c7a8ddf5 172.158.39.23:9100 0.48s ALIVE ..
Next, make sure enough disk space is available, elementary components such as CPU, disk, and the network do not report errors, and the operating system does not report any malfunction.
For more information on troubleshooting cluster- and node-level issues, refer to Troubleshoot YugabyteDB.
Once you know the entire cluster is running correctly, you can move to statement tuning.
Tuning in YSQL
To investigate a specific apparently slow running query, you first need to determine which node the query is running on. To identify the node on which the query is running, log on to the nodes and use the pg_stat_activity
view to see running queries.
Note
Remember that even for a single query running on one node, the whole cluster can show activity, because the storage layer is distributed over all nodes.Once you identify the correct node, YSQL provides two views you can use to identify SQL statements and their performance characteristics:
-
pg_stat_activity
provides an overview, including SQL, of current activity. Apart from the query text, it does not provide information about the SQL executed; rather it shows the current state of the PostgreSQL backend. -
pg_stat_statements
provides historic information about executed SQL statements. Currently, no 'block'/'blk' information is recorded (which is logical and physical I/O information), because the PostgreSQL I/O codepath is not used.
Which view you use for performance tuning depends on the performance issue and whether you can witness the performance problem happening live, or if it was a performance problem in the past.
Note that identical queries might not perform identically because of differences in data.
Once you identify a problematic SQL statement, the next step is to look at the execution plan using EXPLAIN
. An execution plan is the order and method of execution to perform the work requested in the SQL statement. This plan is generated by the database planner.
YugabyteDB uses PostgreSQL’s cost-based optimizer, which estimates the costs of each possible execution plan for an SQL statement. The planner calculates the cheapest plan to execute a query, assuming that is also the fastest and the best plan, based on statistics derived from the database table and supporting structures like secondary indexes. The execution plan with the lowest cost finally is executed.
Note
Currently, YugabyteDB does not perform size-specific costing, and assumes every plan generates a thousand rows, which translates to a cost of a thousand.You can use EXPLAIN
in two modes:
-
The default mode shows the plan that the planner created and the predicted cost.
-
EXPLAIN ANALYZE
mode executes the query, measures the time and actual work done, and adds these statistics to the execution plan as 'actual'.
Both the predicted and actual statistics have their uses. The predicted statistics are what the planner uses, and assuming the statistics don't change, the planner will come up with the same execution plan the next time the query is run, even if it caused performance issues in the past. The actual statistics show how much and what work was actually performed, and thus are useful to see the effectiveness of the execution plan.
Tip
Complex execution plans can be difficult to read or understand. Use online tools to calculate and visualize execution plan timelines. For example, https://explain.dalibo.com.Performance tuning tools in YugabyteDB
YugabyteDB provides a number of familiar tools you can use for tuning queries.
Find slow queries
Use the pg_stat_statements
extension to get statistics on past queries. Using pg_stat_statements
, you can investigate queries by userid and dbid, calls, rows, and min, max, mean, standard deviation and total time.
The pg_stat_statements
extension module is installed by default, but must be enabled for a database before you can query the pg_stat_statements
view.
CREATE EXTENSION if not exists pg_stat_statements;
To get the output of pg_stat_statements
in JSON format, visit https://<yb-tserver-ip>:13000/statements
in your web browser, where <yb-tserver-ip>
is the IP address of any YB-TServer node in your cluster.
For more information, refer to Get query statistics using pg_stat_statements
.
View live queries
Use the pg_stat_activity
view to get information on currently running tasks. Using pg_stat_activity
you can identify inactive, active, and long time active sessions, and get process information and the current query.
To get the output of pg_stat_activity
in JSON format, visit https://<yb-tserver-ip>:13000/rpcz
in your web browser, where <yb-tserver-ip>
is the IP address of any YB-TServer node in your cluster.
For more information, refer to Viewing live queries with pg_stat_activity
.
View plans with EXPLAIN
Like PostgreSQL, YugabyteDB provides the EXPLAIN
statement to show the query execution plan generated by YSQL for a given SQL statement. Using EXPLAIN
, you can discover where in the query plan the query is spending most of its time, and using this information, decide on the best approach for improving query performance. This could include strategies such as adding an index or changing index sort order.
For more information, refer to Analyzing queries with EXPLAIN.
Advanced tools
Use the following tools to log slow-running queries and optimize queries using hint plans.
Log slow queries
You can set the --ysql_log_min_duration_statement
flag to help track down slow queries. When configured, YugabyteDB logs the duration of each completed SQL statement that runs the specified duration (in milliseconds) or longer. (Setting the value to 0 prints all statement durations.)
$ ./bin/yb-tserver --ysql_log_min_duration_statement 1000
Example log output:
2021-05-26 21:13:49.701 EDT [73005] LOG: duration: 34.378 ms statement: SELECT c.oid,
n.nspname,
c.relname
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relname OPERATOR(pg_catalog.~) '^(products)$'
AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 2, 3;
Results are written to the current postgres*log
file. For information on the YB-TServer logs, refer to YB-TServer logs.
Note
Depending on the database and the work being performed, long-running queries don't necessarily need to be optimized.
Ensure that the threshold is high enough so that you don't flood the postgres*log
log files.
For more information on flags for configuring the YB-TServer server, refer to YSQL Flags.
Use a hint plan
YugabyteDB uses the PostgreSQL pg_hint_plan
extension to control query execution plans with hints.
pg_hint_plan
makes it possible to influence the query planner using so-called "hints", which are C-style comments that use a special syntax.
Note
To usepg_hint_plan
effectively, you need a thorough knowledge of how your application will be deployed. Hint plans also need to be revisited when the database grows or the deployment changes to ensure that the plan is not limiting performance rather than optimizing it.
For more information, refer to Optimizing YSQL queries using pg_hint_plan.