【PostgreSQL】実行計画もろもろ

PostgreSQLの実行計画

PostgreSQLの実行計画を見るためにはSQLの冒頭にexplainをつけて実行するだけ。ここではパラレルクエリが実行されないように設定している。

set max_parallel_workers_per_gather to 0;
explain select * from public.orders;
                            QUERY PLAN                             
-------------------------------------------------------------------
 Seq Scan on orders  (cost=0.00..594632.88 rows=32434488 width=32)
(1 row)

Seq Scan on orders：シーケンシャルスキャンを実行。シーケンシャルスキャンをはインデックススキャンとは異なり、データを1つずつ順番に取り出して条件に一致するか確認する方法。
cost=0.00..594632.88：処理実行にかかるコスト。このコストを比較して実行計画を判断する。
0.00：1行目を取得するまでのコスト
594632.88：1すべての行を取得するまでのコスト
rows=32434488：出力する推定行数
width=32：行の平均バイト数

このテーブルには、user_idにインデックスがついている。

CREATE INDEX order_user_id_idx ON public.orders (user_id);
CREATE INDEX user_id_idx ON public.users (user_id);

\di
                   List of relations
 Schema |       Name        | Type  |  Owner   | Table  
--------+-------------------+-------+----------+--------
 public | order_user_id_idx | index | postgres | orders
 public | user_id_idx       | index | postgres | users

インデックスがついているuser_idを利用した場合、実行計画がどうなるのかみると、Bitmap Heap Scanが選択されている。

explain SELECT * FROM public.orders where user_id = '53605';
                                    QUERY PLAN                                    
----------------------------------------------------------------------------------
 Bitmap Heap Scan on orders  (cost=5.62..608.62 rows=153 width=32)
   Recheck Cond: ((user_id)::text = '53605'::text)
   ->  Bitmap Index Scan on order_user_id_idx  (cost=0.00..5.58 rows=153 width=0)
         Index Cond: ((user_id)::text = '53605'::text)
(4 rows)

user_idを利用し、branch_numでソートすると、Bitmap Index Scanが実行され、Bitmap Heap Scanでuser_idで一致する行を検索し、branch_numでソートしていることがわかる。->が深い位置から実行されている。

explain SELECT * FROM public.orders where user_id = '53605' order by branch_num;
                                       QUERY PLAN                                       
----------------------------------------------------------------------------------------
 Sort  (cost=614.17..614.55 rows=153 width=32)
   Sort Key: branch_num
   ->  Bitmap Heap Scan on orders  (cost=5.62..608.62 rows=153 width=32)
         Recheck Cond: ((user_id)::text = '53605'::text)
         ->  Bitmap Index Scan on order_user_id_idx  (cost=0.00..5.58 rows=153 width=0)
               Index Cond: ((user_id)::text = '53605'::text)
(6 rows)

ANALYZEオプションを付与すると、実際にクエリを実行した結果の情報が得られる。(cost....) (actual time...)を比較すると、計画と実際の乖離が確認できる。

EXPLAIN ANALYZE
SELECT *
FROM PUBLIC.ORDERS
WHERE USER_ID = '53605';

                                                         QUERY PLAN                                                          
-----------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on orders  (cost=5.62..608.62 rows=153 width=32) (actual time=0.018..0.024 rows=27 loops=1)
   Recheck Cond: ((user_id)::text = '53605'::text)
   Heap Blocks: exact=4
   ->  Bitmap Index Scan on order_user_id_idx  (cost=0.00..5.58 rows=153 width=0) (actual time=0.014..0.014 rows=27 loops=1)
         Index Cond: ((user_id)::text = '53605'::text)
 Planning Time: 0.078 ms
 Execution Time: 0.044 ms
(7 rows)

ここが大きい場合、テーブルの更新があると統計情報が更新されていないなどが考えられるので、analyzeコマンドで統計情報を更新する。これにより、実行計画の乖離がなくなる。

analyze verbose public.orders;
INFO:  analyzing "public.orders"
INFO:  "orders": scanned 30000 of 270288 pages, containing 3600001 live rows and 0 dead rows; 30000 rows in sample, 32434569 estimated total rows
ANALYZE

他にも処理が遅くなるような原因として、更新処理などが増えてくると、インデックスが断片化されてしまう場合がある。そのような場合はreindexを利用する。

reindex index {index name};
reindex table {table name};
reindex database {database name};

order_user_id_idxをリインデックスする。

reindex index order_user_id_idx;
REINDEX

クラスタ化を検討する場合もある。インデックスのデータとテーブルデータの物理的な配置が近いほど、効果が出やすい。更新などの運用が増えると、どんどん配置が異なってしまう。インデックスのデータに合わせて配置し直すのが、clusterコマンド。

cluster {teble name} using {index name};
CLUSTER

MySQLのこと。

MySQLのことについてまとめているブログ。他人に見せる用でもなく、自分の勉強備忘録。検索インデックスも外してるので、辿りついた方・・・ようこそ。そんな大した情報ないですよ?!たまにアルゴリズムの練習も

【PostgreSQL】実行計画もろもろ

PostgreSQLの実行計画