GP SQL 优化及执行计划相关记录

优化器

greenplum 6 有两种sql 优化器

ORCA
Postgre

默认优先使用ORCA，无法使用ORCA使用pg的sql优化。ORCA可以关闭不用，通过show optimizer （on代表打开）进行查看

 test=# show optimizer;
 optimizer 
-----------
 on
(1 row)
1
2
3
4
5

ORCA打开时候需要保证optimizer_analyze_root_partition =on

test=# show optimizer_analyze_root_partition;
 optimizer_analyze_root_partition 
----------------------------------
 on
(1 row)
1
2
3
4
5

可以在数据库、role、session级别设置打开和关闭

 set optimizer to on;        
1

ORCA 不支持的SQL查询特性

可以参考
http://docs-cn.greenplum.org/v6/admin_guide/query/topics/query-piv-opt-limitations.html

是否使用了ORCA可以通过explain查看。

Explain

Explain分两种：
1、不执行 explain
2、执行sql explain analyze

test=# explain select * from foo where 2=1;                      
                QUERY PLAN                
------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=0)
   One-Time Filter: false
 Optimizer: Postgres query optimizer
(3 rows)

test=# explain analyze select * from foo where 2=1;
                                     QUERY PLAN                                     
------------------------------------------------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.017..0.017 rows=0 loops=1)
   One-Time Filter: false
 Planning time: 1.032 ms
   (slice0)    Executor memory: 89K bytes.
 Memory used:  128000kB
 Optimizer: Postgres query optimizer
 Execution time: 0.255 ms
(7 rows)

test=# select * from foo where 2=1;
 fid | ftype | fdate 
-----+-------+-------
(0 rows)

test=# set optimizer to on;                                                           
SET
test=# explain analyze select * from foo where 2=1;
                                        QUERY PLAN                                         
-------------------------------------------------------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=16) (actual time=0.027..0.027 rows=0 loops=1)
   ->  Result  (cost=0.00..0.00 rows=0 width=16) (actual time=0.010..0.010 rows=0 loops=1)
         One-Time Filter: false
 Planning time: 13.894 ms
   (slice0)    Executor memory: 24K bytes.
 Memory used:  128000kB
 Optimizer: Pivotal Optimizer (GPORCA)
 Execution time: 0.082 ms
(8 rows)

test=# 

test=# select * from foo where 2=1;
 fid | ftype | fdate 
-----+-------+-------
(0 rows)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46

Analyze

GP的analyze 分为
1、分析整个db ：不带参数
2、分析表（默认分析root partition）
3、分析表内指定字段

test=# analyze rootpartition foo;
ANALYZE
test=# analyze  foo;             
ANALYZE
test=# analyze  foo(fid);
ANALYZE
1
2
3
4
5
6

一般建议在空闲时候定时分析表。
以下gp_autostats_mode的内容来自https://blog.csdn.net/u012948976/article/details/52695397

在postgresql.conf中有控制自动收集的参数gp_autostats_mode设置，gp_autostats_mode三个值：none、no_change、on_no_stats（默认）
none：禁止收集统计信息
on change：当一条DML执行后影响的行数超过gp_autostats_on_change_threshold参数指定的值时，会执行完这条DML后再自动执行一个analyze 的操作来收集表的统计信息。
no_no_stats：当使用create talbe as select 、insert 、copy时，如果在目标表中没有收集过统计信息，那么会自动执行analyze 来收集这张表的信息。gp默认使用on_no_stats，对数据库的消耗比较小，但是对于不断变更的表，数据库在第一次收集统计信息之后就不会再收集了。需要人为定时执行analyze.
如果有大量的运行时间在1分钟以下的SQL，你会发现大量的时间消耗在收集统计信息上。为了降低这一部分的消耗，可以指定对某些列不收集统计信息 alter table test alter note SET STATISTICS 0;

————————————————
版权声明：本文为CSDN博主「asin929」的原创文章，遵循CC 4.0 BY-SA版权协议，转载请附上原文出处链接及本声明。
原文链接：https://blog.csdn.net/u012948976/article/details/52695397

查看表及index结构

分析sql需要关注表及index，可以通过\d 查看。

test=# CREATE INDEX foo_idx on foo(fid);
CREATE INDEX
test=# \d foo;
Append-Only Columnar Table "public.foo"
 Column |  Type   | Modifiers 
--------+---------+-----------
 fid    | integer | 
 ftype  | text    | 
 fdate  | date    | 
Checksum: t
Indexes:
    "foo_idx" btree (fid)
Number of child tables: 13 (Use \d+ to list them.)
Distributed by: (fid)
Partition by: (fdate)

test=# \d foo_idx;
    Index "public.foo_idx"
 Column |  Type   | Definition 
--------+---------+------------
 fid    | integer | fid
btree, for table "public.foo"
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22

相关阅读:
全网最全谷粒商城记录_06、环境-使用vagrant快速创建linux虚拟机——1、VirtualBox下载安装
 Vue学习之--------路由的query、params参数、路由命名（3）（2022/9/5）
GaussDB数据库SQL系列-表连接（JOIN）
路径规划算法 - 求解最短路径 - Dijkstra(迪杰斯特拉)算法
 BIRCH算法全解析：从原理到实战
 String的compareTo()方法使用场景介绍及全量ASCII 码表（完整版）
day36:网编day3，TCP、UDP模型
 ELK Kibana搜索框模糊搜索包含不包含
 python基础项目实战-俄罗斯方块
 体育场馆LED显示屏的分类及应用
原文地址：https://blog.csdn.net/weixin_40455124/article/details/126337244