Greenplum: Table Distribution Policies

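Every table in Greenplum must be distributed across the segments, so that queries can take full advantage of MPP parallelism.
The distribution policy is specified when the table is created. There are three policies: hash distribution, random distribution, and replicated tables.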

Hash Distribution

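Concept: one or more columns are chosen as the table's distribution key. Each inserted row is routed to a specific segment according to a hash of its distribution key values.
Hash distribution is the default policy. It can also be requested explicitly with the DISTRIBUTED BY clause.
Note: if no distribution key is defined when the table is created, Greenplum uses the primary key columns as the distribution key when the table has a primary key. If the table has no primary key, it distributes by the first column.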

Example:

=# create table dist1(a int, b text);
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'a' as the Database data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
=# \d dist1
Table "public.dist1"
Column |  Type   | Modifiers
--------+---------+-----------
a      | integer |
b      | text    |
Distributed by: (a)
=# drop table dist1;
DROP TABLE
=# create table dist1(a int, b text, primary key (b));
CREATE TABLE
=# \d dist1
Table "public.dist1"
Column |  Type   | Modifiers
--------+---------+-----------
a      | integer |
b      | text    | not null
Indexes:
"dist1_pkey" PRIMARY KEY, btree (b)
Distributed by: (b)
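
The sessions above rely on the default key. As a minimal sketch of choosing the key explicitly and then checking how evenly the rows landed, the statements below use a hypothetical table named sales (the table and its columns are assumptions made for illustration, not part of the original example). gp_segment_id is the system column that reports which segment holds each row.

```sql
-- Hypothetical table; names are illustrative only.
CREATE TABLE sales (
    order_id    bigint,
    customer_id int,
    amount      numeric(12,2)
) DISTRIBUTED BY (order_id);

-- Count rows per segment. Large differences between segments indicate skew.
SELECT gp_segment_id, count(*) AS row_count
FROM sales
GROUP BY gp_segment_id
ORDER BY gp_segment_id;

-- Switch to a multi-column distribution key.
-- Greenplum redistributes the existing rows when the key changes.
ALTER TABLE sales SET DISTRIBUTED BY (customer_id, order_id);
```

A high-cardinality column that is frequently used in joins is usually a reasonable first candidate for the key, since it tends to spread rows evenly and lets joins on that column run locally on each segment.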

Random Distribution

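Concept: rows are spread randomly across all segments. This keeps the data evenly distributed, but rows with matching keys are not co-located, so joins and similar operations have to redistribute the data while the SQL runs, which hurts performance.
Random distribution is specified with the DISTRIBUTED RANDOMLY clause.
Note: a table that has a primary key cannot be created as a randomly distributed table.
Example: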

=# create table dist1(a int, b text, primary key (a)) distributed randomly;
ERROR:  PRIMARY KEY and DISTRIBUTED RANDOMLY are incompatible
=# create table dist1(a int, b text) distributed randomly;
CREATE TABLE
=# \d dist1
Table "public.dist1"
Column |  Type   | Modifiers
--------+---------+-----------
a      | integer |
b      | text    |
Distributed randomly
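
To see the redistribution cost described above, one option is to inspect the query plan of a join between two randomly distributed tables. This is a sketch only: orders_rand and customers_rand are hypothetical tables, and the exact plan depends on the optimizer in use and on table statistics. The thing to look for is a Redistribute Motion or Broadcast Motion node, which means rows are being moved between segments before the join.

```sql
-- Hypothetical tables, both randomly distributed.
CREATE TABLE orders_rand (order_id bigint, customer_id int)
    DISTRIBUTED RANDOMLY;
CREATE TABLE customers_rand (customer_id int, name text)
    DISTRIBUTED RANDOMLY;

-- Matching customer_id values are not guaranteed to sit on the same
-- segment, so the plan needs a Motion node to bring them together.
EXPLAIN
SELECT c.name, count(*)
FROM orders_rand o
JOIN customers_rand c ON o.customer_id = c.customer_id
GROUP BY c.name;
```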

Note: to make random distribution the default policy, set gp_create_table_random_default_distribution to on. The default value is off. The current value can be checked with gpconfig:
gpconfig --show gp_create_table_random_default_distribution

Values on all segments are consistent
GUC          : gp_create_table_random_default_distribution
Master  value: off
Segment value: off
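
As a sketch of the effect, the parameter can also be tried out in a single session before changing it cluster-wide. This assumes the parameter is settable at session level in your Greenplum version, and t_default is a hypothetical table name.

```sql
-- Check the current value in this session.
SHOW gp_create_table_random_default_distribution;

-- Assumption: the parameter can be changed with SET for the current session.
SET gp_create_table_random_default_distribution = on;

-- No DISTRIBUTED clause and no primary key: with the parameter on,
-- \d t_default is expected to report "Distributed randomly".
CREATE TABLE t_default (a int, b text);
```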

Replicated Tables

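Concept: every row is stored on every segment instance in the cluster. Use this policy only for small tables.
Replicated tables are specified with the DISTRIBUTED REPLICATED clause.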

Example:

=# create table dist1(a int, b text, primary key (a)) distributed replicated;
CREATE TABLE
=# \d dist1
Table "public.dist1"
Column |  Type   | Modifiers
--------+---------+-----------
a      | integer | not null
b      | text    |
Indexes:
"dist1_pkey" PRIMARY KEY, btree (a)
Distributed Replicated
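
A typical use for a replicated table is a small lookup table that is joined to a large hash-distributed table. The sketch below uses hypothetical tables country_dim and orders_fact. Because every segment holds a full copy of country_dim, the join can normally be evaluated locally on each segment without moving the lookup rows, although the actual plan depends on the optimizer.

```sql
-- Small lookup table: a full copy is stored on every segment.
CREATE TABLE country_dim (
    country_code char(2),
    country_name text
) DISTRIBUTED REPLICATED;

-- Large fact table: hash-distributed on its own key.
CREATE TABLE orders_fact (
    order_id     bigint,
    country_code char(2),
    amount       numeric(12,2)
) DISTRIBUTED BY (order_id);

-- Inspect the plan to see whether country_dim needs a Motion node.
EXPLAIN
SELECT d.country_name, sum(f.amount) AS total_amount
FROM orders_fact f
JOIN country_dim d ON f.country_code = d.country_code
GROUP BY d.country_name;
```

The trade-off is storage and write cost: each insert or update on a replicated table is applied on every segment, which is why the policy suits small, rarely changing tables.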

Original article: https://blog.csdn.net/Post_Yuan/article/details/126868191