Doris Testing and Hands-On Practice
Command-Line Operations
```shell
### Start FE on wm_doris01
[doris01]$ sh /opt/module/doris/doris0.12.21/fe/bin/start_fe.sh --daemon
[doris01]$ jps
19929 Jps
19871 PaloFe
[doris02]$ sh /opt/module/doris/doris0.12.21/fe/bin/start_fe.sh --helper doris01:9010 --daemon
[doris03]$ sh /opt/module/doris/doris0.12.21/fe/bin/start_fe.sh --helper doris01:9010 --daemon

### Start BE on wm_doris01, wm_doris02, wm_doris03
[doris01]$ sh /opt/module/doris/doris0.12.21/be/bin/start_be.sh --daemon
[doris02]$ sh /opt/module/doris/doris0.12.21/be/bin/start_be.sh --daemon
[doris03]$ sh /opt/module/doris/doris0.12.21/be/bin/start_be.sh --daemon

[doris01]$ mysql -hdoris01 -P 9030 -uroot
mysql> SET PASSWORD FOR 'root' = PASSWORD('niaoshu123456');
mysql> exit;
mysql -hdoris01 -P 9030 -uroot -pniaoshu123456

mysql> alter system add backend "doris01:9050";
mysql> alter system add backend "doris02:9050";
mysql> alter system add backend "doris03:9050";
mysql> alter system add broker niaoshu_broker_hdfs "doris01:8000","doris02:8000","doris03:8000";
mysql> alter system add follower "doris02:9010";
mysql> alter system add observer "doris03:9010";

### First startup of the other FE nodes (doris02, doris03)
sh /opt/module/doris/doris0.12.21/fe/bin/start_fe.sh --helper doris01:9010 --daemon

mysql> show proc '/backends';    ### Check BE status
mysql> show proc '/brokers';     ### Check Broker status
mysql> show proc '/frontends';   ### Check FE status
```
Notes:
① Adding FE nodes: FE has three roles, Leader, Follower, and Observer. By default a cluster has only one Leader and may have multiple Followers and Observers. The Leader and the Followers form a Paxos election group; if the Leader goes down, one of the remaining Followers becomes the new Leader, which provides HA. Observers only replicate data from the Leader and do not take part in elections. If only one FE is deployed, that FE is the Leader by default.
② Deploy a second FE (Follower) on doris02 and an Observer on doris03. The --helper option specifies the Leader's address and edit-log port.
③ Adding BE nodes: as in the installation above, simply run the ALTER SYSTEM ADD BACKEND statement in the MySQL client. To remove a BE node, use ==ALTER SYSTEM DECOMMISSION BACKEND "be_host:be_heartbeat_service_port"==.
```shell
### Remove nodes
mysql> alter system drop follower "hadoop01:9010";
mysql> alter system drop observer "hadoop01:9010";

## Remove the corresponding FE node:
[ALTER SYSTEM DROP FOLLOWER[OBSERVER] "fe_host:edit_log_port";]
### Tip: when removing a Follower FE, make sure the remaining Followers (including the Leader) add up to an odd number
#### Remove a BE node
mysql> ALTER SYSTEM DECOMMISSION BACKEND "hadoop01:9050";
[ALTER SYSTEM DECOMMISSION BACKEND "be_host:be_heartbeat_service_port";]

### Remove a broker
mysql> ALTER SYSTEM DROP BROKER broker_name "hadoop01:8000";
[ALTER SYSTEM DROP BROKER broker_name "broker_host:broker_ipc_port";]
[ALTER SYSTEM DROP ALL BROKER broker_name;]
```
Tip: there is also ALTER SYSTEM DROP BACKEND "be_host:be_heartbeat_service_port";. Note that DROP BACKEND removes the BE immediately and the data on it can no longer be recovered, so we strongly discourage removing BE nodes this way. The statement prints a corresponding warning to guard against accidental use.
Important: if, when setting up HA, the other FE nodes were first started without --helper, the fix is to delete all files under the doris-meta directory on the node whose alive value is false (keeping a backup is recommended),
then start it again with the following command:
sh /opt/module/doris/fe/bin/start_fe.sh --helper doris01:9010 --daemon
Creating Tables
```sql
create table niaoshu.niaoshu_snap (
    uid int comment 'user uid',
    tid int comment 'teacher id',
    aid varchar(32) comment 'aid'
) duplicate key(uid)
distributed by hash(uid) buckets 10;

create table niaoshu.niaoshu_mid (
    uid int COMMENT 'user uid',
    k_id varchar(32) COMMENT 'course id'
) duplicate key(uid)
distributed by hash(uid) buckets 10;
```
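Both tables above use the duplicate key model. Doris also has aggregate and unique key models (the later note about bucketing columns applies to all three). A minimal sketch of an aggregate-model table, using a hypothetical niaoshu_agg name and a SUM metric column, might look like this:
```sql
-- Hypothetical aggregate-model table: rows with the same uid are merged,
-- and visit_cnt is summed on merge.
create table niaoshu.niaoshu_agg (
    uid int COMMENT 'user uid',
    visit_cnt bigint SUM DEFAULT "0" COMMENT 'visit count, aggregated by SUM'
) aggregate key(uid)
distributed by hash(uid) buckets 10;
```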
Stream Load Import
```shell
curl --location-trusted -u root -H "label:123" -H "column_separator:," \
    -T naoshu.csv -X PUT http://doris01:8030/api/niaoshu/niaoshu/_stream_load

curl --location-trusted -u root -H "label:niaoshu_mid1" -H "column_separator:," \
    -T niaoshu.csv -X PUT http://doris01:8030/api/niaoshu/niaoshu_mid/_stream_load
```
Third-Party Import Methods
Hot tip: in Doris, only the olap engine stores data; the other engines do not store data themselves.
1. Connect directly to a table in MySQL by creating a table with the same schema
```sql
CREATE TABLE niaoshu.niaoshu_1(
    par_date int,
    a_id varchar(32) COMMENT 'aid',
    b_id varchar(32) COMMENT 'bid',
    c_id varchar(32) COMMENT 'cid',
    d_id varchar(32) COMMENT 'did'
) ENGINE=mysql
PROPERTIES
(
    "host" = "192.168.10.1",
    "port" = "3306",
    "user" = "root",
    "password" = "abcd123456",
    "database" = "niaoshu",
    "table" = "nioashu_mid"
);
```
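Since the mysql engine keeps the data in MySQL, a common pattern is to query niaoshu_1 directly from Doris, or to materialize it into a local olap table with INSERT INTO ... SELECT. A minimal sketch, where the local table niaoshu_1_olap is a hypothetical name:
```sql
-- Hypothetical local olap copy of the same schema.
CREATE TABLE niaoshu.niaoshu_1_olap(
    par_date int,
    a_id varchar(32) COMMENT 'aid',
    b_id varchar(32) COMMENT 'bid',
    c_id varchar(32) COMMENT 'cid',
    d_id varchar(32) COMMENT 'did'
) ENGINE=olap
DUPLICATE KEY(par_date)
DISTRIBUTED BY HASH(par_date) BUCKETS 10;

-- Pull the rows out of MySQL and store them in Doris.
INSERT INTO niaoshu.niaoshu_1_olap
SELECT par_date, a_id, b_id, c_id, d_id FROM niaoshu.niaoshu_1;
```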
2. Create a broker external table whose data files are stored on HDFS, with fields separated by "\t" and lines terminated by "\n"
Hot tip: ==external tables only support the csv and textfile formats==
```sql
CREATE EXTERNAL TABLE niaoshu.niaoshu_2(
    id VARCHAR(32) COMMENT 'id',
    a_type VARCHAR(32) COMMENT 'student course type',
    b_name VARCHAR(32) COMMENT 'student name'
) ENGINE=broker
PROPERTIES (
    "broker_name" = "niaoshu",
    "path" = "hdfs://hadoop:8020/user/hive/warehouse/wm_ods.db/niaoshu_test/*",
    "column_separator" = "\t",
    "line_delimiter" = "\n"
)
BROKER PROPERTIES (
    "username" = "hadoop",
    "password" = "abcd1234"
);
```
Counting the data that lives in Hive: a count over this external table returned 21 rows in 1.78 s.
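The query behind that timing is presumably just a plain count over the external table, along these lines:
```sql
SELECT COUNT(*) FROM niaoshu.niaoshu_2;
```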
3. Import data into Doris with a broker load
Create a dynamic partition table
```sql
create table niaoshu.niaoshu3(
    par_date date COMMENT 'partition date',
    uid int COMMENT 'user id',
    name varchar(526) COMMENT 'nickname',
    phone varchar(32) COMMENT 'phone number'
) ENGINE=olap
DUPLICATE KEY(par_date,uid)
COMMENT 'HDFS data imported into doris-test'
PARTITION BY RANGE(par_date)()
DISTRIBUTED BY HASH(uid) BUCKETS 9
PROPERTIES(
    "storage_medium" = "SSD",
    "dynamic_partition.enable" = "true",
    "dynamic_partition.time_unit" = "DAY",
    "dynamic_partition.end" = "7",
    "dynamic_partition.prefix" = "p",
    "dynamic_partition.buckets" = "64"
);

-- "dynamic_partition.start" = "-700",
```
Hot tip: ==the key (and partition) columns must be placed at the very front of the column list in CREATE TABLE==
View dynamic partitions:
SHOW DYNAMIC PARTITION TABLES;
1. Show all partitions of a given table in a given database:
SHOW PARTITIONS FROM example_db.table_name;
2. Show a specific partition of a given table in a given database:
SHOW PARTITIONS FROM example_db.table_name WHERE PartitionName = "p1";
3. To add a partition manually, dynamic partitioning must be turned off first (see the sketch after these notes):
ALTER TABLE dw_t_xby_wxuser_snap SET ("dynamic_partition.enable" = "false");
Otherwise it reports: Key columns should be a ordered prefix of the schema
The bucketing columns must be key columns of one of the three data models, e.g. DUPLICATE KEY.
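A minimal sketch of step 3, assuming the niaoshu.niaoshu3 table created above and a made-up date:
```sql
-- Temporarily disable dynamic partitioning on the table.
ALTER TABLE niaoshu.niaoshu3 SET ("dynamic_partition.enable" = "false");

-- Add one day's partition by hand (the upper bound is exclusive).
ALTER TABLE niaoshu.niaoshu3 ADD PARTITION p20200101 VALUES LESS THAN ("2020-01-02");

-- Re-enable dynamic partitioning afterwards.
ALTER TABLE niaoshu.niaoshu3 SET ("dynamic_partition.enable" = "true");
```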
Importing from HDFS into Doris: 6,228,517 rows (about 6.22 million) took roughly 2 minutes. Part of the fragment profile:
plan_fragment_executor.cpp:555] Fragment 30b2646b7088479f-bd4d793a9b9dda65: (Active: 4s576ms, % non-child: 0.00%)
  - AverageThreadTokens: 1.00
  - BytesReceived: 6.16 MB
  - DeserializeRowBatchTimer: 21.636ms
  - FirstBatchArrivalWaitTime: 0.000ns
  - PeakReservation: 0
  - PeakUsedReservation: 0
  - RowsProduced: 140.09K
  - SendersBlockedTimer: 0.000ns
  - SendersBlockedTotalTimer(*): 0.000ns
BlockMgr:
  - BlockWritesOutstanding: 0
  - BlocksCreated: 0
  - BlocksRecycled: 0
  - BufferedPins: 0
  - BytesWritten: 0
  - MaxBlockSize: 8.00 MB
  - MemoryLimit: 2.00 GB
  - TotalBufferWaitTime: 0.000ns
  - TotalEncryptionTime: 0.000ns
  - TotalIntegrityCheckTime: 0.000ns
  - TotalReadBlockTime: 0.000ns
DataBufferSender(dst_fragment_instance_id=30b2646b7088479f-bd4d793a9b9dda65):
EXCHANGE_NODE (id=8): (Active: 4s501ms, % non-child: 0.00%)
  - ConvertRowBatchTime: 1.920ms
  - MemoryUsed: 0
  - RowsReturned: 140.09K
  - RowsReturnedRate: 31.12 K/sec
Use SHOW LOAD to check load jobs; for the full syntax run: HELP SHOW LOAD;
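SHOW LOAD also accepts filters, which helps once many jobs have run; a small example (the label pattern is made up):
```sql
SHOW LOAD FROM wm_dws_db WHERE LABEL LIKE "niaoshu_%" ORDER BY CreateTime DESC LIMIT 10;
```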
Scheduled daily import of HDFS data into Doris
```shell
#!/bin/sh

### Generate a unique label for each daily run
label_date=niaoshu_`date +%Y%m%d%H%M%S`

export MYSQL_PWD='niaoshu123456'

### Used to build the data path in Hadoop
dt=`date -d '-1 day' +%Y%m%d`

### Used as the partition date written into Doris
par_date=`date -d '-1 day' +%Y-%m-%d`

mysql -uroot -h doris01 -P 9030 -e"\
use wm_dws_db;

LOAD LABEL wm_dws_db.$label_date
(
    DATA INFILE('hdfs://h_cluseter/user/hive/warehouse/naioshu.db/niaoshu3/par_date=$dt/*')
    INTO TABLE niaoshu3
    FORMAT AS 'orc'
    (userid,name,phone)

    SET(par_date=DATE_ADD(curdate(),interval -1 day),uid=userid,name=name,phone=phone)
)

WITH BROKER 'niaoshu_broker_hdfs'
(
    'username'='niaoshu',
    'password'='niaoshu123456',
    'dfs.nameservices'='h_cluseter',
    'dfs.ha.namenodes.h_cluseter'='nn1,nn2',
    'dfs.namenode.rpc-address.h_cluseter.nn1'='hadoop01:8020',
    'dfs.namenode.rpc-address.h_cluseter.nn2'='hadoop02:8020',
    'dfs.client.failover.proxy.provider'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
)

PROPERTIES
(
    'timeout'='3600',
    'max_filter_ratio'='0.001'
);
"
```
Check the actual execution result
- $ mysql -hdoris01 -P 9030 -uroot -pniaoshu123456
- mysql> use wm_dws_db
- mysql> SHOW LOAD;
4. Import data from Kafka into Doris
① Create the table
```sql
CREATE TABLE niaoshu.niaoshu4(
    par_date date COMMENT 'event date',
    uid bigint COMMENT 'user id, 0 for guests',
    a_name varchar(32) COMMENT 'module name',
    b_time datetime COMMENT 'event time',
    c_os varchar(32) COMMENT 'operating system',
    c_ip varchar(32) COMMENT 'client IP address'
)
DUPLICATE KEY(par_date,uid,a_name)
PARTITION BY RANGE(par_date)()
DISTRIBUTED BY HASH(par_date,uid,a_name) BUCKETS 32
PROPERTIES(
    "storage_medium" = "SSD",
    "dynamic_partition.enable" = "true",
    "dynamic_partition.time_unit" = "DAY",
    "dynamic_partition.end" = "3",
    "dynamic_partition.prefix" = "p",
    "dynamic_partition.buckets" = "12",
    "replication_num" = "3"
);
```
② Create the routine load job
```sql
CREATE ROUTINE LOAD niaoshu.niaoshu04002 ON niaoshu04
COLUMNS(p_tmp,uid,a_name,b_time,c_os,c_ip,par_date=from_unixtime(cast(p_tmp as int),'%Y-%m-%d'),event_time=from_unixtime(cast(p_tmp as int)))
PROPERTIES
(
    "desired_concurrent_number" = "24",
    "strict_mode" = "false",
    "max_batch_interval" = "20",
    "max_batch_rows" = "300000",
    "max_batch_size" = "209715200",
    "format" = "json",
    "jsonpaths" = "[\"$.date_time\",\"$.uid\",\"$.a_name\",\"$.b_time\",\"$.c_os\",\"$.c_ip\"]"
)
FROM KAFKA
(
    "kafka_broker_list" = "hadoop01:9092,hadoop02:9092,hadoop03:9092",
    "kafka_topic" = "testjson",
    "property.group.id" = "niaoshu_data_group_1",
    "property.kafka_default_offsets" = "OFFSET_END",
    "property.enable.auto.commit" = "true",
    "property.client.id" = "niaoshu_id"
);
```
Check the routine load
```sql
SHOW ALL ROUTINE LOAD;
### Detailed commands and examples are available via:
HELP SHOW ROUTINE LOAD TASK;
HELP STOP ROUTINE LOAD;
HELP PAUSE ROUTINE LOAD;
HELP RESUME ROUTINE LOAD;

#### To change the table schema (e.g. add a column), pause the job first
PAUSE ROUTINE LOAD FOR niaoshu04002;

### Add a table column
alter table niaoshu add column study_time varchar(64) comment 'study time';

#### Then resume the job
RESUME ROUTINE LOAD FOR niaoshu04002;
```
Kafka to Doris: success.
Note: the Kafka-to-Doris table serves real-time queries over nearly 70 million rows.
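A quick sanity check that the routine load is landing rows in the expected daily partitions, using the niaoshu4 table created in step ①:
```sql
SELECT par_date, COUNT(*) AS rows_loaded
FROM niaoshu.niaoshu4
GROUP BY par_date
ORDER BY par_date DESC;
```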
To integrate Doris with Hue, just add it the same way you would add a MySQL database.