• 尚硅谷 (Atguigu) big data project "在线教育之离线数仓" (Online Education Offline Data Warehouse), Notes 007


    Video: 尚硅谷大数据项目《在线教育之离线数仓》 (Bilibili)

    Contents

    Chapter 12: Report Data Export

    P112

    01. Creating the data tables

    02. Replacing DataX's MySQL connector jar

    03. The ads_traffic_stats_by_source.json file

    P113

    P114

    P115

    P116

    P117

    P118

    P119

    P120

    P121

    P122 [122_Online education data warehouse development review 04:23]


    Chapter 12: Report Data Export

    P112

    01. Creating the data tables

    # Chapter 12 Report data export
    CREATE DATABASE IF NOT EXISTS edu_report DEFAULT CHARSET utf8 COLLATE utf8_general_ci;

    # 12.1.2 Create the tables
    # 01) Traffic statistics by source
    DROP TABLE IF EXISTS ads_traffic_stats_by_source;
    CREATE TABLE ads_traffic_stats_by_source
    (
        `dt` DATETIME COMMENT 'Statistics date',
        `recent_days` BIGINT COMMENT 'Recent days: 1 = last 1 day, 7 = last 7 days, 30 = last 30 days',
        `source_id` VARCHAR(255) COMMENT 'Traffic source id',
        `source_site` VARCHAR(255) COMMENT 'Traffic source name',
        `uv_count` BIGINT COMMENT 'Unique visitor count',
        `avg_duration_sec` BIGINT COMMENT 'Average session duration, in seconds',
        `avg_page_count` BIGINT COMMENT 'Average pages viewed per session',
        `sv_count` BIGINT COMMENT 'Session count',
        `bounce_rate` DECIMAL(16, 2) COMMENT 'Bounce rate',
        PRIMARY KEY (`dt`, `recent_days`, `source_id`)
    ) COMMENT 'Traffic statistics by source';

    # 02) Page view path analysis
    DROP TABLE IF EXISTS ads_traffic_page_path;
    CREATE TABLE ads_traffic_page_path
    (
        `dt` DATETIME COMMENT 'Statistics date',
        `recent_days` BIGINT COMMENT 'Recent days: 1 = last 1 day, 7 = last 7 days, 30 = last 30 days',
        `source` VARCHAR(255) COMMENT 'Start page id of the jump',
        `target` VARCHAR(255) COMMENT 'End page id of the jump',
        `path_count` BIGINT COMMENT 'Jump count',
        PRIMARY KEY (`dt`, `recent_days`, `source`, `target`)
    ) COMMENT 'Page view path analysis';

    # 03) Sales statistics by traffic source
    DROP TABLE IF EXISTS ads_traffic_sale_stats_by_source;
    CREATE TABLE ads_traffic_sale_stats_by_source
    (
        `dt` DATETIME COMMENT 'Statistics date',
        `recent_days` BIGINT COMMENT 'Recent days: 1 = last 1 day, 7 = last 7 days, 30 = last 30 days',
        `source_id` VARCHAR(255) COMMENT 'Traffic source id',
        `source_site` VARCHAR(255) COMMENT 'Traffic source name',
        `order_total_amount` DECIMAL(16, 2) COMMENT 'Sales amount',
        `order_user_count` BIGINT COMMENT 'Ordering user count',
        `pv_visitor_count` BIGINT COMMENT 'Referred visitor count',
        `convert_rate` DECIMAL(16, 2) COMMENT 'Conversion rate',
        PRIMARY KEY (`dt`, `recent_days`, `source_id`)
    ) COMMENT 'Sales statistics by traffic source';

    # 04) User change statistics
    DROP TABLE IF EXISTS ads_user_user_change;
    CREATE TABLE ads_user_user_change
    (
        `dt` DATETIME COMMENT 'Statistics date',
        `user_churn_count` BIGINT COMMENT 'Churned user count',
        `user_back_count` BIGINT COMMENT 'Returning user count',
        PRIMARY KEY (`dt`)
    ) COMMENT 'User change statistics';

    # 05) User retention rate
    DROP TABLE IF EXISTS ads_user_user_retention;
    CREATE TABLE ads_user_user_retention
    (
        `dt` DATETIME COMMENT 'Statistics date',
        `create_date` VARCHAR(255) COMMENT 'Date the users were added',
        `retention_day` INT COMMENT 'Days retained as of the statistics date',
        `retention_count` BIGINT COMMENT 'Retained user count',
        `new_user_count` BIGINT COMMENT 'New user count',
        `retention_rate` DECIMAL(16, 2) COMMENT 'Retention rate',
        PRIMARY KEY (`dt`, `create_date`, `retention_day`)
    ) COMMENT 'User retention rate';

    # 06) New and active user statistics
    DROP TABLE IF EXISTS ads_user_user_stats;
    CREATE TABLE ads_user_user_stats
    (
        `dt` DATETIME COMMENT 'Statistics date',
        `recent_days` BIGINT COMMENT 'Recent days: 1 = last 1 day, 7 = last 7 days, 30 = last 30 days',
        `new_user_count` BIGINT COMMENT 'New user count',
        `active_user_count` BIGINT COMMENT 'Active user count',
        PRIMARY KEY (`dt`, `recent_days`)
    ) COMMENT 'New and active user statistics';

    # 07) User behavior funnel analysis
    DROP TABLE IF EXISTS ads_user_user_action;
    CREATE TABLE ads_user_user_action
    (
        `dt` DATETIME COMMENT 'Statistics date',
        `recent_days` BIGINT COMMENT 'Recent days: 1 = last 1 day, 7 = last 7 days, 30 = last 30 days',
        `home_count` BIGINT COMMENT 'Users who viewed the home page',
        `good_detail_count` BIGINT COMMENT 'Users who viewed a product detail page',
        `cart_count` BIGINT COMMENT 'Users who added to cart',
        `order_count` BIGINT COMMENT 'Users who placed orders',
        `payment_count` BIGINT COMMENT 'Users who paid',
        PRIMARY KEY (`dt`, `recent_days`)
    ) COMMENT 'User behavior funnel analysis';

    # 08) New transaction user statistics
    DROP TABLE IF EXISTS ads_user_new_buyer_stats;
    CREATE TABLE ads_user_new_buyer_stats
    (
        `dt` DATETIME COMMENT 'Statistics date',
        `recent_days` BIGINT COMMENT 'Recent days: 1 = last 1 day, 7 = last 7 days, 30 = last 30 days',
        `new_order_user_count` BIGINT COMMENT 'New ordering user count',
        `new_payment_user_count` BIGINT COMMENT 'New paying user count',
        PRIMARY KEY (`dt`, `recent_days`)
    ) COMMENT 'New transaction user statistics';

    # 09) Ordering user count by age group
    DROP TABLE IF EXISTS ads_user_order_user_count_by_age_group;
    CREATE TABLE ads_user_order_user_count_by_age_group
    (
        `dt` DATETIME COMMENT 'Statistics date',
        `recent_days` BIGINT COMMENT 'Recent days: 1 = last 1 day, 7 = last 7 days, 30 = last 30 days',
        `age_group` VARCHAR(255) COMMENT 'Age group: 18 and under, 19-24, 25-29, 30-34, 35-39, 40-49, 50 and above',
        `order_user_count` BIGINT COMMENT 'Ordering user count',
        PRIMARY KEY (`dt`, `recent_days`, `age_group`)
    ) COMMENT 'Ordering user count by age group';

    # 10) Course transaction statistics by category
    DROP TABLE IF EXISTS ads_course_trade_stats_by_category;
    CREATE TABLE ads_course_trade_stats_by_category
    (
        `dt` DATETIME COMMENT 'Statistics date',
        `recent_days` BIGINT COMMENT 'Recent days: 1 = last 1 day, 7 = last 7 days, 30 = last 30 days',
        `category_id` VARCHAR(255) COMMENT 'Category id',
        `category_name` VARCHAR(255) COMMENT 'Category name',
        `order_count` BIGINT COMMENT 'Order count',
        `order_user_count` BIGINT COMMENT 'Ordering user count',
        `order_amount` DECIMAL(16, 2) COMMENT 'Order amount',
        PRIMARY KEY (`dt`, `recent_days`, `category_id`)
    ) COMMENT 'Course transaction statistics by category';

    # 11) Course transaction statistics by subject
    DROP TABLE IF EXISTS ads_course_trade_stats_by_subject;
    CREATE TABLE ads_course_trade_stats_by_subject
    (
        `dt` DATETIME COMMENT 'Statistics date',
        `recent_days` BIGINT COMMENT 'Recent days: 1 = last 1 day, 7 = last 7 days, 30 = last 30 days',
        `subject_id` VARCHAR(255) COMMENT 'Subject id',
        `subject_name` VARCHAR(255) COMMENT 'Subject name',
        `order_count` BIGINT COMMENT 'Order count',
        `order_user_count` BIGINT COMMENT 'Ordering user count',
        `order_amount` DECIMAL(16, 2) COMMENT 'Order amount',
        PRIMARY KEY (`dt`, `recent_days`, `subject_id`)
    ) COMMENT 'Course transaction statistics by subject';

    # 12) Transaction statistics by course
    DROP TABLE IF EXISTS ads_course_trade_stats_by_course;
    CREATE TABLE ads_course_trade_stats_by_course
    (
        `dt` DATETIME COMMENT 'Statistics date',
        `recent_days` BIGINT COMMENT 'Recent days: 1 = last 1 day, 7 = last 7 days, 30 = last 30 days',
        `course_id` VARCHAR(255) COMMENT 'Course id',
        `course_name` VARCHAR(255) COMMENT 'Course name',
        `order_count` BIGINT COMMENT 'Order count',
        `order_user_count` BIGINT COMMENT 'Ordering user count',
        `order_amount` DECIMAL(16, 2) COMMENT 'Order amount',
        PRIMARY KEY (`dt`, `recent_days`, `course_id`)
    ) COMMENT 'Transaction statistics by course';

    # 13) Review statistics by course
    DROP TABLE IF EXISTS ads_course_review_stats_by_course;
    CREATE TABLE ads_course_review_stats_by_course
    (
        `dt` DATETIME COMMENT 'Statistics date',
        `recent_days` BIGINT COMMENT 'Recent days: 1 = last 1 day, 7 = last 7 days, 30 = last 30 days',
        `course_id` VARCHAR(255) COMMENT 'Course id',
        `course_name` VARCHAR(255) COMMENT 'Course name',
        `avg_stars` BIGINT COMMENT 'Average user rating',
        `review_user_count` BIGINT COMMENT 'Reviewing user count',
        `praise_rate` DECIMAL(16, 2) COMMENT 'Positive review rate',
        PRIMARY KEY (`dt`, `recent_days`, `course_id`)
    ) COMMENT 'Review statistics by course';

    # 14) Trial-lesson retention statistics by category
    DROP TABLE IF EXISTS ads_sample_retention_stats_by_category;
    CREATE TABLE ads_sample_retention_stats_by_category
    (
        `dt` DATETIME COMMENT 'Statistics date',
        `retention_days` BIGINT COMMENT 'Retention days, 1-7',
        `category_id` VARCHAR(255) COMMENT 'Category id',
        `category_name` VARCHAR(255) COMMENT 'Category name',
        `sample_user_count` BIGINT COMMENT 'Trial user count',
        `retention_rate` DECIMAL(16, 2) COMMENT 'Trial retention rate',
        PRIMARY KEY (`dt`, `retention_days`, `category_id`)
    ) COMMENT 'Trial-lesson retention statistics by category';

    # 15) Trial-lesson retention statistics by subject
    DROP TABLE IF EXISTS ads_sample_retention_stats_by_subject;
    CREATE TABLE ads_sample_retention_stats_by_subject
    (
        `dt` DATETIME COMMENT 'Statistics date',
        `retention_days` BIGINT COMMENT 'Retention days, 1-7',
        `subject_id` VARCHAR(255) COMMENT 'Subject id',
        `subject_name` VARCHAR(255) COMMENT 'Subject name',
        `sample_user_count` BIGINT COMMENT 'Trial user count',
        `retention_rate` DECIMAL(16, 2) COMMENT 'Trial retention rate',
        PRIMARY KEY (`dt`, `retention_days`, `subject_id`)
    ) COMMENT 'Trial-lesson retention statistics by subject';

    # 16) Trial-lesson retention statistics by course
    DROP TABLE IF EXISTS ads_sample_retention_stats_by_course;
    CREATE TABLE ads_sample_retention_stats_by_course
    (
        `dt` DATETIME COMMENT 'Statistics date',
        `retention_days` BIGINT COMMENT 'Retention days, 1-7',
        `course_id` VARCHAR(255) COMMENT 'Course id',
        `course_name` VARCHAR(255) COMMENT 'Course name',
        `sample_user_count` BIGINT COMMENT 'Trial user count',
        `retention_rate` DECIMAL(16, 2) COMMENT 'Trial retention rate',
        PRIMARY KEY (`dt`, `retention_days`, `course_id`)
    ) COMMENT 'Trial-lesson retention statistics by course';

    # 17) Overall transaction metrics
    DROP TABLE IF EXISTS ads_trade_stats;
    CREATE TABLE ads_trade_stats
    (
        `dt` DATETIME COMMENT 'Statistics date',
        `recent_days` BIGINT COMMENT 'Recent days: 1 = last 1 day, 7 = last 7 days, 30 = last 30 days',
        `order_total_amount` DECIMAL(16, 2) COMMENT 'Total order amount (GMV)',
        `order_count` BIGINT COMMENT 'Order count',
        `order_user_count` BIGINT COMMENT 'Ordering user count',
        PRIMARY KEY (`dt`, `recent_days`)
    ) COMMENT 'Overall transaction metrics';

    # 18) Transaction statistics by province
    DROP TABLE IF EXISTS ads_trade_order_by_province;
    CREATE TABLE ads_trade_order_by_province
    (
        `dt` DATETIME COMMENT 'Statistics date',
        `recent_days` BIGINT COMMENT 'Recent days: 1 = last 1 day, 7 = last 7 days, 30 = last 30 days',
        `province_id` VARCHAR(10) COMMENT 'Province id',
        `province_name` VARCHAR(30) COMMENT 'Province name',
        `region_id` VARCHAR(30) COMMENT 'Region id',
        `area_code` VARCHAR(255) COMMENT 'Area code',
        `iso_code` VARCHAR(255) COMMENT 'ISO code',
        `iso_code_3166_2` VARCHAR(255) COMMENT 'ISO 3166-2 code',
        `order_count` BIGINT COMMENT 'Order count',
        `order_user_count` BIGINT COMMENT 'Ordering user count',
        `order_total_amount` DECIMAL(16, 2) COMMENT 'Order amount',
        PRIMARY KEY (`dt`, `recent_days`, `province_id`, `region_id`, `area_code`, `iso_code`, `iso_code_3166_2`)
    ) COMMENT 'Transaction statistics by province';

    # 19) Average statistics by paper
    DROP TABLE IF EXISTS ads_examination_paper_avg_stats;
    CREATE TABLE ads_examination_paper_avg_stats
    (
        `dt` DATETIME COMMENT 'Statistics date',
        `recent_days` BIGINT COMMENT 'Recent days: 1 = last 1 day, 7 = last 7 days, 30 = last 30 days',
        `paper_id` VARCHAR(255) COMMENT 'Paper id',
        `paper_title` VARCHAR(255) COMMENT 'Paper title',
        `avg_score` DECIMAL(16, 2) COMMENT 'Average paper score',
        `avg_during_sec` BIGINT COMMENT 'Average time spent on the paper, in seconds',
        `user_count` BIGINT COMMENT 'Paper taker count',
        PRIMARY KEY (`dt`, `recent_days`, `paper_id`)
    ) COMMENT 'Average statistics by paper';

    # 20) Exam metrics by course, last 1/7/30 days
    DROP TABLE IF EXISTS ads_examination_course_avg_stats;
    CREATE TABLE ads_examination_course_avg_stats
    (
        `dt` DATETIME COMMENT 'Statistics date',
        `recent_days` BIGINT COMMENT 'Recent days: 1 = last 1 day, 7 = last 7 days, 30 = last 30 days',
        `course_id` VARCHAR(255) COMMENT 'Course id',
        `course_name` VARCHAR(255) COMMENT 'Course name',
        `avg_score` DECIMAL(16, 2) COMMENT 'Average score',
        `avg_during_sec` BIGINT COMMENT 'Average duration, in seconds',
        `user_count` BIGINT COMMENT 'User count',
        PRIMARY KEY (`dt`, `recent_days`, `course_id`)
    ) COMMENT 'Exam metrics by course';

    # 21) Score distribution by paper, last 1/7/30 days
    DROP TABLE IF EXISTS ads_examination_user_count_by_score_duration;
    CREATE TABLE ads_examination_user_count_by_score_duration
    (
        `dt` DATETIME COMMENT 'Statistics date',
        `recent_days` BIGINT COMMENT 'Recent days: 1 = last 1 day, 7 = last 7 days, 30 = last 30 days',
        `paper_id` VARCHAR(255) COMMENT 'Paper id',
        `score_duration` VARCHAR(255) COMMENT 'Score range',
        `user_count` BIGINT COMMENT 'User count per score range per paper',
        PRIMARY KEY (`dt`, `recent_days`, `paper_id`, `score_duration`)
    ) COMMENT 'Score distribution statistics by paper';

    # 22) Accuracy per question, last 1/7/30 days
    DROP TABLE IF EXISTS ads_examination_question_accuracy;
    CREATE TABLE ads_examination_question_accuracy
    (
        `dt` DATETIME COMMENT 'Statistics date',
        `recent_days` BIGINT COMMENT 'Recent days: 1 = last 1 day, 7 = last 7 days, 30 = last 30 days',
        `question_id` VARCHAR(255) COMMENT 'Question id',
        `accuracy` DECIMAL(16, 2) COMMENT 'Question accuracy',
        PRIMARY KEY (`dt`, `recent_days`, `question_id`)
    ) COMMENT 'Accuracy by question';

    # 23) Video play statistics by chapter
    DROP TABLE IF EXISTS ads_learn_play_stats_by_chapter;
    CREATE TABLE ads_learn_play_stats_by_chapter
    (
        `dt` DATETIME COMMENT 'Statistics date',
        `recent_days` BIGINT COMMENT 'Recent days: 1 = last 1 day, 7 = last 7 days, 30 = last 30 days',
        `chapter_id` VARCHAR(30) COMMENT 'Chapter id',
        `chapter_name` VARCHAR(200) COMMENT 'Chapter name',
        `video_id` VARCHAR(255) COMMENT 'Video id',
        `video_name` VARCHAR(255) COMMENT 'Video name',
        `play_count` BIGINT COMMENT 'Video play count per chapter',
        `avg_play_sec` BIGINT COMMENT 'Average watch time per user per chapter, in seconds',
        `user_count` BIGINT COMMENT 'Viewer count per chapter',
        PRIMARY KEY (`dt`, `recent_days`, `chapter_id`, `video_id`)
    ) COMMENT 'Video play statistics by chapter';

    # 24) Video play statistics by course
    DROP TABLE IF EXISTS ads_learn_play_stats_by_course;
    CREATE TABLE ads_learn_play_stats_by_course
    (
        `dt` DATETIME COMMENT 'Statistics date',
        `recent_days` BIGINT COMMENT 'Recent days: 1 = last 1 day, 7 = last 7 days, 30 = last 30 days',
        `course_id` VARCHAR(255) COMMENT 'Course id',
        `course_name` VARCHAR(255) COMMENT 'Course name',
        `play_count` BIGINT COMMENT 'Video play count per course',
        `avg_play_sec` BIGINT COMMENT 'Average watch time per user per course, in seconds',
        `user_count` BIGINT COMMENT 'Viewer count per course',
        PRIMARY KEY (`dt`, `recent_days`, `course_id`)
    ) COMMENT 'Video play statistics by course';

    # 25) Completion user count by course
    DROP TABLE IF EXISTS ads_complete_complete_user_count_per_course;
    CREATE TABLE ads_complete_complete_user_count_per_course
    (
        `dt` DATETIME COMMENT 'Statistics date',
        `recent_days` BIGINT COMMENT 'Recent days: 1 = last 1 day, 7 = last 7 days, 30 = last 30 days',
        `course_id` VARCHAR(255) COMMENT 'Course id',
        `user_count` BIGINT COMMENT 'Users who completed the course',
        PRIMARY KEY (`dt`, `recent_days`, `course_id`)
    ) COMMENT 'Completion user count by course';

    # 26) Overall completion metrics
    DROP TABLE IF EXISTS ads_complete_complete_stats;
    CREATE TABLE ads_complete_complete_stats
    (
        `dt` DATETIME COMMENT 'Statistics date',
        `recent_days` BIGINT COMMENT 'Recent days: 1 = last 1 day, 7 = last 7 days, 30 = last 30 days',
        `user_complete_count` BIGINT COMMENT 'Completing user count',
        `user_course_complete_count` BIGINT COMMENT 'Course completion count (user-course pairs)',
        PRIMARY KEY (`dt`, `recent_days`)
    ) COMMENT 'Overall completion metrics';

    # 27) Average completed chapter videos per user by course
    DROP TABLE IF EXISTS ads_complete_complete_chapter_count_per_course;
    CREATE TABLE ads_complete_complete_chapter_count_per_course
    (
        `dt` DATETIME COMMENT 'Statistics date',
        `recent_days` BIGINT COMMENT 'Recent days: 1 = last 1 day, 7 = last 7 days, 30 = last 30 days',
        `course_id` VARCHAR(255) COMMENT 'Course id',
        `complete_chapter_count` BIGINT COMMENT 'Average completed chapters per user for the course',
        PRIMARY KEY (`dt`, `recent_days`, `course_id`)
    ) COMMENT 'Average completed chapter videos per user by course';
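
    To apply everything in one pass, the statements above can be saved to a file and piped into the mysql CLI. A minimal sketch; the .sql file name is hypothetical, and the host and credentials are the ones used by the DataX job later in this chapter:

mysql -hnode001 -uroot -p123456 < edu_report_ads_ddl.sql          # run the DDL above
mysql -hnode001 -uroot -p123456 -e "SHOW TABLES;" edu_report      # sanity check: 27 ads_* tables
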
    02. Replacing DataX's MySQL connector jar

    DataX

    GitHub - alibaba/DataX: DataX is the open-source version of Alibaba Cloud DataWorks Data Integration.
    https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md
    https://github.com/alibaba/DataX/blob/master/hdfsreader/doc/hdfsreader.md

    [atguigu@node001 ~]$ cd /opt/module/datax/
    [atguigu@node001 datax]$ python bin/datax.py -p"-Dexportdir=/warehouse/edu/ads/ads_traffic_stats_by_source/" job/ads_traffic_stats_by_source.json

    2023-09-05 10:59:01.854 [job-0] ERROR RetryUtil - Exception when calling callable, 即将尝试执行第1次重试.本次重试计划等待[1000]ms,实际等待[1001]ms, 异常Msg:[Code:[DBUtilErrorCode-10], Description:[连接数据库失败. 请检查您的 账号、密码、数据库名称、IP、Port或者向 DBA 寻求帮助(注意网络环境).].  -  具体错误信息为:com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Could not create connection to database server.]
    2023-09-05 10:59:03.860 [job-0] ERROR RetryUtil - Exception when calling callable, 即将尝试执行第2次重试.本次重试计划等待[2000]ms,实际等待[2000]ms, 异常Msg:[Code:[DBUtilErrorCode-10], Description:[连接数据库失败. 请检查您的 账号、密码、数据库名称、IP、Port或者向 DBA 寻求帮助(注意网络环境).].  -  具体错误信息为:com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Could not create connection to database server.]
    2023-09-05 10:59:07.865 [job-0] ERROR RetryUtil - Exception when calling callable, 即将尝试执行第3次重试.本次重试计划等待[4000]ms,实际等待[4000]ms, 异常Msg:[Code:[DBUtilErrorCode-10], Description:[连接数据库失败. 请检查您的 账号、密码、数据库名称、IP、Port或者向 DBA 寻求帮助(注意网络环境).].  -  具体错误信息为:com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Could not create connection to database server.]

    Fix: the username and password were checked over and over and are correct; the real culprit is the bundled driver. Replace mysql-connector-java-5.1.34.jar with mysql-connector-java-8.0.29.jar in both lib directories, /opt/module/datax/plugin/writer/mysqlwriter/libs and /opt/module/datax/plugin/reader/mysqlreader/libs.
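
    A sketch of that swap, assuming the 8.0.29 connector jar has already been downloaded to the home directory:

# Replace the bundled 5.1.34 MySQL driver with the 8.x one for both plugins.
for libdir in /opt/module/datax/plugin/writer/mysqlwriter/libs \
              /opt/module/datax/plugin/reader/mysqlreader/libs; do
    rm -f ${libdir}/mysql-connector-java-5.1.34.jar
    cp ~/mysql-connector-java-8.0.29.jar ${libdir}/
done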

    03. The ads_traffic_stats_by_source.json file

    According to DataX's automatic analysis, the most likely cause of this task's failure is:
    com.alibaba.datax.common.exception.DataXException: Code:[DBUtilErrorCode-01], Description:[获取表字段相关信息失败.].  - 获取表:ads_traffic_stats_by_source 的字段的元信息时失败. 请联系 DBA 核查该库、表信息. - java.sql.SQLSyntaxErrorException: Unknown column 'channel' in 'field list'
            at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:120)
            at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122)
            at com.mysql.cj.jdbc.StatementImpl.executeQuery(StatementImpl.java:1201)
            at com.alibaba.datax.plugin.rdbms.util.DBUtil.getColumnMetaData(DBUtil.java:563)
            at com.alibaba.datax.plugin.rdbms.writer.util.OriginalConfPretreatmentUtil.dealColumnConf(OriginalConfPretreatmentUtil.java:125)
            at com.alibaba.datax.plugin.rdbms.writer.util.OriginalConfPretreatmentUtil.dealColumnConf(OriginalConfPretreatmentUtil.java:140)
            at com.alibaba.datax.plugin.rdbms.writer.util.OriginalConfPretreatmentUtil.doPretreatment(OriginalConfPretreatmentUtil.java:35)
            at com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter$Job.init(CommonRdbmsWriter.java:41)
            at com.alibaba.datax.plugin.writer.mysqlwriter.MysqlWriter$Job.init(MysqlWriter.java:31)
            at com.alibaba.datax.core.job.JobContainer.initJobWriter(JobContainer.java:704)
            at com.alibaba.datax.core.job.JobContainer.init(JobContainer.java:304)
            at com.alibaba.datax.core.job.JobContainer.start(JobContainer.java:113)
            at com.alibaba.datax.core.Engine.start(Engine.java:92)
            at com.alibaba.datax.core.Engine.entry(Engine.java:171)
            at com.alibaba.datax.core.Engine.main(Engine.java:204)

            at com.alibaba.datax.common.exception.DataXException.asDataXException(DataXException.java:33)
            at com.alibaba.datax.plugin.rdbms.util.DBUtil.getColumnMetaData(DBUtil.java:575)
            at com.alibaba.datax.plugin.rdbms.writer.util.OriginalConfPretreatmentUtil.dealColumnConf(OriginalConfPretreatmentUtil.java:125)
            at com.alibaba.datax.plugin.rdbms.writer.util.OriginalConfPretreatmentUtil.dealColumnConf(OriginalConfPretreatmentUtil.java:140)
            at com.alibaba.datax.plugin.rdbms.writer.util.OriginalConfPretreatmentUtil.doPretreatment(OriginalConfPretreatmentUtil.java:35)
            at com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter$Job.init(CommonRdbmsWriter.java:41)
            at com.alibaba.datax.plugin.writer.mysqlwriter.MysqlWriter$Job.init(MysqlWriter.java:31)
            at com.alibaba.datax.core.job.JobContainer.initJobWriter(JobContainer.java:704)
            at com.alibaba.datax.core.job.JobContainer.init(JobContainer.java:304)
            at com.alibaba.datax.core.job.JobContainer.start(JobContainer.java:113)
            at com.alibaba.datax.core.Engine.start(Engine.java:92)
            at com.alibaba.datax.core.Engine.entry(Engine.java:171)
            at com.alibaba.datax.core.Engine.main(Engine.java:204)

    The Unknown column 'channel' error above suggests the mysqlwriter column list referenced a column that does not exist in ads_traffic_stats_by_source; the working job file below lists exactly the columns defined in the DDL.

    /opt/module/datax/job/ads_traffic_stats_by_source.json
    {
        "job": {
            "content": [
                {
                    "reader": {
                        "name": "hdfsreader",
                        "parameter": {
                            "column": ["*"],
                            "defaultFS": "hdfs://node001:8020",
                            "encoding": "UTF-8",
                            "fieldDelimiter": "\t",
                            "fileType": "text",
                            "nullFormat": "\\N",
                            "path": "${exportdir}"
                        }
                    },
                    "writer": {
                        "name": "mysqlwriter",
                        "parameter": {
                            "column": [
                                "dt",
                                "recent_days",
                                "source_id",
                                "source_site",
                                "uv_count",
                                "avg_duration_sec",
                                "avg_page_count",
                                "sv_count",
                                "bounce_rate"
                            ],
                            "connection": [
                                {
                                    "jdbcUrl": "jdbc:mysql://node001:3306/edu_report?useUnicode=true&characterEncoding=utf-8",
                                    "table": ["ads_traffic_stats_by_source"]
                                }
                            ],
                            "username": "root",
                            "password": "123456",
                            "writeMode": "replace"
                        }
                    }
                }
            ],
            "setting": {
                "errorLimit": {
                    "percentage": 0.02,
                    "record": 0
                },
                "speed": {
                    "channel": 3
                }
            }
        }
    }
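
    After the job finishes, a quick spot check (a sketch, reusing the credentials configured above) confirms the rows landed in MySQL:

mysql -hnode001 -uroot -p123456 edu_report \
    -e "SELECT dt, recent_days, source_site, uv_count, bounce_rate FROM ads_traffic_stats_by_source LIMIT 5;"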

    P113

    12.2.2 DataX configuration file generation script
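
    The course ships a generator script so these per-table job files do not have to be written by hand. Below is a minimal bash sketch of the idea, not the course's actual script: it reads the target table's column list from information_schema and splices it into the same hdfsreader-to-mysqlwriter template shown above. The script name, and the root/123456 credentials reused from the job file, are assumptions.

#!/bin/bash
# gen_export_config.sh <table>: emit a DataX export job file for one edu_report table.
table=$1

# Ordered, quoted column list for the target table, from MySQL metadata.
cols=$(mysql -hnode001 -uroot -p123456 -N -e \
    "SELECT GROUP_CONCAT(CONCAT('\"', column_name, '\"') ORDER BY ordinal_position SEPARATOR ', ')
     FROM information_schema.columns
     WHERE table_schema = 'edu_report' AND table_name = '${table}';")

# Render the job file; \${exportdir} is left unexpanded for datax.py -p to fill in.
cat > /opt/module/datax/job/${table}.json <<EOF
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "hdfsreader",
                    "parameter": {
                        "column": ["*"],
                        "defaultFS": "hdfs://node001:8020",
                        "encoding": "UTF-8",
                        "fieldDelimiter": "\t",
                        "fileType": "text",
                        "nullFormat": "\\\\N",
                        "path": "\${exportdir}"
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "column": [${cols}],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://node001:3306/edu_report?useUnicode=true&characterEncoding=utf-8",
                                "table": ["${table}"]
                            }
                        ],
                        "username": "root",
                        "password": "123456",
                        "writeMode": "replace"
                    }
                }
            }
        ],
        "setting": {
            "errorLimit": {"percentage": 0.02, "record": 0},
            "speed": {"channel": 3}
        }
    }
}
EOF

    Running "bash gen_export_config.sh ads_traffic_stats_by_source" would regenerate the job file from section 03 above.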

    P114

    Chapter 13: Data Warehouse Workflow Scheduling

    Apache DolphinScheduler is a distributed, easily extensible visual DAG workflow task scheduling platform, dedicated to untangling the complex dependencies in data processing pipelines so that scheduling works out of the box.

    P115

    Chapter 2: DolphinScheduler Deployment Notes

    Chapter 3: DolphinScheduler Cluster Deployment

    3.6 One-click DolphinScheduler deployment

    [atguigu@node001 apache-dolphinscheduler-2.0.3-bin]$ jpsall
    ================ node001 ================
    5360 QuorumPeerMain
    2832 NameNode
    9296 WorkerServer
    3411 JobHistoryServer
    5988 RunJar
    9668 ApiApplicationServer
    6100 RunJar
    9414 LoggerServer
    3000 DataNode
    9545 AlertServer
    10540 Jps
    7020 NodeManager
    ================ node002 ================
    5296 NodeManager
    5984 WorkerServer
    6032 LoggerServer
    6231 Jps
    4745 QuorumPeerMain
    5178 ResourceManager
    4986 DataNode
    ================ node003 ================
    3985 NodeManager
    4658 LoggerServer
    4884 Jps
    1861 DataNode
    3594 QuorumPeerMain
    1967 SecondaryNameNode
    [atguigu@node001 apache-dolphinscheduler-2.0.3-bin]$

    P116

    3.7 DolphinScheduler start/stop commands

    [atguigu@node001 apache-dolphinscheduler-2.0.3-bin]$ cd /opt/module/dolphinScheduler/ds-2.0.3/
    [atguigu@node001 ds-2.0.3]$ ll
    总用量 60
    drwxrwxr-x 2 atguigu atguigu  4096 9月 6 11:21 bin
    drwxrwxr-x 5 atguigu atguigu  4096 9月 6 11:21 conf
    -rwxrwxr-x 1 atguigu atguigu  5190 9月 6 11:22 install.sh
    drwxrwxr-x 2 atguigu atguigu 20480 9月 6 11:22 lib
    drwxrwxr-x 2 atguigu atguigu  4096 9月 6 11:23 logs
    drwxrwxr-x 2 atguigu atguigu  4096 9月 6 11:22 pid
    drwxrwxr-x 2 atguigu atguigu  4096 9月 6 11:22 script
    drwxrwxr-x 3 atguigu atguigu  4096 9月 6 11:22 sql
    drwxrwxr-x 8 atguigu atguigu  4096 9月 6 11:22 ui
    [atguigu@node001 ds-2.0.3]$ cd bin/
    [atguigu@node001 bin]$ ll
    总用量 20
    -rwxrwxr-x 1 atguigu atguigu 6770 9月 6 11:21 dolphinscheduler-daemon.sh
    -rwxrwxr-x 1 atguigu atguigu 2427 9月 6 11:21 start-all.sh
    -rwxrwxr-x 1 atguigu atguigu 3332 9月 6 11:21 status-all.sh
    -rwxrwxr-x 1 atguigu atguigu 2428 9月 6 11:21 stop-all.sh
    [atguigu@node001 bin]$

    node003 has no WorkerServer running because it is short on memory; bumping its RAM to 8 GB would let it run, but there is no need to here.
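
    status-all.sh from the bin listing above reports per-node daemon status, which is the quickest way to confirm which WorkerServers are actually up:

[atguigu@node001 bin]$ ./status-all.sh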

    P117

    Start hadoop, zookeeper, hive, hiveserver2, and DolphinScheduler.

    [atguigu@node001 ~]$ myhadoop.sh start
    [atguigu@node001 ~]$ zookeeper.sh start
    [atguigu@node001 ~]$ nohup /opt/module/hive/hive-3.1.2/bin/hive &
    [atguigu@node001 ~]$ nohup /opt/module/hive/hive-3.1.2/bin/hive --service hiveserver2 &
    [atguigu@node001 ~]$ /opt/module/dolphinScheduler/ds-2.0.3/bin/start-all.sh

    [atguigu@node001 ~]$ myhadoop.sh start
    ================ 启动 hadoop集群 ================
    ---------------- 启动 hdfs ----------------
    Starting namenodes on [node001]
    Starting datanodes
    Starting secondary namenodes [node003]
    --------------- 启动 yarn ---------------
    Starting resourcemanager
    Starting nodemanagers
    --------------- 启动 historyserver ---------------
    [atguigu@node001 ~]$ zookeeper.sh start
    ---------- zookeeper node001 启动 ----------
    ZooKeeper JMX enabled by default
    Using config: /opt/module/zookeeper/zookeeper-3.5.7/bin/../conf/zoo.cfg
    Starting zookeeper ... STARTED
    ---------- zookeeper node002 启动 ----------
    ZooKeeper JMX enabled by default
    Using config: /opt/module/zookeeper/zookeeper-3.5.7/bin/../conf/zoo.cfg
    Starting zookeeper ... STARTED
    ---------- zookeeper node003 启动 ----------
    ZooKeeper JMX enabled by default
    Using config: /opt/module/zookeeper/zookeeper-3.5.7/bin/../conf/zoo.cfg
    Starting zookeeper ... STARTED
    [atguigu@node001 ~]$ nohup /opt/module/hive/hive-3.1.2/bin/hive &
    [1] 3741
    [atguigu@node001 ~]$ nohup: 忽略输入并把输出追加到"nohup.out"
    [atguigu@node001 ~]$ nohup /opt/module/hive/hive-3.1.2/bin/hive --service hiveserver2 &
    [2] 3912
    [atguigu@node001 ~]$ nohup: 忽略输入并把输出追加到"nohup.out"
    [atguigu@node001 ~]$ /opt/module/dolphinScheduler/ds-2.0.3/bin/start-all.sh
    node001:default
    ...

    Why does a DolphinScheduler workflow show nothing under Workflow Instances after running? Raising node001's memory from 4 GB to 8 GB fixed it.
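
    When a run leaves no trace in Workflow Instances, the Master and Worker logs under the logs directory from section 3.7 usually explain why. A sketch; the exact log file names vary between DolphinScheduler versions:

cd /opt/module/dolphinScheduler/ds-2.0.3/logs
# Look for OutOfMemoryError or task-submission failures around the run time.
tail -n 100 dolphinscheduler-master.log dolphinscheduler-worker.log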

    P118

    Chapter 5: DolphinScheduler Advanced Topics

    5.1 Workflow parameter passing

    DolphinScheduler supports flexible parameter passing to task nodes; a task node can reference a parameter's value via ${parameter_name}.

    It follows that the priority, from highest to lowest, is: local parameters > global parameters > parameters passed by upstream tasks.

    5.1.5 Parameter priority

    3) Conclusions

    (1) Local parameters > global parameters > parameters passed by upstream tasks;

    (2) When multiple upstream nodes pass parameters with the same name, the downstream node prefers a parameter whose value is non-empty;

    (3) If there are multiple parameters with non-empty values, they are ordered by the upstream tasks' completion times and the value from the earliest-finishing task is used.
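
    A minimal way to see rule (1) in action, using an example parameter named dt: define a global parameter dt=2023-09-07 on the workflow and a local parameter dt=2023-09-08 on a shell task node whose body is the line below. The task prints 2023-09-08 because the local parameter wins.

# Shell task node body: prints the effective value of dt.
echo "dt=${dt}"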

    P119

    5.2 Referencing dependent resources

    P120

    13.2 Data preparation

    Start hadoop, zookeeper, kafka, maxwell, f1, f2, f3, as sketched below.
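
    The startup order matters (ZooKeeper before Kafka, Kafka before Maxwell and the Flume agents). The hadoop and zookeeper wrappers appeared earlier in these notes; the kafka.sh and maxwell.sh names are assumptions, so substitute your own scripts:

myhadoop.sh start                        # HDFS + YARN + JobHistoryServer
zookeeper.sh start                       # ZooKeeper ensemble
kafka.sh start                           # Kafka brokers (hypothetical script name)
maxwell.sh start                         # Maxwell CDC (hypothetical script name)
f1.sh start; f2.sh start; f3.sh start    # Flume agents (start argument assumed)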

    P121

    13.3 Workflow scheduling hands-on

    [2023-09-06 17:15:26,824] ERROR [Broker id=0] Received LeaderAndIsrRequest with correlation id 1 from controller 1 epoch 33 for partition __consumer_offsets-44 (last update controller epoch 33) but cannot become follower since the new leader -1 is unavailable. (state.change.logger)
    [2023-09-06 17:15:26,824] ERROR [Broker id=0] Received LeaderAndIsrRequest with correlation id 1 from controller 1 epoch 33 for partition __consumer_offsets-32 (last update controller epoch 33) but cannot become follower since the new leader -1 is unavailable. (state.change.logger)
    [2023-09-06 17:15:26,824] ERROR [Broker id=0] Received LeaderAndIsrRequest with correlation id 1 from controller 1 epoch 33 for partition __consumer_offsets-41 (last update controller epoch 33) but cannot become follower since the new leader -1 is unavailable. (state.change.logger)

    [2023-09-06 19:32:27,802] ERROR [Controller id=0 epoch=34] Controller 0 epoch 34 failed to change state for partition __transaction_state-27 from OfflinePartition to OnlinePartition (state.change.logger)
    kafka.common.StateChangeFailedException: Failed to elect leader for partition __transaction_state-27 under strategy OfflinePartitionLeaderElectionStrategy(false)
        at kafka.controller.ZkPartitionStateMachine.$anonfun$doElectLeaderForPartitions$7(PartitionStateMachine.scala:424)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at kafka.controller.ZkPartitionStateMachine.doElectLeaderForPartitions(PartitionStateMachine.scala:421)
        at kafka.controller.ZkPartitionStateMachine.electLeaderForPartitions(PartitionStateMachine.scala:332)
        at kafka.controller.ZkPartitionStateMachine.doHandleStateChanges(PartitionStateMachine.scala:238)
        at kafka.controller.ZkPartitionStateMachine.handleStateChanges(PartitionStateMachine.scala:158)
        at kafka.controller.PartitionStateMachine.triggerOnlineStateChangeForPartitions(PartitionStateMachine.scala:74)
        at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:59)
        at kafka.controller.KafkaController.onBrokerStartup(KafkaController.scala:536)
        at kafka.controller.KafkaController.processBrokerChange(KafkaController.scala:1594)
        at kafka.controller.KafkaController.process(KafkaController.scala:2484)
        at kafka.controller.QueuedEvent.process(ControllerEventManager.scala:52)
        at kafka.controller.ControllerEventManager$ControllerEventThread.process$1(ControllerEventManager.scala:130)
        at kafka.controller.ControllerEventManager$ControllerEventThread.$anonfun$doWork$1(ControllerEventManager.scala:133)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
        at kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:133)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)
    [2023-09-06 19:32:27,805] INFO [Controller id=0 epoch=34] Changed partition __consumer_offsets-22 from OfflinePartition to OnlinePartition with state LeaderAndIsr(leader=1, leaderEpoch=37, isr=List(1), zkVersion=37) (state.change.logger)

    Maxwell error: java.lang.RuntimeException: error: unhandled character set 'utf8mb3'

    Maxwell error: java.lang.RuntimeException: error: unhandled character set 'utf8mb3' (你的482's blog, CSDN)
    Maxwell installation and usage (Juejin)

    The cause: starting with MySQL 5.5.3, the utf8mb4 character set was introduced to implement full UTF-8 (mb4 meaning "most bytes 4", up to four bytes per character), and the old utf8 was replaced by the alias utf8mb3. One option is to downgrade MySQL by reinstalling a version below 5.5.3. The other is to patch the Maxwell source: unpack it, find the offending class com.zendesk.maxwell.schema.columndef.StringColumnDef, add handling so it recognizes utf8mb3, and rebuild. Replace maxwell/lib/maxwell-1.19.0.jar with the repackaged maxwell-1.19.0.jar and restart Maxwell (see the check-and-swap sketch below).
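
    Before patching, it is worth confirming that the server really reports the utf8mb3 name; the swap itself is just a jar replacement. A sketch; the locations of the rebuilt jar and the Maxwell install are assumptions:

# MySQL 8.x reports the legacy 3-byte charset as utf8mb3, which Maxwell 1.19 cannot map.
mysql -hnode001 -uroot -p123456 -e "SHOW VARIABLES LIKE 'character_set_%';"

# Swap in the repackaged jar (assumed to be in ~) and restart Maxwell.
cp ~/maxwell-1.19.0.jar /opt/module/maxwell/lib/maxwell-1.19.0.jar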

    Start hadoop, zookeeper, kafka, maxwell, f1.sh, f2.sh, f3.sh.

    Shut down the ingestion-side components: kafka, flume (f1, f2, f3), maxwell; start hadoop, hive, zookeeper, dolphinscheduler ...

    Forgot to start zookeeper ...

    Error starting ApplicationContext. To display the conditions report re-run your application with 'debug' enabled.
    [ERROR] 2023-09-07 14:46:32.033 org.springframework.boot.SpringApplication:[843] - Application run failed
    org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'monitorServiceImpl': Unsatisfied dependency expressed through field 'registryClient'; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'registryClient': Invocation of init method failed; nested exception is org.apache.dolphinscheduler.registry.api.RegistryException: zookeeper connect timeout
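
    The RegistryException: zookeeper connect timeout above is the signature of a dead ZooKeeper. Check it, start it, then bring DolphinScheduler back up:

# Is ZooKeeper running? (binary path taken from the startup transcript earlier)
/opt/module/zookeeper/zookeeper-3.5.7/bin/zkServer.sh status

# If not, start the ensemble and restart DolphinScheduler.
zookeeper.sh start
/opt/module/dolphinScheduler/ds-2.0.3/bin/start-all.sh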

    ...

    DataX syncs the data to HDFS: mysql_to_hdfs_full.sh;

    the data is then loaded into the ODS layer: hdfs_to_ods_db.sh;

    followed by ods_to_dwd.sh.

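# Environment for the scheduled tasks, presumably DolphinScheduler's dolphinscheduler_env.sh (an assumption):
# every tool the workflow tasks shell out to (hadoop, spark, hive, datax) must resolve on PATH.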
    export HADOOP_HOME=/opt/module/hadoop/hadoop-3.1.3
    export HADOOP_CONF_DIR=/opt/module/hadoop/hadoop-3.1.3/etc/hadoop
    export SPARK_HOME=/opt/module/spark/spark-3.0.0-bin-hadoop3.2
    export JAVA_HOME=/opt/module/jdk/jdk1.8.0_212
    export HIVE_HOME=/opt/module/hive/hive-3.1.2
    export DATAX_HOME=/opt/module/datax

    export PATH=$HADOOP_HOME/bin:$SPARK_HOME/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:$DATAX_HOME/bin:$PATH

    Chaining the stages together
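
    Run by hand, the chained stages look like the sketch below; the "all <date>" argument convention mirrors the course's earlier import scripts and is an assumption here:

mysql_to_hdfs_full.sh all 2023-09-07    # DataX: MySQL -> HDFS (full sync)
hdfs_to_ods_db.sh all 2023-09-07        # HDFS -> Hive ODS layer
ods_to_dwd.sh all 2023-09-07            # ODS -> DWD layer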

    P122 [122_Online education data warehouse development review 04:23]

  • Original post: https://blog.csdn.net/weixin_44949135/article/details/132685054