Big Data Training: Enterprise Development Case, Reading a Local File into HDFS in Real Time

1) Requirement: monitor the Hive log in real time and upload its contents to HDFS.

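2) Requirement analysis:

A Flume agent follows the Hive log with an exec source running tail -F, buffers the events in a memory channel, and writes them to HDFS through an HDFS sink.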

3) Implementation steps:
1. To write data to HDFS, Flume needs the relevant Hadoop jars. Copy the following jars into the /opt/module/flume/lib directory:

commons-configuration-1.6.jar
hadoop-auth-2.7.2.jar
hadoop-common-2.7.2.jar
hadoop-hdfs-2.7.2.jar
commons-io-2.4.jar
htrace-core-3.1.0-incubating.jar
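The copy step above could be scripted roughly as follows. This is a minimal sketch: the copy_hadoop_jars helper is hypothetical, and the Hadoop location used in the usage note (/opt/module/hadoop-2.7.2/share/hadoop) is an assumption based on the directory names in this walkthrough, so adjust both paths to your installation.

```shell
#!/usr/bin/env bash
# Sketch: copy the Hadoop jars that Flume's HDFS sink needs into Flume's lib directory.
# copy_hadoop_jars is a hypothetical helper; both directories are passed as arguments.

copy_hadoop_jars() {
  local hadoop_share="$1"   # directory tree to search, e.g. Hadoop's share/hadoop
  local flume_lib="$2"      # destination, e.g. /opt/module/flume/lib
  local jar found missing=0

  for jar in \
    commons-configuration-1.6.jar \
    hadoop-auth-2.7.2.jar \
    hadoop-common-2.7.2.jar \
    hadoop-hdfs-2.7.2.jar \
    commons-io-2.4.jar \
    htrace-core-3.1.0-incubating.jar
  do
    # Take the first match; the same jar may sit in several subdirectories.
    found="$(find "$hadoop_share" -type f -name "$jar" -print -quit)"
    if [ -n "$found" ]; then
      cp "$found" "$flume_lib/"
    else
      echo "missing: $jar" >&2
      missing=1
    fi
  done
  # Non-zero exit status if any jar could not be found.
  return "$missing"
}
```

A call might look like: copy_hadoop_jars /opt/module/hadoop-2.7.2/share/hadoop /opt/module/flume/lib. Any jar that is not found is reported on stderr, so a missing dependency is noticed before the agent is started.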
2. Create the flume-file-hdfs.conf file
Create the file:

[atguigu@hadoop102 job]$ touch flume-file-hdfs.conf

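Note: to read a file on a Linux system, Flume has to run a command that follows the usual Linux command rules. Because the Hive log lives on the Linux filesystem, the source type is exec (short for "execute"), meaning that the source runs a Linux command to read the file.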

[atguigu@hadoop102 job]$ vim flume-file-hdfs.conf

Add the following content:

# Name the components on this agent
a2.sources = r2
a2.sinks = k2
a2.channels = c2

# Describe/configure the source
a2.sources.r2.type = exec
a2.sources.r2.command = tail -F /opt/module/hive/logs/hive.log
a2.sources.r2.shell = /bin/bash -c

# Describe the sink
a2.sinks.k2.type = hdfs
a2.sinks.k2.hdfs.path = hdfs://hadoop102:9000/flume/%Y%m%d/%H
# Prefix for the names of uploaded files
a2.sinks.k2.hdfs.filePrefix = logs-
# Whether to round the timestamp down so that directories roll over by time
a2.sinks.k2.hdfs.round = true
# How many time units make up one directory
a2.sinks.k2.hdfs.roundValue = 1
# The time unit used for rounding
a2.sinks.k2.hdfs.roundUnit = hour
# Use the local timestamp rather than a timestamp from the event header
a2.sinks.k2.hdfs.useLocalTimeStamp = true
# Number of events to accumulate before flushing to HDFS;
# lowered from 1000 because it must not exceed the channel's transactionCapacity (100 below)
a2.sinks.k2.hdfs.batchSize = 100
# File type; DataStream writes uncompressed output, CompressedStream enables compression
a2.sinks.k2.hdfs.fileType = DataStream
# Roll to a new file every 60 seconds
a2.sinks.k2.hdfs.rollInterval = 60
# Roll when a file reaches this size in bytes (just under 128 MB)
a2.sinks.k2.hdfs.rollSize = 134217700
# 0 means file rolling does not depend on the number of events
a2.sinks.k2.hdfs.rollCount = 0

# Use a channel which buffers events in memory
a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000
a2.channels.c2.transactionCapacity = 100

# Bind the source and sink to the channel
a2.sources.r2.channels = c2
a2.sinks.k2.channel = c2
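To make the sink settings above more concrete, here is a small sketch of which directory an event lands in and how the rollSize value relates to 128 MiB. The flume_bucket helper is hypothetical and only mimics the %Y%m%d/%H escape sequence with date; it assumes GNU date (for -d @epoch). Reading 128 MiB as the HDFS block size is an assumption about a typical Hadoop 2.x default, not something the configuration itself states.

```shell
#!/usr/bin/env bash
# Sketch: mimic the hdfs.path escapes %Y%m%d/%H for a given epoch second.
# With roundValue = 1 and roundUnit = hour the timestamp is rounded down to the hour,
# which matches formatting the time only down to %H.
flume_bucket() {
  local epoch="$1"
  # GNU date: -d @N interprets N as seconds since the Unix epoch.
  echo "/flume/$(date -d "@$epoch" +%Y%m%d/%H)"
}

# rollSize arithmetic: 128 MiB in bytes versus the configured value.
block_bytes=$((128 * 1024 * 1024))
roll_size=134217700
echo "128 MiB    = $block_bytes bytes"
echo "rollSize   = $roll_size bytes"
echo "difference = $((block_bytes - roll_size)) bytes"
```

All events stamped within the same hour share one directory. Inside that directory a new file starts every 60 seconds or once the file approaches rollSize, whichever comes first, because rollCount = 0 switches off rolling by event count.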

3. Run the monitoring configuration
[atguigu@hadoop102 flume]$ bin/flume-ng agent --conf conf/ --name a2 --conf-file job/flume-file-hdfs.conf
4. Start Hadoop and Hive, then work in Hive so that it generates log entries
[atguigu@hadoop102 hadoop-2.7.2]$ sbin/start-dfs.sh

[atguigu@hadoop103 hadoop-2.7.2]$ sbin/start-yarn.sh

[atguigu@hadoop102 hive]$ bin/hive

hive (default)>
5. View the files on HDFS.
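One way to check the result is with the hdfs dfs command-line client. The sketch below assumes the /flume root from the sink path in the configuration and that hdfs is on the PATH. The list_flume_output wrapper and the HDFS_CMD variable are hypothetical conveniences, not part of Hadoop or Flume.

```shell
#!/usr/bin/env bash
# Sketch: list everything the HDFS sink has written under a root directory.
# HDFS_CMD is a hypothetical override for the client binary (defaults to hdfs).
HDFS_CMD="${HDFS_CMD:-hdfs}"

list_flume_output() {
  local root="${1:-/flume}"
  # -ls -R walks the directory tree recursively (day directory, then hour directory).
  "$HDFS_CMD" dfs -ls -R "$root"
}
```

Running list_flume_output /flume on a cluster node should show one directory per day and hour, containing files whose names start with the logs- prefix. Files that are still being written normally carry a .tmp suffix, which Flume removes when the file is rolled.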
Original article: https://blog.csdn.net/zjjcchina/article/details/127841407