1. Configure core-site.xml
fs.defaultFS
hdfs://hadoop102:9820
hadoop.tmp.dir
/opt/module/hadoop-3.1.3/data
hadoop.http.staticuser.user
atguigu
hadoop.proxyuser.atguigu.hosts / hadoop.proxyuser.atguigu.groups / hadoop.proxyuser.atguigu.users (each set to *)
2. Configure hdfs-site.xml
dfs.namenode.http-address
hadoop102:9870
dfs.namenode.secondary.http-address
hadoop104:9868
3. Configure yarn-site.xml
yarn.nodemanager.aux-services
mapreduce_shuffle
yarn.resourcemanager.hostname
hadoop103
yarn.nodemanager.env-whitelist
JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME
yarn.log-aggregation-enable
true
yarn.log.server.url
http://hadoop102:19888/jobhistory/logs
yarn.log-aggregation.retain-seconds
604800
yarn.scheduler.minimum-allocation-mb
512
yarn.scheduler.maximum-allocation-mb
4096
yarn.nodemanager.resource.memory-mb
4096
yarn.nodemanager.pmem-check-enabled
false
yarn.nodemanager.vmem-check-enabled
false
4. Configure mapred-site.xml
mapreduce.framework.name
yarn
mapreduce.jobhistory.address
hadoop102:10020
mapreduce.jobhistory.webapp.address
hadoop102:19888
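All four files above share the same Hadoop XML layout: a `<configuration>` root holding `<property>` elements with `<name>` and `<value>` children. As a sketch, using the property names and values from the mapred-site.xml section above, the file would look like:

```xml
<configuration>
  <!-- Run MapReduce jobs on YARN -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <!-- JobHistory server RPC address -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop102:10020</value>
  </property>
  <!-- JobHistory server web UI address -->
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop102:19888</value>
  </property>
</configuration>
```

The other files (core-site.xml, hdfs-site.xml, yarn-site.xml) follow the same pattern with their respective property/value pairs.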
5. Configure workers
vim /opt/module/hadoop-3.1.3/etc/hadoop/workers
hadoop102
hadoop103
hadoop104
6. Distribute the configuration files:
[atguigu@hadoop102 hadoop]$xsync /opt/module/hadoop-3.1.3/etc/hadoop/
7. Start the cluster.
Format the NameNode (first start only):
[atguigu@hadoop102 ~]$ hdfs namenode -format
// Start HDFS
[atguigu@hadoop102 hadoop-3.1.3]$ sbin/start-dfs.sh
// Start YARN (on hadoop103, where the ResourceManager runs)
[atguigu@hadoop103 hadoop-3.1.3]$ sbin/start-yarn.sh
View the HDFS NameNode web UI:
(a) Open http://hadoop102:9870 in a browser
(b) Browse the data stored on HDFS
View the YARN ResourceManager web UI:
(a) Open http://hadoop103:8088 in a browser
(b) View the jobs running on YARN
Basic cluster tests.
Create a directory on HDFS:
[atguigu@hadoop102 ~]$ hadoop fs -mkdir /input
Upload a file into the directory:
[atguigu@hadoop102 ~]$ hadoop fs -put /home/atguigu/wcinput/word.txt /input
// Where the uploaded file is physically stored on a DataNode:
/opt/module/hadoop-3.1.3/data/dfs/data/current/BP-440821944-192.168.16.102-1599035131869/current/finalized/subdir0/subdir0
Run the wordcount example from MapReduce:
[atguigu@hadoop102 hadoop-3.1.3]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /input /output
Note: the output directory /output must not already exist.
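The job writes word/count pairs to /output (one part-r-00000 file per reducer), readable with hadoop fs -cat /output/part-r-00000. The same computation can be sketched locally in plain shell as a sanity check of what wordcount produces; no cluster is needed, and the sample word.txt content below is an assumption:

```shell
# Create an assumed sample input file.
printf 'hadoop yarn\nhadoop mapreduce\n' > word.txt

# Local word count equivalent: split on whitespace, sort, count duplicates,
# then print in the word<TAB>count layout that wordcount emits.
tr -s ' ' '\n' < word.txt | sort | uniq -c | awk '{print $2 "\t" $1}'
```

For the sample input this prints each distinct word with its count (hadoop appears twice, the others once).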
8. Ways to start and stop the cluster.
1) Start or stop individual daemons.
Start/stop HDFS daemons:
[atguigu@hadoop102 ~]$ hdfs --daemon start/stop namenode/datanode/secondarynamenode
Start/stop YARN daemons:
[atguigu@hadoop102 ~]$ yarn --daemon start/stop resourcemanager/nodemanager
Start/stop the history server:
[atguigu@hadoop102 ~]$ mapred --daemon start/stop historyserver
2) Start or stop HDFS and YARN as whole modules.
Start/stop HDFS:
[atguigu@hadoop102 ~]$ start-dfs.sh / stop-dfs.sh
Start/stop YARN:
[atguigu@hadoop102 ~]$ start-yarn.sh / stop-yarn.sh
Start/stop the history server:
[atguigu@hadoop102 ~]$ mapred --daemon start/stop historyserver
9. // Script to check Java processes on every node
$ cd /home/atguigu/bin/
$ vim jpsall
#!/bin/bash
for host in hadoop102 hadoop103 hadoop104
do
echo "=== $host ==="
ssh $host jps "$@" | grep -v Jps
done
$ chmod 755 jpsall
// Script to start and stop HDFS, YARN, and the history server
$ touch myhadoop.sh
$ vim myhadoop.sh
#!/bin/bash
if [ $# -lt 1 ]
then
echo "No Args Input"
exit
fi
case $1 in
"start")
echo "=== Starting cluster ==="
echo "--- Starting HDFS ---"
ssh hadoop102 "/opt/module/hadoop-3.1.3/sbin/start-dfs.sh"
echo "--- Starting YARN ---"
ssh hadoop103 "/opt/module/hadoop-3.1.3/sbin/start-yarn.sh"
echo "--- Starting history server ---"
ssh hadoop102 "/opt/module/hadoop-3.1.3/bin/mapred --daemon start historyserver"
;;
"stop")
echo "=== Stopping cluster ==="
echo "--- Stopping history server ---"
ssh hadoop102 "/opt/module/hadoop-3.1.3/bin/mapred --daemon stop historyserver"
echo "--- Stopping YARN ---"
ssh hadoop103 "/opt/module/hadoop-3.1.3/sbin/stop-yarn.sh"
echo "--- Stopping HDFS ---"
ssh hadoop102 "/opt/module/hadoop-3.1.3/sbin/stop-dfs.sh"
;;
*)
echo "Invalid argument"
exit
;;
esac
$ chmod 755 myhadoop.sh
// Distribute the scripts to the other two servers
$ xsync /home/atguigu/bin/
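The case/esac dispatch in myhadoop.sh can be dry-run locally by stubbing out the ssh calls with echo; this checks the argument handling without touching the cluster. The function name myhadoop_demo is illustrative only:

```shell
# Dry-run of the myhadoop.sh argument dispatch; ssh calls replaced by echo.
myhadoop_demo() {
  if [ $# -lt 1 ]; then
    echo "No Args Input"
    return 1
  fi
  case $1 in
  "start")
    echo "would run start-dfs.sh on hadoop102, start-yarn.sh on hadoop103"
    ;;
  "stop")
    echo "would run stop-yarn.sh on hadoop103, stop-dfs.sh on hadoop102"
    ;;
  *)
    echo "Invalid argument"
    return 1
    ;;
  esac
}

myhadoop_demo start
```

Running it with no argument prints "No Args Input" and returns a nonzero status, matching the guard at the top of the real script.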
10. Configure cluster time synchronization:
1) Time server configuration (must be done as root)
(0) Check the ntpd service status and its auto-start status on every node:
[atguigu@hadoop102 ~]$ sudo systemctl status ntpd
[atguigu@hadoop102 ~]$ sudo systemctl is-enabled ntpd
(1) Stop the ntp service and disable its auto-start on every node:
[atguigu@hadoop102 ~]$ sudo systemctl stop ntpd
[atguigu@hadoop102 ~]$ sudo systemctl disable ntpd
(2) Edit the ntp.conf file on hadoop102:
[atguigu@hadoop102 ~]$ sudo vim /etc/ntp.conf
Make the following changes:
a) Change 1 (authorize all machines on the 192.168.1.0-192.168.1.255 subnet to query and synchronize time from this machine), by uncommenting:
#restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap
to
restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap
b) Change 2 (the cluster runs on a local network; do not use time sources from the Internet), by commenting out:
server 0.centos.pool.ntp.org iburst
server 1.centos.pool.ntp.org iburst
server 2.centos.pool.ntp.org iburst
server 3.centos.pool.ntp.org iburst
to
#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst
c) Addition 3 (when this node loses its network connection, it can still use its local clock as the time source for the other nodes in the cluster), by adding:
server 127.127.1.0
fudge 127.127.1.0 stratum 10
(3) Edit the /etc/sysconfig/ntpd file on hadoop102:
[atguigu@hadoop102 ~]$ sudo vim /etc/sysconfig/ntpd
Add the following line (synchronize the hardware clock together with the system clock):
SYNC_HWCLOCK=yes
(4) Restart the ntpd service:
[atguigu@hadoop102 ~]$ sudo systemctl start ntpd
(5) Enable the ntpd service at boot:
[atguigu@hadoop102 ~]$ sudo systemctl enable ntpd
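The steps above only configure the server side on hadoop102. A common follow-up, not covered in the source and shown here as an assumed client-side setup, is to have the other nodes pull time from hadoop102 periodically via root's crontab:

```shell
# On hadoop103 and hadoop104 (as root): open the crontab editor
crontab -e
# and add this line to sync from hadoop102 once per minute:
# */1 * * * * /usr/sbin/ntpdate hadoop102
```

ntpdate must not run while a local ntpd is active on those nodes, which is consistent with step (1) disabling ntpd everywhere.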