VMware 10 is used here; it can be downloaded from the official site.
Image file: CentOS-6.8-x86_64-bin-DVD1.iso, downloaded from the official site.
2.1 Create a new virtual machine

2. Select Custom, then click Next.

3. Keep the defaults and click Next.

4. Select "Install the operating system later", then Next.

5. Select Linux, CentOS 64-bit, then Next.

6. Name the virtual machine hadoop01 and choose a location on drive D; avoid drive C, since the VM will take up a lot of disk space later. Next.

7. Click Next.

8. 2 GB of memory is recommended. Next.

9. Select host-only networking, then Next.

10. Accept the recommended defaults, then Next.


11. Select "Create a new virtual disk", then Next.

12. Set the disk size to 20 GB and split the virtual disk into multiple files, then Next.

13. Keep the defaults and click Next.

14. Keep the defaults and click Finish.

15. Edit the virtual machine settings.
16. Select "Use ISO image file" and point it at CentOS-6.8-x86_64-bin-DVD1.iso (downloaded from the official site).

17. Select "Install or upgrade an existing system" and press Enter.

18. Select Skip and press Enter.

19. Click Next; keep the default language and keyboard settings, clicking Next on each page.

20. Select "Basic Storage Devices", then Next.

21. Select "Yes, discard any data".
22. Keep the default hostname, then Next.

23. Keep the default time zone, then Next.

24. Set the root password, then Next.

25. If prompted about a weak password, select Use Anyway.

26. Select "Use All Space", then Next.

27. Select "Write changes to disk".

28. Keep the default Desktop installation, then Next.

29. The installation takes a while.

30. Select Reboot.
31. Click Forward, then Forward again on the next screen.
32. Select Yes.

33. Select Finish, then Yes.

1. Enter the username and password and select Log In.

2. Open a terminal (Open in Terminal).

3. Configure a static IP for the virtual machine
- [root@hadoop01 ~]# vim /etc/sysconfig/network-scripts/ifcfg-eth0
- DEVICE=eth0
- HWADDR=00:0C:29:1C:AE:A7
- TYPE=Ethernet
- UUID=a01f4ce4-e877-4696-aa24-caeef5395b9f
- ONBOOT=yes #change to yes
- NM_CONTROLLED=yes
- BOOTPROTO=static #change to static
- IPADDR=192.168.86.101 #IP on the host-only subnet; the final 101 is your own choice
- NETMASK=255.255.255.0 #subnet mask
- GATEWAY=192.168.86.1 #gateway address of the subnet

4. Restart the network after configuring
[root@hadoop01 ~]# service network restart
5. Check the IP address
- [root@hadoop01 ~]# ifconfig
- eth0 Link encap:Ethernet HWaddr 00:0C:29:1C:AE:A7
- inet addr:192.168.86.101 Bcast:192.168.86.255 Mask:255.255.255.0
- inet6 addr: fe80::20c:29ff:fe1c:aea7/64 Scope:Link
- UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
- RX packets:2093404 errors:0 dropped:0 overruns:0 frame:0
- TX packets:2229452 errors:0 dropped:0 overruns:0 carrier:0
- collisions:0 txqueuelen:1000
- RX bytes:1148990236 (1.0 GiB) TX bytes:2157139065 (2.0 GiB)
-
- lo Link encap:Local Loopback
- inet addr:127.0.0.1 Mask:255.0.0.0
- inet6 addr: ::1/128 Scope:Host
- UP LOOPBACK RUNNING MTU:65536 Metric:1
- RX packets:660419 errors:0 dropped:0 overruns:0 frame:0
- TX packets:660419 errors:0 dropped:0 overruns:0 carrier:0
- collisions:0 txqueuelen:0
- RX bytes:92983459 (88.6 MiB) TX bytes:92983459 (88.6 MiB)
6. Check the firewall status
[root@hadoop01 ~]# service iptables status
7. Stop the firewall (temporary; it comes back after a reboot)
[root@hadoop01 ~]# service iptables stop
8. Check the firewall status again
- [root@hadoop01 ~]# service iptables status
- iptables: Firewall is not running. #this message means the firewall is stopped
9. Disable the firewall permanently
- [root@hadoop01 ~]# chkconfig --list iptables
- iptables 0:off 1:off 2:on 3:on 4:on 5:on 6:off
- [root@hadoop01 ~]# chkconfig iptables off #disable at boot
- [root@hadoop01 ~]# chkconfig --list iptables
- iptables 0:off 1:off 2:off 3:off 4:off 5:off 6:off #all runlevels off, so it stays disabled
1. Open the hostname configuration file and edit it
- [root@hadoop01 ~]# vim /etc/sysconfig/network
-
- NETWORKING=yes
- HOSTNAME=hadoop01
2. Add the IP-to-hostname mappings
- [root@hadoop01 ~]# vim /etc/hosts
-
- 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
- ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
- 192.168.86.101 hadoop01
- 192.168.86.102 hadoop02
- 192.168.86.103 hadoop03
3. Reboot the virtual machine
[root@hadoop01 software]# reboot
4. Verify the hostname
- [root@hadoop01 software]# hostname
- hadoop01
- [root@hadoop01 software]# ping hadoop01
- PING hadoop01 (192.168.86.101) 56(84) bytes of data.
- 64 bytes from hadoop01 (192.168.86.101): icmp_seq=1 ttl=64 time=0.057 ms
- 64 bytes from hadoop01 (192.168.86.101): icmp_seq=2 ttl=64 time=0.047 ms
- 64 bytes from hadoop01 (192.168.86.101): icmp_seq=3 ttl=64 time=0.049 ms
- ^C
- --- hadoop01 ping statistics ---
- 3 packets transmitted, 3 received, 0% packet loss, time 2795ms
- rtt min/avg/max/mdev = 0.047/0.051/0.057/0.004 ms
jdk-8u144-linux-x64.tar.gz is used here, downloaded from the official site.
1. Create a directory dedicated to installation packages
- [root@hadoop01 ~]# mkdir /opt/software/
- [root@hadoop01 ~]# cd /opt/software/
- [root@hadoop01 software]# ll
- total 0
2. Upload the JDK to that directory
Use Xshell (or any other file-transfer tool) to upload the file.
- [root@hadoop01 software]# ls
- jdk-8u144-linux-x64.tar.gz
3. Create a directory for the extracted software
- [root@hadoop01 software]# mkdir /opt/module/
- [root@hadoop01 software]# cd /opt/module/
- [root@hadoop01 module]# ll
- total 0
4. Extract the JDK
- [root@hadoop01 software]# tar -zxvf jdk-8u144-linux-x64.tar.gz -C /opt/module/
- [root@hadoop01 module]# java -version #after installation, check the JDK version
- java version "1.8.0_144"
- Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
- Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
5. If the reported version is not 1.8.0_144, remove the preinstalled OpenJDK packages
- [root@hadoop01 software]# rpm -qa | grep java
- tzdata-java-2016c-1.el6.noarch
- java-1.7.0-openjdk-1.7.0.99-2.6.5.1.el6.x86_64
- java-1.6.0-openjdk-1.6.0.38-1.13.10.4.el6.x86_64
- [root@hadoop01 software]# rpm -e --nodeps java-1.7.0-openjdk-1.7.0.99-2.6.5.1.el6.x86_64 #uninstall the package
- [root@hadoop01 software]# rpm -e --nodeps java-1.6.0-openjdk-1.6.0.38-1.13.10.4.el6.x86_64
- [root@hadoop01 software]# rpm -qa | grep java
- tzdata-java-2016c-1.el6.noarch
6. Configure the environment variables
- [root@hadoop01 software]# vim /etc/profile #append the JDK paths
- export JAVA_HOME=/opt/module/jdk1.8.0_144
- export PATH=$PATH:$JAVA_HOME/bin
- [root@hadoop01 software]# source /etc/profile
7. Check the JDK version again
- [root@hadoop01 software]# vim /etc/profile
- [root@hadoop01 software]# java -version
- java version "1.8.0_144"
- Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
- Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
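A quick sanity check (a sketch; the expected output assumes the paths used above and that the bundled OpenJDK was removed):
- [root@hadoop01 software]# echo $JAVA_HOME
- /opt/module/jdk1.8.0_144
- [root@hadoop01 software]# which java
- /opt/module/jdk1.8.0_144/bin/java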
1. Right-click the virtual machine and choose Clone.

2. Click Next.


3. Select "Create a full clone".

4. Set the clone name (hadoop02) and its location.

5. Click Continue to finish.


Repeat the cloning process to create another virtual machine
named hadoop03.
6. Start both clones.

1. Check the network interface information (on the hadoop02 clone)
- [root@hadoop02 ~]# ifconfig
- eth1 Link encap:Ethernet HWaddr 00:0C:29:96:83:5A
- inet addr:192.168.86.102 Bcast:192.168.86.255 Mask:255.255.255.0
- inet6 addr: fe80::20c:29ff:fe96:835a/64 Scope:Link
- UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
- RX packets:1648800 errors:0 dropped:0 overruns:0 frame:0
- TX packets:1521767 errors:0 dropped:0 overruns:0 carrier:0
- collisions:0 txqueuelen:1000
- RX bytes:1229868487 (1.1 GiB) TX bytes:187659267 (178.9 MiB)
-
- lo Link encap:Local Loopback
- inet addr:127.0.0.1 Mask:255.0.0.0
- inet6 addr: ::1/128 Scope:Host
- UP LOOPBACK RUNNING MTU:65536 Metric:1
- RX packets:72557 errors:0 dropped:0 overruns:0 frame:0
- TX packets:72557 errors:0 dropped:0 overruns:0 carrier:0
- collisions:0 txqueuelen:0
- RX bytes:5452856 (5.2 MiB) TX bytes:5452856 (5.2 MiB)
2. In ifcfg-eth0, update the MAC address and IP (the clone's NIC shows up as eth1; we keep using the eth0 config file)
- [root@hadoop02 ~]# vim /etc/sysconfig/network-scripts/ifcfg-eth0
-
- DEVICE=eth0
- HWADDR=00:0C:29:96:83:5A #the MAC address shown on the eth1 line of ifconfig
- TYPE=Ethernet
- UUID=a01f4ce4-e877-4696-aa24-caeef5395b9f
- ONBOOT=yes
- NM_CONTROLLED=yes
- BOOTPROTO=static
- IPADDR=192.168.86.102 #hadoop02's IP address
- NETMASK=255.255.255.0
- GATEWAY=192.168.86.1
- #only the MAC address and the IP need to change
3. Edit the udev rules file: remove the old eth0 entry and rename the remaining entry (the new MAC) from eth1 to eth0
- [root@hadoop02 ~]# vim /etc/udev/rules.d/70-persistent-net.rules
-
- # This file was automatically generated by the /lib/udev/write_net_rules
- # program, run by the persistent-net-generator.rules rules file.
- #
- # You can modify it, as long as you keep each rule on a single
- # line, and change only the value of the NAME= key.
-
-
- # PCI device 0x8086:0x100f (e1000)
- SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:0c:29:96:83:5a", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"
4. Modify the hostname
- [root@hadoop02 ~]# vim /etc/sysconfig/network
-
- NETWORKING=yes
- HOSTNAME=hadoop02
5. Reboot the virtual machine so the hostname takes effect
[root@hadoop02 ~]# reboot
6. Repeat the same network and hostname changes on hadoop03 (IP 192.168.86.103, hostname hadoop03)
hadoop-2.7.3.tar.gz is used here, downloaded from the official site.
1. Upload hadoop-2.7.3.tar.gz to /opt/software
- [root@hadoop01 software]# ls
- hadoop-2.7.3.tar.gz
2. Extract the file
[root@hadoop01 software]# tar -zxvf hadoop-2.7.3.tar.gz -C /opt/module/
3. Configure hadoop-env.sh
(In vim, press Esc and type :set nu to display line numbers.)
Open hadoop-env.sh and point JAVA_HOME at the JDK:
- [root@hadoop01 ~]# cd /opt/module/hadoop-2.7.3/etc/hadoop
- [root@hadoop01 hadoop]# vim hadoop-env.sh
- 25 export JAVA_HOME=/opt/module/jdk1.8.0_144
-
4. Add Hadoop to the environment variables
- [root@hadoop01 hadoop]# vim /etc/profile #append at the end of the file
- export HADOOP_HOME=/opt/module/hadoop-2.7.3
- export PATH=$PATH:$HADOOP_HOME/bin
- export PATH=$PATH:$HADOOP_HOME/sbin
5. Reload the profile so the changes take effect
[root@hadoop01 software]# source /etc/profile
6. Repeat steps 3-5 on hadoop02 and hadoop03.
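To confirm the environment variables took effect, a quick check (a sketch; the first line of output should report the version):
- [root@hadoop01 software]# hadoop version
- Hadoop 2.7.3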
1. Generate the public/private key pair on hadoop01 (if /root/.ssh does not exist yet, ssh to localhost once first so it gets created)
- [root@hadoop01 ~]# cd .ssh
- [root@hadoop01 .ssh]# pwd
- /root/.ssh
- [root@hadoop01 .ssh]# ssh-keygen -t rsa
- [root@hadoop01 .ssh]# ssh-copy-id hadoop01 #type yes, press Enter, then enter the password
- [root@hadoop01 .ssh]# ssh-copy-id hadoop02
- [root@hadoop01 .ssh]# ssh-copy-id hadoop03
- [root@hadoop01 .ssh]# ssh-copy-id localhost
2. Repeat the same steps on hadoop02 and hadoop03 so every node can ssh to every other node without a password.
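A quick way to verify passwordless login (a sketch): each of these should print the remote hostname without prompting for a password.
- [root@hadoop01 ~]# ssh hadoop02 hostname
- hadoop02
- [root@hadoop01 ~]# ssh hadoop03 hostname
- hadoop03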
1. Create a bin directory under /root
[root@hadoop01 ~]# mkdir bin
2. Create the script file
[root@hadoop01 bin]# touch xsync
3. Write the cluster sync script
- [root@hadoop01 bin]# vim xsync
-
- #!/bin/bash
- #1 Get the number of arguments; exit if there are none
- pcount=$#
- if((pcount==0)); then
- echo no args;
- exit;
- fi
-
- #2 Get the file name
- p1=$1
- fname=`basename $p1`
- echo fname=$fname
-
- #3 Get the absolute path of the parent directory
- pdir=`cd -P $(dirname $p1); pwd`
- echo pdir=$pdir
-
- #4 Get the current user name
- user=`whoami`
-
- #5 Loop over the nodes
- for((host=1; host<4; host++)); do
- #echo $pdir/$fname $user@hadoop$host:$pdir
- echo --------------- hadoop0$host ----------------
- rsync -rvl $pdir/$fname $user@hadoop0$host:$pdir
- done
4. Make the script executable
[root@hadoop01 bin]# chmod 777 xsync
5. Sync the /root/bin directory to the other nodes
[root@hadoop01 bin]# /root/bin/xsync /root/bin
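Once xsync works, it can also be used to push /etc/profile (the JDK and Hadoop variables) to the other nodes instead of editing the file by hand. A sketch; remember to re-source it on each node afterwards:
- [root@hadoop01 bin]# xsync /etc/profile
- [root@hadoop02 ~]# source /etc/profile
- [root@hadoop03 ~]# source /etc/profile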
1. Configure core-site.xml
- [root@hadoop01 hadoop]# vim core-site.xml
- <configuration>
- <!-- address of the NameNode in HDFS -->
- <property>
- <name>fs.defaultFS</name>
- <value>hdfs://hadoop01:9000</value>
- </property>
-
- <!-- directory for files Hadoop generates at runtime -->
- <property>
- <name>hadoop.tmp.dir</name>
- <value>/opt/module/hadoop-2.7.3/data/tmp</value>
- </property>
- </configuration>
2. Configure hdfs-site.xml
- [root@hadoop01 hadoop]# vim hdfs-site.xml
- <configuration>
- <property>
- <name>dfs.replication</name>
- <value>3</value>
- </property>
-
- <property>
- <name>dfs.namenode.secondary.http-address</name>
- <value>hadoop01:50090</value>
- </property>
- </configuration>
3. Configure slaves
- [root@hadoop01 hadoop]# vim slaves
-
- hadoop01
- hadoop02
- hadoop03
1. Configure yarn-env.sh
- [root@hadoop01 hadoop]# vim yarn-env.sh
- 23 export JAVA_HOME=/opt/module/jdk1.8.0_144
2. Configure yarn-site.xml
- [root@hadoop01 hadoop]# vim yarn-site.xml
- <configuration>
-
- <!-- how reducers fetch data -->
- <property>
- <name>yarn.nodemanager.aux-services</name>
- <value>mapreduce_shuffle</value>
- </property>
-
- <!-- address of the YARN ResourceManager -->
- <property>
- <name>yarn.resourcemanager.hostname</name>
- <value>hadoop01</value>
- </property>
- <!-- enable log aggregation -->
- <property>
- <name>yarn.log-aggregation-enable</name>
- <value>true</value>
- </property>
- <!-- keep aggregated logs for 7 days -->
- <property>
- <name>yarn.log-aggregation.retain-seconds</name>
- <value>604800</value>
- </property>
-
- </configuration>
3. Configure mapred-env.sh
- [root@hadoop01 hadoop]# vim mapred-env.sh
- export JAVA_HOME=/opt/module/jdk1.8.0_144
-
-
- export HADOOP_JOB_HISTORYSERVER_HEAPSIZE=1000
-
- export HADOOP_MAPRED_ROOT_LOGGER=INFO,RFA
4. Configure mapred-site.xml
- [root@hadoop01 hadoop]# mv mapred-site.xml.template mapred-site.xml #rename the template
- [root@hadoop01 hadoop]# vim mapred-site.xml
- <configuration>
- <!-- run MapReduce on YARN -->
- <property>
- <name>mapreduce.framework.name</name>
- <value>yarn</value>
- </property>
- <property>
- <name>mapreduce.jobhistory.address</name>
- <value>hadoop01:10020</value>
- </property>
- <property>
- <name>mapreduce.jobhistory.webapp.address</name>
- <value>hadoop01:19888</value>
- </property>
-
- </configuration>
5. Sync the configured Hadoop directory to hadoop02 and hadoop03
[root@hadoop01 hadoop]# /root/bin/xsync /opt/module/hadoop-2.7.3/
1. Format the NameNode (only before the first start of the cluster)
[root@hadoop01 hadoop-2.7.3]# bin/hdfs namenode -format
2. Start HDFS
[root@hadoop01 hadoop-2.7.3]# sbin/start-dfs.sh
3. Check the processes
- [root@hadoop01 hadoop-2.7.3]# jps
- 10496 Jps
- 28469 SecondaryNameNode
- 28189 NameNode
- 28286 DataNode
- [root@hadoop02 ~]# jps
- 27242 Jps
- 3614 DataNode
- [root@hadoop03 ~]# jps
- 27242 Jps
- 3614 DataNode
4. Open the HDFS web UI at port 50070 (http://hadoop01:50070)

5. Start the YARN cluster
[root@hadoop01 hadoop-2.7.3]# sbin/start-yarn.sh
6. Check the YARN processes
- [root@hadoop01 hadoop-2.7.3]# jps
- 49155 NodeManager
- 28469 SecondaryNameNode
- 48917 ResourceManager
- 10600 Jps
- 28189 NameNode
- 28286 DataNode
- [root@hadoop02 ~]# jps
- 3736 NodeManager
- 27242 Jps
- 3614 DataNode
- [root@hadoop03 ~]# jps
- 3736 NodeManager
- 27242 Jps
- 3614 DataNode
7. Open the YARN web UI at port 8088 (http://hadoop01:8088)

1. Locate the hosts file on Windows (C:\Windows\System32\drivers\etc\hosts)

2. Edit it and add the IP addresses and hostnames
- # Copyright (c) 1993-2009 Microsoft Corp.
- #
- # This is a sample HOSTS file used by Microsoft TCP/IP for Windows.
- #
- # This file contains the mappings of IP addresses to host names. Each
- # entry should be kept on an individual line. The IP address should
- # be placed in the first column followed by the corresponding host name.
- # The IP address and the host name should be separated by at least one
- # space.
- #
- # Additionally, comments (such as these) may be inserted on individual
- # lines or following the machine name denoted by a '#' symbol.
- #
- # For example:
- #
- # 102.54.94.97 rhino.acme.com # source server
- # 38.25.63.10 x.acme.com # x client host
-
- # localhost name resolution is handled within DNS itself.
- # 127.0.0.1 localhost
- # ::1 localhost
- # append at the end
- 192.168.86.101 hadoop01
- 192.168.86.102 hadoop02
- 192.168.86.103 hadoop03
1. Create a mount point
- [root@hadoop01 ~]# mkdir /mnt/cdrom
- [root@hadoop01 ~]# cd /mnt
- [root@hadoop01 mnt]# ll
- total 4
- dr-xr-xr-x. 7 root root 4096 May 23 2016 cdrom
2. Mount the CD-ROM and back up the existing repo files
- [root@hadoop01 mnt]# mount -t auto /dev/cdrom /mnt/cdrom
- [root@hadoop01 mnt]# cd /etc/yum.repos.d/
- [root@hadoop01 yum.repos.d]# mkdir bak
- [root@hadoop01 yum.repos.d]# mv CentOS-* bak
3. Create and configure CentOS-DVD.repo
- [root@hadoop01 yum.repos.d]# touch CentOS-DVD.repo
- [root@hadoop01 yum.repos.d]# vim CentOS-DVD.repo
-
- [centos6-dvd]
- name=Welcome to local source yum
- baseurl=file:///mnt/cdrom
- enabled=1
- gpgcheck=0
4. Reload the yum repositories
- [root@hadoop01 yum.repos.d]# yum clean all
- [root@hadoop01 yum.repos.d]# yum repolist all
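To confirm the local DVD repository works, a hedged example: install a package from it (ntp and ntpdate are used in the time-synchronization section later and should be available on the CentOS 6.8 DVD).
- [root@hadoop01 yum.repos.d]# yum install -y ntp ntpdate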
1. Configure mapred-site.xml
- [root@hadoop01 hadoop]# vim mapred-site.xml
- <configuration>
- <!-- run MapReduce on YARN -->
- <property>
- <name>mapreduce.framework.name</name>
- <value>yarn</value>
- </property>
- <property>
- <name>mapreduce.jobhistory.address</name>
- <value>hadoop01:10020</value>
- </property>
- <property>
- <name>mapreduce.jobhistory.webapp.address</name>
- <value>hadoop01:19888</value>
- </property>
-
- </configuration>
2. Find the history server startup script
- [root@hadoop01 hadoop-2.7.3]# ls sbin/ | grep mr
- mr-jobhistory-daemon.sh
3. Start the history server
[root@hadoop01 hadoop-2.7.3]# sbin/mr-jobhistory-daemon.sh start historyserver
4. Check that the history server is running
[root@hadoop01 hadoop-2.7.3]# jps
5. View the JobHistory UI at port 19888
http://hadoop01:19888/jobhistory
1. Configure yarn-site.xml
- [root@hadoop01 hadoop]# vim yarn-site.xml
- <configuration>
-
- <!-- how reducers fetch data -->
- <property>
- <name>yarn.nodemanager.aux-services</name>
- <value>mapreduce_shuffle</value>
- </property>
-
- <!-- address of the YARN ResourceManager -->
- <property>
- <name>yarn.resourcemanager.hostname</name>
- <value>hadoop01</value>
- </property>
- <!-- enable log aggregation -->
- <property>
- <name>yarn.log-aggregation-enable</name>
- <value>true</value>
- </property>
- <!-- keep aggregated logs for 7 days -->
- <property>
- <name>yarn.log-aggregation.retain-seconds</name>
- <value>604800</value>
- </property>
-
- </configuration>
2. Stop the NodeManager, ResourceManager, and JobHistoryServer
- [root@hadoop01 hadoop-2.7.3]$ sbin/yarn-daemon.sh stop resourcemanager
- [root@hadoop01 hadoop-2.7.3]$ sbin/yarn-daemon.sh stop nodemanager
- [root@hadoop01 hadoop-2.7.3]$ sbin/mr-jobhistory-daemon.sh stop historyserver
3. Start the NodeManager, ResourceManager, and JobHistoryServer
- [root@hadoop01 hadoop-2.7.3]$ sbin/yarn-daemon.sh start resourcemanager
- [root@hadoop01 hadoop-2.7.3]$ sbin/yarn-daemon.sh start nodemanager
- [root@hadoop01 hadoop-2.7.3]$ sbin/mr-jobhistory-daemon.sh start historyserver
4. Delete the existing output directory on HDFS (if present)
[root@hadoop01 hadoop-2.7.3]$ bin/hdfs dfs -rm -R /user/root/output
5. Run the wordcount example
[root@hadoop01 hadoop-2.7.3]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /user/root/input /user/root/output
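The job above assumes /user/root/input already exists on HDFS. If it does not, a sketch of creating it and uploading some sample files (the Hadoop config files are used here only as example data):
- [root@hadoop01 hadoop-2.7.3]$ bin/hdfs dfs -mkdir -p /user/root/input
- [root@hadoop01 hadoop-2.7.3]$ bin/hdfs dfs -put etc/hadoop/*.xml /user/root/input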
6. View the job logs in the JobHistory UI at port 19888

1. Install MySQL
[root@hadoop01 ~]# yum install mysql-server -y
2. Start MySQL
[root@hadoop01 ~]# service mysqld start
3. Initialize the root password: when asked for the current password, just press Enter (none is set yet), then set a new one (123456 is used below)
[root@hadoop01 ~]# /usr/bin/mysql_secure_installation
4. Restart MySQL and log in
- [root@hadoop01 ~]# service mysqld restart
- [root@hadoop01 ~]# mysql -u root -p123456
5. Allow remote clients to connect to MySQL
- mysql> GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '123456' WITH
- GRANT OPTION;
- Query OK, 0 rows affected (0.01 sec)
- mysql> FLUSH PRIVILEGES;
- Query OK, 0 rows affected (0.00 sec)

6. Exit MySQL
- mysql> exit
- Bye
- [root@hadoop01 ~]#
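To confirm the grant works, a hedged check from another node (this assumes the mysql client is installed there, e.g. via yum install -y mysql):
- [root@hadoop02 ~]# mysql -h hadoop01 -u root -p123456 -e "show databases;"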
1. Upload the Hive package
apache-hive-2.1.1-bin.tar.gz is the version used here
[root@hadoop01 software]# ls #check that the package has been uploaded
2. Extract the package
[root@hadoop01 software]# tar -zxvf apache-hive-2.1.1-bin.tar.gz -C /opt/module/
3. Rename the directory, then edit hive-env.sh and set the environment variables
- [root@hadoop01 module]# mv apache-hive-2.1.1-bin/ hive
- [root@hadoop01 module]# cd hive/conf/
- [root@hadoop01 conf]# mv hive-env.sh.template hive-env.sh
- [root@hadoop01 conf]# vim hive-env.sh
- 47 # Set HADOOP_HOME to point to a specific hadoop install directory
- 48 HADOOP_HOME=/opt/module/hadoop-2.7.3
- 49
- 50 # Hive Configuration Directory can be controlled by:
- 51 export HIVE_CONF_DIR=/opt/module/hive/conf
1. Upload and extract the MySQL JDBC driver package
mysql-connector-java-5.1.27.tar.gz, downloaded from the official site
2. Copy the MySQL driver jar into Hive's lib directory
[root@hadoop01 mysql-connector-java-5.1.27]# cp mysql-connector-java-5.1.27-bin.jar /opt/module/hive/lib
3. Configure MySQL as Hive's metastore
- [root@hadoop01 mysql-connector-java-5.1.27]# cd /opt/module/hive/conf/
- [root@hadoop01 conf]# vim hive-site.xml
-
- <?xml version="1.0"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <configuration>
- <property>
- <name>javax.jdo.option.ConnectionURL</name>
- <value>jdbc:mysql://hadoop01:3306/hive?createDatabaseIfNotExist=true</value>
- <description>JDBC connect string for a JDBC metastore</description>
- </property>
- <property>
- <name>javax.jdo.option.ConnectionDriverName</name>
- <value>com.mysql.jdbc.Driver</value>
- <description>Driver class name for a JDBC metastore</description>
- </property>
- <property>
- <name>javax.jdo.option.ConnectionUserName</name>
- <value>root</value>
- <description>username to use against metastore database</description>
- </property>
- <property>
- <name>javax.jdo.option.ConnectionPassword</name>
- <value>123456</value>
- <description>password to use against metastore database</description>
- </property>
- </configuration>
4. Create the /tmp and /user/hive/warehouse directories on HDFS and make them group-writable
- [root@hadoop01 hadoop-2.7.3]$ bin/hadoop fs -mkdir /tmp
- [root@hadoop01 hadoop-2.7.3]$ bin/hadoop fs -mkdir -p /user/hive/warehouse
- [root@hadoop01 hadoop-2.7.3]$ bin/hadoop fs -chmod g+w /tmp
- [root@hadoop01 hadoop-2.7.3]$ bin/hadoop fs -chmod g+w /user/hive/warehouse
5. Initialize the metastore schema (run from the Hive installation directory)
[root@hadoop01 hive]# bin/schematool -dbType mysql -initSchema
6. Launch the Hive shell to test
- [root@hadoop01 hive]# bin/hive
- which: no hbase in (/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/opt/module/jdk1.8.0_144/bin:/opt/module/hadoop-2.7.3/bin:/opt/module/hadoop-2.7.3/sbin:/root/bin)
- SLF4J: Class path contains multiple SLF4J bindings.
- SLF4J: Found binding in [jar:file:/opt/module/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
- SLF4J: Found binding in [jar:file:/opt/module/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
- SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
- SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
-
- Logging initialized using configuration in jar:file:/opt/module/hive/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true
- Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
- hive> exit;
- [root@hadoop01 hive]#
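A minimal smoke test inside the Hive shell (a sketch, assuming the metastore initialized cleanly; test_db is just a throwaway name):
- hive> create database test_db;
- hive> show databases;
- hive> drop database test_db;
- hive> exit;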

1. Set the same time zone (Asia/Shanghai) on all three servers
- [root@hadoop01 ~] # cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
- [root@hadoop02 ~] # cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
- [root@hadoop03 ~] # cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
2. Configuration on hadoop01
hadoop01 acts as the master time server; the other machines synchronize their clocks against it.
- [root@hadoop01 ~]# vim /etc/ntp.conf
- 10
- 11 # Permit all access over the loopback interface. This could
- 12 # be tightened as well, but to do so would effect some of
- 13 # the administrative functions.
- 14 restrict 192.168.86.101 nomodify notrap nopeer noquery #change the IP to this server's address
- 15 restrict 127.0.0.1
- 16 restrict -6 ::1
- 17
- 18 # Hosts on local network are less restricted.
- 19 restrict 192.168.86.0 mask 255.255.255.0 nomodify notrap #network address of the local subnet
- 20
- 21 # Use public servers from the pool.ntp.org project.
- 22 # Please consider joining the pool (http://www.pool.ntp.org/join.html).
- 23 #server 0.centos.pool.ntp.org iburst
- 24 #server 1.centos.pool.ntp.org iburst
- 25 #server 2.centos.pool.ntp.org iburst
- 26 #server 3.centos.pool.ntp.org iburst
- 27 server 127.127.1.0
- 28 fudge 127.127.1.0 stratum 10
3. Configuration on hadoop02 and hadoop03
- 6 # Permit time synchronization with our time source, but do not
- 7 # permit the source to query or modify the service on this system.
- 8 restrict default kod nomodify notrap nopeer noquery
- 9 restrict -6 default kod nomodify notrap nopeer noquery
- 10
- 11 # Permit all access over the loopback interface. This could
- 12 # be tightened as well, but to do so would effect some of
- 13 # the administrative functions.
- 14 restrict 192.168.86.101 nomodify notrap nopeer noquery
- 15 restrict 127.0.0.1
- 16 restrict -6 ::1
- 17
- 18 # Hosts on local network are less restricted.
- 19 restrict 192.168.86.0 mask 255.255.255.0 nomodify notrap
- 20
- 21 # Use public servers from the pool.ntp.org project.
- 22 # Please consider joining the pool (http://www.pool.ntp.org/join.html).
- 23 #server 0.centos.pool.ntp.org iburst
- 24 #server 1.centos.pool.ntp.org iburst
- 25 #server 2.centos.pool.ntp.org iburst
- 26 #server 3.centos.pool.ntp.org iburst
- 27 server 192.168.86.101
- 28 fudge 192.168.86.101 stratum 10
1. Start the ntpd service on all three virtual machines
- [root@hadoop01 ~]# service ntpd start
- [root@hadoop02 ~]# service ntpd start
- [root@hadoop03 ~]# service ntpd start
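The service commands above only start ntpd for the current session; to have it start automatically after a reboot (not part of the original steps, but the standard CentOS 6 way), run on all three machines:
- [root@hadoop01 ~]# chkconfig ntpd on
- [root@hadoop02 ~]# chkconfig ntpd on
- [root@hadoop03 ~]# chkconfig ntpd on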
2. Change the system time on hadoop02 (or hadoop03)
[root@hadoop02 ~]# date -s "2017-9-11 11:11:11"
3. From that machine, request a manual sync from the time server hadoop01 (manual sync test). If ntpd is already running on this machine, stop it first with service ntpd stop, run ntpdate, then start ntpd again; ntpdate cannot run while ntpd holds the NTP port.
[root@hadoop02 ~]# ntpdate 192.168.86.101
Check the result of the sync:
[root@hadoop02 ~]# date
4. Change the time again on hadoop02 (or hadoop03)
[root@hadoop02 ~]# date -s "2017-9-11 11:11:11"
5. After about ten minutes, check whether the machine has synchronized with the time server again (automatic sync test)
[root@hadoop02 ~]# date
1. Upload the package
zookeeper-3.4.10.tar.gz
2. Extract it and rename the directory
- [root@hadoop01 software]# tar -zxvf zookeeper-3.4.10.tar.gz -C /opt/module/
- [root@hadoop01 software]# cd /opt/module/
- [root@hadoop01 module]# mv zookeeper-3.4.10/ zookeeper
- [root@hadoop01 conf]# cd /opt/module/zookeeper/conf
1. Rename zoo_sample.cfg
[root@hadoop01 conf]# mv zoo_sample.cfg zoo.cfg
2. Create a directory for the ZooKeeper data:
[root@hadoop01 zookeeper]# mkdir /opt/module/zookeeper/data
3. Configure zoo.cfg
- [root@hadoop01 zookeeper]# vim conf/zoo.cfg
-
- # The number of milliseconds of each tick
- tickTime=2000
- # The number of ticks that the initial
- # synchronization phase can take
- initLimit=10
- # The number of ticks that can pass between
- # sending a request and getting an acknowledgement
- syncLimit=5
- # the directory where the snapshot is stored.
- # do not use /tmp for storage, /tmp here is just
- # example sakes.
- dataDir=/opt/module/zookeeper/data
- # the port at which the clients will connect
- clientPort=2181
- # the maximum number of client connections.
- # increase this if you need to handle more clients
- #maxClientCnxns=60
- #
- # Be sure to read the maintenance section of the
- # administrator guide before turning on autopurge.
- #
- # http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
- #
- # The number of snapshots to retain in dataDir
- #autopurge.snapRetainCount=3
- # Purge task interval in hours
- # Set to "0" to disable auto purge feature
- #autopurge.purgeInterval=1
- #cluster members
- server.1=hadoop01:2888:3888
- server.2=hadoop02:2888:3888
- server.3=hadoop03:2888:3888
4. Create the myid file on hadoop01
- [root@hadoop01 data]# cd /opt/module/zookeeper/data/
- [root@hadoop01 data]# touch myid
- [root@hadoop01 data]# echo 1 > myid
- [root@hadoop01 data]# cat myid
- 1
1. Sync the ZooKeeper directory to the other nodes
[root@hadoop01 zookeeper]# /root/bin/xsync /opt/module/zookeeper
2. Set a unique myid on each node
- [root@hadoop01 zookeeper]# cd /opt/module/zookeeper/data/
- [root@hadoop01 data]# cat myid
- 1
- [root@hadoop02 data]# echo 2 > myid
- [root@hadoop02 data]# cat myid
- 2
- [root@hadoop03 module]# cd /opt/module/zookeeper/data/
- [root@hadoop03 data]# echo 3 > myid
- [root@hadoop03 data]# cat myid
- 3
- [root@hadoop01 data]# cd /opt/module/zookeeper/
- [root@hadoop02 data]# cd /opt/module/zookeeper/
- [root@hadoop03 data]# cd /opt/module/zookeeper/
- Run the following on all three machines:
- # bin/zkServer.sh start ---start
- # bin/zkServer.sh stop ---stop
- # bin/zkServer.sh status ---check status
- [root@hadoop01 zookeeper]# bin/zkServer.sh status
- ZooKeeper JMX enabled by default
- Using config: /opt/module/zookeeper/bin/../conf/zoo.cfg
- Mode: follower
-
- [root@hadoop02 zookeeper]# bin/zkServer.sh status
- ZooKeeper JMX enabled by default
- Using config: /opt/module/zookeeper/bin/../conf/zoo.cfg
- Mode: leader
-
- [root@hadoop03 zookeeper]# bin/zkServer.sh status
- ZooKeeper JMX enabled by default
- Using config: /opt/module/zookeeper/bin/../conf/zoo.cfg
- Mode: follower
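Beyond checking the Mode, a quick client-side check (a sketch): connect with the bundled CLI and list the root znode, which should show the built-in /zookeeper node.
- [root@hadoop01 zookeeper]# bin/zkCli.sh -server hadoop01:2181
- [zk: hadoop01:2181(CONNECTED) 0] ls /
- [zookeeper]
- [zk: hadoop01:2181(CONNECTED) 1] quit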
1. Upload the package
hbase-1.3.1-bin.tar.gz
2. Extract it and rename the directory to hbase
[root@hadoop01 software]# tar -zxvf hbase-1.3.1-bin.tar.gz -C /opt/module/
[root@hadoop01 software]# mv /opt/module/hbase-1.3.1 /opt/module/hbase
1. Changes to hbase-env.sh:
- [root@hadoop01 conf]# pwd
- /opt/module/hbase/conf
- [root@hadoop01 conf]# vim hbase-env.sh
- 27 export JAVA_HOME=/opt/module/jdk1.8.0_144
- 129 export HBASE_MANAGES_ZK=false
2. Changes to hbase-site.xml
- [root@hadoop01 conf]# vim hbase-site.xml
- <configuration>
- <property>
- <name>hbase.rootdir</name>
- <value>hdfs://hadoop01:9000/hbase</value>
- </property>
-
- <property>
- <name>hbase.cluster.distributed</name>
- <value>true</value>
- </property>
-
- <!-- New since HBase 0.98; earlier versions had no .port property and the default port was 60000 -->
- <property>
- <name>hbase.master.port</name>
- <value>16000</value>
- </property>
-
- <property>
- <name>hbase.zookeeper.quorum</name>
- <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
- </property>
-
- <property>
- <name>hbase.zookeeper.property.dataDir</name>
- <value>/opt/module/zookeeper/data</value>
- </property>
- </configuration>
3. Configure regionservers
- [root@hadoop01 conf]# vim regionservers
-
- hadoop01
- hadoop02
- hadoop03
4. Symlink the Hadoop configuration files into HBase and sync HBase to the other nodes
- [root@hadoop01 module]# ln -s /opt/module/hadoop-2.7.3/etc/hadoop/core-site.xml /opt/module/hbase/conf/core-site.xml
- [root@hadoop01 module]# ln -s /opt/module/hadoop-2.7.3/etc/hadoop/hdfs-site.xml /opt/module/hbase/conf/hdfs-site.xml
[root@hadoop01 module]# xsync hbase/
1. Start HBase
- [root@hadoop02 hbase]# bin/start-hbase.sh
- starting master, logging to /opt/module/hbase/bin/../logs/hbase-root-master-hadoop02.out
- hadoop03: starting regionserver, logging to /opt/module/hbase/bin/../logs/hbase-root-regionserver-hadoop03.out
- hadoop01: starting regionserver, logging to /opt/module/hbase/bin/../logs/hbase-root-regionserver-hadoop01.out
- hadoop02: starting regionserver, logging to /opt/module/hbase/bin/../logs/hbase-root-regionserver-hadoop02.out
2. Stop HBase
[root@hadoop02 hbase]$ bin/stop-hbase.sh
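To verify the cluster came up, a sketch: check jps for HMaster/HRegionServer, open the HBase master web UI at port 16010 on the node where the HMaster runs (hadoop02 above), or run a status command in the HBase shell.
- [root@hadoop02 hbase]# bin/hbase shell
- hbase(main):001:0> status
- hbase(main):002:0> exit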

1. Upload the package
spark-2.1.1-bin-hadoop2.7.tgz, downloaded from the official site
2. Extract it
[root@hadoop01 software]# tar -zxvf spark-2.1.1-bin-hadoop2.7.tgz -C /opt/module/
1. Configure slaves
- [root@hadoop01 module]# mv spark-2.1.1-bin-hadoop2.7 spark
- [root@hadoop01 conf]# mv slaves.template slaves
- [root@hadoop01 conf]# vim slaves
-
- #
- # Licensed to the Apache Software Foundation (ASF) under one or more
- # contributor license agreements. See the NOTICE file distributed with
- # this work for additional information regarding copyright ownership.
- # The ASF licenses this file to You under the Apache License, Version 2.0
- # (the "License"); you may not use this file except in compliance with
- # the License. You may obtain a copy of the License at
- #
- # http://www.apache.org/licenses/LICENSE-2.0
- #
- # Unless required by applicable law or agreed to in writing, software
- # distributed under the License is distributed on an "AS IS" BASIS,
- # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- # See the License for the specific language governing permissions and
- # limitations under the License.
- #
-
- # A Spark Worker will be started on each of the machines listed below.
- hadoop01
- hadoop02
- hadoop03
2. Edit spark-env.sh
- [root@hadoop01 conf]# vim spark-env.sh
- SPARK_MASTER_HOST=hadoop01
- SPARK_MASTER_PORT=7077
3. Configure sbin/spark-config.sh (add JAVA_HOME so the daemons started over SSH can find Java)
- [root@hadoop01 sbin]# vim spark-config.sh
- export JAVA_HOME=/opt/module/jdk1.8.0_144
4. Sync Spark to the other nodes and start the cluster
[root@hadoop01 module]$ xsync spark/
- [root@hadoop01 spark]$ sbin/start-all.sh
- [root@hadoop01 spark]$ util.sh #a custom helper script that runs jps on every node
- ================root@hadoop01================
- 3330 Jps
- 3238 Worker
- 3163 Master
- ================root@hadoop02================
- 2966 Jps
- 2908 Worker
- ================root@hadoop03================
- 2978 Worker
- 3036 Jps
5. View the Spark master web UI at port 8080 (http://hadoop01:8080)

1. Rename spark-defaults.conf.template
[root@hadoop01 conf]$ mv spark-defaults.conf.template spark-defaults.conf
2. Edit spark-defaults.conf to enable event logging.
Note: the directory on HDFS must exist in advance.
Create it if it does not:
[root@hadoop01 conf]# hdfs dfs -mkdir /directory
- [root@hadoop01 conf]# vim spark-defaults.conf
-
- #
- # Licensed to the Apache Software Foundation (ASF) under one or more
- # contributor license agreements. See the NOTICE file distributed with
- # this work for additional information regarding copyright ownership.
- # The ASF licenses this file to You under the Apache License, Version 2.0
- # (the "License"); you may not use this file except in compliance with
- # the License. You may obtain a copy of the License at
- #
- # http://www.apache.org/licenses/LICENSE-2.0
- #
- # Unless required by applicable law or agreed to in writing, software
- # distributed under the License is distributed on an "AS IS" BASIS,
- # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- # See the License for the specific language governing permissions and
- # limitations under the License.
- #
-
- # Default system properties included when running spark-submit.
- # This is useful for setting default environmental settings.
-
- # Example:
- # spark.master spark://master:7077
- # spark.eventLog.enabled true
- # spark.eventLog.dir hdfs://namenode:8021/directory
- # spark.serializer org.apache.spark.serializer.KryoSerializer
- # spark.driver.memory 5g
- # spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
- spark.eventLog.enabled true
- spark.eventLog.dir hdfs://hadoop01:9000/directory
3. Add the following to spark-env.sh:
- [root@hadoop01 conf]# vim spark-env.sh
- export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080
- -Dspark.history.retainedApplications=30
- -Dspark.history.fs.logDirectory=hdfs://hadoop01:9000/directory"
4. Sync the configuration to the other nodes
[root@hadoop01 conf]# xsync /opt/module/spark/conf
5. Start the history server
[root@hadoop01 spark]# sbin/start-history-server.sh
6. Run a job again
- [root@hadoop01 spark]$ bin/spark-submit \
- --class org.apache.spark.examples.SparkPi \
- --master spark://hadoop01:7077 \
- --executor-memory 1G \
- --total-executor-cores 2 \
- ./examples/jars/spark-examples_2.11-2.1.1.jar \
- 100
7. View the history server UI at port 18080 (http://hadoop01:18080)

1. Upload the package
apache-flume-1.7.0-bin.tar.gz, downloaded from the official site
2. Extract it
[root@hadoop01 software]# tar -zxvf apache-flume-1.7.0-bin.tar.gz -C /opt/module/
3. Rename apache-flume-1.7.0-bin to flume
[root@hadoop01 module]# mv apache-flume-1.7.0-bin flume
1. Rename flume-env.sh.template under flume/conf to flume-env.sh and configure it
- [root@hadoop01 conf]# mv flume-env.sh.template flume-env.sh
- [root@hadoop01 conf]# vim flume-env.sh
- export JAVA_HOME=/opt/module/jdk1.8.0_144
[root@hadoop01 flume]# /root/bin/xsync flume/
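A quick way to confirm the Flume installation on each node (a sketch): print the version.
- [root@hadoop01 flume]# bin/flume-ng version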