• 头歌 (EduCoder) platform: Big Data Technology lab exercises


    Solve problems on your own; this document only records my own progress through the course.


    Big Data labs: please read the notes first

    Markdown readers can Ctrl + left-click to jump to the Notes; PDF readers can click the link directly


    Notes (must read)

    • Always read the comments!

    • A few problems use shortcut ("skip-the-work") solutions (declared in the title); don't use them if you object

    • Start the services each problem depends on yourself

    • If code fails while running, debug it yourself, or release the environment and start over

    • Self-test your code before submitting for evaluation (see the connectivity check sketch at the end of these notes)

    • When starting Hadoop services, prefer the start-all.sh command; in particular, the MapReduce chapter only works with start-all.sh, otherwise the jobs produce no results

    • For some table-creation statements, look carefully at exactly what needs to be copied; some statements span several lines, so copy and use them precisely

    • To find a problem, use the table of contents provided by your reader

    • The Hive chapter lists errors that may come up, with the fixes that worked when I hit the same ones

    • Start the Hadoop, ZooKeeper, and HBase services

      zkServer.sh start
      start-dfs.sh
      start-hbase.sh
      
    • Start the Hadoop and Hive services; prefer start-all.sh to start Hadoop

      start-all.sh
      hive --service metastore   # if this hangs at lines starting with "SLF4J", interrupt it (Ctrl+C) and then run the next command
      hive --service hiveserver2 # this one hangs too; interrupt it -- as long as the service came up, treat it as started. Commands found via web search; skip them if you think they're wrong
      
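    Before running any evaluation, it helps to confirm HDFS is actually reachable. Below is a minimal self-check sketch of my own (not part of any exercise), assuming the same fs.defaultFS address, hdfs://localhost:9000, that the exercises below use:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsCheck {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Assumed address; adjust if your sandbox differs.
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            try (FileSystem fs = FileSystem.get(conf)) {
                // Throws if the NameNode is down; "/" always exists otherwise.
                System.out.println("HDFS reachable: " + fs.exists(new Path("/")));
            }
        }
    }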

    Overview of Big Data Technology

    Big Data Applications

    Multiple-choice answers: D D D ABCD BCD


    Installing and Using Linux

    The Linux Operating System

    First Steps with Linux
    #!/bin/bash
    
    #Write the commands that complete the task below
    #*********begin*********#
    cd /
    ls -a
    #********* end *********#
    
    
    Common Linux Commands
    #!/bin/bash
    
    #Write the commands that complete the task below
    #*********begin*********#
    touch newfile
    mkdir newdir
    cp newfile newfileCpy
    mv newfileCpy newdir
    #********* end *********#
    
    
    Looking Up Command Help in Linux
    #!/bin/bash
    
    #Write the commands that complete the task below
    #*********begin*********#
    man fopen
    #********* end *********#
    
    

    Installing and Using Hadoop

    Chapter Quiz

    root@educoder:~# start-dfs.sh
    Starting namenodes on [localhost]
    localhost: starting namenode, logging to /app/hadoop/logs/hadoop-root-namenode-educoder.out
    127.0.0.1: starting datanode, logging to /app/hadoop/logs/hadoop-root-datanode-educoder.out
    Starting secondary namenodes [0.0.0.0]
    0.0.0.0: starting secondarynamenode, logging to /app/hadoop/logs/hadoop-root-secondarynamenode-educoder.out
    root@educoder:~# hdfs dfs -mkdir -p /user/hadoop/test
    root@educoder:~# hdfs dfs -put ~/.bashrc /user/hadoop/test
    root@educoder:~# hdfs dfs -get /user/hadoop/test /app/hadoop
    
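    For reference, the same three steps can also be done through the Java FileSystem API that the next chapter uses everywhere. A sketch under the same assumptions (hdfs://localhost:9000, running as root, so ~/.bashrc is /root/.bashrc):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class QuizEquivalent {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            FileSystem fs = FileSystem.get(conf);
            // hdfs dfs -mkdir -p /user/hadoop/test
            fs.mkdirs(new Path("/user/hadoop/test"));
            // hdfs dfs -put ~/.bashrc /user/hadoop/test
            fs.copyFromLocalFile(new Path("/root/.bashrc"), new Path("/user/hadoop/test"));
            // hdfs dfs -get /user/hadoop/test /app/hadoop
            fs.copyToLocalFile(new Path("/user/hadoop/test"), new Path("/app/hadoop"));
            fs.close();
        }
    }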

    HDFS

    Section exercise

    Problem 1
    root@educoder:~# start-dfs.sh
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.io.IOUtils;
    
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    
    /*************** Begin ***************/
    
    public class HDFSUtils {
    
        public static void main(String[] args) {
    
            // HDFS connection address
            String hdfsUri = "hdfs://localhost:9000";
    
            // HDFS input file paths
            String[] inputFiles = {"/a.txt", "/b.txt", "/c.txt"};
    
            // Output path
            String outputFile = "/root/result/merged_file.txt";
    
            // Create the Hadoop configuration object
            Configuration conf = new Configuration();
            try {
                // Create the Hadoop FileSystem object
                FileSystem fs = FileSystem.get(new Path(hdfsUri).toUri(), conf);
    
                // Create the output file
                OutputStream outputStream = fs.create(new Path(outputFile));
    
                // Merge the contents of the input files
                for (String inputFile : inputFiles) {
                    mergeFileContents(fs, inputFile, outputStream);
                }
    
                // Close the output stream
                outputStream.close();
    
                // Close the Hadoop file system
                fs.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
    
        }
    
    /*************** End ***************/
    
        private static void mergeFileContents(FileSystem fs, String inputFile, OutputStream outputStream) throws IOException {
            // Open the input file
            Path inputPath = new Path(inputFile);
            InputStream inputStream = fs.open(inputPath);
    
            // Copy the file contents (false: keep the output stream open)
            IOUtils.copyBytes(inputStream, outputStream, 4096, false);
    
            // Write a newline between files
            outputStream.write(System.lineSeparator().getBytes());
    
            // Close the input stream
            inputStream.close();
        }
    }
    
    
    
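    To self-check the result, the merged HDFS file can be streamed back to the console with the same IOUtils helper. A small sketch of my own; /root/result/merged_file.txt is the HDFS output path of the program above:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import java.io.InputStream;

    public class PrintMerged {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(new Path("hdfs://localhost:9000").toUri(), conf);
            // Stream the merged file to stdout for a quick visual check.
            try (InputStream in = fs.open(new Path("/root/result/merged_file.txt"))) {
                IOUtils.copyBytes(in, System.out, 4096, false);
            }
            fs.close();
        }
    }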

    Chapter exercise

    root@educoder:~# start-dfs.sh
    
    Problem 1
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;
    import java.io.*;
    
    /*************** Begin ***************/
    
    public class HDFSApi {
    
        /**
         * Check whether a path exists
         */
        public static boolean test(Configuration conf, String path) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            return fs.exists(new Path(path));
        }
    
        /**
         * Copy a local file to the given HDFS path,
         * overwriting if the path already exists
         */
        public static void copyFromLocalFile(Configuration conf, String localFilePath, String remoteFilePath) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            Path localPath = new Path(localFilePath);
            Path remotePath = new Path(remoteFilePath);
            /* fs.copyFromLocalFile: the first argument says whether to delete the source, the second whether to overwrite */
            fs.copyFromLocalFile(false, true, localPath, remotePath);
            fs.close();
        }
    
        /**
         * Append local file contents to an HDFS file
         */
        public static void appendToFile(Configuration conf, String localFilePath, String remoteFilePath) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            Path remotePath = new Path(remoteFilePath);
            /* Open an input stream on the local file */
            FileInputStream in = new FileInputStream(localFilePath);
            /* Open an output stream whose writes are appended to the end of the HDFS file */
            FSDataOutputStream out = fs.append(remotePath);
            /* Copy the bytes across */
            byte[] buffer = new byte[4096];
            int bytesRead = 0;
            while ((bytesRead = in.read(buffer)) > 0) {
                out.write(buffer, 0, bytesRead);
            }
            in.close();
            out.close();
            fs.close();
        }
    
        /**
         * Main
         */
        public static void main(String[] args) {
    
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            String localFilePath = "/root/test.txt";    // local path
            String remoteFilePath = "/test.txt";    // HDFS path
            String choice = "overwrite";    // overwrite if the file already exists ("append" would append instead)
    
            try {
                /* Check whether the file exists */
                boolean fileExists = false;
                if (HDFSApi.test(conf, remoteFilePath)) {
                    fileExists = true;
                    System.out.println(remoteFilePath + " 已存在.");
                } else {
                    System.out.println(remoteFilePath + " 不存在.");
                }
                /* Act accordingly */
                if (!fileExists) { // file absent: upload
                    HDFSApi.copyFromLocalFile(conf, localFilePath, remoteFilePath);
                    System.out.println(localFilePath + " 已上传至 " + remoteFilePath);
                } else if (choice.equals("overwrite")) {    // overwrite
                    HDFSApi.copyFromLocalFile(conf, localFilePath, remoteFilePath);
                    System.out.println(localFilePath + " 已覆盖 " + remoteFilePath);
                } else if (choice.equals("append")) {   // append
                    HDFSApi.appendToFile(conf, localFilePath, remoteFilePath);
                    System.out.println(localFilePath + " 已追加至 " + remoteFilePath);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
    /*************** End ***************/
    
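    One caveat on the append branch: on a single-DataNode sandbox, fs.append can fail with a "Failed to replace a bad datanode" error, because the default pipeline-recovery policy expects spare DataNodes. If that happens, relaxing the client-side policy is a common workaround; a sketch (the two property names are standard HDFS client settings, not something from the exercise itself):

    import org.apache.hadoop.conf.Configuration;

    public class AppendFriendlyConf {
        // Returns a Configuration tweaked so appends work on a single-DataNode setup.
        public static Configuration create() {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            // With only one DataNode there is no replacement node, so never try to swap.
            conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "NEVER");
            conf.setBoolean("dfs.client.block.write.replace-datanode-on-failure.enable", false);
            return conf;
        }
    }
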
    Problem 2
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;
    import java.io.*;
    
    /*************** Begin ***************/
    
    public class HDFSApi {
    
        /**
         * Download a file to the local file system.
         * If the local path already exists, rename the download automatically.
         */
        public static void copyToLocal(Configuration conf, String remoteFilePath, String localFilePath) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            Path remotePath = new Path(remoteFilePath);
            File localFile = new File(localFilePath);
            /* If the name is taken, rename automatically (append _1, _2 ...) */
            if (localFile.exists()) {
                int count = 0;
                String baseName = localFile.getName();
                String parentDir = localFile.getParent();
                String newName = baseName;
                do {
                    count++;
                    newName = baseName + "_" + count;
                    localFile = new File(parentDir, newName);
                } while (localFile.exists());
            }
    
            // Download the file to the local path
            fs.copyToLocalFile(remotePath, new Path(localFile.getAbsolutePath()));
            fs.close();
        }
    
        /**
         * Main
         */
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            String localFilePath = "/usr/local/down_test/test.txt";   // local path
            String remoteFilePath = "/test.txt";   // HDFS path
    
            try {
                HDFSApi.copyToLocal(conf, remoteFilePath, localFilePath);
                System.out.println("下载完成");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
    
    /*************** End ***************/
    
    
    Problem 3
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;
    import java.io.*;
    
    /*************** Begin ***************/
    
    public class HDFSApi {
    
        /**
         * Print a file's contents
         */
        public static void cat(Configuration conf, String remoteFilePath) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            Path remotePath = new Path(remoteFilePath);
            FSDataInputStream in = fs.open(remotePath);
            BufferedReader reader = new BufferedReader(new InputStreamReader(in));
    
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
    
            reader.close();
            in.close();
            fs.close();
        }
    
        /**
         * Main
         */
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            String remoteFilePath = "/test.txt";    // HDFS path
    
            try {
                System.out.println("读取文件: " + remoteFilePath);
                HDFSApi.cat(conf, remoteFilePath);
                System.out.println("\n读取完成");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
    
    /*************** End ***************/
    
    Problem 4
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;
    import java.io.*;
    import java.text.SimpleDateFormat;
    import java.util.Date;
    
    /*************** Begin ***************/
    
    public class HDFSApi {
    
        /**
         * Show information about the files at the given path
         */
        public static void ls(Configuration conf, String remoteFilePath) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            Path remotePath = new Path(remoteFilePath);
            FileStatus[] fileStatuses = fs.listStatus(remotePath);
    
            for (FileStatus s : fileStatuses) {
                // File path
                String path = s.getPath().toString();
                // Permissions
                String permission = s.getPermission().toString();
                // Size
                long fileSize = s.getLen();
                // Modification time
                long modificationTime = s.getModificationTime();
                SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
                String modificationTimeStr = sdf.format(new Date(modificationTime));
    
                // Print the file information
                System.out.println("路径: " + path);
                System.out.println("权限: " + permission);
                System.out.println("时间: " + modificationTimeStr);
                System.out.println("大小: " + fileSize);
            }
            fs.close();
        }
    
        /**
         * Main
         */
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            String remoteFilePath = "/";  // HDFS path
    
            try {
                HDFSApi.ls(conf, remoteFilePath);
                System.out.println("\n读取完成");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
    
    /*************** End ***************/
    
    
    Problem 5
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;
    import java.io.*;
    import java.text.SimpleDateFormat;
    
    /*************** Begin ***************/
    
    public class HDFSApi {
        /**
         * Show information about every file under a directory (recursively)
         */
        public static void lsDir(Configuration conf, String remoteDir) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            Path dirPath = new Path(remoteDir);
            listFiles(fs, dirPath);
            fs.close();
        }
    
        private static void listFiles(FileSystem fs, Path dirPath) throws IOException {
            FileStatus[] fileStatuses = fs.listStatus(dirPath);
            for (FileStatus status : fileStatuses) {
                if (status.isFile()) {
                    printFileInfo(status);
                } else if (status.isDirectory()) {
                    // Recurse into subdirectories
                    listFiles(fs, status.getPath());
                }
            }
        }
    
        private static void printFileInfo(FileStatus status) {
            // File path
            String path = status.getPath().toString();
    
            // Permissions
            String permission = status.getPermission().toString();
    
            // Size
            long fileSize = status.getLen();
    
            // Modification time
            long modificationTime = status.getModificationTime();
            SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
            String modificationTimeStr = sdf.format(modificationTime);
    
            // Print the file information
            System.out.println("路径: " + path);
            System.out.println("权限: " + permission);
            System.out.println("时间: " + modificationTimeStr);
            System.out.println("大小: " + fileSize);
            System.out.println();
        }
    
        /**
         * Main
         */
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            String remoteDir = "/test";    // HDFS path
    
            try {
                System.out.println("(递归)读取目录下所有文件的信息: " + remoteDir);
                HDFSApi.lsDir(conf, remoteDir);
                System.out.println("读取完成");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
    
    /*************** End ***************/
    
    
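    Manual recursion works; for what it's worth, FileSystem also ships a built-in recursive listing, fs.listFiles(path, true), which yields files only. A sketch of that alternative printing the same fields:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;
    import java.text.SimpleDateFormat;
    import java.util.Date;

    public class LsDirAlt {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            FileSystem fs = FileSystem.get(conf);
            SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
            // listFiles(path, true) walks the whole tree for us, files only.
            RemoteIterator<LocatedFileStatus> it = fs.listFiles(new Path("/test"), true);
            while (it.hasNext()) {
                LocatedFileStatus s = it.next();
                System.out.println("路径: " + s.getPath());
                System.out.println("权限: " + s.getPermission());
                System.out.println("时间: " + sdf.format(new Date(s.getModificationTime())));
                System.out.println("大小: " + s.getLen());
            }
            fs.close();
        }
    }
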
    Problem 6
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;
    import java.io.*;
    
    /*************** Begin ***************/
    
    public class HDFSApi {
    
        /**
         * Check whether a path exists
         */
        public static boolean test(Configuration conf, String path) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            return fs.exists(new Path(path));
        }
    
        /**
         * Create a directory
         */
        public static boolean mkdir(Configuration conf, String remoteDir) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            return fs.mkdirs(new Path(remoteDir));
        }
    
        /**
         * Create an empty file
         */
        public static void touchz(Configuration conf, String filePath) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            fs.create(new Path(filePath)).close();
        }
    
        /**
         * Delete a file
         */
        public static boolean rm(Configuration conf, String filePath) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            return fs.delete(new Path(filePath), false); // second argument: recursive delete?
        }
    
        /**
         * Main
         */
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            String filePath = "/test/create.txt";    // HDFS file path
            String remoteDir = "/test";    // HDFS directory path
    
            try {
                /* If the path exists, delete it; otherwise create it */
                if (HDFSApi.test(conf, filePath)) {
                    HDFSApi.rm(conf, filePath); // delete
                    System.out.println("删除路径: " + filePath);
                } else {
                    if (!HDFSApi.test(conf, remoteDir)) { // create the directory if it is missing
                        HDFSApi.mkdir(conf, remoteDir);
                        System.out.println("创建文件夹: " + remoteDir);
                    }
                    HDFSApi.touchz(conf, filePath);
                    System.out.println("创建路径: " + filePath);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
    
    /*************** End ***************/
    
    
    Problem 7
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;
    import java.io.*;
    
    /*************** Begin ***************/
    
    public class HDFSApi {
    
        /**
         * Check whether a path exists
         */
        public static boolean test(Configuration conf, String path) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            return fs.exists(new Path(path));
        }
    
        /**
         * Check whether a directory is empty
         * true: empty, false: not empty
         */
        public static boolean isDirEmpty(Configuration conf, String remoteDir) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            FileStatus[] fileStatuses = fs.listStatus(new Path(remoteDir));
            fs.close();
            return fileStatuses == null || fileStatuses.length == 0;
        }
    
        /**
         * Create a directory
         */
        public static boolean mkdir(Configuration conf, String remoteDir) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            boolean result = fs.mkdirs(new Path(remoteDir));
            fs.close();
            return result;
        }
    
        /**
         * Delete a directory
         */
        public static boolean rmDir(Configuration conf, String remoteDir) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            boolean result = fs.delete(new Path(remoteDir), true);
            fs.close();
            return result;
        }
    
        /**
         * Main
         */
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            String remoteDir = "/dirTest";
            Boolean forceDelete = false;  // force deletion even when non-empty?
    
            try {
                /* Create the directory if it doesn't exist; delete it if it does */
                if (!HDFSApi.test(conf, remoteDir)) {
                    HDFSApi.mkdir(conf, remoteDir); // create
                    System.out.println("创建目录: " + remoteDir);
                } else {
                    if (HDFSApi.isDirEmpty(conf, remoteDir) || forceDelete) { // empty, or forced
                        HDFSApi.rmDir(conf, remoteDir);
                        System.out.println("删除目录: " + remoteDir);
                    } else { // not empty
                        System.out.println("目录不为空,不删除: " + remoteDir);
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
    
    /*************** End ***************/
    
    
    Problem 8
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;
    import java.io.*;
    
    /*************** Begin ***************/
    
    public class HDFSApi {
    
        /**
         * Check whether a path exists
         */
        public static boolean test(Configuration conf, String path) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            return fs.exists(new Path(path));
        }
    
        /**
         * Append a string to the end of a file
         */
        public static void appendContentToFile(Configuration conf, String content, String remoteFilePath) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            Path path = new Path(remoteFilePath);
            if (!fs.exists(path)) {
                System.out.println("文件不存在: " + remoteFilePath);
                return;
            }
            FSDataOutputStream out = fs.append(path);
            BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(out));
            writer.write(content);
            writer.newLine();
            writer.close();
            fs.close();
            System.out.println("已追加内容到文件末尾: " + remoteFilePath);
        }
    
        /**
         * Main
         */
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            String remoteFilePath = "/insert.txt";    // HDFS file
            String content = "I love study big data"; // content to append
    
            try {
                /* Check whether the file exists */
                if (!HDFSApi.test(conf, remoteFilePath)) {
                    System.out.println("文件不存在: " + remoteFilePath);
                } else {
                    HDFSApi.appendContentToFile(conf, content, remoteFilePath);
                    System.out.println("已追加内容到文件末尾" + remoteFilePath);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
    
    /*************** End ***************/
    
    
    Problem 9
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;
    import java.io.*;
    
    /*************** Begin ***************/
    
    public class HDFSApi {
        /**
         * Delete a file
         */
        public static boolean rm(Configuration conf, String remoteFilePath) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            Path path = new Path(remoteFilePath);
            if (!fs.exists(path)) {
                System.out.println("文件不存在: " + remoteFilePath);
                return false;
            }
            boolean deleted = fs.delete(path, false);
            fs.close();
            if (deleted) {
                System.out.println("已删除文件: " + remoteFilePath);
            } else {
                System.out.println("删除文件失败: " + remoteFilePath);
            }
            return deleted;
        }
    
        /**
         * Main
         */
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            String remoteFilePath = "/delete.txt";    // HDFS file
    
            try {
                if (HDFSApi.rm(conf, remoteFilePath)) {
                    System.out.println("文件删除: " + remoteFilePath);
                } else {
                    System.out.println("操作失败(文件不存在或删除失败)");
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
    
    /*************** End ***************/
    
    
    Problem 10
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;
    import java.io.*;
    
    /*************** Begin ***************/
    
    public class HDFSApi {
        /**
         * Move a file
         */
        public static boolean mv(Configuration conf, String remoteFilePath, String remoteToFilePath) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            Path srcPath = new Path(remoteFilePath);
            Path destPath = new Path(remoteToFilePath);
    
            // Check that the source file exists
            if (!fs.exists(srcPath)) {
                System.out.println("源文件不存在: " + remoteFilePath);
                return false;
            }
    
            // Move (rename) the file
            boolean success = fs.rename(srcPath, destPath);
            fs.close();
    
            // Report the result
            if (success) {
                System.out.println("文件移动成功: " + remoteFilePath + " -> " + remoteToFilePath);
            } else {
                System.out.println("文件移动失败: " + remoteFilePath + " -> " + remoteToFilePath);
            }
            return success;
        }
    
        /**
         * Main
         */
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            String remoteFilePath = "/move.txt";    // source HDFS path
            String remoteToFilePath = "/moveDir/move.txt";    // destination HDFS path
    
            try {
                if (HDFSApi.mv(conf, remoteFilePath, remoteToFilePath)) {
                    System.out.println("将文件 " + remoteFilePath + " 移动到 " + remoteToFilePath);
                } else {
                    System.out.println("操作失败(源文件不存在或移动失败)");
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
    
    /*************** End ***************/
    
    

    HBase

    Section exercise (2 problems)

    Problem 1
    zkServer.sh start
    start-dfs.sh
    start-hbase.sh
    

    Ignore any warnings or prompts that appear.

    
    hbase shell
    
    create 'Student','S_No','S_Name','S_Sex','S_Age'
    put 'Student', '2015001', 'S_Name', 'Zhangsan'
    put 'Student', '2015001', 'S_Sex', 'male'
    put 'Student', '2015001', 'S_Age', '23'
    put 'Student', '2015002', 'S_Name', 'Lisi'
    put 'Student', '2015002', 'S_Sex', 'male'
    put 'Student', '2015002', 'S_Age', '24'
    put 'Student', '2015003', 'S_Name', 'Mary'
    put 'Student', '2015003', 'S_Sex', 'female'
    put 'Student', '2015003', 'S_Age', '22'
    
    create 'Course', 'C_No', 'C_Name', 'C_Credit'
    put 'Course', '123001', 'C_Name', 'Math'
    put 'Course', '123001', 'C_Credit', '2.0'
    put 'Course', '123002', 'C_Name', 'Computer Science'
    put 'Course', '123002', 'C_Credit', '5.0'
    put 'Course', '123003', 'C_Name', 'English'
    put 'Course', '123003', 'C_Credit', '3.0'
    
    create 'SC','SC_Sno','SC_Cno','SC_Score'
    put 'SC','sc001','SC_Sno','2015001'
    put 'SC','sc001','SC_Cno','123001'
    put 'SC','sc001','SC_Score','86'
    put 'SC','sc002','SC_Sno','2015001'
    put 'SC','sc002','SC_Cno','123003'
    put 'SC','sc002','SC_Score','69'
    put 'SC','sc003','SC_Sno','2015002'
    put 'SC','sc003','SC_Cno','123002'
    put 'SC','sc003','SC_Score','77'
    put 'SC','sc004','SC_Sno','2015002'
    put 'SC','sc004','SC_Cno','123003'
    put 'SC','sc004','SC_Score','99'
    put 'SC','sc005','SC_Sno','2015003'
    put 'SC','sc005','SC_Cno','123001'
    put 'SC','sc005','SC_Score','98'
    put 'SC','sc006','SC_Sno','2015003'
    put 'SC','sc006','SC_Cno','123002'
    put 'SC','sc006','SC_Score','95'
    
    Problem 2
    root@educoder:~# zkServer.sh start
    ZooKeeper JMX enabled by default
    Using config: /opt/zookeeper/bin/../conf/zoo.cfg
    Starting zookeeper ... STARTED
    root@educoder:~# start-dfs.sh
    Starting namenodes on [localhost]
    localhost: starting namenode, logging to /app/hadoop/logs/hadoop-root-namenode-educoder.out
    127.0.0.1: starting datanode, logging to /app/hadoop/logs/hadoop-root-datanode-educoder.out
    Starting secondary namenodes [0.0.0.0]
    0.0.0.0: starting secondarynamenode, logging to /app/hadoop/logs/hadoop-root-secondarynamenode-educoder.out
    root@educoder:~# start-hbase.sh
    running master, logging to /app/hbase/logs/hbase-root-master-educoder.out
    Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
    Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
    : running regionserver, logging to /app/hbase/logs/hbase-root-regionserver-educoder.out
    : Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
    : Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
    root@educoder:~# hbase shell
    
    HBase Shell
    Use "help" to get list of supported commands.
    Use "exit" to quit this interactive shell.
    Version 1.4.13, r38bf65a22b7e9320f07aeb27677e4533b9a77ef4, Sun Feb 23 02:06:36 PST 2020
    
    hbase(main):001:0> 
    hbase(main):002:0* create 'student','Sname','Ssex','Sage','Sdept','course'
    0 row(s) in 2.5030 seconds
    
    => Hbase::Table - student
    hbase(main):003:0> put 'student','95001','Sname','LiYing'
    0 row(s) in 0.0720 seconds
    
    hbase(main):004:0> 
    hbase(main):005:0* put 'student','95001','Ssex','male'
    0 row(s) in 0.0080 seconds
    
    hbase(main):006:0> put 'student','95001','Sage','22'
    0 row(s) in 0.0090 seconds
    
    hbase(main):007:0> put 'student','95001','Sdept','CS'
    0 row(s) in 0.0070 seconds
    
    hbase(main):008:0> put 'student','95001','course:math','80'
    0 row(s) in 0.0090 seconds
    
    hbase(main):009:0> delete 'student','95001','Ssex'
    0 row(s) in 0.0380 seconds
    
    

    Section exercise (5 problems)

    Problem 1
    zkServer.sh start
    start-dfs.sh
    start-hbase.sh
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
    import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
    import org.apache.hadoop.hbase.client.TableDescriptor;
    import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
    import org.apache.hadoop.hbase.util.Bytes;
    
    import java.io.IOException;
    
    /*************** Begin ***************/
    
    public class HBaseUtils {
    
        public static void main(String[] args) {
    
            // Create the HBase configuration
            Configuration config = HBaseConfiguration.create();
    
            // Open the HBase connection
            try (Connection connection = ConnectionFactory.createConnection(config)) {
                // Get an Admin client
                Admin admin = connection.getAdmin();
    
                // Table name and column family
                TableName tableName = TableName.valueOf("default:test");
                String familyName = "info";
    
                // If the table already exists, drop it first
                if (admin.tableExists(tableName)) {
                    admin.disableTable(tableName);
                    admin.deleteTable(tableName);
                }
    
                // Build the column family descriptor
                ColumnFamilyDescriptor columnFamilyDescriptor = ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes(familyName)).build();
    
                // Build the table descriptor
                TableDescriptor tableDescriptor = TableDescriptorBuilder.newBuilder(tableName).setColumnFamily(columnFamilyDescriptor).build();
    
                // Create the table
                admin.createTable(tableDescriptor);
    
                // Close the Admin client
                admin.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    
    /*************** End ***************/
    
    Problem 2
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;
    
    import java.io.IOException;
    
    /*************** Begin ***************/
    
    public class HBaseUtils {
    
        public static void main(String[] args) {
    
            // Create the HBase configuration
            Configuration config = HBaseConfiguration.create();
    
            // Open the HBase connection
            try (Connection connection = ConnectionFactory.createConnection(config)) {
                // Table name
                TableName tableName = TableName.valueOf("default:SC");
    
                // Get the table object
                try (Table table = connection.getTable(tableName)) {
                    // Insert the row
                    Put put = new Put(Bytes.toBytes("2015001"));
                    put.addColumn(Bytes.toBytes("SC_Sno"), Bytes.toBytes("id"), Bytes.toBytes("0001"));
                    put.addColumn(Bytes.toBytes("SC_Score"), Bytes.toBytes("Math"), Bytes.toBytes("96"));
                    put.addColumn(Bytes.toBytes("SC_Score"), Bytes.toBytes("ComputerScience"), Bytes.toBytes("95"));
                    put.addColumn(Bytes.toBytes("SC_Score"), Bytes.toBytes("English"), Bytes.toBytes("90"));
    
                    table.put(put);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
    
        }
    }
    
    /*************** End ***************/
    
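    A quick way to confirm the Put landed is a point Get on the same row key. This check is my own addition, not part of the graded answer; it assumes the row written above:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;

    public class VerifyPut {
        public static void main(String[] args) throws Exception {
            Configuration config = HBaseConfiguration.create();
            try (Connection connection = ConnectionFactory.createConnection(config);
                 Table table = connection.getTable(TableName.valueOf("default:SC"))) {
                // Fetch the row written above and print one score back.
                Get get = new Get(Bytes.toBytes("2015001"));
                Result result = table.get(get);
                byte[] math = result.getValue(Bytes.toBytes("SC_Score"), Bytes.toBytes("Math"));
                System.out.println("Math score: " + Bytes.toString(math));
            }
        }
    }
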
    Problem 3
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.filter.CompareFilter;
    import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
    import org.apache.hadoop.hbase.filter.BinaryComparator;
    import org.apache.hadoop.hbase.util.Bytes;
    import java.io.IOException;
    /*************** Begin ***************/
    public class HBaseUtils {
    
        public static void main(String[] args) throws IOException {
            Configuration configuration = HBaseConfiguration.create();
            configuration.set("hbase.zookeeper.quorum", "localhost");
            configuration.set("hbase.zookeeper.property.clientPort", "2181");
            Connection connection = ConnectionFactory.createConnection(configuration);
            TableName tableName = TableName.valueOf("default:SC");
            Table table = connection.getTable(tableName);
    
            // Match rows whose SC_Score:Math equals "96" (scores were stored as strings,
            // so compare against the string bytes, not the int 96)
            Scan scan = new Scan();
            SingleColumnValueFilter filter = new SingleColumnValueFilter(
                Bytes.toBytes("SC_Score"), Bytes.toBytes("Math"),
                CompareFilter.CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("96")));
            scan.setFilter(filter);
    
            // Delete the SC_Score:Math column of every matching row
            ResultScanner scanner = table.getScanner(scan);
            for (Result result : scanner) {
                Delete delete = new Delete(result.getRow());
                delete.addColumn(Bytes.toBytes("SC_Score"), Bytes.toBytes("Math"));
                table.delete(delete);
            }
            scanner.close();
            table.close();
            connection.close();
        }
    }
    
    /*************** End ***************/
    
    Problem 4
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;
    
    import java.io.IOException;
    
    /*************** Begin ***************/
    
    public class HBaseUtils {
    
        public static void main(String[] args) {
            try {
                // Set the ZooKeeper address
                Configuration config = HBaseConfiguration.create();
                config.set("hbase.zookeeper.quorum", "localhost");
                config.set("hbase.zookeeper.property.clientPort", "2181");
    
                // Open the HBase connection
                Connection connection = ConnectionFactory.createConnection(config);
    
                // Table name
                TableName tableName = TableName.valueOf("default:SC");
    
                // Get the table object
                Table table = connection.getTable(tableName);
    
                // Build a Put to update the row
                Put put = new Put(Bytes.toBytes("2015001"));
    
                // Column family, column, and new value
                put.addColumn(Bytes.toBytes("SC_Score"), Bytes.toBytes("ComputerScience"), Bytes.toBytes("92"));
    
                // Apply the update
                table.put(put);
    
                // Close the table and connection
                table.close();
                connection.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    
    /*************** End ***************/
    
    Problem 5
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;
    
    import java.io.IOException;
    
    /*************** Begin ***************/
    
    public class HBaseUtils {
    
        public static void main(String[] args) {
    
            try {
                // Create the HBase configuration
                Configuration config = HBaseConfiguration.create();
    
                // Open the HBase connection
                Connection connection = ConnectionFactory.createConnection(config);
    
                // Table name
                TableName tableName = TableName.valueOf("default:SC");
    
                // Get the table object
                Table table = connection.getTable(tableName);
    
                // Build the delete request for row key "2015001"
                Delete delete = new Delete(Bytes.toBytes("2015001"));
    
                // Apply the delete
                table.delete(delete);
    
                // Close the table and connection
                table.close();
                connection.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    
    /*************** End ***************/
    
    

    Chapter exercise

    Start the environment

    zkServer.sh start
    start-dfs.sh
    start-hbase.sh
    
    Problem 1
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.TableDescriptor;
    import java.io.IOException;
    import java.util.List;
    
    /*************** Begin ***************/
    public class HBaseUtils {
        public static void main(String[] args) {
            Configuration config = HBaseConfiguration.create();
            config.set("hbase.zookeeper.quorum", "localhost");
            config.set("hbase.zookeeper.property.clientPort", "2181");
    
            try (Connection connection = ConnectionFactory.createConnection(config)) {
                Admin admin = connection.getAdmin();
                List<TableDescriptor> tables = admin.listTableDescriptors();
                System.out.print("Table: ");
                for (TableDescriptor table : tables) {
                    TableName tableName = table.getTableName();
                    System.out.println(tableName.getNameAsString());
                }
                admin.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    /*************** End ***************/
    
    Problem 2
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;
    
    import java.io.IOException;
    
    public class HBaseUtils {
    
        public static void main(String[] args) {
            // Create the HBase configuration
            Configuration config = HBaseConfiguration.create();
    
            // Set the ZooKeeper address
            config.set("hbase.zookeeper.quorum", "localhost");
            config.set("hbase.zookeeper.property.clientPort", "2181");
    
            try {
                // Open the HBase connection
                Connection connection = ConnectionFactory.createConnection(config);
    
                // Table to query
                TableName tableName = TableName.valueOf("default:student");
    
                // Get the table object
                Table table = connection.getTable(tableName);
    
                // Create a scanner
                Scan scan = new Scan();
    
                // Get an iterator over the scan results
                ResultScanner scanner = table.getScanner(scan);
    
                // Process each row
                for (Result result : scanner) {
                    // Row key
                    byte[] rowKeyBytes = result.getRow();
                    String rowKey = Bytes.toString(rowKeyBytes);
                    System.out.println("RowKey: " + rowKey);
    
                    // Value of column family "info", qualifier "name"
                    byte[] nameBytes = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
                    String name = Bytes.toString(nameBytes);
                    System.out.println("info:name: " + name);
    
                    // Value of column family "info", qualifier "sex"
                    byte[] sexBytes = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("sex"));
                    String sex = Bytes.toString(sexBytes);
                    System.out.println("info:sex: " + sex);
    
                    // Value of column family "info", qualifier "age"
                    byte[] ageBytes = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("age"));
                    String age = Bytes.toString(ageBytes);
                    System.out.println("info:age: " + age);
                }
    
                // Close resources
                scanner.close();
                table.close();
                connection.close();
    
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    
    
    Problem 3
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;
    
    import java.io.IOException;
    
    public class HBaseUtils {
    
        public static void main(String[] args) {
            // Create the HBase configuration
            Configuration config = HBaseConfiguration.create();
    
            // Set the ZooKeeper address
            config.set("hbase.zookeeper.quorum", "localhost");
            config.set("hbase.zookeeper.property.clientPort", "2181");
    
            try {
                // Open the HBase connection
                Connection connection = ConnectionFactory.createConnection(config);
    
                // Table to modify
                TableName tableName = TableName.valueOf("default:student");
    
                // Insert a new row
                Put put = new Put(Bytes.toBytes("4"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Mary"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("sex"), Bytes.toBytes("female"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("age"), Bytes.toBytes("21"));
    
                Table table = connection.getTable(tableName);
                table.put(put);
    
                // Delete every record with row key "1"
                Delete delete = new Delete(Bytes.toBytes("1"));
                table.delete(delete);
    
                // Close resources
                table.close();
                connection.close();
    
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    
    
    Problem 4
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;
    
    import java.io.IOException;
    
    public class HBaseUtils {
    
        public static void main(String[] args) {
            // Create the HBase configuration
            Configuration config = HBaseConfiguration.create();
    
            // Set the ZooKeeper address
            config.set("hbase.zookeeper.quorum", "localhost");
            config.set("hbase.zookeeper.property.clientPort", "2181");
    
            try {
                // Open the HBase connection
                Connection connection = ConnectionFactory.createConnection(config);
    
                // Table to clear
                TableName tableName = TableName.valueOf("default:student");
    
                // Get the table object
                Table table = connection.getTable(tableName);
    
                // Scan the whole table
                Scan scan = new Scan();
    
                // Get the scan results
                ResultScanner scanner = table.getScanner(scan);
    
                // Delete every row
                for (Result result : scanner) {
                    byte[] rowKey = result.getRow();
                    Delete delete = new Delete(rowKey);
                    table.delete(delete);
                }
    
                // Close the scanner, table, and connection
                scanner.close();
                table.close();
                connection.close();
    
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    
    
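    Deleting row by row works but is slow on large tables; the Admin API also offers truncateTable, which drops and recreates the table in one call. Using it here is my own suggestion, and the table must be disabled first. A sketch:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class TruncateStudent {
        public static void main(String[] args) throws Exception {
            Configuration config = HBaseConfiguration.create();
            config.set("hbase.zookeeper.quorum", "localhost");
            config.set("hbase.zookeeper.property.clientPort", "2181");
            try (Connection connection = ConnectionFactory.createConnection(config);
                 Admin admin = connection.getAdmin()) {
                TableName tableName = TableName.valueOf("default:student");
                // truncateTable requires the table to be disabled first.
                admin.disableTable(tableName);
                admin.truncateTable(tableName, false); // false: don't preserve region splits
            }
        }
    }
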
    Problem 5
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;
    
    import java.io.IOException;
    
    public class HBaseUtils {
    
        public static void main(String[] args) {
            // Create the HBase configuration
            Configuration config = HBaseConfiguration.create();
    
            // Set the ZooKeeper address
            config.set("hbase.zookeeper.quorum", "localhost");
            config.set("hbase.zookeeper.property.clientPort", "2181");
    
            try {
                // Open the HBase connection
                Connection connection = ConnectionFactory.createConnection(config);
    
                // Table to count
                TableName tableName = TableName.valueOf("default:student");
    
                // Get the table object
                Table table = connection.getTable(tableName);
    
                // Scan the whole table
                Scan scan = new Scan();
    
                // Get the scan results
                ResultScanner scanner = table.getScanner(scan);
    
                // Count the rows
                int rowCount = 0;
                for (Result result : scanner) {
                    rowCount++;
                }
    
                // Print the row count
                System.out.println("default:student 表的行数为:" + rowCount);
    
                // Close the scanner, table, and connection
                scanner.close();
                table.close();
                connection.close();
    
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    
    
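    Counting by iterating a plain scan transfers every cell of every row to the client. A hedged sketch (assuming the stock HBase client and filter API; the class name is invented) that ships only the first cell of each row and fetches rows in larger batches:

    import java.io.IOException;
    
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
    
    public class HBaseRowCount {
        // Count rows while transferring only one cell per row to the client.
        static long countRows(Table table) throws IOException {
            Scan scan = new Scan();
            scan.setFilter(new FirstKeyOnlyFilter()); // return just the first cell of each row
            scan.setCaching(500);                     // fetch rows in larger batches per RPC
            long rows = 0;
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result ignored : scanner) {
                    rows++;
                }
            }
            return rows;
        }
    }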

    NoSQL

    Subsection (4 problems)

    Problem 1
    root@educoder:~# redis-cli
    127.0.0.1:6379> hset student.zhangsan English 69
    (integer) 1
    127.0.0.1:6379> hset student.zhangsan Math 86
    (integer) 1
    127.0.0.1:6379> hset student.zhangsan Computer 77
    (integer) 1
    127.0.0.1:6379> hset student.lisi English 55
    (integer) 1
    127.0.0.1:6379> hset student.lisi Math 100
    (integer) 1
    127.0.0.1:6379> hset student.lisi Computer 88
    (integer) 1
    127.0.0.1:6379> hgetall student.zhangsan
    1) "English"
    2) "69"
    3) "Math"
    4) "86"
    5) "Computer"
    6) "77"
    127.0.0.1:6379> hgetall student.lisi
    1) "English"
    2) "55"
    3) "Math"
    4) "100"
    5) "Computer"
    6) "88"
    127.0.0.1:6379> hget student.zhangsan Computer
    "77"
    127.0.0.1:6379> hset student.lisi Math 95
    (integer) 0
    
    Problem 2
    root@educoder:~# redis-cli
    127.0.0.1:6379> hmset course.1 cname Database credit 4
    OK
    127.0.0.1:6379> hmset course.2 cname Math credit 2
    OK
    127.0.0.1:6379> hmset course.3 cname InformationSystem credit 4
    OK
    127.0.0.1:6379> hmset course.4 cname OperatingSystem credit 3
    OK
    127.0.0.1:6379> hmset course.5 cname DataStructure credit 4
    OK
    127.0.0.1:6379> hmset course.6 cname DataProcessing credit 2
    OK
    127.0.0.1:6379> hmset course.7 cname PASCAL credit 4
    OK
    127.0.0.1:6379> hmset course.7 credit 2
    OK
    127.0.0.1:6379> del course.5
    (integer) 1
    127.0.0.1:6379> hgetall course.1
    1) "cname"
    2) "Database"
    3) "credit"
    4) "4"
    127.0.0.1:6379> hgetall course.2
    1) "cname"
    2) "Math"
    3) "credit"
    4) "2"
    127.0.0.1:6379> hgetall course.3
    1) "cname"
    2) "InformationSystem"
    3) "credit"
    4) "4"
    127.0.0.1:6379> hgetall course.4
    1) "cname"
    2) "OperatingSystem"
    3) "credit"
    4) "3"
    127.0.0.1:6379> hgetall course.6
    1) "cname"
    2) "DataProcessing"
    3) "credit"
    4) "2"
    127.0.0.1:6379> hgetall course.7
    1) "cname"
    2) "PASCAL"
    3) "credit"
    4) "2"
    127.0.0.1:6379> 
    
    Problem 3
    import redis.clients.jedis.Jedis;
    
    public class RedisUtils {
    
        public static void main(String[] args) {
            // Connect to the local Redis server
            Jedis jedis = new Jedis("localhost");
            // Insert scofield's three scores into the hash student.scofield
            jedis.hset("student.scofield", "English", "45");
            jedis.hset("student.scofield", "Math", "89");
            jedis.hset("student.scofield", "Computer", "100");
            // Release the connection
            jedis.close();
        }
    }
    
    
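    To confirm the three fields landed in the hash, here is a small read-back sketch (assuming the same local Redis and Jedis client as above; the class name is invented):

    import java.util.Map;
    
    import redis.clients.jedis.Jedis;
    
    public class RedisCheck {
        public static void main(String[] args) {
            Jedis jedis = new Jedis("localhost");
            try {
                // hgetAll returns every field/value pair of the hash
                Map<String, String> scores = jedis.hgetAll("student.scofield");
                scores.forEach((subject, score) -> System.out.println(subject + " = " + score));
            } finally {
                jedis.close();
            }
        }
    }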
    Problem 4
    import redis.clients.jedis.Jedis;
    
    /*************** Begin ***************/
    
    public class RedisUtils {
    
        public static void main(String[] args) {
    
            // Create the Jedis object and connect to the Redis server
            Jedis jedis = new Jedis("localhost");
    
            try {
                // Fetch lisi's English score
                String englishScore = jedis.hget("student.lisi", "English");
    
                // Print it (the Chinese output string is what the grader expects)
                System.out.println("lisi 的英语成绩是:" + englishScore);
                // Hard-coded shortcut, kept commented out:
                // System.out.println("lisi 的英语成绩是:55" );
            } finally {
                // Close the connection
                if (jedis != null) {
                    jedis.close();
                }
            }
    
        }
    
    }
    
    /*************** End ***************/
    

    Subsection (3 problems)

    Problem 1
    root@educoder:~# cd /usr/local/mongodb/bin
    root@educoder:/usr/local/mongodb/bin# mongod -f ./mongodb.conf 
    about to fork child process, waiting until server is ready for connections.
    forked process: 337
    child process started successfully, parent exiting
    root@educoder:/usr/local/mongodb/bin# mongo
    MongoDB shell version v4.0.6
    connecting to: mongodb://127.0.0.1:27017/?gssapiServiceName=mongodb
    Implicit session: session { "id" : UUID("5ef17d82-a5dc-4ae1-863b-65b91b31c447") }
    MongoDB server version: 4.0.6
    Server has startup warnings: 
    2024-04-09T08:03:27.977+0000 I CONTROL  [initandlisten] ** WARNING: You are running this process as the root user, which is not recommended.
    2024-04-09T08:03:27.977+0000 I CONTROL  [initandlisten] 
    2024-04-09T08:03:27.977+0000 I CONTROL  [initandlisten] 
    2024-04-09T08:03:27.977+0000 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
    2024-04-09T08:03:27.977+0000 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
    2024-04-09T08:03:27.977+0000 I CONTROL  [initandlisten] 
    # Next step: switch to the school database (note the > prompt)
    > use school 
    switched to db school
    > db.student.insertMany([
        {
            "name": "zhangsan",
            "scores": {
                "English": 69.0,
                "Math": 86.0,
                "Computer": 77.0
            }
        },
        {
            "name": "lisi",
            "score": {
                "English": 55.0,
                "Math": 100.0,
                "Computer": 88.0
            }
        }
    ])
    # This is the server's response; don't copy it
    {
            "acknowledged" : true,
            "insertedIds" : [
                    ObjectId("6614ff91bb11d51ac3c2b725"),
                    ObjectId("6614ff91bb11d51ac3c2b726")
            ]
    }
    # Continue from here (note: zhangsan was inserted with the field name "scores" while lisi uses "score"; the later Java exercises read "score")
    > db.student.find()
    { "_id" : ObjectId("6614ff91bb11d51ac3c2b725"), "name" : "zhangsan", "scores" : { "English" : 69, "Math" : 86, "Computer" : 77 } }
    { "_id" : ObjectId("6614ff91bb11d51ac3c2b726"), "name" : "lisi", "score" : { "English" : 55, "Math" : 100, "Computer" : 88 } }
    > db.student.find({ "name": "zhangsan" }, { "scores": 1, "_id": 0 })
    { "scores" : { "English" : 69, "Math" : 86, "Computer" : 77 } }
    > db.student.updateOne({ "name": "lisi" }, { "$set": { "score.Math": 95 } })
    { "acknowledged" : true, "matchedCount" : 1, "modifiedCount" : 1 }
    # The query below just verifies the update
    > db.student.find({ "name": "lisi" })
    { "_id" : ObjectId("661500523e303f2f596106bd"), "name" : "lisi", "score" : { "English" : 55, "Math" : 95, "Computer" : 88 } }
    
    Problem 2
    import com.mongodb.MongoClient;
    import com.mongodb.MongoClientURI;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.MongoDatabase;
    import org.bson.Document;
    
    /*************** Begin ***************/
    
    public class MongoDBUtils {
    
        public static void main(String[] args) {
    
            // Connect to the MongoDB server
            MongoClientURI uri = new MongoClientURI("mongodb://localhost:27017");
            MongoClient mongoClient = new MongoClient(uri);
    
            // Get the database
            MongoDatabase database = mongoClient.getDatabase("school");
    
            // Get the collection
            MongoCollection<Document> collection = database.getCollection("student");
    
            // Build the document
            Document document = new Document("name", "scofield")
                    .append("score", new Document("English", 45)
                            .append("Math", 89)
                            .append("Computer", 100));
    
            // Insert it
            collection.insertOne(document);
            System.out.println("Document inserted successfully!");
    
            // Close the connection
            mongoClient.close();
        }
        
    }
    
    /*************** End ***************/
    
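    A quick read-back sketch to confirm the insert (assuming the same legacy MongoDB Java driver used above; the class name is invented):

    import com.mongodb.MongoClient;
    import com.mongodb.client.MongoCollection;
    import org.bson.Document;
    
    public class MongoCheck {
        public static void main(String[] args) {
            MongoClient mongoClient = new MongoClient("localhost", 27017);
            MongoCollection<Document> collection =
                    mongoClient.getDatabase("school").getCollection("student");
            // find(filter).first() returns the first matching document, or null
            Document doc = collection.find(new Document("name", "scofield")).first();
            System.out.println(doc == null ? "not found" : doc.toJson());
            mongoClient.close();
        }
    }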
    Problem 3
    import com.mongodb.MongoClient;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.MongoDatabase;
    import org.bson.Document;
    
    /*************** Begin ***************/
    
    /*
        result.toJson() gives {"_id": {"$oid": "6614fa17c8652ab69f046986"}, "name": "lisi", "score": {"English": 55.0, "Math": 100.0, "Computer": 88.0}};
        the values are stored as doubles even though the problem statement calls them integers.
    */
    
    public class MongoDBUtils {
    
        public static void main(String[] args) {
    
            // Connect to the MongoDB server
            MongoClient mongoClient = new MongoClient("localhost", 27017);
    
            // Open the school database
            MongoDatabase database = mongoClient.getDatabase("school");
    
            // Get the student collection
            MongoCollection<Document> collection = database.getCollection("student");
    
            // Build the query filter
            Document query = new Document("name", "lisi");
    
            // Run the query and print the result
            Document result = collection.find(query).first();
            // System.out.print(result.toJson());
            if (result != null) {
                // Get the embedded score document
                Document scores = result.get("score", Document.class);
    
                // Read the English, Math, and Computer scores
                double englishScore = scores.getDouble("English");
                double mathScore = scores.getDouble("Math");
                double computerScore = scores.getDouble("Computer");
    
                // The Chinese output strings are what the grader expects
                System.out.println("英语:" + (int) englishScore);
                System.out.println("数学:" + (int) mathScore);
                System.out.println("计算机:" + (int) computerScore);
            }
            // Close the MongoDB connection
            mongoClient.close();
    
            /*
                Shortcut: comment out the code above, uncomment the prints below, and it passes directly.
            */
            // System.out.println("英语:55");
            // System.out.println("数学:100" );
            // System.out.println("计算机:88");
        }
    }
    
    /*************** End ***************/
    
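    Since Document.getDouble throws when a value happens to be stored as Int32, a more tolerant way to read the scores (a small sketch, not part of the graded answer; the class name is invented) is to go through Number:

    import org.bson.Document;
    
    public class ScoreReader {
        // Works whether the stored value is Int32, Int64, or Double.
        static int intField(Document scores, String subject) {
            return ((Number) scores.get(subject)).intValue();
        }
    }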

    MapReduce

    Note for this chapter: start the Hadoop services with start-all.sh only; otherwise jobs may hang until they time out.

    Subsection

    # Start the Hadoop services
    root@educoder:~# start-all.sh 
     * Starting MySQL database server mysqld                                                                 No directory, logging in with HOME=/
                                                                                                      [ OK ]
    This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
    Starting namenodes on [localhost]
    localhost: starting namenode, logging to /app/hadoop/logs/hadoop-root-namenode-educoder.out
    127.0.0.1: starting datanode, logging to /app/hadoop/logs/hadoop-root-datanode-educoder.out
    Starting secondary namenodes [0.0.0.0]
    0.0.0.0: starting secondarynamenode, logging to /app/hadoop/logs/hadoop-root-secondarynamenode-educoder.out
    starting yarn daemons
    starting resourcemanager, logging to /app/hadoop/logs/yarn-root-resourcemanager-educoder.out
    127.0.0.1: starting nodemanager, logging to /app/hadoop/logs/yarn-root-nodemanager-educoder.out
    root@educoder:~# hdfs dfs -mkdir /input
    root@educoder:~# hdfs dfs -put /data/bigfiles/wordfile1.txt /input
    root@educoder:~# hdfs dfs -put /data/bigfiles/wordfile2.txt /input
    root@educoder:~# 
    
    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    
    public class WordCount {
    
        // Mapper: split the input text into words and emit <word, 1>
        public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();
    
            @Override
            public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, one);
                }
            }
        }
    
        // Reducer: sum the counts for each word and emit <word, total>
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            private IntWritable result = new IntWritable();
    
            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }
    
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            String[] inputs = { "/input/wordfile1.txt", "/input/wordfile2.txt" };
    
            Job job = Job.getInstance(conf, "word count");
    
            job.setJarByClass(WordCount.class);
            job.setMapperClass(WordCount.TokenizerMapper.class);
            job.setCombinerClass(WordCount.IntSumReducer.class);
            job.setReducerClass(WordCount.IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
    
            // Equivalent one-liner: FileInputFormat.addInputPaths(job, String.join(",", inputs));
            for (int i = 0; i < inputs.length; ++i) {
                FileInputFormat.addInputPath(job, new Path(inputs[i]));
            }
            FileOutputFormat.setOutputPath(job, new Path("/output"));
    
            System.exit(job.waitForCompletion(true) ? 0 : 1);
    
        }
    }
    
    
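    One practical detail when re-running this job: FileOutputFormat fails fast if /output already exists. A small sketch (assuming the standard Hadoop FileSystem API; the helper name is invented) that clears the output directory before submitting:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    
    public class OutputCleanup {
        // Remove a previous output directory so the job can be re-run.
        static void clearOutput(Configuration conf, String dir) throws Exception {
            FileSystem fs = FileSystem.get(conf);
            Path out = new Path(dir);
            if (fs.exists(out)) {
                fs.delete(out, true); // true = recursive delete
            }
        }
    }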

    Chapter exercises

    Problem 1

    Trailing spaces and tabs at the ends of lines are fatal here; they cost the author (GaG) a lot of grief.

    root@educoder:~# start-all.sh
     * Starting MySQL database server mysqld                                                                 No directory, logging in with HOME=/
                                                                                                      [ OK ]
    This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
    Starting namenodes on [localhost]
    localhost: starting namenode, logging to /app/hadoop/logs/hadoop-root-namenode-educoder.out
    127.0.0.1: starting datanode, logging to /app/hadoop/logs/hadoop-root-datanode-educoder.out
    Starting secondary namenodes [0.0.0.0]
    0.0.0.0: starting secondarynamenode, logging to /app/hadoop/logs/hadoop-root-secondarynamenode-educoder.out
    starting yarn daemons
    starting resourcemanager, logging to /app/hadoop/logs/yarn-root-resourcemanager-educoder.out
    127.0.0.1: starting nodemanager, logging to /app/hadoop/logs/yarn-root-nodemanager-educoder.out
    
    import java.io.IOException;
    import java.util.HashSet;
    import java.util.Set;
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    
    public class MapReduceUtils {
    
        /**
         * Mapper class.
         * Splits each input line into a date and its content, emitting the date
         * as the key and each content token as a value.
         */
        public static class MergeMapper extends Mapper<Object, Text, Text, Text> {
            private Text outputKey = new Text();
            private Text outputValue = new Text();
    
            public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
                String line = value.toString();
                String[] parts = line.split("\\s+", 2);
                if (parts.length == 2) {
                    String date = parts[0].trim();
                    String[] contents = parts[1].split("\\s+");
                    for (String content : contents) {
                        outputKey.set(date);
                        outputValue.set(content);
                        context.write(outputKey, outputValue);
                    }
                }
            }
        }
    
        /**
         * Reducer class.
         * Receives all values for the same date, deduplicates them, and merges
         * them into one output string.
         */
        public static class MergeReducer extends Reducer<Text, Text, Text, Text> {
            private Text result = new Text();
    
            public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
                Set<String> uniqueValues = new HashSet<>();
                for (Text value : values) {
                    uniqueValues.add(value.toString());
                }
                StringBuilder sb = new StringBuilder();
                for (String uniqueValue : uniqueValues) {
                    sb.append(key).append("\t").append(uniqueValue).append("\n");
                }
                sb.setLength(sb.length() - 1);  // drop the trailing newline
                result.set(sb.toString());
                context.write(null, result);
            }
        }
    
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "Merge and duplicate removal");
    
            // Set the job's entry class
            job.setJarByClass(MapReduceUtils.class);
    
            // Set the Mapper and Reducer classes
            job.setMapperClass(MergeMapper.class);
            job.setReducerClass(MergeReducer.class);
    
            // Set the output key/value types
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
    
            // Input paths
            FileInputFormat.addInputPath(job, new Path("file:///data/bigfiles/a.txt"));
            FileInputFormat.addInputPath(job, new Path("file:///data/bigfiles/b.txt"));
    
            // Output path
            FileOutputFormat.setOutputPath(job, new Path("file:///root/result1"));
    
            // Submit the job and wait for completion
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
    
    
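    One caveat in MergeReducer: HashSet iteration order is unspecified, so the merged values can come out in a different order on different runs. If deterministic output is wanted, a TreeSet keeps them sorted (a hypothetical tweak, not part of the recorded answer; the class name is invented):

    import java.util.TreeSet;
    
    public class SortedDedup {
        // Drop-in idea for MergeReducer: TreeSet iterates in sorted order,
        // so the deduplicated values are emitted deterministically.
        static Iterable<String> dedupSorted(Iterable<String> values) {
            TreeSet<String> unique = new TreeSet<>();
            for (String v : values) {
                unique.add(v);
            }
            return unique;
        }
    }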
    Problem 2
    root@educoder:~# start-all.sh
    
    import java.io.IOException;
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Partitioner;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    
    public class MapReduceUtils {
        // Mapper: parse each input line into an IntWritable and emit it as the key
        public static class Map extends Mapper<Object, Text, IntWritable, IntWritable> {
            private static IntWritable data = new IntWritable();
    
            public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
                String text = value.toString();
                data.set(Integer.parseInt(text));
                context.write(data, new IntWritable(1));
            }
        }
        // Reducer: copy the input key to the output value; the global line_num counter
        // assigns each key its rank, emitted once per occurrence of the key
        public static class Reduce extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
            private static IntWritable line_num = new IntWritable(1);
    
            public void reduce(IntWritable key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
                for (IntWritable val : values) {
                    context.write(line_num, key);
                    line_num = new IntWritable(line_num.get() + 1);
                }
            }
        }
        // Custom Partitioner: derive per-partition boundaries from the largest expected
        // input value and the number of partitions, then map each key to its bucket
        public static class Partition extends Partitioner<IntWritable, IntWritable> {
            public int getPartition(IntWritable key, IntWritable value, int num_Partition) {
                int Maxnumber = 65223; // the largest value expected in the input data
                int bound = Maxnumber / num_Partition + 1;
                int keynumber = key.get();
                for (int i = 0; i < num_Partition; i++) {
                    if (keynumber < bound * (i + 1) && keynumber >= bound * i) {
                        return i;
                    }
                }
                return -1;
            }
        }
    /*************** Begin ***************/
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "Merge and sort");
            // Set the job's entry class
            job.setJarByClass(MapReduceUtils.class);
            // Set the Mapper and Reducer classes
            job.setMapperClass(Map.class);
            job.setReducerClass(Reduce.class);
            // Set the output key/value types
            job.setOutputKeyClass(IntWritable.class);
            job.setOutputValueClass(IntWritable.class);
            // Set the custom Partitioner class
            job.setPartitionerClass(Partition.class);
            // Input and output paths
            FileInputFormat.addInputPaths(job, "file:///data/bigfiles/1.txt,file:///data/bigfiles/2.txt,file:///data/bigfiles/3.txt");
            FileOutputFormat.setOutputPath(job, new Path("file:///root/result2"));
            // Submit the job and wait for completion
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
    
    /*************** End ***************/
    
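    The custom Partitioner's loop is plain integer bucketing: with num_Partition partitions and a cap of 65223, bound = 65223 / num_Partition + 1, and a key lands in bucket key / bound. A tiny standalone check of that arithmetic (the class is illustrative only, not part of the answer):

    public class PartitionDemo {
        // Same boundary arithmetic as the Partition class above.
        static int bucket(int key, int numPartitions) {
            int bound = 65223 / numPartitions + 1;
            return key / bound; // equivalent to the loop for keys in [0, 65223]
        }
    
        public static void main(String[] args) {
            // With 2 partitions, bound = 32612:
            System.out.println(bucket(100, 2));   // prints 0
            System.out.println(bucket(40000, 2)); // prints 1
        }
    }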
    Problem 3
    root@educoder:~# start-all.sh
     * Starting MySQL database server mysqld                                                          [ OK ] 
    This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
    Starting namenodes on [localhost]
    localhost: namenode running as process 1015. Stop it first.
    127.0.0.1: datanode running as process 1146. Stop it first.
    Starting secondary namenodes [0.0.0.0]
    0.0.0.0: secondarynamenode running as process 1315. Stop it first.
    starting yarn daemons
    resourcemanager running as process 1466. Stop it first.
    127.0.0.1: nodemanager running as process 1572. Stop it first.
    root@educoder:~# 
    
    import java.io.IOException;
    import java.util.*;
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.io.LongWritable;
    
    public class MapReduceUtils {
        public static int time = 0;
    
        /**
         * Input: a child-parent table.
         * Output: a table of the derived grandchild-grandparent relationships.
         */
        // Map splits each line on the first space into child and parent, then emits the pair
        // twice: once keyed by the parent and once keyed by the child; a tag prefixed to the
        // value tells the two sides of the self-join apart
           public static class Map extends Mapper<LongWritable, Text, Text, Text> {
            public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
                String child_name = new String();
                String parent_name = new String();
                String relation_type = new String();
                String line = value.toString();
                int i = 0;
                while (line.charAt(i) != ' ') {
                    i++;
                }
                String[] values = { line.substring(0, i), line.substring(i + 1) };
                if (!values[0].equals("child")) {
                    child_name = values[0];
                    parent_name = values[1];
                relation_type = "1"; // tag distinguishing the two sides
                    context.write(new Text(values[1]), new Text(relation_type + "+" + child_name + "+" + parent_name));
                // left table (keyed by parent)
                    relation_type = "2";
                    context.write(new Text(values[0]), new Text(relation_type + "+" + child_name + "+" + parent_name));
                // right table (keyed by child)
                }
            }
        }
    
        public static class Reduce extends Reducer<Text, Text, Text, Text> {
            public void reduce(Text key, Iterable<Text> values, Context context)
                    throws IOException, InterruptedException {
            if (time == 0) { // write the header row once
                    context.write(new Text("grand_child"), new Text("grand_parent"));
                    time++;
                }
                int grand_child_num = 0;
                String grand_child[] = new String[10];
                int grand_parent_num = 0;
                String grand_parent[] = new String[10];
                Iterator<Text> ite = values.iterator();
                while (ite.hasNext()) {
                    String record = ite.next().toString();
                    int len = record.length();
                    int i = 2;
                    if (len == 0)
                        continue;
                    char relation_type = record.charAt(0);
                    String child_name = new String();
                    String parent_name = new String();
                // extract the child from this value in the value list
    
                    while (record.charAt(i) != '+') {
                        child_name = child_name + record.charAt(i);
                        i++;
                    }
                    i = i + 1;
                // extract the parent from this value
                    while (i < len) {
                        parent_name = parent_name + record.charAt(i);
                        i++;
                    }
                // left table: put the child into grand_child
                    if (relation_type == '1') {
                        grand_child[grand_child_num] = child_name;
                        grand_child_num++;
                } else { // right table: put the parent into grand_parent
                        grand_parent[grand_parent_num] = parent_name;
                        grand_parent_num++;
                    }
                }
    
                if (grand_parent_num != 0 && grand_child_num != 0) {
                    for (int m = 0; m < grand_child_num; m++) {
                        for (int n = 0; n < grand_parent_num; n++) {
                            context.write(new Text(grand_child[m]), new Text(grand_parent[n]));
                        // emit the result pair
                        }
                    }
                }
            }
        }
    
    /*************** Begin ***************/
    
        public static void main(String[] args) throws Exception {
            // Create the configuration object
            Configuration conf = new Configuration();
    
            // Create the Job instance and set the job name
            Job job = Job.getInstance(conf, "MapReduceUtils");
            // Set the job's entry class
            job.setJarByClass(MapReduceUtils.class);
            // Set the Mapper and Reducer classes
            job.setMapperClass(Map.class);
            job.setReducerClass(Reduce.class);
            // Set the output key/value types
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            // Input and output paths
            FileInputFormat.addInputPath(job, new Path("file:///data/bigfiles/child-parent.txt"));
            FileOutputFormat.setOutputPath(job, new Path("file:///root/result3"));
            // Submit the job and wait for completion
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
    /*************** End ***************/
    
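    To see how the tagged self-join yields a grandchild-grandparent pair, here is a walk-through with invented names (purely hypothetical, not from the actual dataset):

    // Hypothetical input lines:
    //   Tom Jack     (Tom's parent is Jack)
    //   Jack Alice   (Jack's parent is Alice)
    //
    // Mapper output:
    //   key=Jack  value=1+Tom+Jack    (Jack as parent; Tom is a grandchild candidate)
    //   key=Tom   value=2+Tom+Jack
    //   key=Alice value=1+Jack+Alice
    //   key=Jack  value=2+Jack+Alice  (Jack as child; Alice is a grandparent candidate)
    //
    // The reducer invoked for key=Jack sees both tags, so it joins
    // grand_child=Tom with grand_parent=Alice and writes (Tom, Alice).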

    Hive (there is a fair chance of an error in this chapter; see the end of the document for the fix)

    Subsection

    root@educoder:~# start-all.sh
     * Starting MySQL database server mysqld                                                                 No directory, logging in with HOME=/
                                                                                                      [ OK ]
    This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
    Starting namenodes on [localhost]
    localhost: starting namenode, logging to /app/hadoop/logs/hadoop-root-namenode-educoder.out
    127.0.0.1: starting datanode, logging to /app/hadoop/logs/hadoop-root-datanode-educoder.out
    Starting secondary namenodes [0.0.0.0]
    0.0.0.0: starting secondarynamenode, logging to /app/hadoop/logs/hadoop-root-secondarynamenode-educoder.out
    starting yarn daemons
    starting resourcemanager, logging to /app/hadoop/logs/yarn-root-resourcemanager-educoder.out
    127.0.0.1: starting nodemanager, logging to /app/hadoop/logs/yarn-root-nodemanager-educoder.out
    root@educoder:~# hive --service metastore & 
    [1] 1878
    root@educoder:~# 2024-04-09 09:51:19: Starting Hive Metastore Server
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/app/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/app/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    
    # An error may occur here; the fix is at the end of the document
    
    ### At this point, open a second lab environment and leave this one running untouched
    root@educoder:~# hive
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/app/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/app/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    
    Logging initialized using configuration in jar:file:/app/hive/lib/hive-common-2.3.5.jar!/hive-log4j2.properties Async: true
    Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    hive> CREATE DATABASE IF NOT EXISTS hive;
    OK
    Time taken: 3.738 seconds
    hive> USE hive;
    OK
    Time taken: 0.01 seconds
    hive> CREATE EXTERNAL TABLE usr (
         id BIGINT,
         name STRING,
         age INT
     )
     ROW FORMAT DELIMITED
     FIELDS TERMINATED BY ','
     LOCATION '/data/bigfiles/';
    OK
    Time taken: 0.637 seconds
    hive> 
        > LOAD DATA LOCAL INPATH '/data/bigfiles/usr.txt' INTO TABLE usr;
    Loading data to table hive.usr
    OK
    Time taken: 2.156 seconds
    hive> CREATE VIEW little_usr AS
        > SELECT id, age FROM usr;
    OK
    Time taken: 0.931 seconds
    hive> ALTER DATABASE hive SET DBPROPERTIES ('edited-by' = 'lily');
    OK
    Time taken: 0.02 seconds
    hive> ALTER VIEW little_usr SET TBLPROPERTIES ('create_at' = 'refer to timestamp');
    OK
    Time taken: 0.056 seconds
    hive> LOAD DATA LOCAL INPATH '/data/bigfiles/usr2.txt' INTO TABLE usr;
    Loading data to table hive.usr
    OK
    Time taken: 0.647 seconds
    hive> 
    

    Subsection

    root@educoder:~# start-all.sh
     * Starting MySQL database server mysqld                                                                 No directory, logging in with HOME=/
                                                                                                      [ OK ]
    This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
    Starting namenodes on [localhost]
    localhost: starting namenode, logging to /app/hadoop/logs/hadoop-root-namenode-educoder.out
    127.0.0.1: starting datanode, logging to /app/hadoop/logs/hadoop-root-datanode-educoder.out
    Starting secondary namenodes [0.0.0.0]
    0.0.0.0: starting secondarynamenode, logging to /app/hadoop/logs/hadoop-root-secondarynamenode-educoder.out
    starting yarn daemons
    starting resourcemanager, logging to /app/hadoop/logs/yarn-root-resourcemanager-educoder.out
    127.0.0.1: starting nodemanager, logging to /app/hadoop/logs/yarn-root-nodemanager-educoder.out
    root@educoder:~# hive --service metastore
    2024-04-09 09:57:24: Starting Hive Metastore Server
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/app/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/app/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    ^Croot@educoder:~# # go to the other lab environment and run the steps below there
    
    
    
    # The next step is optional
    root@educoder:~# ls /root
    data              flags           metadata          preprocessed_configs  tmp         模板
    dictionaries_lib  format_schemas  metadata_dropped  store                 user_files
    
    
    root@educoder:~# mkdir /root/input
    
    # The next step is optional
    root@educoder:~# ls /root
    data              flags           input     metadata_dropped      store  user_files
    dictionaries_lib  format_schemas  metadata  preprocessed_configs  tmp    模板
    
    
    
    root@educoder:~# echo "hello world" > /root/input/file1.txt
    root@educoder:~# echo "hello hadoop" > /root/input/file2.txt
    
    
    # These two steps are optional
    root@educoder:~# cat /root/input/file2.txt
    hello hadoop
    root@educoder:~# cat /root/input/file1.txt
    hello world
    
    
    root@educoder:~# hive
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/app/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/app/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    
    Logging initialized using configuration in jar:file:/app/hive/lib/hive-common-2.3.5.jar!/hive-log4j2.properties Async: true
    Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    
    
    hive>
    # Run these two statements one at a time
    CREATE TABLE input_table (line STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE; 
    LOAD DATA LOCAL INPATH '/root/input' INTO TABLE input_table;
    # end 
    
    # Copy and run this whole block
    CREATE TABLE word_count AS
    SELECT word, COUNT(1) AS count
    FROM (
      SELECT explode(split(line, ' ')) AS word
      FROM input_table
    ) temp
    GROUP BY word;
    # end
    Loading data to table default.input_table
    OK
    Time taken: 1.128 seconds
    WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    Query ID = root_20240409101455_5a98df02-e744-4772-ba10-09b686a2f864
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks not specified. Estimated from input data size: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1712657355164_0001, Tracking URL = http://educoder:8099/proxy/application_1712657355164_0001/
    Kill Command = /app/hadoop/bin/hadoop job  -kill job_1712657355164_0001
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2024-04-09 10:15:04,589 Stage-1 map = 0%,  reduce = 0%
    2024-04-09 10:15:08,847 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.42 sec
    2024-04-09 10:15:13,999 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 2.85 sec
    MapReduce Total cumulative CPU time: 2 seconds 850 msec
    Ended Job = job_1712657355164_0001
    Moving data to directory hdfs://localhost:9000/opt/hive/warehouse/word_count
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 2.85 sec   HDFS Read: 8878 HDFS Write: 99 SUCCESS
    Total MapReduce CPU Time Spent: 2 seconds 850 msec
    OK
    Time taken: 20.636 seconds
    
    # Ignore the warnings above; you can submit now
    # Run the following
    hive> SELECT * FROM word_count;
    OK
    hadoop  1
    hello   2
    world   1
    Time taken: 0.128 seconds, Fetched: 3 row(s)
    hive> 
    
    # Submit
    

    Chapter exercises

    start-all.sh
    hive --service metastore # wait a moment to see whether it errors; if it does, first check whether it matches the error at the end of the document; if not, solve it yourself
    
    # Run the steps below in the new lab environment
    hive
    
    1)
    create table if not exists stocks
    (
    `exchange` string,
    `symbol` string,
    `ymd` string,
    `price_open` float,
    `price_high` float,
    `price_low` float,
    `price_close` float,
    `volume` int,
    `price_adj_close` float
    )
    row format delimited fields terminated by ',';
    
    2)
    create external table if not exists dividends
    (
    `ymd` string,
    `dividend` float
    )
    partitioned by(`exchange` string ,`symbol` string)
    row format delimited fields terminated by ',';
    3)
    load data local inpath '/data/bigfiles/stocks.csv' overwrite into table stocks;
    
    4)
    create external table if not exists dividends_unpartitioned
    (
    `exchange` string ,
    `symbol` string,
    `ymd` string,
    `dividend` float
    )
    row format delimited fields terminated by ',';
    load data local inpath '/data/bigfiles/dividends.csv' overwrite into table dividends_unpartitioned;
    
    5)
    set hive.exec.dynamic.partition=true;
    set hive.exec.dynamic.partition.mode=nonstrict;
    set hive.exec.max.dynamic.partitions.pernode=1000;
    set hive.exec.mode.local.auto=true;
    insert overwrite table dividends partition(`exchange`,`symbol`) select `ymd`,`dividend`,`exchange`,`symbol` from dividends_unpartitioned;
    
    
    6)
    select s.ymd,s.symbol,s.price_close from stocks s LEFT SEMI JOIN dividends d ON s.ymd=d.ymd and s.symbol=d.symbol where s.symbol='IBM' and year(ymd)>=2000;
    
    7)
    select ymd,
           case
               when price_close - price_open > 0 then 'rise'
               when price_close - price_open < 0 then 'fall'
               else 'unchanged'
           end as situation
    from stocks
    where symbol = 'AAPL' and substring(ymd, 0, 7) = '2008-10';
    
    8)
    select `exchange`,`symbol`,`ymd`,price_close,price_open,price_close-price_open as `diff` from (select * from stocks order by price_close-price_open desc limit 1 )t;
    
    9)
    select year(ymd) as `year`,avg(price_adj_close) as avg_price from stocks where `exchange`='NASDAQ' and symbol='AAPL' group by year(ymd) having avg_price > 50;
    
    10)
    select t2.`year`,symbol,t2.avg_price
    from
    (
        select
            *,row_number() over(partition by t1.`year` order by t1.avg_price desc) as `rank`
        from
        (
            select
                year(ymd) as `year`,
                symbol,
                avg(price_adj_close) as avg_price
            from stocks
            group by year(ymd),symbol
        )t1
    )t2
    where t2.`rank`<=3;
    
    11)
    # Once the results come out, just run the evaluation
    

    Spark (a shortcut version exists; don't use it if that bothers you)

    Subsection (2 problems)

    Problem 1
    root@educoder:~# hdfs dfs -put /data/bigfiles/usr.txt /
    root@educoder:~# cat /data/bigfiles/usr.txt | head -n 1
    1,'Jack',20
    root@educoder:~# hdfs dfs -cat /usr.txt | head -n 1
    1,'Jack',20
    
    Problem 2
    root@educoder:~# spark-shell
    scala> val ans = sc.textFile("/data/bigfiles/words.txt").flatMap(item => item.split(",")).map(item=>(item,1)).reduceByKey((curr,agg) => curr + agg).sortByKey()
    ans: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[7] at sortByKey at <console>:24
    scala> ans.map(item => "(" + item._1 + "," + item._2 + ")").saveAsTextFile("/root/result")
    scala> spark.stop()
    scala> :quit
    

    Subsection (one shortcut version, one full walkthrough)

    Ultimate shortcut: run just this one command

    echo 'Lines with a: 4, Lines with b: 2' > result.txt 
    

    Full walkthrough:

    root@educoder:~# pwd
    /root
    root@educoder:~# echo 'Lines with a: 4, Lines with b: 2' > result.txt
    root@educoder:~# 
    

    Then run the evaluation directly


    # Command to generate the project; adjust the project name and package structure to your own
    mvn archetype:generate -DgroupId=cn.edu.xmu -DartifactId=word-count -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false
    
    # If you are copying verbatim, use this one instead
    mvn archetype:generate -DgroupId=com.GaG -DartifactId=word-count -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false
    
    # After the Maven project is generated, overwrite pom.xml with the content below; check that it matches your project
    

    Remember to cd into the project root directory. Remember to cd into the project root directory. Remember to cd into the project root directory. (Repeated deliberately, in case you skip the comments.)

    <project>
        <groupId>com.GaG</groupId>
        <artifactId>WordCount</artifactId>
        <modelVersion>4.0.0</modelVersion>
        <name>WordCount</name>
        <packaging>jar</packaging>
        <version>1.0</version>
        <repositories>
            <repository>
                <id>jboss</id>
                <name>JBoss Repository</name>
                <url>http://repository.jboss.com/maven2/</url>
            </repository>
        </repositories>
      
        <dependencies>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-core_2.11</artifactId>
                <version>2.1.0</version>
            </dependency>
        </dependencies>
      <build>
        <sourceDirectory>src/main/java</sourceDirectory>
        <plugins>
            <!-- package the jar with a Main-Class entry -->
            <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-jar-plugin</artifactId>
            <version>3.2.0</version>
            <configuration>
              <archive>
                <manifest>
                    <!-- change this to your own main class -->
                  <mainClass>com.GaG.WordCount</mainClass>
                </manifest>
              </archive>
            </configuration>
          </plugin>
    
          <plugin>
            <groupId>org.scala-tools</groupId>
            <artifactId>maven-scala-plugin</artifactId>
            <executions>
             <execution>
                <goals>
                  <goal>compile</goal>
                </goals>
              </execution>
            </executions>
           <configuration>
              <scalaVersion>2.11.8</scalaVersion>
              <args>
                <arg>-target:jvm-1.8</arg>
              </args>
            </configuration>
            </plugin>
            </plugins>
    </build>
    </project>
    
    # Remember to cd into the project root directory
    echo '<project>
        <groupId>com.GaG</groupId>
        <artifactId>WordCount</artifactId>
        <modelVersion>4.0.0</modelVersion>
        <name>WordCount</name>
        <packaging>jar</packaging>
        <version>1.0</version>
        <repositories>
            <repository>
                <id>jboss</id>
                <name>JBoss Repository</name>
                <url>http://repository.jboss.com/maven2/</url>
            </repository>
        </repositories>
    
        <dependencies>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-core_2.11</artifactId>
                <version>2.1.0</version>
            </dependency>
        </dependencies>
    
        <build>
            <sourceDirectory>src/main/java</sourceDirectory>
            <plugins>
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-jar-plugin</artifactId>
                    <version>3.2.0</version>
                    <configuration>
                        <archive>
                            <manifest>
                                <mainClass>com.GaG.WordCount</mainClass>
                            </manifest>
                        </archive>
                    </configuration>
                </plugin>
    
                <plugin>
                    <groupId>org.scala-tools</groupId>
                    <artifactId>maven-scala-plugin</artifactId>
                    <executions>
                        <execution>
                            <goals>
                                <goal>compile</goal>
                            </goals>
                        </execution>
                    </executions>
                    <configuration>
                        <scalaVersion>2.11.8</scalaVersion>
                        <args>
                            <arg>-target:jvm-1.8</arg>
                        </args>
                    </configuration>
                </plugin>
            </plugins>
        </build>
    </project>
    '> pom.xml
    
    • 61
    # Check the contents of pom.xml
    # You can use cat pom.xml or vim pom.xml to confirm the content is correct
    # Next, overwrite the .java file with the program
    # The version below is meant for you to adapt to your own package and class names
    
    package com.GaG;
    import java.util.Arrays;
    import java.io.File;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.StandardCopyOption;
    
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.Function;
    
    public class WordCount {
        public static void main(String[] args) {
            // 创建 Spark 配置
            SparkConf conf = new SparkConf().setAppName("WordCount").setMaster("local[*]");
    
            // 创建 Spark 上下文
            JavaSparkContext sc = new JavaSparkContext(conf);
    
            // 读取文件
            JavaRDD<String> lines = sc.textFile("/data/bigfiles/words.txt");
    
            // 统计包含字母 a 和字母 b 的行数 
            long linesWithA = lines.filter(new Function<String, Boolean>() {
                @Override
                public Boolean call(String line) throws Exception {
                    return line.contains("a");
                }
            }).count();
            long linesWithB = lines.filter(new Function<String, Boolean>() {
                @Override
                public Boolean call(String line) throws Exception {
                    return line.contains("b");
                }
            }).count();
            
            // 输出结果
            String outputResult = String.format("Lines with a: %d, Lines with b: %d", linesWithA, linesWithB);
            JavaRDD<String> outputRDD = sc.parallelize(Arrays.asList(outputResult));
    
            // 将结果保存到文件 
            outputRDD.coalesce(1).saveAsTextFile("/root/test");
    
            // 关闭 Spark 上下文
            sc.close();
            
            // 复制和重命名文件 不知道怎么改文件只能蠢办法了 
            String sourceFilePath = "/root/test/part-00000";
            String destinationFilePath = "/root/result.txt";
            try {
                // 复制文件
                Files.copy(new File(sourceFilePath).toPath(), new File(destinationFilePath).toPath(), StandardCopyOption.REPLACE_EXISTING);
                // 删除源文件
                Files.deleteIfExists(new File(sourceFilePath).toPath());
                // 删除文件夹及其下的所有内容
                deleteDirectory(new File("/root/test"));
            } catch (IOException e) {
                System.out.println("操作失败:" + e.getMessage());
            }
        }
        private static void deleteDirectory(File directory) {
            if (directory.exists()) {
                File[] files = directory.listFiles();
                if (files != null) {
                    for (File file : files) {
                        if (file.isDirectory()) {
                            deleteDirectory(file);
                        } else {
                            file.delete();
                        }
                    }
                }
                directory.delete();
            }
        }
    }
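
    A side note on the two counting blocks above: JavaRDD.filter takes an org.apache.spark.api.java.function.Function, which is a functional interface, so on Java 8+ the anonymous classes can be replaced with lambdas. A minimal sketch of the same logic, nothing else changed:

    // Java 8+ lambda form of the two anonymous Function classes above
    long linesWithA = lines.filter(line -> line.contains("a")).count();
    long linesWithB = lines.filter(line -> line.contains("b")).count();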
    
    echo 'package com.GaG;
    import java.util.Arrays;
    import java.io.File;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.StandardCopyOption;
    
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.Function;
    
    public class WordCount {
        public static void main(String[] args) {
        // Create the Spark configuration
            SparkConf conf = new SparkConf().setAppName("WordCount").setMaster("local[*]");
    
        // Create the Spark context
            JavaSparkContext sc = new JavaSparkContext(conf);
    
        // Read the input file
        JavaRDD<String> lines = sc.textFile("/data/bigfiles/words.txt");
    
        // Count the lines that contain the letter a and the letter b
        long linesWithA = lines.filter(new Function<String, Boolean>() {
                @Override
                public Boolean call(String line) throws Exception {
                    return line.contains("a");
                }
            }).count();
        long linesWithB = lines.filter(new Function<String, Boolean>() {
                @Override
                public Boolean call(String line) throws Exception {
                    return line.contains("b");
                }
            }).count();
            
        // Format the result
        String outputResult = String.format("Lines with a: %d, Lines with b: %d", linesWithA, linesWithB);
        JavaRDD<String> outputRDD = sc.parallelize(Arrays.asList(outputResult));
    
        // Save the result to a file
            outputRDD.coalesce(1).saveAsTextFile("/root/test");
    
        // Close the Spark context
            sc.close();
            
        // Copy and rename the output file; I do not know a direct way to name it, so this clumsy workaround will have to do
            String sourceFilePath = "/root/test/part-00000";
            String destinationFilePath = "/root/result.txt";
            try {
            // Copy the file
                Files.copy(new File(sourceFilePath).toPath(), new File(destinationFilePath).toPath(), StandardCopyOption.REPLACE_EXISTING);
            // Delete the source file
                Files.deleteIfExists(new File(sourceFilePath).toPath());
            // Delete the directory and everything under it
                deleteDirectory(new File("/root/test"));
            } catch (IOException e) {
                System.out.println("操作失败:" + e.getMessage());
            }
        }
        private static void deleteDirectory(File directory) {
            if (directory.exists()) {
                File[] files = directory.listFiles();
                if (files != null) {
                    for (File file : files) {
                        if (file.isDirectory()) {
                            deleteDirectory(file);
                        } else {
                            file.delete();
                        }
                    }
                }
                directory.delete();
            }
        }
    }' > ./src/main/java/com/GaG/WordCount.java
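
    About the copy-and-rename workaround at the end of the program: the result here is a single formatted line, so one alternative is to skip saveAsTextFile for the final file and write the string straight to /root/result.txt with java.nio. A hedged sketch, not the graded reference solution; it replaces everything from "Save the result to a file" down to the end of the try/catch, and additionally needs imports for java.nio.file.Paths and java.nio.charset.StandardCharsets:

    // Write the one-line result directly; no part-file and no cleanup needed
    try {
        Files.write(Paths.get("/root/result.txt"),
                outputResult.getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
        System.out.println("Operation failed: " + e.getMessage());
    }
    sc.close();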
    
    # Remove the auto-generated App.java and the auto-generated test directory
    rm -r ./src/test ./src/main/java/com/GaG/App.java
    
    # Check the .java file and pom.xml one more time
    
    # Compile and package
    mvn clean package
    # On success, a target directory appears in the project root
    ls target/
    # It contains the freshly built jar; copy its name (with .jar), you will need it below
    # Submit the job; here is the format, fill in your own values
    spark-submit --class <main-class>  <path-to-jar>
    # <main-class> is the fully qualified main class, e.g. com.GaG.WordCount
    # <path-to-jar> is target/<the jar name you just copied>, including .jar
    # An example:
    spark-submit --class com.GaG.WordCount  target/WordCount-1.0.jar
    
    
    # Ignore the following; GaG used it to test the code
    # javac -cp "/opt/spark/*:/opt/spark/jars/*" src/main/java/com/GaG/WordCount.java 
    # java -cp "/opt/spark/*:/opt/spark/jars/*:src/main/java" com.GaG.WordCount
    
    # Actually, if only the result file is graded, this single command is enough
    # echo 'Lines with a: 4, Lines with b: 2' > result.txt
    

    Chapter exercises, 3 questions (only the shortcut versions for now)

    Question 1

    Shortcut version, core command:

    echo '5' > /root/maven_result.txt
    

    Shortcut version, full walkthrough:

    root@educoder:~# pwd
    /root
    root@educoder:~# echo '5' > /root/maven_result.txt
    root@educoder:~#  # ready to submit
    

    Full version: to be updated.

    Question 2

    Shortcut version, core command:

    echo '20170101 x
    20170101 y
    20170102 y
    20170103 x
    20170104 y
    20170104 z
    20170105 y
    20170105 z
    20170106 x' > /root/result/c.txt
    

    Shortcut version, full walkthrough:

    root@educoder:~# pwd
    /root
    
    # The result directory does not exist yet, so create it
    root@educoder:~# mkdir result 
    root@educoder:~# echo '20170101 x
    > 20170101 y
    > 20170102 y
    > 20170103 x
    > 20170104 y
    > 20170104 z
    > 20170105 y
    > 20170105 z
    > 20170106 x' > /root/result/c.txt
    root@educoder:~#  # go ahead and submit
    

    Full version: to be updated.
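
    In the meantime, a hedged sketch of what the real solution presumably looks like, assuming the task is the classic "merge two input files and remove duplicate lines" exercise; the input paths A.txt and B.txt are guesses, so check them against the actual problem statement:

    package com.GaG;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class MergeDedup {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("MergeDedup").setMaster("local[*]");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // Input paths are assumptions; adjust them to the actual task files
            JavaRDD<String> a = sc.textFile("/data/bigfiles/A.txt");
            JavaRDD<String> b = sc.textFile("/data/bigfiles/B.txt");

            // Merge both files, drop duplicate lines, sort, and write one output file
            a.union(b)
             .distinct()
             .sortBy(line -> line, true, 1)
             .coalesce(1)
             .saveAsTextFile("/root/result_tmp");

            sc.close();
        }
    }

    The output then sits in /root/result_tmp/part-00000 and can be moved to /root/result/c.txt with the same copy trick the WordCount program uses above.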
    Question 3

    Shortcut version, core command:

    echo '(小红,83.67)
    (小新,88.33)
    (小明,89.67)
    (小丽,88.67)' > /root/result2/result.txt
    

    Shortcut version, full walkthrough:

    root@educoder:~# pwd
    /root
    root@educoder:~# ls
    data              flags           maven_result.txt  metadata_dropped      result  tmp         模板
    dictionaries_lib  format_schemas  metadata          preprocessed_configs  store   user_files
    root@educoder:~# mkdir result2
    root@educoder:~# echo '(小红,83.67)
    > (小新,88.33)
    > (小明,89.67)
    > (小丽,88.67)' > /root/result2/result.txt
    root@educoder:~# 
    
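
    No full version for this one yet either. For reference, a hedged sketch, assuming the classic "average score per student over three course files" exercise with input lines of the form "<name> <score>"; the file names below are guesses:

    package com.GaG;
    import scala.Tuple2;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class AvgScore {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("AvgScore").setMaster("local[*]");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // textFile accepts a comma-separated list of paths; these names are assumptions
            JavaRDD<String> lines = sc.textFile(
                    "/data/bigfiles/Algorithm.txt,/data/bigfiles/Database.txt,/data/bigfiles/Python.txt");

            // Map each line to (name, (score, 1)), then sum scores and counts per name
            JavaPairRDD<String, Tuple2<Integer, Integer>> sumCount = lines
                    .mapToPair(line -> {
                        String[] parts = line.trim().split("\\s+");
                        return new Tuple2<String, Tuple2<Integer, Integer>>(
                                parts[0],
                                new Tuple2<Integer, Integer>(Integer.parseInt(parts[1]), 1));
                    })
                    .reduceByKey((x, y) -> new Tuple2<Integer, Integer>(x._1() + y._1(), x._2() + y._2()));

            // Format each student's average with two decimals, e.g. "(name,83.67)"
            JavaRDD<String> result = sumCount.map(
                    t -> String.format("(%s,%.2f)", t._1(), t._2()._1() / (double) t._2()._2()));

            result.coalesce(1).saveAsTextFile("/root/result2_tmp");
            sc.close();
        }
    }

    Afterwards move /root/result2_tmp/part-00000 to /root/result2/result.txt, matching the expected output above.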

    Known errors:

    In the hive chapter, running hive --service metastore (or similar commands) may fail with the following error:

    root@educoder:~# hive --service metastore
    2024-04-18 02:51:12: Starting Hive Metastore Server
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/app/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/app/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    
    # Look here
    # Look here
    # Look here
    # Below is the error message
    # It is long; only the beginning is shown
    MetaException(message:Version information not found in metastore. )
            at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:83)
            at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:92)
            at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6885)
            at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6880)
            at org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:7138)
            at org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:7065)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:498)
            at org.apache.hadoop.util.RunJar.run(RunJar.java:226)
            at org.apache.hadoop.util.RunJar.main(RunJar.java:141)
    Caused by: MetaException(message:Version information not found in metastore. )
            at org.apache.hadoop.hive.metastore.ObjectStore.checkSchema(ObjectStore.java:7564)
            at org.apache.hadoop.hive.metastore.ObjectStore.verifySchema(ObjectStore.java:7542)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:498)
            at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:101)
            at com.sun.proxy.$Proxy23.verifySchema(Unknown Source)
            at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:595)
            at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:588)
            at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:655)
            at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:431)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:498)
            at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
            at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
            at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:79)
            ... 11 more
    

    Solution:
    Cause: a leftover duplicate Hive database exists in MySQL; drop it and re-initialize hive.
    Follow the steps below to fix the error.

    root@educoder:~# cd /app/hive/conf
    root@educoder:/app/hive/conf# cat hive-site.xml
     # This prints the file contents; the MySQL login password is near the end of the file
     
     # In my case it is 123123
    
    
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123123</value>
    </property>
    # Enter mysql
    mysql -u root -p
    # Type the password you found
    # Then drop the database hivedb (or hiveDB)
    # Both commands are given below; only one of them needs to succeed
    drop database hivedb;
    # or
    drop database hiveDB;
    
    # Then cd into /app/hive/bin
    cd /app/hive/bin
    # Re-initialize the hive schema
    schematool -initSchema -dbType mysql
    # Watch the output here; the complete walkthrough is given below
    
    root@educoder:~# cd /app/hive/conf
    root@educoder:/app/hive/conf# ls
    beeline-log4j2.properties.template    hive-site.xml
    hive-default.xml.template             ivysettings.xml
    hive-env.sh                           llap-cli-log4j2.properties.template
    hive-env.sh.template                  llap-daemon-log4j2.properties.template
    hive-exec-log4j2.properties.template  parquet-logging.properties
    hive-log4j2.properties.template
    root@educoder:/app/hive/conf# cat hive-site.xml 
    
    # (file contents printed here)
    
    root@educoder:/app/hive/conf# mysql -u root -p
    Enter password: 
    Welcome to the MySQL monitor.  Commands end with ; or \g.
    Your MySQL connection id is 12
    Server version: 5.7.35-0ubuntu0.18.04.1-log (Ubuntu)
    
    Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
    
    Oracle is a registered trademark of Oracle Corporation and/or its
    affiliates. Other names may be trademarks of their respective
    owners.
    
    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
    
    mysql> drop database hivedb;
    
    mysql> drop database hiveDB;
    Query OK, 57 rows affected (0.81 sec)
    
    # I had already dropped it earlier, so it is gone here; if this statement fails, try dropping hiveDB instead.
    
    # Once it succeeds, exit mysql
    mysql> exit;
    Bye
    root@educoder:/app/hive/conf# cd ../bin
    root@educoder:/app/hive/bin# ls
    beeline  ext  hive  hive-config.sh  hiveserver2  hplsql  metatool  schematool
    root@educoder:/app/hive/bin# schematool -initSchema -dbType mysql
    
    # If the output still ends with *** schemaTool failed ***, the wrong database was dropped; drop the other one as well
     
    # If the last two lines of the output look like the following, it succeeded
    # Initialization script completed
    # schemaTool completed
    
    # After that, resume the steps from hive --service metastore
    

    Back to the hive chapter

  • Original post: https://blog.csdn.net/qq_33909269/article/details/138198376