• 头歌 (EduCoder) platform: Big Data Technology lab exercises


    Solve problems on your own; this document only records my own progress through the course.


    Big Data labs: please read the notes first

    Markdown readers can Ctrl + left-click to jump to the Notes; PDF readers can click the link directly


    Notes (must read)

    • Always read the comments!

    • A few problems use shortcut ("skip-the-work") solutions (declared in the title); don't use them if you object

    • Start the services each problem depends on yourself

    • If code fails while running, debug it yourself, or release the environment and start over

    • Self-test your code before submitting for evaluation (see the connectivity check sketch at the end of these notes)

    • When starting Hadoop services, prefer the start-all.sh command; in particular, the MapReduce chapter only works with start-all.sh, otherwise the jobs produce no results

    • For some table-creation statements, look carefully at exactly what needs to be copied; some statements span several lines, so copy and use them precisely

    • To find a problem, use the table of contents provided by your reader

    • The Hive chapter lists errors that may come up, with the fixes that worked when I hit the same ones

    • Start the Hadoop, ZooKeeper, and HBase services

      zkServer.sh start
      start-dfs.sh
      start-hbase.sh
      
    • Start the Hadoop and Hive services; prefer start-all.sh to start Hadoop

      start-all.sh
      hive --service metastore   # if this hangs at lines starting with "SLF4J", interrupt it (Ctrl+C) and then run the next command
      hive --service hiveserver2 # this one hangs too; interrupt it -- as long as the service came up, treat it as started. Commands found via web search; skip them if you think they're wrong
      
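    Before running any evaluation, it helps to confirm HDFS is actually reachable. Below is a minimal self-check sketch of my own (not part of any exercise), assuming the same fs.defaultFS address, hdfs://localhost:9000, that the exercises below use:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsCheck {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Assumed address; adjust if your sandbox differs.
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            try (FileSystem fs = FileSystem.get(conf)) {
                // Throws if the NameNode is down; "/" always exists otherwise.
                System.out.println("HDFS reachable: " + fs.exists(new Path("/")));
            }
        }
    }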

    Overview of Big Data Technology

    Big Data Applications

    Multiple-choice answers: D D D ABCD BCD


    Installing and Using Linux

    The Linux Operating System

    First Steps with Linux
    #!/bin/bash
    
    #Write the commands that complete the task below
    #*********begin*********#
    cd /
    ls -a
    #********* end *********#
    
    
    Common Linux Commands
    #!/bin/bash
    
    #Write the commands that complete the task below
    #*********begin*********#
    touch newfile
    mkdir newdir
    cp newfile newfileCpy
    mv newfileCpy newdir
    #********* end *********#
    
    
    Looking Up Command Help in Linux
    #!/bin/bash
    
    #Write the commands that complete the task below
    #*********begin*********#
    man fopen
    #********* end *********#
    
    

    Installing and Using Hadoop

    Chapter Quiz

    root@educoder:~# start-dfs.sh
    Starting namenodes on [localhost]
    localhost: starting namenode, logging to /app/hadoop/logs/hadoop-root-namenode-educoder.out
    127.0.0.1: starting datanode, logging to /app/hadoop/logs/hadoop-root-datanode-educoder.out
    Starting secondary namenodes [0.0.0.0]
    0.0.0.0: starting secondarynamenode, logging to /app/hadoop/logs/hadoop-root-secondarynamenode-educoder.out
    root@educoder:~# hdfs dfs -mkdir -p /user/hadoop/test
    root@educoder:~# hdfs dfs -put ~/.bashrc /user/hadoop/test
    root@educoder:~# hdfs dfs -get /user/hadoop/test /app/hadoop
    
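    For reference, the same three steps can also be done through the Java FileSystem API that the next chapter uses everywhere. A sketch under the same assumptions (hdfs://localhost:9000, running as root, so ~/.bashrc is /root/.bashrc):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class QuizEquivalent {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            FileSystem fs = FileSystem.get(conf);
            // hdfs dfs -mkdir -p /user/hadoop/test
            fs.mkdirs(new Path("/user/hadoop/test"));
            // hdfs dfs -put ~/.bashrc /user/hadoop/test
            fs.copyFromLocalFile(new Path("/root/.bashrc"), new Path("/user/hadoop/test"));
            // hdfs dfs -get /user/hadoop/test /app/hadoop
            fs.copyToLocalFile(new Path("/user/hadoop/test"), new Path("/app/hadoop"));
            fs.close();
        }
    }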

    HDFS

    Section exercise

    Problem 1
    root@educoder:~# start-dfs.sh
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.io.IOUtils;
    
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    
    /*************** Begin ***************/
    
    public class HDFSUtils {
    
        public static void main(String[] args) {
    
            // HDFS connection address
            String hdfsUri = "hdfs://localhost:9000";
    
            // HDFS input file paths
            String[] inputFiles = {"/a.txt", "/b.txt", "/c.txt"};
    
            // Output path
            String outputFile = "/root/result/merged_file.txt";
    
            // Create the Hadoop configuration object
            Configuration conf = new Configuration();
            try {
                // Create the Hadoop FileSystem object
                FileSystem fs = FileSystem.get(new Path(hdfsUri).toUri(), conf);
    
                // Create the output file
                OutputStream outputStream = fs.create(new Path(outputFile));
    
                // Merge the contents of the input files
                for (String inputFile : inputFiles) {
                    mergeFileContents(fs, inputFile, outputStream);
                }
    
                // Close the output stream
                outputStream.close();
    
                // Close the Hadoop file system
                fs.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
    
        }
    
    /*************** End ***************/
    
        private static void mergeFileContents(FileSystem fs, String inputFile, OutputStream outputStream) throws IOException {
            // Open the input file
            Path inputPath = new Path(inputFile);
            InputStream inputStream = fs.open(inputPath);
    
            // Copy the file contents (false: keep the output stream open)
            IOUtils.copyBytes(inputStream, outputStream, 4096, false);
    
            // Write a newline between files
            outputStream.write(System.lineSeparator().getBytes());
    
            // Close the input stream
            inputStream.close();
        }
    }
    
    
    
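    To self-check the result, the merged HDFS file can be streamed back to the console with the same IOUtils helper. A small sketch of my own; /root/result/merged_file.txt is the HDFS output path of the program above:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import java.io.InputStream;

    public class PrintMerged {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(new Path("hdfs://localhost:9000").toUri(), conf);
            // Stream the merged file to stdout for a quick visual check.
            try (InputStream in = fs.open(new Path("/root/result/merged_file.txt"))) {
                IOUtils.copyBytes(in, System.out, 4096, false);
            }
            fs.close();
        }
    }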

    Chapter exercise

    root@educoder:~# start-dfs.sh
    
    Problem 1
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;
    import java.io.*;
    
    /*************** Begin ***************/
    
    public class HDFSApi {
    
        /**
         * Check whether a path exists
         */
        public static boolean test(Configuration conf, String path) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            return fs.exists(new Path(path));
        }
    
        /**
         * Copy a local file to the given HDFS path,
         * overwriting if the path already exists
         */
        public static void copyFromLocalFile(Configuration conf, String localFilePath, String remoteFilePath) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            Path localPath = new Path(localFilePath);
            Path remotePath = new Path(remoteFilePath);
            /* fs.copyFromLocalFile: the first argument says whether to delete the source, the second whether to overwrite */
            fs.copyFromLocalFile(false, true, localPath, remotePath);
            fs.close();
        }
    
        /**
         * Append local file contents to an HDFS file
         */
        public static void appendToFile(Configuration conf, String localFilePath, String remoteFilePath) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            Path remotePath = new Path(remoteFilePath);
            /* Open an input stream on the local file */
            FileInputStream in = new FileInputStream(localFilePath);
            /* Open an output stream whose writes are appended to the end of the HDFS file */
            FSDataOutputStream out = fs.append(remotePath);
            /* Copy the bytes across */
            byte[] buffer = new byte[4096];
            int bytesRead = 0;
            while ((bytesRead = in.read(buffer)) > 0) {
                out.write(buffer, 0, bytesRead);
            }
            in.close();
            out.close();
            fs.close();
        }
    
        /**
         * Main
         */
        public static void main(String[] args) {
    
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            String localFilePath = "/root/test.txt";    // local path
            String remoteFilePath = "/test.txt";    // HDFS path
            String choice = "overwrite";    // overwrite if the file already exists ("append" would append instead)
    
            try {
                /* Check whether the file exists */
                boolean fileExists = false;
                if (HDFSApi.test(conf, remoteFilePath)) {
                    fileExists = true;
                    System.out.println(remoteFilePath + " 已存在.");
                } else {
                    System.out.println(remoteFilePath + " 不存在.");
                }
                /* Act accordingly */
                if (!fileExists) { // file absent: upload
                    HDFSApi.copyFromLocalFile(conf, localFilePath, remoteFilePath);
                    System.out.println(localFilePath + " 已上传至 " + remoteFilePath);
                } else if (choice.equals("overwrite")) {    // overwrite
                    HDFSApi.copyFromLocalFile(conf, localFilePath, remoteFilePath);
                    System.out.println(localFilePath + " 已覆盖 " + remoteFilePath);
                } else if (choice.equals("append")) {   // append
                    HDFSApi.appendToFile(conf, localFilePath, remoteFilePath);
                    System.out.println(localFilePath + " 已追加至 " + remoteFilePath);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
    /*************** End ***************/
    
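    One caveat on the append branch: on a single-DataNode sandbox, fs.append can fail with a "Failed to replace a bad datanode" error, because the default pipeline-recovery policy expects spare DataNodes. If that happens, relaxing the client-side policy is a common workaround; a sketch (the two property names are standard HDFS client settings, not something from the exercise itself):

    import org.apache.hadoop.conf.Configuration;

    public class AppendFriendlyConf {
        // Returns a Configuration tweaked so appends work on a single-DataNode setup.
        public static Configuration create() {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            // With only one DataNode there is no replacement node, so never try to swap.
            conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "NEVER");
            conf.setBoolean("dfs.client.block.write.replace-datanode-on-failure.enable", false);
            return conf;
        }
    }
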
    Problem 2
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;
    import java.io.*;
    
    /*************** Begin ***************/
    
    public class HDFSApi {
    
        /**
         * Download a file to the local file system.
         * If the local path already exists, rename the download automatically.
         */
        public static void copyToLocal(Configuration conf, String remoteFilePath, String localFilePath) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            Path remotePath = new Path(remoteFilePath);
            File localFile = new File(localFilePath);
            /* If the name is taken, rename automatically (append _1, _2 ...) */
            if (localFile.exists()) {
                int count = 0;
                String baseName = localFile.getName();
                String parentDir = localFile.getParent();
                String newName = baseName;
                do {
                    count++;
                    newName = baseName + "_" + count;
                    localFile = new File(parentDir, newName);
                } while (localFile.exists());
            }
    
            // Download the file to the local path
            fs.copyToLocalFile(remotePath, new Path(localFile.getAbsolutePath()));
            fs.close();
        }
    
        /**
         * Main
         */
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            String localFilePath = "/usr/local/down_test/test.txt";   // local path
            String remoteFilePath = "/test.txt";   // HDFS path
    
            try {
                HDFSApi.copyToLocal(conf, remoteFilePath, localFilePath);
                System.out.println("下载完成");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
    
    /*************** End ***************/
    
    
    Problem 3
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;
    import java.io.*;
    
    /*************** Begin ***************/
    
    public class HDFSApi {
    
        /**
         * Print a file's contents
         */
        public static void cat(Configuration conf, String remoteFilePath) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            Path remotePath = new Path(remoteFilePath);
            FSDataInputStream in = fs.open(remotePath);
            BufferedReader reader = new BufferedReader(new InputStreamReader(in));
    
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
    
            reader.close();
            in.close();
            fs.close();
        }
    
        /**
         * Main
         */
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            String remoteFilePath = "/test.txt";    // HDFS path
    
            try {
                System.out.println("读取文件: " + remoteFilePath);
                HDFSApi.cat(conf, remoteFilePath);
                System.out.println("\n读取完成");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
    
    /*************** End ***************/
    
    Problem 4
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;
    import java.io.*;
    import java.text.SimpleDateFormat;
    import java.util.Date;
    
    /*************** Begin ***************/
    
    public class HDFSApi {
    
        /**
         * Show information about the files at the given path
         */
        public static void ls(Configuration conf, String remoteFilePath) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            Path remotePath = new Path(remoteFilePath);
            FileStatus[] fileStatuses = fs.listStatus(remotePath);
    
            for (FileStatus s : fileStatuses) {
                // File path
                String path = s.getPath().toString();
                // Permissions
                String permission = s.getPermission().toString();
                // Size
                long fileSize = s.getLen();
                // Modification time
                long modificationTime = s.getModificationTime();
                SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
                String modificationTimeStr = sdf.format(new Date(modificationTime));
    
                // Print the file information
                System.out.println("路径: " + path);
                System.out.println("权限: " + permission);
                System.out.println("时间: " + modificationTimeStr);
                System.out.println("大小: " + fileSize);
            }
            fs.close();
        }
    
        /**
         * Main
         */
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            String remoteFilePath = "/";  // HDFS path
    
            try {
                HDFSApi.ls(conf, remoteFilePath);
                System.out.println("\n读取完成");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
    
    /*************** End ***************/
    
    
    Problem 5
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;
    import java.io.*;
    import java.text.SimpleDateFormat;
    
    /*************** Begin ***************/
    
    public class HDFSApi {
        /**
         * Show information about every file under a directory (recursively)
         */
        public static void lsDir(Configuration conf, String remoteDir) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            Path dirPath = new Path(remoteDir);
            listFiles(fs, dirPath);
            fs.close();
        }
    
        private static void listFiles(FileSystem fs, Path dirPath) throws IOException {
            FileStatus[] fileStatuses = fs.listStatus(dirPath);
            for (FileStatus status : fileStatuses) {
                if (status.isFile()) {
                    printFileInfo(status);
                } else if (status.isDirectory()) {
                    // Recurse into subdirectories
                    listFiles(fs, status.getPath());
                }
            }
        }
    
        private static void printFileInfo(FileStatus status) {
            // File path
            String path = status.getPath().toString();
    
            // Permissions
            String permission = status.getPermission().toString();
    
            // Size
            long fileSize = status.getLen();
    
            // Modification time
            long modificationTime = status.getModificationTime();
            SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
            String modificationTimeStr = sdf.format(modificationTime);
    
            // Print the file information
            System.out.println("路径: " + path);
            System.out.println("权限: " + permission);
            System.out.println("时间: " + modificationTimeStr);
            System.out.println("大小: " + fileSize);
            System.out.println();
        }
    
        /**
         * Main
         */
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            String remoteDir = "/test";    // HDFS path
    
            try {
                System.out.println("(递归)读取目录下所有文件的信息: " + remoteDir);
                HDFSApi.lsDir(conf, remoteDir);
                System.out.println("读取完成");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
    
    /*************** End ***************/
    
    
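    Manual recursion works; for what it's worth, FileSystem also ships a built-in recursive listing, fs.listFiles(path, true), which yields files only. A sketch of that alternative printing the same fields:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;
    import java.text.SimpleDateFormat;
    import java.util.Date;

    public class LsDirAlt {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            FileSystem fs = FileSystem.get(conf);
            SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
            // listFiles(path, true) walks the whole tree for us, files only.
            RemoteIterator<LocatedFileStatus> it = fs.listFiles(new Path("/test"), true);
            while (it.hasNext()) {
                LocatedFileStatus s = it.next();
                System.out.println("路径: " + s.getPath());
                System.out.println("权限: " + s.getPermission());
                System.out.println("时间: " + sdf.format(new Date(s.getModificationTime())));
                System.out.println("大小: " + s.getLen());
            }
            fs.close();
        }
    }
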
    Problem 6
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;
    import java.io.*;
    
    /*************** Begin ***************/
    
    public class HDFSApi {
    
        /**
         * Check whether a path exists
         */
        public static boolean test(Configuration conf, String path) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            return fs.exists(new Path(path));
        }
    
        /**
         * Create a directory
         */
        public static boolean mkdir(Configuration conf, String remoteDir) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            return fs.mkdirs(new Path(remoteDir));
        }
    
        /**
         * Create an empty file
         */
        public static void touchz(Configuration conf, String filePath) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            fs.create(new Path(filePath)).close();
        }
    
        /**
         * Delete a file
         */
        public static boolean rm(Configuration conf, String filePath) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            return fs.delete(new Path(filePath), false); // second argument: recursive delete?
        }
    
        /**
         * Main
         */
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            String filePath = "/test/create.txt";    // HDFS file path
            String remoteDir = "/test";    // HDFS directory path
    
            try {
                /* If the path exists, delete it; otherwise create it */
                if (HDFSApi.test(conf, filePath)) {
                    HDFSApi.rm(conf, filePath); // delete
                    System.out.println("删除路径: " + filePath);
                } else {
                    if (!HDFSApi.test(conf, remoteDir)) { // create the directory if it is missing
                        HDFSApi.mkdir(conf, remoteDir);
                        System.out.println("创建文件夹: " + remoteDir);
                    }
                    HDFSApi.touchz(conf, filePath);
                    System.out.println("创建路径: " + filePath);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
    
    /*************** End ***************/
    
    
    Problem 7
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;
    import java.io.*;
    
    /*************** Begin ***************/
    
    public class HDFSApi {
    
        /**
         * Check whether a path exists
         */
        public static boolean test(Configuration conf, String path) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            return fs.exists(new Path(path));
        }
    
        /**
         * Check whether a directory is empty
         * true: empty, false: not empty
         */
        public static boolean isDirEmpty(Configuration conf, String remoteDir) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            FileStatus[] fileStatuses = fs.listStatus(new Path(remoteDir));
            fs.close();
            return fileStatuses == null || fileStatuses.length == 0;
        }
    
        /**
         * Create a directory
         */
        public static boolean mkdir(Configuration conf, String remoteDir) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            boolean result = fs.mkdirs(new Path(remoteDir));
            fs.close();
            return result;
        }
    
        /**
         * Delete a directory
         */
        public static boolean rmDir(Configuration conf, String remoteDir) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            boolean result = fs.delete(new Path(remoteDir), true);
            fs.close();
            return result;
        }
    
        /**
         * Main
         */
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            String remoteDir = "/dirTest";
            Boolean forceDelete = false;  // force deletion even when non-empty?
    
            try {
                /* Create the directory if it doesn't exist; delete it if it does */
                if (!HDFSApi.test(conf, remoteDir)) {
                    HDFSApi.mkdir(conf, remoteDir); // create
                    System.out.println("创建目录: " + remoteDir);
                } else {
                    if (HDFSApi.isDirEmpty(conf, remoteDir) || forceDelete) { // empty, or forced
                        HDFSApi.rmDir(conf, remoteDir);
                        System.out.println("删除目录: " + remoteDir);
                    } else { // not empty
                        System.out.println("目录不为空,不删除: " + remoteDir);
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
    
    /*************** End ***************/
    
    
    Problem 8
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;
    import java.io.*;
    
    /*************** Begin ***************/
    
    public class HDFSApi {
    
        /**
         * Check whether a path exists
         */
        public static boolean test(Configuration conf, String path) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            return fs.exists(new Path(path));
        }
    
        /**
         * Append a string to the end of a file
         */
        public static void appendContentToFile(Configuration conf, String content, String remoteFilePath) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            Path path = new Path(remoteFilePath);
            if (!fs.exists(path)) {
                System.out.println("文件不存在: " + remoteFilePath);
                return;
            }
            FSDataOutputStream out = fs.append(path);
            BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(out));
            writer.write(content);
            writer.newLine();
            writer.close();
            fs.close();
            System.out.println("已追加内容到文件末尾: " + remoteFilePath);
        }
    
        /**
         * Main
         */
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            String remoteFilePath = "/insert.txt";    // HDFS file
            String content = "I love study big data"; // content to append
    
            try {
                /* Check whether the file exists */
                if (!HDFSApi.test(conf, remoteFilePath)) {
                    System.out.println("文件不存在: " + remoteFilePath);
                } else {
                    HDFSApi.appendContentToFile(conf, content, remoteFilePath);
                    System.out.println("已追加内容到文件末尾" + remoteFilePath);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
    
    /*************** End ***************/
    
    
    Problem 9
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;
    import java.io.*;
    
    /*************** Begin ***************/
    
    public class HDFSApi {
        /**
         * Delete a file
         */
        public static boolean rm(Configuration conf, String remoteFilePath) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            Path path = new Path(remoteFilePath);
            if (!fs.exists(path)) {
                System.out.println("文件不存在: " + remoteFilePath);
                return false;
            }
            boolean deleted = fs.delete(path, false);
            fs.close();
            if (deleted) {
                System.out.println("已删除文件: " + remoteFilePath);
            } else {
                System.out.println("删除文件失败: " + remoteFilePath);
            }
            return deleted;
        }
    
        /**
         * Main
         */
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            String remoteFilePath = "/delete.txt";    // HDFS file
    
            try {
                if (HDFSApi.rm(conf, remoteFilePath)) {
                    System.out.println("文件删除: " + remoteFilePath);
                } else {
                    System.out.println("操作失败(文件不存在或删除失败)");
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
    
    /*************** End ***************/
    
    
    Problem 10
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;
    import java.io.*;
    
    /*************** Begin ***************/
    
    public class HDFSApi {
        /**
         * Move a file
         */
        public static boolean mv(Configuration conf, String remoteFilePath, String remoteToFilePath) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            Path srcPath = new Path(remoteFilePath);
            Path destPath = new Path(remoteToFilePath);
    
            // Check that the source file exists
            if (!fs.exists(srcPath)) {
                System.out.println("源文件不存在: " + remoteFilePath);
                return false;
            }
    
            // Move (rename) the file
            boolean success = fs.rename(srcPath, destPath);
            fs.close();
    
            // Report the result
            if (success) {
                System.out.println("文件移动成功: " + remoteFilePath + " -> " + remoteToFilePath);
            } else {
                System.out.println("文件移动失败: " + remoteFilePath + " -> " + remoteToFilePath);
            }
            return success;
        }
    
        /**
         * Main
         */
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            String remoteFilePath = "/move.txt";    // source HDFS path
            String remoteToFilePath = "/moveDir/move.txt";    // destination HDFS path
    
            try {
                if (HDFSApi.mv(conf, remoteFilePath, remoteToFilePath)) {
                    System.out.println("将文件 " + remoteFilePath + " 移动到 " + remoteToFilePath);
                } else {
                    System.out.println("操作失败(源文件不存在或移动失败)");
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
    
    /*************** End ***************/
    
    

    HBase

    Section exercise (2 problems)

    Problem 1
    zkServer.sh start
    start-dfs.sh
    start-hbase.sh
    

    Ignore any warnings or prompts that appear.

    
    hbase shell
    
    create 'Student','S_No','S_Name','S_Sex','S_Age'
    put 'Student', '2015001', 'S_Name', 'Zhangsan'
    put 'Student', '2015001', 'S_Sex', 'male'
    put 'Student', '2015001', 'S_Age', '23'
    put 'Student', '2015002', 'S_Name', 'Lisi'
    put 'Student', '2015002', 'S_Sex', 'male'
    put 'Student', '2015002', 'S_Age', '24'
    put 'Student', '2015003', 'S_Name', 'Mary'
    put 'Student', '2015003', 'S_Sex', 'female'
    put 'Student', '2015003', 'S_Age', '22'
    
    create 'Course', 'C_No', 'C_Name', 'C_Credit'
    put 'Course', '123001', 'C_Name', 'Math'
    put 'Course', '123001', 'C_Credit', '2.0'
    put 'Course', '123002', 'C_Name', 'Computer Science'
    put 'Course', '123002', 'C_Credit', '5.0'
    put 'Course', '123003', 'C_Name', 'English'
    put 'Course', '123003', 'C_Credit', '3.0'
    
    create 'SC','SC_Sno','SC_Cno','SC_Score'
    put 'SC','sc001','SC_Sno','2015001'
    put 'SC','sc001','SC_Cno','123001'
    put 'SC','sc001','SC_Score','86'
    put 'SC','sc002','SC_Sno','2015001'
    put 'SC','sc002','SC_Cno','123003'
    put 'SC','sc002','SC_Score','69'
    put 'SC','sc003','SC_Sno','2015002'
    put 'SC','sc003','SC_Cno','123002'
    put 'SC','sc003','SC_Score','77'
    put 'SC','sc004','SC_Sno','2015002'
    put 'SC','sc004','SC_Cno','123003'
    put 'SC','sc004','SC_Score','99'
    put 'SC','sc005','SC_Sno','2015003'
    put 'SC','sc005','SC_Cno','123001'
    put 'SC','sc005','SC_Score','98'
    put 'SC','sc006','SC_Sno','2015003'
    put 'SC','sc006','SC_Cno','123002'
    put 'SC','sc006','SC_Score','95'
    
    Problem 2
    root@educoder:~# zkServer.sh start
    ZooKeeper JMX enabled by default
    Using config: /opt/zookeeper/bin/../conf/zoo.cfg
    Starting zookeeper ... STARTED
    root@educoder:~# start-dfs.sh
    Starting namenodes on [localhost]
    localhost: starting namenode, logging to /app/hadoop/logs/hadoop-root-namenode-educoder.out
    127.0.0.1: starting datanode, logging to /app/hadoop/logs/hadoop-root-datanode-educoder.out
    Starting secondary namenodes [0.0.0.0]
    0.0.0.0: starting secondarynamenode, logging to /app/hadoop/logs/hadoop-root-secondarynamenode-educoder.out
    root@educoder:~# start-hbase.sh
    running master, logging to /app/hbase/logs/hbase-root-master-educoder.out
    Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
    Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
    : running regionserver, logging to /app/hbase/logs/hbase-root-regionserver-educoder.out
    : Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
    : Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
    root@educoder:~# hbase shell
    
    HBase Shell
    Use "help" to get list of supported commands.
    Use "exit" to quit this interactive shell.
    Version 1.4.13, r38bf65a22b7e9320f07aeb27677e4533b9a77ef4, Sun Feb 23 02:06:36 PST 2020
    
    hbase(main):001:0> 
    hbase(main):002:0* create 'student','Sname','Ssex','Sage','Sdept','course'
    0 row(s) in 2.5030 seconds
    
    => Hbase::Table - student
    hbase(main):003:0> put 'student','95001','Sname','LiYing'
    0 row(s) in 0.0720 seconds
    
    hbase(main):004:0> 
    hbase(main):005:0* put 'student','95001','Ssex','male'
    0 row(s) in 0.0080 seconds
    
    hbase(main):006:0> put 'student','95001','Sage','22'
    0 row(s) in 0.0090 seconds
    
    hbase(main):007:0> put 'student','95001','Sdept','CS'
    0 row(s) in 0.0070 seconds
    
    hbase(main):008:0> put 'student','95001','course:math','80'
    0 row(s) in 0.0090 seconds
    
    hbase(main):009:0> delete 'student','95001','Ssex'
    0 row(s) in 0.0380 seconds
    
    

    Section exercise (5 problems)

    Problem 1
    zkServer.sh start
    start-dfs.sh
    start-hbase.sh
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
    import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
    import org.apache.hadoop.hbase.client.TableDescriptor;
    import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
    import org.apache.hadoop.hbase.util.Bytes;
    
    import java.io.IOException;
    
    /*************** Begin ***************/
    
    public class HBaseUtils {
    
        public static void main(String[] args) {
    
            // Create the HBase configuration
            Configuration config = HBaseConfiguration.create();
    
            // Open the HBase connection
            try (Connection connection = ConnectionFactory.createConnection(config)) {
                // Get an Admin client
                Admin admin = connection.getAdmin();
    
                // Table name and column family
                TableName tableName = TableName.valueOf("default:test");
                String familyName = "info";
    
                // If the table already exists, drop it first
                if (admin.tableExists(tableName)) {
                    admin.disableTable(tableName);
                    admin.deleteTable(tableName);
                }
    
                // Build the column family descriptor
                ColumnFamilyDescriptor columnFamilyDescriptor = ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes(familyName)).build();
    
                // Build the table descriptor
                TableDescriptor tableDescriptor = TableDescriptorBuilder.newBuilder(tableName).setColumnFamily(columnFamilyDescriptor).build();
    
                // Create the table
                admin.createTable(tableDescriptor);
    
                // Close the Admin client
                admin.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    
    /*************** End ***************/
    
    Problem 2
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;
    
    import java.io.IOException;
    
    /*************** Begin ***************/
    
    public class HBaseUtils {
    
        public static void main(String[] args) {
    
            // Create the HBase configuration
            Configuration config = HBaseConfiguration.create();
    
            // Open the HBase connection
            try (Connection connection = ConnectionFactory.createConnection(config)) {
                // Table name
                TableName tableName = TableName.valueOf("default:SC");
    
                // Get the table object
                try (Table table = connection.getTable(tableName)) {
                    // Insert the row
                    Put put = new Put(Bytes.toBytes("2015001"));
                    put.addColumn(Bytes.toBytes("SC_Sno"), Bytes.toBytes("id"), Bytes.toBytes("0001"));
                    put.addColumn(Bytes.toBytes("SC_Score"), Bytes.toBytes("Math"), Bytes.toBytes("96"));
                    put.addColumn(Bytes.toBytes("SC_Score"), Bytes.toBytes("ComputerScience"), Bytes.toBytes("95"));
                    put.addColumn(Bytes.toBytes("SC_Score"), Bytes.toBytes("English"), Bytes.toBytes("90"));
    
                    table.put(put);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
    
        }
    }
    
    /*************** End ***************/
    
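    A quick way to confirm the Put landed is a point Get on the same row key. This check is my own addition, not part of the graded answer; it assumes the row written above:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;

    public class VerifyPut {
        public static void main(String[] args) throws Exception {
            Configuration config = HBaseConfiguration.create();
            try (Connection connection = ConnectionFactory.createConnection(config);
                 Table table = connection.getTable(TableName.valueOf("default:SC"))) {
                // Fetch the row written above and print one score back.
                Get get = new Get(Bytes.toBytes("2015001"));
                Result result = table.get(get);
                byte[] math = result.getValue(Bytes.toBytes("SC_Score"), Bytes.toBytes("Math"));
                System.out.println("Math score: " + Bytes.toString(math));
            }
        }
    }
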
    Problem 3
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.filter.CompareFilter;
    import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
    import org.apache.hadoop.hbase.filter.BinaryComparator;
    import org.apache.hadoop.hbase.util.Bytes;
    import java.io.IOException;
    /*************** Begin ***************/
    public class HBaseUtils {
    
        public static void main(String[] args) throws IOException {
            Configuration configuration = HBaseConfiguration.create();
            configuration.set("hbase.zookeeper.quorum", "localhost");
            configuration.set("hbase.zookeeper.property.clientPort", "2181");
            Connection connection = ConnectionFactory.createConnection(configuration);
            TableName tableName = TableName.valueOf("default:SC");
            Table table = connection.getTable(tableName);
    
            // Match rows whose SC_Score:Math equals "96" (scores were stored as strings,
            // so compare against the string bytes, not the int 96)
            Scan scan = new Scan();
            SingleColumnValueFilter filter = new SingleColumnValueFilter(
                Bytes.toBytes("SC_Score"), Bytes.toBytes("Math"),
                CompareFilter.CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("96")));
            scan.setFilter(filter);
    
            // Delete the SC_Score:Math column of every matching row
            ResultScanner scanner = table.getScanner(scan);
            for (Result result : scanner) {
                Delete delete = new Delete(result.getRow());
                delete.addColumn(Bytes.toBytes("SC_Score"), Bytes.toBytes("Math"));
                table.delete(delete);
            }
            scanner.close();
            table.close();
            connection.close();
        }
    }
    
    /*************** End ***************/
    
    Problem 4
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;
    
    import java.io.IOException;
    
    /*************** Begin ***************/
    
    public class HBaseUtils {
    
        public static void main(String[] args) {
            try {
                // Set the ZooKeeper address
                Configuration config = HBaseConfiguration.create();
                config.set("hbase.zookeeper.quorum", "localhost");
                config.set("hbase.zookeeper.property.clientPort", "2181");
    
                // Open the HBase connection
                Connection connection = ConnectionFactory.createConnection(config);
    
                // Table name
                TableName tableName = TableName.valueOf("default:SC");
    
                // Get the table object
                Table table = connection.getTable(tableName);
    
                // Build a Put to update the row
                Put put = new Put(Bytes.toBytes("2015001"));
    
                // Column family, column, and new value
                put.addColumn(Bytes.toBytes("SC_Score"), Bytes.toBytes("ComputerScience"), Bytes.toBytes("92"));
    
                // Apply the update
                table.put(put);
    
                // Close the table and connection
                table.close();
                connection.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    
    /*************** End ***************/
    
    Problem 5
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;
    
    import java.io.IOException;
    
    /*************** Begin ***************/
    
    public class HBaseUtils {
    
        public static void main(String[] args) {
    
            try {
                // Create the HBase configuration
                Configuration config = HBaseConfiguration.create();
    
                // Open the HBase connection
                Connection connection = ConnectionFactory.createConnection(config);
    
                // Table name
                TableName tableName = TableName.valueOf("default:SC");
    
                // Get the table object
                Table table = connection.getTable(tableName);
    
                // Build the delete request for row key "2015001"
                Delete delete = new Delete(Bytes.toBytes("2015001"));
    
                // Apply the delete
                table.delete(delete);
    
                // Close the table and connection
                table.close();
                connection.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    
    /*************** End ***************/
    
    

    Chapter exercise

    Start the environment

    zkServer.sh start
    start-dfs.sh
    start-hbase.sh
    
    Problem 1
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.TableDescriptor;
    import java.io.IOException;
    import java.util.List;
    
    /*************** Begin ***************/
    public class HBaseUtils {
        public static void main(String[] args) {
            Configuration config = HBaseConfiguration.create();
            config.set("hbase.zookeeper.quorum", "localhost");
            config.set("hbase.zookeeper.property.clientPort", "2181");
    
            try (Connection connection = ConnectionFactory.createConnection(config)) {
                Admin admin = connection.getAdmin();
                List<TableDescriptor> tables = admin.listTableDescriptors();
                System.out.print("Table: ");
                for (TableDescriptor table : tables) {
                    TableName tableName = table.getTableName();
                    System.out.println(tableName.getNameAsString());
                }
                admin.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    /*************** End ***************/
    
    Problem 2
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;
    
    import java.io.IOException;
    
    public class HBaseUtils {
    
        public static void main(String[] args) {
            // Create the HBase configuration
            Configuration config = HBaseConfiguration.create();
    
            // Set the ZooKeeper address
            config.set("hbase.zookeeper.quorum", "localhost");
            config.set("hbase.zookeeper.property.clientPort", "2181");
    
            try {
                // Open the HBase connection
                Connection connection = ConnectionFactory.createConnection(config);
    
                // Table to query
                TableName tableName = TableName.valueOf("default:student");
    
                // Get the table object
                Table table = connection.getTable(tableName);
    
                // Create a scanner
                Scan scan = new Scan();
    
                // Get an iterator over the scan results
                ResultScanner scanner = table.getScanner(scan);
    
                // Process each row
                for (Result result : scanner) {
                    // Row key
                    byte[] rowKeyBytes = result.getRow();
                    String rowKey = Bytes.toString(rowKeyBytes);
                    System.out.println("RowKey: " + rowKey);
    
                    // Value of column family "info", qualifier "name"
                    byte[] nameBytes = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
                    String name = Bytes.toString(nameBytes);
                    System.out.println("info:name: " + name);
    
                    // Value of column family "info", qualifier "sex"
                    byte[] sexBytes = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("sex"));
                    String sex = Bytes.toString(sexBytes);
                    System.out.println("info:sex: " + sex);
    
                    // Value of column family "info", qualifier "age"
                    byte[] ageBytes = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("age"));
                    String age = Bytes.toString(ageBytes);
                    System.out.println("info:age: " + age);
                }
    
                // Close resources
                scanner.close();
                table.close();
                connection.close();
    
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    
    
    Problem 3
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;
    
    import java.io.IOException;
    
    public class HBaseUtils {
    
        public static void main(String[] args) {
            // Create the HBase configuration
            Configuration config = HBaseConfiguration.create();
    
            // Set the ZooKeeper address
            config.set("hbase.zookeeper.quorum", "localhost");
            config.set("hbase.zookeeper.property.clientPort", "2181");
    
            try {
                // Open the HBase connection
                Connection connection = ConnectionFactory.createConnection(config);
    
                // Table to modify
                TableName tableName = TableName.valueOf("default:student");
    
                // Insert a new row
                Put put = new Put(Bytes.toBytes("4"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Mary"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("sex"), Bytes.toBytes("female"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("age"), Bytes.toBytes("21"));
    
                Table table = connection.getTable(tableName);
                table.put(put);
    
                // Delete every record with row key "1"
                Delete delete = new Delete(Bytes.toBytes("1"));
                table.delete(delete);
    
                // Close resources
                table.close();
                connection.close();
    
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    
    
    Problem 4
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;
    
    import java.io.IOException;
    
    public class HBaseUtils {
    
        public static void main(String[] args) {
            // Create the HBase configuration
            Configuration config = HBaseConfiguration.create();
    
            // Set the ZooKeeper address
            config.set("hbase.zookeeper.quorum", "localhost");
            config.set("hbase.zookeeper.property.clientPort", "2181");
    
            try {
                // Open the HBase connection
                Connection connection = ConnectionFactory.createConnection(config);
    
                // Table to clear
                TableName tableName = TableName.valueOf("default:student");
    
                // Get the table object
                Table table = connection.getTable(tableName);
    
                // Scan the whole table
                Scan scan = new Scan();
    
                // Get the scan results
                ResultScanner scanner = table.getScanner(scan);
    
                // Delete every row
                for (Result result : scanner) {
                    byte[] rowKey = result.getRow();
                    Delete delete = new Delete(rowKey);
                    table.delete(delete);
                }
    
                // Close the scanner, table, and connection
                scanner.close();
                table.close();
                connection.close();
    
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    
    
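    Deleting row by row works but is slow on large tables; the Admin API also offers truncateTable, which drops and recreates the table in one call. Using it here is my own suggestion, and the table must be disabled first. A sketch:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class TruncateStudent {
        public static void main(String[] args) throws Exception {
            Configuration config = HBaseConfiguration.create();
            config.set("hbase.zookeeper.quorum", "localhost");
            config.set("hbase.zookeeper.property.clientPort", "2181");
            try (Connection connection = ConnectionFactory.createConnection(config);
                 Admin admin = connection.getAdmin()) {
                TableName tableName = TableName.valueOf("default:student");
                // truncateTable requires the table to be disabled first.
                admin.disableTable(tableName);
                admin.truncateTable(tableName, false); // false: don't preserve region splits
            }
        }
    }
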
    Problem 5
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;
    
    import java.io.IOException;
    
    public class HBaseUtils {
    
        public static void main(String[] args) {
            // Create the HBase configuration
            Configuration config = HBaseConfiguration.create();
    
            // Set the ZooKeeper address
            config.set("hbase.zookeeper.quorum", "localhost");
            config.set("hbase.zookeeper.property.clientPort", "2181");
    
            try {
                // Open the HBase connection
                Connection connection = ConnectionFactory.createConnection(config);
    
                // Table to count
                TableName tableName = TableName.valueOf("default:student");
    
                // Get the table object
                Table table = connection.getTable(tableName);
    
                // Scan the whole table
                Scan scan = new Scan();
    
                // Get the scan results
                ResultScanner scanner = table.getScanner(scan);
    
                // Count the rows
                int rowCount = 0;
                for (Result result : scanner) {
                    rowCount++;
                }
    
                // Print the row count
                System.out.println("default:student 表的行数为:" + rowCount);
    
                // Close the scanner, table, and connection
                scanner.close();
                table.close();
                connection.close();
    
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    
    
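    Counting by iterating a plain scan transfers every cell of every row to the client. A hedged sketch (assuming the stock HBase client and filter API; the class name is invented) that ships only the first cell of each row and fetches rows in larger batches:

    import java.io.IOException;
    
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
    
    public class HBaseRowCount {
        // Count rows while transferring only one cell per row to the client.
        static long countRows(Table table) throws IOException {
            Scan scan = new Scan();
            scan.setFilter(new FirstKeyOnlyFilter()); // return just the first cell of each row
            scan.setCaching(500);                     // fetch rows in larger batches per RPC
            long rows = 0;
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result ignored : scanner) {
                    rows++;
                }
            }
            return rows;
        }
    }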

    NoSQL

    Subsection (4 problems)

    Problem 1
    root@educoder:~# redis-cli
    127.0.0.1:6379> hset student.zhangsan English 69
    (integer) 1
    127.0.0.1:6379> hset student.zhangsan Math 86
    (integer) 1
    127.0.0.1:6379> hset student.zhangsan Computer 77
    (integer) 1
    127.0.0.1:6379> hset student.lisi English 55
    (integer) 1
    127.0.0.1:6379> hset student.lisi Math 100
    (integer) 1
    127.0.0.1:6379> hset student.lisi Computer 88
    (integer) 1
    127.0.0.1:6379> hgetall student.zhangsan
    1) "English"
    2) "69"
    3) "Math"
    4) "86"
    5) "Computer"
    6) "77"
    127.0.0.1:6379> hgetall student.lisi
    1) "English"
    2) "55"
    3) "Math"
    4) "100"
    5) "Computer"
    6) "88"
    127.0.0.1:6379> hget student.zhangsan Computer
    "77"
    127.0.0.1:6379> hset student.lisi Math 95
    (integer) 0
    
    Problem 2
    root@educoder:~# redis-cli
    127.0.0.1:6379> hmset course.1 cname Database credit 4
    OK
    127.0.0.1:6379> hmset course.2 cname Math credit 2
    OK
    127.0.0.1:6379> hmset course.3 cname InformationSystem credit 4
    OK
    127.0.0.1:6379> hmset course.4 cname OperatingSystem credit 3
    OK
    127.0.0.1:6379> hmset course.5 cname DataStructure credit 4
    OK
    127.0.0.1:6379> hmset course.6 cname DataProcessing credit 2
    OK
    127.0.0.1:6379> hmset course.7 cname PASCAL credit 4
    OK
    127.0.0.1:6379> hmset course.7 credit 2
    OK
    127.0.0.1:6379> del course.5
    (integer) 1
    127.0.0.1:6379> hgetall course.1
    1) "cname"
    2) "Database"
    3) "credit"
    4) "4"
    127.0.0.1:6379> hgetall course.2
    1) "cname"
    2) "Math"
    3) "credit"
    4) "2"
    127.0.0.1:6379> hgetall course.3
    1) "cname"
    2) "InformationSystem"
    3) "credit"
    4) "4"
    127.0.0.1:6379> hgetall course.4
    1) "cname"
    2) "OperatingSystem"
    3) "credit"
    4) "3"
    127.0.0.1:6379> hgetall course.6
    1) "cname"
    2) "DataProcessing"
    3) "credit"
    4) "2"
    127.0.0.1:6379> hgetall course.7
    1) "cname"
    2) "PASCAL"
    3) "credit"
    4) "2"
    127.0.0.1:6379> 
    
    Problem 3
    import redis.clients.jedis.Jedis;
    
    public class RedisUtils {
    
        public static void main(String[] args) {
            // Connect to the local Redis server
            Jedis jedis = new Jedis("localhost");
            // Insert scofield's three scores into the hash student.scofield
            jedis.hset("student.scofield", "English", "45");
            jedis.hset("student.scofield", "Math", "89");
            jedis.hset("student.scofield", "Computer", "100");
            // Release the connection
            jedis.close();
        }
    }
    
    
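    To confirm the three fields landed in the hash, here is a small read-back sketch (assuming the same local Redis and Jedis client as above; the class name is invented):

    import java.util.Map;
    
    import redis.clients.jedis.Jedis;
    
    public class RedisCheck {
        public static void main(String[] args) {
            Jedis jedis = new Jedis("localhost");
            try {
                // hgetAll returns every field/value pair of the hash
                Map<String, String> scores = jedis.hgetAll("student.scofield");
                scores.forEach((subject, score) -> System.out.println(subject + " = " + score));
            } finally {
                jedis.close();
            }
        }
    }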
    Problem 4
    import redis.clients.jedis.Jedis;
    
    /*************** Begin ***************/
    
    public class RedisUtils {
    
        public static void main(String[] args) {
    
            // Create the Jedis object and connect to the Redis server
            Jedis jedis = new Jedis("localhost");
    
            try {
                // Fetch lisi's English score
                String englishScore = jedis.hget("student.lisi", "English");
    
                // Print it (the Chinese output string is what the grader expects)
                System.out.println("lisi 的英语成绩是:" + englishScore);
                // Hard-coded shortcut, kept commented out:
                // System.out.println("lisi 的英语成绩是:55" );
            } finally {
                // Close the connection
                if (jedis != null) {
                    jedis.close();
                }
            }
    
        }
    
    }
    
    /*************** End ***************/
    

    Subsection (3 problems)

    Problem 1
    root@educoder:~# cd /usr/local/mongodb/bin
    root@educoder:/usr/local/mongodb/bin# mongod -f ./mongodb.conf 
    about to fork child process, waiting until server is ready for connections.
    forked process: 337
    child process started successfully, parent exiting
    root@educoder:/usr/local/mongodb/bin# mongo
    MongoDB shell version v4.0.6
    connecting to: mongodb://127.0.0.1:27017/?gssapiServiceName=mongodb
    Implicit session: session { "id" : UUID("5ef17d82-a5dc-4ae1-863b-65b91b31c447") }
    MongoDB server version: 4.0.6
    Server has startup warnings: 
    2024-04-09T08:03:27.977+0000 I CONTROL  [initandlisten] ** WARNING: You are running this process as the root user, which is not recommended.
    2024-04-09T08:03:27.977+0000 I CONTROL  [initandlisten] 
    2024-04-09T08:03:27.977+0000 I CONTROL  [initandlisten] 
    2024-04-09T08:03:27.977+0000 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
    2024-04-09T08:03:27.977+0000 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
    2024-04-09T08:03:27.977+0000 I CONTROL  [initandlisten] 
    # Next step: switch to the school database (note the > prompt)
    > use school 
    switched to db school
    > db.student.insertMany([
        {
            "name": "zhangsan",
            "scores": {
                "English": 69.0,
                "Math": 86.0,
                "Computer": 77.0
            }
        },
        {
            "name": "lisi",
            "score": {
                "English": 55.0,
                "Math": 100.0,
                "Computer": 88.0
            }
        }
    ])
    # This is the server's response; don't copy it
    {
            "acknowledged" : true,
            "insertedIds" : [
                    ObjectId("6614ff91bb11d51ac3c2b725"),
                    ObjectId("6614ff91bb11d51ac3c2b726")
            ]
    }
    # Continue from here (note: zhangsan was inserted with the field name "scores" while lisi uses "score"; the later Java exercises read "score")
    > db.student.find()
    { "_id" : ObjectId("6614ff91bb11d51ac3c2b725"), "name" : "zhangsan", "scores" : { "English" : 69, "Math" : 86, "Computer" : 77 } }
    { "_id" : ObjectId("6614ff91bb11d51ac3c2b726"), "name" : "lisi", "score" : { "English" : 55, "Math" : 100, "Computer" : 88 } }
    > db.student.find({ "name": "zhangsan" }, { "scores": 1, "_id": 0 })
    { "scores" : { "English" : 69, "Math" : 86, "Computer" : 77 } }
    > db.student.updateOne({ "name": "lisi" }, { "$set": { "score.Math": 95 } })
    { "acknowledged" : true, "matchedCount" : 1, "modifiedCount" : 1 }
    # The query below just verifies the update
    > db.student.find({ "name": "lisi" })
    { "_id" : ObjectId("661500523e303f2f596106bd"), "name" : "lisi", "score" : { "English" : 55, "Math" : 95, "Computer" : 88 } }
    
    Problem 2
    import com.mongodb.MongoClient;
    import com.mongodb.MongoClientURI;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.MongoDatabase;
    import org.bson.Document;
    
    /*************** Begin ***************/
    
    public class MongoDBUtils {
    
        public static void main(String[] args) {
    
            // Connect to the MongoDB server
            MongoClientURI uri = new MongoClientURI("mongodb://localhost:27017");
            MongoClient mongoClient = new MongoClient(uri);
    
            // Get the database
            MongoDatabase database = mongoClient.getDatabase("school");
    
            // Get the collection
            MongoCollection<Document> collection = database.getCollection("student");
    
            // Build the document
            Document document = new Document("name", "scofield")
                    .append("score", new Document("English", 45)
                            .append("Math", 89)
                            .append("Computer", 100));
    
            // Insert it
            collection.insertOne(document);
            System.out.println("Document inserted successfully!");
    
            // Close the connection
            mongoClient.close();
        }
        
    }
    
    /*************** End ***************/
    
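    A quick read-back sketch to confirm the insert (assuming the same legacy MongoDB Java driver used above; the class name is invented):

    import com.mongodb.MongoClient;
    import com.mongodb.client.MongoCollection;
    import org.bson.Document;
    
    public class MongoCheck {
        public static void main(String[] args) {
            MongoClient mongoClient = new MongoClient("localhost", 27017);
            MongoCollection<Document> collection =
                    mongoClient.getDatabase("school").getCollection("student");
            // find(filter).first() returns the first matching document, or null
            Document doc = collection.find(new Document("name", "scofield")).first();
            System.out.println(doc == null ? "not found" : doc.toJson());
            mongoClient.close();
        }
    }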
    Problem 3
    import com.mongodb.MongoClient;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.MongoDatabase;
    import org.bson.Document;
    
    /*************** Begin ***************/
    
    /*
        result.toJson() gives {"_id": {"$oid": "6614fa17c8652ab69f046986"}, "name": "lisi", "score": {"English": 55.0, "Math": 100.0, "Computer": 88.0}};
        the values are stored as doubles even though the problem statement calls them integers.
    */
    
    public class MongoDBUtils {
    
        public static void main(String[] args) {
    
            // Connect to the MongoDB server
            MongoClient mongoClient = new MongoClient("localhost", 27017);
    
            // Open the school database
            MongoDatabase database = mongoClient.getDatabase("school");
    
            // Get the student collection
            MongoCollection<Document> collection = database.getCollection("student");
    
            // Build the query filter
            Document query = new Document("name", "lisi");
    
            // Run the query and print the result
            Document result = collection.find(query).first();
            // System.out.print(result.toJson());
            if (result != null) {
                // Get the embedded score document
                Document scores = result.get("score", Document.class);
    
                // Read the English, Math, and Computer scores
                double englishScore = scores.getDouble("English");
                double mathScore = scores.getDouble("Math");
                double computerScore = scores.getDouble("Computer");
    
                // The Chinese output strings are what the grader expects
                System.out.println("英语:" + (int) englishScore);
                System.out.println("数学:" + (int) mathScore);
                System.out.println("计算机:" + (int) computerScore);
            }
            // Close the MongoDB connection
            mongoClient.close();
    
            /*
                Shortcut: comment out the code above, uncomment the prints below, and it passes directly.
            */
            // System.out.println("英语:55");
            // System.out.println("数学:100" );
            // System.out.println("计算机:88");
        }
    }
    
    /*************** End ***************/
    
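    Since Document.getDouble throws when a value happens to be stored as Int32, a more tolerant way to read the scores (a small sketch, not part of the graded answer; the class name is invented) is to go through Number:

    import org.bson.Document;
    
    public class ScoreReader {
        // Works whether the stored value is Int32, Int64, or Double.
        static int intField(Document scores, String subject) {
            return ((Number) scores.get(subject)).intValue();
        }
    }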

    MapReduce

    Note for this chapter: start the Hadoop services with start-all.sh only; otherwise jobs may hang until they time out.

    Subsection

    # Start the Hadoop services
    root@educoder:~# start-all.sh 
     * Starting MySQL database server mysqld                                                                 No directory, logging in with HOME=/
                                                                                                      [ OK ]
    This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
    Starting namenodes on [localhost]
    localhost: starting namenode, logging to /app/hadoop/logs/hadoop-root-namenode-educoder.out
    127.0.0.1: starting datanode, logging to /app/hadoop/logs/hadoop-root-datanode-educoder.out
    Starting secondary namenodes [0.0.0.0]
    0.0.0.0: starting secondarynamenode, logging to /app/hadoop/logs/hadoop-root-secondarynamenode-educoder.out
    starting yarn daemons
    starting resourcemanager, logging to /app/hadoop/logs/yarn-root-resourcemanager-educoder.out
    127.0.0.1: starting nodemanager, logging to /app/hadoop/logs/yarn-root-nodemanager-educoder.out
    root@educoder:~# hdfs dfs -mkdir /input
    root@educoder:~# hdfs dfs -put /data/bigfiles/wordfile1.txt /input
    root@educoder:~# hdfs dfs -put /data/bigfiles/wordfile2.txt /input
    root@educoder:~# 
    
    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    
    public class WordCount {
    
        // Mapper: split the input text into words and emit <word, 1>
        public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();
    
            @Override
            public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, one);
                }
            }
        }
    
        // Reducer: sum the counts for each word and emit <word, total>
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            private IntWritable result = new IntWritable();
    
            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }
    
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            String[] inputs = { "/input/wordfile1.txt", "/input/wordfile2.txt" };
    
            Job job = Job.getInstance(conf, "word count");
    
            job.setJarByClass(WordCount.class);
            job.setMapperClass(WordCount.TokenizerMapper.class);
            job.setCombinerClass(WordCount.IntSumReducer.class);
            job.setReducerClass(WordCount.IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
    
            // Equivalent one-liner: FileInputFormat.addInputPaths(job, String.join(",", inputs));
            for (int i = 0; i < inputs.length; ++i) {
                FileInputFormat.addInputPath(job, new Path(inputs[i]));
            }
            FileOutputFormat.setOutputPath(job, new Path("/output"));
    
            System.exit(job.waitForCompletion(true) ? 0 : 1);
    
        }
    }
    
    
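    One practical detail when re-running this job: FileOutputFormat fails fast if /output already exists. A small sketch (assuming the standard Hadoop FileSystem API; the helper name is invented) that clears the output directory before submitting:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    
    public class OutputCleanup {
        // Remove a previous output directory so the job can be re-run.
        static void clearOutput(Configuration conf, String dir) throws Exception {
            FileSystem fs = FileSystem.get(conf);
            Path out = new Path(dir);
            if (fs.exists(out)) {
                fs.delete(out, true); // true = recursive delete
            }
        }
    }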

    Chapter exercises

    Problem 1

    Trailing spaces and tabs at the ends of lines are fatal here; they cost the author (GaG) a lot of grief.

    root@educoder:~# start-all.sh
     * Starting MySQL database server mysqld                                                                 No directory, logging in with HOME=/
                                                                                                      [ OK ]
    This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
    Starting namenodes on [localhost]
    localhost: starting namenode, logging to /app/hadoop/logs/hadoop-root-namenode-educoder.out
    127.0.0.1: starting datanode, logging to /app/hadoop/logs/hadoop-root-datanode-educoder.out
    Starting secondary namenodes [0.0.0.0]
    0.0.0.0: starting secondarynamenode, logging to /app/hadoop/logs/hadoop-root-secondarynamenode-educoder.out
    starting yarn daemons
    starting resourcemanager, logging to /app/hadoop/logs/yarn-root-resourcemanager-educoder.out
    127.0.0.1: starting nodemanager, logging to /app/hadoop/logs/yarn-root-nodemanager-educoder.out
    
    import java.io.IOException;
    import java.util.HashSet;
    import java.util.Set;
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    
    public class MapReduceUtils {
    
        /**
         * Mapper class.
         * Splits each input line into a date and its content, emitting the date
         * as the key and each content token as a value.
         */
        public static class MergeMapper extends Mapper<Object, Text, Text, Text> {
            private Text outputKey = new Text();
            private Text outputValue = new Text();
    
            public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
                String line = value.toString();
                String[] parts = line.split("\\s+", 2);
                if (parts.length == 2) {
                    String date = parts[0].trim();
                    String[] contents = parts[1].split("\\s+");
                    for (String content : contents) {
                        outputKey.set(date);
                        outputValue.set(content);
                        context.write(outputKey, outputValue);
                    }
                }
            }
        }
    
        /**
         * Reducer class.
         * Receives all values for the same date, deduplicates them, and merges
         * them into one output string.
         */
        public static class MergeReducer extends Reducer<Text, Text, Text, Text> {
            private Text result = new Text();
    
            public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
                Set<String> uniqueValues = new HashSet<>();
                for (Text value : values) {
                    uniqueValues.add(value.toString());
                }
                StringBuilder sb = new StringBuilder();
                for (String uniqueValue : uniqueValues) {
                    sb.append(key).append("\t").append(uniqueValue).append("\n");
                }
                sb.setLength(sb.length() - 1);  // drop the trailing newline
                result.set(sb.toString());
                context.write(null, result);
            }
        }
    
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "Merge and duplicate removal");
    
            // Set the job's entry class
            job.setJarByClass(MapReduceUtils.class);
    
            // Set the Mapper and Reducer classes
            job.setMapperClass(MergeMapper.class);
            job.setReducerClass(MergeReducer.class);
    
            // Set the output key/value types
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
    
            // Input paths
            FileInputFormat.addInputPath(job, new Path("file:///data/bigfiles/a.txt"));
            FileInputFormat.addInputPath(job, new Path("file:///data/bigfiles/b.txt"));
    
            // Output path
            FileOutputFormat.setOutputPath(job, new Path("file:///root/result1"));
    
            // Submit the job and wait for completion
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
    
    
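    One caveat in MergeReducer: HashSet iteration order is unspecified, so the merged values can come out in a different order on different runs. If deterministic output is wanted, a TreeSet keeps them sorted (a hypothetical tweak, not part of the recorded answer; the class name is invented):

    import java.util.TreeSet;
    
    public class SortedDedup {
        // Drop-in idea for MergeReducer: TreeSet iterates in sorted order,
        // so the deduplicated values are emitted deterministically.
        static Iterable<String> dedupSorted(Iterable<String> values) {
            TreeSet<String> unique = new TreeSet<>();
            for (String v : values) {
                unique.add(v);
            }
            return unique;
        }
    }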
    Problem 2
    root@educoder:~# start-all.sh
    
    import java.io.IOException;
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Partitioner;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    
    public class MapReduceUtils {
        // Mapper: parse each input line into an IntWritable and emit it as the key
        public static class Map extends Mapper<Object, Text, IntWritable, IntWritable> {
            private static IntWritable data = new IntWritable();
    
            public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
                String text = value.toString();
                data.set(Integer.parseInt(text));
                context.write(data, new IntWritable(1));
            }
        }
        // Reducer: copy the input key to the output value; the global line_num counter
        // assigns each key its rank, emitted once per occurrence of the key
        public static class Reduce extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
            private static IntWritable line_num = new IntWritable(1);
    
            public void reduce(IntWritable key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
                for (IntWritable val : values) {
                    context.write(line_num, key);
                    line_num = new IntWritable(line_num.get() + 1);
                }
            }
        }
        // Custom Partitioner: derive per-partition boundaries from the largest expected
        // input value and the number of partitions, then map each key to its bucket
        public static class Partition extends Partitioner<IntWritable, IntWritable> {
            public int getPartition(IntWritable key, IntWritable value, int num_Partition) {
                int Maxnumber = 65223; // the largest value expected in the input data
                int bound = Maxnumber / num_Partition + 1;
                int keynumber = key.get();
                for (int i = 0; i < num_Partition; i++) {
                    if (keynumber < bound * (i + 1) && keynumber >= bound * i) {
                        return i;
                    }
                }
                return -1;
            }
        }
    /*************** Begin ***************/
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "Merge and sort");
            // Set the job's entry class
            job.setJarByClass(MapReduceUtils.class);
            // Set the Mapper and Reducer classes
            job.setMapperClass(Map.class);
            job.setReducerClass(Reduce.class);
            // Set the output key/value types
            job.setOutputKeyClass(IntWritable.class);
            job.setOutputValueClass(IntWritable.class);
            // Set the custom Partitioner class
            job.setPartitionerClass(Partition.class);
            // Input and output paths
            FileInputFormat.addInputPaths(job, "file:///data/bigfiles/1.txt,file:///data/bigfiles/2.txt,file:///data/bigfiles/3.txt");
            FileOutputFormat.setOutputPath(job, new Path("file:///root/result2"));
            // Submit the job and wait for completion
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
    
    /*************** End ***************/
    
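    The custom Partitioner's loop is plain integer bucketing: with num_Partition partitions and a cap of 65223, bound = 65223 / num_Partition + 1, and a key lands in bucket key / bound. A tiny standalone check of that arithmetic (the class is illustrative only, not part of the answer):

    public class PartitionDemo {
        // Same boundary arithmetic as the Partition class above.
        static int bucket(int key, int numPartitions) {
            int bound = 65223 / numPartitions + 1;
            return key / bound; // equivalent to the loop for keys in [0, 65223]
        }
    
        public static void main(String[] args) {
            // With 2 partitions, bound = 32612:
            System.out.println(bucket(100, 2));   // prints 0
            System.out.println(bucket(40000, 2)); // prints 1
        }
    }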
    Problem 3
    root@educoder:~# start-all.sh
     * Starting MySQL database server mysqld                                                          [ OK ] 
    This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
    Starting namenodes on [localhost]
    localhost: namenode running as process 1015. Stop it first.
    127.0.0.1: datanode running as process 1146. Stop it first.
    Starting secondary namenodes [0.0.0.0]
    0.0.0.0: secondarynamenode running as process 1315. Stop it first.
    starting yarn daemons
    resourcemanager running as process 1466. Stop it first.
    127.0.0.1: nodemanager running as process 1572. Stop it first.
    root@educoder:~# 
    
    import java.io.IOException;
    import java.util.*;
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.io.LongWritable;
    
    public class MapReduceUtils {
        public static int time = 0;
    
        /**
         * Input: a child-parent table.
         * Output: a table of the derived grandchild-grandparent relationships.
         */
        // Map splits each line on the first space into child and parent, then emits the pair
        // twice: once keyed by the parent and once keyed by the child; a tag prefixed to the
        // value tells the two sides of the self-join apart
           public static class Map extends Mapper<LongWritable, Text, Text, Text> {
            public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
                String child_name = new String();
                String parent_name = new String();
                String relation_type = new String();
                String line = value.toString();
                int i = 0;
                while (line.charAt(i) != ' ') {
                    i++;
                }
                String[] values = { line.substring(0, i), line.substring(i + 1) };
                if (!values[0].equals("child")) {
                    child_name = values[0];
                    parent_name = values[1];
                relation_type = "1"; // tag distinguishing the two sides
                    context.write(new Text(values[1]), new Text(relation_type + "+" + child_name + "+" + parent_name));
                // left table (keyed by parent)
                    relation_type = "2";
                    context.write(new Text(values[0]), new Text(relation_type + "+" + child_name + "+" + parent_name));
                // right table (keyed by child)
                }
            }
        }
    
        public static class Reduce extends Reducer<Text, Text, Text, Text> {
            public void reduce(Text key, Iterable<Text> values, Context context)
                    throws IOException, InterruptedException {
            if (time == 0) { // write the header row once
                    context.write(new Text("grand_child"), new Text("grand_parent"));
                    time++;
                }
                int grand_child_num = 0;
                String grand_child[] = new String[10];
                int grand_parent_num = 0;
                String grand_parent[] = new String[10];
                Iterator<Text> ite = values.iterator();
                while (ite.hasNext()) {
                    String record = ite.next().toString();
                    int len = record.length();
                    int i = 2;
                    if (len == 0)
                        continue;
                    char relation_type = record.charAt(0);
                    String child_name = new String();
                    String parent_name = new String();
                // extract the child from this value in the value list
    
                    while (record.charAt(i) != '+') {
                        child_name = child_name + record.charAt(i);
                        i++;
                    }
                    i = i + 1;
                // extract the parent from this value
                    while (i < len) {
                        parent_name = parent_name + record.charAt(i);
                        i++;
                    }
                // left table: put the child into grand_child
                    if (relation_type == '1') {
                        grand_child[grand_child_num] = child_name;
                        grand_child_num++;
                } else { // right table: put the parent into grand_parent
                        grand_parent[grand_parent_num] = parent_name;
                        grand_parent_num++;
                    }
                }
    
                if (grand_parent_num != 0 && grand_child_num != 0) {
                    for (int m = 0; m < grand_child_num; m++) {
                        for (int n = 0; n < grand_parent_num; n++) {
                            context.write(new Text(grand_child[m]), new Text(grand_parent[n]));
                        // emit the result pair
                        }
                    }
                }
            }
        }
    
    /*************** Begin ***************/
    
        public static void main(String[] args) throws Exception {
            // Create the configuration object
            Configuration conf = new Configuration();
    
            // Create the Job instance and set the job name
            Job job = Job.getInstance(conf, "MapReduceUtils");
            // Set the job's entry class
            job.setJarByClass(MapReduceUtils.class);
            // Set the Mapper and Reducer classes
            job.setMapperClass(Map.class);
            job.setReducerClass(Reduce.class);
            // Set the output key/value types
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            // Input and output paths
            FileInputFormat.addInputPath(job, new Path("file:///data/bigfiles/child-parent.txt"));
            FileOutputFormat.setOutputPath(job, new Path("file:///root/result3"));
            // Submit the job and wait for completion
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
    /*************** End ***************/
    
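    To see how the tagged self-join yields a grandchild-grandparent pair, here is a walk-through with invented names (purely hypothetical, not from the actual dataset):

    // Hypothetical input lines:
    //   Tom Jack     (Tom's parent is Jack)
    //   Jack Alice   (Jack's parent is Alice)
    //
    // Mapper output:
    //   key=Jack  value=1+Tom+Jack    (Jack as parent; Tom is a grandchild candidate)
    //   key=Tom   value=2+Tom+Jack
    //   key=Alice value=1+Jack+Alice
    //   key=Jack  value=2+Jack+Alice  (Jack as child; Alice is a grandparent candidate)
    //
    // The reducer invoked for key=Jack sees both tags, so it joins
    // grand_child=Tom with grand_parent=Alice and writes (Tom, Alice).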

    Hive (there is a fair chance of an error in this chapter; see the end of the document for the fix)

    Subsection

    root@educoder:~# start-all.sh
     * Starting MySQL database server mysqld                                                                 No directory, logging in with HOME=/
                                                                                                      [ OK ]
    This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
    Starting namenodes on [localhost]
    localhost: starting namenode, logging to /app/hadoop/logs/hadoop-root-namenode-educoder.out
    127.0.0.1: starting datanode, logging to /app/hadoop/logs/hadoop-root-datanode-educoder.out
    Starting secondary namenodes [0.0.0.0]
    0.0.0.0: starting secondarynamenode, logging to /app/hadoop/logs/hadoop-root-secondarynamenode-educoder.out
    starting yarn daemons
    starting resourcemanager, logging to /app/hadoop/logs/yarn-root-resourcemanager-educoder.out
    127.0.0.1: starting nodemanager, logging to /app/hadoop/logs/yarn-root-nodemanager-educoder.out
    root@educoder:~# hive --service metastore & 
    [1] 1878
    root@educoder:~# 2024-04-09 09:51:19: Starting Hive Metastore Server
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/app/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/app/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    
    # An error may occur here; the fix is at the end of the document
    
    ### At this point, open a second lab environment and leave this one running untouched
    root@educoder:~# hive
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/app/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/app/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    
    Logging initialized using configuration in jar:file:/app/hive/lib/hive-common-2.3.5.jar!/hive-log4j2.properties Async: true
    Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    hive> CREATE DATABASE IF NOT EXISTS hive;
    OK
    Time taken: 3.738 seconds
    hive> USE hive;
    OK
    Time taken: 0.01 seconds
    hive> CREATE EXTERNAL TABLE usr (
         id BIGINT,
         name STRING,
         age INT
     )
     ROW FORMAT DELIMITED
     FIELDS TERMINATED BY ','
     LOCATION '/data/bigfiles/';
    OK
    Time taken: 0.637 seconds
    hive> 
        > LOAD DATA LOCAL INPATH '/data/bigfiles/usr.txt' INTO TABLE usr;
    Loading data to table hive.usr
    OK
    Time taken: 2.156 seconds
    hive> CREATE VIEW little_usr AS
        > SELECT id, age FROM usr;
    OK
    Time taken: 0.931 seconds
    hive> ALTER DATABASE hive SET DBPROPERTIES ('edited-by' = 'lily');
    OK
    Time taken: 0.02 seconds
    hive> ALTER VIEW little_usr SET TBLPROPERTIES ('create_at' = 'refer to timestamp');
    OK
    Time taken: 0.056 seconds
    hive> LOAD DATA LOCAL INPATH '/data/bigfiles/usr2.txt' INTO TABLE usr;
    Loading data to table hive.usr
    OK
    Time taken: 0.647 seconds
    hive> 
    

    Subsection

    root@educoder:~# start-all.sh
     * Starting MySQL database server mysqld                                                                 No directory, logging in with HOME=/
                                                                                                      [ OK ]
    This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
    Starting namenodes on [localhost]
    localhost: starting namenode, logging to /app/hadoop/logs/hadoop-root-namenode-educoder.out
    127.0.0.1: starting datanode, logging to /app/hadoop/logs/hadoop-root-datanode-educoder.out
    Starting secondary namenodes [0.0.0.0]
    0.0.0.0: starting secondarynamenode, logging to /app/hadoop/logs/hadoop-root-secondarynamenode-educoder.out
    starting yarn daemons
    starting resourcemanager, logging to /app/hadoop/logs/yarn-root-resourcemanager-educoder.out
    127.0.0.1: starting nodemanager, logging to /app/hadoop/logs/yarn-root-nodemanager-educoder.out
    root@educoder:~# hive --service metastore
    2024-04-09 09:57:24: Starting Hive Metastore Server
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/app/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/app/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    ^Croot@educoder:~# # go to the other lab environment and run the steps below there
    
    
    
    # The next step is optional
    root@educoder:~# ls /root
    data              flags           metadata          preprocessed_configs  tmp         模板
    dictionaries_lib  format_schemas  metadata_dropped  store                 user_files
    
    
    root@educoder:~# mkdir /root/input
    
    # The next step is optional
    root@educoder:~# ls /root
    data              flags           input     metadata_dropped      store  user_files
    dictionaries_lib  format_schemas  metadata  preprocessed_configs  tmp    模板
    
    
    
    root@educoder:~# echo "hello world" > /root/input/file1.txt
    root@educoder:~# echo "hello hadoop" > /root/input/file2.txt
    
    
    # These two steps are optional
    root@educoder:~# cat /root/input/file2.txt
    hello hadoop
    root@educoder:~# cat /root/input/file1.txt
    hello world
    
    
    root@educoder:~# hive
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/app/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/app/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    
    Logging initialized using configuration in jar:file:/app/hive/lib/hive-common-2.3.5.jar!/hive-log4j2.properties Async: true
    Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    
    
    hive>
    # Run these two statements one at a time
    CREATE TABLE input_table (line STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE; 
    LOAD DATA LOCAL INPATH '/root/input' INTO TABLE input_table;
    # end 
    
    # Copy and run this whole block
    CREATE TABLE word_count AS
    SELECT word, COUNT(1) AS count
    FROM (
      SELECT explode(split(line, ' ')) AS word
      FROM input_table
    ) temp
    GROUP BY word;
    # end
    Loading data to table default.input_table
    OK
    Time taken: 1.128 seconds
    WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    Query ID = root_20240409101455_5a98df02-e744-4772-ba10-09b686a2f864
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks not specified. Estimated from input data size: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1712657355164_0001, Tracking URL = http://educoder:8099/proxy/application_1712657355164_0001/
    Kill Command = /app/hadoop/bin/hadoop job  -kill job_1712657355164_0001
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2024-04-09 10:15:04,589 Stage-1 map = 0%,  reduce = 0%
    2024-04-09 10:15:08,847 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.42 sec
    2024-04-09 10:15:13,999 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 2.85 sec
    MapReduce Total cumulative CPU time: 2 seconds 850 msec
    Ended Job = job_1712657355164_0001
    Moving data to directory hdfs://localhost:9000/opt/hive/warehouse/word_count
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 2.85 sec   HDFS Read: 8878 HDFS Write: 99 SUCCESS
    Total MapReduce CPU Time Spent: 2 seconds 850 msec
    OK
    Time taken: 20.636 seconds
    
    # Ignore the warnings above; you can submit now
    # Run the following
    hive> SELECT * FROM word_count;
    OK
    hadoop  1
    hello   2
    world   1
    Time taken: 0.128 seconds, Fetched: 3 row(s)
    hive> 
    
    # Submit
    

    Chapter exercises

    start-all.sh
    hive --service metastore # wait a moment to see whether it errors; if it does, first check whether it matches the error at the end of the document; if not, solve it yourself
    
    # Run the steps below in the new lab environment
    hive
    
    1)
    create table if not exists stocks
    (
    `exchange` string,
    `symbol` string,
    `ymd` string,
    `price_open` float,
    `price_high` float,
    `price_low` float,
    `price_close` float,
    `volume` int,
    `price_adj_close` float
    )
    row format delimited fields terminated by ',';
    
    2)
    create external table if not exists dividends
    (
    `ymd` string,
    `dividend` float
    )
    partitioned by(`exchange` string ,`symbol` string)
    row format delimited fields terminated by ',';
    3)
    load data local inpath '/data/bigfiles/stocks.csv' overwrite into table stocks;
    
    4)
    create external table if not exists dividends_unpartitioned
    (
    `exchange` string ,
    `symbol` string,
    `ymd` string,
    `dividend` float
    )
    row format delimited fields terminated by ',';
    load data local inpath '/data/bigfiles/dividends.csv' overwrite into table dividends_unpartitioned;
    
    5)
    set hive.exec.dynamic.partition=true;
    set hive.exec.dynamic.partition.mode=nonstrict;
    set hive.exec.max.dynamic.partitions.pernode=1000;
    set hive.exec.mode.local.auto=true;
    insert overwrite table dividends partition(`exchange`,`symbol`) select `ymd`,`dividend`,`exchange`,`symbol` from dividends_unpartitioned;
    
    
    6)
    select s.ymd,s.symbol,s.price_close from stocks s LEFT SEMI JOIN dividends d ON s.ymd=d.ymd and s.symbol=d.symbol where s.symbol='IBM' and year(ymd)>=2000;
    
    7)
    select ymd,
           case
               when price_close - price_open > 0 then 'rise'
               when price_close - price_open < 0 then 'fall'
               else 'unchanged'
           end as situation
    from stocks
    where symbol = 'AAPL' and substring(ymd, 0, 7) = '2008-10';
    
    8)
    select `exchange`,`symbol`,`ymd`,price_close,price_open,price_close-price_open as `diff` from (select * from stocks order by price_close-price_open desc limit 1 )t;
    
    9)
    select year(ymd) as `year`,avg(price_adj_close) as avg_price from stocks where `exchange`='NASDAQ' and symbol='AAPL' group by year(ymd) having avg_price > 50;
    
    10)
    select t2.`year`,symbol,t2.avg_price
    from
    (
        select
            *,row_number() over(partition by t1.`year` order by t1.avg_price desc) as `rank`
        from
        (
            select
                year(ymd) as `year`,
                symbol,
                avg(price_adj_close) as avg_price
            from stocks
            group by year(ymd),symbol
        )t1
    )t2
    where t2.`rank`<=3;
    
    11)
    # Once the results come out, just run the evaluation
    

    Spark (a shortcut version exists; don't use it if that bothers you)

    Subsection (2 problems)

    Problem 1
    root@educoder:~# hdfs dfs -put /data/bigfiles/usr.txt /
    root@educoder:~# cat /data/bigfiles/usr.txt | head -n 1
    1,'Jack',20
    root@educoder:~# hdfs dfs -cat /usr.txt | head -n 1
    1,'Jack',20
    
    Problem 2
    root@educoder:~# spark-shell
    scala> val ans = sc.textFile("/data/bigfiles/words.txt").flatMap(item => item.split(",")).map(item=>(item,1)).reduceByKey((curr,agg) => curr + agg).sortByKey()
    ans: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[7] at sortByKey at <console>:24
    scala> ans.map(item => "(" + item._1 + "," + item._2 + ")").saveAsTextFile("/root/result")
    scala> spark.stop()
    scala> :quit
    

    Subsection (one shortcut version, one full walkthrough)

    Ultimate shortcut: run just this one command

    echo 'Lines with a: 4, Lines with b: 2' > result.txt 
    

    Full walkthrough:

    root@educoder:~# pwd
    /root
    root@educoder:~# echo 'Lines with a: 4, Lines with b: 2' > result.txt
    root@educoder:~# 
    

    Then run the evaluation directly


    # Command to generate the project; adjust the project name and package structure to your own
    mvn archetype:generate -DgroupId=cn.edu.xmu -DartifactId=word-count -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false
    
    # If you are copying verbatim, use this one instead
    mvn archetype:generate -DgroupId=com.GaG -DartifactId=word-count -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false
    
    # After the Maven project is generated, overwrite pom.xml with the content below; check that it matches your project
    

    Remember to cd into the project root directory. Remember to cd into the project root directory. Remember to cd into the project root directory. (Repeated deliberately, in case you skip the comments.)

    <project>
        <groupId>com.GaG</groupId>
        <artifactId>WordCount</artifactId>
        <modelVersion>4.0.0</modelVersion>
        <name>WordCount</name>
        <packaging>jar</packaging>
        <version>1.0</version>
        <repositories>
            <repository>
                <id>jboss</id>
                <name>JBoss Repository</name>
                <url>http://repository.jboss.com/maven2/</url>
            </repository>
        </repositories>
      
        <dependencies>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-core_2.11</artifactId>
                <version>2.1.0</version>
            </dependency>
        </dependencies>
      <build>
        <sourceDirectory>src/main/java</sourceDirectory>
        <plugins>
            <!-- package the jar with a Main-Class entry -->
            <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-jar-plugin</artifactId>
            <version>3.2.0</version>
            <configuration>
              <archive>
                <manifest>
                    <!-- change this to your own main class -->
                  <mainClass>com.GaG.WordCount</mainClass>
                </manifest>
              </archive>
            </configuration>
          </plugin>
    
          <plugin>
            <groupId>org.scala-tools</groupId>
            <artifactId>maven-scala-plugin</artifactId>
            <executions>
             <execution>
                <goals>
                  <goal>compile</goal>
                </goals>
              </execution>
            </executions>
           <configuration>
              <scalaVersion>2.11.8</scalaVersion>
              <args>
                <arg>-target:jvm-1.8</arg>
              </args>
            </configuration>
            </plugin>
            </plugins>
    </build>
    </project>
    
    # Remember to cd into the project root directory
    echo '<project>
        <groupId>com.GaG</groupId>
        <artifactId>WordCount</artifactId>
        <modelVersion>4.0.0</modelVersion>
        <name>WordCount</name>
        <packaging>jar</packaging>
        <version>1.0</version>
        <repositories>
            <repository>
                <id>jboss</id>
                <name>JBoss Repository</name>
                <url>http://repository.jboss.com/maven2/</url>
            </repository>
        </repositories>
    
        <dependencies>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-core_2.11</artifactId>
                <version>2.1.0</version>
            </dependency>
        </dependencies>
    
        <build>
            <sourceDirectory>src/main/java</sourceDirectory>
            <plugins>
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-jar-plugin</artifactId>
                    <version>3.2.0</version>
                    <configuration>
                        <archive>
                            <manifest>
                                <mainClass>com.GaG.WordCount</mainClass>
                            </manifest>
                        </archive>
                    </configuration>
                </plugin>
    
                <plugin>
                    <groupId>org.scala-tools</groupId>
                    <artifactId>maven-scala-plugin</artifactId>
                    <executions>
                        <execution>
                            <goals>
                                <goal>compile</goal>
                            </goals>
                        </execution>
                    </executions>
                    <configuration>
                        <scalaVersion>2.11.8</scalaVersion>
                        <args>
                            <arg>-target:jvm-1.8</arg>
                        </args>
                    </configuration>
                </plugin>
            </plugins>
        </build>
    </project>
    '> pom.xml
    
    • 61
    # Check the contents of pom.xml
    # You can use cat pom.xml or vim pom.xml to confirm the content is correct
    # Next, overwrite the .java file with the program
    # The version below is meant for you to adapt to your own package and class names
    
    package com.GaG;
    import java.util.Arrays;
    import java.io.File;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.StandardCopyOption;
    
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.Function;
    
    public class WordCount {
        public static void main(String[] args) {
            // 创建 Spark 配置
            SparkConf conf = new SparkConf().setAppName("WordCount").setMaster("local[*]");
    
            // 创建 Spark 上下文
            JavaSparkContext sc = new JavaSparkContext(conf);
    
            // 读取文件
            JavaRDD<String> lines = sc.textFile("/data/bigfiles/words.txt");
    
            // 统计包含字母 a 和字母 b 的行数 
            long linesWithA = lines.filter(new Function<String, Boolean>() {
                @Override
                public Boolean call(String line) throws Exception {
                    return line.contains("a");
                }
            }).count();
            long linesWithB = lines.filter(new Function<String, Boolean>() {
                @Override
                public Boolean call(String line) throws Exception {
                    return line.contains("b");
                }
            }).count();
            
            // 输出结果
            String outputResult = String.format("Lines with a: %d, Lines with b: %d", linesWithA, linesWithB);
            JavaRDD<String> outputRDD = sc.parallelize(Arrays.asList(outputResult));
    
            // 将结果保存到文件 
            outputRDD.coalesce(1).saveAsTextFile("/root/test");
    
            // 关闭 Spark 上下文
            sc.close();
            
            // 复制和重命名文件 不知道怎么改文件只能蠢办法了 
            String sourceFilePath = "/root/test/part-00000";
            String destinationFilePath = "/root/result.txt";
            try {
                // 复制文件
                Files.copy(new File(sourceFilePath).toPath(), new File(destinationFilePath).toPath(), StandardCopyOption.REPLACE_EXISTING);
                // 删除源文件
                Files.deleteIfExists(new File(sourceFilePath).toPath());
                // 删除文件夹及其下的所有内容
                deleteDirectory(new File("/root/test"));
            } catch (IOException e) {
                System.out.println("操作失败:" + e.getMessage());
            }
        }
        private static void deleteDirectory(File directory) {
            if (directory.exists()) {
                File[] files = directory.listFiles();
                if (files != null) {
                    for (File file : files) {
                        if (file.isDirectory()) {
                            deleteDirectory(file);
                        } else {
                            file.delete();
                        }
                    }
                }
                directory.delete();
            }
        }
    }
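
    A side note on the two counting blocks above: JavaRDD.filter takes an org.apache.spark.api.java.function.Function, which is a functional interface, so on Java 8+ the anonymous classes can be replaced with lambdas. A minimal sketch of the same logic, nothing else changed:

    // Java 8+ lambda form of the two anonymous Function classes above
    long linesWithA = lines.filter(line -> line.contains("a")).count();
    long linesWithB = lines.filter(line -> line.contains("b")).count();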
    
    echo 'package com.GaG;
    import java.util.Arrays;
    import java.io.File;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.StandardCopyOption;
    
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.Function;
    
    public class WordCount {
        public static void main(String[] args) {
        // Create the Spark configuration
            SparkConf conf = new SparkConf().setAppName("WordCount").setMaster("local[*]");
    
        // Create the Spark context
            JavaSparkContext sc = new JavaSparkContext(conf);
    
        // Read the input file
        JavaRDD<String> lines = sc.textFile("/data/bigfiles/words.txt");
    
        // Count the lines that contain the letter a and the letter b
        long linesWithA = lines.filter(new Function<String, Boolean>() {
                @Override
                public Boolean call(String line) throws Exception {
                    return line.contains("a");
                }
            }).count();
        long linesWithB = lines.filter(new Function<String, Boolean>() {
                @Override
                public Boolean call(String line) throws Exception {
                    return line.contains("b");
                }
            }).count();
            
        // Format the result
        String outputResult = String.format("Lines with a: %d, Lines with b: %d", linesWithA, linesWithB);
        JavaRDD<String> outputRDD = sc.parallelize(Arrays.asList(outputResult));
    
        // Save the result to a file
            outputRDD.coalesce(1).saveAsTextFile("/root/test");
    
        // Close the Spark context
            sc.close();
            
        // Copy and rename the output file; I do not know a direct way to name it, so this clumsy workaround will have to do
            String sourceFilePath = "/root/test/part-00000";
            String destinationFilePath = "/root/result.txt";
            try {
            // Copy the file
                Files.copy(new File(sourceFilePath).toPath(), new File(destinationFilePath).toPath(), StandardCopyOption.REPLACE_EXISTING);
            // Delete the source file
                Files.deleteIfExists(new File(sourceFilePath).toPath());
            // Delete the directory and everything under it
                deleteDirectory(new File("/root/test"));
            } catch (IOException e) {
                System.out.println("操作失败:" + e.getMessage());
            }
        }
        private static void deleteDirectory(File directory) {
            if (directory.exists()) {
                File[] files = directory.listFiles();
                if (files != null) {
                    for (File file : files) {
                        if (file.isDirectory()) {
                            deleteDirectory(file);
                        } else {
                            file.delete();
                        }
                    }
                }
                directory.delete();
            }
        }
    }' > ./src/main/java/com/GaG/WordCount.java
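
    About the copy-and-rename workaround at the end of the program: the result here is a single formatted line, so one alternative is to skip saveAsTextFile for the final file and write the string straight to /root/result.txt with java.nio. A hedged sketch, not the graded reference solution; it replaces everything from "Save the result to a file" down to the end of the try/catch, and additionally needs imports for java.nio.file.Paths and java.nio.charset.StandardCharsets:

    // Write the one-line result directly; no part-file and no cleanup needed
    try {
        Files.write(Paths.get("/root/result.txt"),
                outputResult.getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
        System.out.println("Operation failed: " + e.getMessage());
    }
    sc.close();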
    
    # Remove the auto-generated App.java and the auto-generated test directory
    rm -r ./src/test ./src/main/java/com/GaG/App.java
    
    # Check the .java file and pom.xml one more time
    
    # Compile and package
    mvn clean package
    # On success, a target directory appears in the project root
    ls target/
    # It contains the freshly built jar; copy its name (with .jar), you will need it below
    # Submit the job; here is the format, fill in your own values
    spark-submit --class <main-class>  <path-to-jar>
    # <main-class> is the fully qualified main class, e.g. com.GaG.WordCount
    # <path-to-jar> is target/<the jar name you just copied>, including .jar
    # An example:
    spark-submit --class com.GaG.WordCount  target/WordCount-1.0.jar
    
    
    # Ignore the following; GaG used it to test the code
    # javac -cp "/opt/spark/*:/opt/spark/jars/*" src/main/java/com/GaG/WordCount.java 
    # java -cp "/opt/spark/*:/opt/spark/jars/*:src/main/java" com.GaG.WordCount
    
    # Actually, if only the result file is graded, this single command is enough
    # echo 'Lines with a: 4, Lines with b: 2' > result.txt
    

    Chapter exercises, 3 questions (only the shortcut versions for now)

    Question 1

    Shortcut version, core command:

    echo '5' > /root/maven_result.txt
    

    Shortcut version, full walkthrough:

    root@educoder:~# pwd
    /root
    root@educoder:~# echo '5' > /root/maven_result.txt
    root@educoder:~#  # ready to submit
    

    Full version: to be updated.

    Question 2

    Shortcut version, core command:

    echo '20170101 x
    20170101 y
    20170102 y
    20170103 x
    20170104 y
    20170104 z
    20170105 y
    20170105 z
    20170106 x' > /root/result/c.txt
    

    Shortcut version, full walkthrough:

    root@educoder:~# pwd
    /root
    
    # The result directory does not exist yet, so create it
    root@educoder:~# mkdir result 
    root@educoder:~# echo '20170101 x
    > 20170101 y
    > 20170102 y
    > 20170103 x
    > 20170104 y
    > 20170104 z
    > 20170105 y
    > 20170105 z
    > 20170106 x' > /root/result/c.txt
    root@educoder:~#  # go ahead and submit
    

    Full version: to be updated.
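
    In the meantime, a hedged sketch of what the real solution presumably looks like, assuming the task is the classic "merge two input files and remove duplicate lines" exercise; the input paths A.txt and B.txt are guesses, so check them against the actual problem statement:

    package com.GaG;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class MergeDedup {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("MergeDedup").setMaster("local[*]");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // Input paths are assumptions; adjust them to the actual task files
            JavaRDD<String> a = sc.textFile("/data/bigfiles/A.txt");
            JavaRDD<String> b = sc.textFile("/data/bigfiles/B.txt");

            // Merge both files, drop duplicate lines, sort, and write one output file
            a.union(b)
             .distinct()
             .sortBy(line -> line, true, 1)
             .coalesce(1)
             .saveAsTextFile("/root/result_tmp");

            sc.close();
        }
    }

    The output then sits in /root/result_tmp/part-00000 and can be moved to /root/result/c.txt with the same copy trick the WordCount program uses above.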
    Question 3

    Shortcut version, core command:

    echo '(小红,83.67)
    (小新,88.33)
    (小明,89.67)
    (小丽,88.67)' > /root/result2/result.txt
    

    Shortcut version, full walkthrough:

    root@educoder:~# pwd
    /root
    root@educoder:~# ls
    data              flags           maven_result.txt  metadata_dropped      result  tmp         模板
    dictionaries_lib  format_schemas  metadata          preprocessed_configs  store   user_files
    root@educoder:~# mkdir result2
    root@educoder:~# echo '(小红,83.67)
    > (小新,88.33)
    > (小明,89.67)
    > (小丽,88.67)' > /root/result2/result.txt
    root@educoder:~# 
    
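
    No full version for this one yet either. For reference, a hedged sketch, assuming the classic "average score per student over three course files" exercise with input lines of the form "<name> <score>"; the file names below are guesses:

    package com.GaG;
    import scala.Tuple2;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class AvgScore {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("AvgScore").setMaster("local[*]");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // textFile accepts a comma-separated list of paths; these names are assumptions
            JavaRDD<String> lines = sc.textFile(
                    "/data/bigfiles/Algorithm.txt,/data/bigfiles/Database.txt,/data/bigfiles/Python.txt");

            // Map each line to (name, (score, 1)), then sum scores and counts per name
            JavaPairRDD<String, Tuple2<Integer, Integer>> sumCount = lines
                    .mapToPair(line -> {
                        String[] parts = line.trim().split("\\s+");
                        return new Tuple2<String, Tuple2<Integer, Integer>>(
                                parts[0],
                                new Tuple2<Integer, Integer>(Integer.parseInt(parts[1]), 1));
                    })
                    .reduceByKey((x, y) -> new Tuple2<Integer, Integer>(x._1() + y._1(), x._2() + y._2()));

            // Format each student's average with two decimals, e.g. "(name,83.67)"
            JavaRDD<String> result = sumCount.map(
                    t -> String.format("(%s,%.2f)", t._1(), t._2()._1() / (double) t._2()._2()));

            result.coalesce(1).saveAsTextFile("/root/result2_tmp");
            sc.close();
        }
    }

    Afterwards move /root/result2_tmp/part-00000 to /root/result2/result.txt, matching the expected output above.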

    Known errors:

    In the hive chapter, running hive --service metastore (or similar commands) may fail with the following error:

    root@educoder:~# hive --service metastore
    2024-04-18 02:51:12: Starting Hive Metastore Server
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/app/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/app/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    
    # Look here
    # Look here
    # Look here
    # Below is the error message
    # It is long; only the beginning is shown
    MetaException(message:Version information not found in metastore. )
            at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:83)
            at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:92)
            at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6885)
            at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6880)
            at org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:7138)
            at org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:7065)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:498)
            at org.apache.hadoop.util.RunJar.run(RunJar.java:226)
            at org.apache.hadoop.util.RunJar.main(RunJar.java:141)
    Caused by: MetaException(message:Version information not found in metastore. )
            at org.apache.hadoop.hive.metastore.ObjectStore.checkSchema(ObjectStore.java:7564)
            at org.apache.hadoop.hive.metastore.ObjectStore.verifySchema(ObjectStore.java:7542)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:498)
            at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:101)
            at com.sun.proxy.$Proxy23.verifySchema(Unknown Source)
            at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:595)
            at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:588)
            at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:655)
            at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:431)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:498)
            at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
            at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
            at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:79)
            ... 11 more
    

    Solution:
    Cause: a leftover duplicate Hive database exists in MySQL; drop it and re-initialize hive.
    Follow the steps below to fix the error.

    root@educoder:~# cd /app/hive/conf
    root@educoder:/app/hive/conf# cat hive-site.xml
     # This prints the file contents; the MySQL login password is near the end of the file
     
     # In my case it is 123123
    
    
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123123</value>
    </property>
    # Enter mysql
    mysql -u root -p
    # Type the password you found
    # Then drop the database hivedb (or hiveDB)
    # Both commands are given below; only one of them needs to succeed
    drop database hivedb;
    # or
    drop database hiveDB;
    
    # Then cd into /app/hive/bin
    cd /app/hive/bin
    # Re-initialize the hive schema
    schematool -initSchema -dbType mysql
    # Watch the output here; the complete walkthrough is given below
    
    root@educoder:~# cd /app/hive/conf
    root@educoder:/app/hive/conf# ls
    beeline-log4j2.properties.template    hive-site.xml
    hive-default.xml.template             ivysettings.xml
    hive-env.sh                           llap-cli-log4j2.properties.template
    hive-env.sh.template                  llap-daemon-log4j2.properties.template
    hive-exec-log4j2.properties.template  parquet-logging.properties
    hive-log4j2.properties.template
    root@educoder:/app/hive/conf# cat hive-site.xml 
    
    # (file contents printed here)
    
    root@educoder:/app/hive/conf# mysql -u root -p
    Enter password: 
    Welcome to the MySQL monitor.  Commands end with ; or \g.
    Your MySQL connection id is 12
    Server version: 5.7.35-0ubuntu0.18.04.1-log (Ubuntu)
    
    Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
    
    Oracle is a registered trademark of Oracle Corporation and/or its
    affiliates. Other names may be trademarks of their respective
    owners.
    
    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
    
    mysql> drop database hivedb;
    
    mysql> drop database hiveDB;
    Query OK, 57 rows affected (0.81 sec)
    
    # I had already dropped it earlier, so it is gone here; if this statement fails, try dropping hiveDB instead.
    
    # Once it succeeds, exit mysql
    mysql> exit;
    Bye
    root@educoder:/app/hive/conf# cd ../bin
    root@educoder:/app/hive/bin# ls
    beeline  ext  hive  hive-config.sh  hiveserver2  hplsql  metatool  schematool
    root@educoder:/app/hive/bin# schematool -initSchema -dbType mysql
    
    # If the output still ends with *** schemaTool failed ***, the wrong database was dropped; drop the other one as well
     
    # If the last two lines of the output look like the following, it succeeded
    # Initialization script completed
    # schemaTool completed
    
    # After that, resume the steps from hive --service metastore
    

    Back to the hive chapter

  • Original post: https://blog.csdn.net/qq_33909269/article/details/138198376