    Hadoop project: finding the maximum temperature in each February (Combiner optimization)

    1. Project structure

    (screenshots of the project structure)

    2. Generating random dates and temperatures in Java

    package com.shujia.weather;
    
    import java.io.BufferedWriter;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.text.DateFormat;
    import java.text.ParseException;
    import java.text.SimpleDateFormat;
    import java.util.Date;
    
    public class RandomWeather {
        public static void main(String[] args) throws ParseException, IOException {
            //create the date format and the timestamp range to sample from
            DateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
            long start = sdf.parse("2000-01-01 00:00:00").getTime();
            long end = sdf.parse("2022-12-31 00:00:00").getTime();
            long difference = end - start;
    
            BufferedWriter bw = new BufferedWriter(new FileWriter("F:\\software\\IdeaProjects\\bigdata19-project\\biddata19-mapreduce\\src\\data\\weather.txt"));
            for (int i = 0; i < 10000; i++) {
                //generate a random timestamp within the range
                Date date = new Date(start + (long) (Math.random() * difference));
                //generate a random temperature between -20 and 39
                int temperature = -20 + (int) (Math.random() * 60);
                //print for debugging
    //            System.out.println(date + "\t" + temperature);
                bw.write(sdf.format(date) + "\t" + temperature);//write the record to the file
                bw.newLine();
                bw.flush();
            }
            bw.close();
    
    
        }
    }
    
    

    3. Upload weather.txt to the virtual machine and then to Hadoop

    1. Upload the file to the virtual machine with Xftp.
    2. Upload it to Hadoop with the command:
    hadoop fs -put weather.txt /path
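
    For example, assuming the data is placed under a /data/weather directory (the path is only illustrative), the upload can be done and checked like this:

    # create the target directory in HDFS (the /data/weather path is just an example)
    hadoop fs -mkdir -p /data/weather
    hadoop fs -put weather.txt /data/weather
    # confirm the file is in HDFS and peek at the first few records
    hadoop fs -ls /data/weather
    hadoop fs -cat /data/weather/weather.txt | head -5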
    

    4. Project implementation

    package com.shujia.weather;
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    
    import java.io.IOException;
    
    
    class WeatherMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        /*
        Sample input records:
        2022-06-12 02:40:26	21
        2002-01-03 03:49:27	-13
        2001-04-21 19:19:22	-16
        2005-01-18 01:52:15	10
        Goal: find the maximum temperature in February of each year.
         */
    
        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            //each line is "yyyy-MM-dd HH:mm:ss<TAB>temperature"
            String line = value.toString();
            String[] str = line.split("\t");
            String temperature = str[1];
            String[] strings = str[0].split("-");
            String month = strings[1];
            //only keep February records, keyed by "year-02"
            if ("02".equals(month)) {
                context.write(new Text(strings[0] + "-" + month), new LongWritable(Long.parseLong(temperature)));
            }
        }
    }
    
    class WeatherReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
            //temperatures can be negative, so start from the smallest possible value
            long max = Long.MIN_VALUE;
            for (LongWritable value : values) {
                long l = value.get();
                if (l > max) {
                    max = l;
                }
            }
            context.write(key, new LongWritable(max));
        }
    }
    
    public class WeatherDemo {
        public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf);
    
            job.setCombinerClass(WeatherReducer.class);//Combiner optimization: run the reduce logic on the map side as well
            job.setJarByClass(WeatherDemo.class);
            job.setMapperClass(WeatherMapper.class);
            job.setReducerClass(WeatherReducer.class);
    
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(LongWritable.class);
    
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);
    
            FileInputFormat.setInputPaths(job,new Path(args[0]));
            FileOutputFormat.setOutputPath(job,new Path(args[1]));
    
            job.waitForCompletion(true);
        }
    }
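
    After packaging the project into a jar, the job can be submitted to the cluster. A minimal sketch, assuming the jar is named bigdata19-mapreduce.jar and the input was uploaded to /data/weather (both names are illustrative):

    # submit the job; the output directory must not already exist
    hadoop jar bigdata19-mapreduce.jar com.shujia.weather.WeatherDemo /data/weather/weather.txt /output/weather
    # view the result: one "year-02<TAB>max temperature" line per year
    hadoop fs -cat /output/weather/part-r-00000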
    
    

    Before optimization

    (screenshot: job run before adding the Combiner)

    After optimization

    (screenshot: job run after adding the Combiner)

    The Combiner reduces the amount of data the reducers have to pull from the mappers during the shuffle, which improves computational efficiency.
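    In the job counters this typically shows up as non-zero "Combine input records" / "Combine output records" and a smaller "Reduce shuffle bytes" value.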

    A defining characteristic of Hadoop's computation model: move the computation to the data, rather than moving the data to the computation.

    Note: the Combiner pushes the reduce-side aggregation down to the map side. This is suitable for aggregations whose partial results can safely be merged again, such as sum, count, and max. It is not suitable for operations like averaging and similar computations that depend on the complete value set, because combining pre-aggregated partial results gives a wrong answer.
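    A small worked example of the averaging pitfall (numbers chosen purely for illustration): suppose for key 2001-02 one mapper emits the temperatures 10 and 20 and another mapper emits 30. The true average is (10 + 20 + 30) / 3 = 20, but if a Combiner pre-averaged each mapper's output the reducer would see 15 and 30 and compute (15 + 30) / 2 = 22.5. Max, by contrast, is safe to combine: max(max(10, 20), 30) = max(10, 20, 30) = 30.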

    Original post: https://www.cnblogs.com/bfy0221/p/16640855.html