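Hive SQL: Reading the Window Function Source Code

Preface

When I used the window function row_number() over( ) in the StarRocks engine to deduplicate a dataset of one billion rows, the BE nodes repeatedly ran out of memory (I no longer remember what BE memory limit was configured at the time). Only then did I realize that windowing, used carelessly, can easily cause performance problems. What follows is a first pass through the source code behind window functions in Hive; corrections are welcome.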

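I. Execution Steps of a Window Function

(1) Split the data into multiple partitions;

(2) Invoke the window function on each partition.

Because a window function does not return a single aggregate value but a result shaped like another table (table-in, table-out), the Hive community introduced the Partitioned Table Function (PTF).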
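The two steps above can be illustrated with a minimal sketch, assuming rows are plain string arrays whose first column is the partition key. The class and method names (WindowSketch, partition, rowNumber) are hypothetical and are not Hive classes; the sketch only shows the table-in, table-out shape, where every input row produces exactly one output row with an extra column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; these names are not Hive classes.
class WindowSketch {
    // Step 1: split the rows into partitions keyed by the first column.
    static Map<String, List<String[]>> partition(List<String[]> rows) {
        Map<String, List<String[]>> parts = new LinkedHashMap<>();
        for (String[] row : rows) {
            parts.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row);
        }
        return parts;
    }

    // Step 2: run the function on each partition. Each input row yields one
    // output row with an extra column (here a row number in arrival order).
    static List<String[]> rowNumber(List<String[]> rows) {
        List<String[]> out = new ArrayList<>();
        for (List<String[]> part : partition(rows).values()) {
            int n = 0;
            for (String[] row : part) {
                n++;
                out.add(new String[] {row[0], row[1], String.valueOf(n)});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"a", "x"}, new String[] {"b", "y"}, new String[] {"a", "z"});
        for (String[] r : rowNumber(rows)) {
            System.out.println(String.join(",", r));
        }
    }
}
```

Unlike an aggregate, the output has as many rows as the input, which is why a per-partition table function is a natural fit.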

[Figure: simplified code flow diagram]

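Hive translates a QueryBlock into an operator tree (OperatorTree) for execution. Every operator has three important methods:

initializeOp(): initializes the operator
process(): processes each row of data
forward(): sends each processed row to the next Operator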
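How the three methods cooperate can be sketched with a toy operator chain. This is only an illustration under simplifying assumptions: ToyOperator, UpperCaseOperator, CollectOperator and OperatorChainDemo are hypothetical names, each operator has a single child, and Hive's real Operator class has a much richer API than shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy operators; not Hive's Operator class.
abstract class ToyOperator {
    ToyOperator child;                  // next operator (a simple chain here)
    boolean initialized;

    void initializeOp() {               // set up state before any row arrives
        initialized = true;
        if (child != null) child.initializeOp();
    }

    abstract void process(Object row);  // handle one row

    void forward(Object row) {          // hand a processed row to the next operator
        if (child != null) child.process(row);
    }
}

class UpperCaseOperator extends ToyOperator {
    @Override
    void process(Object row) {
        forward(row.toString().toUpperCase());
    }
}

class CollectOperator extends ToyOperator {
    final List<Object> rows = new ArrayList<>();

    @Override
    void process(Object row) {
        rows.add(row);                  // terminal operator: keep the row
    }
}

class OperatorChainDemo {
    static List<Object> run(List<String> input) {
        UpperCaseOperator upper = new UpperCaseOperator();
        CollectOperator sink = new CollectOperator();
        upper.child = sink;
        upper.initializeOp();
        for (String row : input) upper.process(row);
        return sink.rows;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a", "b")));
    }
}
```

Each row is pushed through the chain one at a time; an operator that needs a whole partition has to buffer rows itself between process() calls.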

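When a window function is encountered, a PTFOperator is generated. PTFOperator relies on PTFInvocation to read the already-sorted data and build the corresponding input partition: PTFPartition inputPart;

WindowingTableFunction manages the window frames, invokes the window functions (UDAFs), and writes the results to the output partition: PTFPartition outputPart.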
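To make "manage the window frame, call the aggregate, write to the output partition" concrete, here is a minimal sketch that evaluates a sum over the frame ROWS BETWEEN n PRECEDING AND CURRENT ROW. It is not Hive's implementation: FrameSketch and slidingSum are hypothetical names, the partition is assumed to be an already-sorted list of integers, and the frame is recomputed naively for every row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of frame-based evaluation; not Hive code.
class FrameSketch {
    // For each row i, sum the values in the frame
    // ROWS BETWEEN `preceding` PRECEDING AND CURRENT ROW.
    static List<Long> slidingSum(List<Integer> inputPart, int preceding) {
        List<Long> outputPart = new ArrayList<>();
        for (int i = 0; i < inputPart.size(); i++) {
            int start = Math.max(0, i - preceding);  // frame start, clamped at 0
            long sum = 0;
            for (int j = start; j <= i; j++) {
                sum += inputPart.get(j);             // aggregate over the frame
            }
            outputPart.add(sum);                     // one output row per input row
        }
        return outputPart;
    }

    public static void main(String[] args) {
        System.out.println(slidingSum(Arrays.asList(1, 2, 3, 4), 1));
    }
}
```

The output partition always has the same number of rows as the input partition; only the frame that feeds the aggregate changes from row to row.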

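II. Source Code Analysis

2.1 The PTFOperator class

It is the operator for PartitionedTableFunction and extends the abstract class Operator (the base class of all Hive operators).

It overrides process(Object row, int tag), the method that handles a single row of data.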


@Override
public void process(Object row, int tag) throws HiveException {
    if (!isMapOperator) {
        /*
         * check if current row belongs to the current accumulated Partition:
         * - If not:
         *  - process the current Partition
         *  - reset input Partition
         * - set currentKey to the newKey if it is null or has changed.
         */
        newKeys.getNewKey(row, inputObjInspectors[0]);
        // Check whether the key of the current row (newKeys) equals the key of the
        // partition that is currently accumulating data (currentKeys)
        boolean keysAreEqual = (currentKeys != null && newKeys != null) ?
                newKeys.equals(currentKeys) : false;
        // If they differ, stop accumulating into the current partition and
        // trigger the window computation
        if (currentKeys != null && !keysAreEqual) {
            // Close the partition being accumulated
            ptfInvocation.finishPartition();
        }
        // If currentKeys is null or has changed, assign newKeys to currentKeys
        if (currentKeys == null || !keysAreEqual) {
            // Open a new partition
            ptfInvocation.startPartition();
            if (currentKeys == null) {
                currentKeys = newKeys.copyKey();
            } else {
                currentKeys.copyKey(newKeys);
            }
        }
    } else if (firstMapRow) { // the current row is the first row to arrive
        ptfInvocation.startPartition();
        firstMapRow = false;
    }
    // Append the row to the partition to accumulate data
    ptfInvocation.processRow(row);
}

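As the code above shows, all rows are expected to arrive already sorted by partition and to enter process() one after another. When an incoming row has a different key from the current partition, the current partition can be closed and the next one opened.

2.2 The PTFInvocation class

PTFInvocation is an inner class of PTFOperator.

Its instance is created in PTFOperator's initialization method.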
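The key-change logic can be reduced to a small standalone sketch. It assumes rows are (key, value) string pairs that arrive sorted by key; BoundaryDetector and its members are hypothetical names, and the explicit flush after the last row is a requirement of this sketch rather than a statement about how Hive finishes the final partition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of the key-change logic; not Hive code.
class BoundaryDetector {
    private String currentKey;                         // key of the open partition
    private List<String> open;                         // rows accumulated so far
    final List<List<String>> closed = new ArrayList<>();

    // Rows must arrive sorted by key for this to work.
    void process(String key, String row) {
        boolean keysAreEqual = open != null && Objects.equals(key, currentKey);
        if (open != null && !keysAreEqual) {
            finishPartition();                         // key changed: close the old one
        }
        if (open == null) {
            open = new ArrayList<>();                  // start a new partition
            currentKey = key;
        }
        open.add(row);                                 // accumulate the row
    }

    // In this sketch the caller flushes the last partition after the final row.
    void finishPartition() {
        if (open != null) {
            closed.add(open);
            open = null;
        }
    }

    static List<List<String>> split(String[][] sortedRows) {
        BoundaryDetector d = new BoundaryDetector();
        for (String[] r : sortedRows) d.process(r[0], r[1]);
        d.finishPartition();
        return d.closed;
    }

    public static void main(String[] args) {
        System.out.println(split(new String[][] {{"a", "1"}, {"a", "2"}, {"b", "3"}}));
    }
}
```

Because a boundary is detected only by comparing with the previous key, unsorted input would split one logical partition into several pieces.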


@Override
  protected void initializeOp(Configuration jobConf) throws HiveException {
    ...
    ptfInvocation = setupChain();
    ptfInvocation.initializeStreaming(jobConf, isMapOperator);
    ...
  }

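Its main job is to drive the flow of rows through the PTF chain: ptfInvocation.processRow(row) passes each row down the chain, while ptfInvocation.startPartition() and ptfInvocation.finishPartition() signal when a partition starts and when it ends.

The class holds a TableFunction, which processes the data of a partition.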


PTFPartition inputPart; // the partition object; the same inputPart is reused throughout
TableFunctionEvaluator tabFn; // the window function instance
 
// Add one row to the partition
void processRow(Object row) throws HiveException {
    if (isStreaming()) {
        // tabFn is the window function instance
        handleOutputRows(tabFn.processRow(row));
    } else {
        // inputPart is the partition currently accumulating data
        inputPart.append(row);
    }
}
 
// Open a partition
void startPartition() throws HiveException {
    if (isStreaming()) {
        tabFn.startPartition();
    } else {
        if (prev == null || prev.isOutputIterator()) {
            if (inputPart == null) {
                // Create a new partition object (PTFPartition)
                createInputPartition();
            } else {
                // Reset the partition
                inputPart.reset();
            }
        }
    }
    if (next != null) {
        next.startPartition();
    }
}
 
// Close a partition
void finishPartition() throws HiveException {
    if (isStreaming()) {
        handleOutputRows(tabFn.finishPartition());
    } else {
        if (tabFn.canIterateOutput()) {
            outputPartRowsItr = inputPart == null ? null :
                    tabFn.iterator(inputPart.iterator());
        } else {
            // tabFn is the window function instance; execute() runs the window
            // function logic and returns outputPart, which is again a partition object
            outputPart = inputPart == null ? null : tabFn.execute(inputPart);
            outputPartRowsItr = outputPart == null ? null : outputPart.iterator();
        }
        if (next != null) {
            if (!next.isStreaming() && !isOutputIterator()) {
                next.inputPart = outputPart;
            } else {
                if (outputPartRowsItr != null) {
                    while (outputPartRowsItr.hasNext()) {
                        next.processRow(outputPartRowsItr.next());
                    }
                }
            }
        }
    }

    if (next != null) {
        next.finishPartition();
    } else {
        if (!isStreaming()) {
            if (outputPartRowsItr != null) {
                while (outputPartRowsItr.hasNext()) {
                    // Emit the window function results row by row to the next Operator
                    forward(outputPartRowsItr.next(), outputObjInspector);
                }
            }
        }
    }
}
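The non-streaming path in the listing above (accumulate rows, compute on finishPartition, then hand the result to the next stage or emit it) can be sketched as follows. This is a simplified model, not Hive code: BufferedStage and ChainDemo are hypothetical names, rows are integers, the table function is a plain java.util.function.Function, and the streaming and iterator branches are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of a non-streaming stage in a chain; not Hive code.
class BufferedStage {
    final Function<List<Integer>, List<Integer>> tableFn; // partition in, partition out
    final BufferedStage next;                              // null for the last stage
    final List<Integer> output;                            // sink used by the last stage
    private final List<Integer> inputPart = new ArrayList<>();

    BufferedStage(Function<List<Integer>, List<Integer>> tableFn,
                  BufferedStage next, List<Integer> output) {
        this.tableFn = tableFn;
        this.next = next;
        this.output = output;
    }

    void startPartition() {
        inputPart.clear();                     // reuse the same buffer per partition
        if (next != null) next.startPartition();
    }

    void processRow(Integer row) {
        inputPart.add(row);                    // only accumulate; nothing is computed yet
    }

    void finishPartition() {
        List<Integer> outputPart = tableFn.apply(inputPart);
        if (next != null) {
            for (Integer row : outputPart) next.processRow(row);
            next.finishPartition();
        } else {
            output.addAll(outputPart);         // last stage: emit the results
        }
    }
}

class ChainDemo {
    static List<Integer> run(List<Integer> partition) {
        List<Integer> sink = new ArrayList<>();
        BufferedStage doubler = new BufferedStage(part -> {
            List<Integer> r = new ArrayList<>();
            for (Integer v : part) r.add(v * 2);
            return r;
        }, null, sink);
        BufferedStage runningSum = new BufferedStage(part -> {
            List<Integer> r = new ArrayList<>();
            int sum = 0;
            for (Integer v : part) { sum += v; r.add(sum); }
            return r;
        }, doubler, null);
        runningSum.startPartition();
        for (Integer v : partition) runningSum.processRow(v);
        runningSum.finishPartition();
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(1, 2, 3)));
    }
}
```

In this model nothing is emitted until the whole partition has been buffered, which is why a single very large partition is the case to watch for.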

2.3 The PTFPartition class

This class represents the collection of rows to be processed by a TableFunction or WindowFunction, and uses a PTFRowContainer to hold the data.


private final PTFRowContainer<List<Object>> elems; // the container that holds the rows
 
public void append(Object o) throws HiveException {
    // When appending to a PTFPartition, an exception is thrown once the accumulated
    // row count has reached Integer.MAX_VALUE (about 2.1 billion).
    if (elems.rowCount() == Integer.MAX_VALUE) {
        throw new HiveException(String.format("Cannot add more than %d elements to a PTFPartition",
                Integer.MAX_VALUE));
    }
 
    @SuppressWarnings("unchecked")
    List

