Elasticsearch (3): Basic Usage of Aggregations

Basic Concepts

bucket

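A bucket is a group of documents. Documents are partitioned by the value of a field, and all documents with the same value land in the same bucket. You can think of it as a Map in Java, or as the result of a GROUP BY query in MySQL.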

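metric

A metric is a statistic computed over one bucket, such as a maximum, minimum, or average. It corresponds to the MAX(), MIN(), and AVG() functions in MySQL, which are typically applied after GROUP BY.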
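To make the two terms concrete, here is a minimal Python sketch of the same idea on an in-memory list, with no Elasticsearch involved. The records and the helper names bucket_by and metrics_for are made up for illustration; only the field names requiredServices and amount are taken from the sample document used in this post.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records; the values are invented for this sketch
notices = [
    {"requiredServices": "工程建筑", "amount": 100.0},
    {"requiredServices": "工程建筑", "amount": 300.0},
    {"requiredServices": "工程咨询", "amount": 50.0},
]

def bucket_by(records, field):
    """Group records by the value of `field` (the bucket step)."""
    buckets = defaultdict(list)
    for record in records:
        buckets[record[field]].append(record)
    return dict(buckets)

def metrics_for(records, field):
    """Compute simple statistics over one bucket (the metric step)."""
    values = [r[field] for r in records]
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": mean(values),
    }

buckets = bucket_by(notices, "requiredServices")
summary = {key: metrics_for(docs, "amount") for key, docs in buckets.items()}
print(summary)
```

Elasticsearch does the same two steps on the server side: a bucket aggregation decides which documents belong together, and a metric aggregation reduces each group to a number.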

Examples

The examples below use documents with the following structure:

{
  "_index" : "zb_notice",
  "_type" : "_doc",
  "_id" : "4451224572914342308301065",
  "_score" : 1.0,
  "_source" : {
    "_class" : "NoticeEntity",
    "id" : "111",
    "url" : "https://xxxxxx/purchaseNotice/view/111?",
    "owner" : "河管养所",
    "procurementName" : "工程建筑",
    "procurementNameText" : "应急抢险配套工程建筑",
    "intermediaryServiceMatters" : "无（属于非行政管理的中介服务项目采购）",
    "investmentApprovalProject" : "是",
    "code" : "789456",
    "scale" : 3.167183E8,
    "scaleText" : "投资额（￥316,718,300.00元）",
    "area" : "",
    "requiredServices" : "工程建筑",
    "typeCodes" : [
      "021"
    ],
    "context" : "是一座具有灌溉 、供水 、排洪 、交通和挡潮蓄淡等多功能的大（2）型水闸工程，承担黄冈河下游 8.65 万亩农田的灌溉任务并",
    "timeLimit" : "具体时限以合同条款约定为准。",
    "amount" : 0.0,
    "amountText" : "暂不做评估与测算",
    "amountDescription" : "",
    "selectIntermediaryType" : "直接选取",
    "isChooseIntermediary" : "否",
    "isAvoidance" : "否",
    "endTime" : "2023-09-04 09:30:00",
    "startTime" : "2023-08-31",
    "files" : [
      {
        "fileName" : "东溪水闸初设批复(1).pdf",
        "url" : "/aa/bb/file/downloadfile/PjAttachment/123456"
      }
    ]
  }
}

Counting notices per service type

GET zb_notice/_search
{
  "size": 0,
  "aggs": {
    "song_qty_by_language": {
      "terms": {
        "field": "requiredServices"
      }
    }
  }
}

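Syntax notes:

size: 0 returns only the aggregation results and omits the matching documents.
aggs: the fixed keyword under which every aggregation is declared (aggregations is also accepted).
song_qty_by_language: the name of this aggregation. Any name works, but a descriptive one is recommended.
terms: the aggregation type; it buckets documents by the distinct values of a field.
field: the name of the field to bucket by. This generally has to be a keyword field, because text fields do not support terms aggregations unless fielddata is enabled.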

The response looks like this:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "song_qty_by_language": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "doc_count": 5
        }
      ]
    }
  }
}

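Response notes:

hits: hits.hits is empty because the request set size to 0.
aggregations: the aggregation results.
song_qty_by_language: the name declared in the request.
buckets: the list of buckets produced for the chosen field. Each element of the array is one bucket; key is the field value that bucket represents, and doc_count is the number of documents in it.

Buckets are sorted by doc_count in descending order by default.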
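On the client side, the buckets array is usually flattened into something easier to work with. The following is a small sketch, assuming the response has already been parsed into a Python dict; the helper name terms_counts is hypothetical, and the keys and counts in sample_response are invented to match the shape described above.

```python
def terms_counts(response, agg_name):
    """Return (key, doc_count) pairs for a terms aggregation, in response order."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [(bucket.get("key"), bucket["doc_count"]) for bucket in buckets]

# Hand-written response; the bucket keys and counts are made up for this sketch
sample_response = {
    "hits": {"total": 5, "max_score": 0, "hits": []},
    "aggregations": {
        "song_qty_by_language": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "工程建筑", "doc_count": 3},
                {"key": "工程咨询", "doc_count": 2},
            ],
        }
    },
}

counts = terms_counts(sample_response, "song_qty_by_language")
for key, doc_count in counts:
    print(key, doc_count)
```

Because the server already returns buckets sorted by doc_count, the first pair is the service type with the most notices.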

Average service fee per service type

GET zb_notice/_search
{
  "size": 0,
  "aggs": {
    "lang": {
      "terms": {
        "field": "requiredServices"
      },
      "aggs": {
        "length_avg": {
          "avg": {
            "field": "amount"
          }
        }
      }
    }
  }
}

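This is a two-level aggregation: the outer terms aggregation groups notices by service type, and the inner avg aggregation computes the average fee within each bucket.

Deeper nesting works the same way; just pay attention to where each aggs block sits. A sub-aggregation's aggs block is a sibling of the terms definition, inside the named aggregation.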
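Since the placement of the nested aggs block is the part that is easy to get wrong, it can help to build the request body programmatically. Below is a minimal sketch that produces the same body as the request above as a plain Python dict; the helper name terms_with_metrics is an assumption of this sketch and not part of any Elasticsearch client.

```python
import json

def terms_with_metrics(group_field, metrics):
    """Build a terms aggregation with metric sub-aggregations.

    `metrics` maps a sub-aggregation name to a (metric_type, field) pair.
    """
    sub_aggs = {
        name: {metric_type: {"field": field}}
        for name, (metric_type, field) in metrics.items()
    }
    # "terms" and the nested "aggs" are siblings inside the named aggregation
    return {"terms": {"field": group_field}, "aggs": sub_aggs}

body = {
    "size": 0,
    "aggs": {
        "lang": terms_with_metrics(
            "requiredServices",
            {"length_avg": ("avg", "amount")},
        ),
    },
}
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after GET zb_notice/_search, or passed as the request body by whichever client you use.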

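Maximum, minimum, and other fee statistics per service type

The most commonly used statistics are count, avg, max, min, and sum, with the same meaning as in MySQL. The count is already available as each bucket's doc_count, so the request below only declares the other four.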

GET zb_notice/_search
{
  "size": 0,
  "aggs": {
    "color": {
      "terms": {
        "field": "requiredServices"
      },
      "aggs": {
        "length_avg": {
          "avg": {
            "field": "amount"
          }
        },
        "length_max": {
          "max": {
            "field": "amount"
          }
        },
        "length_min": {
          "min": {
            "field": "amount"
          }
        },
        "length_sum": {
          "sum": {
            "field": "amount"
          }
        }
      }
    }
  }
}
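Each metric sub-aggregation comes back inside its bucket as an object with a value field, keyed by the name given in the request. Here is a hedged sketch of reading those values out of a parsed response; the helper name bucket_metrics is hypothetical, and the numbers in sample_response are invented purely to show the shape.

```python
def bucket_metrics(response, agg_name, metric_names):
    """Map each bucket key to a dict of its metric values."""
    rows = {}
    for bucket in response["aggregations"][agg_name]["buckets"]:
        rows[bucket["key"]] = {
            name: bucket[name]["value"] for name in metric_names
        }
    return rows

# Hand-written response fragment; the figures are made up for this sketch
sample_response = {
    "aggregations": {
        "color": {
            "buckets": [
                {
                    "key": "工程建筑",
                    "doc_count": 2,
                    "length_avg": {"value": 200.0},
                    "length_max": {"value": 300.0},
                    "length_min": {"value": 100.0},
                    "length_sum": {"value": 400.0},
                }
            ]
        }
    }
}

names = ["length_avg", "length_max", "length_min", "length_sum"]
rows = bucket_metrics(sample_response, "color", names)
print(rows)
```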

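Counting notices by publish-date interval

Monthly counts

date_histogram has much the same syntax as histogram. interval sets the bucket width, min_doc_count: 0 keeps empty buckets in the response, and extended_bounds forces the buckets to cover the given time range even where no documents fall in it. The example assumes the index has a date field named publishTime, which the sample document above does not show.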

GET zb_notice/_search
{
  "size": 0,
  "aggs": {
    "sales": {
      "date_histogram": {
        "field": "publishTime",
        "interval": "month",
        "format": "yyyy-MM-dd",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2023-01-01",
          "max": "2023-12-31"
        }
      }
    }
  }
}
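The effect of min_doc_count: 0 together with extended_bounds is easiest to see on a small in-memory example. The sketch below imitates the monthly bucketing in plain Python; it is an illustration of the behaviour, not how Elasticsearch implements it, and the function names and dates are made up.

```python
from collections import Counter
from datetime import date

def month_starts(first, last):
    """Yield the first day of every month from `first` to `last`, inclusive."""
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1

def monthly_histogram(dates, bounds_min, bounds_max):
    """Count dates per calendar month, keeping empty months inside the bounds."""
    counts = Counter(d.replace(day=1) for d in dates)
    # Like extended_bounds, the bounds widen the range but never filter data out
    first = min([bounds_min, *counts])
    last = max([bounds_max, *counts])
    return [(m.isoformat(), counts.get(m, 0)) for m in month_starts(first, last)]

# Hypothetical publish dates
publish_dates = [date(2023, 8, 31), date(2023, 9, 4), date(2023, 9, 20)]
histogram = monthly_histogram(publish_dates, date(2023, 1, 1), date(2023, 12, 31))
for month, doc_count in histogram:
    print(month, doc_count)
```

Without the bounds, only August and September would appear. With them, all twelve months are listed and the empty ones carry a count of 0, which is usually what a chart needs.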

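interval accepts day, week, month, quarter, year, and so on. On Elasticsearch 7.2 and later, interval is deprecated in favor of calendar_interval and fixed_interval. The idea extends naturally: the next request buckets by quarter, breaks each quarter down by service type with a fee total per type, and also computes a fee total for the whole quarter.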

GET zb_notice/_search
{
  "size": 0,
  "aggs": {
    "sales": {
      "date_histogram": {
        "field": "publishTime",
        "interval": "quarter",
        "format": "yyyy-MM-dd",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2019-01-01",
          "max": "2019-12-31"
        }
      },
      "aggs": {
        "lang_qty": {
          "terms": {
            "field": "requiredServices"
          },
          "aggs": {
            "like_sum": {
              "sum": {
                "field": "amount"
              }
            }
          }
        },
        "total" :{
          "sum": {
            "field": "amount"
          }
        }
      }
    }
  }
}

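Adding filter conditions

Aggregations can be combined with query, which is equivalent to using WHERE together with GROUP BY in MySQL. The aggregation only sees the documents that match the query.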
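As a rough illustration of that order of operations, here is a plain Python sketch that filters first and groups second. It runs on an in-memory list rather than on Elasticsearch; the helper name count_by and the record values are invented, and only the field names come from the sample document.

```python
from collections import Counter

# Hypothetical records; "other" is a placeholder value for this sketch
notices = [
    {"requiredServices": "工程咨询", "selectIntermediaryType": "直接选取"},
    {"requiredServices": "工程建筑", "selectIntermediaryType": "直接选取"},
    {"requiredServices": "工程咨询", "selectIntermediaryType": "直接选取"},
    {"requiredServices": "工程咨询", "selectIntermediaryType": "other"},
]

def count_by(records, field, where=None):
    """Filter first (the query / WHERE step), then count per value (the aggs / GROUP BY step)."""
    selected = [r for r in records if where is None or where(r)]
    # most_common() sorts by count in descending order
    return Counter(r[field] for r in selected).most_common()

all_counts = count_by(notices, "requiredServices")
direct_counts = count_by(
    notices,
    "requiredServices",
    where=lambda r: r["selectIntermediaryType"] == "直接选取",
)
print(all_counts)
print(direct_counts)
```

The second call counts fewer documents per bucket because the condition removed one record before grouping, which is exactly what adding query to an aggregation request does.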

With a query

GET zb_notice/_search
{
  "size": 0,
  "query": {
    "match": {
      "requiredServices": "工程咨询"
    }
  },
  "aggs": {
    "sales": {
      "terms": {
        "field": "requiredServices"
      }
    }
  }
}

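With a filter

A term filter wrapped in constant_score matches the exact value without computing relevance scores, which is all an aggregation-only request needs.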

GET zb_notice/_search
{
  "size": 0,
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "requiredServices": "工程咨询"
        }
      }
    }
  },
  "aggs": {
    "sales": {
      "terms": {
        "field": "requiredServices"
      }
    }
  }
}

Original article: https://blog.csdn.net/qq_42583549/article/details/132795041