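Deep Paging in ElasticSearch with Support for Small Page Jumps

Background

A recent project of mine involved log collection: my service is the receiving end, taking packets sent over UDP, storing them in ES, and then querying ES to display the data page by page. I soon found that ES is not very friendly to pagination. It distinguishes shallow paging from deep paging. Shallow paging (from + size) works like LIMIT in MySQL, but by default it can only reach the first 10000 hits: at 100 records per page you can turn at most 100 pages, and going beyond that raises an error. So you either restrict the query so that the data stays within 10000 records, or you restrict how far the front end can page.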

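Below we use the search after deep paging that ES provides to implement small page jumps. By a small jump I mean that although I cannot go straight from the first page to the last, I can still jump a few pages at a time by caching cursors. On its own, search after can only page forward. I am not very familiar with scroll, but I expect the principle is similar.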
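
To make the cursor idea concrete, here is a minimal in-memory sketch of how search after behaves. It does not call ES; the class and method names (`SearchAfterSketch`, `pageAfter`) are made up for illustration, and a descending list of ids stands in for an index sorted by a unique field.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SearchAfterSketch {

    /**
     * Returns up to pageSize ids that come strictly after the cursor in
     * descending order. A null cursor means "start from the first page".
     */
    static List<Long> pageAfter(List<Long> sortedDesc, Long cursor, int pageSize) {
        List<Long> page = new ArrayList<>();
        for (Long id : sortedDesc) {
            // Skip everything up to and including the cursor value
            if (cursor != null && id >= cursor) continue;
            page.add(id);
            if (page.size() == pageSize) break;
        }
        return page;
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long i = 1; i <= 10; i++) ids.add(i);
        ids.sort(Comparator.reverseOrder());

        List<Long> first = pageAfter(ids, null, 3);
        // The last sort value of a page is the cursor for the next page
        Long cursor = first.get(first.size() - 1);
        List<Long> second = pageAfter(ids, cursor, 3);
        System.out.println(first + " then " + second);
    }
}
```

Because each page can only be fetched from the cursor of the page before it, jumping is only possible to pages whose cursor has already been recorded, which is what the cache in this article is for.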

Environment

JDK 8, ES 7.6.1, Maven 3.3.9, Spring Boot 2.3.2

Code

Add dependencies

<dependencies>
	<dependency>
		<groupId>org.projectlombok</groupId>
		<artifactId>lombok</artifactId>
		<version>1.18.20</version>
	</dependency>

	<dependency>
		<groupId>cn.hutool</groupId>
		<artifactId>hutool-all</artifactId>
		<version>5.8.5</version>
	</dependency>

	<dependency>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-data-elasticsearch</artifactId>
	</dependency>

	<dependency>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-web</artifactId>
	</dependency>

	<dependency>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-test</artifactId>
		<scope>test</scope>
	</dependency>
</dependencies>

Configuration

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * @author 朝花不迟暮
 * @version 1.0
 * @date 2020/9/26 9:08
 */
@Configuration
public class ElasticSearchClientConfig
{
    @Bean
    public RestHighLevelClient restHighLevelClient()
    {
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("119.29.10.76", 9200, "http"))
        );
        return client;
    }
}

Create the entities

Index class

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import lombok.ToString;

import java.util.Date;

@Data
@AllArgsConstructor
@NoArgsConstructor
@ToString
public class Document {
    /**
     * Unique id in ES
     */
    private Long id;
    /**
     * Document title
     */
    private String title;
    /**
     * Document content
     */
    private String content;
    /**
     * Creation time
     */
    private Date createTime;
    /**
     * Current time
     */
    private Long currentTime;
}

The transfer object (DTO). It admittedly has some redundant fields; take what is useful and drop the rest.

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import lombok.ToString;

import java.util.Date;

@Data
@AllArgsConstructor
@NoArgsConstructor
@ToString
public class DocumentDTO {

    private Integer pageNum = 1;

    private Integer pageSize = 10;

    /**
     * Unique id in ES
     */
    private Long id;

    /**
     * Document title
     */
    private String title;

    /**
     * Document content
     */
    private String content;

    /**
     * Creation time
     */
    private Date createTime;

    /**
     * Current time
     */
    private Long currentTime;

    /**
     * Start time
     */
    private String startTime;

    /**
     * End time
     */
    private String endTime;

    /**
     * Cursor (sort values) of the last page
     */
    private Object[] lastPageSort;
}

The return object (VO); customize it as needed.

import com.study.sample.entity.Document;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import lombok.ToString;

import java.util.List;

@Data
@AllArgsConstructor
@NoArgsConstructor
@ToString
public class DocumentVO {

    private Integer pageNum;

    private Integer pageSize;

    private long total;

    private List<Document> data;
}

Service layer

import cn.hutool.core.map.MapUtil;
import cn.hutool.core.util.StrUtil;
import com.study.sample.entity.Document;
import com.study.sample.entity.dto.DocumentDTO;
import com.study.sample.entity.vo.DocumentVO;
import com.study.sample.service.DocumentService;
import com.study.sample.utils.DateParseUtil;
import com.study.sample.utils.EsClientUtil;
import lombok.extern.slf4j.Slf4j;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.sort.SortOrder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import javax.servlet.http.HttpServletRequest;
import java.io.IOException;
import java.util.*;

@Service
@Slf4j
public class DocumentServiceImpl implements DocumentService {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

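    // Cursor cache: session id -> (page number -> search_after cursor).
    // Wrapped in synchronizedMap because this singleton service is called from concurrent request threads.
    private static final Map<String, Map<Integer, Object[]>> sortMap = Collections.synchronizedMap(new HashMap<>(256));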

    @Override
    public DocumentVO deepSearchPage(DocumentDTO documentDTO, HttpServletRequest req) {
        String id = req.getSession().getId();
        // Query builders
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
        // Data to return
        List<Document> documents = new ArrayList<>();
        // Current page
        int currentPageNum;
        // Total hit count
        long total = 0;

        DocumentVO documentVO = new DocumentVO();

        if (StrUtil.isEmpty(id)) throw new RuntimeException("session id must not be empty");

        // Page number -> cursor map for this session (null if nothing is cached yet)
        Map<Integer, Object[]> pageMap = sortMap.get(id);

        //---------------query conditions: start--------------
        // Range query
        if (documentDTO.getStartTime() != null) {
            Date startDate = DateParseUtil.parseString2Date(documentDTO.getStartTime());
            boolQueryBuilder.filter(QueryBuilders.rangeQuery("createTime").gte(startDate));
        }
        if (documentDTO.getEndTime() != null) {
            Date endDate = DateParseUtil.parseString2Date(documentDTO.getEndTime());
            boolQueryBuilder.filter(QueryBuilders.rangeQuery("createTime").lte(endDate));
        }
        // Fuzzy (wildcard) match
        if (documentDTO.getContent() != null) {
            // To match the same value against several fields:
            // boolQueryBuilder.filter(QueryBuilders.multiMatchQuery(documentDTO.getContent(), fields));

            boolQueryBuilder.must((QueryBuilders.wildcardQuery("content", documentDTO.getContent())));
        }
        if (documentDTO.getTitle() != null) {
            boolQueryBuilder.should((QueryBuilders.wildcardQuery("title", documentDTO.getTitle())));
        }
        //---------------query conditions: end----------------

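        /* Branch 1: not the first page, the page map is not empty, it contains the requested page,
         * and the requested page is smaller than the map size.
         * The last condition is the key one. When the page number equals the map size, the user has reached
         * the last cached page and cursors for the next 5 pages must be fetched, so that boundary case is
         * handled in the last branch. This branch only serves pages whose cursor is already cached. */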
        if (documentDTO.getPageNum() != 1 && MapUtil.isNotEmpty(pageMap)
                && pageMap.containsKey(documentDTO.getPageNum())
                && pageMap.size() > documentDTO.getPageNum()) {
            try {
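                // Build the query
                searchSourceBuilder.query(boolQueryBuilder)
                        .sort("_id", SortOrder.DESC) // the cursor consists of whatever fields you sort by
                        .size(documentDTO.getPageSize());
                // The cursor for this page comes from the cache; the page/cursor offset was handled when it was stored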
                searchSourceBuilder.searchAfter(pageMap.get(documentDTO.getPageNum()));
                SearchRequest searchRequest2 = new SearchRequest("document")
                        .source(searchSourceBuilder);
                SearchResponse searchResponse2 = restHighLevelClient.search(searchRequest2, RequestOptions.DEFAULT);
                SearchHits searchHits = searchResponse2.getHits();
                if (searchHits.getTotalHits().value > 0) {
                    SearchHit[] hits = searchHits.getHits();
                    EsClientUtil.convertResult(documents, Document.class, hits);
                    total = searchHits.getTotalHits().value;
                }
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
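        /* Branch 2: when pageNum == 1, I assume the page has just been loaded or refreshed. It could also be
         * the user going back from page 2, but that is not distinguished here: whenever the page is 1,
         * the page-to-cursor mapping is rebuilt. */
        else if (documentDTO.getPageNum() == 1) {
            // Drop the old cache first
            sortMap.remove(id);
            // The entry was just removed, so pageMap must be initialized here
            pageMap = new HashMap<>();
            // Cursor
            Object[] sortValues;
            // Current page
            currentPageNum = 1;
            // Next page
            int nextPageNum = currentPageNum + 1;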

            try {
                searchSourceBuilder.query(boolQueryBuilder)
                        .sort("_id", SortOrder.DESC)
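                        .from(0) // must be 0. You might think you could loop here and raise from to reach page 2, but that does not work:
                        // the results come back slightly shifted, and the easy-es framework flatly disallows from > 0 in deep-paging queries
                        .size(documentDTO.getPageSize());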
                SearchRequest searchRequest = new SearchRequest("document").source(searchSourceBuilder);
                SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
                SearchHit[] hits = searchResponse.getHits().getHits();
                if (hits.length != 0) {
                    // The last hit of the page supplies the cursor
                    SearchHit result = hits[hits.length - 1];
                    sortValues = result.getSortValues();
                    pageMap.put(1, new Object[]{}); // page 1 has no cursor
                    pageMap.put(2, sortValues); // the cursor from page 1 is used to fetch page 2, hence key 2
                    EsClientUtil.convertResult(documents, Document.class, hits);
                    total = searchResponse.getHits().getTotalHits().value;
                }

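                // Prefetch cursors for the next 5 pages. nextPageNum and currentPageNum exist to keep
                // the page-number-to-cursor mapping straight.
                for (int i = nextPageNum; i < nextPageNum + 5; i++) {
                    // Cursor for page i, i.e. the last sort values of page i - 1
                    Object[] cursor = pageMap.get(i);
                    // No cursor means the previous query returned nothing; searchAfter(null) would throw
                    if (cursor == null) break;
                    searchSourceBuilder.searchAfter(cursor);
                    SearchRequest searchRequest2 = new SearchRequest("document")
                            .source(searchSourceBuilder);
                    SearchResponse searchResponse2 = restHighLevelClient.search(searchRequest2, RequestOptions.DEFAULT);
                    SearchHits searchHits = searchResponse2.getHits();
                    if (searchHits.getTotalHits().value > 0) {
                        SearchHit[] nextHits = searchHits.getHits();
                        // With little data and a large pageSize there may not even be 5 pages, so check on
                        // every iteration and stop as soon as a page is short: nothing is left to page
                        // through, and reading the last hit of an empty page would go out of bounds.
                        if (nextHits.length < documentDTO.getPageSize()) break;
                        SearchHit nextHit = nextHits[nextHits.length - 1];
                        sortValues = nextHit.getSortValues();
                        // Stored under keys 3, 4, 5, 6, 7
                        pageMap.put(i + 1, sortValues);
                    }
                }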
            } catch (IOException e) {
                throw new RuntimeException(e);
            }

        }
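        /* Branch 3, the boundary: the front end currently shows pages up to, say, page 7 and the requested page
         * is exactly 7, so fetch the cursors that follow and map them to their page numbers.
         * The null check avoids a NullPointerException when nothing has been cached for this session yet. */
        else if (pageMap != null && pageMap.containsKey(documentDTO.getPageNum()) && pageMap.size() == documentDTO.getPageNum()) {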
            searchSourceBuilder.query(boolQueryBuilder)
                    .sort("_id", SortOrder.DESC)
                    .size(documentDTO.getPageSize());
            currentPageNum = documentDTO.getPageNum();
            try {
                for (int i = currentPageNum; i < currentPageNum + 5; i++) {
                    // Note that the cursor for the current page is already in the map from the earlier prefetch
                    searchSourceBuilder.searchAfter(pageMap.get(i));
                    SearchRequest searchRequest2 = new SearchRequest("document")
                            .source(searchSourceBuilder);
                    SearchResponse searchResponse2 = restHighLevelClient.search(searchRequest2, RequestOptions.DEFAULT);
                    SearchHits searchHits = searchResponse2.getHits();
                    total = searchHits.getTotalHits().value;

                    if (searchHits.getTotalHits().value > 0) {
                        SearchHit[] hits = searchHits.getHits();
                        // Only the requested page is returned; later iterations just collect cursors
                        if (i == documentDTO.getPageNum()) {
                            EsClientUtil.convertResult(documents, Document.class, hits);
                        }
                        // Data boundary, as above: a short page means there is nothing after it
                        if (hits.length < documentDTO.getPageSize()) break;
                        SearchHit result = hits[hits.length - 1];
                        Object[] sortValues = result.getSortValues();
                        // Store the cursor for the next page
                        pageMap.put(i + 1, sortValues);
                    }
                }
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }
        documentVO.setPageNum(documentDTO.getPageNum());
        documentVO.setPageSize(documentDTO.getPageSize());
        documentVO.setTotal(total);
        documentVO.setData(documents);
        sortMap.put(id, pageMap);
        return documentVO;
    }
}

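Summary of the approach

The code and comments above should already make the idea clear. There is really only one idea: cache the cursors. The method has three branches. The first checks whether the requested page is already in the cache; if it is, the cursor is taken from the cache, the query is run, and the result is returned to the front end.
The second branch checks whether this is an initial load. If so, the previously cached cursors are cleared, because new data may have arrived in the meantime. If your data never grows, you do not even need to key the cache by session: build a single cursor cache and delete it whenever new data does arrive (for example, once a day). This branch also fetches the cursors for the next five pages.
The third branch handles the boundary and has three jobs: fetch the cursors for the next 5 pages, detect when the data has run out, and return the data of the current boundary page.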
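
The three branches can be sketched without ES as a small in-memory model. This is a simplified illustration, not the service code above: `CursorCacheSketch`, `fetch`, `prefetchFrom`, and `LOOKAHEAD` are hypothetical names, a list of ids sorted in descending order replaces the index, and the prefetch from page 1 caches five pages ahead rather than the exact key layout used in the article.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CursorCacheSketch {
    static final int LOOKAHEAD = 5;

    private final List<Long> sortedDesc;
    private final int pageSize;
    // page number -> cursor (last sort value of the previous page); page 1 maps to null
    private final Map<Integer, Long> cursors = new HashMap<>();

    CursorCacheSketch(List<Long> sortedDesc, int pageSize) {
        this.sortedDesc = sortedDesc;
        this.pageSize = pageSize;
    }

    // Stand-in for one search_after query against ES
    private List<Long> query(Long cursor) {
        List<Long> page = new ArrayList<>();
        for (Long id : sortedDesc) {
            if (cursor != null && id >= cursor) continue;
            page.add(id);
            if (page.size() == pageSize) break;
        }
        return page;
    }

    // Walks forward from startPage and records the cursor of each following page
    private void prefetchFrom(int startPage) {
        for (int p = startPage; p < startPage + LOOKAHEAD; p++) {
            List<Long> hits = query(cursors.get(p));
            if (hits.size() < pageSize) break; // short page: nothing after it
            cursors.put(p + 1, hits.get(hits.size() - 1));
        }
    }

    // Keys are contiguous from 1, so the map size is the highest cached page
    int maxCachedPage() {
        return cursors.size();
    }

    /** Returns the page, or an empty list if the page is not reachable from the cache. */
    List<Long> fetch(int pageNum) {
        if (pageNum == 1) {                       // initial load: rebuild the cache
            cursors.clear();
            cursors.put(1, null);
            prefetchFrom(1);
            return query(null);
        }
        if (!cursors.containsKey(pageNum)) {      // jump is too far ahead
            return new ArrayList<>();
        }
        if (pageNum == maxCachedPage()) {         // boundary: extend the cache
            prefetchFrom(pageNum);
        }
        return query(cursors.get(pageNum));       // cached cursor: one query
    }

    public static void main(String[] args) {
        List<Long> data = new ArrayList<>();
        for (long i = 100; i >= 1; i--) data.add(i);
        CursorCacheSketch cache = new CursorCacheSketch(data, 10);
        System.out.println(cache.fetch(1));
        System.out.println(cache.fetch(4));
    }
}
```

In this model a jump within the cached range costs a single query, while reaching the last cached page costs up to five extra queries to extend the range.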

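Follow-up

First, maintaining the cache: how do you remove entries once a session is no longer valid? In my project there is a second map that records each session's last query time. A scheduled task checks it, and once a session has been idle longer than a threshold, its entry is removed from sortMap.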
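
A minimal sketch of that clean-up, assuming a second map keyed by session id. The names `lastQueryTime`, `touch`, and `evictIdle` are hypothetical and not taken from the project; in a Spring application `evictIdle` could be called from a scheduled method.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class CursorCacheEvictionSketch {
    // session id -> (page number -> cursor), mirroring sortMap in the service
    final Map<String, Map<Integer, Object[]>> sortMap = new ConcurrentHashMap<>();
    // session id -> time of the last query, in epoch milliseconds
    final Map<String, Long> lastQueryTime = new ConcurrentHashMap<>();

    // Call on every query so that active sessions stay cached
    void touch(String sessionId, long nowMillis) {
        lastQueryTime.put(sessionId, nowMillis);
    }

    /** Removes sessions idle for longer than maxIdleMillis and returns how many were evicted. */
    int evictIdle(long nowMillis, long maxIdleMillis) {
        int evicted = 0;
        // ConcurrentHashMap iterators tolerate removal during iteration
        for (Map.Entry<String, Long> entry : lastQueryTime.entrySet()) {
            if (nowMillis - entry.getValue() > maxIdleMillis) {
                sortMap.remove(entry.getKey());
                lastQueryTime.remove(entry.getKey());
                evicted++;
            }
        }
        return evicted;
    }
}
```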

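Second, a suggestion about time fields. I recommend storing timestamps as a Long. Even if that is inconvenient, keep both a Date field and a Long field. The time value you read back from ES is awkward to convert to a Date, so comparing on the Long is easier. I really do recommend it!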
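
As a small illustration of why the Long is convenient, the sketch below compares and converts epoch milliseconds using only the JDK. `EpochMillisSketch` and its methods are hypothetical helpers, and the assumption that `currentTime` holds epoch milliseconds is mine.

```java
import java.util.Date;

class EpochMillisSketch {

    // A Long timestamp converts to a Date with no format parsing involved
    static Date toDate(long epochMillis) {
        return new Date(epochMillis);
    }

    // Range checks on a Long are plain numeric comparisons
    static boolean inRange(long currentTime, long startMillis, long endMillis) {
        return currentTime >= startMillis && currentTime <= endMillis;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println(toDate(now));
        System.out.println(inRange(now, now - 1000, now + 1000));
    }
}
```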

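Third, on shallow paging. We did not start with the deep paging approach. At first we limited the number of records loaded and arranged the back-end logic so that the 10000 limit was not exceeded. We also agreed with the front end that, at 50 records per page for example, the total number of pages must not exceed 200, that the total page count is not shown, and that there is no "last page" button, so users move forward 5 pages at a time. If your customer does not allow this, my advice is to give up ES and embrace MySQL, and the dilemma resolves itself!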
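
The page cap agreed with the front end is simple arithmetic, sketched below. The class and method names are hypothetical; 10000 is the default value of the ES setting `index.max_result_window`.

```java
class ShallowPagingLimitSketch {
    // Default ES limit on from + size (index.max_result_window)
    static final int MAX_RESULT_WINDOW = 10_000;

    // Highest page number that from + size paging can still reach
    static int maxReachablePage(int pageSize) {
        return MAX_RESULT_WINDOW / pageSize;
    }

    // Number of pages the front end should offer for a given hit count
    static long visiblePages(long totalHits, int pageSize) {
        long pages = (totalHits + pageSize - 1) / pageSize; // round up
        return Math.min(pages, maxReachablePage(pageSize));
    }

    public static void main(String[] args) {
        System.out.println(maxReachablePage(50));
        System.out.println(visiblePages(1_000_000L, 50));
    }
}
```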

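Fourth, if you are free to choose your stack and are not familiar with the traditional API, consider easy-es. It lets you work with ES much as you would with a relational database, and it wraps deep paging queries for you.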

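Fifth, the code above may still have gaps, particularly around page jumps, which I may not have handled thoroughly yet. As long as the front end only jumps within the pages it is shown, there should be no problems!

If you have questions, you can contact 707409741.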

Original article: https://blog.csdn.net/Curtisjia/article/details/127658239
