• Designing JSON-format fields in an ES index


    properties

    https://www.elastic.co/guide/en/elasticsearch/reference/8.8/properties.html

    Nested objects (object type)

    https://www.elastic.co/guide/en/elasticsearch/reference/8.8/object.html

    Create the index
    curl --location --request PUT 'https://myhost/order_index' \
    --header 'Content-Type: application/json' \
    --data '{
        "mappings": {
            "dynamic": "false",
            "properties": {
                "orderId": {
                    "type": "keyword"
                },
                "orderItems": {
                    "properties": {
                        "itemType": {
                            "type": "keyword"
                        },
                        "itemName": {
                            "type": "keyword"
                        }
                    }
                }
            }
        }
    }'
    
    Write data
    curl --location --request PUT 'https://myhost/order_index/_doc/1' \
    --header 'Content-Type: application/json' \
    --data '{
        "orderId": "1_1",
        "orderItems": [{
            "itemType": "food",
            "itemName": "egg"
        },
        {
            "itemType": "clothes",
            "itemName": "T-shirt"
        }
        ]
    }'
    
    curl --location --request PUT 'https://myhost/order_index/_doc/2' \
    --header 'Content-Type: application/json' \
    --data '{
        "orderId": "2_2",
        "orderItems": [{
            "itemType": "food",
            "itemName": "pork"
        },
        {
            "itemType": "poultryEggs",
            "itemName": "egg"
        }
        ]
    }'
    
    
    Read data
    curl --location --request GET 'https://myhost/order_index/_search' \
    --header 'Content-Type: application/json' \
    --data '{
        "query":{
            "bool":{
                "must":[
                    {
                        "match":{
                            "orderItems.itemType":"food"
                        }
                    },
                    {
                        "match":{
                            "orderItems.itemName":"egg"
                        }
                    }
                ]
            }
        }
    }'
    

    Query result

    {
        "took": 763,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "skipped": 0,
            "failed": 0
        },
        "hits": {
            "total": {
                "value": 2,
                "relation": "eq"
            },
            "max_score": 0.723315,
            "hits": [
                {
                    "_index": "order_index",
                    "_type": "_doc",
                    "_id": "2",
                    "_score": 0.723315,
                    "_source": {
                        "orderId": "2_2",
                        "orderItems": [
                            {
                                "itemType": "food",
                                "itemName": "pork"
                            },
                            {
                                "itemType": "poultryEggs",
                                "itemName": "egg"
                            }
                        ]
                    }
                },
                {
                    "_index": "order_index",
                    "_type": "_doc",
                    "_id": "1",
                    "_score": 0.723315,
                    "_source": {
                        "orderId": "1_1",
                        "orderItems": [
                            {
                                "itemType": "food",
                                "itemName": "egg"
                            },
                            {
                                "itemType": "clothes",
                                "itemName": "T-shirt"
                            }
                        ]
                    }
                }
            ]
        }
    }
    
    Pros and cons
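    1. Disadvantage: the association between the fields of each item object is lost (the official documentation explains this: https://www.elastic.co/guide/en/elasticsearch/reference/8.8/nested.html). A search on two fields is evaluated against the order as a whole rather than against a single item, so the conditions behave like "or" across items. In the example above, order 2 is returned incorrectly: "orderItems.itemType": "food" matches one item and "orderItems.itemName": "egg" matches a different one.
    2. Advantage: the underlying implementation is simple, and in a one-to-one scenario the drawback does not apply. There the query returns the correct result, because "orderItems.itemType": "food" and "orderItems.itemName": "egg" can only match the same item.

    Tip: if order and orderItems have a one-to-one relationship in the database, nested objects work without any problem. If the relationship is one-to-many, searches that combine several item fields can return wrong results, and one of the options below is needed.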

    How is the data flattened?
    {
      "group" :        "fans",
      "user.first" : [ "alice", "john" ],
      "user.last" :  [ "smith", "white" ]
    }
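
    The flattening shown above can be imitated in a few lines of Python to see why the two-field search returns both orders. This is only a sketch: the helper names flatten and matches_all are made up for illustration, and exact list membership stands in for a match on keyword fields. It is not how Elasticsearch is implemented.

```python
from collections import defaultdict

def flatten(doc, prefix=""):
    """Collapse a document into {dotted.field: [values]}, roughly like object fields."""
    flat = defaultdict(list)
    for key, value in doc.items():
        path = f"{prefix}{key}"
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                # Values of every inner object end up in the same per-field list.
                for sub_path, sub_values in flatten(item, path + ".").items():
                    flat[sub_path].extend(sub_values)
            else:
                flat[path].append(item)
    return dict(flat)

def matches_all(flat_doc, conditions):
    """Emulate bool/must: every condition must hit some value of its field."""
    return all(value in flat_doc.get(field, []) for field, value in conditions.items())

# The two orders written in the "Write data" step.
orders = {
    "1": {"orderId": "1_1", "orderItems": [
        {"itemType": "food", "itemName": "egg"},
        {"itemType": "clothes", "itemName": "T-shirt"}]},
    "2": {"orderId": "2_2", "orderItems": [
        {"itemType": "food", "itemName": "pork"},
        {"itemType": "poultryEggs", "itemName": "egg"}]},
}

conditions = {"orderItems.itemType": "food", "orderItems.itemName": "egg"}
flat_hits = sorted(doc_id for doc_id, doc in orders.items()
                   if matches_all(flatten(doc), conditions))

print(flatten(orders["2"]))
print(flat_hits)
```

    Once order 2 is flattened, "food" and "egg" sit in two unrelated lists, so nothing records that they came from different items, and both orders end up in flat_hits.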
    

    Nested documents (nested type)

    https://www.elastic.co/guide/en/elasticsearch/reference/8.8/nested.html

    Create the index
    curl --location --request PUT 'https://myhost/order_index' \
    --header 'Content-Type: application/json' \
    --data '{
        "mappings": {
            "dynamic": "false",
            "properties": {
                "orderId": {
                    "type": "keyword"
                },
                "orderItems": {
                    "type": "nested",
                    "properties": {
                        "itemType": {
                            "type": "keyword"
                        },
                        "itemName": {
                            "type": "keyword"
                        }
                    }
                }
            }
        }
    }'
    
    Write data

    Same as for nested objects.

    Read data
    curl --location --request GET 'https://myhost/order_index/_search' \
    --header 'Content-Type: application/json' \
    --data '{
        "query": {
            "nested": {
                "path": "orderItems",
                "query": {
                    "bool": {
                        "must": [
                            {
                                "match": {
                                    "orderItems.itemType": "food"
                                }
                            },
                            {
                                "match": {
                                    "orderItems.itemName": "egg"
                                }
                            }
                        ]
                    }
                }
            }
        }
    }'
    

    Query result

    {
        "took": 580,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "skipped": 0,
            "failed": 0
        },
        "hits": {
            "total": {
                "value": 1,
                "relation": "eq"
            },
            "max_score": 1.3862942,
            "hits": [
                {
                    "_index": "order_index",
                    "_type": "_doc",
                    "_id": "1",
                    "_score": 1.3862942,
                    "_source": {
                        "orderId": "1_1",
                        "orderItems": [
                            {
                                "itemType": "food",
                                "itemName": "egg"
                            },
                            {
                                "itemType": "clothes",
                                "itemName": "T-shirt"
                            }
                        ]
                    }
                }
            ]
        }
    }
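
    Why only order 1 comes back can be illustrated with a small Python sketch of per-object matching. The function nested_match is a hypothetical helper, and plain equality stands in for a match on keyword fields. It models the semantics of the nested query, not its implementation.

```python
def nested_match(doc, path, conditions):
    """Return True if one single object under `path` satisfies every condition."""
    items = doc.get(path, [])
    if isinstance(items, dict):  # treat a single object like a one-element array
        items = [items]
    return any(
        all(item.get(field) == value for field, value in conditions.items())
        for item in items
    )

# The two orders written in the "Write data" step.
orders = {
    "1": {"orderId": "1_1", "orderItems": [
        {"itemType": "food", "itemName": "egg"},
        {"itemType": "clothes", "itemName": "T-shirt"}]},
    "2": {"orderId": "2_2", "orderItems": [
        {"itemType": "food", "itemName": "pork"},
        {"itemType": "poultryEggs", "itemName": "egg"}]},
}

# Field names are relative to the nested path here.
conditions = {"itemType": "food", "itemName": "egg"}
nested_hits = sorted(doc_id for doc_id, doc in orders.items()
                     if nested_match(doc, "orderItems", conditions))
print(nested_hits)
```

    Both conditions have to hold inside the same item, so order 2, where "food" and "egg" belong to different items, is no longer returned.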
    
    Pros and cons

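    Advantage: it removes the limitation of nested objects, because each item is indexed as its own document and the fields of one item stay associated with each other. The data is also lean: the items are stored as independent documents rather than as multiple redundant copies.
    Disadvantage: with a single nested field such as the one above, multi-level structures are not supported, for example a question with child answers whose answers in turn have child votes (Elastic advises against this kind of multi-level relation). A parent with several kinds of children, for example a question with both answers and comments, is not supported either.
    For example, with the mapping above, storing a document with an extra level fails, because itemType is mapped as keyword and cannot accept an object value: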

    curl --location --request PUT 'https://myhost/order_index/_doc/3' \
    --header 'Content-Type: application/json' \
    --data '{
        "orderId": "3_3",
        "orderItems": [{
            "itemType": {
                "myType": 1
            },
            "itemName": "egg"
        }
        ]
    }'
    

    Error

    {"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"failed to parse field [orderItems.itemType] of type [keyword] in document with id '3'. Preview of field's value: '{myType=1}'"}],"type":"mapper_parsing_exception","reason":"failed to parse field [orderItems.itemType] of type [keyword] in document with id '3'. Preview of field's value: '{myType=1}'","caused_by":{"type":"illegal_state_exception","reason":"Can't get text on a START_OBJECT at 4:21"}},"status":400}
    
    How is the data stored?

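    The official documentation states that each nested object is indexed as a separate document. The example above is therefore stored as 6 documents under the hood (2 orders, each counted once for the order itself and once per item), and the cat API shows the same count.
    [Screenshot of the cat output, 2023-06-27]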
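
    The figure of 6 can be checked with a quick count. This is a sketch under the assumption that every object under a nested path adds one hidden document on top of the order itself. The helper lucene_doc_count is hypothetical and only handles top-level nested paths.

```python
def lucene_doc_count(doc, nested_paths):
    """One root document plus one hidden document per object under each nested path."""
    count = 1  # the order itself
    for path in nested_paths:
        value = doc.get(path, [])
        count += len(value) if isinstance(value, list) else 1
    return count

# The two orders written in the "Write data" step.
orders = [
    {"orderId": "1_1", "orderItems": [
        {"itemType": "food", "itemName": "egg"},
        {"itemType": "clothes", "itemName": "T-shirt"}]},
    {"orderId": "2_2", "orderItems": [
        {"itemType": "food", "itemName": "pork"},
        {"itemType": "poultryEggs", "itemName": "egg"}]},
]

total = sum(lucene_doc_count(order, ["orderItems"]) for order in orders)
print(total)
```

    Each order contributes 3 documents (1 root plus 2 items), which gives the 6 reported for the index.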

    Join

    The examples in the official documentation are detailed, so they are not reproduced here. See: https://www.elastic.co/guide/en/elasticsearch/reference/8.8/parent-join.html

    Pros and cons

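    Advantage: one parent document can have multiple child documents (including children with different data structures), and multi-level relations are also possible (though Elastic does not recommend them).
    Disadvantage: join is expensive to run and consumes more server resources, such as CPU and memory.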
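
    For orientation, the request bodies involved look roughly like the following. This is a minimal sketch that only builds and prints JSON. The field name my_join_field and the question/answer/comment relation follow the pattern of the official example, while the text values and the parent id "1" are placeholders chosen here.

```python
import json

# Mapping with one join field: a question can have answers and comments as children.
join_mapping = {
    "mappings": {
        "properties": {
            "my_join_field": {
                "type": "join",
                "relations": {"question": ["answer", "comment"]},
            }
        }
    }
}

# Parent document: it only names its own side of the relation.
question_doc = {"text": "This is a question", "my_join_field": {"name": "question"}}

# Child document: it names its relation and the _id of its parent.
# It has to be indexed with ?routing=<parent id> so it lands on the parent's shard.
answer_doc = {"text": "This is an answer",
              "my_join_field": {"name": "answer", "parent": "1"}}

for body in (join_mapping, question_doc, answer_doc):
    print(json.dumps(body, indent=2))
```

    Parent and children are separate documents on the same shard, which is where the extra CPU and memory cost at query time comes from.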

    Ingest Pipeline

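    The cost is relatively high because ingest nodes have to be added, so this article does not cover it. See the official documentation for details: https://www.elastic.co/guide/en/elasticsearch/reference/8.8/json-processor.html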

    Summary

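    1. Nested objects under properties (object type) cover most needs and are simple to implement.
    2. The nested type under properties is queried somewhat differently. Use it when a search over an array of JSON objects requires the conditions on the object's fields to be combined with "and" within the same object.
    3. Fields of a sub-object can be added dynamically. With dynamic=false, a newly added field cannot be searched, and making it searchable requires a reindex. With dynamic=true, a newly added field is searchable right away. In the example below, with dynamic=true, the new itemId field can be used in searches directly.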
    curl --location --request PUT 'https://myhost/order_index/_doc/3' \
    --header 'Content-Type: application/json' \
    --data '{
        "orderId": "2_1",
        "orderItems": [{
            "itemType": "food",
            "itemName": "pork",
            "itemId": 2
        }]
    }'
    

    Note: a sub-object that does not specify dynamic inherits the setting of its parent. See the official documentation for details: https://www.elastic.co/guide/en/elasticsearch/reference/8.8/dynamic.html
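
    The inheritance rule can be sketched as a walk down the mapping. The helper effective_dynamic is hypothetical and written only for illustration. It assumes the Elasticsearch default of dynamic=true when nothing is set at any level.

```python
def effective_dynamic(mappings, path):
    """Walk a dotted `path` through a mapping and return the `dynamic` value in effect."""
    current = mappings
    dynamic = current.get("dynamic", "true")  # default when the mapping sets nothing
    for name in path.split("."):
        current = current.get("properties", {}).get(name)
        if current is None:
            raise KeyError(f"{path!r} is not in the mapping")
        # An explicit setting on this level overrides the inherited one.
        dynamic = current.get("dynamic", dynamic)
    return dynamic

# The mapping used in this article: dynamic is set only at the top level.
mappings = {
    "dynamic": "false",
    "properties": {
        "orderId": {"type": "keyword"},
        "orderItems": {
            "properties": {
                "itemType": {"type": "keyword"},
                "itemName": {"type": "keyword"},
            }
        },
    },
}

inherited = effective_dynamic(mappings, "orderItems")

# Same mapping, but orderItems overrides the parent's setting.
overridden_mappings = {
    "dynamic": "false",
    "properties": {"orderItems": {"dynamic": "true", "properties": {}}},
}
overridden = effective_dynamic(overridden_mappings, "orderItems")

print(inherited, overridden)
```

    With the article's mapping, orderItems inherits "false" from the top level, so a new field such as itemId stays unsearchable unless orderItems (or the top level) sets dynamic to true.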

  • Original article: https://blog.csdn.net/u010597819/article/details/131748186