• hive 之select 中文乱码


    此处的中文乱码和mysql的库表 编码 latin utf 无关。

    直接上案例。

    有时候我们需要自定义一列,有时是汉字有时是字母,结果遇到这种情况了。

    说实话看到这真是糟心。这谁受得了。

    单独select 没有任何问题。

    这是怎么回事呢? 经过一番检查,发现有个地方类似与 "境内"  as col但是没乱码,

    此时怀疑就是if 函数起了作用,但是一时间不知道是为啥。。

    经过多方面测试 concat("境内") concat_ws("","境内")没用,

    concat_ws("",arrary("境内")) 有用,此时也不知道如何下手,只有掏出大杀器 explain.

    起作用的

    Plan optimized by CBO.
    ""
    Vertex dependency in root stage
    Map 1 <- Map 3 (BROADCAST_EDGE)
    Reducer 2 <- Map 1 (SIMPLE_EDGE)
    ""
    Stage-0
      Fetch Operator
        limit:-1
        Stage-1
          Reducer 2
          File Output Operator [FS_14]
            Select Operator [SEL_13] (rows=105 width=273)
    "          Output:[""_col0"",""_col1"",""_col2"",""_col3"",""_col4"",""_col5"",""_col6"",""_col7""]"
              Group By Operator [GBY_12] (rows=105 width=273)
    "            Output:[""_col0"",""_col1"",""_col2"",""_col3"",""_col4""],keys:KEY._col0, KEY._col1, KEY._col2, KEY._col3, KEY._col4"
              <-Map 1 [SIMPLE_EDGE] vectorized
                SHUFFLE [RS_28]
    "              PartitionCols:_col0, _col1, _col2, _col3, _col4"
                  Group By Operator [GBY_27] (rows=211 width=273)
    "                Output:[""_col0"",""_col1"",""_col2"",""_col3"",""_col4""],keys:_col1, _col2, _col3, _col4, _col5"
                    Map Join Operator [MAPJOIN_26] (rows=211 width=273)
    "                  Conds:SEL_25._col0=RS_23._col0(Inner),Output:[""_col1"",""_col2"",""_col3"",""_col4"",""_col5""]"
                    <-Map 3 [BROADCAST_EDGE] vectorized
                      BROADCAST [RS_23]
                        PartitionCols:_col0
                        Select Operator [SEL_22] (rows=1 width=736)
    "                      Output:[""_col0"",""_col1"",""_col2"",""_col3""]"
                          Filter Operator [FIL_21] (rows=1 width=736)
                            predicate:bank_code is not null
                            TableScan [TS_3] (rows=1 width=736)
    "                          dwapsdata@dw_conf_ce_bank_dict_v,t1,Tbl:COMPLETE,Col:NONE,Output:[""bank_code"",""bank_name"",""bank_short_name"",""bank_onshore_flag""]"
                    <-Select Operator [SEL_25] (rows=192 width=273)
    "                    Output:[""_col0"",""_col1""]"
                        Filter Operator [FIL_24] (rows=192 width=273)
                          predicate:bank_code is not null
                          TableScan [TS_0] (rows=192 width=273)
    "                        dwdmdata@dm_ce_f_portrait_credit_line,t,Tbl:COMPLETE,Col:COMPLETE,Output:[""bank_code""]"
    ""

    没有作用的

    Plan optimized by CBO.
    ""
    Vertex dependency in root stage
    Map 1 <- Map 3 (BROADCAST_EDGE)
    Reducer 2 <- Map 1 (SIMPLE_EDGE)
    ""
    Stage-0
      Fetch Operator
        limit:-1
        Stage-1
          Reducer 2 vectorized
          File Output Operator [FS_31]
            Select Operator [SEL_30] (rows=105 width=273)
    "          Output:[""_col0"",""_col1"",""_col2"",""_col3"",""_col4"",""_col5"",""_col6""]"
              Group By Operator [GBY_29] (rows=105 width=273)
    "            Output:[""_col0"",""_col1"",""_col2"",""_col3"",""_col4""],keys:KEY._col0, KEY._col1, KEY._col2, KEY._col3, KEY._col4"
              <-Map 1 [SIMPLE_EDGE] vectorized
                SHUFFLE [RS_28]
    "              PartitionCols:_col0, _col1, _col2, _col3, _col4"
                  Group By Operator [GBY_27] (rows=211 width=273)
    "                Output:[""_col0"",""_col1"",""_col2"",""_col3"",""_col4""],keys:_col1, _col2, _col3, _col4, _col5"
                    Map Join Operator [MAPJOIN_26] (rows=211 width=273)
    "                  Conds:SEL_25._col0=RS_23._col0(Inner),Output:[""_col1"",""_col2"",""_col3"",""_col4"",""_col5""]"
                    <-Map 3 [BROADCAST_EDGE] vectorized
                      BROADCAST [RS_23]
                        PartitionCols:_col0
                        Select Operator [SEL_22] (rows=1 width=736)
    "                      Output:[""_col0"",""_col1"",""_col2"",""_col3""]"
                          Filter Operator [FIL_21] (rows=1 width=736)
                            predicate:bank_code is not null
                            TableScan [TS_3] (rows=1 width=736)
    "                          dwapsdata@dw_conf_ce_bank_dict_v,t1,Tbl:COMPLETE,Col:NONE,Output:[""bank_code"",""bank_name"",""bank_short_name"",""bank_onshore_flag""]"
                    <-Select Operator [SEL_25] (rows=192 width=273)
    "                    Output:[""_col0"",""_col1""]"
                        Filter Operator [FIL_24] (rows=192 width=273)
                          predicate:bank_code is not null
                          TableScan [TS_0] (rows=192 width=273)
    "                        dwdmdata@dm_ce_f_portrait_credit_line,t,Tbl:COMPLETE,Col:COMPLETE,Output:[""bank_code""]"
    ""

    对比发现

    vectorzied 这个单词一出来我就知道怎么回事了。

    hive decimal bug, nvl(decimal,1)=0_cclovezbf的博客-CSDN博客

    这个b参数好处没体会到一点,bug到是一堆。

    set hive.vectorized.execution.enabled=false; 即可解决中文乱码问题!!!!!!!

    其实还有别的办法,但是和concat_ws(array(""))一样比较丑陋,我就不说了

  • 相关阅读:
    [免费专栏] Android安全之ADB常用命令
    写了6年SQL,推荐快速上手MySQL 的SQL语句
    Linux 进程终止和等待
    保研/考研简历中数模竞赛的经历,就该这样写
    【深度学习 AIGC】stable diffusion webUI 使用过程,参数设置,教程,使用方法
    联邦学习开源框架方案选型
    神经网络的敏感性分析,性神经敏感度如何测试
    Idea导入maven工程后右键没有Maven选项
    【PAT甲级】1131 Subway Map
    Java开发面试常见问题总结
  • 原文地址:https://blog.csdn.net/cclovezbf/article/details/133792945