• hive和presto的求数组长度函数区别及注意事项


    1、任务

    获取邮箱字符串’@'后字符串 ,求长度

    2、hive & spark-sql 求数组长度的函数 size

    1. hive & spark-sql 求数组长度的函数 size
    2. select size(split(email, '@')),split(email, '@'),split(email, '@')[0],split(email, '@')[1]
    3. FROM
    4. (select "jack@126.com" as email union select "tom@126.com.cn" as email) tb_mid;
    5. select size(split(email, '@')),split(email, '@'),split(email, '@')[0],split(email, '@')[1]
    6. FROM
    7. (select 'jack@126.com' as email union select 'tom@126.com.cn' as email) tb_mid;
    8. 2 ["tom","126.com.cn"] tom 126.com.cn
    9. 2 ["jack","126.com"] jack 126.com
    10. Time taken: 0.723 seconds, Fetched 2 row(s)

    3、presto  求数组长度的函数 cardinality

    1. presto 求数组长度的函数 cardinality
    2. select cardinality(split(email, '@')),split(email, '@'),split(email, '@')[1],split(email, '@')[2]
    3. FROM
    4. (select 'jack@126.com' as email union select 'tom@126.com.cn' as email) tb_mid;
    5. _col0 | _col1 | _col2 | _col3
    6. -------+-------------------+-------+------------
    7. 2 | [tom, 126.com.cn] | tom | 126.com.cn
    8. 2 | [jack, 126.com] | jack | 126.com
    9. (2 rows)
    10. select cardinality(split(email, '@')),split(email, '@'),split(email, '@')[1],split(email, '@')[2]
    11. FROM
    12. (select "jack@126.com" as email union select "tom@126.com.cn" as email) tb_mid;
    13. Query 20231019_070945_20009_n9u2s failed: line 3:9: Column 'jack@126.com' cannot be resolved
    14. select cardinality(split(email, '@')),split(email, '@'),split(email, '@')[1],split(email, '@')[2]
    15. FROM
    16. (select "jack@126.com" as email union select "tom@126.com.cn" as email) tb_mid

    4、注意事项

    1)、在计算数组长度的时候,hive和presto的函数不同
      其中hive的size函数默认数组的下标从0开始
      presto的cardinality函数默认数组的下标从1开始

    2)、presto 不支持双引号 ,而hive 既支持单引号,也支持双引号

    1. presto> SELECT
    2. -> email,
    3. -> (case when cardinality(split(email, '@')) = 2 then split(email, '@')[1] else '' end ) as email_suffix
    4. -> FROM
    5. -> (select "jack@126.com" as email union select "tom@126.com.cn" as email) tb_mid;
    6. Query 20231016_070153_17958_p9f2s failed: line 5:9: Column 'jack@126.com' cannot be resolved
    7. SELECT
    8. email,
    9. (case when cardinality(split(email, '@')) = 2 then split(email, '@')[1] else '' end ) as email_suffix
    10. FROM
    11. (select "jack@126.com" as email union select "tom@126.com.cn" as email) tb_mid

  • 相关阅读:
    lintcode 540 · 左旋右旋迭代器 【算法 中等 迭代器】
    NSS [SWPUCTF 2021 新生赛]PseudoProtocols
    离散数学 --- 特殊图 --- 偶图与平面图
    SpringBoot-Mongodb工具类、字段转换
    动态规划——01背包
    object property order
    电脑小技巧45个
    Redis-实战篇-实现商铺缓存与数据库的双写一致(超时剔除和主动更新)
    每日一题:vue3自定义指令大全(呕心沥血所作,附可运行项目源码)
    Jmeter接口自动化测试 —— Jmeter断言之Json断言!
  • 原文地址:https://blog.csdn.net/helloxiaozhe/article/details/133927849