• Problems encountered with Spark on a YARN cluster and how they were solved


    Background:

        The following environment was already in place:

            Ambari 2.7.5
            HDFS 3.2.1
            YARN 3.2.1
            Hive 3.1.1 on MapReduce2 3.2.2
        Spark2 2.4.8 was newly installed.
        Submitting Spark jobs to the YARN cluster failed.
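
        A representative submission looked roughly like this (a sketch; the class and jar names are placeholders, not from the original post):

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --class com.example.EtlJob \
      /path/to/etl-job.jar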

    Problems encountered with Spark 2.4.8 and how they were handled

        Error: java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
        Cause: the guava library bundled with Spark is too old.
        Fix: remove Spark's bundled old guava jar, symlink the guava jar shipped with Hadoop in its place, and restart Spark2.

    su spark
    cd /opt/redoop/apps/spark-2.4.8-bin-hadoop2.7/jars
    rm -f guava-14.0.1.jar
    ln -s /opt/redoop/apps/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar guava-27.0-jre.jar
    ll guava-27.0-jre.jar

        Error: java.lang.IllegalArgumentException: Unrecognized Hadoop major version number: 3.2.2
        Cause: the Hive libraries bundled with Spark are too old.
        Fix: remove Spark's bundled old Hive jars, symlink the jars from the Hive installation in their place, and restart Spark2.

    mkdir -p /root/zsp/spark-2.4.8-bin-hadoop2.7/jars
    mv /opt/redoop/apps/spark-2.4.8-bin-hadoop2.7/jars/hive*.jar /root/zsp/spark-2.4.8-bin-hadoop2.7/jars
    su spark
    cd /opt/redoop/apps/spark-2.4.8-bin-hadoop2.7/jars
    ln -s /opt/redoop/apps/apache-hive-3.1.1-bin/lib/hive-beeline-3.1.1.jar hive-beeline-3.1.1.jar
    ln -s /opt/redoop/apps/apache-hive-3.1.1-bin/lib/hive-cli-3.1.1.jar hive-cli-3.1.1.jar
    ln -s /opt/redoop/apps/apache-hive-3.1.1-bin/lib/hive-exec-3.1.1.jar hive-exec-3.1.1.jar
    ln -s /opt/redoop/apps/apache-hive-3.1.1-bin/lib/hive-jdbc-3.1.1.jar hive-jdbc-3.1.1.jar
    ln -s /opt/redoop/apps/apache-hive-3.1.1-bin/lib/hive-metastore-3.1.1.jar hive-metastore-3.1.1.jar
    ln -s /opt/redoop/apps/apache-hive-3.1.1-bin/lib/hive-shims-3.1.1.jar hive-shims-3.1.1.jar
    ln -s /opt/redoop/apps/apache-hive-3.1.1-bin/lib/hive-shims-common-3.1.1.jar hive-shims-common-3.1.1.jar
    ln -s /opt/redoop/apps/apache-hive-3.1.1-bin/lib/hive-shims-scheduler-3.1.1.jar hive-shims-scheduler-3.1.1.jar
    ll hive*

        Error: java.lang.NoSuchFieldError: HIVE_STATS_JDBC_TIMEOUT
        Cause: spark-hive is compiled against old Hive code; the field no longer exists in newer Hive versions.
        Fix: rebuild Spark, replace the original jar with the rebuilt spark-hive_2.11-2.4.8.jar, and restart Spark2.

    cd /usr/local/src
    tar -xvf spark-2.4.8.tgz
    cd /usr/local/src/spark-2.4.8
    Edit /usr/local/src/spark-2.4.8/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala
    and comment out the lines containing HIVE_STATS_JDBC_TIMEOUT and HIVE_STATS_RETRIES_WAIT (see the sketch after this block).
    Edit /usr/local/src/spark-2.4.8/pom.xml
    and add the following dependency:
    <dependency>
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.2.2</version>
    </dependency>
    ./build/mvn -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver -Dhadoop.version=2.7.3 -DskipTests clean package
    su spark
    cp -f /usr/local/src/spark-2.4.8/sql/hive/target/spark-hive_2.11-2.4.8.jar /opt/redoop/apps/spark-2.4.8-bin-hadoop2.7/jars/
    ll /opt/redoop/apps/spark-2.4.8-bin-hadoop2.7/jars/spark-hive*
    On the other nodes:
    scp root@master:/opt/redoop/apps/spark-2.4.8-bin-hadoop2.7/jars/spark-hive_2.11-2.4.8.jar /tmp/
    su spark
    cp -f /tmp/spark-hive_2.11-2.4.8.jar /opt/redoop/apps/spark-2.4.8-bin-hadoop2.7/jars/
    ll /opt/redoop/apps/spark-2.4.8-bin-hadoop2.7/jars/spark-hive*
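
        The HiveUtils.scala edit, sketched from the Spark 2.4 source (the exact surrounding code may differ slightly between minor versions):

    // org/apache/spark/sql/hive/HiveUtils.scala, in formatTimeVarsForHiveClient:
    // these two ConfVars were removed in Hive 3, so comment them out before building
    //   ConfVars.HIVE_STATS_JDBC_TIMEOUT.varname -> TimeUnit.SECONDS,
    //   ConfVars.HIVE_STATS_RETRIES_WAIT.varname -> TimeUnit.MILLISECONDS,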

        Error: org.apache.hadoop.hive.ql.metadata.HiveException: InvalidObjectException(message:No such catalog spark)
        Fix: the hive-site.xml in Spark's configuration directory is at fault; replace it with Hive's own configuration file and restart Spark2 (see the property snippet after the commands).

    su spark
    mv /etc/spark/hive-site.xml /etc/spark/hive-site.xml_bak
    cp -p /etc/hive/hive-site.xml /etc/spark/
    ll /etc/spark/
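
        The property that matters here is metastore.catalog.default, which Ambari sets to spark in Spark's copy of hive-site.xml while Hive's copy points at the hive catalog (compare the spark2-hive-site-override change in the Spark 3 section below):

    <property>
        <name>metastore.catalog.default</name>
        <value>hive</value>
    </property>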


        Error: NoSuchMethodError: org.apache.hadoop.hive.ql.exec.Utilities.copyTableJobPropertiesToConf(Lorg/apache/hadoop/hive/ql/plan/TableDesc;Lorg/apache/hadoop/conf/Configuration;)V
        Cause: the Spark jar was compiled against a lower Hive version, so in the compiled class the JobConf parameter was widened to its parent type Configuration.
        Fix attempted: rebuild Spark, replace the original jar with the rebuilt spark-hive_2.11-2.4.8.jar, and restart Spark2.
        This problem could not be resolved this way, which ultimately led to upgrading Spark.

    Upgrading to Spark 3.1.3

        Upgrading the Spark files

    cd /opt/redoop/apps
    wget --no-check-certificate https://dlcdn.apache.org/spark/spark-3.1.3/spark-3.1.3-bin-hadoop3.2.tgz
    tar -xvf spark-3.1.3-bin-hadoop3.2.tgz
    mv spark-3.1.3-bin-hadoop3.2/conf spark-3.1.3-bin-hadoop3.2/conf_bak
    ln -s /etc/spark spark-3.1.3-bin-hadoop3.2/conf
    chown -R spark:hadoop spark-3.1.3-bin-hadoop3.2
    rm spark
    ln -s spark-3.1.3-bin-hadoop3.2 spark
    su spark
    cd /opt/redoop/apps/spark/jars
    rm -f guava-14.0.1.jar
    ln -s /opt/redoop/apps/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar guava-27.0-jre.jar
    ll guava-27.0-jre.jar

        Modify Spark's configuration on the Ambari UI:

            spark2-hive-site-override
                metastore.catalog.default: spark -> hive
            Append to the spark2-env content:
                # spark3 needs
                export SPARK_DIST_CLASSPATH=$(${HADOOP_HOME}/bin/hadoop classpath)

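        With the files swapped and the configuration updated, a quick smoke test of the new build (a sketch; SparkPi and the examples jar ship with the spark-3.1.3-bin-hadoop3.2 distribution):

    su spark
    /opt/redoop/apps/spark/bin/spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --class org.apache.spark.examples.SparkPi \
      /opt/redoop/apps/spark/examples/jars/spark-examples_2.12-3.1.3.jar 100
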
        Error: java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
        Cause: Spark SQL output uses LZO compression and the native library could not be found.
        Fix: install LZO and restart HDFS (see the codec registration snippet after the commands).

    1. Install lzop
    sudo yum -y install lzop
    2. Install lzo
    Download:
    cd /usr/local/src
    wget http://www.oberhumer.com/opensource/lzo/download/lzo-2.10.tar.gz
    Build:
    tar -zxvf lzo-2.10.tar.gz
    cd lzo-2.10
    export CFLAGS=-m64
    ./configure --enable-shared
    make
    sudo make install
    Edit the lzo.conf file:
    sudo vi /etc/ld.so.conf.d/lzo.conf
    add the line /usr/local/lib to it, then run:
    sudo /sbin/ldconfig -v
    rm -rf lzo-2.10
    3. Install Hadoop-LZO
    First download https://github.com/twitter/hadoop-lzo
    Build:
    cd /usr/local/src/hadoop-lzo-master
    export CFLAGS=-m64
    export CXXFLAGS=-m64
    export C_INCLUDE_PATH=/usr/local/include/lzo
    export LIBRARY_PATH=/usr/local/lib
    sudo yum install maven
    mvn clean package -Dmaven.test.skip=true
    Copy the artifacts:
    cp target/native/Linux-amd64-64/lib/* /opt/redoop/apps/hadoop/lib/native/
    cp target/hadoop-lzo-0.4.21-SNAPSHOT.jar /opt/redoop/apps/hadoop/share/hadoop/common/lib/
    cp target/hadoop-lzo-0.4.21-SNAPSHOT-javadoc.jar /opt/redoop/apps/hadoop/share/hadoop/common/lib/
    cp target/hadoop-lzo-0.4.21-SNAPSHOT-sources.jar /opt/redoop/apps/hadoop/share/hadoop/common/lib/
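
        If the codec is not already registered with Hadoop, core-site.xml also needs entries along these lines before restarting HDFS (an assumption based on the twitter/hadoop-lzo README, not from the original post; merge with any existing io.compression.codecs value):

    <property>
        <name>io.compression.codecs</name>
        <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
    </property>
    <property>
        <name>io.compression.codec.lzo.class</name>
        <value>com.hadoop.compression.lzo.LzoCodec</value>
    </property>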

        Error: WARN HdfsUtils: Unable to inherit permissions for file
        Fix: set the Hive parameter hive.warehouse.subdir.inherit.perms to false in the job code, turning off Hive's file-permission inheritance to work around the issue, as in the sketch below.
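
        A minimal sketch of setting the parameter when the session is built (assuming a Scala Spark job; the app name is a placeholder):

    import org.apache.spark.sql.SparkSession

    // Turn off Hive's permission inheritance for warehouse subdirectories
    val spark = SparkSession.builder()
      .appName("etl-job") // placeholder
      .config("hive.warehouse.subdir.inherit.perms", "false")
      .enableHiveSupport()
      .getOrCreate()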

        Error: java.lang.IllegalStateException: User did not initialize spark context
        Fix: comment out .master("local[*]") in the code before packaging, as sketched below; on the cluster the master comes from spark-submit (--master yarn) instead.
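
        A sketch of the builder change (same placeholder app name as above):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("etl-job") // placeholder
      // .master("local[*]") // keep only for local runs; comment out before packaging for YARN
      .enableHiveSupport()
      .getOrCreate()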

        Error: Class path contains multiple SLF4J bindings.
        Fix: rename the slf4j-log4j12 jar under the Spark directory and restart Spark.

    mv /opt/redoop/apps/spark/jars/slf4j-log4j12-1.7.30.jar /opt/redoop/apps/spark/jars/slf4j-log4j12-1.7.30.jar_bak

        The Ambari UI shows lost heartbeats for components
            Restart ambari-agent on the affected cluster nodes:

    service ambari-agent status
    service ambari-agent restart

  • Original post: https://blog.csdn.net/langzitianya/article/details/127717794