In development, the applications we write usually depend on third-party libraries (that is, the program pulls in dependencies that are neither part of the org.apache.spark packages nor of the language runtime), so we have to make sure all of these dependencies can be found when the Spark application runs.
Note:
When submitting an application, never include Spark itself among the submitted dependencies; spark-submit automatically ensures that Spark is on your program's runtime path.
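In a Maven build, one common way to honor this is to compile against Spark but keep it out of the packaged artifact by giving the Spark dependency provided scope. A minimal sketch, reusing the ${scala.binary.version} and ${spark.version} properties defined in the pom.xml fragments below (the spark.scope property in the first fragment looks intended for exactly this switch):

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.binary.version}</artifactId>
        <version>${spark.version}</version>
        <!-- provided: visible at compile time, supplied by spark-submit at runtime,
             and therefore not bundled into the application JAR -->
        <scope>provided</scope>
    </dependency>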
The following pom.xml fragment configures the repositories, version properties, the Spark/Hadoop dependencies, and the maven-compiler-plugin:

<repositories>
    <repository>
        <id>aliyun</id>
        <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
    </repository>
    <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
    <repository>
        <id>jboss</id>
        <url>https://repository.jboss.com/nexus/content/groups/public/</url>
    </repository>
</repositories>

<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>8</maven.compiler.source>
    <maven.compiler.target>8</maven.compiler.target>
    <scala.version>2.12.15</scala.version>
    <scala.binary.version>2.12</scala.binary.version>
    <hadoop.version>3.1.3</hadoop.version>
    <spark.version>3.2.0</spark.version>
    <spark.scope>compile</spark.scope>
</properties>

<dependencies>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.binary.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
</dependencies>

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.10.1</version>
            <configuration>
                <source>${maven.compiler.source}</source>
                <target>${maven.compiler.target}</target>
                <encoding>${project.build.sourceEncoding}</encoding>
            </configuration>
        </plugin>
    </plugins>
</build>
A second pom.xml variant targets Scala 2.13 and adds the maven-assembly-plugin (bound to the package phase) for packaging and the scala-maven-plugin for compiling Scala sources:

<repositories>
    <repository>
        <id>aliyun</id>
        <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
    </repository>
    <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
    <repository>
        <id>jboss</id>
        <url>https://repository.jboss.com/nexus/content/groups/public/</url>
    </repository>
</repositories>

<properties>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
    <scala.version>2.13.5</scala.version>
    <scala.binary.version>2.13</scala.binary.version>
    <spark.version>3.2.0</spark.version>
    <hadoop.version>3.1.3</hadoop.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.binary.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
</dependencies>

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>
            <version>3.0.0</version>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
        <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>3.2.2</version>
            <executions>
                <execution>
                    <goals>
                        <goal>compile</goal>
                        <goal>testCompile</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
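One thing to note about the maven-assembly-plugin execution above: as written it does not specify which assembly descriptor to use. To produce a single JAR that bundles the application together with its (non-provided) dependencies, the plugin is typically pointed at the built-in jar-with-dependencies descriptor. A sketch of that extra configuration, to be placed inside the maven-assembly-plugin's plugin element:

    <configuration>
        <!-- built-in descriptor: pack the application classes and all runtime
             dependencies into one "fat" JAR -->
        <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
    </configuration>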
Not currently used; not documented for now.
When our Spark application depends on the same library that Spark itself depends on, a dependency conflict may occur and crash the program.
Dependency conflicts usually manifest as NoSuchMethodError, ClassNotFoundException, or other class-loading-related JVM exceptions thrown at runtime.
There are two main ways to deal with this kind of problem:
1) Modify the Spark application so that it uses the same version of the dependency that Spark uses.
2) Repackage the Spark application using "shading", as sketched below.
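Shading relocates our own copy of the conflicting packages into a different namespace inside the application JAR, so it no longer collides with the version bundled with Spark. With Maven this is usually done via the maven-shade-plugin; a minimal sketch, assuming the conflicting library is Guava (the com.google.common pattern and the myshaded prefix are only illustrative):

    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.2.4</version>
        <executions>
            <execution>
                <phase>package</phase>
                <goals>
                    <goal>shade</goal>
                </goals>
                <configuration>
                    <relocations>
                        <!-- rewrite our bundled copy of Guava (and every reference to it
                             in our classes) into a private namespace, so it cannot clash
                             with the Guava version that ships with Spark -->
                        <relocation>
                            <pattern>com.google.common</pattern>
                            <shadedPattern>myshaded.com.google.common</shadedPattern>
                        </relocation>
                    </relocations>
                </configuration>
            </execution>
        </executions>
    </plugin>

After packaging, our relocated copy and the copy that ships with Spark can coexist at runtime without interfering with each other.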