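While integrating Spark 3.3.x with Hive 2.1.1-cdh6.3.2, I ran into a problem: Spark officially supports Hive 2.3.x, but the Hive shipped with CDH is 2.1.x. The project also plans to use spark-thrift-server, so part of the build fails to compile. In particular, the OperationLog class gained several new methods in Hive 2.3, and the calls to them do not compile against Hive 2.1. There are two ways to solve this:
In the end I went with the second option, to keep changes to the source code to a minimum.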
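Once packaged, the OperationLog class ends up in hive-exec-{version}.jar. What needs to be done is actually simple: remove the OperationLog class from that jar. I know of two ways to do this with Maven:
Each of the two approaches has its own trade-offs.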
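For the first approach, the build section of the pom looks like this. It shades hive-exec into the project jar with OperationLog filtered out, and keeps the original hive-exec jar out of the lib directory: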
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-dependency-plugin</artifactId>
                <version>3.3.0</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>copy-dependencies</goal>
                        </goals>
                        <configuration>
                            <outputDirectory>${project.build.directory}/lib</outputDirectory>
                            <!-- hive-exec is shaded into the project jar below, so keep it out of lib -->
                            <excludeArtifactIds>hive-exec</excludeArtifactIds>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.2.4</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <shadedArtifactAttached>false</shadedArtifactAttached>
                            <artifactSet>
                                <includes>
                                    <include>org.apache.hive:hive-exec</include>
                                </includes>
                            </artifactSet>
                            <filters>
                                <filter>
                                    <artifact>org.apache.hive:hive-exec</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.MF</exclude>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                        <exclude>org/apache/hadoop/hive/ql/session/OperationLog*</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
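For the second approach, the build section of the pom looks like this. It leaves OperationLog out of the project jar, then unpacks the copied hive-exec jar, swaps in the project's own OperationLog classes, and repackages it in place: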
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <version>3.2.0</version>
                <executions>
                    <execution>
                        <!-- reuse the id of the default jar execution so it is overridden, not duplicated -->
                        <id>default-jar</id>
                        <phase>package</phase>
                        <goals>
                            <goal>jar</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <!-- keep the project's own OperationLog classes out of the project jar -->
                    <excludes>
                        <exclude>**/OperationLog*.class</exclude>
                    </excludes>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-dependency-plugin</artifactId>
                <version>3.3.0</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>copy-dependencies</goal>
                        </goals>
                        <configuration>
                            <outputDirectory>${project.build.directory}/lib</outputDirectory>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-antrun-plugin</artifactId>
                <version>1.8</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>run</goal>
                        </goals>
                        <configuration>
                            <target>
                                <echo message="Repackage hive-exec."/>
                                <!-- unpack hive-exec, dropping its original OperationLog classes -->
                                <unjar src="${project.build.directory}/lib/hive-exec-${hive.version}.jar"
                                       dest="${project.build.directory}/exploded/hive-exec">
                                    <patternset>
                                        <exclude name="**/OperationLog*.class"/>
                                    </patternset>
                                </unjar>
                                <!-- copy in the OperationLog classes compiled from this project -->
                                <copy todir="${project.build.directory}/exploded/hive-exec">
                                    <fileset dir="${project.build.directory}/classes">
                                        <include name="**/OperationLog*.class"/>
                                    </fileset>
                                </copy>
                                <!-- rebuild the hive-exec jar in place -->
                                <jar destfile="${project.build.directory}/lib/hive-exec-${hive.version}.jar"
                                     basedir="${project.build.directory}/exploded/hive-exec"/>
                            </target>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
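The antrun paths above depend on a `hive.version` property whose value matches the file name that copy-dependencies writes to `lib` (`hive-exec-<version>.jar`). The post does not show where that property is defined, so the fragment below is only a sketch of what the rest of the pom might contain. The property value is an assumption taken from the Hive version named at the top of this post, and the dependency declaration is likewise assumed rather than copied from the original project.

```xml
<!-- Hypothetical fragment, not from the original pom. -->
<properties>
    <!-- Assumed value: must match the hive-exec jar that ends up in target/lib -->
    <hive.version>2.1.1-cdh6.3.2</hive.version>
</properties>

<dependencies>
    <!-- Assumed declaration: hive-exec may also arrive transitively through other dependencies -->
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>${hive.version}</version>
    </dependency>
</dependencies>
```

If the property and the resolved hive-exec version disagree, the `unjar` step will not find its source jar, so it is worth checking the contents of `target/lib` after the first build.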
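Both approaches get the job done. The first looks more concise, while the second involves more steps, but I still prefer the second. First, it neither adds nor removes any jars, the jar sizes barely change, and the result matches the original goal more closely. Second, the output is tidy once it has run, which suits perfectionists and anyone who is picky about clean code.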
References: