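Reading the Spark SQL source code (3.1.3) shows how the compression format of Hive table output files is set. In the tables below, an entry such as "compression > codec" means that compression takes precedence over codec.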

| Output file format | Table property |
|---|---|
| text | compression |
| csv | compression > codec |
| json | compression |
| parquet | compression > parquet.compression |
| orc | compression > orc.compress |
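
The precedence in the table above can be sketched in a few lines of plain Python. This is only an illustration of the lookup order, not Spark's implementation: the names `TABLE_PROPERTY_ORDER` and `codec_from_table_properties` are hypothetical, and the sketch matches property keys exactly.

```python
from typing import Mapping, Optional

# Lookup order of table properties per output file format, copied from the
# table above. Earlier keys take precedence over later ones.
TABLE_PROPERTY_ORDER = {
    "text": ["compression"],
    "csv": ["compression", "codec"],
    "json": ["compression"],
    "parquet": ["compression", "parquet.compression"],
    "orc": ["compression", "orc.compress"],
}


def codec_from_table_properties(
    file_format: str, properties: Mapping[str, str]
) -> Optional[str]:
    """Return the first codec found in precedence order, or None if none is set."""
    for key in TABLE_PROPERTY_ORDER[file_format.lower()]:
        if key in properties:
            return properties[key]
    return None


# Both properties are set, so "compression" wins over "parquet.compression".
props = {"compression": "zstd", "parquet.compression": "snappy"}
chosen = codec_from_table_properties("parquet", props)
```

Here `chosen` is `"zstd"`. With only `parquet.compression` set it would be `"snappy"`, and with neither property set the function returns `None`.
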

| Output file format | Configuration setting |
|---|---|
| orc | spark.sql.orc.compression.codec; allowed values: "none", "uncompressed", "snappy", "zlib", "lzo" |
| parquet | spark.sql.parquet.compression.codec; allowed values: "none", "uncompressed", "snappy", "gzip", "lzo", "lz4", "brotli", "zstd" |
| other than orc and parquet | hive.exec.compress.output; allowed values: "true", "false"; allowed values: "RECORD", "BLOCK", "NONE" |
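
As a small aid when preparing these settings, the ORC and Parquet codec values listed above can be checked before they are handed to Spark. The following is a minimal sketch in plain Python. The helper `check_session_codec` and the dictionary `SESSION_CODEC_SETTINGS` are hypothetical names, and lowercasing the value before comparison is an assumption of this sketch.

```python
from typing import Mapping

# Session settings and their allowed values, copied from the table above.
SESSION_CODEC_SETTINGS = {
    "orc": (
        "spark.sql.orc.compression.codec",
        {"none", "uncompressed", "snappy", "zlib", "lzo"},
    ),
    "parquet": (
        "spark.sql.parquet.compression.codec",
        {"none", "uncompressed", "snappy", "gzip", "lzo", "lz4", "brotli", "zstd"},
    ),
}


def check_session_codec(file_format: str, conf: Mapping[str, str]) -> str:
    """Return the configured codec for orc/parquet, or raise if missing or not allowed."""
    setting, allowed = SESSION_CODEC_SETTINGS[file_format.lower()]
    if setting not in conf:
        raise KeyError(f"{setting} is not set")
    codec = conf[setting].lower()  # assumption: compare case-insensitively
    if codec not in allowed:
        raise ValueError(f"{setting}={codec!r} is not one of {sorted(allowed)}")
    return codec


# "ZLIB" is accepted for ORC after lowercasing.
orc_codec = check_session_codec("orc", {"spark.sql.orc.compression.codec": "ZLIB"})
```

A value outside the listed set, for example "gzip" for ORC, raises `ValueError` in this sketch, so a typo is caught before a job is submitted.
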

Spark SQL source (3.1.3): org.apache.spark.sql.hive.execution.SaveAsHiveFile
Spark SQL source (3.1.3): org.apache.spark.sql.hive.execution.HiveOptions
Spark SQL source (3.1.3): org.apache.spark.sql.execution.datasources.parquet.ParquetOptions
Spark SQL source (3.1.3): org.apache.spark.sql.execution.datasources.orc.OrcOptions
Spark SQL source (3.1.3): org.apache.spark.sql.execution.datasources.text.TextFileFormat
Spark SQL source (3.1.3): org.apache.spark.sql.execution.datasources.text.TextOptions
Spark SQL source (3.1.3): org.apache.spark.sql.catalyst.util.CompressionCodecs