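SparkML_lr_train: reads the train table produced by the Python preprocessing step, trains the model, and saves the trained model.
SparkML_lr_predict: loads the trained model and reads the test table produced by the Python preprocessing step to run predictions. It writes the predictions to normal_data and updates the value of stream_is_normal by id.
Submit the Spark jobs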
bin/spark-submit \
--class SparkML_lr_train \
--master yarn \
--deploy-mode cluster \
./SparkML_lr_train1.jar \
10
bin/spark-submit \
--class SparkML_lr_train \
--master yarn \
--deploy-mode client \
./SparkML_lr_train4.jar \
10
bin/spark-submit \
--class SparkML_lr_predict \
--master yarn \
--deploy-mode client \
./SparkML_lr_predict.jar \
10
bin/spark-submit \
--class lr_train \
--master yarn \
--deploy-mode client \
./lr_train.jar \
10
bin/spark-submit \
--class lr_predict \
--master yarn \
--deploy-mode client \
./lr_predict.jar \
10
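The submissions above differ only in main class, jar, and deploy mode, so they can be wrapped in one small function. The following is a minimal sketch, not part of the original notes: the name submit_job and the DRY_RUN switch are hypothetical, and it assumes it is run from the Spark installation directory so that bin/spark-submit resolves.

```shell
# Hypothetical helper (not from the original notes).
# Usage: submit_job CLASS JAR [DEPLOY_MODE] [APP_ARG]
# Set DRY_RUN=1 to print the command instead of running it.
submit_job() {
  class_name="$1"               # main class, e.g. SparkML_lr_train
  jar_path="$2"                 # application jar, e.g. ./SparkML_lr_train4.jar
  deploy_mode="${3:-client}"    # client or cluster
  app_arg="${4:-10}"            # argument passed to the application

  # Rebuild the positional parameters as the full command line.
  set -- bin/spark-submit \
    --class "$class_name" \
    --master yarn \
    --deploy-mode "$deploy_mode" \
    "$jar_path" \
    "$app_arg"

  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Example: print the training submission instead of running it.
DRY_RUN=1 submit_job SparkML_lr_train ./SparkML_lr_train4.jar client 10
```

Keeping the defaults at client mode and argument 10 mirrors most of the commands above; the cluster-mode run only needs the third argument changed.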
Start Hadoop (startup script)
hdp.sh start
Start Spark (from the command line)
sbin/start-all.sh
bin/spark-submit \
--class SparkSQL_lr_train \
--master yarn \
--deploy-mode client \
./SparkSQL_lr_train.jar \
10
bin/spark-submit \
--class SparkML_lr_predict \
--master yarn \
--deploy-mode client \
./SparkML_lr_predict.jar \
10
{
"file": "hdfs://hadoop102:8020/spark_jar/SparkSQL_lr_train1.jar",
"className": "SparkSQL_lr_train",
"driverMemory": "1g",
"executorMemory": "1g",
"numExecutors": 1,
"driverCores": 1,
"executorCores": 1,
"conf":{
"spark.master":"yarn",
"spark.submit.deployMode":"client"
},
"args": ["10"]
}
{
"file": "hdfs://hadoop102:8020/spark_jar/SparkML_lr_predict1.jar",
"className": "SparkML_lr_predict",
"driverMemory": "1g",
"executorMemory": "1g",
"numExecutors": 1,
"driverCores": 1,
"executorCores": 1,
"conf":{
"spark.master":"yarn",
"spark.submit.deployMode":"client"
},
"args": ["10"]
}
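The two JSON bodies above have the shape of an Apache Livy batch request (file, className, args, conf, and the resource fields). Assuming that is what they are for, a payload saved to a file could be posted to Livy's /batches endpoint roughly as sketched below. The function name submit_batch, the DRY_RUN switch, the file name train.json, and the server address (hadoop102 on Livy's default port 8998) are all assumptions, not taken from the notes.

```shell
# Assumption: a Livy server is reachable here; host and port are hypothetical.
LIVY_URL="${LIVY_URL:-http://hadoop102:8998}"

# Hypothetical helper: POST a JSON payload file to Livy's batch endpoint.
# Set DRY_RUN=1 to print the curl command instead of sending the request.
submit_batch() {
  payload_file="$1"   # path to a file holding one of the JSON bodies above

  set -- curl -s -X POST \
    -H "Content-Type: application/json" \
    --data "@$payload_file" \
    "$LIVY_URL/batches"

  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Example: show the request that would be sent for a saved training payload.
DRY_RUN=1 submit_batch train.json
```

Each JSON body would need to be stored in its own file, since one request carries one batch.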