• XGBoost Series: A Practical Guide to Tuning XGB Parameters (with Source Code)


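     XGBoost has remained a staple of machine learning practice. For both classification and regression it makes a solid baseline, and often a usable final model. That same generality means XGB exposes a large number of tunable parameters, and on a single task different settings can lead to different, sometimes wildly different, results. Finding parameters that work well for the task at hand is therefore an essential step. The previous article, XGB Series - XGB Parameter Guide (https://blog.csdn.net/wwlsm_zql/article/details/126192959), introduced all of XGB's parameters, including the three types that must be set before running XGBoost: general parameters, booster parameters and task parameters. With this many parameters, enumerating combinations by trial and error is an enormous amount of work, so this article shows how to use hyperopt to search for parameters automatically and find the best settings for your own task.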

    Code: Colab notebook https://colab.research.google.com/drive/1dm3Bk0VlEuBed8FMMeoZGWUR8xY84Ho9#scrollTo=ILPR3vXWdAvY

    Install the required packages (scikit-learn is the PyPI package name for sklearn)

    !pip install xgboost scikit-learn hyperopt

    Import the basic libraries

    # Import the basic packages
    import pandas as pd
    import numpy as np
    import xgboost as xgb
    from sklearn.metrics import accuracy_score
    from hyperopt import STATUS_OK, Trials, fmin, hp, tpe
    from sklearn.model_selection import train_test_split

    Load the data and split it

    df = pd.read_csv("drive/MyDrive/data_daily/Wholesalecustomersdata.csv")
    x = df.drop('Channel', axis=1)
    # Convert the Channel label (1 or 2) to a 0/1 target: 1 -> 1, 2 -> 0.
    # Building a new Series avoids pandas' chained-assignment warning on df['Channel'].
    y = (df['Channel'] == 1).astype(int)
    # Split the data set: 70% train, 30% test
    X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)

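    Tuning parameters with an optimizer

    1. Define the parameter space, i.e. the candidate range of every parameter
    2. Define the training procedure and the evaluation objective (loss function)
    3. Run the search
    4. Retrieve the best parameter combination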
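    The four steps map directly onto hyperopt's fmin. To make the control flow concrete without depending on hyperopt, here is a stdlib-only sketch that follows the same steps with plain random search (a swapped-in technique, not the TPE algorithm used later). The toy objective, the sampler lambdas and random_search are all hypothetical.

```python
import random

# Step 1: search space as a dict of sampler functions (hypothetical stand-ins
# for hp.* expressions; each takes a random.Random and returns one value)
space = {
    "x": lambda rng: rng.uniform(-5.0, 5.0),
    "depth": lambda rng: float(round(rng.uniform(3, 18))),  # like hp.quniform(..., 1)
}


# Step 2: objective returning a loss to minimize, in the same dict shape as
# the article's objective ("ok" is the value of hyperopt's STATUS_OK)
def objective(params):
    loss = (params["x"] - 1.0) ** 2 + abs(params["depth"] - 6)
    return {"loss": loss, "status": "ok"}


def random_search(fn, space, max_evals, seed=0):
    rng = random.Random(seed)
    trials = []
    # Step 3: evaluate max_evals randomly sampled parameter sets
    for _ in range(max_evals):
        params = {name: sample(rng) for name, sample in space.items()}
        trials.append({"params": params, "result": fn(params)})
    # Step 4: return the parameter set with the lowest loss
    best = min(trials, key=lambda t: t["result"]["loss"])
    return best["params"], trials


best, trials = random_search(objective, space, max_evals=200)
print(best)
```

    hyperopt replaces step 3's blind sampling with a model-guided choice of the next parameters, but the space / objective / search / best-result structure is the same.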

    Initialize the parameter space

    hyperopt describes the search space with parameter expressions. The most commonly used ones are:

    • hp.choice(label, options) — Returns one of the options, which should be a list or tuple.

    • hp.randint(label, upper) — Returns a random integer in the range [0, upper).

    • hp.uniform(label, low, high) — Returns a value uniformly distributed between low and high.

    • hp.quniform(label, low, high, q) — Returns round(uniform(low, high) / q) * q, i.e. a value rounded to a multiple of q. The result is still a float (for example 7.0), which is why the objective function casts integer parameters with int().

    • hp.normal(label, mu, sigma) — Returns a real value that is normally distributed with mean mu and standard deviation sigma.
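    A stdlib-only sketch of what each expression samples, assuming the semantics listed above. These sample_* functions are hypothetical stand-ins, not hyperopt's API: the real hp.* functions build a search-space graph that fmin samples from, rather than returning numbers directly.

```python
import random

rng = random.Random(42)  # seeded for reproducibility


def sample_choice(options):
    return rng.choice(options)  # one element of a list/tuple


def sample_randint(upper):
    return rng.randrange(upper)  # integer in [0, upper)


def sample_uniform(low, high):
    return rng.uniform(low, high)  # real value between low and high


def sample_quniform(low, high, q):
    # round(uniform(low, high) / q) * q, cast to float because hyperopt
    # also hands back a float here (e.g. 7.0), not an int
    return float(round(rng.uniform(low, high) / q) * q)


def sample_normal(mu, sigma):
    return rng.gauss(mu, sigma)  # normally distributed real value


depths = [sample_quniform(3, 18, 1) for _ in range(1000)]
print(min(depths), max(depths), type(depths[0]).__name__)
```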

    space = {
        'max_depth': hp.quniform('max_depth', 3, 18, 1),
        'gamma': hp.uniform('gamma', 1, 9),
        'reg_alpha': hp.quniform('reg_alpha', 40, 180, 1),
        'reg_lambda': hp.uniform('reg_lambda', 0, 1),
        'colsample_bytree': hp.uniform('colsample_bytree', 0.5, 1),
        'min_child_weight': hp.quniform('min_child_weight', 0, 10, 1),
        'n_estimators': 180,   # fixed value, not searched
        'seed': 0              # fixed value, not searched
    }
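    To put a number on why exhaustive enumeration is impractical, here is a rough count of a grid over this space. The integer-valued quniform parameters contribute their exact number of values; the 10 levels per continuous parameter (gamma, reg_lambda, colsample_bytree) are an assumption made only for this estimate.

```python
def n_int_values(low, high):
    # distinct values of an integer-valued quniform(low, high, 1)
    return high - low + 1


max_depth = n_int_values(3, 18)          # 16 values
reg_alpha = n_int_values(40, 180)        # 141 values
min_child_weight = n_int_values(0, 10)   # 11 values

# Assumption: each continuous parameter is discretized into 10 levels
levels_per_continuous = 10
n_continuous = 3

grid_size = max_depth * reg_alpha * min_child_weight * levels_per_continuous ** n_continuous
print(grid_size)          # 24816000
print(grid_size // 100)   # 248160: grid size relative to max_evals=100
```

    Even with a coarse discretization the grid has about 25 million points, while the search below trains only 100 models.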

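    Define the optimization objective. Note that X_test serves both as the early-stopping set (XGBoost monitors the last entry of eval_set) and as the scoring set, so the reported accuracy is somewhat optimistic.

    def objective(space):
        # hp.quniform returns floats, so integer parameters are cast with int();
        # colsample_bytree must stay a float in (0, 1] (int() would truncate it to 0)
        clf = xgb.XGBClassifier(
            n_estimators=space['n_estimators'],
            max_depth=int(space['max_depth']),
            gamma=space['gamma'],
            reg_alpha=space['reg_alpha'],
            reg_lambda=space['reg_lambda'],
            min_child_weight=int(space['min_child_weight']),
            colsample_bytree=float(space['colsample_bytree']),
            random_state=space['seed'],
            # Constructor arguments: fit() deprecated these in XGBoost 1.6 and dropped them in 2.0
            eval_metric='auc',
            early_stopping_rounds=10,
        )
        evaluation = [(X_train, y_train), (X_test, y_test)]
        clf.fit(X_train, y_train, eval_set=evaluation, verbose=False)
        pred = clf.predict(X_test)  # predicted class labels (0/1)
        accuracy = accuracy_score(y_test, pred)
        print("SCORE:", accuracy)
        # fmin minimizes 'loss', so return the negative accuracy
        return {'loss': -accuracy, 'status': STATUS_OK}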

    Run the search

    trials = Trials()
    best_hyperparams = fmin(fn=objective,
                            space=space,
                            algo=tpe.suggest,
                            max_evals=100,
                            trials=trials)
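    fmin is called with algo=tpe.suggest, the Tree-structured Parzen Estimator. As a rough illustration of the idea, and not hyperopt's implementation, the stdlib-only sketch below minimizes a 1-D function: after a few random warm-up trials it splits past trials into a "good" group (the lowest 25% of losses) and a "bad" group, builds a Parzen (Gaussian kernel) density l(x) and g(x) for each, draws candidates from l(x) and keeps the one with the largest l(x)/g(x). The fixed bandwidth, the parameter defaults and the name tpe_minimize are simplifying assumptions.

```python
import math
import random


def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def parzen(x, centers, sigma):
    # Parzen (kernel density) estimate: average of Gaussians centred on past points
    return sum(gaussian_pdf(x, c, sigma) for c in centers) / len(centers)


def tpe_minimize(f, low, high, n_iter=60, n_startup=10, gamma=0.25,
                 n_candidates=24, seed=0):
    """Simplified 1-D TPE sketch (hypothetical helper, not hyperopt's code)."""
    rng = random.Random(seed)
    sigma = (high - low) / 10  # fixed kernel bandwidth: a simplifying assumption
    history = []  # (x, loss) pairs
    for i in range(n_iter):
        if i < n_startup:
            x = rng.uniform(low, high)  # random warm-up trials
        else:
            ordered = sorted(history, key=lambda h: h[1])
            n_good = max(1, int(gamma * len(ordered)))
            good = [h[0] for h in ordered[:n_good]]  # l(x): lowest-loss trials
            bad = [h[0] for h in ordered[n_good:]]   # g(x): the rest
            # Draw candidates from l(x), clipped to [low, high]
            candidates = [min(high, max(low, rng.gauss(rng.choice(good), sigma)))
                          for _ in range(n_candidates)]
            # Keep the candidate with the largest l(x) / g(x) ratio
            x = max(candidates,
                    key=lambda c: parzen(c, good, sigma) / (parzen(c, bad, sigma) + 1e-12))
        history.append((x, f(x)))
    return min(history, key=lambda h: h[1])


best_x, best_loss = tpe_minimize(lambda x: (x - 2.0) ** 2, -10.0, 10.0)
print(best_x, best_loss)
```

    The l(x)/g(x) criterion favors regions where good trials are dense and bad trials are sparse, which is how TPE concentrates later evaluations near promising parameters instead of sampling blindly.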

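    Print the results

    • best_hyperparams is the parameter combination with the lowest loss found during the search. It contains only the searched (labeled) parameters, hp.quniform values come back as floats, and for hp.choice it holds the option index; hyperopt.space_eval(space, best_hyperparams) maps the result back to actual values.

    • trials stores a record of every evaluation: the sampled hyperparameters together with the loss and status returned by the objective.

    • fmin minimizes the 'loss' value returned by the objective. Because the objective returns -accuracy, minimizing the loss maximizes accuracy. Here it is called with fn, space, algo, max_evals (100 evaluations) and trials.

    • The search algorithm is tpe.suggest, the Tree-structured Parzen Estimator.

    print("The best hyperparameters are:", "\n")
    print(best_hyperparams)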
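    fmin returns only the searched labels, with hp.quniform values as floats, and leaves out fixed entries of the space such as n_estimators and seed. A hypothetical helper like the one below can turn that dict into keyword arguments for a final model. It uses only the standard library, and the best_hyperparams values shown are made-up placeholders, not results on this dataset.

```python
def to_xgb_params(best, fixed):
    """Hypothetical helper: cast fmin's values to the types used in the objective."""
    int_keys = {"max_depth", "min_child_weight"}
    params = {k: int(v) if k in int_keys else float(v) for k, v in best.items()}
    params.update(fixed)  # constants from the space are not returned by fmin
    return params


# Placeholder values shaped like an fmin result for the space above
best_hyperparams = {"colsample_bytree": 0.83, "gamma": 2.4, "max_depth": 7.0,
                    "min_child_weight": 3.0, "reg_alpha": 52.0, "reg_lambda": 0.61}
params = to_xgb_params(best_hyperparams, {"n_estimators": 180, "random_state": 0})
print(params)
```

    With xgboost installed, a final model could then be trained with xgb.XGBClassifier(**params).fit(X_train, y_train).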

  • Original article: https://blog.csdn.net/wwlsm_zql/article/details/126193231