[Algorithm] Optimizing SVR Hyperparameters with the Second-Generation Genetic Algorithm NSGA-II

1. Introduction to NSGA-II
2. Modeling Goal
3. NSGA-II-Optimized SVR Hyperparameter Model
4. Model Testing

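1. Introduction to NSGA-II

NSGA-II (Non-dominated Sorting Genetic Algorithm II) is a multi-objective optimization algorithm for problems with several conflicting objectives. It mimics natural selection and genetic operators (crossover, mutation) to improve the solutions in a population generation by generation, with the aim of finding a set of good solutions that are mutually non-dominated: no solution in the set is at least as good as another on every objective and strictly better on at least one.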
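
The notion of "non-dominated" used above can be made concrete with a few lines of Python. This is a minimal sketch assuming all objectives are minimized; the function names `dominates` and `non_dominated` and the sample points are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if objective vector a Pareto-dominates b (minimization)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (error, model size) pairs, both to be minimized
candidates = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0)]
front = non_dominated(candidates)
print(front)  # -> [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0)]
```

Here (3.0, 4.0) is dropped because (2.0, 3.0) is better on both objectives, while the remaining three points each trade one objective against the other.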

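2. Modeling Goal

Use NSGA-II to search for the best value of the SVR hyperparameter C and report the corresponding error metric (MSE in the text; the code in section 3.3.1 actually computes RMSE, the square root of MSE). Because only one objective is used here, non-dominated sorting reduces to ranking individuals by that error. The search settings are:

Range of hyperparameter C: (0.01, 10)
Number of iterations: 5
Population size: 5

The hyperparameter range, number of iterations, and population size can all be customized.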

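3. NSGA-II-Optimized SVR Hyperparameter Model

3.1 Hyperparameter settings

First, define the settings as global variables (the imports used by the rest of the script are listed here as well):

import random
import numpy as np
import pandas as pd
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Parameter settings
pop_size = 5  # population size
gen_size = 5  # number of generations
pc = 1  # crossover probability
pm = 0.3  # mutation probability
num_obj = 1  # number of objective functions
x_range = (0.01, 10)  # range of the decision variable (C)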

3.2 Importing the dataset

Next, load the dataset from Excel files with read_excel and split it into a training set and a test set:

data = pd.read_excel('C:/Users/孙海涛/Desktop/x.xlsx', sheet_name='Sheet1')  # read the features
target = pd.read_excel('C:/Users/孙海涛/Desktop/y.xlsx', sheet_name='Sheet1')  # read the target
x_train, x_test, y_train, y_test = train_test_split(data, target, random_state=22, test_size=0.25)

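3.3 Building the model

This section implements and packages the NSGA-II-based optimization of the SVR hyperparameter.

3.3.1 Defining the individual class

# One individual: a candidate value of the decision variable (C)
class Individual:
    def __init__(self, x):
        self.x = x
        self.objs = [None] * num_obj
        self.rank = None
        self.distance = 0.0

    # Compute the objective value
    def evaluate(self):
        c = self.x
        model_svr = SVR(C=c)
        # np.ravel flattens a single-column target to the 1-D shape SVR expects
        model_svr.fit(x_train, np.ravel(y_train))
        predict_results = model_svr.predict(x_test)
        # RMSE on the test set
        self.objs[0] = np.sqrt(mean_squared_error(y_test, predict_results))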

3.3.2 Initializing the population

# Initialize the population with values of C drawn uniformly from x_range
pop = [Individual(random.uniform(*x_range)) for _ in range(pop_size)]
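
With `random.uniform`, roughly 90% of the initial values of C fall above 1, because the interval (0.01, 10) is sampled on a linear scale. Since C spans three orders of magnitude, one possible alternative is to sample on a logarithmic scale. The sketch below is not part of the article's method; the helper name `sample_log_uniform` and the fixed seed are assumptions made for illustration.

```python
import math
import random


def sample_log_uniform(low: float, high: float, rng: random.Random) -> float:
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)  # fixed seed, only so the sketch is reproducible
x_range = (0.01, 10)    # same bounds as in the article
samples = [sample_log_uniform(*x_range, rng) for _ in range(1000)]

# About two of the three decades (0.01 to 1) lie below 1
share_below_one = sum(s < 1 for s in samples) / len(samples)
print(f"share of samples below 1: {share_below_one:.2f}")
```

To try it in the script, the population could be built with `Individual(sample_log_uniform(*x_range, rng))` instead of `Individual(random.uniform(*x_range))`; whether this helps depends on the dataset.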

3.3.3 Evolution

Each generation evaluates the objective function, performs non-dominated sorting, computes crowding distances, and then applies crossover and mutation. The combined code is:

# Evolution loop
for gen in range(gen_size):
    print(f"Generation {gen + 1}")
    # Evaluate the objective function for every individual
    for ind in pop:
        ind.evaluate()

    # Non-dominated sorting (single objective: a smaller value dominates)
    fronts = [set()]
    for ind in pop:
        ind.domination_count = 0
        ind.dominated_set = set()
        ind.distance = 0.0  # reset, because elites carry over between generations

        for other in pop:
            if ind.objs[0] < other.objs[0]:
                ind.dominated_set.add(other)
            elif ind.objs[0] > other.objs[0]:
                ind.domination_count += 1

        if ind.domination_count == 0:
            ind.rank = 1
            fronts[0].add(ind)

    rank = 1
    while fronts[-1]:
        next_front = set()

        for ind in fronts[-1]:
            ind.rank = rank

            for dominated_ind in ind.dominated_set:
                dominated_ind.domination_count -= 1

                if dominated_ind.domination_count == 0:
                    next_front.add(dominated_ind)

        fronts.append(next_front)
        rank += 1

    # Crowding distance and elitist selection, front by front
    pop_for_cross = []
    for front in fronts:
        if len(front) == 0:
            continue

        for i in range(num_obj):
            # Sort by the i-th objective; boundary individuals get infinite distance
            # (the objective values themselves must not be overwritten)
            sorted_front = sorted(front, key=lambda ind: ind.objs[i])
            sorted_front[0].distance = float('inf')
            sorted_front[-1].distance = float('inf')
            # Normalize by the objective range within the front, not by x_range
            obj_span = sorted_front[-1].objs[i] - sorted_front[0].objs[i]
            if obj_span == 0:
                continue
            for j in range(1, len(sorted_front) - 1):
                delta = sorted_front[j + 1].objs[i] - sorted_front[j - 1].objs[i]
                sorted_front[j].distance += delta / obj_span

        # Within a front, prefer the less crowded individuals
        selected_inds = sorted(front, key=lambda ind: -ind.distance)
        remaining = pop_size - len(pop_for_cross)
        pop_for_cross.extend(selected_inds[:remaining])
        if len(pop_for_cross) >= pop_size:
            break

    # Crossover: blend two parents and perturb by the distance between them
    # (pop_for_cross is a list because random.sample no longer accepts sets)
    new_pop = []
    while len(new_pop) < len(pop_for_cross):
        x1, x2 = random.sample(pop_for_cross, 2)
        if random.random() < pc:
            new_x = (x1.x + x2.x) / 2
            delta_x = abs(x1.x - x2.x)
            new_x += delta_x * random.uniform(-1, 1)
            new_x = max(x_range[0], min(x_range[1], new_x))
            new_pop.append(Individual(new_x))

    # Mutation: shift x by a random fraction of the range, then clip to x_range
    for ind in new_pop:
        if random.random() < pm:
            delta_x = random.uniform(-1, 1) * (x_range[1] - x_range[0])
            ind.x += delta_x
            ind.x = max(x_range[0], min(x_range[1], ind.x))

    # Update the population, keeping the previous elites (pop_for_cross)
    pop = new_pop + pop_for_cross
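
The crowding-distance step of the loop above can also be sketched on its own. The following is a minimal, standard-library-only version of the usual NSGA-II definition: boundary points of each objective get infinite distance, and interior points accumulate the normalized gap between their two neighbours. The function name `crowding_distance` and the two-objective sample front are assumptions made for illustration, not part of the article's code.

```python
from typing import List, Sequence


def crowding_distance(front: List[Sequence[float]]) -> List[float]:
    """Crowding distance of each objective vector within one front."""
    n = len(front)
    distance = [0.0] * n
    if n == 0:
        return distance
    num_objectives = len(front[0])
    for m in range(num_objectives):
        # Indices of the front ordered by the m-th objective
        order = sorted(range(n), key=lambda k: front[k][m])
        span = front[order[-1]][m] - front[order[0]][m]
        # Boundary points are always preferred, so they get infinite distance
        distance[order[0]] = distance[order[-1]] = float("inf")
        if span == 0:
            continue
        for pos in range(1, n - 1):
            gap = front[order[pos + 1]][m] - front[order[pos - 1]][m]
            distance[order[pos]] += gap / span
    return distance


# Hypothetical two-objective front (both objectives minimized)
front = [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0)]
distances = crowding_distance(front)
print(distances)  # -> [inf, 2.0, inf]
```

With a single objective, as in this article, every front consists of individuals with identical objective values, so the crowding distance only starts to matter once a second objective is added.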

3.3.4 Printing the set of optimal solutions

# Re-evaluate the final population
for ind in pop:
    ind.evaluate()

# Keep the individuals that no other individual dominates
pareto_front = set()
for ind in pop:
    dominated = False
    for other in pop:
        if other.objs[0] < ind.objs[0]:
            dominated = True
            break
    if not dominated:
        pareto_front.add(ind)

print("Pareto front:")
for ind in pareto_front:
    print(f"x={ind.x:.4f}, y1={ind.objs[0]:.4f}")
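
With a single objective, the non-dominated filter above keeps exactly the individuals that share the minimum objective value, so the "Pareto front" is the best solution plus any ties. The sketch below illustrates this with a hypothetical stand-in class `Candidate` and made-up scores; none of these names or numbers come from the article's run.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for an evaluated Individual."""
    x: float      # hyperparameter value (C)
    score: float  # single objective to minimize (e.g. RMSE)


def pareto_front_single(pop: List[Candidate]) -> List[Candidate]:
    """Non-dominated filter for one objective, mirroring the loop above."""
    return [c for c in pop if not any(o.score < c.score for o in pop)]


# Made-up scores, purely for illustration
pop = [Candidate(0.5, 3.2), Candidate(4.0, 2.1),
       Candidate(9.0, 2.1), Candidate(7.0, 2.8)]
front = pareto_front_single(pop)
best = min(pop, key=lambda c: c.score)

print([c.x for c in front])  # -> [4.0, 9.0]
print(best.x)                # -> 4.0
```

If only one value of C is needed, `min` over the final population is therefore sufficient; the filter form becomes useful once more than one objective is evaluated.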

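4. Model Testing

The final run reports an optimal hyperparameter C of 7.6418 with a corresponding error of 87.2814 (described as MSE, although evaluate() as listed in section 3.3.1 returns the square root of the MSE, i.e. RMSE).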

Original article: https://blog.csdn.net/weixin_48618536/article/details/134298687