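The training set gives five consecutive days of COVID-19 data for a number of US states: the tested-positive rate and related features. The test set provides the same columns except the fifth day's tested-positive rate, and the task is to predict that value. Download: ML2022Spring-hw1 | Kaggle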
The features are:
● States (37, one-hot encoded)
● COVID-like illness (4)
○ cli, ili, …
● Behavior Indicators (8)
○ wearing_mask, travel_outside_state, …
● Mental Health Indicators (3)
○ anxious, depressed, …
● Tested Positive Cases (1)
○ tested_positive (this is what we want to predict)
The training set has 2699 rows and 118 columns (id + 37 states + 16 features × 5 days).
The test set has 1078 rows and 117 columns (without the last day's tested-positive rate).
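Given this column layout, a typical first step is to drop the id column and separate the target. The sketch below assumes, as the description implies, that column 0 is `id` and that the last column of the training set is the fifth day's `tested_positive`. It uses randomly generated arrays with the stated shapes in place of the real CSV files (for the real data you would load them with pandas, e.g. `pd.read_csv(path).values`). The helper `split_columns` and the constant names are hypothetical.

```python
import numpy as np

N_STATES = 37            # one-hot encoded state columns
N_FEATURES_PER_DAY = 16  # 4 illness + 8 behavior + 3 mental health + 1 tested_positive
N_DAYS = 5

# id + 37 states + 16 features x 5 days
N_TRAIN_COLS = 1 + N_STATES + N_FEATURES_PER_DAY * N_DAYS  # 118
N_TEST_COLS = N_TRAIN_COLS - 1                             # 117: no day-5 target


def split_columns(train, test):
    """Hypothetical helper: drop the id column and separate the target.

    Assumes column 0 is `id` and the last training column is the
    day-5 tested_positive value to be predicted.
    """
    assert train.shape[1] == N_TRAIN_COLS
    assert test.shape[1] == N_TEST_COLS
    x_train = train[:, 1:-1]  # everything except id and the day-5 target
    y_train = train[:, -1]    # day-5 tested_positive
    x_test = test[:, 1:]      # same feature columns as x_train
    return x_train, y_train, x_test


# Stand-in data with the shapes given above (NOT the real dataset).
rng = np.random.default_rng(0)
train = rng.random((2699, N_TRAIN_COLS))
test = rng.random((1078, N_TEST_COLS))

x_train, y_train, x_test = split_columns(train, test)
print(x_train.shape, y_train.shape, x_test.shape)
# (2699, 116) (2699,) (1078, 116)
```

Both `x_train` and `x_test` end up with the same 116 feature columns, which is what lets a model trained on the first be applied to the second.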
```python
# Numerical Operations
import math
import numpy as np
from sklearn.model_selection import train_test_split

# Reading/Writing Data
import pandas as pd
import os
import csv

# For Progress Bar
from tqdm import tqdm
# Helper utilities from the d2l ("Dive into Deep Learning") package
from d2l import torch as d2l

# Pytorch
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader, random_split, TensorDataset

# For plotting learning curve
from torch.utils.tensorboard import SummaryWriter
```
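My understanding is that both model initialization and the train/validation split depend on the random seed, so the seed is fixed here: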
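As a quick check of the two points above, the sketch below uses only PyTorch: re-seeding the global RNG with `torch.manual_seed` reproduces a freshly initialized layer's weights, and passing a seeded `torch.Generator` to `random_split` reproduces the train/validation split. The seed value 42, the layer size, the toy dataset, and the helper `split_with_seed` are arbitrary illustrations, not taken from the assignment code.

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, random_split

SEED = 42  # arbitrary value chosen for this illustration

# 1) Model initialization: same global seed -> same initial weights.
torch.manual_seed(SEED)
w1 = nn.Linear(116, 1).weight.detach().clone()
torch.manual_seed(SEED)
w2 = nn.Linear(116, 1).weight.detach().clone()
print(torch.equal(w1, w2))  # True


# 2) Validation split: a seeded generator -> same index permutation.
def split_with_seed(dataset, valid_ratio, seed):
    """Hypothetical helper: reproducible train/valid split."""
    valid_size = int(len(dataset) * valid_ratio)
    train_size = len(dataset) - valid_size
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [train_size, valid_size], generator=generator)


data = TensorDataset(torch.arange(10, dtype=torch.float32))
a_train, a_valid = split_with_seed(data, 0.2, SEED)
b_train, b_valid = split_with_seed(data, 0.2, SEED)
print(a_valid.indices == b_valid.indices)  # True
```

Without the fixed seed, each run would start from different weights and validate on different rows, which makes it hard to tell whether a change in validation loss comes from a code change or just from randomness.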
- def same_seed(seed):
- '''Fix