Sleep Stage Classification from Polysomnography (PSG) Data

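Under the AASM rules, five sleep stages are identified: wake (W), rapid eye movement (REM), and three non-REM stages, N1, N2, and N3 (the last is also called slow-wave sleep or deep sleep). The stages differ in their temporal and spectral patterns, and they also account for different proportions of a night's sleep. For example, a transitional stage such as N1 occurs less often than REM or N2. The AASM rules also document the transitions between stages, and these transition rules can modulate the scorer's final decision. In practice, whether a transition is terminated or reinforced depends on the occurrence of certain events, such as arousals, K-complexes, or spindles in the case of the N1-N2 transition. Although inspection by a sleep expert yields very valuable information, sleep scoring is tedious and time-consuming, and it is subject to scorer subjectivity and variability.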

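Automatic sleep scoring has attracted the interest of many researchers. From a statistical machine learning perspective, it is an imbalanced multiclass prediction problem. State-of-the-art automatic methods fall into two categories, depending on whether the features used for classification are extracted using expert knowledge or learned from the raw signals. Methods in the first category rely on prior knowledge about the signals and events. Methods in the second category learn suitable feature representations either from transformed data or directly from the raw data with convolutional neural networks. More recently, an approach using adversarial deep neural networks for sleep stage classification from EEG signals has been proposed.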

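From a statistical machine learning standpoint, one of the main challenges is the imbalanced nature of the classification task, which has important practical consequences. Stages such as N1 are typically much rarer than N2. When a predictor is trained on heavily imbalanced classes, it tends never to predict the rarest ones. One remedy is to reweight the model's loss function. Another, suited to the online training used with neural networks, is balanced sampling: the network is fed batches that contain an equal number of data points from each class. A second challenge concerns how transition rules are handled. Because these rules can influence the scorer's final decision, a predictive model may take them into account to improve its performance, for instance by feeding the final classifier features from neighboring epochs. This is known as temporal sleep stage classification.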
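
The two remedies mentioned above, loss reweighting and features from neighboring epochs, can be sketched with NumPy alone. The function names, the "balanced" weighting formula n_samples / (n_classes * class_count), and the edge-padding choice are assumptions made for this illustration; they are not taken from any specific method discussed in the text.

```python
import numpy as np


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    weights = labels.size / (classes.size * counts)
    return dict(zip(classes.tolist(), weights.tolist()))


def add_temporal_context(features, n_neighbors=1):
    """Concatenate each epoch's features with those of its neighbors.

    features has shape (n_epochs, n_features). The first and last epochs
    are repeated at the edges so every epoch gets a full window.
    """
    features = np.asarray(features)
    padded = np.pad(features, ((n_neighbors, n_neighbors), (0, 0)), mode="edge")
    n_epochs = features.shape[0]
    windows = [padded[i:i + n_epochs] for i in range(2 * n_neighbors + 1)]
    return np.hstack(windows)


# Toy data: three epochs of a common class, one epoch of a rare class.
toy_labels = np.array([0, 0, 0, 1])
toy_weights = balanced_class_weights(toy_labels)

# Toy feature matrix: 3 epochs, 2 features each.
toy_features = np.arange(6).reshape(3, 2)
toy_context = add_temporal_context(toy_features, n_neighbors=1)
```

On the toy labels, the rare class receives a weight of 2.0 and the common class about 0.67, so errors on the rare class cost three times as much. The context matrix has three times as many columns as the input, one block per neighboring epoch.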

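This article works through temporal sleep stage classification on multivariate time series. Although the discussion above is motivated by end-to-end deep learning, the code in this walkthrough imports scikit-learn's RandomForestClassifier rather than a deep network. Using two subjects from the Sleep Physionet dataset, called Alice and Bob for the purposes of this demonstration, it shows how to predict Bob's sleep stages from Alice's data. The problem is treated as a supervised multiclass classification task: the goal is to predict, for each 30 s chunk of data, one of 5 possible sleep stages. The analysis is based on the Python package MNE.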

import numpy as np
import matplotlib.pyplot as plt

import mne
from mne.datasets.sleep_physionet.age import fetch_data
# Note: psd_welch is importable in older MNE releases; recent releases
# replace it with the compute_psd() methods.
from mne.time_frequency import psd_welch

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

Loading the data

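MNE-Python provides mne.datasets.sleep_physionet.age.fetch_data() to conveniently download data from the Sleep Physionet dataset. Given a list of subjects and recordings, the fetcher downloads the data and returns, for each subject, a pair of files:

A file ending in -PSG.edf, which contains the polysomnography recording.

A file ending in -Hypnogram.edf, which contains the annotations recorded by an expert.

The two are combined in a single mne.io.Raw object, and events are then extracted from the annotation descriptions to obtain the epochs.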
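
As a rough illustration of the file layout described above, the sketch below groups a flat list of file names into [PSG, hypnogram] pairs, one pair per recording. The hypnogram file names and the rule that the first six characters identify a recording are hypothetical choices made for this example. fetch_data already returns the paths paired, so this only shows the shape of the data.

```python
from collections import defaultdict


def pair_recordings(filenames, key_length=6):
    """Group file names into [psg, hypnogram] pairs.

    Assumption: the first `key_length` characters identify a recording.
    """
    groups = defaultdict(dict)
    for name in filenames:
        key = name[:key_length]
        if name.endswith("-PSG.edf"):
            groups[key]["psg"] = name
        elif name.endswith("-Hypnogram.edf"):
            groups[key]["hypnogram"] = name
    # Keep only recordings for which both files are present.
    return [
        [files["psg"], files["hypnogram"]]
        for _, files in sorted(groups.items())
        if "psg" in files and "hypnogram" in files
    ]


# The hypnogram names below are hypothetical placeholders.
files = [
    "SC4011E0-PSG.edf",
    "SC4001E0-PSG.edf",
    "SC4001EC-Hypnogram.edf",
    "SC4011EH-Hypnogram.edf",
]
pairs = pair_recordings(files)
first_pair, second_pair = pairs
```

Each pair has the recording at index 0 and the annotations at index 1, which is how alice_files[0] and alice_files[1] are used in the article's code.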

Read the PSG data and hypnogram to create a Raw object

ALICE, BOB = 0, 1

[alice_files, bob_files] = fetch_data(subjects=[ALICE, BOB], recording=[1])

raw_train = mne.io.read_raw_edf(alice_files[0], stim_channel='Event marker',
                                misc=['Temp rectal'])
annot_train = mne.read_annotations(alice_files[1])

raw_train.set_annotations(annot_train, emit_warning=False)

# plot some data
# scalings were chosen manually to allow for simultaneous visualization of
# different channel types in this specific dataset
raw_train.plot(start=60, duration=60,
               scalings=dict(eeg=1e-4, resp=1e3, eog=1e-4, emg=1e-7,
                             misc=1e-1))

OUT:
Using default location ~/mne_data for PHYSIONET_SLEEP...
Extracting EDF parameters from /home/circleci/mne_data/physionet-sleep-data/SC4001E0-PSG.edf...
EDF file detected
Setting channel info structure...
Creating raw.info structure...

Extract 30 s events from the annotations

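The Sleep Physionet dataset is annotated with 8 labels: Wake (W); Stage 1, Stage 2, Stage 3, and Stage 4, which span light to deep sleep; REM sleep (R), where REM stands for rapid eye movement; movement (M); and Stage (?) for any unscored segment. This article uses only 5 stages: Wake (W), Stage 1, Stage 2, Stage 3/4, and REM sleep (R). To do so, the event_id parameter of mne.events_from_annotations() is used to select the events of interest and to associate an event identifier with each of them.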
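
To make the label selection and the 30 s chunking concrete, here is a minimal pure-Python sketch. It is a simplified stand-in, not MNE's implementation: the function name is hypothetical, the toy hypnogram is invented, and the spellings of the movement and unscored labels are assumptions.

```python
# Labels kept for classification; stages 3 and 4 share one identifier.
STAGE_TO_ID = {
    "Sleep stage W": 1,
    "Sleep stage 1": 2,
    "Sleep stage 2": 3,
    "Sleep stage 3": 4,
    "Sleep stage 4": 4,
    "Sleep stage R": 5,
}


def annotations_to_chunks(annotations, chunk_duration=30.0):
    """Turn (onset, duration, label) spans into per-chunk (onset, id) events.

    Spans whose label is not in STAGE_TO_ID (movement, unscored) are skipped,
    and a trailing remainder shorter than one chunk is dropped.
    """
    events = []
    for onset, duration, label in annotations:
        if label not in STAGE_TO_ID:
            continue
        n_chunks = int(duration // chunk_duration)
        for k in range(n_chunks):
            events.append((onset + k * chunk_duration, STAGE_TO_ID[label]))
    return events


# Toy hypnogram in seconds; the last two label strings are assumed spellings.
toy = [
    (0.0, 60.0, "Sleep stage W"),
    (60.0, 30.0, "Sleep stage 1"),
    (90.0, 90.0, "Sleep stage 3"),
    (180.0, 30.0, "Sleep stage 4"),
    (210.0, 30.0, "Movement time"),
    (240.0, 30.0, "Sleep stage ?"),
]
events = annotations_to_chunks(toy)
```

In this toy example the 90 s Stage 3 span and the 30 s Stage 4 span both yield events with identifier 4, while the movement and unscored spans produce no events.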

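In addition, the recordings contain long Wake (W) periods before and after each night. To limit the impact of class imbalance, each recording is trimmed to keep only 30 minutes of wake before the first sleep stage and 30 minutes after the last one.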
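
The trimming rule can be written as a small helper that returns the crop window. This is a sketch under stated assumptions: the function name is hypothetical, the onsets are invented, and it presumes that the first annotation is the long pre-sleep wake span, the second is the first sleep stage, the second-to-last is the final wake span, and the last is a trailing unscored span.

```python
def wake_trimmed_window(onsets, margin=30 * 60):
    """Return (tmin, tmax) in seconds, keeping `margin` of wake on each side.

    Assumption: onsets[1] is the first sleep stage and onsets[-2] is the
    start of the final wake span.
    """
    if len(onsets) < 4:
        raise ValueError("need at least four annotations")
    # Clamping at zero is an addition of this sketch, for short recordings.
    tmin = max(onsets[1] - margin, 0.0)
    tmax = onsets[-2] + margin
    return tmin, tmax


# Invented onsets in seconds: wake, first sleep stage, another stage,
# final wake, trailing unscored span.
toy_onsets = [0.0, 30630.0, 30750.0, 57000.0, 79500.0]
tmin, tmax = wake_trimmed_window(toy_onsets)
```

With these invented onsets the window runs from 28830 s to 58800 s, that is, 30 minutes before the first sleep stage to 30 minutes after the final wake span begins.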

annotation_desc_2_event_id = {'Sleep stage W': 1,
                              'Sleep stage 1': 2,
                              'Sleep stage 2': 3,
                              'Sleep stage 3': 4,
                              'Sleep stage 4': 4,
                              'Sleep stage R': 5}

# keep last 30-min wake events before sleep and first 30-min wake events after
# sleep and redefine annotations on raw data
annot_train.crop(annot_train[1]['onset'] - 30 * 60,
                 annot_train[-2]['onset'] + 30 * 60)
raw_train.set_annotations(annot_train, emit_warning=False)

events_train, _ = mne.events_from_annotations(
    raw_train, event_id=annotation_desc_2_event_id, chunk_duration=30.)

# create a new event_id that unifies stages 3 and 4
event_id = {'Sleep stage W': 1,
            'Sleep stage 1': 2,
            'Sleep stage 2': 3,
            'Sleep stage 3/4': 4,
            'Sleep stage R': 5}

# plot events
fig = mne.viz.plot_events(events_train, event_id=event_id,
                          sfreq=raw_train.info['sfreq'],
                          first_samp=events_train[0, 0])

# keep the color-code for further plotting
stage_colors = plt.rcParams['axes.prop_cycle'].by_key()['color']

OUT:
Used Annotations descriptions: ['Sleep stage 1', 'Sleep stage 2', 'Sleep stage 3', 'Sleep stage 4', 'Sleep stage R', 'Sleep stage W']

Create Epochs from the events

tmax = 30. - 1. / raw_train.info['sfreq']  # tmax is included

epochs_train = mne.Epochs(raw=raw_train, events=events_train,
                          event_id=event_id, tmin=0., tmax=tmax, baseline=None)

print(epochs_train)

OUT:
Not setting metadata
841 matching events found
No baseline correction applied
0 projection items activated
'Sleep stage W': 188
'Sleep stage 1': 58
'Sleep stage 2': 250
'Sleep stage 3/4': 220
'Sleep stage R': 125>
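
The tmax value used above stops one sample short of 30 s, so that consecutive epochs do not share a sample. Here is a small sketch of the arithmetic, assuming a sampling rate of 100 Hz purely for illustration (the article reads the real value from raw_train.info['sfreq'], and the function name is hypothetical):

```python
def samples_per_epoch(sfreq, tmin=0.0, duration=30.0):
    """Count samples when both tmin and tmax are included in the epoch."""
    tmax = duration - 1.0 / sfreq  # stop one sample short of the next epoch
    n_samples = int(round((tmax - tmin) * sfreq)) + 1
    return tmax, n_samples


# 100 Hz is an assumed sampling rate used only for this example.
tmax_example, n_samples = samples_per_epoch(sfreq=100.0)
```

Under that assumption each epoch holds 3000 samples. Setting tmax to the full 30 s instead would give 3001 samples, and the last one would also be the first sample of the next epoch.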

Apply the same steps to load Bob's data (the test data)

raw_test = mne.io.read_raw_edf(bob_files[0], stim_channel='Event marker',
                               misc=['Temp rectal'])
annot_test = mne.read_annotations(bob_files[1])
annot_test.crop(annot_test[1]['onset'] - 30 * 60,
                annot_test[-2]['onset'] + 30 * 60)
raw_test.set_annotations(annot_test, emit_warning=False)
events_test, _ = mne.events_from_annotations(
    raw_test, event_id=annotation_desc_2_event_id, chunk_duration=30.)
epochs_test = mne.Epochs(raw=raw_test, events=events_test, event_id=event_id,
                         tmin=0., tmax=tmax, baseline=None)

print(epochs_test)

OUT:
Extracting EDF parameters from /home/circleci/mne_data/physionet-sleep-data/SC4011E0-PSG.edf...
EDF file detected
Setting channel info structure...
Creating raw.info structure...
Used Annotations descriptions: ['Sleep stage 1', 'Sleep stage 2', 'Sleep stage 3', 'Sleep stage 4', 'Sleep stage R', 'Sleep stage W']
Not setting metadata
1103 matching events found
No baseline correction applied
0 projection items activated
'Sleep stage W': 157
'Sleep stage 1': 109
'Sleep stage 2': 562
'Sleep stage 3/4': 105
'Sleep stage R': 170>
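
The epoch counts printed for Alice and Bob make the class imbalance concrete. The sketch below uses only those printed counts, with hypothetical helper names, to compute the share of each stage and the accuracy a trivial model would reach on Bob by always predicting Alice's most frequent stage.

```python
# Epoch counts copied from the two printed Epochs summaries.
alice = {
    "Sleep stage W": 188,
    "Sleep stage 1": 58,
    "Sleep stage 2": 250,
    "Sleep stage 3/4": 220,
    "Sleep stage R": 125,
}
bob = {
    "Sleep stage W": 157,
    "Sleep stage 1": 109,
    "Sleep stage 2": 562,
    "Sleep stage 3/4": 105,
    "Sleep stage R": 170,
}


def proportions(counts):
    """Return the share of epochs per stage."""
    total = sum(counts.values())
    return {stage: n / total for stage, n in counts.items()}


def majority_baseline_accuracy(train_counts, test_counts):
    """Test accuracy when always predicting the most common training stage."""
    majority = max(train_counts, key=train_counts.get)
    return majority, test_counts[majority] / sum(test_counts.values())


alice_share = proportions(alice)
bob_share = proportions(bob)
majority, baseline = majority_baseline_accuracy(alice, bob)
```

With these counts, Sleep stage 1 makes up about 7% of Alice's epochs, and the trivial baseline scores roughly 51% on Bob. Any classifier trained on Alice's data should be judged against that figure rather than against chance.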

Feature engineering

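Looking at the power spectral density (PSD) of the epochs grouped by sleep stage shows that different stages have different signatures, and that these signatures are similar between Alice's and Bob's data. The rest of this section builds EEG features from the relative power in specific frequency bands to capture this difference between sleep stages.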

# visualize Alice vs. Bob PSD by sleep stage.
fig, (ax1, ax2) = plt.subplots(ncols=2)

# iterate over the subjects
stages = sorted(event_id.keys())
for ax, title, epochs in zip([ax1, ax2],
                             ['Alice', 'Bob'],
                             [epochs_train, epochs_test]):

    for stage, color in zip(stages, stage_colors):
        epochs[stage].plot_psd(area_mode=None, color=color, ax=ax,
                               fmin=0.1, fmax=20., show=False,
                               average=True, spatial_colors=False)
    ax.set(title=title, xlabel='Frequency (Hz)')
ax2.set(ylabel='µV^2/Hz (dB)')
ax2.legend(ax2.lines[2::3], stages)
plt.show()
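
The idea of relative band power can be illustrated with NumPy alone on a synthetic signal. This is a sketch, not the article's feature extractor: the band limits, the 100 Hz sampling rate, the plain periodogram (instead of Welch's method), and the function name are all assumptions made for this example.

```python
import numpy as np

# Band edges in Hz; these particular limits are assumptions for the sketch.
BANDS = {
    "delta": (0.5, 4.5),
    "theta": (4.5, 8.5),
    "alpha": (8.5, 11.5),
    "sigma": (11.5, 15.5),
    "beta": (15.5, 30.0),
}


def relative_band_power(signal, sfreq, bands=BANDS):
    """Share of the 0.5-30 Hz power in each band, from a plain periodogram."""
    signal = np.asarray(signal, dtype=float)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / sfreq)
    power = np.abs(np.fft.rfft(signal - signal.mean())) ** 2
    lo = min(fmin for fmin, _ in bands.values())
    hi = max(fmax for _, fmax in bands.values())
    total = power[(freqs >= lo) & (freqs < hi)].sum()
    return {
        name: power[(freqs >= fmin) & (freqs < fmax)].sum() / total
        for name, (fmin, fmax) in bands.items()
    }


# Synthetic 30 s "epoch" at an assumed 100 Hz: strong 2 Hz, weak 10 Hz.
sfreq = 100.0
t = np.arange(3000) / sfreq
epoch = np.sin(2 * np.pi * 2.0 * t) + 0.3 * np.sin(2 * np.pi * 10.0 * t)
features = relative_band_power(epoch, sfreq)
```

Because the features are ratios, they do not depend on the overall amplitude of the recording, which helps when a model trained on one subject is applied to another. In the synthetic example the delta band dominates, as expected for a signal built mostly from a 2 Hz component.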