Hugging Face in Practice, Tutorial 3: Binary Text Classification with AutoModelForSequenceClassification


🚩🚩🚩 Hugging Face in Practice series: full table of contents

Feel free to leave any questions in the comments below.
All code in this article was run in a notebook.
The companion code for this article has been uploaded.

Next in the series:
Hugging Face in Practice, Tutorial 4: padding and attention_mask

How many outputs do we actually need? For instance, with a classification on the [CLS] token, could we make it a 10-class classification? Could we classify every token into 10 classes? Could we predict the next token? All of these are possible!

In NLP, unlike vision where tasks split into classification and regression, is there really such a thing as regression? Essentially every task we do here is classification; the only question is where the classification head is attached.

Take the two English sentences we loaded earlier: sentiment analysis on them is a binary classification over the whole sequence. To classify a sequence, you simply import the matching output head. Simple, right? Here is the code:

    from transformers import AutoModelForSequenceClassification

    # Same checkpoint name we used for the tokenizer in the previous tutorial
    checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

    # `inputs` is the tokenizer output from the previous tutorial
    outputs = model(**inputs)
    print(outputs.logits.shape)
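The `outputs = model(**inputs)` line reuses the `inputs` produced by the tokenizer in the previous tutorial. For completeness, here is a minimal sketch of how such an `inputs` could be built; the two example sentences are placeholders (my assumption), not necessarily the exact ones used in that tutorial.

    # Minimal sketch of preparing `inputs` with the matching tokenizer.
    from transformers import AutoTokenizer

    checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)

    raw_sentences = [
        "I've been waiting for a HuggingFace course my whole life.",  # placeholder sentence
        "I hate this so much!",                                       # placeholder sentence
    ]
    # padding/truncation give both sentences the same length; return PyTorch tensors
    inputs = tokenizer(raw_sentences, padding=True, truncation=True, return_tensors="pt")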

We import the sequence-classification model class, keep the same checkpoint name we chose for the tokenizer, load the model, and print it:

    DistilBertForSequenceClassification(
      (distilbert): DistilBertModel(
        (embeddings): Embeddings(
          (word_embeddings): Embedding(30522, 768, padding_idx=0)
          (position_embeddings): Embedding(512, 768)
          (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (transformer): Transformer(
          (layer): ModuleList(
            (0): TransformerBlock(
              (attention): MultiHeadSelfAttention(
                (dropout): Dropout(p=0.1, inplace=False)
                (q_lin): Linear(in_features=768, out_features=768, bias=True)
                (k_lin): Linear(in_features=768, out_features=768, bias=True)
                (v_lin): Linear(in_features=768, out_features=768, bias=True)
                (out_lin): Linear(in_features=768, out_features=768, bias=True)
              )
              (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (ffn): FFN(
                (dropout): Dropout(p=0.1, inplace=False)
                (lin1): Linear(in_features=768, out_features=3072, bias=True)
                (lin2): Linear(in_features=3072, out_features=768, bias=True)
              )
              (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            )
            (1): TransformerBlock(
              (attention): MultiHeadSelfAttention(
                (dropout): Dropout(p=0.1, inplace=False)
                (q_lin): Linear(in_features=768, out_features=768, bias=True)
                (k_lin): Linear(in_features=768, out_features=768, bias=True)
                (v_lin): Linear(in_features=768, out_features=768, bias=True)
                (out_lin): Linear(in_features=768, out_features=768, bias=True)
              )
              (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (ffn): FFN(
                (dropout): Dropout(p=0.1, inplace=False)
                (lin1): Linear(in_features=768, out_features=3072, bias=True)
                (lin2): Linear(in_features=3072, out_features=768, bias=True)
              )
              (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            )
            (2): TransformerBlock(
              (attention): MultiHeadSelfAttention(
                (dropout): Dropout(p=0.1, inplace=False)
                (q_lin): Linear(in_features=768, out_features=768, bias=True)
                (k_lin): Linear(in_features=768, out_features=768, bias=True)
                (v_lin): Linear(in_features=768, out_features=768, bias=True)
                (out_lin): Linear(in_features=768, out_features=768, bias=True)
              )
              (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (ffn): FFN(
                (dropout): Dropout(p=0.1, inplace=False)
                (lin1): Linear(in_features=768, out_features=3072, bias=True)
                (lin2): Linear(in_features=3072, out_features=768, bias=True)
              )
              (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            )
            (3): TransformerBlock(
              (attention): MultiHeadSelfAttention(
                (dropout): Dropout(p=0.1, inplace=False)
                (q_lin): Linear(in_features=768, out_features=768, bias=True)
                (k_lin): Linear(in_features=768, out_features=768, bias=True)
                (v_lin): Linear(in_features=768, out_features=768, bias=True)
                (out_lin): Linear(in_features=768, out_features=768, bias=True)
              )
              (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (ffn): FFN(
                (dropout): Dropout(p=0.1, inplace=False)
                (lin1): Linear(in_features=768, out_features=3072, bias=True)
                (lin2): Linear(in_features=3072, out_features=768, bias=True)
              )
              (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            )
            (4): TransformerBlock(
              (attention): MultiHeadSelfAttention(
                (dropout): Dropout(p=0.1, inplace=False)
                (q_lin): Linear(in_features=768, out_features=768, bias=True)
                (k_lin): Linear(in_features=768, out_features=768, bias=True)
                (v_lin): Linear(in_features=768, out_features=768, bias=True)
                (out_lin): Linear(in_features=768, out_features=768, bias=True)
              )
              (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (ffn): FFN(
                (dropout): Dropout(p=0.1, inplace=False)
                (lin1): Linear(in_features=768, out_features=3072, bias=True)
                (lin2): Linear(in_features=3072, out_features=768, bias=True)
              )
              (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            )
            (5): TransformerBlock(
              (attention): MultiHeadSelfAttention(
                (dropout): Dropout(p=0.1, inplace=False)
                (q_lin): Linear(in_features=768, out_features=768, bias=True)
                (k_lin): Linear(in_features=768, out_features=768, bias=True)
                (v_lin): Linear(in_features=768, out_features=768, bias=True)
                (out_lin): Linear(in_features=768, out_features=768, bias=True)
              )
              (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (ffn): FFN(
                (dropout): Dropout(p=0.1, inplace=False)
                (lin1): Linear(in_features=768, out_features=3072, bias=True)
                (lin2): Linear(in_features=3072, out_features=768, bias=True)
              )
              (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            )
          )
        )
      )
      (pre_classifier): Linear(in_features=768, out_features=768, bias=True)
      (classifier): Linear(in_features=768, out_features=2, bias=True)
      (dropout): Dropout(p=0.2, inplace=False)
    )

So what has been added? Earlier we said the model produces a 768-dimensional vector for every token; here two fully connected layers (plus a dropout) are attached at the end:

    (pre_classifier): Linear(in_features=768, out_features=768, bias=True)
    (classifier): Linear(in_features=768, out_features=2, bias=True)
    (dropout): Dropout(p=0.2, inplace=False)
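To see what this head actually computes, here is a small sketch; it is my assumption based on the printed module structure and on how the library wires these layers in its forward pass, not code from the article. It takes the hidden state of the first token from the base DistilBERT model and passes it through pre_classifier, a ReLU, dropout, and classifier, reusing `model` and `inputs` from the code above.

    import torch

    # Manually reproduce the classification head (sketch).
    with torch.no_grad():
        hidden = model.distilbert(**inputs).last_hidden_state  # (batch, seq_len, 768)
        cls_vec = hidden[:, 0]                                  # hidden state of the first token, (batch, 768)
        x = torch.nn.ReLU()(model.pre_classifier(cls_vec))      # 768 -> 768
        logits = model.classifier(model.dropout(x))             # 768 -> 2
    print(logits.shape)  # torch.Size([2, 2]), the same shape as outputs.logits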

The logits field of the output holds the raw result:

    print(outputs.logits.shape)
    torch.Size([2, 2])

This 2×2 means two samples (the two English sentences) and two classes. To get the final class probabilities, we apply a softmax on top:

    import torch

    # Turn the raw logits into probabilities; dim=-1 is the class dimension
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    print(predictions)

dim=-1 means the softmax runs along the last dimension (the class dimension), so what comes back are probabilities:

    tensor([[1.5446e-02, 9.8455e-01],
            [9.9946e-01, 5.4418e-04]], grad_fn=<SoftmaxBackward0>)
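As a quick sanity check (a tiny sketch, assuming the `predictions` tensor from above), each row should sum to 1, precisely because the softmax was taken along dim=-1:

    # Every sentence's class probabilities form a distribution
    print(predictions.sum(dim=-1))  # approximately tensor([1., 1.])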

Now we have the probabilities, but which column corresponds to which class? The model ships a built-in id-to-label mapping in its config:

    model.config.id2label
    {0: 'NEGATIVE', 1: 'POSITIVE'}
    

In other words, for the first sentence the probability of NEGATIVE sentiment is 1.54% and of POSITIVE sentiment is 98.46%.
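Putting the pieces together, here is a short sketch (assuming `predictions` and `model` from the code above) that reads off the predicted label and its probability for each sentence:

    # Pick the most probable class per row and map its id to a label name.
    pred_ids = predictions.argmax(dim=-1)
    for i, pred_id in enumerate(pred_ids.tolist()):
        label = model.config.id2label[pred_id]
        prob = predictions[i, pred_id].item()
        print(f"sentence {i}: {label} ({prob:.2%})")
    # Given the probabilities above, this prints:
    # sentence 0: POSITIVE (98.46%)
    # sentence 1: NEGATIVE (99.95%)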

Next in the series:
Hugging Face in Practice, Tutorial 4: padding and attention_mask

Original article: https://blog.csdn.net/weixin_50592077/article/details/132641216