• PyTorch: `raise EOFError` (EOFError) during training


    Partway through training, fetching a batch of validation data raised an error.

    Code that raised the error

    imgs = next(iter(val_dataloader))

    val_dataloader = DataLoader(
        ImageDataset("data/%s" % opt.dataset_name, transforms_=transforms_, unaligned=True, mode="test"),
        batch_size=5,
        shuffle=True,
        num_workers=2,
    )
    
    def sample_images(batches_done):
        """Saves a generated sample from the test set"""
        imgs = next(iter(val_dataloader))
        G_AB.eval()
        G_BA.eval()
        real_A = Variable(imgs["A"].type(Tensor))
        fake_B = G_AB(real_A)
        real_B = Variable(imgs["B"].type(Tensor))
        fake_A = G_BA(real_B)
    

    Error (the main-process and worker-process tracebacks are interleaved in the output)

    File "/content/cyclegan/cyclegan.py", line 324, in <module>
        sample_images(batches_done)
      File "/content/cyclegan/cyclegan.py", line 56, in sample_images
        imgs = next(iter(val_dataloader))
      File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 633, in __next__
        data = self._next_data()
      File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1328, in _next_data
        idx, data = self._get_data()
      File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1294, in _get_data
        success, data = self._try_get_data()
      File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1132, in _try_get_data
        data = self._data_queue.get(timeout=timeout)
      File "/usr/lib/python3.10/multiprocessing/queues.py", line 122, in get
        return _ForkingPickler.loads(res)
      File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/reductions.py", line 307, in rebuild_storage_fd
        fd = df.detach()
      File "/usr/lib/python3.10/multiprocessing/resource_sharer.py", line 58, in detach
        return reduction.recv_handle(conn)
    Traceback (most recent call last):
      File "/usr/lib/python3.10/multiprocessing/resource_sharer.py", line 145, in _serve
        send(conn, destination_pid)
      File "/usr/lib/python3.10/multiprocessing/resource_sharer.py", line 50, in send
      File "/usr/lib/python3.10/multiprocessing/reduction.py", line 189, in recv_handle
        reduction.send_handle(conn, new_fd, pid)
        return recvfds(s, 1)[0]
      File "/usr/lib/python3.10/multiprocessing/reduction.py", line 184, in send_handle
      File "/usr/lib/python3.10/multiprocessing/reduction.py", line 159, in recvfds
        raise EOFError
    EOFError
    

    Solution

    I tried many approaches without success before finally finding a fix:
    add the following code to dataset.py

    import torch.multiprocessing
    torch.multiprocessing.set_sharing_strategy('file_system')
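
A likely reason this helps: the traceback fails inside `rebuild_storage_fd`, which belongs to the `file_descriptor` sharing strategy (the default on Linux), where worker processes hand tensors to the main process by passing file descriptors over a socket. The `file_system` strategy shares tensors through named shared-memory files instead, so that hand-off is avoided. The sketch below is a slightly more defensive version of the two lines above; the helper name `use_file_system_sharing` is hypothetical and not part of the original script. It is assumed to run before any `DataLoader` iterator is created.

```python
import torch.multiprocessing as mp


def use_file_system_sharing() -> str:
    """Switch tensor sharing to 'file_system' if this platform offers it.

    Returns the strategy that is active afterwards.
    (Hypothetical helper, not part of the original cyclegan.py.)
    """
    # The set of supported strategies depends on the platform.
    if "file_system" in mp.get_all_sharing_strategies():
        mp.set_sharing_strategy("file_system")
    return mp.get_sharing_strategy()


# Call once at import time, before DataLoader workers are started.
active_strategy = use_file_system_sharing()
print("sharing strategy:", active_strategy)
```

One trade-off to keep in mind: with `file_system`, shared-memory files can be left behind if a process dies abnormally, so it may be worth checking shared-memory usage after a crash.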
    
  • Original article: https://blog.csdn.net/flysnownet/article/details/133844643