• [DEBUG] mmseg training error: RuntimeError: Trying to resize storage that is not resizable


    🚀 Debug column


    Contents

    ❓❓ Problem 1:

     🌻🌻 Solution:

    ❓❓ Problem 2:

    🌻🌻 Solution:


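            While training with mmseg, I hit a bug during data loading. This post records the debugging process and the reasoning behind it. For other debugging notes, see the [Debug column] above.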

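    ❓❓ Problem 1:

            The DataLoader first raised this error:
    RuntimeError: Caught RuntimeError in DataLoader worker process 0.

            This message shows up often during data loading. It only says that a worker process failed; the real cause is in the detailed error that follows it.


            The detailed error that followed was:
    RuntimeError: Trying to resize storage that is not resizable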

     🌻🌻 Solution:

            To debug this kind of error, first locate where the detailed error is raised, i.e. the line "RuntimeError: Trying to resize storage that is not resizable". The full traceback is:

    Traceback (most recent call last):
      File "train.py", line 100, in <module>
        for data in train_dataloader:
      File "/data0/thw/anaconda3/envs/Mmseg/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 628, in __next__
        data = self._next_data()
      File "/XXX/anaconda3/envs/Mmseg/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1333, in _next_data
        return self._process_data(data)
      File "/XXX/anaconda3/envs/Mmseg/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1359, in _process_data
        data.reraise()
      File "/XXX/anaconda3/envs/Mmseg/lib/python3.8/site-packages/torch/_utils.py", line 543, in reraise
        raise exception
    RuntimeError: Caught RuntimeError in DataLoader worker process 0.
    Original Traceback (most recent call last):
      File "/XXX/anaconda3/envs/Mmseg/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
        data = fetcher.fetch(index)
      File "/XXX/anaconda3/envs/Mmseg/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 61, in fetch
        return self.collate_fn(data)
      File "/XXX/anaconda3/envs/Mmseg/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 265, in default_collate
        return collate(batch, collate_fn_map=default_collate_fn_map)
      File "/XXX/anaconda3/envs/Mmseg/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 143, in collate
        return [collate(samples, collate_fn_map=collate_fn_map) for samples in transposed]  # Backwards compatibility.
      File "/XXX/anaconda3/envs/Mmseg/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 143, in <listcomp>
        return [collate(samples, collate_fn_map=collate_fn_map) for samples in transposed]  # Backwards compatibility.
      File "/XXX/anaconda3/envs/Mmseg/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 120, in collate
        return collate_fn_map[elem_type](batch, collate_fn_map=collate_fn_map)
      File "/XXX/anaconda3/envs/Mmseg/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 172, in collate_numpy_array_fn
        return collate([torch.as_tensor(b) for b in batch], collate_fn_map=collate_fn_map)
      File "/XXX/anaconda3/envs/Mmseg/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 120, in collate
        return collate_fn_map[elem_type](batch, collate_fn_map=collate_fn_map)
      File "/XXX/anaconda3/envs/Mmseg/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 162, in collate_tensor_fn
        out = elem.new(storage).resize_(len(batch), *list(elem.size()))
    RuntimeError: Trying to resize storage that is not resizable

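            Fix: many posts online attribute this error to a wrong num_workers setting and suggest setting it to 0 or to the number of GPUs. I changed it, but the error persisted.

            I then checked the sizes of the input images and labels, and the cause was indeed a size mismatch. The traceback is consistent with this: it ends in collate_tensor_fn, where default_collate stacks the samples of a batch into one tensor, which only works when every sample has the same shape. After fixing the data so that the sizes were consistent, the error went away.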
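            A size check like the one described above can be sketched as follows. This is only a minimal sketch with hypothetical names (find_size_problems, the demo file names), not mmseg's own API. It assumes you have already collected the (height, width) of every image and label, for example with PIL or OpenCV, and it flags both image/label pairs that disagree and samples that differ from the most common size.

```python
from collections import Counter


def find_size_problems(samples):
    """Check (name, image_hw, label_hw) tuples for inconsistent sizes.

    Returns two lists of names:
    - pairs whose image and label sizes differ,
    - samples whose image size differs from the most common image size.
    """
    samples = list(samples)

    # Image and label of the same sample must share height and width.
    pair_mismatches = [
        name for name, img_hw, lbl_hw in samples if tuple(img_hw) != tuple(lbl_hw)
    ]

    # Without a resize/crop step, every sample in a batch must have the same size.
    counts = Counter(tuple(img_hw) for _, img_hw, _ in samples)
    outliers = []
    if counts:
        common_hw, _ = counts.most_common(1)[0]
        outliers = [name for name, img_hw, _ in samples if tuple(img_hw) != common_hw]

    return pair_mismatches, outliers


# Made-up sizes for illustration only.
demo = [
    ("a.png", (512, 512), (512, 512)),
    ("b.png", (512, 512), (500, 512)),
    ("c.png", (480, 640), (480, 640)),
    ("d.png", (512, 512), (512, 512)),
]
pair_mismatches, outliers = find_size_problems(demo)
print("image/label mismatch:", pair_mismatches)
print("differs from the common size:", outliers)
```

            Running a check like this over the whole dataset before training makes the offending files visible directly, instead of surfacing as a collate error inside a worker process.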


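    ❓❓ Problem 2:

            While loading data during training, this error appeared: DataLoader worker (pid xxx) is killed by signal: Killed.

            This time there was no further detail, just that one line, which made it harder to diagnose. Searching online pointed to the same suggestion as before.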

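    🌻🌻 Solution:

            The num_workers setting was the problem. The usual advice is to set it to 0 or to the number of GPUs. In my experience, a multiple of the number of GPUs also works, and that resolved the error.

            Example: with 2 GPUs, num_workers can be set to 4, 8, 16, and so on. Larger values generally load data faster as long as memory is not exhausted, but I would not go above 64. Note that worker processes consume host RAM and CPU cores rather than GPU memory, and a worker "killed by signal: Killed" often means the operating system ran out of host memory.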
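            The rule of thumb above can be sketched as a small helper. The function name suggest_num_workers is hypothetical, and bounding the value by the CPU core count is my own assumption rather than part of the original fix; the cap of 64 comes from the advice above.

```python
import os


def suggest_num_workers(num_gpus, cpu_count=None, cap=64):
    """Largest multiple of num_gpus that does not exceed min(cpu_count, cap).

    Returns 0 (load data in the main process) when there is no GPU or when
    even one worker per GPU would exceed the limit.
    """
    if cpu_count is None:
        # os.cpu_count() can return None, so fall back to 1.
        cpu_count = os.cpu_count() or 1
    if num_gpus <= 0:
        return 0
    limit = min(cpu_count, cap)
    return (limit // num_gpus) * num_gpus


print(suggest_num_workers(2, cpu_count=16))
print(suggest_num_workers(2, cpu_count=7))
```

            The result is only a starting point. If workers are still killed, lowering the value reduces the host memory used by the loader.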


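    Summary:

            When training fails, don't panic. Scroll up from the point where the run stopped and check whether there are earlier error messages. The first error in the log is usually the real one; the vaguer errors after it are often just consequences, so fixing the first one may resolve the rest as well.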
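            For long logs, the search for the first error can be done with a few lines of code. The sketch below is an illustration under assumptions: the function name error_lines, the regular expression, and the demo log are all made up, and the pattern only catches lines that start with a Python-style exception name.

```python
import re

# Matches lines such as "RuntimeError: ..." or "pkg.SomeException: ...".
ERROR_LINE = re.compile(r"\s*[\w.]*(?:Error|Exception)\b")


def error_lines(log_text):
    """Return (line_number, text) for every error-looking line, in log order."""
    return [
        (number, line.strip())
        for number, line in enumerate(log_text.splitlines(), start=1)
        if ERROR_LINE.match(line)
    ]


# Made-up log for illustration only.
demo_log = """\
iter 10 loss 0.93
ValueError: made-up first failure
iter 11 loss 0.91
RuntimeError: made-up follow-on failure
"""

found = error_lines(demo_log)
first = found[0] if found else None
print(first)
```

            The first entry is the place to start reading. For DataLoader worker failures specifically, the root cause is the last line of the "Original Traceback" section, as in Problem 1.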


  • Original article: https://blog.csdn.net/qq_38308388/article/details/134017869