RuntimeError: Trying to resize storage that is not resizable

thwwu 2024-07-06 12:31:02

Problem

While training a model today, I ran into a bug.

First, the DataLoader raised this error:

RuntimeError: Caught RuntimeError in DataLoader worker process 0.

followed by:

RuntimeError: Trying to resize storage that is not resizable

The full traceback is as follows:

Traceback (most recent call last):
  File "train_temp.py", line 100, in <module>
    for data in train_dataloader:
  File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 628, in __next__
    data = self._next_data()
  File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1333, in _next_data
    return self._process_data(data)
  File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1359, in _process_data
    data.reraise()
  File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/_utils.py", line 543, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 61, in fetch
    return self.collate_fn(data)
  File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 265, in default_collate
    return collate(batch, collate_fn_map=default_collate_fn_map)
  File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 143, in collate
    return [collate(samples, collate_fn_map=collate_fn_map) for samples in transposed]  # Backwards compatibility.
  File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 143, in <listcomp>
    return [collate(samples, collate_fn_map=collate_fn_map) for samples in transposed]  # Backwards compatibility.
  File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 120, in collate
    return collate_fn_map[elem_type](batch, collate_fn_map=collate_fn_map)
  File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 172, in collate_numpy_array_fn
    return collate([torch.as_tensor(b) for b in batch], collate_fn_map=collate_fn_map)
  File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 120, in collate
    return collate_fn_map[elem_type](batch, collate_fn_map=collate_fn_map)
  File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 162, in collate_tensor_fn
    out = elem.new(storage).resize_(len(batch), *list(elem.size()))
RuntimeError: Trying to resize storage that is not resizable

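Solution

At first, I read in a blog post that the cause was the num_workers setting, and that it had to be set to 0 or to the number of GPUs.

I was skeptical, because I had previously set it to 16 with 4 GPUs and had no error. I tried it anyway to see whether it helped (I was out of ideas by then), and sure enough, the error was still there.

Later I came across another blog post (many thanks to its author). At the end, the author mentioned inconsistent data dimensions, so I printed the shapes of my own data in the DataLoader and found that the input and the label had different shapes!!!!

One was 384*384*1 and the other was 256*256*1.

I could hardly believe it >_<

After I changed the crop size, everything worked ^_^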
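The shape check described above can be sketched as a small helper. This is only an illustration, not the code I actually ran: the names shape_report and inconsistent_fields are made up here, and the helper assumes each sample is a tuple or list of array-like objects with a .shape attribute (NumPy arrays or torch tensors). As far as I understand, the default collate function stacks each field across the samples of a batch, so any field that shows more than one shape is a likely culprit.

```python
from collections import Counter


def shape_report(dataset, limit=None):
    """Count the shapes seen in each field position of a dataset's samples.

    Assumes dataset supports len() and integer indexing, and that every
    item in a sample exposes a .shape attribute (hypothetical helper).
    """
    counts = {}
    for i in range(len(dataset)):
        if limit is not None and i >= limit:
            break
        sample = dataset[i]
        # Treat a bare array as a one-field sample.
        fields = sample if isinstance(sample, (tuple, list)) else (sample,)
        for pos, item in enumerate(fields):
            counts.setdefault(pos, Counter())[tuple(item.shape)] += 1
    return counts


def inconsistent_fields(report):
    """Keep only the field positions where more than one shape was seen."""
    return {pos: dict(c) for pos, c in report.items() if len(c) > 1}
```

Running inconsistent_fields(shape_report(train_dataset)) on the Dataset itself, before wrapping it in a DataLoader, avoids the worker processes entirely. An empty result means every field has a single shape across all inspected samples.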

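Miscellaneous

1 num_workers is the number of worker processes used to load data. It has nothing to do with the number of GPUs, although the two are often set to the same value. You can increase num_workers gradually during training until data loading speed stops improving noticeably.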

2 Check the dataset, always the dataset!
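For note 1, one rough way to find the point where more workers stop helping is to time a fixed number of batches for several values. The sketch below is an assumption-laden illustration: time_worker_counts and make_loader are hypothetical names, and make_loader is any callable that builds a loader for a given worker count.

```python
import time


def time_worker_counts(make_loader, worker_counts, max_batches=50):
    """Return {num_workers: seconds} for iterating up to max_batches batches.

    make_loader is a hypothetical factory: it takes a worker count and
    returns an iterable of batches.
    """
    timings = {}
    for n in worker_counts:
        loader = make_loader(n)
        start = time.perf_counter()
        for i, _batch in enumerate(loader):
            if i + 1 >= max_batches:
                break
        timings[n] = time.perf_counter() - start
    return timings


# Hypothetical usage with PyTorch (train_dataset and batch_size=16 are placeholders):
# from torch.utils.data import DataLoader
# timings = time_worker_counts(
#     lambda n: DataLoader(train_dataset, batch_size=16, num_workers=n),
#     worker_counts=[0, 2, 4, 8, 16],
# )
```

The measured time includes worker startup, so small differences between neighboring values are probably noise. Pick the smallest value after which the time no longer drops noticeably.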


