RuntimeError: Trying to resize storage that is not resizable
thwwu 2024-07-06 12:31:02 阅读 81
问题
今天模型训练,遇到了个bug
先是在dataloder那报了这样一个错
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
然后后面报
RuntimeError: Trying to resize storage that is not resizable
完整错误代码如下
<code>Traceback (most recent call last):
File "train_temp.py", line 100, in <module>
for data in train_dataloader:
File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 628, in __next__
data = self._next_data()
File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1333, in _next_data
return self._process_data(data)
File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1359, in _process_data
data.reraise()
File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/_utils.py", line 543, in reraise
raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
data = fetcher.fetch(index)
File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 61, in fetch
return self.collate_fn(data)
File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 265, in default_collate
return collate(batch, collate_fn_map=default_collate_fn_map)
File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 143, in collate
return [collate(samples, collate_fn_map=collate_fn_map) for samples in transposed] # Backwards compatibility.
File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 143, in <listcomp>
return [collate(samples, collate_fn_map=collate_fn_map) for samples in transposed] # Backwards compatibility.
File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 120, in collate
return collate_fn_map[elem_type](batch, collate_fn_map=collate_fn_map)
File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 172, in collate_numpy_array_fn
return collate([torch.as_tensor(b) for b in batch], collate_fn_map=collate_fn_map)
File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 120, in collate
return collate_fn_map[elem_type](batch, collate_fn_map=collate_fn_map)
File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 162, in collate_tensor_fn
out = elem.new(storage).resize_(len(batch), *list(elem.size()))
RuntimeError: Trying to resize storage that is not resizable
解决
一开始,在博客上看到是num_works设置有问题,需要设置为0 或 和显卡相同的数
当时,还是有点怀疑,因为之前还设置了16,显卡是4张,也没报错,还是尝试了下,看看问题解决没,(因为当时没想法了),果然,仍然报错
后来,看到这篇博客,感谢博主大大(点击),作者在末尾,提到数据维度不统一的问题,于是,就在dataloder中打印了下自己的数据维度,结果发现,输入的input和label的shape竟然不一样!!!!
一个是384*384*1,一个是256*256*1
要怀疑人生了>_<
然后,改了裁剪的大小,就好了^_^
琐碎
1 num_works是有多少个进程去加载数据,与显卡数量无关,只不过一般是相等,可以在训练的时候慢慢增加num_works直到加载数据速度无明显提升
2 数据集数据集!
上一篇: 万字长文,详细解读AI大模型技术原理!!
下一篇: AI知识库和Agent简介及实现
本文标签
RuntimeError: Trying to resize storage that is not resizable
声明
本文内容仅代表作者观点,或转载于其他网站,本站不以此文作为商业用途
如有涉及侵权,请联系本站进行删除
转载本站原创文章,请注明来源及作者。