
[MAE] RuntimeError: The size of tensor a (1025) must match the size of tensor b (577) at non-singleton dimension 1 Exception in thread Thread-3: #1782

@MIMIWAWA

Description


Checklist

  1. I have searched related issues but cannot get the expected help. Related issues I found:
     - masked autoencoder customize dataset #1613
     - RuntimeError: The size of tensor a (125) must match the size of tensor b (128) at non-singleton dimension 2 #80
     - Use of RGB2Gray in train pipeline causes RuntimeError #1706

Describe the bug

Training finishes the first epoch normally, but the evaluation that runs after the epoch crashes with:

RuntimeError: The size of tensor a (1025) must match the size of tensor b (577) at non-singleton dimension 1
Exception in thread Thread-3:

Reproduction

  1. What command or script did you run?

     python tools/train.py configs/mae/upernet_mae-base_fp16_512x512_160k_ade20k_ms.py

  2. Did you make any modifications on the code or config? Did you understand what you have modified?
     Yes. In upernet_mae-base_fp16_512x512_160k_ade20k_ms.py I changed img_scale (see the token-count sketch after this list):

         # img_scale=(2048, 512),
         img_scale=(384, 384),

  3. What dataset did you use?
     A custom dataset.
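
For what it's worth, I think the two sizes in the error line up with the positional-embedding grid: assuming the MAE-base backbone's default 16x16 patch embedding plus one cls token, a 512x512 crop gives 32*32 + 1 = 1025 tokens, while a 384x384 image gives 24*24 + 1 = 577, which matches the error exactly. A minimal sketch of that arithmetic (the function name is just for illustration):

    # Sketch of where I believe the 1025 and 577 come from, assuming the
    # default 16-pixel patch size and one extra cls token.
    def num_tokens(img_size, patch_size=16):
        patches_per_side = img_size // patch_size
        return patches_per_side * patches_per_side + 1  # +1 for the cls token

    print(num_tokens(512))  # 1025 -> length of self.pos_embed (tensor a)
    print(num_tokens(384))  # 577  -> tokens from a 384x384 crop (tensor b)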

Environment

sys.platform: linux
Python: 3.8.13 (default, Mar 28 2022, 11:38:47) [GCC 7.5.0]
CUDA available: True
GPU 0: Tesla V100-SXM2-32GB
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.7, V11.7.64
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
PyTorch: 1.8.0+cu111
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  - CuDNN 8.0.5
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

TorchVision: 0.9.0+cu111
OpenCV: 4.5.5
MMCV: 1.6.0
MMCV Compiler: GCC 9.4
MMCV CUDA Compiler: 11.7
MMSegmentation: 0.26.0+891448f

Error traceback

2022-07-15 04:30:17,403 - mmseg - INFO - Exp name: upernet_mae-base_fp16_512x512_160k_ade20k_ms.py
2022-07-15 04:30:17,403 - mmseg - INFO - Epoch [1][371/371]     lr: 9.120e-08, eta: 11:51:55, time: 2.282, data_time: 0.083, memory: 28193, decode.loss_ce: 0.3979, decode.acc_seg: 79.8963, aux.loss_ce: 0.1981, aux.acc_seg: 75.4642, loss: 0.5960
[                                                  ] 0/1073, elapsed: 0s, ETA:Traceback (most recent call last):
  File "tools/train.py", line 243, in <module>
    main()
  File "tools/train.py", line 232, in main
    train_segmentor(
  File "/mmsegmentation/mmseg/apis/train.py", line 194, in train_segmentor
    runner.run(data_loaders, cfg.workflow)
  File "/mmcv/mmcv/runner/epoch_based_runner.py", line 136, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/mmcv/mmcv/runner/epoch_based_runner.py", line 58, in train
    self.call_hook('after_train_epoch')
  File "/mmcv/mmcv/runner/base_runner.py", line 317, in call_hook
    getattr(hook, fn_name)(self)
  File "/mmcv/mmcv/runner/hooks/evaluation.py", line 271, in after_train_epoch
    self._do_evaluate(runner)
  File "/mmsegmentation/mmseg/core/evaluation/eval_hooks.py", line 51, in _do_evaluate
    results = single_gpu_test(
  File "/mmsegmentation/mmseg/apis/test.py", line 91, in single_gpu_test
    result = model(return_loss=False, **data)
  File "/opt/conda/envs/cnx38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mmcv/mmcv/parallel/data_parallel.py", line 51, in forward
    return super().forward(*inputs, **kwargs)
  File "/opt/conda/envs/cnx38/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 165, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/opt/conda/envs/cnx38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mmcv/mmcv/runner/fp16_utils.py", line 146, in new_func
    output = old_func(*new_args, **new_kwargs)
  File "/mmsegmentation/mmseg/models/segmentors/base.py", line 110, in forward
    return self.forward_test(img, img_metas, **kwargs)
  File "/mmsegmentation/mmseg/models/segmentors/base.py", line 94, in forward_test
    return self.aug_test(imgs, img_metas, **kwargs)
  File "/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py", line 281, in aug_test
    seg_logit = self.inference(imgs[0], img_metas[0], rescale)
  File "/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py", line 245, in inference
    seg_logit = self.slide_inference(img, img_meta, rescale)
  File "/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py", line 179, in slide_inference
    crop_seg_logit = self.encode_decode(crop_img, img_meta)
  File "/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py", line 73, in encode_decode
    x = self.extract_feat(img)
  File "/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py", line 65, in extract_feat
    x = self.backbone(img)
  File "/opt/conda/envs/cnx38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mmsegmentation/mmseg/models/backbones/mae.py", line 246, in forward
    x = x + self.pos_embed
RuntimeError: The size of tensor a (1025) must match the size of tensor b (577) at non-singleton dimension 1
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/opt/conda/envs/cnx38/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/opt/conda/envs/cnx38/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/envs/cnx38/lib/python3.8/site-packages/torch/utils/data/_utils/pin_memory.py", line 25, in _pin_memory_loop
    r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
  File "/opt/conda/envs/cnx38/lib/python3.8/multiprocessing/queues.py", line 116, in get
    return _ForkingPickler.loads(res)
  File "/opt/conda/envs/cnx38/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 282, in rebuild_storage_fd
    fd = df.detach()
  File "/opt/conda/envs/cnx38/lib/python3.8/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/opt/conda/envs/cnx38/lib/python3.8/multiprocessing/resource_sharer.py", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/opt/conda/envs/cnx38/lib/python3.8/multiprocessing/connection.py", line 508, in Client
    answer_challenge(c, authkey)
  File "/opt/conda/envs/cnx38/lib/python3.8/multiprocessing/connection.py", line 752, in answer_challenge
    message = connection.recv_bytes(256)         # reject large message
  File "/opt/conda/envs/cnx38/lib/python3.8/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/opt/conda/envs/cnx38/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/opt/conda/envs/cnx38/lib/python3.8/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
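
If it helps with triage: the failure happens during slide inference, where the pretrained pos_embed appears to be sized for 512x512 crops while my resized 384x384 test images produce a smaller crop. A workaround I am considering (untested; field names follow the default ADE20K test pipeline as I understand it and may differ in my local config) is to restore the original img_scale so the shorter side is at least 512:

    # Untested sketch: keep test images at least as large as the 512x512 crop
    # the positional embedding was built for, so every slide-inference crop
    # yields 32*32 + 1 = 1025 tokens. Values mirror the default config.
    img_norm_cfg = dict(
        mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
    test_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(
            type='MultiScaleFlipAug',
            img_scale=(2048, 512),  # back to the default instead of (384, 384)
            flip=False,
            transforms=[
                dict(type='Resize', keep_ratio=True),
                dict(type='RandomFlip'),
                dict(type='Normalize', **img_norm_cfg),
                dict(type='ImageToTensor', keys=['img']),
                dict(type='Collect', keys=['img']),
            ])
    ]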
