This repository was archived by the owner on Mar 14, 2025. It is now read-only.

TensorRT FP32 to INT8 conversion error on model that has 2 inputs #41

@FatihcanUslu

Description

I am trying to run a transformer model on TensorRT. To be more specific, my model is vitb_256_mae based. I already converted it into the ONNX file format and am trying to run it on the TensorRT runtime. To do that, I made a Python script that creates an engine for FP32 to FP16 conversion, and it works just fine in that case. After that, in order to convert FP32 to INT8, I used your snippet, but I got stuck with some CUDA driver errors.
My model takes template and search as inputs and produces 5 outputs.
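For context, the working FP32-to-FP16 build path looks roughly like this. This is only a minimal sketch, assuming explicit-batch ONNX parsing; the file names ("ostrack.onnx", "ostrack_fp16.engine") are placeholders, not my actual paths:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_fp16_engine(onnx_path="ostrack.onnx", engine_path="ostrack_fp16.engine"):
    builder = trt.Builder(TRT_LOGGER)
    # Explicit-batch network, as required for ONNX models
    network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("Failed to parse ONNX model")
    config = builder.create_builder_config()
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)  # 1 GiB workspace
    config.set_flag(trt.BuilderFlag.FP16)
    # build_serialized_network replaces the deprecated build_engine in TRT 8.x
    serialized = builder.build_serialized_network(network, config)
    with open(engine_path, "wb") as f:
        f.write(serialized)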

Important changes

In order to create optimization profiles for multiple inputs, I tweaked your onnx_to_tensorrt.py a little bit:

def create_optimization_profiles(builder, inputs, batch_sizes=[1,8,16,32,64]): 
    # Check if all inputs are fixed explicit batch to create a single profile and avoid duplicates
    if all([inp.shape[0] > -1 for inp in inputs]):
        profile = builder.create_optimization_profile()
        for inp in inputs:
            fbs, shape = inp.shape[0], inp.shape[1:]
            profile.set_shape(inp.name, min=(fbs, *shape), opt=(fbs, *shape), max=(fbs, *shape))
        #print(profile.get_shape("template"))
        #print(profile.get_shape("search"))
        return [profile]
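The profiles returned above are then registered on the builder config in the usual way. A short sketch (variable names are illustrative, not my exact script):

# "builder", "network" and "config" come from the normal builder setup above
inputs = [network.get_input(i) for i in range(network.num_inputs)]
for profile in create_optimization_profiles(builder, inputs):
    config.add_optimization_profile(profile)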

Environment

TensorRT Version: 8.6.1 (installed from pip, not built from source)
GPU Type: RTX 3090 Ti
Nvidia Driver Version: nvidia-driver-530
CUDA Version: 12.1
CUDNN Version: not installed
Operating System + Version: Ubuntu 22.04.2 LTS
Python Version (if applicable): 3.10.12

More info

I tried INT8 conversion with 1 dynamic input using this link:
NVIDIA/TensorRT#289
For AlexNet the code worked. That's why I think the error might be caused by the number of inputs, or I might have to make some changes to the ImagenetCalibrator file.
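In case the problem really is a single-input assumption in the calibrator, here is a rough sketch of what I imagine a two-input get_batch would need to do. The class and variable names are hypothetical, not from the repo's ImagenetCalibrator:

import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit  # creates a CUDA context
import tensorrt as trt

# Hypothetical two-input INT8 calibrator sketch (not the repo's ImagenetCalibrator).
# get_batch must return one device pointer per name in `names`, in that order;
# otherwise TensorRT reads from invalid memory during calibration.
class TwoInputCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, batches, cache_file="calibration.cache"):
        super().__init__()
        self.batches = iter(batches)  # yields (template_batch, search_batch) float32 arrays
        self.cache_file = cache_file
        # Device buffers sized for one batch of each input
        self.d_template = cuda.mem_alloc(1 * 3 * 128 * 128 * np.float32().nbytes)
        self.d_search = cuda.mem_alloc(1 * 3 * 256 * 256 * np.float32().nbytes)

    def get_batch_size(self):
        return 1

    def get_batch(self, names):
        try:
            template, search = next(self.batches)
        except StopIteration:
            return None  # no more calibration data
        cuda.memcpy_htod(self.d_template, np.ascontiguousarray(template, dtype=np.float32))
        cuda.memcpy_htod(self.d_search, np.ascontiguousarray(search, dtype=np.float32))
        # Order the pointers to match the binding names TensorRT asks for
        ptrs = {"template": int(self.d_template), "search": int(self.d_search)}
        return [ptrs[name] for name in names]

    def read_calibration_cache(self):
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)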

Thanks for reading, have a good day!

Output errors

2023-08-10 11:27:20 - main - INFO - TRT_LOGGER Verbosity: Severity.ERROR
[08/10/2023-11:27:23] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
/home/pc-3730/fatihcan/OSTRACK_TENSORRT/tensorrt-utils-20.01/classification/imagenet/onnx_to_tensorrt.py:177: DeprecationWarning: Use set_memory_pool_limit instead.
config.max_workspace_size = 4**30 # 1GiB
2023-08-10 11:27:23 - main - INFO - Setting BuilderFlag.FP16
2023-08-10 11:27:23 - main - INFO - Setting BuilderFlag.INT8
2023-08-10 11:27:23 - ImagenetCalibrator - INFO - Collecting calibration files from: /home/pc-3730/fatihcan/OSTRACK_TENSORRT/tensorrt-utils-20.01/classification/imagenet/imagenet/val
2023-08-10 11:27:23 - ImagenetCalibrator - INFO - Number of Calibration Files found: 10869
2023-08-10 11:27:23 - ImagenetCalibrator - WARNING - Capping number of calibration images to max_calibration_size: 512
[08/10/2023-11:27:23] [TRT] [W] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[08/10/2023-11:27:23] [TRT] [W] onnx2trt_utils.cpp:400: One or more weights outside the range of INT32 was clamped
[08/10/2023-11:27:23] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
2023-08-10 11:27:23 - main - DEBUG - === Network Description ===
2023-08-10 11:27:23 - main - DEBUG - Input 0 | Name: template | Shape: (1, 3, 128, 128)
2023-08-10 11:27:23 - main - DEBUG - Input 1 | Name: search | Shape: (1, 3, 256, 256)
2023-08-10 11:27:23 - main - DEBUG - Output 0 | Name: pred_boxes | Shape: (1, 1, 4)
2023-08-10 11:27:23 - main - DEBUG - Output 1 | Name: score_map | Shape: (1, 1, 16, 16)
2023-08-10 11:27:23 - main - DEBUG - Output 2 | Name: size_map | Shape: (1, 2, 16, 16)
2023-08-10 11:27:23 - main - DEBUG - Output 3 | Name: offset_map | Shape: (1, 2, 16, 16)
2023-08-10 11:27:23 - main - DEBUG - Output 4 | Name: backbone_feat | Shape: (1, 320, 768)
2023-08-10 11:27:23 - main - DEBUG - === Optimization Profiles ===
2023-08-10 11:27:23 - main - DEBUG - template - OptProfile 0 - Min (1, 3, 128, 128) Opt (1, 3, 128, 128) Max (1, 3, 128, 128)
2023-08-10 11:27:23 - main - DEBUG - search - OptProfile 0 - Min (1, 3, 256, 256) Opt (1, 3, 256, 256) Max (1, 3, 256, 256)
2023-08-10 11:27:23 - main - INFO - Building Engine...
/home/pc-3730/fatihcan/OSTRACK_TENSORRT/tensorrt-utils-20.01/classification/imagenet/onnx_to_tensorrt.py:222: DeprecationWarning: Use build_serialized_network instead.
with builder.build_engine(network, config) as engine, open(args.output, "wb") as f:
[08/10/2023-11:27:26] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
2023-08-10 11:27:27 - ImagenetCalibrator - INFO - Calibration images pre-processed: 32/512
[08/10/2023-11:27:27] [TRT] [E] 2: [calibrator.cu::absTensorMax::141] Error Code 2: Internal Error (Assertion memory != nullptr failed. memory must be valid if nbElem != 0)
[08/10/2023-11:27:27] [TRT] [E] 1: [executionContext.cpp::executeInternal::1177] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 3: [engine.cpp::~Engine::298] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/engine.cpp::~Engine::298, condition: mExecutionContextCounter.use_count() == 1. Destroying an engine object before destroying the IExecutionContext objects it created leads to undefined behavior.
)
[08/10/2023-11:27:27] [TRT] [E] 1: [cudaDriverHelpers.cpp::operator()::94] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [cudaDriverHelpers.cpp::operator()::94] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [cudaDriverHelpers.cpp::operator()::94] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [cudaDriverHelpers.cpp::operator()::94] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [cudaDriverHelpers.cpp::operator()::94] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [cudaResources.cpp::~ScopedCudaStream::47] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 2: [calibrator.cpp::calibrateEngine::1181] Error Code 2: Internal Error (Assertion context->executeV2(&bindings[0]) failed. )
Traceback (most recent call last):
  File "/home/pc-3730/fatihcan/OSTRACK_TENSORRT/tensorrt-utils-20.01/classification/imagenet/onnx_to_tensorrt.py", line 227, in <module>
    main()
  File "/home/pc-3730/fatihcan/OSTRACK_TENSORRT/tensorrt-utils-20.01/classification/imagenet/onnx_to_tensorrt.py", line 222, in main
    with builder.build_engine(network, config) as engine, open(args.output, "wb") as f:
AttributeError: __enter__
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered
