This repository was archived by the owner on Mar 14, 2025. It is now read-only.

TensorRT FP32 to INT8 conversion error on model that has 2 inputs #41

@FatihcanUslu

Description

I am trying to run a transformer model on TensorRT. To be more specific, my model is vitb_256_mae based. I already converted it into the ONNX file format and am trying to run it on the TensorRT runtime. To do that, I made a Python script that creates an engine for FP32 to FP16 conversion, and it works just fine in that case. After that, in order to convert FP32 to INT8, I used your snippet, but I got stuck with some CUDA driver errors.
My model takes template and search as inputs and produces 5 outputs.
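For context, the working FP32-to-FP16 build path looks roughly like this. This is only a minimal sketch, assuming explicit-batch ONNX parsing; the file names ("ostrack.onnx", "ostrack_fp16.engine") are placeholders, not my actual paths:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_fp16_engine(onnx_path="ostrack.onnx", engine_path="ostrack_fp16.engine"):
    builder = trt.Builder(TRT_LOGGER)
    # Explicit-batch network, as required for ONNX models
    network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("Failed to parse ONNX model")
    config = builder.create_builder_config()
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)  # 1 GiB workspace
    config.set_flag(trt.BuilderFlag.FP16)
    # build_serialized_network replaces the deprecated build_engine in TRT 8.x
    serialized = builder.build_serialized_network(network, config)
    with open(engine_path, "wb") as f:
        f.write(serialized)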

Important changes

In order to create optimization profiles for multiple inputs, I tweaked your onnx_to_tensorrt.py a little bit:

def create_optimization_profiles(builder, inputs, batch_sizes=[1,8,16,32,64]): 
    # Check if all inputs are fixed explicit batch to create a single profile and avoid duplicates
    if all([inp.shape[0] > -1 for inp in inputs]):
        profile = builder.create_optimization_profile()
        for inp in inputs:
            fbs, shape = inp.shape[0], inp.shape[1:]
            profile.set_shape(inp.name, min=(fbs, *shape), opt=(fbs, *shape), max=(fbs, *shape))
        #print(profile.get_shape("template"))
        #print(profile.get_shape("search"))
        return [profile]
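The profiles returned above are then registered on the builder config in the usual way. A short sketch (variable names are illustrative, not my exact script):

# "builder", "network" and "config" come from the normal builder setup above
inputs = [network.get_input(i) for i in range(network.num_inputs)]
for profile in create_optimization_profiles(builder, inputs):
    config.add_optimization_profile(profile)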

Environment

TensorRT Version: 8.6.1 (installed from pip, not built from source)
GPU Type: RTX 3090 Ti
Nvidia Driver Version: nvidia-driver-530
CUDA Version: 12.1
CUDNN Version: not installed
Operating System + Version: Ubuntu 22.04.2 LTS
Python Version (if applicable): 3.10.12

More info

I tried INT8 conversion with 1 dynamic input using this link:
NVIDIA/TensorRT#289
For AlexNet the code worked. That's why I think the error might be caused by the number of inputs, or I might have to make some changes to the ImagenetCalibrator file.
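In case the problem really is a single-input assumption in the calibrator, here is a rough sketch of what I imagine a two-input get_batch would need to do. The class and variable names are hypothetical, not from the repo's ImagenetCalibrator:

import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit  # creates a CUDA context
import tensorrt as trt

# Hypothetical two-input INT8 calibrator sketch (not the repo's ImagenetCalibrator).
# get_batch must return one device pointer per name in `names`, in that order;
# otherwise TensorRT reads from invalid memory during calibration.
class TwoInputCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, batches, cache_file="calibration.cache"):
        super().__init__()
        self.batches = iter(batches)  # yields (template_batch, search_batch) float32 arrays
        self.cache_file = cache_file
        # Device buffers sized for one batch of each input
        self.d_template = cuda.mem_alloc(1 * 3 * 128 * 128 * np.float32().nbytes)
        self.d_search = cuda.mem_alloc(1 * 3 * 256 * 256 * np.float32().nbytes)

    def get_batch_size(self):
        return 1

    def get_batch(self, names):
        try:
            template, search = next(self.batches)
        except StopIteration:
            return None  # no more calibration data
        cuda.memcpy_htod(self.d_template, np.ascontiguousarray(template, dtype=np.float32))
        cuda.memcpy_htod(self.d_search, np.ascontiguousarray(search, dtype=np.float32))
        # Order the pointers to match the binding names TensorRT asks for
        ptrs = {"template": int(self.d_template), "search": int(self.d_search)}
        return [ptrs[name] for name in names]

    def read_calibration_cache(self):
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)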

Thanks for reading, have a good day!

Output errors

2023-08-10 11:27:20 - main - INFO - TRT_LOGGER Verbosity: Severity.ERROR
[08/10/2023-11:27:23] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
/home/pc-3730/fatihcan/OSTRACK_TENSORRT/tensorrt-utils-20.01/classification/imagenet/onnx_to_tensorrt.py:177: DeprecationWarning: Use set_memory_pool_limit instead.
config.max_workspace_size = 4**30 # 1GiB
2023-08-10 11:27:23 - main - INFO - Setting BuilderFlag.FP16
2023-08-10 11:27:23 - main - INFO - Setting BuilderFlag.INT8
2023-08-10 11:27:23 - ImagenetCalibrator - INFO - Collecting calibration files from: /home/pc-3730/fatihcan/OSTRACK_TENSORRT/tensorrt-utils-20.01/classification/imagenet/imagenet/val
2023-08-10 11:27:23 - ImagenetCalibrator - INFO - Number of Calibration Files found: 10869
2023-08-10 11:27:23 - ImagenetCalibrator - WARNING - Capping number of calibration images to max_calibration_size: 512
[08/10/2023-11:27:23] [TRT] [W] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[08/10/2023-11:27:23] [TRT] [W] onnx2trt_utils.cpp:400: One or more weights outside the range of INT32 was clamped
[08/10/2023-11:27:23] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
2023-08-10 11:27:23 - main - DEBUG - === Network Description ===
2023-08-10 11:27:23 - main - DEBUG - Input 0 | Name: template | Shape: (1, 3, 128, 128)
2023-08-10 11:27:23 - main - DEBUG - Input 1 | Name: search | Shape: (1, 3, 256, 256)
2023-08-10 11:27:23 - main - DEBUG - Output 0 | Name: pred_boxes | Shape: (1, 1, 4)
2023-08-10 11:27:23 - main - DEBUG - Output 1 | Name: score_map | Shape: (1, 1, 16, 16)
2023-08-10 11:27:23 - main - DEBUG - Output 2 | Name: size_map | Shape: (1, 2, 16, 16)
2023-08-10 11:27:23 - main - DEBUG - Output 3 | Name: offset_map | Shape: (1, 2, 16, 16)
2023-08-10 11:27:23 - main - DEBUG - Output 4 | Name: backbone_feat | Shape: (1, 320, 768)
2023-08-10 11:27:23 - main - DEBUG - === Optimization Profiles ===
2023-08-10 11:27:23 - main - DEBUG - template - OptProfile 0 - Min (1, 3, 128, 128) Opt (1, 3, 128, 128) Max (1, 3, 128, 128)
2023-08-10 11:27:23 - main - DEBUG - search - OptProfile 0 - Min (1, 3, 256, 256) Opt (1, 3, 256, 256) Max (1, 3, 256, 256)
2023-08-10 11:27:23 - main - INFO - Building Engine...
/home/pc-3730/fatihcan/OSTRACK_TENSORRT/tensorrt-utils-20.01/classification/imagenet/onnx_to_tensorrt.py:222: DeprecationWarning: Use build_serialized_network instead.
with builder.build_engine(network, config) as engine, open(args.output, "wb") as f:
[08/10/2023-11:27:26] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
2023-08-10 11:27:27 - ImagenetCalibrator - INFO - Calibration images pre-processed: 32/512
[08/10/2023-11:27:27] [TRT] [E] 2: [calibrator.cu::absTensorMax::141] Error Code 2: Internal Error (Assertion memory != nullptr failed. memory must be valid if nbElem != 0)
[08/10/2023-11:27:27] [TRT] [E] 1: [executionContext.cpp::executeInternal::1177] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [resizingAllocator.cpp::deallocate::105] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 3: [engine.cpp::~Engine::298] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/engine.cpp::~Engine::298, condition: mExecutionContextCounter.use_count() == 1. Destroying an engine object before destroying the IExecutionContext objects it created leads to undefined behavior.
)
[08/10/2023-11:27:27] [TRT] [E] 1: [cudaDriverHelpers.cpp::operator()::94] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [cudaDriverHelpers.cpp::operator()::94] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [cudaDriverHelpers.cpp::operator()::94] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [cudaDriverHelpers.cpp::operator()::94] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [cudaDriverHelpers.cpp::operator()::94] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 1: [cudaResources.cpp::~ScopedCudaStream::47] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[08/10/2023-11:27:27] [TRT] [E] 2: [calibrator.cpp::calibrateEngine::1181] Error Code 2: Internal Error (Assertion context->executeV2(&bindings[0]) failed. )
Traceback (most recent call last):
  File "/home/pc-3730/fatihcan/OSTRACK_TENSORRT/tensorrt-utils-20.01/classification/imagenet/onnx_to_tensorrt.py", line 227, in <module>
    main()
  File "/home/pc-3730/fatihcan/OSTRACK_TENSORRT/tensorrt-utils-20.01/classification/imagenet/onnx_to_tensorrt.py", line 222, in main
    with builder.build_engine(network, config) as engine, open(args.output, "wb") as f:
AttributeError: __enter__
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered
