TensorRT

Related: TensorRT-LLM (a separate NVIDIA library, built on TensorRT, for large language model inference)

Definition

TensorRT is NVIDIA's inference SDK that compiles ONNX or PyTorch models into highly optimized GPU engines. It applies layer fusion, kernel auto-tuning, and FP16 / INT8 / FP8 quantization, and it supports dynamic input shapes. On Jetson Orin and DGX hardware, TensorRT typically delivers 3 to 8× faster inference than the unoptimized PyTorch baseline. FI Tech ships every production CV model as a TensorRT engine pinned to the exact Jetson firmware and CUDA version of the deployment box. Engines are not portable across GPU architectures, so we maintain a build matrix per customer hardware revision.
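As a rough illustration of what one row of such a build matrix could look like, the sketch below derives an engine file name from the fields an engine is pinned to, and refuses to load an engine on a mismatched box. It uses only the Python standard library and does not call TensorRT itself. The names `BuildTarget`, `engine_filename`, and `can_load`, along with the field layout and the sample values, are hypothetical and are not taken from FI Tech's actual tooling.

```python
from dataclasses import dataclass
import hashlib


@dataclass(frozen=True)
class BuildTarget:
    """One hypothetical row of a per-customer build matrix."""
    customer: str
    hardware_rev: str   # customer hardware revision label
    gpu_arch: str       # GPU architecture, e.g. "sm_87" for Jetson Orin
    firmware: str       # Jetson firmware version on the deployment box
    cuda_version: str   # CUDA version on the deployment box
    precision: str      # "fp16", "int8", or "fp8"

    def pinned_fields(self) -> tuple:
        # The fields that must match exactly between build and deployment.
        return (self.gpu_arch, self.firmware, self.cuda_version)


def engine_filename(model_name: str, onnx_sha256: str, target: BuildTarget) -> str:
    """Derive a deterministic engine file name for a model and target.

    Any change to the pinned fields, the precision, or the source model
    hash produces a different name, so a stale engine is never picked up.
    """
    key = "|".join((*target.pinned_fields(), target.precision, onnx_sha256))
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]
    return f"{model_name}-{target.gpu_arch}-{target.precision}-{digest}.engine"


def can_load(built_for: BuildTarget, device: BuildTarget) -> bool:
    """Allow loading only when every pinned field matches the device."""
    return built_for.pinned_fields() == device.pinned_fields()
```

A deployment script could call `can_load` before deserializing an engine and trigger a rebuild when it returns `False`, instead of discovering the mismatch as a runtime failure on the device.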