CV

TensorRT

Related: TensorRT-LLM (NVIDIA's separate library for large language model inference, built on top of TensorRT)

Definition

TensorRT is NVIDIA's inference SDK. It compiles trained models, usually exported to ONNX or taken from PyTorch, into highly optimized GPU engines. During the build it applies layer fusion, kernel auto-tuning, and FP16 / INT8 / FP8 quantization, and it supports dynamic input shapes. On Jetson Orin and DGX hardware, TensorRT typically delivers 3 to 8× faster inference than the unoptimized PyTorch baseline. FI Tech ships every production CV model as a TensorRT engine pinned to the exact Jetson firmware and CUDA version of the deployment box. Engines are generally not portable across GPU architectures, so we maintain a build matrix per customer hardware revision.
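As a rough illustration of what a per-hardware build matrix could look like, the sketch below enumerates one engine build per model, target, and precision, and encodes each dependency (GPU architecture, CUDA version, TensorRT version, precision) in the engine file name. The `Target` class, the naming scheme, the revision labels, and the version numbers are all hypothetical and are not FI Tech's actual tooling. The only real interface assumed is NVIDIA's `trtexec` command-line tool with its `--onnx`, `--saveEngine`, `--fp16`, and `--int8` flags. The code only assembles the commands and does not run them.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Target:
    """One deployment box revision. All field values used below are illustrative."""
    name: str      # hypothetical hardware revision label
    gpu_arch: str  # CUDA compute capability, e.g. "sm_87" for Jetson Orin
    cuda: str      # CUDA toolkit version installed on the box
    tensorrt: str  # TensorRT version installed on the box


def engine_name(model: str, target: Target, precision: str) -> str:
    """Encode everything the engine depends on into its file name (assumed scheme)."""
    return (
        f"{model}.{target.gpu_arch}.cuda{target.cuda}"
        f".trt{target.tensorrt}.{precision}.engine"
    )


def trtexec_args(onnx_path: str, engine_path: str, precision: str) -> list[str]:
    """Assemble a trtexec command; it would have to run on the target hardware."""
    args = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if precision in ("fp16", "int8"):
        # FP32 is the default and needs no flag. A real INT8 build would also
        # need calibration data, which this sketch leaves out.
        args.append(f"--{precision}")
    return args


def build_matrix(models, targets, precisions):
    """Yield one (target name, command) pair per model/target/precision combination."""
    for model, target, precision in product(models, targets, precisions):
        engine = engine_name(model, target, precision)
        yield target.name, trtexec_args(f"{model}.onnx", engine, precision)


# Example matrix with made-up revision labels and version numbers.
targets = [
    Target("orin-rev-a", "sm_87", "11.4", "8.5"),
    Target("dgx-a100", "sm_80", "12.2", "8.6"),
]
for box, command in build_matrix(["detector"], targets, ["fp16"]):
    print(box, " ".join(command))
```

Because the file name carries the architecture and toolchain versions, an engine built for one box cannot be silently deployed to another; a loader could refuse any engine whose name does not match the local environment.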

Where AI Meets the Real World