
TT-Forge™

TT-Forge is Tenstorrent’s MLIR-based compiler stack that lets you compile, optimize, debug, and extend models.

Now in public beta, TT-Forge is yours to shape. Pitch a feature, prototype it, or grab a bounty.



Engineered for Innovation

Designed for open-source flexibility, TT-Forge connects with OpenXLA, MLIR, ONNX, TVM, PyTorch, and TensorFlow, offering a modular foundation for running AI workloads on custom silicon. It lowers models into optimized IRs for execution on TT-NN and on TT-Metalium, Tenstorrent’s low-level AI hardware SDK.


Why MLIR?

MLIR is modular, extensible, and enables multi-level abstraction. It spans multiple frameworks, supports custom dialects, and handles everything from AI to HPC. Thanks to MLIR’s flexible design, TT-Forge can quickly adopt new ops, frameworks, and hardware targets. As the MLIR ecosystem expands, TT-Forge evolves right alongside it.

Bring Your Model from Anywhere*
(almost)
TT-XLA


Single chip projects with JAX and PyTorch

TT-XLA is Tenstorrent’s PJRT-based bridge for compiling and running models from JAX and PyTorch on Tenstorrent hardware. It supports just-in-time (JIT) compilation through StableHLO, feeding into TT-MLIR for optimized execution.

With native JAX support and PyTorch integration through PyTorch/XLA, TT-XLA compiles models to run on Tenstorrent hardware with minimal changes to your existing code, and it supports multi-chip execution.
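
To make the JIT-through-StableHLO path more concrete, here is a minimal JAX sketch that traces a small function and prints the StableHLO module JAX produces for it. It uses only standard JAX calls (`jax.jit`, `.lower()`, `.as_text()`); the `dense_layer` model and the `lower_to_stablehlo` helper are hypothetical names made up for illustration. The sketch does not show TT-XLA installation or device selection, so it runs on whatever backend JAX has available (CPU included), and it only illustrates the kind of IR a PJRT-based backend receives.

```python
import jax
import jax.numpy as jnp


def dense_layer(x, w, b):
    """A tiny illustrative model: one dense layer followed by tanh."""
    return jnp.tanh(x @ w + b)


def lower_to_stablehlo(fn, *example_args):
    """Trace `fn` with example inputs and return its StableHLO module as text."""
    lowered = jax.jit(fn).lower(*example_args)
    return lowered.as_text(dialect="stablehlo")


# Example inputs; only their shapes and dtypes matter for lowering.
x = jnp.ones((4, 8), dtype=jnp.float32)
w = jnp.ones((8, 16), dtype=jnp.float32)
b = jnp.zeros((16,), dtype=jnp.float32)

# Inspect the StableHLO that a PJRT backend would be handed for compilation.
stablehlo_text = lower_to_stablehlo(dense_layer, x, w, b)
print(stablehlo_text)

# Calling the jitted function compiles and executes it on the default backend.
result = jax.jit(dense_layer)(x, w, b)
print(result.shape)
```

The model code itself contains nothing backend-specific, which is the property that lets a PJRT plugin take over compilation and execution without changes to the function being jitted.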

TT-Forge-FE


Multi-chip projects with ONNX and TensorFlow

TT-Forge-FE is Tenstorrent’s framework-agnostic frontend, designed to optimize and transform computational graphs for deep learning models. Powered by TT-TVM, it ingests models from ONNX, TensorFlow, and similar ML frameworks, making it easier to bring your models to Tenstorrent hardware efficiently.

Features
Performance
Optimized compilation and custom dialects (TTIR, TTNN, TTKernel) enable efficient execution and maximize inference performance on Tenstorrent hardware. TT-Explorer simplifies performance optimization.
Hardware-Aware Compilation
TT-Forge™ doesn’t just compile models – it understands the hardware they run on. With custom dialects like TTIR and a compiler stack built around TT-MLIR, it’s optimized for Tenstorrent’s architecture, resulting in high utilization, efficient memory access, and scalable performance across chips.
Tools
Tenstorrent’s toolchain simplifies ML model compilation, optimization, and execution on Tenstorrent hardware. Key tools include TT-Blacksmith (ready-to-run training examples), TT-Explorer (a visual performance analyzer for models), and TT-NPE (a simulator and profiler for the network-on-chip, or NoC).

Want to learn more about TT-Forge?