vLLM

AI & Machine Learning AI Infrastructure

About

High-throughput LLM inference and serving engine that uses PagedAttention to manage key-value (KV) cache memory efficiently.
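
The memory-management idea behind PagedAttention can be illustrated with a toy sketch: the KV cache is split into fixed-size blocks, and each sequence keeps a block table that maps its token positions to whichever physical blocks happen to be free. This is not vLLM's actual code. The names `BlockAllocator`, `Sequence`, and `BLOCK_SIZE` are hypothetical and chosen only for illustration, and the block size of 4 is an arbitrary assumption.

```python
BLOCK_SIZE = 4  # tokens per block; hypothetical value for illustration only


class BlockAllocator:
    """Toy pool of fixed-size KV-cache blocks (not vLLM's real allocator)."""

    def __init__(self, num_blocks):
        self.free_blocks = list(range(num_blocks))

    def allocate(self):
        if not self.free_blocks:
            raise MemoryError("no free KV-cache blocks")
        return self.free_blocks.pop()

    def release(self, block_id):
        self.free_blocks.append(block_id)


class Sequence:
    """Tracks which physical blocks hold one sequence's cached tokens."""

    def __init__(self, allocator):
        self.allocator = allocator
        self.num_tokens = 0
        self.block_table = []  # logical block index -> physical block id

    def append_token(self):
        # A new block is claimed only when the current one is full, so at
        # most BLOCK_SIZE - 1 slots per sequence are ever left unused.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.allocator.allocate())
        self.num_tokens += 1

    def slot(self, token_index):
        # Translate a logical token position to (physical block, offset).
        block_id = self.block_table[token_index // BLOCK_SIZE]
        return block_id, token_index % BLOCK_SIZE

    def release_all(self):
        # Return every block to the pool once the sequence finishes.
        for block_id in self.block_table:
            self.allocator.release(block_id)
        self.block_table = []
        self.num_tokens = 0
```

In this sketch, blocks do not need to be contiguous, so memory freed by a finished sequence can be reused immediately by another one instead of being reserved up front for a maximum sequence length.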

Replaces

Groq: Partial
Fireworks AI: Full
Together AI: Full
Replicate: Partial
Anyscale: Partial
Hugging Face: Partial