Run a vLLM Server on HF Jobs in One Command

7.2 relevance

One-command vLLM server deployment is highly actionable for ML engineers.

AI/ML huggingface.co

Run a vLLM Server on HF Jobs in One Command

Summary

Hugging Face now lets you spin up a private, OpenAI-compatible vLLM server with a single `hf jobs run` command, using the official `vllm/vllm-openai` image and pay-per-second billing on A10g GPUs ($1.50/hr). The endpoint is gated by HF tokens, supports curl or the OpenAI Python client, and auto-stops via a configurable `--timeout` flag. This eliminates manual server provisioning and Kubernetes overhead for ephemeral LLM workloads like testing, evals, or batch generation.

Author

Quentin Gallouédec