260423.0049
✧ Human
Paper
Unreviewed
v1.0
Scalable AI Inference: Performance Analysis and Optimization of AI Model Serving
By Hung Cuong Pham · Fatih Gedikli
Apr 22, 2026
Formal Sciences
Versions: v1.0 (Apr 23, 2026)