DeepSeek private compute foundation
A 6-node RTX 4090 cluster with 1152GB total VRAM for complex reasoning, coding assistance, R&D productivity, and controlled model publishing.
DeepSeek-R1 671B compute foundation
Technology / Private LLM · A technology company · 2026
The customer needed a local large-model capability foundation for R&D, office productivity, business enablement, and intelligent applications.
The first priority was DeepSeek-R1 671B in full BF16 precision, targeting complex reasoning, code generation, R&D assistance, and decision analysis.
The customer also needed to balance VRAM capacity, inference concurrency, and cost, while managing AI security risks such as prompt injection, sensitive-data leakage, and API abuse, and keeping model calls auditable.
How Ouryun ships the DeepSeek-R1 671B compute foundation for a technology company
The solution covered GPU compute, vLLM distributed inference, 25G networking, application publishing, an AI security gateway, and model security assessment.
6-node GPU cluster
Six 8-GPU RTX 4090 servers provide 48 GPUs and 1152GB total VRAM for the 671B BF16 model.
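The capacity figures above can be checked with a short back-of-the-envelope calculation. This is a minimal sketch, not the sizing method used in the project: the 24 GB per-GPU value is the RTX 4090's published memory size, the bytes-per-parameter values are the standard widths of BF16 and 8-bit formats, and KV cache, activations, and runtime overhead are deliberately left out.

```python
# Rough VRAM sizing check (decimal GB). Assumptions, not project data:
# 24 GB per RTX 4090, and weights only (no KV cache or runtime overhead).
NODES = 6
GPUS_PER_NODE = 8
VRAM_PER_GPU_GB = 24
PARAMS_BILLION = 671


def pool_vram_gb(nodes: int, gpus_per_node: int, vram_per_gpu_gb: int) -> int:
    """Total VRAM across the cluster, in GB."""
    return nodes * gpus_per_node * vram_per_gpu_gb


def weight_footprint_gb(params_billion: int, bytes_per_param: int) -> int:
    """Weight size in GB: 1e9 parameters at N bytes each is N GB per billion."""
    return params_billion * bytes_per_param


total_gpus = NODES * GPUS_PER_NODE
pool_gb = pool_vram_gb(NODES, GPUS_PER_NODE, VRAM_PER_GPU_GB)
bf16_gb = weight_footprint_gb(PARAMS_BILLION, 2)  # BF16: 2 bytes per parameter
int8_gb = weight_footprint_gb(PARAMS_BILLION, 1)  # 8-bit: 1 byte per parameter

print(f"GPUs: {total_gpus}, VRAM pool: {pool_gb} GB")
print(f"BF16 weights: {bf16_gb} GB (headroom {pool_gb - bf16_gb} GB)")
print(f"8-bit weights: {int8_gb} GB (headroom {pool_gb - int8_gb} GB)")
```

Under these assumptions the pool matches the stated 48 GPUs and 1152GB, while raw BF16 weights alone come to roughly 1342 GB and an 8-bit format to roughly 671 GB. The case study does not say how the delivered configuration reconciles the BF16 target with the pool size, so this is best read as a planning check rather than a description of the deployment.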
vLLM distributed inference
PyTorch and vLLM provide the runtime, with PagedAttention improving memory management and throughput.
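As an illustration of how a vLLM deployment across 48 GPUs might be launched, the sketch below assembles a `vllm serve` command line. The parallel layout (tensor parallelism of 8 inside each server, pipeline parallelism of 6 across servers), the Hugging Face model ID, and the port are assumptions made for the example; the case study does not state the actual settings. Multi-node vLLM runs typically also need a Ray cluster spanning the servers, which is not shown here.

```python
import shlex


def build_vllm_serve_command(
    model: str,
    tensor_parallel: int,
    pipeline_parallel: int,
    total_gpus: int,
    dtype: str = "bfloat16",
    port: int = 8000,
) -> list[str]:
    """Build an argument list for `vllm serve` (hypothetical helper)."""
    # The product of the two parallel degrees must equal the GPU count in use.
    if tensor_parallel * pipeline_parallel != total_gpus:
        raise ValueError("tensor_parallel * pipeline_parallel must equal total_gpus")
    return [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(tensor_parallel),
        "--pipeline-parallel-size", str(pipeline_parallel),
        "--dtype", dtype,
        "--port", str(port),
    ]


# Assumed layout: 8-way tensor parallel per node, 6 pipeline stages (one per node).
command = build_vllm_serve_command(
    model="deepseek-ai/DeepSeek-R1",
    tensor_parallel=8,
    pipeline_parallel=6,
    total_gpus=48,
)
print(shlex.join(command))
```

Keeping tensor parallelism inside a server and pipeline parallelism between servers is a common choice because tensor-parallel traffic is much heavier than pipeline traffic, which matters when nodes are linked by 25G Ethernet rather than a dedicated GPU fabric.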
25G high-speed network
Each server's fiber ports connect directly to 25G switches, which uplink to the enterprise core network.
AI security gateway
A reverse-proxy gateway publishes the model behind authentication, rate limits, allow/deny lists, anomaly detection, and call auditing.
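To make the gateway checks concrete, here is a minimal sketch of the decision order such a gateway could apply before forwarding a request: deny list, then authentication, then a per-key token-bucket rate limit, with every decision written to an audit log. All class names, keys, and limits are hypothetical, and the token bucket is a generic technique chosen for the example, not a description of Ouryun's gateway.

```python
import time


class TokenBucket:
    """Token bucket: refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, rate: float, capacity: int, now: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = now

    def allow(self, now: float) -> bool:
        # Refill for the elapsed time, then spend one token if available.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class Gateway:
    """Hypothetical gateway front end: deny list, auth, rate limit, audit."""

    def __init__(self, api_keys, deny_list, rate: float, burst: int):
        self.api_keys = set(api_keys)
        self.deny_list = set(deny_list)
        self.rate = rate
        self.burst = burst
        self.buckets = {}
        self.audit_log = []

    def check(self, api_key: str, client_ip: str, now=None) -> str:
        now = time.monotonic() if now is None else now
        if client_ip in self.deny_list:
            decision = "denied:deny_list"
        elif api_key not in self.api_keys:
            decision = "denied:auth"
        else:
            if api_key not in self.buckets:
                self.buckets[api_key] = TokenBucket(self.rate, self.burst, now)
            allowed = self.buckets[api_key].allow(now)
            decision = "allowed" if allowed else "denied:rate_limit"
        # Every decision is recorded, including rejected calls.
        self.audit_log.append((now, api_key, client_ip, decision))
        return decision


gateway = Gateway(api_keys={"key-a"}, deny_list={"10.0.0.9"}, rate=1.0, burst=2)
for _ in range(3):
    print(gateway.check("key-a", "10.0.0.1", now=0.0))
```

Logging rejected calls alongside accepted ones is what makes the audit trail useful for spotting API abuse, since repeated denials from one key or address are themselves a signal.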
Model security assessment
Assessment covers content safety, privacy leakage, bias, reasoning, language understanding, and multi-turn dialogue.
From connect to ship, in four steps
Compute
Deliver 6 GPU servers, forming a 48-GPU, 1152GB VRAM pool.
Inference
Deploy Ubuntu 22.04, PyTorch, vLLM, DeepSeek-R1 671B, and Open-Web / ChatBox access.
Network
Configure 25G switching, core network connection, API access, and internal access paths.
Security
Launch the AI security gateway and assessment mechanisms to reduce post-release security and compliance risk.
Customer profile
A 6-node, 48-GPU private compute foundation for DeepSeek-R1 671B, including API publishing, AI security governance, and model security assessment.
Impact
Full-size model target: 671B
Total GPU VRAM: 1152GB
GPU count: 48
Reference per-user speed at a concurrency of 24: 10 token/s
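The reference speed can be turned into an aggregate figure with simple arithmetic. The sketch below assumes all 24 concurrent streams decode at the 10 token/s reference rate at the same time and ignores prompt processing time; the 500-token answer length is a hypothetical example, not a measured workload.

```python
# Throughput arithmetic from the reference figures in the case study.
PER_USER_TOKENS_PER_SEC = 10
CONCURRENT_USERS = 24


def aggregate_tokens_per_sec(per_user: float, users: int) -> float:
    """Cluster-wide decode rate if every stream runs at the per-user rate."""
    return per_user * users


def seconds_for_response(output_tokens: int, per_user: float) -> float:
    """Time to stream one answer at the per-user rate (prefill ignored)."""
    return output_tokens / per_user


aggregate = aggregate_tokens_per_sec(PER_USER_TOKENS_PER_SEC, CONCURRENT_USERS)
example_seconds = seconds_for_response(500, PER_USER_TOKENS_PER_SEC)

print(f"Aggregate decode rate: {aggregate} token/s")
print(f"Hypothetical 500-token answer: {example_seconds} s per user")
```

Under these assumptions the cluster sustains about 240 token/s in total, and a 500-token answer takes about 50 seconds to stream to each user.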