DeepSeek private compute foundation

A 6-node RTX 4090 cluster with 1152GB total VRAM for complex reasoning, coding assistance, R&D productivity, and controlled model publishing.

Technology / Private LLM · AI compute

Goal

DeepSeek-R1 671B compute foundation

Technology / Private LLM · A technology company · 2026

The customer needed a local large-model capability foundation for R&D, office productivity, business enablement, and intelligent applications.

The first priority was DeepSeek-R1 671B in full BF16 precision, targeting complex reasoning, code generation, R&D assistance, and decision analysis.

The customer also needed to balance VRAM capacity, inference concurrency, cost, and AI security risks such as prompt injection, sensitive-data leakage, API abuse, and auditability.

Solution

How Ouryun ships the DeepSeek-R1 671B compute foundation for a technology company

The solution covered GPU compute, vLLM distributed inference, 25G networking, application publishing, an AI security gateway, and model security assessment.

6-node GPU cluster

Six 8-GPU RTX 4090 servers provide 48 GPUs and 1152GB total VRAM for the 671B BF16 model.
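
The capacity figures can be checked with a rough weights-only sizing sketch. It assumes 24GB per RTX 4090 (which matches 48 GPUs and 1152GB), counts 1 GB as 10^9 bytes, and ignores KV cache, activations, and runtime overhead. The function names are hypothetical and the byte widths are the usual ones for BF16 (2 bytes) and FP8 (1 byte, the format in which the DeepSeek-R1 weights were published).

```python
# Rough weights-only sizing sketch. Everything except the node count,
# GPU count, and model size is an assumption made for illustration.
NODES = 6
GPUS_PER_NODE = 8
VRAM_PER_GPU_GB = 24      # assumed: 1152GB / 48 GPUs
PARAMS_BILLION = 671      # model size named in the case study


def pool_vram_gb(nodes=NODES, gpus_per_node=GPUS_PER_NODE, per_gpu_gb=VRAM_PER_GPU_GB):
    """Total VRAM across the cluster, in GB."""
    return nodes * gpus_per_node * per_gpu_gb


def weights_gb(params_billion, bytes_per_param):
    """Weights-only footprint in GB; excludes KV cache and activations."""
    return params_billion * bytes_per_param


pool = pool_vram_gb()
for label, width in (("BF16", 2), ("FP8", 1)):
    need = weights_gb(PARAMS_BILLION, width)
    print(f"{label}: weights ~{need}GB, pool {pool}GB, headroom {pool - need}GB")
```

Under these assumptions, 2-byte weights alone come to about 1342GB, which is more than the 1152GB pool, while 1-byte weights come to about 671GB. The case study does not say how the served precision, quantization, or offloading accounts for this, so the sketch is only a starting point for a capacity check.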

vLLM distributed inference

PyTorch and vLLM provide the runtime, with PagedAttention improving memory management and throughput.
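
The idea behind PagedAttention's memory management can be illustrated with a toy block allocator: the KV cache is split into fixed-size blocks, and each sequence holds a table of the blocks it uses, so memory is claimed only as tokens arrive. This is a simplified sketch, not vLLM's implementation; the class name, method names, and block size are hypothetical.

```python
class PagedKVCache:
    """Toy allocator: each sequence maps its logical blocks to physical blocks."""

    def __init__(self, num_blocks, block_size=16):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # physical block ids not in use
        self.block_tables = {}                      # seq_id -> list of physical block ids
        self.lengths = {}                           # seq_id -> number of tokens stored

    def append_token(self, seq_id):
        """Reserve room for one more token, taking a new block only when needed."""
        length = self.lengths.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        if length % self.block_size == 0:           # no block yet, or last block is full
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            table.append(self.free_blocks.pop())
        self.lengths[seq_id] = length + 1

    def free(self, seq_id):
        """Return every block of a finished sequence to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


cache = PagedKVCache(num_blocks=4, block_size=16)
for _ in range(17):
    cache.append_token("a")    # 17 tokens span 2 blocks
for _ in range(5):
    cache.append_token("b")    # 5 tokens fit in 1 block
print(len(cache.block_tables["a"]), len(cache.block_tables["b"]), len(cache.free_blocks))
cache.free("a")
print(len(cache.free_blocks))
```

Because blocks need not be contiguous and are released as soon as a sequence finishes, short and long requests can share the same pool with little wasted space.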

25G high-speed network

Server fiber ports connect directly to 25G switches and then to the enterprise core network.

AI security gateway

A reverse-proxy gateway publishes the model with authentication, rate limits, allow/deny lists, anomaly detection, and call audit.
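
The gateway checks listed above (authentication, deny list, rate limit, call audit) can be sketched as a small decision function placed in front of the model API. This is a minimal illustration, not the delivered gateway: the class names, the token-bucket limiter, the check order, and the example keys and IPs are all assumptions.

```python
import time


class TokenBucket:
    """Allows `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, now=None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class Gateway:
    """Hypothetical pre-request checks: deny list, API key, rate limit, audit."""

    def __init__(self, api_keys, denied_ips, rate=1.0, burst=2):
        self.api_keys = set(api_keys)
        self.denied_ips = set(denied_ips)
        self.rate, self.burst = rate, burst
        self.buckets = {}      # api key -> TokenBucket
        self.audit_log = []    # (timestamp, key, ip, decision); a real log would store a key id

    def check(self, api_key, client_ip, now=None):
        now = time.monotonic() if now is None else now
        if client_ip in self.denied_ips:
            decision = "denied_ip"
        elif api_key not in self.api_keys:
            decision = "unauthenticated"
        else:
            bucket = self.buckets.setdefault(api_key, TokenBucket(self.rate, self.burst, now))
            decision = "allowed" if bucket.allow(now) else "rate_limited"
        self.audit_log.append((now, api_key, client_ip, decision))
        return decision


gw = Gateway(api_keys={"key-1"}, denied_ips={"10.0.0.9"}, rate=1.0, burst=2)
for t in (0.0, 0.0, 0.0, 1.0):
    print(t, gw.check("key-1", "10.0.0.5", now=t))
print(gw.check("bad-key", "10.0.0.5", now=1.0))
print(gw.check("key-1", "10.0.0.9", now=1.0))
```

Every decision, including rejections, is appended to the audit log, which is what makes abuse patterns visible after the fact.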

Model security assessment

Assessment covers content safety, privacy leakage, bias, reasoning, language understanding, and multi-turn dialogue.

Delivery path

From connect to ship, in four steps

Compute

Deliver 6 GPU servers, forming a 48-GPU, 1152GB VRAM pool.

Inference

Deploy Ubuntu 22.04, PyTorch, vLLM, DeepSeek-R1 671B, and Open-Web / ChatBox access.
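
As an illustration of the inference step, the sketch below assembles a `vllm serve` command line for a multi-node launch. The parallel layout (tensor parallelism across the 8 GPUs in a node, pipeline parallelism across the 6 nodes), the Hugging Face model id, the dtype, and the port are assumptions rather than the delivered configuration, and a multi-node launch also needs a cluster backend such as Ray set up beforehand. The helper only builds the argument list; it does not start a server.

```python
import shlex


def build_vllm_command(model, gpus_per_node, nodes, dtype="bfloat16", port=8000):
    """Return the argv list for a `vllm serve` launch (does not run it)."""
    return [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(gpus_per_node),   # shard each layer within a node
        "--pipeline-parallel-size", str(nodes),         # split layers across nodes
        "--dtype", dtype,
        "--port", str(port),
    ]


# Assumed layout: 8-way tensor parallel x 6-way pipeline parallel = 48 GPUs.
cmd = build_vllm_command("deepseek-ai/DeepSeek-R1", gpus_per_node=8, nodes=6)
print(shlex.join(cmd))
```

Keeping tensor parallelism inside a node and pipeline parallelism between nodes is a common choice when inter-node links are slower than intra-node links, but the right split depends on measured performance.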

Network

Configure 25G switching, core network connection, API access, and internal access paths.

Security

Launch the AI security gateway and the model security assessment to reduce post-release security and compliance risk.

Full project record

Customer profile

A 6-node, 48-GPU private compute foundation for DeepSeek-R1 671B, including API publishing, AI security governance, and model security assessment.

Impact

Numbers that prove value

671B: full-size model target

1152GB: total GPU VRAM

48: GPU count

10 token/s: reference per-user speed at 24 concurrency
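
The reference figure of 10 token/s per user at 24 concurrent users can be turned into two quick estimates: aggregate cluster throughput and the wait for a single reply. The sketch assumes all 24 users generate at the reference speed at the same time; the 500-token reply length and the function names are hypothetical.

```python
def aggregate_tokens_per_second(per_user_tps, concurrent_users):
    """Cluster-wide generation rate if every user runs at the reference speed."""
    return per_user_tps * concurrent_users


def seconds_for_reply(reply_tokens, per_user_tps):
    """Time for one user to receive a reply of the given length."""
    return reply_tokens / per_user_tps


print(aggregate_tokens_per_second(10, 24))   # aggregate token/s across 24 users
print(seconds_for_reply(500, 10))            # seconds for a hypothetical 500-token reply
```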
