DeepSeek

Building your own AI server with DeepSeek

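DeepSeek is open source, and it is said to have distilled the 4T of knowledge in ChatGPT into a much smaller model.
So why not set up a standalone server and run our own DeepSeek AI to try it out?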

(版貓, 2025/04/16) About AI agents
慧穩科技 quoted its large language model platform "WinChat" as follows, for reference:

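WinChat software and hardware package

1. WinChat software $600,000-
Private LLM WinChat software / perpetual license
- LLM model support
(1) llama3.x, 70B and below
(2) DeepSeek R1/V3, 70B and below
(3) Chinese-English embedding model
(4) Chinese-English re-ranking model
- RAG engine
- Web UI
- Private cloud architecture
- Administrator permission features
- Supported files: Documents (PDF, DOCS), Tables (XLSX), Pictures (JPEG, JPG, PNG, TIF, GIF)
- Operating system: Linux Ubuntu
- System service: Docker
- AI chat API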

2. Computer system $500,000-
AITAS-TA703 (2U Remote Workstation)
- Server Grade CPU: AMD 16C/32T 4.5GHz 64MB
- 32GB DDR5-4800 2Rx8 LP (10x4) ECC ×4
- 960GB NVMe PCIe 4.0 x4 M.2 22x110mm ×2
- 1U 2000W 90-264Vac/47-63Hz
- NVIDIA RTX 4000 Ada Generation 20GB GDDR6 ×2
- Storage HDD 1.9TB SATA 6Gb/s TLC 2.5" ×2

3. System deployment consulting service $100,000-
WinChat software deployment consulting
- 8 hours of training
- 40 hours of online deployment service
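
The three line items above can be added up with a few lines of Python. This is only a convenience sketch: the amounts are copied from the quotation, the currency is not stated in the source, and the names `quote_items`, `quote_total`, and `quote_shares` are made up for illustration.

```python
# Line items copied from the quotation above (currency not stated in the source).
quote_items = {
    "WinChat software (perpetual license)": 600_000,
    "Computer system (AITAS-TA703)": 500_000,
    "Deployment consulting service": 100_000,
}


def quote_total(items: dict[str, int]) -> int:
    """Sum all line items of the quotation."""
    return sum(items.values())


def quote_shares(items: dict[str, int]) -> dict[str, float]:
    """Return each line item's share of the total as a fraction between 0 and 1."""
    total = quote_total(items)
    return {name: amount / total for name, amount in items.items()}


print(f"Total: ${quote_total(quote_items):,}")
for name, share in quote_shares(quote_items).items():
    print(f"{name}: {share:.1%}")
```

The total comes to $1,200,000, of which the software license is half.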

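Notes: performance limitations in detail
Limitation 1: Apple GPUs cannot enable mainstream LLM inference acceleration
- No CUDA -> mainstream accelerated inference engines (such as vLLM, exllama, Triton) cannot be used
- Metal acceleration does not yet support large LLM inference (only simple model conversion is supported)
Limitation 2: No dedicated VRAM memory management
- 70B models need fast, stable VRAM management
- Unified Memory is shared across the whole system -> memory thrashing causes frequent buffering and performance fluctuation
Limitation 3: No multi-GPU parallelism / model partitioning
- Server-grade GPUs (such as A100, H100, RTX 6000 ADA) can partition models and process them in parallel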

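▼1 Explanation of the quotation

Explanation of the quotation

This quotation covers a complete solution: a private large language model (LLM) application system, its hardware, and a deployment service. The product features and each line item are explained below.
1. WinChat software ($600,000)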

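Product features:
WinChat is a private LLM chat application sold under a perpetual license. It supports several language models and related functions, and is meant for on-premises deployment inside a company so that data stays private and secure.
LLM model support
Supported large language models and auxiliary models:

llama3.x, 70B and below (70B is the parameter count: up to 70 billion parameters)
DeepSeek R1/V3, 70B and below
Chinese-English embedding model (turns text into vector representations)
Chinese-English re-ranking model (reorders retrieved results to improve relevance)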

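RAG engine
A Retrieval-Augmented Generation engine combines retrieval with a generative model: relevant passages are retrieved first and passed to the model, which improves the accuracy and relevance of answers.
Web UI
A web interface lets users operate the chat system from a browser.
Private cloud architecture
Can be deployed on the company's own cloud or private servers, keeping data private and secure.
Administrator permission features
Administrator roles and permission controls for system maintenance and user management.
Supported file types
Several file formats can be processed:

Documents: PDF, DOCS
Tables: XLSX
Pictures: JPEG, JPG, PNG, TIF, GIF

Operating system and system services

Runs on Linux Ubuntu
Uses Docker containers for easier deployment and maintenance

AI chat API
An API is provided for integration with other systems or for building further applications.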
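
To make the RAG, embedding, and re-ranking items above more concrete, here is a minimal sketch of the retrieve-then-prompt flow. It is not WinChat's implementation, which the quotation does not describe. The "embedding" is a toy bag-of-words vector standing in for a real embedding model, and the sample documents, the function names, and the prompt wording are all made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by similarity to the query and keep the top_k best."""
    q = embed(query)
    ranked = sorted(documents, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, documents: list[str], top_k: int = 2) -> str:
    """Assemble the prompt that would be sent to the LLM: retrieved context plus the question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, top_k))
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}"


# Made-up sample documents, for illustration only.
docs = [
    "Gear oil should be replaced every 5000 operating hours.",
    "The cafeteria opens at noon.",
    "Hydraulic oil viscosity grade is ISO VG 46.",
]
query = "How often should gear oil be replaced?"
print(build_prompt(query, docs))
```

In a real system the `embed` function would call an embedding model, and a re-ranking model would reorder the retrieved passages before they are placed in the prompt.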
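2. Computer system ($500,000)

Product features:
One high-performance remote workstation (AITAS-TA703, a 2U rack-mount server) intended for AI model inference and computation.
CPU
AMD, 16 cores / 32 threads, 4.5GHz clock, 64MB cache, server grade.
Memory
32GB DDR5-4800 ECC modules ×4 as listed in the quotation (128GB in total), with error correction for stability.
Storage

960GB NVMe PCIe 4.0 SSD (M.2) ×2 for fast reads and writes
1.9TB SATA drives ×2 for additional data storage

Power supply
1U 2000W power supply accepting 90-264Vac at 47-63Hz.
GPU
Two NVIDIA RTX 4000 Ada Generation cards (20GB GDDR6 each, 40GB in total) for accelerated deep learning inference.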
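
A rough way to check whether a 70B model fits on the two 20GB cards is to estimate the memory taken by the weights alone: parameter count times bytes per parameter. The sketch below makes that estimate under stated assumptions. It counts weights only (the KV cache and activations need additional memory), it treats the two cards as a 40GB pool, and the function name `weight_memory_gb` is made up for illustration.

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_param / 8


# Two 20GB cards from the quotation, treated here as one 40GB pool (an assumption).
TOTAL_VRAM_GB = 2 * 20

for bits in (16, 8, 4):
    need = weight_memory_gb(70, bits)
    verdict = "fits" if need <= TOTAL_VRAM_GB else "does not fit"
    print(f"70B at {bits}-bit: about {need:.0f} GB of weights, {verdict} in {TOTAL_VRAM_GB} GB")
```

By this estimate a 70B model needs about 140GB at 16-bit, 70GB at 8-bit, and 35GB at 4-bit, so only a 4-bit quantized model comes close to fitting in 40GB, with little room left for the KV cache.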
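3. System deployment consulting service ($100,000)

Service description:
Professional deployment and training for the WinChat software, so that the system goes live and is used smoothly.
8 hours of training
On-site or online sessions covering software operation and features.
40 hours of online deployment service
Covers system installation, configuration, troubleshooting, and tuning guidance.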
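Notes: performance limitations in detail

This part of the quotation describes the limits the vendor expects when large language models are run on other hardware, Apple GPU platforms in particular:
Limitation 1: Apple GPUs cannot enable mainstream LLM inference acceleration
Apple GPUs have no CUDA support, so mainstream accelerated inference engines such as vLLM, exllama, and Triton cannot be used, and Metal acceleration currently does not support large LLM inference either.
Limitation 2: No dedicated VRAM memory management
Large models (70B) need fast, stable VRAM management; Apple's Unified Memory is shared by the whole system, which leads to memory thrashing and unstable performance.
Limitation 3: No multi-GPU parallelism / model partitioning
High-end server GPUs (such as NVIDIA A100, H100, RTX 6000 ADA) support multi-GPU parallelism and model partitioning to raise inference throughput; Apple GPUs lack this capability.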
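
The model partitioning mentioned in limitation 3 can take several forms. The sketch below shows only the simplest one, a layer-wise (pipeline-style) split that gives each GPU a contiguous block of layers. It does not describe how any particular inference engine does it. The figure of 80 layers for a 70B Llama-class model is an assumption used for the example, and `split_layers` is a made-up name.

```python
def split_layers(num_layers: int, num_gpus: int) -> list[range]:
    """Assign contiguous blocks of layers to GPUs as evenly as possible."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    base, extra = divmod(num_layers, num_gpus)
    assignment = []
    start = 0
    for gpu in range(num_gpus):
        # The first `extra` GPUs take one additional layer each.
        size = base + (1 if gpu < extra else 0)
        assignment.append(range(start, start + size))
        start += size
    return assignment


# Assumed example: 80 transformer layers split across the 2 GPUs in the quotation.
for gpu, layers in enumerate(split_layers(80, 2)):
    print(f"GPU {gpu}: layers {layers.start} to {layers.stop - 1}")
```

With two cards, layers 0 to 39 go to the first GPU and layers 40 to 79 to the second, so each card holds roughly half of the weights.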

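Overall, the quotation offers an enterprise-grade private LLM chat solution made up of a software license, dedicated server hardware, and professional deployment and training services. It suits scenarios that need high performance, on-premises deployment, and support for multiple file formats. The performance notes remind the customer to consider differences in GPU acceleration and memory management when choosing hardware.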

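Mon. 2025/03/17: received a LINE message from 劉士豪: "About that MIRROR project I told you about: our students are learning to build RAG and MCP. Would the company be interested? Something like an AI agent for lubricants."
洪哲文 replied: "Yes! Very interested. How do we proceed?"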

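References
Medium: AI agent, about RAG (Retrieval-Augmented Generation)
2023/09/19 About RAG: What is Retrieval-Augmented Generation (RAG)?
About MCP: What is MCP? Making AI a real AI agent rather than just a machine that "talks". Traditional AI assistants (such as ChatGPT or Claude) are good at conversation and text generation, but their biggest limitation is that they cannot directly access the user's computer, databases, or network services.
About MCP: What is MCP? Why is MCP so popular? What are the benefits of MCP?
What is MCP? What is the technical principle behind it? One video to understand everything about MCP. Configuring MCP on Windows; using MCP with Cursor and Cline

李宏毅, Professor, Department of Electrical Engineering, National Taiwan University, "Machine Learning in the Era of Generative AI (2025)", Lecture 2: Understanding how AI agents work in one lecture (how AI adjusts its behavior through experience, uses tools, and makes plans)
Wiki:
🗒️RAG: Retrieval-augmented generation
🗒️MCP: Model Context Protocol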
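
For orientation, MCP messages are JSON-RPC 2.0, and a client asks a server to run a tool with the `tools/call` method. The sketch below only builds such a request as a JSON string. It does not connect to anything, and the tool name `recommend_lubricant` and its arguments are hypothetical, chosen to match the lubricant agent idea in the conversation above.

```python
import json
from itertools import count

# Simple incrementing request ids, as JSON-RPC requires an id per request.
_request_ids = count(1)


def make_tool_call(tool_name: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 'tools/call' request as used by MCP clients."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# Hypothetical tool and arguments, for illustration only.
print(make_tool_call("recommend_lubricant", {"machine": "gearbox", "temperature_c": 80}))
```

An agent built this way lets the model decide which tool to call, while the MCP server does the actual database or file access.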

Agreed to meet for lunch in 林口 at noon on Tue. 2025/03/25 to talk it over!

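2025/02/17:
I want to build a server, then download DeepSeek and install it as my own AI system. What server specifications do you recommend? Can I use Ubuntu as the OS? And where do I download DeepSeek, and how do I install and use it? Perplexity answered: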

To build a server and install DeepSeek as your own AI system, here's a breakdown of the recommended specifications, OS considerations, and installation process:

Server Specifications

Memory: The amount of memory affects the size of the models you can load. For CPU usage, aim for 16GB or more. If using a GPU, the VRAM size is critical.
Storage: Large models require significant SSD/hard drive space. The complete DeepSeek-R1-671b can take up to 404GB, while compressed versions range from 1GB to 40GB.
GPU: DeepSeek R1 NVIDIA Enterprise Platform is designed for NVIDIA GPUs, providing excellent computing performance and cost-effectiveness. Below are some DeepSeek models and their corresponding VRAM recommendations:

1.5B Qwen DeepSeek R1: 4GB VRAM
7B Qwen DeepSeek R1: 12GB VRAM
8B Llama DeepSeek R1: 16GB VRAM
14B Qwen DeepSeek R1: 24GB VRAM
32B Qwen DeepSeek R1: 48GB VRAM (single card)
70B Llama DeepSeek R1: 80GB+ VRAM (multi-card)

Operating System

Yes, you can use Ubuntu.

Downloading and Installing DeepSeek
Install Ollama: Ollama is an open-source framework that simplifies running and managing large language models (LLMs) locally. It supports models like Llama 3, Phi 3, Mistral, and Gemma.

Download Ollama from the official website for your OS (macOS, Linux, or Windows).
For Windows, after installation, a llama icon should appear in the bottom right corner.

Download DeepSeek:

Open your command prompt. In Windows, search for "cmd" or "Command Prompt".
Use one of the following commands to download and run DeepSeek, selecting a version based on your VRAM:

ollama run deepseek-r1:1.5b (for the 1.5B model)
ollama run deepseek-r1:7b (for the 7B model)
ollama run deepseek-r1:8b (for the 8B model)
ollama run deepseek-r1:14b (for the 14B model)
ollama run deepseek-r1:32b (for the 32B model)
ollama run deepseek-r1:70b (for the 70B model)
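
Once a model has been pulled with one of the commands above, Ollama also serves a local HTTP API, by default on port 11434, so the model can be called from a program rather than the terminal. The sketch below uses only the Python standard library and assumes a default local install with the `deepseek-r1:7b` model already downloaded; the helper names are made up for illustration.

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a POST request asking Ollama for a single, non-streamed completion."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str) -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as response:
        return json.loads(response.read())["response"]


# Example call (requires a running Ollama server with the model pulled):
# print(ask("deepseek-r1:7b", "Explain RAG in one sentence."))
```

Setting `"stream": False` makes the server return one JSON object instead of a stream of partial responses, which keeps the client code short.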

LM Studio: Alternatively, you can use LM Studio to install DeepSeek. Download it from the LM Studio official website, which supports Windows, Mac, and Linux.

Keep in mind that DeepSeek is designed for enterprise-level AI applications, emphasizing natural language processing and AI reasoning. It can be deployed on-premises to ensure data ownership and compliance.
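References
★ DeepSeek tutorial | DeepSeek R1 PC installation guide: using AI offline is safer!
How to install DeepSeek locally on your computer and build a private AI database
DeepSeek enterprise AI on-premises deployment: complete guide and best practices
Remote Desktop Solution
Use DeepSeek R1 at home without going online: how to install and deploy it locally with LM Studio, no effort required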

▼2 Other material

Others


| https://ppt.cc/fR65lx |