Title: 离线安装Ubuntu显卡机  
CreateTime: 2026-07-04 14:51:33  
UpdateTime: 2026-07-04 17:46:20  
CategoryName: Web  
---

# 说明
部分英伟达显卡机是不能联网的,需要离线安装服务器环境,这里使用[ubuntu-22.04.5-live-server-amd64.iso](https://mirrors.tuna.tsinghua.edu.cn/ubuntu-releases/22.04/ubuntu-22.04.5-live-server-amd64.iso)  , 需要有一台联网服务器(或者虚拟机)下载需要的deb包,拷贝到显卡机,进行安装

**注意:使用初始化干净的Ubuntu系统,已经安装的deb包,`apt-get install -y --download-only`不会再次下载**

# 基础环境
```shell
## 使用国内的apt源,这里使用 https://mirrors.aliyun.com
sed -i 's/http:\/\/cn.archive.ubuntu.com/https:\/\/mirrors.aliyun.com/g' /etc/apt/sources.list
sed -i 's/http:\/\/security.ubuntu.com/https:\/\/mirrors.aliyun.com/g' /etc/apt/sources.list

## 配置docker仓库
# 卸载旧版本
apt remove docker docker-engine docker.io containerd runc
# 更新软件源 
sudo apt update 
# 安装所需依赖 
sudo apt -y install apt-transport-https ca-certificates curl software-properties-common 

# 安装 Docker GPG 证书 
curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker-aliyun.gpg
# 新增 Docker 软件源信息 
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker-aliyun.gpg] https://mirrors.aliyun.com/docker-ce/linux/ubuntu $(. /etc/os-release && echo "${UBUNTU_CODENAME:-$VERSION_CODENAME}") stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null

# 安装 nvidia-container-runtime GPG 证书 
curl -fsSL https://mirrors.ustc.edu.cn/nvidia-container-runtime/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
# 新增 nvidia-container-runtime 软件源信息 
echo "deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://mirrors.ustc.edu.cn/nvidia-container-runtime/stable/deb/$(dpkg --print-architecture) /" | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

```

## 下载依赖
```shell
apt clean
apt update

## 下载docker依赖包
apt-get install -y --download-only docker-ce docker-ce-cli containerd.io docker-compose-plugin

## 下载nvidia-container-runtime依赖包
apt-get install -y --download-only nvidia-container-runtime

## 安装显卡驱动的依赖,linux-modules-extra-5.15.0-119-generic $(uname -sr)需要匹配实际的内核版本
apt-get install -y --download-only  build-essential libboost-program-options-dev cmake zip unzip rdma-core infiniband-diags ibverbs-providers libibverbs-dev dpkg-dev perl linux-modules-extra-5.15.0-119-generic 

###下载的deb包都在 /var/cache/apt/archives
mkdir -p ./deps/
cp -rf /var/cache/apt/archives/*.deb ./deps/


## 下载并安装以下包 

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/nvidia-fabricmanager_580.167.08-1ubuntu1_amd64.deb

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/libnvidia-nscq_580.167.08-1ubuntu1_amd64.deb

##B300需要
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/nvlsm_2025.10.14-1_amd64.deb

sudo depmod -a
sudo modprobe ib_umad
#sudo modprobe rdma_ucm
#sudo modprobe ib_uverbs
#sudo modprobe ib_verbs
#sudo modprobe ib_core
lsmod | grep ib

# nvidia-fabricmanager 服务
systemctl start nvidia-fabricmanager
systemctl enable nvidia-fabricmanager

## 下载显卡驱动,可能需要梯子使用浏览器下载
wget https://us.download.nvidia.com/tesla/580.167.08/NVIDIA-Linux-x86_64-580.167.08.run

wget https://developer.download.nvidia.com/compute/cuda/13.0.3/local_installers/cuda_13.0.3_580.126.20_linux.run


## 压测工具
#gpu-burn: https://github.com/wilicc/gpu-burn
#p2pBandwidthLatencyTest: https://github.com/NVIDIA/cuda-samples
#nvbandwidth: https://github.com/NVIDIA/nvbandwidth

```

# sglang运行GLM5.2-NVFP4
## docker-compose.yaml
```yaml
services:
  sglang-glm52-nvfp4:
    image: lmsysorg/sglang:dev-cu13-glm52-nvfp4
    container_name: sglang-glm52-nvfp4
    restart: unless-stopped
    network_mode: host
    privileged: true
    runtime: nvidia
    ipc: host
    shm_size: 128g
    #ports:
    #  - "8000:8000"
    ulimits:
      memlock:
        soft: -1
        hard: -1
      stack:
        soft: 67108864
        hard: 67108864
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    volumes:
      # 宿主机模型路径改成你实际存放GLM5.2-NVFP4的目录
      - /data/.cache/huggingface/hub/models--nvidia--GLM-5.2-NVFP4/snapshots/aec724e8c7b8ee9db3b48c01c320f63f9cdaf8aa:/app/models/GLM-5.2-NVFP4
	  
    command: >
      sglang serve
      --model-path /app/models/GLM-5.2-NVFP4
      --served-model-name GLM5.2
      --tensor-parallel-size 8
      --quantization modelopt_fp4
      --tool-call-parser glm47
      --reasoning-parser glm45
      --trust-remote-code
      --chunked-prefill-size 65536
      --mem-fraction-static 0.85
      --host 0.0.0.0
      --port 8000
      --speculative-algorithm NEXTN
      --speculative-num-steps 3 
      --speculative-eagle-topk 1
      --speculative-num-draft-tokens 4
```
- `--model-path`: **模型本地路径**，指定加载大模型权重文件所在目录
- `--served-model-name`: **对外服务模型名**，接口请求时填写的模型名称，自定义别名
- `--tensor-parallel-size 8`: **张量并行数**，拆分模型权重分到 8 张 GPU 运行，多卡均分负载
- `--quantization modelopt_fp4`: **量化方案**，使用 NVIDIA ModelOpt FP4 4 比特权重量化，大幅省显存
- `--tool-call-parser glm47`: **工具调用解析器**，适配 GLM47 系列格式解析函数调用、插件调用逻辑
- `--reasoning-parser glm45`: **思维链推理解析器**，适配 GLM45 格式解析模型内部思考 / 推理内容
- `--trust-remote-code`: **信任远程代码**，自动执行模型仓库内自定义建模代码，加载非标准架构模型必备
- `--chunked-prefill-size 65536`: **分块预填充长度**，超长上下文分块编码，支持更大输入文本，数值越大支持越长 prompt
- `--mem-fraction-static 0.85`: **静态显存占用比例**，限制模型权重最多占用单卡 85% 显存，预留显存给推理 / 缓存
- `--host 0.0.0.0`: **监听地址**，允许局域网 / 外网所有 IP 访问服务
- `--port 8000`: **服务端口**，API 接口默认监听 8000 端口
- `--speculative-algorithm NEXTN`: **投机解码算法**，选用 NextN 极速投机推理算法，加速生成速度
- `--speculative-num-steps 3`: **投机迭代步数**，单次推理执行 3 轮投机校验
- `--speculative-eagle-topk 1`: **Eagle 采样候选数**，仅取 Top1 最优候选 token，提升稳定性
- `--speculative-num-draft-tokens 4`: **预生成草稿 token 数**，一次提前预推 4 个预测 token，显著提升生成吞吐

## 压测

```shell
# 设置 huggingface 国内镜像
export HF_ENDPOINT=https://hf-mirror.com

# --max-concurrency 1 单用户压测
python3 -m sglang.bench_serving   --backend sglang   --model nvidia/GLM-5.2-NVFP4   --dataset-name random   --random-input-len 307680  --random-output-len 2048   --num-prompts 10   --max-concurrency 1   --request-rate inf --host 192.168.0.12 --port 8000 --served-model-name GLM5.2 --random-range-ratio 1.0
```