# Stable-Diffusion-v1-5-archive Multi-Instance GPU Deployment: Best Practices for MIG Partitioning and Containerized Resource Isolation
## 1. Introduction: From Exclusive Single-Tenant to Efficient Sharing

If you are responsible for AI deployments on your team, you have probably run into this headache: an expensive A100 GPU server runs a single Stable Diffusion service, the GPU idles most of the time, and utilization is dismal. When other colleagues want to deploy their own models, they either wait in line or you buy another machine, doubling the cost.

Worse still, when applications from different users share a single GPU directly, one crashing application can take down the whole card, or a resource hog can fill up the VRAM and kill everyone else's service outright. In a production environment that needs stable service, this "one fails, all fail" situation is a disaster.

This article shows how to use NVIDIA MIG (Multi-Instance GPU) together with containerization to partition one physical GPU into several independent smaller GPUs, so that a text-to-image model such as Stable Diffusion v1.5 Archive can be deployed as multiple instances safely and efficiently. In short: one machine does the work of several, and the services do not interfere with each other. With this setup you can:

- Raise utilization: one A100 80GB serves several users or applications simultaneously
- Isolate resources: each instance owns its allocated GPU resources exclusively, unaffected by the others
- Simplify operations: containerized deployment gives one-command start, stop, and migration
- Control cost: support more workloads with less hardware

The rest of the article walks through this step by step.

## 2. Understanding MIG: "Virtualization" for the GPU

### 2.1 What is MIG, and why do we need it?

Think of MIG as virtual machines for the GPU. Traditionally a physical GPU is one monolithic unit: all applications share its compute cores and memory. It is like a mansion where everyone lives in one big open-plan room: plenty of space, but constant interference and no privacy. MIG partitions the "mansion" into several self-contained "apartments", each with its own facilities:

- Physical isolation: each MIG instance has its own compute units, memory, and caches
- Performance guarantees: resources assigned to an instance are exclusive and cannot be preempted by other instances
- Fault isolation: one instance crashing or misbehaving does not affect the normal operation of the others

For memory-hungry applications like Stable Diffusion (SD1.5 inference typically needs 4-8 GB of VRAM), MIG is especially useful. You can create instances of different sizes for different needs:

- ~10 GB instances for standard 512x512 image generation
- ~20 GB instances for 768x768 or higher-resolution generation
- several small instances serving multiple lightweight applications at once

### 2.2 GPUs that support MIG

Not every GPU supports MIG. It is currently limited to NVIDIA data-center GPUs of the Ampere architecture and newer:

| GPU | Memory | Max MIG instances | Example layouts |
| --- | --- | --- | --- |
| A100 80GB | 80 GB | 7 | 7x10GB, 3x20GB+1x10GB |
| A100 40GB | 40 GB | 7 | 7x5GB, 3x10GB+1x5GB |
| A30 | 24 GB | 4 | 4x6GB, 2x12GB |
| H100 80GB | 80 GB | 7 | 7x10GB |

Note: consumer GPUs (RTX 3090/4090, etc.) do not support MIG. If that is what your team has, see the container-level resource-limiting alternatives discussed later.

### 2.3 MIG instance layout strategies

Configuring MIG instances is a bit like solving a puzzle: the partitioning must follow the GPU's physical structure. An A100 80GB has 7 GPCs (Graphics Processing Clusters), which can be grouped into compute instances (CI) with matching memory instances (GI) of various sizes. Common layouts:

```shell
# List the MIG profiles this GPU supports
sudo nvidia-smi mig -lgip

# Even split: seven ~10 GB instances
# (on A100 80GB the 1g.10gb profile ID is typically 19; confirm with -lgip)
sudo nvidia-smi mig -cgi 19,19,19,19,19,19,19 -C

# Mixed layout: three ~20 GB instances plus one ~10 GB instance
sudo nvidia-smi mig -cgi 14,14,14,19 -C
```

Which layout to choose depends on your workload:

- Even split (7x10GB): multi-tenant, many applications with equal resource needs
- Mixed (e.g. 3x20GB + 1x10GB): mixed workloads with different resource needs

For Stable Diffusion v1.5 Archive I recommend giving each instance at least 7 GB of VRAM to ensure stable inference performance.
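As a rough planning aid, the trade-off between instance count and per-instance memory can be sketched in Python. The profile table below is a simplified assumption based on the A100 80GB figures above; always confirm the real profiles with `nvidia-smi mig -lgip` on your hardware:

```python
# Sketch: choose a MIG profile on A100 80GB given a per-instance VRAM need.
# PROFILES is an assumed, simplified table of common A100 80GB MIG profiles.
PROFILES = {
    "1g.10gb": {"mem_gb": 10, "max_instances": 7},
    "2g.20gb": {"mem_gb": 20, "max_instances": 3},
    "3g.40gb": {"mem_gb": 40, "max_instances": 2},
    "7g.80gb": {"mem_gb": 80, "max_instances": 1},
}

def pick_profile(required_gb: float) -> str:
    """Return the smallest profile whose memory covers the requirement."""
    candidates = [(name, v) for name, v in PROFILES.items()
                  if v["mem_gb"] >= required_gb]
    if not candidates:
        raise ValueError(f"No single MIG instance can hold {required_gb} GB")
    return min(candidates, key=lambda kv: kv[1]["mem_gb"])[0]

# SD1.5 at 512x512 (~6-8 GB) fits in the smallest profile:
print(pick_profile(8))    # 1g.10gb
print(pick_profile(16))   # 2g.20gb
```

This mirrors the sizing advice above: standard 512x512 generation fits the smallest profile, while higher resolutions push you into larger ones.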
## 3. Containerization: Standardization and Isolation

### 3.1 Why containers?

Even with MIG's hardware isolation, we still want software-level isolation and operational convenience. Docker provides exactly that:

- Environment consistency: each container ships its full runtime, eliminating "works on my machine" problems
- Fast deployment: pull an image, start a container, and a service is online in minutes
- Resource limits: CPU and memory usage can be capped precisely
- Version management: different model and dependency versions become different images
- Elasticity: combined with Kubernetes, automatic scale-out is straightforward

### 3.2 Building the Stable Diffusion v1.5 Archive Docker image

First we need a basic Dockerfile for the service image:

```dockerfile
# Dockerfile.sd15-archive
FROM nvidia/cuda:11.8.0-runtime-ubuntu22.04

# System dependencies
RUN apt-get update && apt-get install -y \
    python3.10 \
    python3-pip \
    git \
    wget \
 && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Copy the model files (assumed already downloaded locally)
COPY stable-diffusion-v1-5-archive /app/models/stable-diffusion-v1-5-archive

# Copy the web UI code
COPY web-ui /app/web-ui

# Python dependencies
COPY requirements.txt /app/
RUN pip3 install --no-cache-dir -r requirements.txt

EXPOSE 7860

# The app reads the model path from the MODEL_PATH env var (see app.py below)
CMD ["python3", "/app/web-ui/app.py"]
```

The corresponding requirements.txt with the main dependencies:

```text
torch==2.0.1
torchvision==0.15.2
transformers==4.30.2
diffusers==0.19.3
accelerate==0.21.0
gradio==3.41.0
pillow==10.0.0
safetensors==0.3.3
```

Build the image:

```shell
# Build
docker build -f Dockerfile.sd15-archive -t sd15-archive:latest .

# Inspect
docker images | grep sd15-archive
```

### 3.3 Orchestrating multiple instances with Docker Compose

Managing each container by hand becomes tedious once there are several instances. Docker Compose lets us start multiple service instances with one command. Create a docker-compose.yml (since each service is pinned to a MIG instance via `NVIDIA_VISIBLE_DEVICES` with `runtime: nvidia`, no separate device reservation block is needed):

```yaml
# docker-compose.yml
version: "3.8"

services:
  sd15-instance-1:
    image: sd15-archive:latest
    container_name: sd15-instance-1
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=MIG-GPU-0-0-0   # first MIG instance
      - CUDA_VISIBLE_DEVICES=0
    ports:
      - "7861:7860"   # host port : container port
    volumes:
      - ./instance-1/output:/app/output
      - ./instance-1/logs:/app/logs
    restart: unless-stopped

  sd15-instance-2:
    image: sd15-archive:latest
    container_name: sd15-instance-2
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=MIG-GPU-0-0-1   # second MIG instance
      - CUDA_VISIBLE_DEVICES=0
    ports:
      - "7862:7860"
    volumes:
      - ./instance-2/output:/app/output
      - ./instance-2/logs:/app/logs
    restart: unless-stopped

  # More instances can be added in the same pattern...
  sd15-instance-3:
    image: sd15-archive:latest
    container_name: sd15-instance-3
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=MIG-GPU-0-0-2
      - CUDA_VISIBLE_DEVICES=0
    ports:
      - "7863:7860"
    volumes:
      - ./instance-3/output:/app/output
      - ./instance-3/logs:/app/logs
    restart: unless-stopped
```

Start all instances:

```shell
# Start all services
docker-compose up -d

# Check status
docker-compose ps

# Follow logs
docker-compose logs -f sd15-instance-1
```

## 4. Hands-On: The Full MIG + Container Deployment Flow

### 4.1 Environment preparation and checks

Before starting, make sure the system meets these requirements:

```shell
# 1. GPU driver and CUDA version
nvidia-smi

# 2. Docker and the NVIDIA Container Toolkit
docker --version
docker run --rm --gpus all nvidia/cuda:11.8.0-base nvidia-smi

# 3. MIG status
nvidia-smi mig -lgi

# If the GPU is reported as in use, stop all processes using it first
sudo systemctl stop docker        # temporarily stop Docker
sudo nvidia-smi -pm 1             # enable persistence mode
```

### 4.2 Step 1: Configure MIG instances

Assume we have one A100 80GB GPU and want to create three instances for Stable Diffusion:

```shell
# 1. Enable MIG mode (if not already enabled)
sudo nvidia-smi -mig 1
```
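Because the per-instance services in the Compose file differ only in name, MIG device, and host port, the file can also be generated instead of hand-edited. A minimal sketch (the MIG identifiers and base port are placeholder inputs you would fill in from your own `nvidia-smi -L` output):

```python
# Sketch: render one docker-compose service block per MIG instance.
# mig_uuids, image, and base_port are hypothetical inputs; substitute real values.
def render_services(mig_uuids, image="sd15-archive:latest", base_port=7861):
    blocks = []
    for i, uuid in enumerate(mig_uuids, start=1):
        blocks.append(
            f"  sd15-instance-{i}:\n"
            f"    image: {image}\n"
            f"    runtime: nvidia\n"
            f"    environment:\n"
            f"      - NVIDIA_VISIBLE_DEVICES={uuid}\n"
            f"    ports:\n"
            f"      - \"{base_port + i - 1}:7860\"\n"
            f"    restart: unless-stopped\n"
        )
    return "services:\n" + "\n".join(blocks)

# Two hypothetical MIG identifiers -> two service blocks on ports 7861/7862
print(render_services(["MIG-aaaa", "MIG-bbbb"]))
```

For a handful of instances, hand-editing is fine; a generator like this mainly pays off when the layout changes often.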
```shell
# 2. List the available profiles
sudo nvidia-smi mig -lgip
# Example (abridged) for A100 80GB:
#   ID  Profile    Memory     Max instances
#   19  1g.10gb    ~10 GB     7
#   14  2g.20gb    ~20 GB     3
#    9  3g.40gb    ~40 GB     2
#    0  7g.80gb    ~80 GB     1

# 3. Create three ~20 GB instances
# (profile ID 14 = 2g.20gb on A100 80GB; confirm IDs with -lgip)
sudo nvidia-smi mig -cgi 14,14,14 -C

# 4. Verify the created instances
sudo nvidia-smi mig -lgi
# Example output: three "MIG 2g.20gb" GPU instances, each listed with its
# own instance ID, placement, SM count, and ~20 GB of memory
```

### 4.3 Step 2: Prepare the model and code

Download the Stable Diffusion v1.5 Archive model and prepare the web interface:

```shell
# 1. Project directory layout
mkdir -p sd15-mig-deployment
cd sd15-mig-deployment
mkdir -p models web-ui instance-{1,2,3}/{output,logs}

# 2. Download the model (a Hugging Face token may be required)
# Option A: clone directly (if you have access)
git lfs install
git clone https://huggingface.co/Comfy-Org/stable-diffusion-v1-5-archive \
    models/stable-diffusion-v1-5-archive
# Option B: copy from an existing location
# cp -r /path/to/existing/model models/stable-diffusion-v1-5-archive
```
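The `-cgi` flag takes a comma-separated list of numeric profile IDs, one per instance to create. A small helper can build that argument from a human-readable plan. The profile-name to ID mapping below is an assumption based on typical A100 80GB listings; the authoritative IDs come from `nvidia-smi mig -lgip` on your own GPU:

```python
# Sketch: build the `nvidia-smi mig -cgi` argument from an instance plan.
# PROFILE_IDS is an assumed mapping; verify against `nvidia-smi mig -lgip`.
PROFILE_IDS = {"1g.10gb": 19, "2g.20gb": 14, "3g.40gb": 9}

def cgi_arg(plan: dict) -> str:
    """plan maps profile name -> instance count, e.g. {'2g.20gb': 3}."""
    ids = []
    for profile, count in plan.items():
        ids.extend([str(PROFILE_IDS[profile])] * count)
    return ",".join(ids)

print(cgi_arg({"2g.20gb": 3}))                  # 14,14,14
print(cgi_arg({"2g.20gb": 3, "1g.10gb": 1}))    # 14,14,14,19
```

The output strings match the `-cgi 14,14,14 -C` invocation used above.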
Then write a minimal Gradio interface as `web-ui/app.py`:

```python
import gradio as gr
import torch
from diffusers import StableDiffusionPipeline
import json
import os
from datetime import datetime

# Load the model
model_path = os.getenv("MODEL_PATH", "/app/models/stable-diffusion-v1-5-archive")
pipe = StableDiffusionPipeline.from_pretrained(
    model_path,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    safety_checker=None,
)
pipe = pipe.to("cuda" if torch.cuda.is_available() else "cpu")


def generate_image(prompt, negative_prompt, steps, guidance_scale, width, height, seed):
    # Seed handling: -1 means random
    generator = torch.Generator("cuda").manual_seed(int(seed)) if seed != -1 else None

    # Generate the image
    with torch.autocast("cuda"):
        image = pipe(
            prompt=prompt,
            negative_prompt=negative_prompt,
            num_inference_steps=steps,
            guidance_scale=guidance_scale,
            width=width,
            height=height,
            generator=generator,
        ).images[0]

    # Save the image
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    output_dir = "/app/output"
    os.makedirs(output_dir, exist_ok=True)
    image_path = os.path.join(output_dir, f"output_{timestamp}.png")
    image.save(image_path)

    # Return the image along with its generation parameters
    params = {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "steps": steps,
        "guidance_scale": guidance_scale,
        "width": width,
        "height": height,
        "seed": seed,
        "image_path": image_path,
        "timestamp": timestamp,
    }
    return image, json.dumps(params, indent=2)


# Build the UI
with gr.Blocks(title="Stable Diffusion v1.5 Archive") as demo:
    gr.Markdown("# Stable Diffusion v1.5 Archive")
    gr.Markdown("Classic SD1.5 text-to-image model (archive edition)")
    with gr.Row():
        with gr.Column():
            prompt = gr.Textbox(label="Prompt",
                                placeholder="Enter your prompt here...", lines=3)
            negative_prompt = gr.Textbox(label="Negative Prompt",
                                         value="lowres, blurry, extra fingers", lines=2)
            with gr.Row():
                steps = gr.Slider(minimum=1, maximum=100, value=20, step=1,
                                  label="Steps")
                guidance_scale = gr.Slider(minimum=1, maximum=20, value=7.5,
                                           step=0.5, label="Guidance Scale")
            with gr.Row():
                width = gr.Slider(minimum=256, maximum=1024, value=512, step=64,
                                  label="Width")
                height = gr.Slider(minimum=256, maximum=1024, value=512, step=64,
                                   label="Height")
            seed = gr.Number(value=-1, label="Seed (-1 for random)")
            generate_btn = gr.Button("Generate", variant="primary")
        with gr.Column():
            output_image = gr.Image(label="Generated Image", type="pil")
            output_json = gr.JSON(label="Generation Parameters")

    # Wire up the event
    generate_btn.click(
        fn=generate_image,
        inputs=[prompt, negative_prompt, steps, guidance_scale, width, height, seed],
        outputs=[output_image, output_json],
    )

if __name__ == "__main__":
    demo.launch(server_name="0.0.0.0", server_port=7860)
```

### 4.4 Step 3: Build and run the instances

Update docker-compose.yml to pin each service to a specific MIG instance UUID:

```yaml
# docker-compose.yml (services pinned to concrete MIG instance UUIDs)
version: "3.8"

services:
  sd15-instance-1:
    image: sd15-archive:latest
    container_name: sd15-instance-1
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=MIG-41b7986f-2c97-5c23-a3e7-0a9e9a6c0115  # UUID of instance 1
      - MODEL_PATH=/app/models/stable-diffusion-v1-5-archive
    ports:
      - "7861:7860"
    volumes:
      - ./instance-1/output:/app/output
      - ./instance-1/logs:/app/logs
    restart: unless-stopped

  sd15-instance-2:
    image: sd15-archive:latest
    container_name: sd15-instance-2
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=MIG-7edb6a3c-8a12-4e8b-b5f2-1c9d8e7f6a23  # UUID of instance 2
      - MODEL_PATH=/app/models/stable-diffusion-v1-5-archive
    ports:
      - "7862:7860"
    volumes:
      - ./instance-2/output:/app/output
      - ./instance-2/logs:/app/logs
    restart: unless-stopped
```

To find the MIG instance UUIDs:

```shell
# List all MIG instance UUIDs
nvidia-smi -L

# Example output:
# GPU 0: NVIDIA A100 80GB PCIe (UUID: GPU-12345678-...)
#   MIG 2g.20gb Device 0: (UUID: MIG-41b7986f-2c97-5c23-a3e7-0a9e9a6c0115)
#   MIG 2g.20gb Device 1: (UUID: MIG-7edb6a3c-8a12-4e8b-b5f2-1c9d8e7f6a23)
#   MIG 2g.20gb Device 2: (UUID: MIG-9a2b3c4d-5e6f-7a8b-9c0d-1e2f3a4b5c6d)
```

Start the services and verify:

```shell
# Build the image
docker-compose build

# Start the services
docker-compose up -d

# Check container status
docker-compose ps

# Check the GPU assignment inside each container
docker exec sd15-instance-1 nvidia-smi
docker exec sd15-instance-2 nvidia-smi

# Smoke-test the services
curl http://localhost:7861
curl http://localhost:7862
```

### 4.5 Step 4: Verify resource isolation
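Copying UUIDs out of `nvidia-smi -L` by hand is error-prone; they can also be extracted with a small parser, e.g. to feed into the `NVIDIA_VISIBLE_DEVICES` variables above. The sample text is a made-up illustration of the command's output format:

```python
import re

# Sketch: extract MIG device UUIDs from `nvidia-smi -L` output.
# SAMPLE is illustrative; in practice pass the real command output.
SAMPLE = """\
GPU 0: NVIDIA A100 80GB PCIe (UUID: GPU-12345678-aaaa-bbbb-cccc-000000000000)
  MIG 2g.20gb Device 0: (UUID: MIG-41b7986f-2c97-5c23-a3e7-0a9e9a6c0115)
  MIG 2g.20gb Device 1: (UUID: MIG-7edb6a3c-8a12-4e8b-b5f2-1c9d8e7f6a23)
"""

def mig_uuids(listing: str):
    """Return only the MIG device UUIDs (the parent GPU UUID is skipped)."""
    return re.findall(r"\(UUID: (MIG-[0-9a-f-]+)\)", listing)

print(mig_uuids(SAMPLE))
```

Feeding real output in would be `mig_uuids(subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True).stdout)`.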
Now let's verify that MIG plus containers actually isolates resources:

```shell
# 1. Run an inference workload in instance 1 and watch its resource usage
docker exec -i sd15-instance-1 python3 - <<'PY'
import torch
print(f"Instance 1 GPU: {torch.cuda.get_device_name(0)}")
print(f"Instance 1 Memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")

# Run an inference test and check VRAM usage
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained(
    "/app/models/stable-diffusion-v1-5-archive",
    torch_dtype=torch.float16,
).to("cuda")

torch.cuda.empty_cache()
print(f"Before inference: {torch.cuda.memory_allocated() / 1024**3:.2f} GB allocated")
image = pipe("a cat sitting on a bench").images[0]
print(f"After inference: {torch.cuda.memory_allocated() / 1024**3:.2f} GB allocated")
PY

# 2. In a second terminal, run the same test in instance 2
docker exec -i sd15-instance-2 python3 - <<'PY'
import torch
print(f"Instance 2 GPU: {torch.cuda.get_device_name(0)}")
print(f"Instance 2 Memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")

from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained(
    "/app/models/stable-diffusion-v1-5-archive",
    torch_dtype=torch.float16,
).to("cuda")

torch.cuda.empty_cache()
print(f"Before inference: {torch.cuda.memory_allocated() / 1024**3:.2f} GB allocated")
image = pipe("a dog running in the park").images[0]
print(f"After inference: {torch.cuda.memory_allocated() / 1024**3:.2f} GB allocated")
PY

# 3. System-level view of GPU usage
nvidia-smi

# 4. Simulate a crash in one instance and check whether the others are affected
docker exec -i sd15-instance-1 python3 - <<'PY'
import torch
# Deliberately try to allocate far more than the instance's VRAM
try:
    x = torch.zeros((10000, 10000, 100), device="cuda")
except RuntimeError as e:
    print(f"Instance 1 crashed: {e}")
PY

docker exec -i sd15-instance-2 python3 - <<'PY'
import torch
print(f"Instance 2 is still working: {torch.cuda.is_available()}")
PY
```

If everything is set up correctly, you should observe that:

- each instance sees only the GPU resources allocated to it (~20 GB)
- VRAM usage in one instance does not affect the others
- a crash in one instance leaves the other instances running
- system-level `nvidia-smi` shows the MIG instances operating independently
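Fault isolation on the GPU side can be complemented by failover on the client side: if a request to one instance fails, resend it to the next. A minimal sketch, with a fake inference function standing in for the real HTTP call to each container:

```python
# Sketch: fail over across instance endpoints. `call_instance` is a stand-in
# for the real per-container HTTP request.
def generate_with_failover(endpoints, call_instance, prompt):
    errors = []
    for url in endpoints:
        try:
            return call_instance(url, prompt)
        except RuntimeError as exc:   # e.g. instance crashed / CUDA OOM
            errors.append((url, str(exc)))
    raise RuntimeError(f"All instances failed: {errors}")

# Demo with a fake backend where instance 1 (port 7861) is down:
def fake_call(url, prompt):
    if url.endswith("7861"):
        raise RuntimeError("CUDA out of memory")
    return f"image from {url} for {prompt!r}"

print(generate_with_failover(
    ["http://localhost:7861", "http://localhost:7862"], fake_call, "a cat"))
```

With MIG, an OOM in one instance surfaces as a per-request error like this rather than taking down the neighboring services.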
## 5. Production Hardening

### 5.1 Performance monitoring and alerting

In production we need to monitor each instance's health:

```yaml
# docker-compose.monitor.yml
version: "3.8"

services:
  # Prometheus for metrics collection
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.path=/prometheus"
      - "--web.console.libraries=/etc/prometheus/console_libraries"
      - "--web.console.templates=/etc/prometheus/console_templates"
      - "--storage.tsdb.retention.time=200h"
      - "--web.enable-lifecycle"
    restart: unless-stopped

  # Grafana for dashboards
  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    restart: unless-stopped

  # Node Exporter for host metrics
  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    ports:
      - "9100:9100"
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - "--path.procfs=/host/proc"
      - "--path.rootfs=/rootfs"
      - "--path.sysfs=/host/sys"
      - "--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)"
    restart: unless-stopped

  # NVIDIA GPU exporter
  nvidia-gpu-exporter:
    image: nvidia/dcgm-exporter:latest
    container_name: nvidia-gpu-exporter
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
    ports:
      - "9400:9400"
    restart: unless-stopped

volumes:
  prometheus_data:
  grafana_data:
```

The corresponding Prometheus configuration:

```yaml
# prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "node-exporter"
    static_configs:
      - targets: ["node-exporter:9100"]
  - job_name: "nvidia-gpu"
    static_configs:
      - targets: ["nvidia-gpu-exporter:9400"]
  - job_name: "sd15-instances"
    static_configs:
      - targets: ["sd15-instance-1:7860", "sd15-instance-2:7860", "sd15-instance-3:7860"]
    metrics_path: /metrics
```

### 5.2 Load balancing and high availability

With multiple instances, a load balancer can distribute incoming requests:

```python
# load_balancer.py
from flask import Flask, request, jsonify
import requests
import random

app = Flask(__name__)

# Instance configuration
INSTANCES = [
    {"url": "http://localhost:7861", "weight": 1, "healthy": True},
    {"url": "http://localhost:7862", "weight": 1, "healthy": True},
    {"url": "http://localhost:7863", "weight": 1, "healthy": True},
]


def health_check():
    for instance in INSTANCES:
        try:
            response = requests.get(f"{instance['url']}/health", timeout=5)
            instance["healthy"] = response.status_code == 200
        except requests.exceptions.RequestException:
            instance["healthy"] = False


def get_next_instance():
    """Weighted random selection among healthy instances."""
    healthy_instances = [i for i in INSTANCES if i["healthy"]]
    if not healthy_instances:
        return None
    total_weight = sum(i["weight"] for i in healthy_instances)
    r = random.uniform(0, total_weight)
    current = 0
    for instance in healthy_instances:
        current += instance["weight"]
        if r <= current:
            return instance["url"]
    return healthy_instances[0]["url"]


@app.route("/generate", methods=["POST"])
def generate():
    # Pick the next available instance
    instance_url = get_next_instance()
    if not instance_url:
        return jsonify({"error": "No healthy instances available"}), 503
    try:
        # Forward the request
        response = requests.post(
            f"{instance_url}/generate", json=request.json, timeout=60)
        return jsonify(response.json()), response.status_code
    except requests.exceptions.RequestException as e:
        return jsonify({"error": f"Instance error: {e}"}), 500


@app.route("/health", methods=["GET"])
def health():
    health_check()
    healthy_count = sum(1 for i in INSTANCES if i["healthy"])
    return jsonify({
        "status": "healthy" if healthy_count > 0 else "unhealthy",
        "healthy_instances": healthy_count,
        "total_instances": len(INSTANCES),
    })


if __name__ == "__main__":
    health_check()  # initial check at startup
    app.run(host="0.0.0.0", port=5000)
```

### 5.3 Resource scheduling and elastic scaling

For more complex production environments, consider Kubernetes for scheduling:

```yaml
# k8s/sd15-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sd15-archive
spec:
  replicas: 3
  selector:
    matchLabels:
      app: sd15-archive
  template:
    metadata:
      labels:
        app: sd15-archive
    spec:
      containers:
        - name: sd15-archive
          image: sd15-archive:latest
          resources:
            limits:
              nvidia.com/gpu: 1
              memory: "16Gi"
              cpu: "4"
            requests:
              nvidia.com/gpu: 1
              memory: "14Gi"
              cpu: "2"
          env:
            - name: NVIDIA_VISIBLE_DEVICES
              value: "all"
          ports:
            - containerPort: 7860
          volumeMounts:
            - name: model-storage
              mountPath: /app/models
            - name: output-storage
              mountPath: /app/output
      volumes:
        - name: model-storage
          persistentVolumeClaim:
            claimName: model-pvc
        - name: output-storage
          persistentVolumeClaim:
            claimName: output-pvc
      nodeSelector:
        nvidia.com/gpu.product: A100-SXM4-80GB
---
apiVersion: v1
kind: Service
metadata:
  name: sd15-archive-service
spec:
  selector:
    app: sd15-archive
  ports:
    - port: 7860
      targetPort: 7860
  type: LoadBalancer
```

## 6. Alternatives: Resource Isolation Without MIG

If your GPU does not support MIG (RTX 3090/4090, for example), you can still achieve a degree of resource isolation:

### 6.1 Docker GPU resource limits

```yaml
# docker-compose.no-mig.yml
version: "3.8"

services:
  sd15-instance-1:
    image: sd15-archive:latest
    container_name: sd15-instance-1
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
        limits:
          cpus: "4.0"
          memory: 16G
    environment:
      - CUDA_VISIBLE_DEVICES=0
      - NVIDIA_VISIBLE_DEVICES=0
      # Constrain the PyTorch allocator to reduce fragmentation
      - PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
    ports:
      - "7861:7860"
    volumes:
      - ./instance-1/output:/app/output
    restart: unless-stopped

  sd15-instance-2:
    image: sd15-archive:latest
    container_name: sd15-instance-2
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
        limits:
          cpus: "4.0"
          memory: 16G
    environment:
      - CUDA_VISIBLE_DEVICES=0
      - NVIDIA_VISIBLE_DEVICES=0
      - PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
    ports:
      - "7862:7860"
    volumes:
      - ./instance-2/output:/app/output
    restart: unless-stopped
```

### 6.2 CUDA MPS (Multi-Process Service)

```shell
# Start the MPS daemon
sudo nvidia-smi -i 0 -c EXCLUSIVE_PROCESS
sudo nvidia-cuda-mps-control -d

# Share the GPU through MPS inside containers
docker run --gpus device=0 \
  --runtime=nvidia \
  -e CUDA_VISIBLE_DEVICES=0 \
  -e CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps \
  -e CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-log \
  -v /tmp/nvidia-mps:/tmp/nvidia-mps \
  -v /tmp/nvidia-log:/tmp/nvidia-log \
  sd15-archive:latest
```

### 6.3 Application-level resource management

Manage resources proactively in application code:

```python
# resource_manager.py
import torch
import psutil
import threading
import time


class ResourceManager:
    def __init__(self, max_memory_gb=8, max_utilization=0.8):
        self.max_memory = max_memory_gb * 1024**3  # bytes
        self.max_utilization = max_utilization
        self.lock = threading.Lock()

    def allocate_memory(self, required_gb):
        """Allocate VRAM; wait if the configured limit would be exceeded."""
        with self.lock:
            allocated = torch.cuda.memory_allocated()
            required_bytes = int(required_gb * 1024**3)
            # Wait while the allocation would push us over the cap
            while allocated + required_bytes > self.max_memory:
                print(f"Memory limit reached "
                      f"({allocated / 1024**3:.1f}/{self.max_memory / 1024**3:.1f} GB), "
                      f"waiting...")
                time.sleep(1)
                allocated = torch.cuda.memory_allocated()
            # float32 tensors take 4 bytes per element
            tensor = torch.zeros((required_bytes // 4,),
                                 dtype=torch.float32, device="cuda")
            return tensor

    def cleanup(self):
        """Release cached VRAM."""
        torch.cuda.empty_cache()

    def get_usage(self):
        """Report current resource usage."""
        gpu_memory = torch.cuda.memory_allocated() / 1024**3
        gpu_util = torch.cuda.utilization() if hasattr(torch.cuda, "utilization") else 0
        return {
            "gpu_memory_gb": round(gpu_memory, 2),
            "gpu_utilization": gpu_util,
            "cpu_percent": psutil.cpu_percent(),
            "memory_percent": psutil.virtual_memory().percent,
        }


# Using it around Stable Diffusion inference
resource_manager = ResourceManager(max_memory_gb=10)


def generate_with_resource_control(prompt, **kwargs):
    # Check resource headroom first
    usage = resource_manager.get_usage()
    if usage["gpu_memory_gb"] > 8:  # back off above 8 GB
        print("GPU memory high, waiting...")
        time.sleep(5)
    # Generate the image
    result = pipe(prompt, **kwargs)
    # Release cache immediately
    resource_manager.cleanup()
    return result
```
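Besides memory checks, application-level management can also cap concurrency, so a burst of requests on a shared GPU cannot queue unbounded work. A minimal, GPU-free sketch using a semaphore (the function passed to `run` is a placeholder for the real inference call):

```python
import threading

# Sketch: limit concurrent inference calls within one process.
class ConcurrencyLimiter:
    def __init__(self, max_concurrent=2):
        self._sem = threading.BoundedSemaphore(max_concurrent)
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0  # highest observed concurrency, for verification

    def run(self, fn, *args):
        with self._sem:                 # blocks when max_concurrent reached
            with self._lock:
                self._active += 1
                self.peak = max(self.peak, self._active)
            try:
                return fn(*args)        # the "inference" call
            finally:
                with self._lock:
                    self._active -= 1

# Demo: five threads, at most two run the workload at once
limiter = ConcurrencyLimiter(max_concurrent=2)
results = []
threads = [threading.Thread(target=lambda i=i: results.append(limiter.run(lambda x: x * 2, i)))
           for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))
```

In the real service this would wrap `pipe(...)`, keeping per-instance VRAM pressure bounded even without MIG.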
## 7. Summary

With MIG partitioning combined with containerized deployment, we have built a multi-instance, resource-isolated setup for the Stable Diffusion v1.5 Archive model. The core value of this approach:

- **Maximized utilization.** One A100 80GB can serve 3-7 Stable Diffusion instances at once; compared with single-instance deployment, utilization improves roughly three- to seven-fold.
- **Real fault isolation.** Each MIG instance has its own compute and memory resources; one crashing instance does not affect the others, which greatly improves stability.
- **Flexible resource configuration.** Instances can be sized to the workload: more resources for high-resolution generation, less for lightweight applications.
- **Standardized deployment.** Containers make deployment, upgrades, and migration simple and consistent; combined with Docker Compose or Kubernetes, operations can be automated.
- **Clear cost benefits.** For small teams or tight budgets, serving several users from one high-end GPU is more economical than giving each user a mid-range card.

A few practical recommendations for real deployments:

- Size instances to the workload: 7 GB of VRAM is enough for mostly 512x512 generation; for 768x768 or higher, plan for 14 GB or more.
- Keep headroom: do not slice up 100% of the GPU; reserve some capacity for the system and other processes.
- Monitor and alert: track each instance's resource usage and service health, with sensible alert thresholds.
- Maintain regularly: keep drivers, CUDA, and container images up to date for security and performance.

For GPUs without MIG, full hardware-level isolation is out of reach, but Docker resource limits, CUDA MPS, and application-level management still provide a useful degree of resource control for multi-instance deployment.

This approach is not limited to Stable Diffusion; it extends to multi-instance deployment of other AI models as well, giving a team an efficient, stable, and economical foundation for serving AI workloads.