LiveTalk, a real-time digital-human generation framework released by GAIR-NLP, can produce highly realistic talking-head videos from a single reference image plus text. This article targets an RTX 5090 environment on the nvcr.io/nvidia/pytorch:23.08-py3 image and provides a one-click deployment script. The script also resolves the KV Cache overflow that occurs at high resolution: without modifying any source code, it caps the video length at 2 seconds and keeps the resolution stable at 512x512.

I. Environment Preparation

1. Base environment

- GPU: NVIDIA RTX 5090
- Image: nvcr.io/nvidia/pytorch:23.08-py3
- OS: Linux (Ubuntu or CentOS both work)
- Python: 3.10 (bundled with the image)

2. Core dependencies

- ffmpeg: basic audio/video processing
- transformers 4.44.0: resolves a version-compatibility problem
- opencv-python-headless 4.9.0.80: GUI-free OpenCV build that avoids dependency conflicts
- flash-attn: accelerates attention computation for faster inference
- gradio: visual web interface
- edge-tts: text-to-speech audio generation

II. One-Click Deployment Script

The complete one-click deployment script, run_livetalk.sh, is shown below in sections (concatenated in order, they form the full file). It covers environment configuration, code checkout, model downloads, configuration fixes, and application startup.

```bash
#!/bin/bash
#
# Script Name: run_livetalk.sh
# Description: One-click deployment for LiveTalk.
# Resolution: 512x512.
# Duration: Reduced to 2 seconds to prevent KV Cache overflow
# without modifying source code.
#
set -e

# ------------------------------------------------------------------------------
# Configuration Variables
# ------------------------------------------------------------------------------
REPO_URL="https://github.com/GAIR-NLP/livetalk.git"
PROJECT_DIR="livetalk"
OMNI_AVATAR_REPO="https://github.com/Omni-Avatar/OmniAvatar"
CONFIG_FILE="configs/causal_inference.yaml"

# ------------------------------------------------------------------------------
# Helper Functions
# ------------------------------------------------------------------------------
log_info() { echo "[INFO] $1"; }
log_error() { echo "[ERROR] $1" >&2; }

# ------------------------------------------------------------------------------
# Step 1: System Dependencies
# ------------------------------------------------------------------------------
log_info "Installing system dependencies (ffmpeg) via apt..."
if command -v apt-get > /dev/null; then
    apt-get update
    apt-get install -y ffmpeg
fi
```
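As a quick sanity check before (or after) Step 1, you can verify from Python that ffmpeg is actually reachable on PATH. A minimal sketch; the function name is mine, not part of the script:

```python
import shutil

def have_ffmpeg() -> bool:
    # Mirrors the script's `command -v ffmpeg`-style probe: True when the
    # binary is on PATH, False otherwise.
    return shutil.which("ffmpeg") is not None

print("ffmpeg available:", have_ffmpeg())
```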
```bash
# ------------------------------------------------------------------------------
# Step 2: Clone Repositories
# ------------------------------------------------------------------------------
if [ -d "$PROJECT_DIR" ]; then
    log_info "Project directory exists. Updating..."
    cd "$PROJECT_DIR"
    git pull || log_info "Git pull failed, continuing..."
else
    log_info "Cloning LiveTalk repository..."
    git clone "$REPO_URL"
    cd "$PROJECT_DIR"
fi

if [ ! -d "OmniAvatar" ]; then
    log_info "Cloning OmniAvatar repository..."
    git clone "$OMNI_AVATAR_REPO"
fi

# ------------------------------------------------------------------------------
# Step 3: Install Python Dependencies
# ------------------------------------------------------------------------------
log_info "Installing Python dependencies..."

# 1. Core requirements
if [ -f "requirements.txt" ]; then
    pip install -r requirements.txt
fi

# 2. FIX: Resolve version conflicts
log_info "Fixing dependency versions..."
pip install transformers==4.44.0
pip install pandas==2.0.0

# 3. FIX: Uninstall conflicting package
log_info "Removing conflicting livetalk package..."
pip uninstall -y livetalk 2> /dev/null || true
rm -rf livetalk.egg-info

# 4. FIX: Remove broken OpenCV files
log_info "Fixing OpenCV files..."
rm -rf /usr/local/lib/python3.10/dist-packages/cv2
rm -rf /usr/local/lib/python3.10/dist-packages/opencv*

# 5. Install clean versions
pip install --no-cache-dir opencv-python-headless==4.9.0.80 "numpy<2.0"

# 6. Flash Attention
log_info "Installing Flash Attention..."
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3+cu12torch2.8cxx11abiFALSE-cp310-cp310-linux_x86_64.whl || {
    pip install flash-attn --no-build-isolation
}

# 7. App dependencies
pip install gradio edge-tts pydub imageio imageio-ffmpeg huggingface_hub

# ------------------------------------------------------------------------------
# Step 4: Apply Patches
# ------------------------------------------------------------------------------
log_info "Applying patches on OmniAvatar..."
if [ -f "scripts/add_patch.sh" ]; then
    bash scripts/add_patch.sh
else
    log_error "scripts/add_patch.sh not found! Skipping patches."
fi

# ------------------------------------------------------------------------------
# Step 5: Download Models
# ------------------------------------------------------------------------------
log_info "Generating model download script..."
cat << 'EOF' > download_models.py
import os
from huggingface_hub import snapshot_download

def download_model(repo_id, local_dir):
    if os.path.exists(local_dir):
        return
    try:
        snapshot_download(repo_id=repo_id, local_dir=local_dir,
                          local_dir_use_symlinks=False, resume_download=True)
    except Exception as e:
        print(f"[Error] Failed to download {repo_id}: {e}")

if __name__ == "__main__":
    os.makedirs("pretrained_checkpoints", exist_ok=True)
    download_model("Wan-AI/Wan2.1-T2V-1.3B", "pretrained_checkpoints/Wan2.1-T2V-1.3B")
    download_model("GAIR/LiveTalk-1.3B-V0.1", "pretrained_checkpoints/LiveTalk-1.3B-V0.1")
    download_model("facebook/wav2vec2-base-960h", "pretrained_checkpoints/wav2vec2")
EOF

log_info "Starting model downloads..."
python download_models.py

# ------------------------------------------------------------------------------
# Step 6: Fix Config
# ------------------------------------------------------------------------------
log_info "Fixing configuration file..."
sed -i 's/local_attn_size: 15/local_attn_size: -1/g' "$CONFIG_FILE"

log_info "Generating app.py (512x512, 2s duration)..."
# Use Python to write the file
python << 'PYTHON_SCRIPT_WRITER'
import os

app_code = '''
import os
import sys
import yaml
import argparse
import tempfile
import asyncio
import math
import gradio as gr
import edge_tts
import numpy as np
import cv2
import torch
import imageio
import subprocess

# ------------------------------------------------------------------------------
# Environment Setup
# ------------------------------------------------------------------------------
current_dir = os.path.dirname(os.path.abspath(__file__))
sys.path.append(os.path.join(current_dir, "OmniAvatar"))

try:
    from scripts.inference_example import CausalInferencePipeline
except ImportError as e:
    print(f"Error importing modules: {e}")
    sys.exit(1)
```
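After Step 5 has run, the three checkpoint directories should all be present. A small hedged sketch for verifying this before launching the app; the helper name is mine, but the paths come from the download script above:

```python
import os

# Checkpoint directories created by download_models.py in the script above.
REQUIRED_CHECKPOINTS = [
    "pretrained_checkpoints/Wan2.1-T2V-1.3B",
    "pretrained_checkpoints/LiveTalk-1.3B-V0.1",
    "pretrained_checkpoints/wav2vec2",
]

def missing_checkpoints(paths=REQUIRED_CHECKPOINTS):
    # Returns the subset of expected directories that are absent.
    return [p for p in paths if not os.path.isdir(p)]

print("Missing:", missing_checkpoints())
```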
The generated app.py continues with its configuration block and the main application class:

```python
# ------------------------------------------------------------------------------
# Configuration
# ------------------------------------------------------------------------------
CONFIG_PATH = "configs/causal_inference.yaml"
PORT = 8383
DEFAULT_TEXT = "Hello, I am a digital human assistant."
DEFAULT_VOICE = "zh-CN-XiaoxiaoNeural"
DEFAULT_IMAGE_PATH = "examples/inference/example1.jpg"

# ------------------------------------------------------------------------------
# Helper Functions
# ------------------------------------------------------------------------------
def load_yaml_config(config_path):
    with open(config_path, "r") as f:
        return yaml.safe_load(f)

def get_default_ref_image():
    if os.path.exists(DEFAULT_IMAGE_PATH):
        return DEFAULT_IMAGE_PATH
    return None

async def generate_tts_audio(text, output_path, voice=DEFAULT_VOICE):
    print("[TTS] Generating audio...")
    communicate = edge_tts.Communicate(text, voice)
    await communicate.save(output_path)

# ------------------------------------------------------------------------------
# Main Application Class
# ------------------------------------------------------------------------------
class LiveTalkApp:
    def __init__(self, config_path):
        self.config_path = config_path
        self.args = self._load_args(config_path)
        print("Initializing CausalInferencePipeline...")
        self.device = torch.device("cuda:0")
        # Debug: print args to verify
        print(f"[DEBUG] Args frame_seq_length: {self.args.frame_seq_length}")
        print(f"[DEBUG] Args latent_h: {self.args.latent_h}, latent_w: {self.args.latent_w}")
        print(f"[DEBUG] Max Duration: {self.args.video_duration}s")
        self.pipeline = CausalInferencePipeline.from_pretrained(args=self.args, device=self.device)
        print("Pipeline loaded successfully.")

    def _load_args(self, config_path):
        config = load_yaml_config(config_path)
        args = argparse.Namespace()

        # 1. Model paths (match inference_example.py defaults)
        default_dit_path = "pretrained_checkpoints/LiveTalk-1.3B-V0.1/model.safetensors"
        args.dit_path = config.get("dit_path", default_dit_path)
        args.wav2vec_path = config.get("wav2vec_path", "pretrained_checkpoints/wav2vec2")
        base_wan_path = "pretrained_checkpoints/Wan2.1-T2V-1.3B"
        args.text_encoder_path = config.get("text_encoder_path",
            os.path.join(base_wan_path, "models_t5_umt5-xxl-enc-bf16.pth"))
        args.vae_path = config.get("vae_path", os.path.join(base_wan_path, "Wan2.1_VAE.pth"))

        # 2. Resolution parameters (strict math for 512x512)
        args.latent_h = 64
        args.latent_w = 64
        args.frame_seq_length = args.latent_h * args.latent_w  # 4096
        # Update image size args to be consistent
        max_hw = config.get("max_hw", 720)
        args.max_hw = max_hw
        image_sizes_key = f"image_sizes_{max_hw}"
        setattr(args, image_sizes_key, [(512, 512)])
        print(f"[Config] Resolution: 512x512, Latent: {args.latent_h}x{args.latent_w}, SeqLen: {args.frame_seq_length}")

        # 3. Other parameters
        args.dtype = config.get("dtype", "bf16")
        args.fps = config.get("fps", 16)
        args.sample_rate = config.get("sample_rate", 16000)

        # CRITICAL FIX: Limit duration to 2 seconds to avoid KV Cache overflow.
        # 512x512 resolution creates large sequences.
        # 2s * 16fps = 32 frames (approx. 9 latent groups).
        # Total tokens approx. 36,864 -- fits in the default cache.
        args.video_duration = 2
        args.prompt = config.get("prompt", "A realistic video of a person speaking directly to the camera.")
```
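The numbers in that "CRITICAL FIX" comment can be reproduced with a few lines of arithmetic. A sketch using my own helper names, based on the frame formula app.py applies later (`num_frames = (duration * fps + 4) // 4`) and the 64x64 latent grid:

```python
def latent_frames(duration_s: int, fps: int = 16) -> int:
    # Same formula app.py uses later: num_frames = (duration * fps + 4) // 4
    return (duration_s * fps + 4) // 4

def kv_cache_tokens(duration_s: int, latent_h: int = 64, latent_w: int = 64) -> int:
    # Each latent frame contributes latent_h * latent_w = 4096 tokens.
    return latent_frames(duration_s) * latent_h * latent_w

print(kv_cache_tokens(2))  # 9 latent frames * 4096 tokens = 36864
```

Doubling the duration to 4 seconds would already push the sequence to 69,632 tokens, which is why the script caps generation at 2 seconds rather than trusting the TTS length.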
The remaining arguments and the start of the processing method:

```python
        # KV Cache settings: -1 usually means use default/maximum available
        args.local_attn_size = -1
        args.model_kwargs = config.get("model_kwargs", {})
        args.denoising_step_list = config.get("denoising_step_list", [1000, 750, 500, 250])
        args.warp_denoising_step = config.get("warp_denoising_step", True)
        args.num_transformer_blocks = config.get("num_transformer_blocks", 30)
        args.num_frame_per_block = config.get("num_frame_per_block", 3)
        args.independent_first_frame = config.get("independent_first_frame", False)
        return args

    def process(self, ref_image_input, text_input):
        temp_dir = tempfile.mkdtemp()
        audio_path = os.path.join(temp_dir, "tts_output.wav")
        asyncio.run(generate_tts_audio(text_input, audio_path))

        if ref_image_input is not None:
            image_path = ref_image_input
        else:
            image_path = get_default_ref_image()
            if not image_path:
                raise ValueError("Default reference image not found.")

        output_video_path = os.path.join(temp_dir, "result.mp4")
        self.args.audio_path = audio_path
        self.args.image_path = image_path
        self.args.output_path = output_video_path

        # We keep the forced duration logic just in case TTS generates something
        # longer, but for 512x512 we force max 2 seconds to prevent a crash.
```
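The `_load_args` method above applies the same pattern over and over: read a key from the YAML config, fall back to a default, and attach it to an `argparse.Namespace`. A compact generic sketch of that pattern (the function name is mine, for illustration only):

```python
import argparse

def namespace_from_config(config: dict, defaults: dict) -> argparse.Namespace:
    # For every known key, prefer the YAML value and fall back to the default,
    # mirroring the config.get(...) calls in _load_args.
    args = argparse.Namespace()
    for key, default in defaults.items():
        setattr(args, key, config.get(key, default))
    return args

args = namespace_from_config({"fps": 16}, {"fps": 24, "dtype": "bf16"})
print(args.fps, args.dtype)  # 16 bf16
```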
The inference body, video muxing, and the Gradio entry point complete app.py:

```python
        try:
            from pydub import AudioSegment
            duration = len(AudioSegment.from_wav(audio_path)) / 1000.0
            # Cap duration at 2 seconds for stability
            actual_duration = min(duration, 2.0)
            self.args.video_duration = int(actual_duration)
            print(f"[Inference] Using duration: {self.args.video_duration}s (user audio was {duration}s)")
        except Exception:
            self.args.video_duration = 2

        num_frames = (self.args.video_duration * self.args.fps + 4) // 4
        dtype = torch.bfloat16 if self.args.dtype == "bf16" else torch.float16

        # Generate noise matching latent dimensions (512x512 -> 64x64 latent)
        noise = torch.randn([1, num_frames, 16, self.args.latent_h, self.args.latent_w],
                            device=self.device, dtype=dtype)
        print(f"[Inference] Noise shape: {noise.shape}")

        with torch.no_grad():
            video_tensor = self.pipeline(
                noise=noise,
                text_prompts=self.args.prompt,
                image_path=image_path,
                audio_path=audio_path,
                initial_latent=None,
                return_latents=False
            )

        video_np = (video_tensor.squeeze(0).permute(0, 2, 3, 1).cpu().float().numpy() * 255).astype(np.uint8)
        temp_video_path = os.path.join(temp_dir, "silent.mp4")
        imageio.mimsave(temp_video_path, video_np, fps=self.args.fps, codec="libx264",
                        macro_block_size=None, ffmpeg_params=["-crf", "18", "-preset", "veryfast"])

        cmd = ["ffmpeg", "-y", "-loglevel", "error",
               "-i", temp_video_path, "-i", audio_path,
               "-map", "0:v:0", "-map", "1:a:0",
               "-c:v", "copy", "-c:a", "aac",
               "-shortest", output_video_path]
        subprocess.run(cmd, check=True)
        return output_video_path

# ------------------------------------------------------------------------------
# Main Entry
# ------------------------------------------------------------------------------
if __name__ == "__main__":
    print("Initializing Application...")
    app_instance = LiveTalkApp(CONFIG_PATH)
    demo = gr.Interface(
        fn=app_instance.process,
        inputs=[
            gr.Image(label="Reference Image (Optional)", type="filepath"),
            gr.Textbox(label="Input Text", value=DEFAULT_TEXT, lines=5)
        ],
        outputs=gr.Video(label="Generated Result"),
        title="LiveTalk: Real-time Digital Human (512x512)",
        description="Running at 512x512. Duration limited to 2s for stability."
    )
    print(f"Launching Gradio on port {PORT}...")
    demo.queue().launch(server_name="0.0.0.0", server_port=PORT)
```

The writer heredoc then closes and the script launches the app:

```bash
'''

with open("app.py", "w", encoding="utf-8") as f:
    f.write(app_code)
print("app.py generated successfully.")
PYTHON_SCRIPT_WRITER

# ------------------------------------------------------------------------------
# Step 7: Launch Application
# ------------------------------------------------------------------------------
log_info "Starting Gradio App..."
python app.py --config configs/causal_inference.yaml
```

III. Key Optimizations in the Script

1. Solving the KV Cache overflow

At 512x512 the sequence length grows substantially, and the default video duration overflows the KV Cache. The script addresses this by:

- Forcing the video duration to 2 seconds (`args.video_duration = 2`)
- Truncating longer TTS audio to 2 seconds as well (`actual_duration = min(duration, 2.0)`)
- Setting `local_attn_size` to -1 so the maximum available cache is used

2. Precise resolution control

- Explicitly sets the latent dimensions to 64x64, which corresponds to 512x512 pixels
- Fixes `frame_seq_length = 64 * 64 = 4096` to keep the resolution consistent
- Overrides the `image_sizes` entry from the config file to force 512x512

3. Dependency-conflict fixes

- Uninstalls the conflicting livetalk package and deletes the broken OpenCV files
- Pins exact versions of transformers, opencv-python-headless, and other dependencies
- Prefers the prebuilt flash-attn wheel to speed up installation

IV. Usage

1. Make the script executable: `chmod +x run_livetalk.sh`

2. Run it: `./run_livetalk.sh`

3. Open the web UI. Once the script finishes, it starts a Gradio service on port 8383 by default; visit http://<server-ip>:8383 in a browser.

4. Generate a digital-human video. Upload a reference image (optional; the bundled example image is used by default), enter the text the digital human should speak, click Submit, and wait 2-3 minutes for the 512x512 result.

V. Troubleshooting

1. Model downloads fail

- Check that the machine can reach Hugging Face
- Or download the models manually into pretrained_checkpoints:
  - Wan2.1-T2V-1.3B: https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B
  - LiveTalk-1.3B-V0.1: https://huggingface.co/GAIR/LiveTalk-1.3B-V0.1
  - wav2vec2-base-960h: https://huggingface.co/facebook/wav2vec2-base-960h

2. CUDA out of memory

- Confirm the GPU is a 5090 with sufficient VRAM
- Check for other processes holding VRAM with nvidia-smi
- Restart the script, making sure LiveTalk is the only process running

3. Gradio UI unreachable

- Check that the server firewall allows port 8383
- Confirm `server_name="0.0.0.0"` in the script so external access is allowed
- Check the run logs to confirm the Gradio service started correctly

Summary

The one-click deployment script in this article is heavily tuned for the RTX 5090 at 512x512. Its highlights: it fixes the KV Cache overflow purely through configuration overrides, with no manual source edits; it fully automates dependency installation, model download, and configuration fixes; it provides a visual web UI that lowers the barrier to entry; and it pins resolution and duration precisely for stable generation. With this script you can deploy the LiveTalk digital-human framework quickly and try out high-resolution talking-head video generation. If you run into problems, feel free to discuss them in the comments.
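As a closing illustration of the duration-cap optimization described above, here is a minimal sketch of the truncation logic on its own (the function name is mine; the behavior matches the script's `min(...)`/`int(...)` pair):

```python
def capped_duration_seconds(audio_seconds: float, ceiling: float = 2.0) -> int:
    # min() truncates long TTS audio at the ceiling; int() matches the
    # script's self.args.video_duration = int(actual_duration).
    return int(min(audio_seconds, ceiling))

print(capped_duration_seconds(5.3))  # 2
print(capped_duration_seconds(1.4))  # 1
```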