

张建站

2026/4/21 13:05:35

10 min read

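If you have used a local client such as Cherry Studio or ima, you know how easy it is to import your own domain documents and ask questions about them. Producing a long, in-depth report with those tools is another matter. Likewise, if you have tried the deep research features on websites in China or abroad, you know they can turn out a decent report on a general topic, but they give you nowhere to start when the report has to draw on your own specialist field.

The good news: the NVIDIA DLI online course "Build a Deep Research Agent" lets us build a Deep Research agent ourselves, complete with RAG, human intervention, and a reflection step. Let's go straight to what the finished product looks like.

The finished product

It is an online web application. Pick a model to get started.

Step 1: Choose a dataset. In effect you are choosing the local knowledge base that the research process will query. The lab downloads two datasets, economics and biomedical, and custom datasets are also supported.

Step 2: Choose whether to use web search.

Step 3: After deep thinking, the agent settles on an execution plan, and we can suggest changes through conversation.

[Figure: plan]

If the plan looks good, click Execute Plan and the agent goes to work for us. The execution flow is then displayed step by step: it first uses RAG to retrieve content from the knowledge base and checks whether that content is relevant, then searches the web, consolidates everything it has collected into a first summary, reflects on the summary to check for anything that fell short, outputs the final report, and finally updates the summary. Once everything has finished, we can still revise the report through conversation or ask questions based on it.

If you are tempted by now but worry that your coding skills are not up to it, don't be: this lab does not require writing a single line of code. Apart from clicking Run all the way through the notebooks, all we have to do is fill in two API keys. There are four notebooks in total; just work through them in order.

The lab

The background explanations and code comments in the notebooks are very detailed, which makes them excellent study material. Below is only a brief summary of each notebook.

Notebook 1: Prerequisites

Get an NGC API key, used to access NVIDIA-hosted models and container images.
Get a Tavily API key, used for the web search tool calls.

Notebook 2: Deploy NIM (deploying the inference microservice)

Nemotron is NVIDIA's family of models. This notebook teaches how to call NVIDIA-served Nemotron models through the OpenAI interface, how to use the deep-thinking models, how the results compare with deep thinking switched on and off, and how to call a model in stream mode. It closes by showing how to deploy an inference service locally with NVIDIA NIM. If you only want to do deep research, you can skip the local deployment part.

Notebook 3: Deploy RAG (deploying the RAG pipeline)

This part introduces the RAG architecture, covering document ingestion and retrieval. It uses NVIDIA NeMo Retriever and PaddleOCR to handle multimodal data: text, charts, and tables.

Ingestor Server: responsible for "storing", i.e. loading knowledge into the database.
Vector DB: responsible for "keeping", i.e. holding the knowledge reliably and serving fast search.
RAG Server: responsible for "using", i.e. querying the database based on the user's question and generating an answer.

The full RAG feature set relies on many models, which handle OCR, locating text and charts, recognizing table structure, and so on.

The hands-on exercise in this chapter first downloads The_White_Lotus_Season_3 from Wikipedia, then asks who Nicholas Duvernay plays in it, and compares the results for different values of two parameters: use_knowledge_base: False/True and enable_reranker: False/True.

Notebook 4: Deploy Researcher (deploying the research agent)

This chapter brings the first three together. It uses the NVIDIA AI Research Assistant Blueprint to integrate web search (Tavily), a private RAG store, and reasoning models into a complete research workflow. Running the notebook straight through starts 10 Docker containers in total, frontend and backend.

Afterwards, copy the URL of the page the notebook is running on, keep everything up to and including southcentralus.cloudapp.azure.com, and append port :3000. That opens the application shown at the start of this article. The address looks like this: http://dli-efd94ec9e8b34b85b048fabeafd054e0-97c5a9.southcentralus.cloudapp.azure.com:3000 (the random string in the middle will differ from mine).

It is best to run all four notebooks in a single session, because Notebook 4 depends on the RAG service from Notebook 3.

Some basics for complete beginners:

git clone downloads code from GitHub.
A Docker image packages code together with its runtime environment.
A Dockerfile is the instruction sheet for building a Docker image.
docker run starts the code of a single module.
docker compose starts several cooperating modules and assembles them into a team.
docker-compose.yaml is the instruction sheet covering each team member and how the team is put together.

Reading the source

The notebooks do not contain the implementation of the workflow, so I dug through the official code repository to write what follows.

The agent is implemented with LangGraph and has two parts.

The first part prepares the questions: it breaks the research down into several directions. This is what the UI shows as the Plan.

The second part executes the deep research. It defines four nodes (web search, source summarization, reflection, and final summary) and connects them in sequence to form a graph.

The result of a run looks like this: [image]

The code for the final report editing and report-based chat is here: [image]

On state management: the backend endpoints are stateless, AIRAState defines the state, and the frontend is responsible for passing it along. Unfortunately the frontend code is not open source.

A look at the prompts

The file aiq-research-assistant/aira/src/aiq_aira/prompts.py in the source contains the prompts, which are worth studying. A single meta_prompt defines the requirements common to all stages, and the later summarization, reflection, and report prompts are layered on top of it. To make the output read more like a report, the prompt tells the model to avoid bullet lists unless absolutely necessary. Ask an AI a question and it will usually answer with a 1-2-3-4 list in which each item is a sentence or two, which does not suit a rigorous long-form report. The rest of the prompt sets requirements for the research approach, style, and language of the report.

meta_prompt = """
You are working with a team of research experts to deliver a publication-ready long-form report that can stand alone as an excellent comprehensive reference on the provided topic. Below is the goal of the team.

### Guidelines
- Introduction - Begin with an engaging, context-rich introduction that frames the central questions, scope, and intellectual journey ahead. Hook the reader.
- Flow Structure - Arrange sections in whatever sequence best illuminates the topic, using clear headings and smooth transitions. Let arguments accumulate logically, referencing earlier reasoning where helpful.
- Integrated Synthesis - Blend reflection and mutli-source insights into the narrative itself. Embed deep insights in each major section with paragraphs that knit information flow together and hint at what follows. **Avoid explicit standalone Takeaways/Insights etc. subsections.**
- Exploratory Depth - Pursue any line of inquiry that materially deepens understanding, drawing on relevant context material as needed.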
  Use reflection rounds to further sharpen understanding.
- Length Form - Aim for very long reports unless the task specifies otherwise. Write in multiple coherent paragraphs in each section/subsection. Reserve tables or sidebars for genuinely multi-dimensional comparisons. **Avoid bullet lists unless absolutely necessary.**

### In-depth and detailed analysis
- Move from surface-level observations to underlying mechanisms and their broader implications.
- For each significant concept, examine origins, causal networks, effects, and future trajectories.
- Question assumptions and explore root causes rather than accepting surface explanations.
- Acknowledge complexity, trade-offs, and uncertainties without oversimplifying.
- Ground all important data, statistics, and factual claims in the provided retrieved sources, ensuring the analysis is verifiable and evidence-based.
- Weave multi-layered deep insights naturally into the narrative flow.

### Style and tone
- Write for an intelligent, curious reader without presuming specialised knowledge.
- Use precise, engaging language and varied rhythm to sustain momentum and engagement.
- Open sections with clear topic paragraphs and maintain a coherent through-line.
- Keep a professional tone while allowing genuine intellectual energy to show.
- Your goal is not just to inform but to provide deep understanding.

### Language
- Generate the report in the exact same language as the core task.
- If the prompt is in Chinese → write the entire report in Chinese.
- If the prompt is in English → write the entire report in English.
- Maintain consistent language throughout the report.

Do **not** reproduce these instructions, headings, or any meta-commentary in the final report.

Your role within the team is:
"""

query_writer_instructions = meta_prompt + """
You are the search-query architect for a deep-research agent that produces comprehensive, long-form reports.
Generate {number_of_queries} search queries that will help with planning the sections of the final report.

# Report topic
{topic}

# Report should address the following questions:
{report_organization}

# Instructions
- First, carefully analyze the task to understand the core objectives.
- Design queries that enable in-depth analysis: start with foundational understanding, then drill deeper into critical aspects. Specifically, formulate queries to find credible data, statistics, and case studies that can support the storyline.
- Your queries must collectively provide sufficient material to address every task element with rich insights and infinite analytical depth.
- Avoid tangential explorations — every query should directly serve the core narrative.
- Target material that reveals the why and how, not merely the what. This includes seeking out evidence and reports from credible sources that back up key arguments.
- Format your response as a JSON object with the following keys:
  - query: The actual search query string
  - report_section: The section of report the query is generated for
  - rationale: Brief explanation of why this query is relevant to this report section

**Output example**
[
    {{
        "query": "What is a transformer?",
        "report_section": "Introduction",
        "rationale": "Introduces the user to transformer"
    }},
    {{
        "query": "machine learning transformer architecture explained",
        "report_section": "technical architecture",
        "rationale": "Understanding the fundamental structure of transformer models"
    }}
]
"""

summarizer_instructions = meta_prompt + """
Based on all the research conducted, create a comprehensive, well-structured report to fully address the overall research question:
{report_organization}

CRITICAL: Make sure the answer is written in the same language as the human messages! For example, if the user's messages are in English, then MAKE SURE you write your response in English. If the user's messages are in Chinese, then MAKE SURE you write your entire response in Chinese. This is critical.
The user will only understand the answer if it is written in the same language as their input message.

Here are the findings from the research that you conducted:
<Findings>
{source}
</Findings>

Please create a detailed answer to the overall research question that:
1. Is well-organized with proper headings (# for title, ## for sections, ### for subsections)
2. Includes specific facts and insights from the research
3. Provides a balanced, thorough analysis. Be as comprehensive as possible, and include all information that is relevant to the overall research question. People are using you for deep research and will expect detailed, comprehensive answers.
4. Do not include any source citations, as these will be added to the report in post processing.

REMEMBER: Section is a VERY fluid and loose concept. You can structure your report however you think is best! Make sure that your sections are cohesive, and make sense for the reader.

For each section of the report, do the following:
- Use simple, clear language
- Use ## for section title (Markdown format) for each section of the report
- Do NOT ever refer to yourself as the writer of the report. This should be a professional report without any self-referential language.
- Do not say what you are doing in the report. Just write the report without any commentary from yourself.
- Each section should be as long as necessary to deeply answer the question with the information you have gathered. It is expected that sections will be fairly long and verbose. You are writing a deep research report, and users will expect a thorough answer.
- Use bullet points to list out information when appropriate, but by default, write in paragraph form.
- Again, do not include any source citations, as these will be added to the report in post processing.

REMEMBER: The brief and research may be in English, but you need to translate this information to the right language when writing the final answer.
Make sure the final answer report is in the SAME language as the human question.
"""

report_extender = meta_prompt + """
Based on the current report draft below and the new sources you just discovered, you need to incorporate these additional sources into the current draft report. The new report should be a comprehensive, well-structured report to fully address the overall research question:
{report_organization}

CRITICAL: Make sure the answer is written in the same language as the human messages! For example, if the user's messages are in English, then MAKE SURE you write your response in English. If the user's messages are in Chinese, then MAKE SURE you write your entire response in Chinese. This is critical. The user will only understand the answer if it is written in the same language as their input message.

<REPORT DRAFT>
{report}
</REPORT DRAFT>

<NEW SOURCES>
{source}
</NEW SOURCES>

# Instructions
1. Preserve the draft report structure (same title, sections, headings etc)
2. Seamlessly use information from the new sources to enhance the draft report's argument, insights, and analysis.
3. Although you can quote new sources directly where appropriate, you should focus on generating additional insight and analysis from the new sources to provide a rich and comprehensive report.
4. Do not include any source citations, as these will be added to the report in post processing.

The new report should be a detailed answer to the overall research question that:
1. Is well-organized with proper headings (# for title, ## for sections, ### for subsections)
2. Includes specific facts and insights from the research
3. Provides a balanced, thorough analysis. Be as comprehensive as possible, and include all information that is relevant to the overall research question. People are using you for deep research and will expect detailed, comprehensive answers.
4. Does not include any source citations, as these will be added to the report in post processing.
REMEMBER: Section is a VERY fluid and loose concept. You can structure your report however you think is best! Make sure that your sections are cohesive, and make sense for the reader.

Each section of the report should obey the following rules:
- Use simple, clear language
- Use ## for section title (Markdown format) for each section of the report
- Do NOT ever refer to yourself as the writer of the report. This should be a professional report without any self-referential language.
- Do not say what you are doing in the report. Just write the report without any commentary from yourself.
- Each section should be as long as necessary to deeply answer the question with the information you have gathered. It is expected that sections will be fairly long and verbose. You are writing a deep research report, and users will expect a thorough answer.
- Use bullet points to list out information when appropriate, but by default, write in paragraph form.
- Again, do not include any source citations, as these will be added to the report in post processing.

REMEMBER: The brief and research may be in English, but you need to translate this information to the right language when writing the final answer. Make sure the final answer report is in the SAME language as the human question.
"""

reflection_instructions = meta_prompt + """
Using report topic and questions as a guide, identify knowledge gaps and/or areas that have not been addressed comprehensively in the report draft.

# Report topic
{topic}

# Report should address the following questions:
{report_organization}

# Draft Report
{report}

# Instructions
1. Focus on details that are necessary to understanding the key concepts as a whole that have not been fully covered
2. Ensure the follow-up question is self-contained and includes necessary context for web search.
3. Format your response as a JSON object with the following keys:
  - query: Write a specific follow up question to address this gap
  - report_section: The section of report the query is generated for
  - rationale: Describe what information is missing or needs clarification

**Output example**
{{
    "query": "What are typical performance benchmarks and metrics used to evaluate [specific technology]?",
    "report_section": "Deep dive",
    "rationale": "The report lacks information about performance metrics and benchmarks"
}}
"""

relevancy_checker = """
Determine if the Context contains proper information to answer the Question.

# Question
{query}

# Context
{document}

# Instructions
1. Give a binary score yes or no to indicate whether the context is able to answer the question.

**Output example**
{{
    "score": "yes"
}}
"""

finalize_report = meta_prompt + """
Given the report draft below, format a final report to best achieve the report goal. Do not add a sources section, sources are added in post processing. You should use proper markdown syntax when appropriate, as the text you generate will be rendered in markdown. Do NOT wrap the report in markdown blocks (e.g triple backticks). Return only the final report without any other commentary or justification.

Based on the report draft below, create a comprehensive, well-structured report to fully address the overall research question:

<REPORT GOAL>
The report should address the following questions:
{report_organization}
</REPORT GOAL>

CRITICAL: Make sure the answer is written in the same language as the human messages! For example, if the user's messages are in English, then MAKE SURE you write your response in English. If the user's messages are in Chinese, then MAKE SURE you write your entire response in Chinese. This is critical. The user will only understand the answer if it is written in the same language as their input message.

Here is the report draft:
<REPORT DRAFT>
{report}
</REPORT DRAFT>

Please create a detailed answer to the overall research question that:
1. Is well-organized with proper headings (# for title, ## for sections, ### for subsections)
2. Includes specific facts and insights from the research
3. Provides a balanced, thorough analysis. Be as comprehensive as possible, and include all information that is relevant to the overall research question. People are using you for deep research and will expect detailed, comprehensive answers.
4. Do not include any source citations, as these will be added to the report in post processing.

REMEMBER: Section is a VERY fluid and loose concept. You can structure your report however you think is best! Make sure that your sections are cohesive, and make sense for the reader.

For each section of the report, do the following:
- Use simple, clear language
- Use ## for section title (Markdown format) for each section of the report
- Do NOT ever refer to yourself as the writer of the report. This should be a professional report without any self-referential language.
- Do not say what you are doing in the report. Just write the report without any commentary from yourself.
- Each section should be as long as necessary to deeply answer the question with the information you have gathered. It is expected that sections will be fairly long and verbose. You are writing a deep research report, and users will expect a thorough answer.
- Use bullet points to list out information when appropriate, but by default, write in paragraph form.
- Again, do not include any source citations, as these will be added to the report in post processing.

REMEMBER: The brief and research may be in English, but you need to translate this information to the right language when writing the final answer. Make sure the final answer report is in the SAME language as the human question.
"""
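The layering of the prompts quoted above, a shared meta_prompt with stage-specific instructions appended and placeholders filled in later, can be sketched in a few lines. This is only an illustration under assumptions: the two template strings are shortened, hypothetical stand-ins, build_prompt is a helper name invented here, and the use of str.format is inferred from the doubled braces in the prompts, not confirmed from the repository.

```python
import json

# Hypothetical, shortened stand-in for the shared prefix.
META_PROMPT = (
    "You are working with a team of research experts.\n"
    "Your role within the team is:\n"
)

# Stage-specific text is appended to the shared prefix.
# Doubled braces survive str.format as literal JSON braces.
REFLECTION_INSTRUCTIONS = META_PROMPT + """
Identify knowledge gaps in the report draft.

# Report topic
{topic}

# Draft Report
{report}

**Output example**
{{
    "query": "What benchmarks are used to evaluate [specific technology]?",
    "report_section": "Deep dive",
    "rationale": "The report lacks information about benchmarks"
}}
"""


def build_prompt(template: str, **fields: str) -> str:
    """Fill the single-brace placeholders of a prompt template."""
    return template.format(**fields)


prompt = build_prompt(
    REFLECTION_INSTRUCTIONS, topic="GPU pricing", report="(draft text)"
)

# After formatting, the output example is plain JSON again.
example = json.loads(prompt[prompt.index("{"): prompt.rindex("}") + 1])
print(example["report_section"])  # Deep dive
```

Keeping the common requirements in one string means a change to tone or language rules only has to be made once, while each stage still gets its own placeholders.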
A few thoughts

An earlier article, "Getting Started with LangChain 1.0: Agent Creation and Context Engineering Are Much Simpler", described how LangChain 1.0 ships built-in middleware that can handle logic such as human intervention. That is the exact opposite of the design used here, where the backend is stateless and the frontend maintains the state.

Here is how I would choose between the two routes.

If the agent's interaction logic is fairly simple and stable and will not change drastically or often, let the frontend maintain the state. This is flexible, and frontend and backend are easy to coordinate.

If the agent's interaction logic is complex, spans multiple steps, or needs to change frequently, choose middleware. Leave all the complicated changes for the backend to handle and keep the frontend simple.
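For readers who want to see the "stateless backend, state carried by the caller" route in miniature, here is a rough sketch. Everything in it is an assumption made for illustration: ResearchState is a hypothetical stand-in for AIRAState (the real fields may differ), the four step functions are placeholders for the web search, summarization, reflection, and final summary nodes, and plain functions are used in place of LangGraph.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class ResearchState:
    """Hypothetical stand-in for AIRAState; real fields may differ."""
    topic: str
    sources: tuple[str, ...] = ()
    summary: str = ""
    gaps: tuple[str, ...] = ()
    report: str = ""


def web_research(state: ResearchState) -> ResearchState:
    # Placeholder for a real search call.
    return replace(state, sources=state.sources + (f"note about {state.topic}",))


def summarize(state: ResearchState) -> ResearchState:
    return replace(state, summary=" | ".join(state.sources))


def reflect(state: ResearchState) -> ResearchState:
    # Toy gap check: ask for more material when only one source exists.
    gaps = () if len(state.sources) > 1 else ("needs a second source",)
    return replace(state, gaps=gaps)


def finalize(state: ResearchState) -> ResearchState:
    return replace(state, report=f"# {state.topic}\n\n{state.summary}")


STEPS: dict[str, Callable[[ResearchState], ResearchState]] = {
    "web_research": web_research,
    "summarize": summarize,
    "reflect": reflect,
    "finalize": finalize,
}
PIPELINE = ("web_research", "summarize", "reflect", "finalize")


def handle_request(step: str, state: ResearchState) -> ResearchState:
    """Stateless 'backend': everything it needs arrives with the call."""
    return STEPS[step](state)


# The 'frontend' owns the state and sends it back on every call.
state = ResearchState(topic="GPU pricing")
for step in PIPELINE:
    state = handle_request(step, state)

print(state.report)
```

Because handle_request keeps nothing between calls, the caller can replay, edit, or discard a state freely, which is the flexibility credited above to the frontend-held route. The cost is that every change to the flow has to be mirrored in the caller.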