I took a few photos of nutrition labels and used a prompt to convert them to JSON. Then I ran the test against a bunch of self-hosted models and the Mistral Open API. They all did fairly well, but the Mistral API was the best.

You know, beating benchmarks is one thing. But benchmarks are usually self-reported random variables. Just as passing a driving test is not quite the same as driving, they don't guarantee that a given real-life problem will be solved in a workable way. That's why I wanted to give these models a real test case.

The test is simple. I took a bunch of photos of nutrition labels and asked: how good are vision models at converting them to JSON?

I tried to extract JSON in the following format:

```json
{ "calories": 180, "serving_size": 28.0, "unit": "g" }
```

The prompt I used:

````markdown
The image is the label of a packaged food product. Extract the following information from the nutrition facts label and return ONLY a JSON object with no additional text:

- calories: total calories per serving (integer)
- serving_size: numeric serving size value (float)
- unit: unit of measurement for serving size (string, e.g., g, ml, oz, cup)

Use this exact format:

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "calories": { "type": "integer", "description": "Total calories per serving" },
    "serving_size": { "type": "number", "description": "Numeric serving size" },
    "unit": { "type": "string", "description": "Unit of measurement (e.g., g, ml, oz, cups)" }
  },
  "required": ["calories", "serving_size", "unit"]
}
```

Example:
{"calories":180,"serving_size":28.0,"unit":"g"}

If any value cannot be determined from the label, use null for that field.
````

I ran the same thing 50 times to see whether I got the expected result.

The evaluation spec I used:

```yaml
prompt_path: "…/prompts/calories_and_serving_size.md"
repeat: 50
threshold: 0
cases:
  - id: calories-label-180
    steps:
      - input:
          image_path: "images/IMG_B768CE83-9FEC-461A-BE63-CDDF64EBEB58.jpeg"
          max_tokens: 64
        expectations:
          - type: equals
            value:
              calories: 180
              serving_size: 40.0
              unit: g
  - id: calories-label-harvest-trail-mix-1
    steps:
      - input:
          image_path: "images/IMG_3236.HEIC"
          max_tokens: 64
        expectations:
          - type: oneOf
            values:
              - calories: 220
                serving_size: 50.0
                unit: g
              - calories: 220
                serving_size: 0.5
                unit: cup
              - calories: 220
                serving_size: 50.0
                unit: grams
  - id: calories-label-chunky-supreme-granola-exact
    steps:
      - input:
          image_path: "images/IMG_3228.HEIC"
          max_tokens: 64
        expectations:
          - type: equals
            value:
              calories: 630
              serving_size: 140.0
              unit: g
  - id: calories-label-maple-eh-granola-exact
    steps:
      - input:
          image_path: "images/IMG_3231.HEIC"
          max_tokens: 64
        expectations:
          - type: equals
            value:
              calories: 570
              serving_size: 130.0
              unit: g
  - id: calories-label-harvest-trail-mix-2
    steps:
      - input:
          image_path: "images/IMG_3232.HEIC"
          max_tokens: 64
        expectations:
          - type: oneOf
            values:
              - calories: 220
                serving_size: 50.0
                unit: g
              - calories: 220
                serving_size: 0.5
                unit: cup
              - calories: 220
                serving_size: 50.0
                unit: grams
  - id: calories-label-harvest-trail-mix-3
    steps:
      - input:
          image_path: "images/IMG_3237.HEIC"
          max_tokens: 64
        expectations:
          - type: oneOf
            values:
              - calories: 220
                serving_size: 50.0
                unit: g
              - calories: 220
                serving_size: 0.5
                unit: cup
              - calories: 220
                serving_size: 50.0
                unit: grams
  - id: calories-label-hersheys-christmas-cookies-exact
    steps:
      - input:
          image_path: "images/IMG_3276.HEIC"
          max_tokens: 64
        expectations:
          - type: equals
            value:
              calories: 200
              serving_size: 41.0
              unit: g
  - id: calories-label-mnm-mint-chocolate-candy-1
    steps:
      - input:
          image_path: "images/IMG_3279.HEIC"
          max_tokens: 64
        expectations:
          - type: oneOf
            values:
              - calories: 190
                serving_size: 40.0
                unit: g
              - calories: 220
                serving_size: 0.25
                unit: cup
  - id: calories-label-hersheys-candy-cane-kisses
    steps:
      - input:
          image_path: "images/IMG_3280.HEIC"
          max_tokens: 64
        expectations:
          - type: equals
            value:
              calories: 220
              serving_size: 42.0
              unit: g
  - id: calories-label-milk-chocolate-christmas-balls
    steps:
      - input:
          image_path: "images/IMG_3282.HEIC"
          max_tokens: 64
        expectations:
          - type: oneOf
            values:
              - calories: 200
                serving_size: 40.0
                unit: g
              - calories: 200
                serving_size: 6.0
                unit: pcs
              - calories: 200
                serving_size: 6.0
                unit: pieces
  - id: calories-label-mnm-mint-chocolate-candy-2
    steps:
      - input:
          image_path: "images/IMG_3284.HEIC"
          max_tokens: 64
        expectations:
          - type: oneOf
            values:
              - calories: 190
                serving_size: 40.0
                unit: g
              - calories: 220
                serving_size: 0.25
                unit: cup
  - id: calories-label-jumbo-sour-sukkers
    steps:
      - input:
          image_path: "images/IMG_3286.HEIC"
          max_tokens: 64
        expectations:
          - type: equals
            value:
              calories: 150
              serving_size: 40.0
              unit: g
          - type: oneOf
            values:
              - calories: 150
                serving_size: 40.0
                unit: g
              - calories: 150
                serving_size: 2.0
                unit: pcs
              - calories: 150
                serving_size: 2.0
                unit: pieces
  - id: calories-label-quality-street
    steps:
      - input:
          image_path: "images/IMG_3288.HEIC"
          max_tokens: 64
        expectations:
          - type: oneOf
            values:
              - calories: 200
                serving_size: 42.0
                unit: g
              - calories: 200
                serving_size: 5.0
                unit: pcs
              - calories: 200
                serving_size: 5.0
                unit: pieces
```

Without further ado, here are the results:

![Average model accuracy across test cases](https://i-blog.csdnimg.cn/img_convert/4d6a03e2259cc129d7eae5c3fae24b4c.webp?x-oss-processimage/format,png)

Average model accuracy across test cases

Let's break it down. I'd say Llava failed. I think Llava is an OCR model, meant to be used the way DeepSeek-OCR is. These models don't belong here: they don't follow specific instructions such as "output in JSON format".

The second takeaway is that Qwen3 with 8B parameters works well on a machine with 16GB of VRAM. In a realistic setup, though, it may not offer much over Qwen3:2B, which also runs on smaller machines and would probably be fine even with 6GB of VRAM. Using 8B for a performance gain that is indistinguishable from statistical noise may not be worth it.

Another observation: Qwen3:2B seems to do better than Qwen3:4B. I don't know whether that is a real improvement or just a statistical fluctuation. My current thinking is that if the application only needs a small context window, there is really no point in increasing it. It not only adds cost, it may actually hurt performance.

The last takeaway: when hosting Ollama, turn on streaming, otherwise requests are dropped once they run longer than 60 seconds. I've heard this is configurable, but defaulting to streaming is probably easier, even though it takes some extra code.

---

Original article: [Ollama vision models test - 汇智网](https://www.hubwiz.com/blog/ollama-vision-models-test/)
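The `equals` and `oneOf` expectations in the evaluation spec can be checked with very little code. The sketch below is not the harness used for the results above. It is a minimal illustration with hypothetical helper names (`matches`, `parse_output`, `accuracy`), and it assumes that a run passes only when every listed expectation holds and that output which is not a valid JSON object counts as a failure. The real harness may combine expectations differently.

```python
import json


def matches(expectation, output):
    """Return True if a parsed model output satisfies one expectation."""
    if expectation["type"] == "equals":
        return output == expectation["value"]
    if expectation["type"] == "oneOf":
        return any(output == candidate for candidate in expectation["values"])
    raise ValueError(f"unknown expectation type: {expectation['type']}")


def parse_output(raw):
    """Parse raw model text as a JSON object; return None if that fails."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def accuracy(expectations, raw_outputs):
    """Fraction of runs whose output satisfies every expectation (assumption)."""
    if not raw_outputs:
        return 0.0
    passed = 0
    for raw in raw_outputs:
        output = parse_output(raw)
        if output is not None and all(matches(e, output) for e in expectations):
            passed += 1
    return passed / len(raw_outputs)


if __name__ == "__main__":
    # Expectations copied from one of the trail mix cases; the runs are made up.
    expectations = [{
        "type": "oneOf",
        "values": [
            {"calories": 220, "serving_size": 50.0, "unit": "g"},
            {"calories": 220, "serving_size": 0.5, "unit": "cup"},
        ],
    }]
    runs = [
        '{"calories":220,"serving_size":50.0,"unit":"g"}',
        "Sure! Here is the JSON you asked for.",
    ]
    print(accuracy(expectations, runs))
```

Comparing parsed dictionaries rather than raw strings means whitespace and key order do not matter, and `40` and `40.0` compare equal in Python, so a model that drops the trailing `.0` is not penalized.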
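The streaming advice can be sketched as follows. This is a minimal example and not the code behind the results above. It assumes an Ollama server reachable at the default local address `http://localhost:11434` and uses the `/api/generate` endpoint, which returns newline-delimited JSON chunks when `stream` is true. The function names `collect_stream` and `generate_streaming` are hypothetical, and mapping `max_tokens` to the `num_predict` option is an assumption about how the evaluation spec was wired up.

```python
import base64
import json
import urllib.request


def collect_stream(lines):
    """Join the 'response' pieces of a newline-delimited JSON stream."""
    parts = []
    for line in lines:
        if isinstance(line, bytes):
            line = line.decode("utf-8")
        line = line.strip()
        if not line:
            continue  # skip blank keep-alive lines
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final chunk is marked with done: true
    return "".join(parts)


def generate_streaming(model, prompt, image_path,
                       host="http://localhost:11434", max_tokens=64):
    """Send one image plus prompt to Ollama and read the answer as a stream."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    payload = {
        "model": model,            # any vision-capable model tag you have pulled
        "prompt": prompt,
        "images": [image_b64],     # images are sent base64-encoded
        "stream": True,            # chunks arrive as they are generated
        "options": {"num_predict": max_tokens},  # assumed mapping of max_tokens
    }
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        # The HTTP response object yields one line of bytes per chunk.
        return collect_stream(response)
```

Because chunks arrive continuously, the connection never sits idle for the whole generation, which is what avoids the dropped requests described above. The extra code is mostly the loop that stitches the chunks back into a single string before it is parsed as JSON.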