Advanced tool, billed per page: 2 credits

process_document

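Processes PDF, DOCX, and TXT documents with text extraction, image extraction, and optional OCR. Well suited to parsing academic papers, invoices, and forms, and to multi-format document processing.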

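Use Cases

Document parsing

Extract text and metadata from PDFs, Word documents, and text files

Academic research

Process research papers, theses, and academic publications for analysis

Invoice processing

Extract structured data from invoices, receipts, and financial documents

Form extraction

Parse application forms, surveys, and questionnaires

Legal documents

Extract text from contracts, agreements, and legal filings

Scanned document OCR

Convert scanned images and PDFs into searchable text with OCR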

Endpoint

POST /api/v1/tools/process_document

Auth Required

Free plan: 2 req/s

2 credits

Parameters

Name	Type	Required	Default	Description
source	string	Required	-	Document source: a URL or a file path, depending on sourceType. Example: https://example.com/document.pdf
sourceType	string	Required	-	Source type: "url", "pdf_url", "file", or "pdf_file". Example: pdf_url
options	object	Optional	-	Processing options. Example: {"extractImages": true, "ocrEnabled": false}
options.extractImages	boolean	Optional	false	Whether to extract images from the document. Example: true
options.ocrEnabled	boolean	Optional	false	Enable OCR for scanned documents (adds 2 credits per page). Example: false
options.maxPages	number	Optional	-	Maximum number of pages to process (default: all pages). Example: 10

Credit cost: 2 credits per page, plus 2 additional credits per page when OCR is enabled. A 10-page PDF costs 20 credits (40 credits with OCR).
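
The pricing rule can be checked with a short calculation. This is a minimal sketch: `estimate_credits` is a hypothetical helper, not part of the API, and it assumes that `maxPages` caps the number of pages that are billed.

```python
from typing import Optional

BASE_CREDITS_PER_PAGE = 2  # per-page price stated on this page
OCR_CREDITS_PER_PAGE = 2   # extra per-page price when OCR is enabled


def estimate_credits(page_count: int, ocr_enabled: bool = False,
                     max_pages: Optional[int] = None) -> int:
    """Estimate the credits for one call (hypothetical helper)."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    # Assumption: maxPages limits how many pages are processed and billed.
    billed = page_count if max_pages is None else min(page_count, max_pages)
    per_page = BASE_CREDITS_PER_PAGE + (OCR_CREDITS_PER_PAGE if ocr_enabled else 0)
    return billed * per_page


print(estimate_credits(10))                    # 20
print(estimate_credits(10, ocr_enabled=True))  # 40
```

Both printed values match the 10-page example in the pricing note.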

Request Example

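
A request to this endpoint might be assembled as follows. This is a minimal sketch using only the Python standard library. The host `https://api.example.com` and the `Authorization: Bearer` header are placeholder assumptions; only the path and the body fields come from the parameter table. The request is built but not sent.

```python
import json
import urllib.request

# Assumptions: the host and the Bearer-token header are placeholders;
# only the path and the body fields come from the documentation.
BASE_URL = "https://api.example.com"
ENDPOINT = "/api/v1/tools/process_document"


def build_request(api_key: str, source: str, source_type: str,
                  **options) -> urllib.request.Request:
    """Build (but do not send) a POST request for process_document."""
    body = {"source": source, "sourceType": source_type}
    if options:
        body["options"] = options
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request("YOUR_API_KEY", "https://example.com/document.pdf",
                    "pdf_url", extractImages=True, ocrEnabled=False, maxPages=10)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To send it: urllib.request.urlopen(req), then json.load() the response.
```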

Response Example

200 OK (3450ms)

{
  "success": true,
  "data": {
    "pages": [
      {
        "pageNumber": 1,
        "text": "Introduction\n\nThis research paper explores the applications of machine learning...",
        "wordCount": 523,
        "images": [
          "image_1_base64..."
        ]
      },
      {
        "pageNumber": 2,
        "text": "Methodology\n\nOur approach involves collecting data from multiple sources...",
        "wordCount": 612,
        "images": []
      }
    ],
    "metadata": {
      "title": "Machine Learning Applications in Healthcare",
      "author": "Dr. Jane Smith",
      "creationDate": "2024-01-15",
      "pageCount": 10,
      "fileSize": 2456789,
      "format": "PDF"
    },
    "extractedText": "Introduction\n\nThis research paper explores the applications of machine learning...\n\nMethodology\n\nOur approach involves...",
    "images": [
      "image_1_base64..."
    ],
    "totalPages": 10,
    "processedPages": 10
  },
  "credits_used": 20,
  "credits_remaining": 980,
  "processing_time": 3450
}

Field Descriptions

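data.pages: Array of page objects, each containing that page's text and images

data.metadata: Document metadata (title, author, date, format)

data.extractedText: Text of all pages combined

data.images: Array of extracted images in base64 format (when extractImages is true)

data.totalPages: Total number of pages in the document

credits_used: Credits deducted (2 per page × 10 pages = 20 credits)

processing_time: Total processing time in milliseconds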
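
The fields described here can be read from a parsed response as in the sketch below. The `sample` value is a trimmed copy of the response example with shortened text, and `summarize` is a hypothetical helper, not something the API provides.

```python
import json

# Trimmed copy of the sample response (page text shortened for brevity).
sample = json.loads("""
{
  "success": true,
  "data": {
    "pages": [
      {"pageNumber": 1, "text": "Introduction...", "wordCount": 523,
       "images": ["image_1_base64..."]},
      {"pageNumber": 2, "text": "Methodology...", "wordCount": 612,
       "images": []}
    ],
    "totalPages": 10,
    "processedPages": 10
  },
  "credits_used": 20,
  "processing_time": 3450
}
""")


def summarize(response: dict) -> dict:
    """Collect a few headline numbers from a process_document response."""
    if not response.get("success"):
        raise ValueError("response does not indicate success")
    data = response["data"]
    pages = data.get("pages", [])
    return {
        "pages_returned": len(pages),
        "total_words": sum(p.get("wordCount", 0) for p in pages),
        "pages_with_images": [p["pageNumber"] for p in pages if p.get("images")],
        "credits_per_page": response["credits_used"] / data["processedPages"],
        "seconds": response["processing_time"] / 1000,
    }


print(summarize(sample))
```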

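Error Handling

Unsupported format (400 Bad Request)

The document format is not supported. Supported formats: PDF, DOCX, TXT.

File too large (413 Payload Too Large)

The document exceeds the 50MB maximum file size. Split large documents into smaller files.

Corrupted document (422 Unprocessable Entity)

The document is corrupted or password-protected. Make sure the file is valid and unencrypted.

Insufficient credits (402 Payment Required)

Your account does not have enough credits to process this document ({pageCount} × 2 credits required). Purchase more credits.

Rate limit exceeded (429 Too Many Requests)

You have exceeded your plan's rate limit. Wait a moment, or upgrade your plan for a higher limit.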
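
A client could map these status codes to follow-up actions as sketched below. The codes and their meanings come from the list above. The retry policy (exponential backoff with jitter, applied to 429 only) is an assumption and not documented behaviour of this API.

```python
import random

# Status codes and meanings are taken from the error list;
# the retry policy itself is an assumption.
ERROR_ACTIONS = {
    400: "unsupported format: convert to PDF, DOCX, or TXT",
    402: "insufficient credits: buy credits or lower maxPages",
    413: "file too large: split the document below 50MB",
    422: "corrupt or password-protected: supply a valid, unencrypted file",
    429: "rate limited: wait and retry",
}
RETRYABLE = {429}


def next_action(status: int) -> str:
    """Return a human-readable action for a documented error status."""
    return ERROR_ACTIONS.get(status, f"unexpected status {status}")


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter, in seconds."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


for status in (429, 413):
    print(status, next_action(status), "| retry:", status in RETRYABLE)
```

Only 429 is treated as retryable here because the other errors describe problems with the document or the account that a repeated request would not fix.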

Pro tip: When processing large documents, use the maxPages parameter to limit credit usage. If you only need specific sections, process them in batches.
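
One way to apply this tip is to derive `maxPages` from a credit budget. This is a minimal sketch; `max_pages_for_budget` is a hypothetical helper. The parameter table lists no page-offset option, so the sketch only caps the page count and does not select a page range.

```python
def max_pages_for_budget(credit_budget: int, ocr_enabled: bool = False) -> int:
    """Largest maxPages value whose cost stays within credit_budget."""
    per_page = 4 if ocr_enabled else 2  # per-page prices stated on this page
    return max(credit_budget, 0) // per_page


options = {"ocrEnabled": True,
           "maxPages": max_pages_for_budget(100, ocr_enabled=True)}
print(options)  # {'ocrEnabled': True, 'maxPages': 25}
```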

Credit Cost

2 credits

2 credits per page (4 credits with OCR)

Each processed page costs 2 credits. Enabling OCR adds 2 credits per page.

Example: a 10-page PDF = 20 credits (40 credits with OCR)

Free plan: 1,000 one-time trial credits = 500 pages (250 pages with OCR)

Hobby plan: 5,000 credits per month = 2,500 pages ($19/mo)

Professional plan: 50,000 credits per month = 25,000 pages ($99/mo)

Related Tools

summarize_content

Summarize the extracted document text (4 credits)

extract_text

Extract clean text from HTML documents (1 credit)

Ready to try process_document? Sign up for free and get 1,000 credits to start building.
