BetaPDF Developer API

REST API to convert PDFs and images (jpg/png/webp) into Markdown + structured JSON, optimized for ChatGPT, Claude, and RAG pipelines.


Quickstart

Get a key, then send a PDF or photo and get Markdown back with a single request.

# Accepts PDF (≤10 pages) or image (jpg/png/webp, ≤20MB)
curl -X POST https://betapdf.com/api/v1/parse \
  -H "Authorization: Bearer beta_live_xxxxxxxxxxxxxxxxxxxxxxxx" \
  -F "file=@input.pdf"
# Or:  -F "file=@photo.jpg"

Try the API live

Upload a PDF and see bbox overlays in the interactive playground (Pro or Business plan, signed-in users only).


Authentication

Every request must include a Bearer token, except GET /v1/categories. API access is available on Pro ($9.99/mo, 1,000 pages) and Business ($29.99/mo, 5,000 pages); the Free tier cannot mint or use API keys. Get your key at /account/api-keys after upgrading. The plaintext key is shown only once, at creation, so store it like a password.

Authorization: Bearer beta_live_xxxxxxxxxxxxxxxxxxxxxxxx
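To keep the key out of source code, a small helper can build this header from an environment variable. This is a minimal sketch; the variable name `BETAPDF_API_KEY` and the helper `auth_headers` are hypothetical, not part of the API.

```python
from __future__ import annotations

import os


def auth_headers(api_key: str | None = None) -> dict[str, str]:
    """Return the Authorization header for BetaPDF API calls.

    Falls back to the BETAPDF_API_KEY environment variable (a hypothetical
    name; use whatever your secret store provides).
    """
    key = api_key or os.environ.get("BETAPDF_API_KEY")
    if not key:
        raise RuntimeError("no API key: pass one or set BETAPDF_API_KEY")
    return {"Authorization": f"Bearer {key}"}
```

For example, `httpx.Client(headers=auth_headers())` attaches the header to every request made through that client.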

Endpoints

POST /v1/parse — Synchronous

Use for PDFs of up to 10 pages or a single image (jpg/png/webp). The call blocks until processing finishes (typically 7–30 s) and returns the full result inline; a request that runs past 60 s returns 408 SYNC_TIMEOUT. For larger PDFs, use the async endpoint.

# Accepts PDF (≤10 pages) or image (jpg/png/webp, ≤20MB)
curl -X POST https://betapdf.com/api/v1/parse \
  -H "Authorization: Bearer beta_live_xxxxxxxxxxxxxxxxxxxxxxxx" \
  -F "file=@input.pdf"
# Or:  -F "file=@photo.jpg"
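Which parse endpoint to call follows directly from the limits above. The sketch below encodes them in a hypothetical helper, `choose_parse_endpoint`; it assumes you already know the PDF's page count (for example from a local PDF library) and treats `.jpeg` like `.jpg`, which this page does not state explicitly.

```python
from pathlib import Path

BASE = "https://betapdf.com/api/v1"
# .jpeg is an assumption; the documented formats are jpg/png/webp.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def choose_parse_endpoint(filename: str, page_count: int = 1) -> str:
    """Return the parse URL that fits this file, or raise if it is over a cap."""
    suffix = Path(filename).suffix.lower()
    if suffix in IMAGE_SUFFIXES:
        return f"{BASE}/parse"        # one image = one page, sync is faster
    if suffix != ".pdf":
        raise ValueError(f"unsupported format: {suffix}")
    if page_count <= 10:
        return f"{BASE}/parse"        # synchronous, full result inline
    if page_count <= 50:
        return f"{BASE}/parse/jobs"   # asynchronous, returns 202 + job_id
    raise ValueError("over the 50-page hard cap; split the PDF first")
```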

POST /v1/parse/jobs — Asynchronous

Returns 202 and a job_id immediately. Use for PDFs of up to 50 pages (images work here too, but the sync endpoint is faster for them). Poll GET /v1/parse/jobs/{id} until status is completed (or failed), then fetch the result with GET /v1/parse/jobs/{id}/result.

# Accepts PDF (≤50 pages) or image (jpg/png/webp, ≤20MB)
curl -X POST https://betapdf.com/api/v1/parse/jobs \
  -H "Authorization: Bearer beta_live_xxxxxxxxxxxxxxxxxxxxxxxx" \
  -F "file=@input.pdf"

curl https://betapdf.com/api/v1/parse/jobs/JOB_ID \
  -H "Authorization: Bearer beta_live_xxxxxxxxxxxxxxxxxxxxxxxx"

curl https://betapdf.com/api/v1/parse/jobs/JOB_ID/result \
  -H "Authorization: Bearer beta_live_xxxxxxxxxxxxxxxxxxxxxxxx"
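The poll step can be wrapped in a loop with a deadline so a stuck job does not block forever. A minimal sketch, assuming the status payload carries a `status` field as described above; `poll_job`, the 3-second interval, and the 10-minute timeout are illustrative choices, not API requirements. `fetch_status`, `sleep`, and `clock` are injected so the loop can be tested without network access.

```python
from __future__ import annotations

import time
from typing import Callable


def poll_job(
    fetch_status: Callable[[], dict],
    terminal: tuple[str, ...] = ("completed", "failed"),
    interval: float = 3.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch_status() until its "status" is terminal or the deadline passes."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status.get("status") in terminal:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status.get('status')!r} after {timeout}s")
        sleep(interval)  # ~20 polls/min, under the 30 req/min per-key limit
```

With an `httpx.Client` created with `base_url="https://betapdf.com/api/v1"` and the auth header, a call might look like `poll_job(lambda: client.get(f"/parse/jobs/{job_id}").json())`. Classify jobs report `done` rather than `completed`, so pass `terminal=("done", "failed")` for those.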

GET /v1/usage

Check your current month's page count and remaining quota.

curl https://betapdf.com/api/v1/usage \
  -H "Authorization: Bearer beta_live_xxxxxxxxxxxxxxxxxxxxxxxx"
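Because PDFs count per page and each image counts as one page, a batch can be checked against the remaining quota before uploading. The field names in the `/v1/usage` response are not shown on this page, so this hypothetical sketch takes the remaining page count as a plain integer you extract yourself.

```python
from __future__ import annotations

from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def pages_charged(filename: str, pdf_page_count: int = 0) -> int:
    """Pages one parse call counts against the monthly quota."""
    if Path(filename).suffix.lower() in IMAGE_SUFFIXES:
        return 1  # images (jpg/png/webp) count as 1 page
    return pdf_page_count


def fits_quota(remaining_pages: int, files: list[tuple[str, int]]) -> bool:
    """True if the whole batch fits; caps are hard, so don't start one that won't."""
    return sum(pages_charged(name, n) for name, n in files) <= remaining_pages
```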

POST /v1/classify — Vietnamese Category Classifier

Classify Vietnamese product titles into a category_id (46 leaf categories). JSON in/out — no file upload. Up to 256 titles per request. Each result carries a confidence and a level: leaf (high confidence), root (medium — backs off to the parent category), or null (low — category_id is null). Metered at 1 credit per 10,000 titles.

# Classify Vietnamese product titles → category_id (46 categories)
curl -X POST https://betapdf.com/api/v1/classify \
  -H "Authorization: Bearer beta_live_xxxxxxxxxxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{"titles":["Nước hoa nữ Chanel Coco 50ml","Tã dán Bobby M 64 miếng"]}'

{
  "results": [
    { "title": "Nước hoa nữ Chanel Coco 50ml",
      "slug": "nuoc-hoa", "category_id": "222",
      "confidence": 0.96, "level": "leaf" },
    { "title": "Tã dán Bobby M 64 miếng",
      "slug": "ta-bim", "category_id": "226",
      "confidence": 1.00, "level": "leaf" }
  ]
}
// level: leaf (high confidence) | root (medium, backs off to the parent category) | null (low, category_id=null)
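Since `/v1/classify` accepts at most 256 titles per call, a longer list has to be split. The sketch below uses only the Python standard library; `batched` and `classify_all` are hypothetical helpers, and it assumes results come back in the same order as the submitted titles (as in the example, where each result also echoes its `title`). For thousands of titles, POST /v1/classify/jobs is the better fit, since sequential calls also count toward the 30 requests/minute limit.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Iterator

MAX_TITLES = 256  # per-request cap for POST /v1/classify


def batched(titles: list[str], size: int = MAX_TITLES) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` titles."""
    for i in range(0, len(titles), size):
        yield titles[i : i + size]


def classify_all(titles: list[str], api_key: str) -> list[dict]:
    """Classify any number of titles with sequential sync calls, in order."""
    results: list[dict] = []
    for batch in batched(titles):
        req = urllib.request.Request(
            "https://betapdf.com/api/v1/classify",
            data=json.dumps({"titles": batch}).encode("utf-8"),
            headers={
                "Authorization": f"Bearer {api_key}",
                "Content-Type": "application/json",
            },
            method="POST",
        )
        # urlopen raises HTTPError on 4xx/5xx (e.g. 413 BATCH_TOO_LARGE)
        with urllib.request.urlopen(req, timeout=60) as resp:
            results.extend(json.load(resp)["results"])
    return results
```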

POST /v1/classify/jobs — Async (large batches)

Submit up to 100,000 titles in one job; the call returns 202 and a job_id immediately. Poll GET /v1/classify/jobs/{id} for {status, total, done}, and fetch GET /v1/classify/jobs/{id}/result once status=done. Jobs are kept for 6 hours. Use this instead of /v1/classify when you have thousands of titles (e.g. a full catalog backfill).

# Large batch (up to 100,000 titles per job) → returns 202 + job_id
curl -X POST https://betapdf.com/api/v1/classify/jobs \
  -H "Authorization: Bearer beta_live_xxxxxxxxxxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{"titles":["Nước hoa nữ Chanel Coco 50ml", "... (thousands of titles)"]}'

curl https://betapdf.com/api/v1/classify/jobs/JOB_ID \
  -H "Authorization: Bearer beta_live_xxxxxxxxxxxxxxxxxxxxxxxx"
# → {"status":"processing"|"done"|"failed", "total":50000, "done":32768}

curl https://betapdf.com/api/v1/classify/jobs/JOB_ID/result \
  -H "Authorization: Bearer beta_live_xxxxxxxxxxxxxxxxxxxxxxxx"
# → {"results":[{title,slug,category_id,confidence,level}, ...]}   (409 if the job is not done yet)
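The `{status, total, done}` payload is enough to drive a progress display and to decide when `/result` is safe to call. A minimal sketch with hypothetical helper names; it assumes `total` and `done` are integers, as in the example response.

```python
def describe_job(status: dict) -> str:
    """Render a GET /v1/classify/jobs/{id} payload as one progress line."""
    total = status.get("total") or 0
    done = status.get("done") or 0
    pct = 100.0 * done / total if total else 0.0
    return f"{status['status']}: {done}/{total} titles ({pct:.1f}%)"


def result_ready(status: dict) -> bool:
    """Fetch /result only once status is "done"; earlier calls return 409."""
    return status["status"] == "done"
```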

GET /v1/categories

Returns the full taxonomy the classifier can output — 46 leaf categories + 12 roots, each with slug and category_id. No key required.

# Full taxonomy (46 leaves + 12 roots); no key required
curl https://betapdf.com/api/v1/categories
# → { "leaves": [{slug, category_id, root_id, name}, ...], "roots": [{category_id, name}, ...] }
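The taxonomy response can be turned into a lookup table for showing category names next to classify results. A sketch assuming the shape shown above, and that `category_id` and `root_id` share one type (strings, as in the classify example); `build_lookup` is a hypothetical helper.

```python
from __future__ import annotations


def build_lookup(taxonomy: dict) -> dict[str, tuple[str, str | None]]:
    """Map every category_id (leaf or root) to (name, root name or None)."""
    roots = {r["category_id"]: r["name"] for r in taxonomy["roots"]}
    lookup = {cid: (name, None) for cid, name in roots.items()}
    for leaf in taxonomy["leaves"]:
        lookup[leaf["category_id"]] = (leaf["name"], roots.get(leaf["root_id"]))
    return lookup
```

Because low-confidence results have `category_id` null, `lookup.get(result["category_id"])` returns `None` for them without special-casing.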

Response Shape

{
  "markdown": "# Heading\n\nParagraph text...\n",
  "chunks": [
    {
      "id": "a1b6...",
      "type": "title",      // title | paragraph | table | figure | formula
      "markdown": "Heading",
      "grounding": {
        "page": 0,
        "box":            {"left": 386, "top": 51, "right": 687, "bottom": 82},
        "box_normalized": {"left": 0.386, "top": 0.051, "right": 0.687, "bottom": 0.082}
      }
    },
    {
      "id": "f2c3...",
      "type": "table",
      "markdown": "<table><tr><td>STT</td><td>Item</td></tr>...</table>",
      "grounding": {
        "page": 1,
        "box":            {"left": 23, "top": 250, "right": 977, "bottom": 720},
        "box_normalized": {"left": 0.023, "top": 0.25, "right": 0.977, "bottom": 0.72}
      }
    }
  ],
  "metadata": {
    "filename": "input.pdf",
    "page_count": 9,
    "duration_ms": 22458,
    "credits_used": 9,
    "job_id": "994dc1f9-d20c-4384-8e3c-f99b8e59c524",
    "version": "2026.05",
    "pages": [
      { "page_no": 0, "width": 595.3, "height": 841.9, "unit": "pdf_point",
        "bbox_width": 1000.0, "bbox_height": 1000.0 }
    ]
  }
}
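Chunk boxes can be mapped back onto the page, for example to highlight a source passage. In the example, `box` values sit on the 1000 × 1000 grid given by `bbox_width`/`bbox_height`, and `box_normalized` is the same box divided by 1000. The sketch below assumes that relationship holds in general and that `width`/`height` are in PDF points (`"unit": "pdf_point"`). Since `top` is smaller than `bottom`, the origin appears to be the top-left corner, so a second helper flips y for tools that use PDF's native bottom-left origin. Both helper names are hypothetical.

```python
from __future__ import annotations


def box_to_points(box: dict, page: dict) -> dict[str, float]:
    """Scale a grounding box from the bbox grid to PDF points (top-left origin)."""
    sx = page["width"] / page["bbox_width"]
    sy = page["height"] / page["bbox_height"]
    return {
        "left": box["left"] * sx,
        "top": box["top"] * sy,
        "right": box["right"] * sx,
        "bottom": box["bottom"] * sy,
    }


def to_bottom_left(pt_box: dict[str, float], page_height: float) -> tuple[float, float, float, float]:
    """Return (x0, y0, x1, y1) with a bottom-left origin, as PDF user space uses."""
    return (
        pt_box["left"],
        page_height - pt_box["bottom"],
        pt_box["right"],
        page_height - pt_box["top"],
    )
```

The matching page entry can be found with `next(p for p in data["metadata"]["pages"] if p["page_no"] == chunk["grounding"]["page"])`.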

Category Taxonomy (46 leaf, 12 root)

Every classify result returns one of the slugs / category_ids below. On medium confidence it backs off to the root category_id (100–111); on low confidence category_id is null.

root_id	Root	category_id	slug	Leaf category
`100`	Thời Trang	`200`	`thoi-trang-nu`	Thời Trang Nữ
		`201`	`thoi-trang-nam`	Thời Trang Nam
		`202`	`thoi-trang-tre-em`	Thời Trang Trẻ Em
		`203`	`giay-dep-nu`	Giày Dép Nữ
		`204`	`giay-dep-nam`	Giày Dép Nam
		`205`	`tui-vi-nu`	Túi Ví Nữ
		`206`	`balo-tui-vi-nam`	Balo & Túi Ví Nam
		`207`	`phu-kien-thoi-trang`	Phụ Kiện Thời Trang
`101`	Điện Tử & Công Nghệ	`208`	`dien-thoai-phu-kien`	Điện Thoại & Phụ Kiện
		`209`	`may-tinh-laptop`	Máy Tính & Laptop
		`210`	`may-anh-may-quay`	Máy Ảnh & Máy Quay
		`211`	`am-thanh-tai-nghe`	Âm Thanh & Tai Nghe
		`212`	`thiet-bi-dien-tu`	Thiết Bị Điện Tử Khác
`102`	Gia Dụng & Đời Sống	`213`	`thiet-bi-dien-gia-dung`	Thiết Bị Điện Gia Dụng
		`214`	`nha-bep-an-uong`	Nhà Bếp & Ăn Uống
		`215`	`noi-that-trang-tri`	Nội Thất & Trang Trí
		`216`	`giat-giu-cham-soc-nha-cua`	Giặt Giũ & Chăm Sóc Nhà Cửa
		`217`	`dung-cu-thiet-bi-tien-ich`	Dụng Cụ & Thiết Bị Tiện Ích
`103`	Sức Khỏe & Làm Đẹp	`218`	`cham-soc-da-mat`	Chăm Sóc Da Mặt
		`219`	`trang-diem`	Trang Điểm
		`220`	`cham-soc-co-the`	Chăm Sóc Cơ Thể
		`221`	`cham-soc-toc`	Chăm Sóc Tóc
		`222`	`nuoc-hoa`	Nước Hoa
		`223`	`thuc-pham-chuc-nang`	Thực Phẩm Chức Năng
		`224`	`thiet-bi-lam-dep`	Dụng Cụ & Thiết Bị Làm Đẹp
		`225`	`cham-soc-suc-khoe`	Chăm Sóc Sức Khỏe & Y Tế
`104`	Mẹ & Bé	`226`	`ta-bim`	Tã & Bỉm
		`227`	`sua-thuc-pham-cho-be`	Sữa & Thực Phẩm Cho Bé
		`228`	`do-dung-me-be`	Đồ Dùng Mẹ & Bé
		`229`	`do-choi`	Đồ Chơi
`105`	Thể Thao & Du Lịch	`230`	`dung-cu-the-thao`	Dụng Cụ Thể Thao
		`231`	`trang-phuc-the-thao`	Trang Phục Thể Thao
		`232`	`du-lich-da-ngoai`	Du Lịch & Dã Ngoại
`106`	Phương Tiện	`233`	`o-to-phu-kien`	Ô Tô & Phụ Kiện
		`234`	`xe-may-phu-kien`	Xe Máy & Phụ Kiện
		`235`	`xe-dap-phu-kien`	Xe Đạp & Phụ Kiện
`107`	Bách Hóa & Thực Phẩm	`236`	`thuc-pham-do-uong`	Thực Phẩm & Đồ Uống
		`237`	`banh-keo-do-an-vat`	Bánh Kẹo & Đồ Ăn Vặt
		`238`	`bach-hoa-hang-ngay`	Bách Hóa & Đồ Dùng Hằng Ngày
`108`	Nhà Sách & Văn Phòng Phẩm	`239`	`sach`	Sách
		`240`	`van-phong-pham`	Văn Phòng Phẩm
`109`	Chăm Sóc Thú Cưng	`241`	`thuc-an-thu-cung`	Thức Ăn Thú Cưng
		`242`	`phu-kien-thu-cung`	Phụ Kiện Thú Cưng
`110`	Đồng Hồ & Trang Sức	`243`	`dong-ho`	Đồng Hồ
		`244`	`trang-suc-phu-kien`	Trang Sức & Phụ Kiện
`111`	Voucher & Dịch Vụ	`245`	`voucher-the-dich-vu`	Voucher & Thẻ Dịch Vụ
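One way to act on the `level` field is to route results by confidence. The policy below (accept leaves, review roots, send nulls to a human) is a hypothetical example, as are the labels it returns; the id ranges come from the table above, and ids are assumed to be strings as in the classify example.

```python
ROOT_IDS = {str(i) for i in range(100, 112)}  # 12 roots, 100-111
LEAF_IDS = {str(i) for i in range(200, 246)}  # 46 leaves, 200-245


def triage(result: dict) -> str:
    """Route one classify result by its level and category_id."""
    level, cid = result.get("level"), result.get("category_id")
    if level == "leaf" and cid in LEAF_IDS:
        return "accept"
    if level == "root" and cid in ROOT_IDS:
        return "review"  # only the parent category is known
    return "manual"      # level null (category_id null) or an unexpected value
```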

Code Snippets

Python

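import httpx

# Sync parse: the server gives up at 60 s (408 SYNC_TIMEOUT), so allow a bit more.
with httpx.Client(timeout=90) as c, open("input.pdf", "rb") as f:
    r = c.post(
        "https://betapdf.com/api/v1/parse",
        headers={"Authorization": "Bearer beta_live_xxxxxxxxxxxxxxxxxxxxxxxx"},
        files={"file": ("input.pdf", f, "application/pdf")},
    )
r.raise_for_status()  # surface 4xx/5xx errors (see the Errors table)
data = r.json()
print(data["markdown"])
# One line per chunk: type, zero-based page, first 60 chars of its Markdown
for chunk in data["chunks"]:
    print(chunk["type"], chunk["grounding"]["page"], chunk["markdown"][:60])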
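Table chunks carry HTML in their `markdown` field (see the response shape above), which is often worth handling separately in a RAG pipeline. A minimal sketch that collects them per page from the parsed `data`; `tables_by_page` is a hypothetical helper.

```python
from __future__ import annotations

from collections import defaultdict


def tables_by_page(data: dict) -> dict[int, list[str]]:
    """Collect every table chunk's HTML, keyed by zero-based page number."""
    out: dict[int, list[str]] = defaultdict(list)
    for chunk in data["chunks"]:
        if chunk["type"] == "table":
            out[chunk["grounding"]["page"]].append(chunk["markdown"])
    return dict(out)
```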

Python (async)

import time

import httpx

KEY = "beta_live_xxxxxxxxxxxxxxxxxxxxxxxx"

# One client for all three steps: shared base URL and auth header.
with httpx.Client(
    base_url="https://betapdf.com/api/v1",
    headers={"Authorization": f"Bearer {KEY}"},
    timeout=60,
) as c:
    # 1. submit: returns 202 + job_id immediately
    with open("input.pdf", "rb") as f:
        r = c.post("/parse/jobs", files={"file": ("input.pdf", f, "application/pdf")})
    r.raise_for_status()
    job_id = r.json()["job_id"]

    # 2. poll until the job reaches a terminal state
    while True:
        s = c.get(f"/parse/jobs/{job_id}")
        s.raise_for_status()
        status = s.json()["status"]
        if status in ("completed", "failed"):
            break
        time.sleep(3)  # ~20 polls/min, under the 30 req/min per-key limit

    # 3. result
    if status == "completed":
        r = c.get(f"/parse/jobs/{job_id}/result")
        r.raise_for_status()
        print(r.json()["markdown"])

Node.js / TypeScript

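// Node 18+ (global fetch/FormData/Blob); run as an ES module for top-level await.
import fs from "node:fs";

const form = new FormData();
form.append(
  "file",
  new Blob([fs.readFileSync("input.pdf")], { type: "application/pdf" }),
  "input.pdf",
);

const r = await fetch("https://betapdf.com/api/v1/parse", {
  method: "POST",
  headers: { "Authorization": "Bearer beta_live_xxxxxxxxxxxxxxxxxxxxxxxx" },
  body: form,
});
if (!r.ok) throw new Error(`BetaPDF ${r.status}: ${await r.text()}`);
const data = await r.json();
console.log(data.markdown);
// One line per chunk: type, zero-based page, first 60 chars of its Markdown
data.chunks.forEach(c => console.log(c.type, c.grounding?.page, c.markdown.slice(0, 60)));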

Result Retention & Re-download

Result ZIPs are kept on our servers for re-download during your plan's retention window: Free 6 hours, Pro 7 days, Business 14 days. Re-fetch the raw ZIP with GET /v1/parse/jobs/{id}/download, which returns 410 EXPIRED once the window has passed. DELETE /v1/parse/jobs/{id} purges a result on demand, e.g. for self-service GDPR erasure. Anonymous web uploads always use the 6-hour window.
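Given the retention windows above, a client can predict whether a re-download is still possible before calling the endpoint. A sketch only: it assumes the window is counted from when the job completed, which this page does not state, and `still_downloadable` is a hypothetical helper. The server's 410 EXPIRED response remains authoritative.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

# Retention windows listed on this page, keyed by plan name.
RETENTION = {
    "free": timedelta(hours=6),
    "pro": timedelta(days=7),
    "business": timedelta(days=14),
}


def still_downloadable(plan: str, completed_at: datetime, now: datetime | None = None) -> bool:
    """Guess whether GET /v1/parse/jobs/{id}/download should succeed (not 410)."""
    now = now or datetime.now(timezone.utc)
    return now - completed_at < RETENTION[plan.lower()]
```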

Quotas & Limits

Pro: 1,000 pages/mo. Business: 5,000 pages/mo. Both tiers: 30 requests/minute per key. PDFs up to 100MB (sync ≤10 pages, async ≤50 pages); images (jpg/png/webp) up to 20MB count as 1 page. These are hard caps; v1 offers no overage.
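To stay under 30 requests/minute per key, a client can throttle itself. The sketch below is a generic sliding-window limiter, not something the API provides; call `acquire()` before each request, and share one instance among everything that uses the same key. The clock and sleep functions are injectable for testing.

```python
from __future__ import annotations

import time
from collections import deque
from typing import Callable


class RateLimiter:
    """Sliding-window limiter: at most `limit` calls in any `window` seconds."""

    def __init__(self, limit: int = 30, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.limit, self.window = limit, window
        self.clock, self.sleep = clock, sleep
        self.calls: deque[float] = deque()

    def acquire(self) -> None:
        """Block until one more request fits in the window, then record it."""
        while True:
            now = self.clock()
            while self.calls and now - self.calls[0] >= self.window:
                self.calls.popleft()  # forget calls older than the window
            if len(self.calls) < self.limit:
                self.calls.append(now)
                return
            # Wait until the oldest recorded call leaves the window.
            self.sleep(self.window - (now - self.calls[0]))
```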

Errors

HTTP	Code	Description
401	`UNAUTHENTICATED`	Missing Authorization header
401	`INVALID_API_KEY`	Invalid or revoked key
403	`PLAN_REQUIRED`	API requires Pro or Business plan
403	`PLAN_EXPIRED`	Admin-granted plan has expired; renew to continue
403	`API_NOT_AVAILABLE_FOR_TOOL`	Tool not exposed via API
404	`NOT_FOUND`	Async job id unknown or expired (6h)
408	`SYNC_TIMEOUT`	Sync request exceeded 60s; fetch the result via /v1/parse/jobs/{id}/result
409	`NOT_READY`	Async job still processing; poll its status first
410	`EXPIRED`	Result expired past your plan's retention window
413	`LIMIT_FILE_SIZE`	File exceeds 100MB
413	`TOO_MANY_PAGES_FOR_SYNC`	Use POST /v1/parse/jobs for > 10 pages
413	`TOO_MANY_PAGES`	File exceeds the 50-page hard cap; split it first
413	`BATCH_TOO_LARGE`	More than 256 titles in one /v1/classify request
415	`UNSUPPORTED_FORMAT`	Only PDF and jpg/png/webp images are accepted
429	`RATE_LIMIT_EXCEEDED`	More than 30 requests/min on this key
429	`QUOTA_EXCEEDED`	Monthly page cap reached
500	`JOB_FAILED`	Processing error
502	`UPSTREAM_ERROR`	Classifier backend temporarily unavailable (credits refunded)
503	`CLASSIFY_UNAVAILABLE`	Classifier not configured
503	`DISK_FULL`	Service temporarily unavailable while freeing storage; retry shortly
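The table suggests a simple client-side policy: retry transient errors with backoff, keep polling on NOT_READY, switch to the job result on SYNC_TIMEOUT, and stop on everything else. A hypothetical sketch; this page does not show how the error code appears in the response body, so `next_action` takes the code string directly, and the backoff constants are arbitrary.

```python
from __future__ import annotations

RETRY_LATER = {"RATE_LIMIT_EXCEEDED", "UPSTREAM_ERROR", "DISK_FULL"}
POLL_AGAIN = {"NOT_READY"}
FETCH_ASYNC = {"SYNC_TIMEOUT"}


def next_action(code: str) -> str:
    """Map an error code from the table to what a client should do next."""
    if code in RETRY_LATER:
        return "retry_with_backoff"
    if code in POLL_AGAIN:
        return "poll_status"
    if code in FETCH_ASYNC:
        return "fetch_job_result"
    return "fail"  # auth, plan, size, format, quota, not found, expired, job failure


def backoff_delays(base: float = 2.0, cap: float = 60.0, attempts: int = 5) -> list[float]:
    """Exponential backoff schedule: base, 2*base, 4*base, ..., capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]
```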

Why BetaPDF

⚡ 15× faster than the previous generation. ~22-30s for a 9-page Vietnamese PDF (vLLM on GB10).
🇻🇳 99.7% Vietnamese diacritic accuracy on both digital and 300 DPI scanned PDFs.
💰 From $9.99/mo (Pro) or $29.99/mo (Business). Compare to Landing AI ADE Team at $250/mo.
🔁 Both sync and async endpoints. Same JSON shape (markdown + chunks + metadata) on both.