BetaPDF Developer API

REST API to convert PDFs and images (jpg/png/webp) into Markdown + structured JSON, optimized for ChatGPT, Claude, and RAG pipelines.


Quickstart

Get a key, send a PDF or photo, and get Markdown back in a single request.

# Accepts PDF (≤10 pages) or image (jpg/png/webp, ≤20MB)
curl -X POST https://betapdf.com/api/v1/parse \
  -H "Authorization: Bearer beta_live_xxxxxxxxxxxxxxxxxxxxxxxx" \
  -F "file=@input.pdf"
# Or:  -F "file=@photo.jpg"

Try the API live

Upload a PDF and see bbox overlays in the interactive playground (Pro or Business plan, signed-in users only).


Authentication

Every request must include a Bearer token. API access is available on Pro ($9.99/mo, 1,000 pages) and Business ($29.99/mo, 5,000 pages) — the Free tier cannot mint or use API keys. Get your key at /account/api-keys after upgrading. The plaintext is shown ONCE on creation — store it like a password.

Authorization: Bearer beta_live_xxxxxxxxxxxxxxxxxxxxxxxx
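Because the key is shown only once and should be treated like a password, one option is to keep it out of source files and read it from the environment. A minimal sketch, assuming a variable named `BETAPDF_API_KEY` (a hypothetical name chosen here; the API itself only cares about the header):

```python
import os


def auth_headers(env_var: str = "BETAPDF_API_KEY") -> dict[str, str]:
    """Build the Bearer header from an environment variable.

    BETAPDF_API_KEY is a hypothetical variable name; use whatever your
    secret manager or deployment actually exports.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set {env_var} to your beta_live_... API key")
    return {"Authorization": f"Bearer {key}"}
```

The returned dict can be passed as `headers=` to any HTTP client used in the snippets on this page.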

Endpoints

POST /v1/parse — Synchronous

Use for PDFs ≤ 10 pages or a single image (jpg/png/webp). Blocks until done (typically 7-30s). Returns the full result inline. For larger PDFs use the async endpoint.

# Accepts PDF (≤10 pages) or image (jpg/png/webp, ≤20MB)
curl -X POST https://betapdf.com/api/v1/parse \
  -H "Authorization: Bearer beta_live_xxxxxxxxxxxxxxxxxxxxxxxx" \
  -F "file=@input.pdf"
# Or:  -F "file=@photo.jpg"

POST /v1/parse/jobs — Asynchronous

Returns 202 with a job_id immediately. Use it for PDFs of up to 50 pages (images also work here, but the sync endpoint is faster for them). Poll GET /v1/parse/jobs/{id} until status is completed or failed, then fetch the output with GET /v1/parse/jobs/{id}/result.

# 1. Submit a PDF (≤50 pages) or image (jpg/png/webp, ≤20MB); returns 202 + job_id
curl -X POST https://betapdf.com/api/v1/parse/jobs \
  -H "Authorization: Bearer beta_live_xxxxxxxxxxxxxxxxxxxxxxxx" \
  -F "file=@input.pdf"
# 2. Poll the job status (replace JOB_ID) until it is completed or failed
curl https://betapdf.com/api/v1/parse/jobs/JOB_ID \
  -H "Authorization: Bearer beta_live_xxxxxxxxxxxxxxxxxxxxxxxx"
# 3. Fetch the result once status=completed
curl https://betapdf.com/api/v1/parse/jobs/JOB_ID/result \
  -H "Authorization: Bearer beta_live_xxxxxxxxxxxxxxxxxxxxxxxx"
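Choosing between the two endpoints comes down to file type and page count. A minimal client-side routing sketch based only on the limits stated on this page (sync up to 10 PDF pages, async up to 50, images on the sync endpoint); `choose_endpoint` is a hypothetical helper and assumes you already know the PDF's page count:

```python
from pathlib import Path

SYNC_MAX_PDF_PAGES = 10   # POST /v1/parse
ASYNC_MAX_PDF_PAGES = 50  # POST /v1/parse/jobs hard cap
IMAGE_SUFFIXES = {".jpg", ".png", ".webp"}


def choose_endpoint(filename: str, page_count: int = 1) -> str:
    """Return the API path to submit `filename` to, or raise if it cannot be sent."""
    suffix = Path(filename).suffix.lower()
    if suffix in IMAGE_SUFFIXES:
        # Images count as one page and the sync endpoint is faster for them.
        return "/v1/parse"
    if suffix != ".pdf":
        raise ValueError(f"unsupported file type: {suffix or filename}")
    if page_count <= SYNC_MAX_PDF_PAGES:
        return "/v1/parse"
    if page_count <= ASYNC_MAX_PDF_PAGES:
        return "/v1/parse/jobs"
    raise ValueError(
        f"{page_count} pages exceeds the {ASYNC_MAX_PDF_PAGES}-page cap; split the PDF first"
    )
```

Routing on the client avoids a round trip that would otherwise end in 413 TOO_MANY_PAGES_FOR_SYNC or TOO_MANY_PAGES.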

GET /v1/usage

Check your current month's page count and remaining quota.

curl https://betapdf.com/api/v1/usage \
  -H "Authorization: Bearer beta_live_xxxxxxxxxxxxxxxxxxxxxxxx"
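Since monthly quotas are hard caps, it can help to check a batch against the remaining quota before submitting it. The field names of the /v1/usage response are not documented on this page, so this sketch takes the remaining page count as a plain integer. `plan_batch` is hypothetical and assumes one credit per PDF page (as in the sample response below, where page_count and credits_used are both 9) and one per image:

```python
def plan_batch(remaining_pages: int, page_counts: list[int]) -> tuple[list[int], list[int]]:
    """Split files (given as page counts; images count as 1) into the indices
    that fit the remaining quota, in submission order, and those to defer."""
    fits: list[int] = []
    deferred: list[int] = []
    budget = remaining_pages
    for i, pages in enumerate(page_counts):
        if pages <= budget:
            fits.append(i)
            budget -= pages
        else:
            # Too big for what is left; a later, smaller file may still fit.
            deferred.append(i)
    return fits, deferred
```

For example, with 20 pages left and files of 9, 15, 1 and 10 pages, the 15-page file is deferred and the other three are sent.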

Response Shape

{
  "markdown": "# Heading\n\nParagraph text...\n",
  "chunks": [
    {
      "id": "a1b6...",
      "type": "title",      // title | paragraph | table | figure | formula
      "markdown": "Heading",
      "grounding": {
        "page": 0,
        "box":            {"left": 386, "top": 51, "right": 687, "bottom": 82},
        "box_normalized": {"left": 0.386, "top": 0.051, "right": 0.687, "bottom": 0.082}
      }
    },
    {
      "id": "f2c3...",
      "type": "table",
      "markdown": "<table><tr><td>STT</td><td>Item</td></tr>...</table>",
      "grounding": {
        "page": 1,
        "box":            {"left": 23, "top": 250, "right": 977, "bottom": 720},
        "box_normalized": {"left": 0.023, "top": 0.25, "right": 0.977, "bottom": 0.72}
      }
    }
  ],
  "metadata": {
    "filename": "input.pdf",
    "page_count": 9,
    "duration_ms": 22458,
    "credits_used": 9,
    "job_id": "994dc1f9-d20c-4384-8e3c-f99b8e59c524",
    "version": "2026.05",
    "pages": [
      { "page_no": 0, "width": 595.3, "height": 841.9, "unit": "pdf_point",
        "bbox_width": 1000.0, "bbox_height": 1000.0 }
    ]
  }
}
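In the sample, `box` sits on the 1000×1000 grid given by `bbox_width`/`bbox_height`, and `box_normalized` is that box divided by the grid size (386 / 1000 = 0.386). A sketch that scales a chunk's normalized box to the page size in PDF points, for example to draw overlays; it assumes the origin is the top-left corner with y growing downward (as `top` < `bottom` suggests), which differs from PDF's native bottom-left origin:

```python
def box_to_points(chunk: dict, pages: list[dict]) -> tuple[float, float, float, float]:
    """Return (left, top, right, bottom) of a chunk in its page's units
    (pdf_point in the sample), assuming a top-left origin."""
    page_no = chunk["grounding"]["page"]
    # metadata.pages entries are matched by their 0-based page_no.
    page = next(p for p in pages if p["page_no"] == page_no)
    b = chunk["grounding"]["box_normalized"]
    return (
        b["left"] * page["width"],
        b["top"] * page["height"],
        b["right"] * page["width"],
        b["bottom"] * page["height"],
    )
```

Under the same assumption, a drawing library that uses PDF's bottom-left origin needs `height - y` for the two vertical coordinates.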

Code Snippets

Python

import httpx

# Client timeout is above the server's 60s SYNC_TIMEOUT
with httpx.Client(timeout=90) as c, open("input.pdf", "rb") as f:
    r = c.post(
        "https://betapdf.com/api/v1/parse",
        headers={"Authorization": "Bearer beta_live_xxxxxxxxxxxxxxxxxxxxxxxx"},
        files={"file": ("input.pdf", f, "application/pdf")},
    )
r.raise_for_status()  # surface 4xx/5xx responses (see Errors below)
data = r.json()
print(data["markdown"])
for chunk in data["chunks"]:
    # chunk type, 0-based page index, first 60 characters of its Markdown
    print(chunk["type"], chunk["grounding"]["page"], chunk["markdown"][:60])

Python (async)

import time
import httpx

KEY = "beta_live_xxxxxxxxxxxxxxxxxxxxxxxx"
BASE = "https://betapdf.com/api/v1/parse/jobs"

with httpx.Client(headers={"Authorization": f"Bearer {KEY}"}, timeout=60) as c:
    # 1. submit: returns 202 + job_id immediately
    with open("input.pdf", "rb") as f:
        r = c.post(BASE, files={"file": ("input.pdf", f, "application/pdf")})
    r.raise_for_status()
    job_id = r.json()["job_id"]

    # 2. poll every 3s; the 10-minute give-up is an arbitrary client-side limit
    deadline = time.monotonic() + 600
    while True:
        s = c.get(f"{BASE}/{job_id}")
        s.raise_for_status()
        status = s.json()["status"]
        if status in ("completed", "failed"):
            break
        if time.monotonic() > deadline:
            raise TimeoutError(f"job {job_id} still {status}")
        time.sleep(3)

    # 3. result
    if status == "completed":
        r = c.get(f"{BASE}/{job_id}/result")
        r.raise_for_status()
        print(r.json()["markdown"])
    else:
        print(f"job {job_id} failed")

Node.js / TypeScript

// Node 18+ (global fetch, FormData, Blob); run as an ES module for top-level await
import fs from "node:fs";

const form = new FormData();
form.append(
  "file",
  new Blob([fs.readFileSync("input.pdf")], { type: "application/pdf" }),
  "input.pdf",
);

const r = await fetch("https://betapdf.com/api/v1/parse", {
  method: "POST",
  headers: { "Authorization": "Bearer beta_live_xxxxxxxxxxxxxxxxxxxxxxxx" },
  body: form,
});
if (!r.ok) throw new Error(`BetaPDF ${r.status}: ${await r.text()}`);
const data = await r.json();
console.log(data.markdown);
data.chunks.forEach(c => console.log(c.type, c.grounding?.page, c.markdown.slice(0, 60)));

Result Retention & Re-download

Result ZIPs are kept on our servers for re-download during your plan's retention window: Free 6 hours, Pro 7 days, Business 14 days. Re-fetch the raw ZIP with GET /v1/parse/jobs/{id}/download; after the window ends it returns 410 EXPIRED. DELETE /v1/parse/jobs/{id} purges a result on demand, so you can handle GDPR deletion requests yourself. Anonymous web uploads always use the 6-hour window.
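Before calling the download endpoint, a client can check locally whether a result is likely still inside its plan's window. A sketch using the windows listed above; whether the window is counted from submission or completion is not stated here, so measuring from a `created_at` timestamp you recorded is an assumption, and the server's 410 EXPIRED stays authoritative:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Retention windows from the paragraph above.
RETENTION = {
    "free": timedelta(hours=6),
    "pro": timedelta(days=7),
    "business": timedelta(days=14),
}


def still_downloadable(created_at: datetime, plan: str,
                       now: Optional[datetime] = None) -> bool:
    """Best-effort local check. Assumes the window starts at `created_at`,
    a timezone-aware timestamp recorded when the job was submitted."""
    now = now or datetime.now(timezone.utc)
    return now - created_at < RETENTION[plan.lower()]
```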

Quotas & Limits

Pro: 1,000 pages/mo. Business: 5,000 pages/mo. Both tiers: 30 requests/minute per key. PDFs up to 100MB (sync ≤10 pages, async ≤50 pages); images (jpg/png/webp) up to 20MB count as 1 page. Quotas are hard caps: v1 has no overage, so requests fail with 429 QUOTA_EXCEEDED once the monthly cap is reached.
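With 30 requests per minute per key, a batch client can throttle itself instead of running into 429 RATE_LIMIT_EXCEEDED. A minimal sliding-window sketch; the class is illustrative, not part of any BetaPDF SDK, and the injectable clock and sleep exist only to make it testable:

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any `window`-second span."""

    def __init__(self, limit: int = 30, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.limit, self.window = limit, window
        self.clock, self.sleep = clock, sleep
        self.calls: deque = deque()  # timestamps of recent requests

    def acquire(self) -> None:
        """Block until one more request may be sent, then record it."""
        while True:
            now = self.clock()
            # Drop requests that have left the window.
            while self.calls and now - self.calls[0] >= self.window:
                self.calls.popleft()
            if len(self.calls) < self.limit:
                self.calls.append(now)
                return
            # Wait until the oldest request in the window ages out.
            self.sleep(self.window - (now - self.calls[0]))
```

Call `limiter.acquire()` before each POST or poll; a shared instance per key keeps all threads of one process under the limit only if access is serialized.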

Errors

| HTTP | Code | Description |
|------|------|-------------|
| 401 | UNAUTHENTICATED | Missing Authorization header |
| 401 | INVALID_API_KEY | Invalid or revoked key |
| 403 | PLAN_REQUIRED | API requires Pro or Business plan |
| 403 | PLAN_EXPIRED | Admin-granted plan has expired; renew to continue |
| 403 | API_NOT_AVAILABLE_FOR_TOOL | Tool not exposed via API |
| 408 | SYNC_TIMEOUT | Sync request exceeded 60s; fetch the output via /v1/parse/jobs/{id}/result |
| 413 | LIMIT_FILE_SIZE | File exceeds 100MB |
| 413 | TOO_MANY_PAGES_FOR_SYNC | More than 10 pages; use POST /v1/parse/jobs |
| 413 | TOO_MANY_PAGES | File exceeds the 50-page hard cap; split it first |
| 415 | UNSUPPORTED_FORMAT | File type not supported (accepted: PDF, jpg, png, webp) |
| 429 | RATE_LIMIT_EXCEEDED | More than 30 requests/min on this key |
| 429 | QUOTA_EXCEEDED | Monthly page cap reached |
| 410 | EXPIRED | Result expired past your plan's retention window |
| 503 | DISK_FULL | Service temporarily unavailable while freeing storage; retry shortly |
| 500 | JOB_FAILED | Processing error |
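The codes above call for different client reactions: some are worth retrying, some mean switching endpoints, and the rest need a fix on your side. A sketch of that mapping; `next_action` and its action labels are hypothetical, and it assumes you have already extracted the code string from the error response (the error body's field names are not documented on this page):

```python
# Hypothetical action labels; adapt them to your client's control flow.
ACTIONS = {
    "RATE_LIMIT_EXCEEDED": "retry_later",         # 30 req/min per key
    "DISK_FULL": "retry_later",                   # 503, temporary
    "SYNC_TIMEOUT": "fetch_job_result",           # GET /v1/parse/jobs/{id}/result
    "TOO_MANY_PAGES_FOR_SYNC": "resubmit_async",  # POST /v1/parse/jobs
    "TOO_MANY_PAGES": "split_pdf",                # over the 50-page hard cap
    "QUOTA_EXCEEDED": "stop_until_quota_resets",  # monthly hard cap, no overage
    "EXPIRED": "reparse",                         # past the retention window
}


def next_action(status: int, code: str) -> str:
    """Map an error (HTTP status + code string) to a client action."""
    if code in ACTIONS:
        return ACTIONS[code]
    # Unknown codes: treat 503 as transient, everything else as fatal.
    if status == 503:
        return "retry_later"
    return "fail"
```

Note that both 429 codes share a status, so the decision has to look at the code: retrying a QUOTA_EXCEEDED request only burns the rate limit.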

Why BetaPDF

  • 15× faster than the previous generation. ~22-30s for a 9-page Vietnamese PDF (vLLM on GB10).
  • 🇻🇳 99.7% Vietnamese diacritic accuracy on both digital and 300 DPI scanned PDFs.
  • 💰 Pro at $9.99/mo, Business at $29.99/mo, versus $250/mo for Landing AI ADE Team.
  • 🔁 Both sync and async endpoints. Same JSON shape (markdown + chunks + metadata) on both.