BetaPDF Developer API
REST API to convert PDFs and images (jpg/png/webp) into Markdown + structured JSON, optimized for ChatGPT, Claude, and RAG pipelines.
Quickstart
Get a key, send a PDF or photo, get Markdown back. Three steps.
```bash
# Accepts PDF (≤10 pages) or image (jpg/png/webp, ≤20MB)
curl -X POST https://betapdf.com/api/v1/parse \
  -H "Authorization: Bearer beta_live_xxxxxxxxxxxxxxxxxxxxxxxx" \
  -F "file=@input.pdf"
# Or: -F "file=@photo.jpg"
```
Try the API live
Upload a PDF and see bbox overlays in the interactive playground (Pro or Business plan, signed-in users only).
Authentication
Every request must include a Bearer token. API access is available on Pro ($9.99/mo, 1,000 pages/mo) and Business ($29.99/mo, 5,000 pages/mo); the Free tier cannot create or use API keys. After upgrading, generate a key at /account/api-keys. The plaintext key is shown only ONCE, at creation, so store it like a password.
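Because the key must be kept secret, a client can load it from the environment instead of hard-coding it in source. A minimal Python sketch; the environment-variable name `BETAPDF_API_KEY` and the helper `auth_headers` are hypothetical, chosen only for illustration. It produces the `Authorization: Bearer <key>` header this section requires.

```python
import os
from typing import Dict, Mapping


def auth_headers(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build the Bearer header from an environment variable.

    BETAPDF_API_KEY is a hypothetical variable name used for this sketch.
    """
    key = env.get("BETAPDF_API_KEY", "").strip()
    if not key:
        # Fail early instead of sending a request that returns 401 UNAUTHENTICATED
        raise RuntimeError("Set BETAPDF_API_KEY to your key from /account/api-keys")
    return {"Authorization": f"Bearer {key}"}
```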
```http
Authorization: Bearer beta_live_xxxxxxxxxxxxxxxxxxxxxxxx
```
Endpoints
POST /v1/parse — Synchronous
Use for PDFs of up to 10 pages or a single image (jpg/png/webp). The request blocks until processing finishes (typically 7-30s) and returns the full result inline. For PDFs over 10 pages, use the async endpoint.
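Which endpoint to call follows from the page limits: up to 10 pages (or a single image) on the sync endpoint, up to 50 pages on the async one. A minimal client-side sketch, assuming the caller already knows the PDF's page count (for instance from a PDF library); `choose_endpoint` is a hypothetical helper, and the server's 413/415 errors remain authoritative.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".png", ".webp"}  # formats listed on this page
SYNC_MAX_PAGES = 10    # POST /v1/parse
ASYNC_MAX_PAGES = 50   # POST /v1/parse/jobs


def choose_endpoint(filename: str, page_count: int = 1) -> str:
    """Return the API path to use for a file (hypothetical helper)."""
    suffix = Path(filename).suffix.lower()
    if suffix in IMAGE_SUFFIXES:
        return "/v1/parse"  # an image counts as one page; sync is faster
    if suffix != ".pdf":
        raise ValueError(f"unsupported format: {suffix or filename}")
    if page_count <= SYNC_MAX_PAGES:
        return "/v1/parse"
    if page_count <= ASYNC_MAX_PAGES:
        return "/v1/parse/jobs"
    raise ValueError(f"{page_count} pages exceeds the 50-page cap; split the PDF first")
```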
```bash
# Accepts PDF (≤10 pages) or image (jpg/png/webp, ≤20MB)
curl -X POST https://betapdf.com/api/v1/parse \
  -H "Authorization: Bearer beta_live_xxxxxxxxxxxxxxxxxxxxxxxx" \
  -F "file=@input.pdf"
# Or: -F "file=@photo.jpg"
```
POST /v1/parse/jobs — Asynchronous
Returns 202 with a job_id immediately. Use for PDFs of up to 50 pages (images work here too, but the sync endpoint is faster for them). Poll GET /v1/parse/jobs/{id} until status is completed or failed; once it is completed, fetch the result with GET /v1/parse/jobs/{id}/result.
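The polling step can be factored into a reusable loop that backs off between requests and gives up after a deadline. A minimal Python sketch: `wait_for_job` and its `fetch_status` callable are hypothetical, and the 2 s starting delay, 15 s cap, and 10-minute timeout are illustrative choices, not documented API behavior. Only the terminal `completed` and `failed` statuses come from this page.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[], str],
    initial_delay: float = 2.0,
    max_delay: float = 15.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the job is 'completed' or 'failed', backing off between polls.

    fetch_status should call GET /v1/parse/jobs/{id} and return its "status" field.
    """
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        status = fetch_status()
        if status in ("completed", "failed"):
            return status
        # Give up rather than sleeping past the deadline
        if clock() + delay > deadline:
            raise TimeoutError(f"job still {status!r} after {timeout:.0f}s")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # exponential backoff, capped
```

`fetch_status` could be, for example, `lambda: httpx.get(f"https://betapdf.com/api/v1/parse/jobs/{job_id}", headers=headers).json()["status"]`. Injecting `sleep` and `clock` keeps the loop testable without real waiting.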
```bash
# Submit: accepts PDF (≤50 pages) or image (jpg/png/webp, ≤20MB); returns 202 + job_id
curl -X POST https://betapdf.com/api/v1/parse/jobs \
  -H "Authorization: Bearer beta_live_xxxxxxxxxxxxxxxxxxxxxxxx" \
  -F "file=@input.pdf"

# Poll the job status
curl https://betapdf.com/api/v1/parse/jobs/JOB_ID \
  -H "Authorization: Bearer beta_live_xxxxxxxxxxxxxxxxxxxxxxxx"

# Fetch the result once status=completed
curl https://betapdf.com/api/v1/parse/jobs/JOB_ID/result \
  -H "Authorization: Bearer beta_live_xxxxxxxxxxxxxxxxxxxxxxxx"
```
GET /v1/usage
Check your current month's page count and remaining quota.
```bash
curl https://betapdf.com/api/v1/usage \
  -H "Authorization: Bearer beta_live_xxxxxxxxxxxxxxxxxxxxxxxx"
```
Response Shape
```jsonc
{
  "markdown": "# Heading\n\nParagraph text...\n",
  "chunks": [
    {
      "id": "a1b6...",
      "type": "title", // title | paragraph | table | figure | formula
      "markdown": "Heading",
      "grounding": {
        "page": 0,
        "box": {"left": 386, "top": 51, "right": 687, "bottom": 82},
        "box_normalized": {"left": 0.386, "top": 0.051, "right": 0.687, "bottom": 0.082}
      }
    },
    {
      "id": "f2c3...",
      "type": "table",
      "markdown": "<table><tr><td>STT</td><td>Item</td></tr>...</table>",
      "grounding": {
        "page": 1,
        "box": {"left": 23, "top": 250, "right": 977, "bottom": 720},
        "box_normalized": {"left": 0.023, "top": 0.25, "right": 0.977, "bottom": 0.72}
      }
    }
  ],
  "metadata": {
    "filename": "input.pdf",
    "page_count": 9,
    "duration_ms": 22458,
    "credits_used": 9,
    "job_id": "994dc1f9-d20c-4384-8e3c-f99b8e59c524",
    "version": "2026.05",
    "pages": [
      { "page_no": 0, "width": 595.3, "height": 841.9, "unit": "pdf_point",
        "bbox_width": 1000.0, "bbox_height": 1000.0 }
    ]
  }
}
```
Code Snippets
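The `grounding` block lets you map each chunk back onto its page, for example to draw highlight overlays. A minimal Python sketch, assuming that `box_normalized` values are fractions of the page width and height measured from the top-left corner (the example's `top` < `bottom` suggests this, but the page does not state it) and that `grounding.page` matches `metadata.pages[].page_no`; `chunk_box_in_points` is a hypothetical helper.

```python
from typing import Tuple


def chunk_box_in_points(chunk: dict, metadata: dict) -> Tuple[float, float, float, float]:
    """Return (left, top, right, bottom) in the page's units (pdf_point).

    Assumes a top-left origin, as the normalized example values suggest.
    """
    page_no = chunk["grounding"]["page"]
    # Find the page entry whose page_no matches the chunk's page index
    page = next(p for p in metadata["pages"] if p["page_no"] == page_no)
    b = chunk["grounding"]["box_normalized"]
    return (
        b["left"] * page["width"],
        b["top"] * page["height"],
        b["right"] * page["width"],
        b["bottom"] * page["height"],
    )
```

PDF user space places the origin at the bottom-left, so a drawing library that follows that convention would need `y = height - top` (and likewise for `bottom`).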
Python
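```python
import httpx

# The sync endpoint blocks until parsing finishes (408 SYNC_TIMEOUT after 60s),
# so the client timeout is set comfortably above that.
with httpx.Client(timeout=90) as c, open("input.pdf", "rb") as f:
    r = c.post(
        "https://betapdf.com/api/v1/parse",
        headers={"Authorization": "Bearer beta_live_xxxxxxxxxxxxxxxxxxxxxxxx"},
        files={"file": ("input.pdf", f, "application/pdf")},
    )
r.raise_for_status()  # surface 4xx/5xx errors (see the Errors table)
data = r.json()
print(data["markdown"])
# Each chunk carries its type, page index, and Markdown text
for chunk in data["chunks"]:
    print(chunk["type"], chunk["grounding"]["page"], chunk["markdown"][:60])
```
Python (async)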
```python
import time

import httpx

KEY = "beta_live_xxxxxxxxxxxxxxxxxxxxxxxx"
BASE = "https://betapdf.com/api/v1"
HEADERS = {"Authorization": f"Bearer {KEY}"}

# 1. Submit: returns 202 with a job_id immediately
with httpx.Client(timeout=60) as c, open("input.pdf", "rb") as f:
    r = c.post(
        f"{BASE}/parse/jobs",
        headers=HEADERS,
        files={"file": ("input.pdf", f, "application/pdf")},
    )
r.raise_for_status()
job_id = r.json()["job_id"]

# 2. Poll every 3 s until the job reaches a terminal status
while True:
    s = httpx.get(f"{BASE}/parse/jobs/{job_id}", headers=HEADERS)
    s.raise_for_status()
    status = s.json()["status"]
    if status in ("completed", "failed"):
        break
    time.sleep(3)

# 3. Fetch the result once completed
if status == "completed":
    r = httpx.get(f"{BASE}/parse/jobs/{job_id}/result", headers=HEADERS, timeout=60)
    r.raise_for_status()
    print(r.json()["markdown"])
else:
    print(f"Job {job_id} failed")
```
Node.js / TypeScript
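```ts
// Requires Node.js 18+ (global fetch, FormData, Blob) and ES modules for top-level await.
import fs from "node:fs";

const form = new FormData();
form.append(
  "file",
  new Blob([fs.readFileSync("input.pdf")], { type: "application/pdf" }),
  "input.pdf",
);

const r = await fetch("https://betapdf.com/api/v1/parse", {
  method: "POST",
  headers: { "Authorization": "Bearer beta_live_xxxxxxxxxxxxxxxxxxxxxxxx" },
  body: form,
});
// Surface 4xx/5xx errors (see the Errors table)
if (!r.ok) throw new Error(`BetaPDF error ${r.status}: ${await r.text()}`);

const data = await r.json();
console.log(data.markdown);
// Print type, page index, and the first 60 characters of each chunk
data.chunks.forEach(c => console.log(c.type, c.grounding?.page, c.markdown.slice(0, 60)));
```
Result Retention & Re-download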
Result ZIPs are kept on our servers for re-download during your plan's retention window: Free 6 hours, Pro 7 days, Business 14 days. Re-fetch the raw ZIP with GET /v1/parse/jobs/{id}/download, which returns 410 EXPIRED once the window has passed. DELETE /v1/parse/jobs/{id} purges a result on demand (self-serve GDPR deletion). Anonymous web uploads always use the 6-hour window.
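Before calling the download endpoint, a client can estimate whether the result is still inside its plan's window. A minimal Python sketch; it assumes the window is counted from job completion (this page does not say), `result_still_downloadable` is a hypothetical helper, and a 410 EXPIRED from the server is always the final word.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Retention windows listed on this page (Free applies to web uploads).
RETENTION = {
    "free": timedelta(hours=6),
    "pro": timedelta(days=7),
    "business": timedelta(days=14),
}


def result_still_downloadable(
    plan: str, completed_at: datetime, now: Optional[datetime] = None
) -> bool:
    """Estimate whether GET /v1/parse/jobs/{id}/download should still succeed.

    completed_at must be timezone-aware. Assumption: the window starts at completion.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - completed_at < RETENTION[plan.lower()]
```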
Quotas & Limits
Pro: 1,000 pages/mo. Business: 5,000 pages/mo. Both tiers: 30 requests/minute per key. PDFs up to 100MB (sync ≤10 pages, async ≤50 pages); images (jpg/png/webp) up to 20MB, each counting as 1 page. These are hard caps with no overage in v1: once the monthly page cap is reached, requests return 429 QUOTA_EXCEEDED.
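To stay under the 30 requests/minute limit without triggering 429 RATE_LIMIT_EXCEEDED, a client can throttle itself. A minimal sliding-window sketch in Python; `SlidingWindowLimiter` is a hypothetical helper, not part of any BetaPDF SDK, and because the limit applies per key you would keep one instance per key.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Client-side throttle: at most `limit` calls in any `window`-second span.

    The server's 429 RATE_LIMIT_EXCEEDED remains authoritative.
    """

    def __init__(
        self,
        limit: int = 30,
        window: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sleep = sleep
        self.calls: Deque[float] = deque()  # timestamps of recent requests

    def acquire(self) -> None:
        """Block until another request may be sent, then record it."""
        while True:
            now = self.clock()
            # Drop timestamps that have left the window
            while self.calls and now - self.calls[0] >= self.window:
                self.calls.popleft()
            if len(self.calls) < self.limit:
                self.calls.append(now)
                return
            # Wait until the oldest request leaves the window
            self.sleep(self.window - (now - self.calls[0]))
```

Call `limiter.acquire()` immediately before each API request.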
Errors
| HTTP | Code | Description |
|---|---|---|
| 401 | UNAUTHENTICATED | Missing Authorization header |
| 401 | INVALID_API_KEY | Invalid or revoked key |
| 403 | PLAN_REQUIRED | API requires Pro or Business plan |
| 403 | PLAN_EXPIRED | Admin-granted plan has expired; renew to continue |
| 403 | API_NOT_AVAILABLE_FOR_TOOL | Tool not exposed via API |
| 408 | SYNC_TIMEOUT | Sync request exceeded 60s; fetch via /jobs/{id}/result |
| 410 | EXPIRED | Result expired past your plan's retention window |
| 413 | LIMIT_FILE_SIZE | File exceeds the size limit (100MB for PDFs, 20MB for images) |
| 413 | TOO_MANY_PAGES_FOR_SYNC | Use POST /v1/parse/jobs for > 10 pages |
| 413 | TOO_MANY_PAGES | File exceeds 50-page hard cap; split first |
| 415 | UNSUPPORTED_FORMAT | Only PDF and jpg/png/webp images are accepted |
| 429 | RATE_LIMIT_EXCEEDED | More than 30 req/min per key |
| 429 | QUOTA_EXCEEDED | Monthly page cap reached |
| 500 | JOB_FAILED | Processing error |
| 503 | DISK_FULL | Service temporarily unavailable while freeing storage; retry shortly |
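Several of these errors call for different client reactions: some are worth retrying, some mean switching endpoints, and the rest need a human. A minimal Python sketch of that mapping, assuming you have already read the HTTP status and the error code string from the response (the error body format is not shown on this page); `next_action` and its return values are hypothetical.

```python
# Transient errors where waiting and retrying can succeed.
RETRYABLE = {
    (429, "RATE_LIMIT_EXCEEDED"),  # back off, then retry
    (503, "DISK_FULL"),            # "retry shortly"
}


def next_action(status: int, code: str) -> str:
    """Map an (HTTP status, error code) pair from the table to a client action."""
    if (status, code) in RETRYABLE:
        return "retry"
    if (status, code) == (408, "SYNC_TIMEOUT"):
        # Assumes the 408 response identifies the job so the result can be fetched
        return "fetch_job_result"  # GET /v1/parse/jobs/{id}/result
    if (status, code) == (413, "TOO_MANY_PAGES_FOR_SYNC"):
        return "use_async"  # POST /v1/parse/jobs
    # Auth, plan, size, format, expiry, and QUOTA_EXCEEDED (a hard monthly cap)
    # will not succeed on a blind retry.
    return "fail"
```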
Why BetaPDF
- ⚡ 15× faster than the previous generation. ~22-30s for a 9-page Vietnamese PDF (vLLM on GB10).
- 🇻🇳 99.7% Vietnamese diacritic accuracy on both digital and 300 DPI scanned PDFs.
- 💰 From $9.99/mo (Pro) or $29.99/mo (Business). Compare to Landing AI ADE Team at $250/mo.
- 🔁 Both sync and async endpoints. Same JSON shape (markdown + chunks + metadata) on both.