Overview
CAPA's backend is a FastAPI application that loads a trained PyTorch model on startup and exposes a REST API for inference. It runs inside Docker on HuggingFace Spaces — always-on, CPU-only, free tier.
The model pipeline: donor and recipient HLA alleles are resolved to protein sequences, embedded with frozen ESM-2 vectors (or looked up from a pre-built cache), passed through a cross-attention interaction network, and decoded by a DeepHit competing-risks survival head that jointly predicts the time-to-event distribution for GvHD, relapse, and TRM.
Base URL: https://coconutmocha-capa.hf.space
No authentication. CORS is open (*). Rate limiting may apply under heavy load.
What the model returns
For every prediction you get three cumulative incidence functions (CIFs), one per competing event, evaluated at 100 evenly spaced time points from 0 to 730 days post-transplant. Each CIF is a monotone non-decreasing curve in [0, 1]. The scalar risk_score is simply cif[-1], the last value of the curve: the estimated probability of the event occurring within two years.
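The following is a minimal sketch of how such a curve could be read on the client side. It assumes the 100 time points form a uniform grid from 0 to 730 days and that linear interpolation between grid points is acceptable for your purposes. The helper names `time_grid` and `cif_at_day` are hypothetical and not part of the API, and the curve used here is synthetic.

```python
from bisect import bisect_right

HORIZON_DAYS = 730.0
N_POINTS = 100


def time_grid(n_points=N_POINTS, horizon=HORIZON_DAYS):
    """Evenly spaced day values from 0 to `horizon`, inclusive."""
    step = horizon / (n_points - 1)
    return [i * step for i in range(n_points)]


def cif_at_day(time_points, cif, day):
    """Linearly interpolate a CIF curve at an arbitrary day (clamped to the grid)."""
    if len(time_points) != len(cif):
        raise ValueError("time_points and cif must have the same length")
    if day <= time_points[0]:
        return cif[0]
    if day >= time_points[-1]:
        return cif[-1]
    hi = bisect_right(time_points, day)  # first grid point strictly after `day`
    lo = hi - 1
    t0, t1 = time_points[lo], time_points[hi]
    frac = (day - t0) / (t1 - t0)
    return cif[lo] + frac * (cif[hi] - cif[lo])


# Synthetic straight-line curve for illustration only; real values come from /predict.
grid = time_grid()
toy_cif = [0.4 * t / HORIZON_DAYS for t in grid]
risk_score = toy_cif[-1]  # same definition as the API's risk_score
day_100 = cif_at_day(grid, toy_cif, 100.0)
```

With a real response you would pass the returned time_points and cumulative_incidence arrays instead of the synthetic ones.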
Quick start
The fastest way to try the API is a single curl call. You need at least one HLA locus for both donor and recipient.
curl https://coconutmocha-capa.hf.space/health
curl -s -X POST https://coconutmocha-capa.hf.space/predict \
  -H "Content-Type: application/json" \
  -d '{
    "donor_hla": { "A": "A*02:01", "B": "B*07:02", "DRB1": "DRB1*15:01" },
    "recipient_hla": { "A": "A*01:01", "B": "B*08:01", "DRB1": "DRB1*03:01" }
  }'

import requests

payload = {
    "donor_hla": {
        "A": "A*02:01", "B": "B*07:02", "C": "C*07:02",
        "DRB1": "DRB1*15:01", "DQB1": "DQB1*06:02",
    },
    "recipient_hla": {
        "A": "A*01:01", "B": "B*08:01", "C": "C*07:01",
        "DRB1": "DRB1*03:01", "DQB1": "DQB1*02:01",
    },
    "clinical": {
        "age_recipient": 12,
        "age_donor": 34,
        "disease": "ALL",
        "conditioning": "MAC",
        "donor_type": "MUD",
        "stem_cell_source": "BM",
        "sex_mismatch": 0,
    },
}

r = requests.post(
    "https://coconutmocha-capa.hf.space/predict",
    json=payload,
    timeout=30,
)
r.raise_for_status()
data = r.json()

print(f"GvHD 2-yr risk: {data['gvhd']['risk_score']:.3f}")
print(f"Relapse 2-yr risk: {data['relapse']['risk_score']:.3f}")
print(f"TRM 2-yr risk: {data['trm']['risk_score']:.3f}")
print(f"Mismatches: {data['mismatch_count']}")

const res = await fetch("https://coconutmocha-capa.hf.space/predict", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    donor_hla: { A: "A*02:01", DRB1: "DRB1*15:01" },
    recipient_hla: { A: "A*01:01", DRB1: "DRB1*03:01" },
  }),
});
const data = await res.json();
console.log(data.gvhd.risk_score); // e.g. 0.331
Endpoints
/health
Always returns HTTP 200. Check the ready field to know whether the model loaded successfully. In scripts, call this before sending predictions so that a model that failed to load does not surface later as a 503 from /predict.
{
"status": "ok",
"model_version": "model",
"ready": true, // false if checkpoint missing or corrupt
"startup_error": null, // human-readable error string if ready=false
"uptime_seconds": 1382.4,
"device": "cpu"
}
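Because the status code is 200 even when the model failed to load, a script has to inspect the body. The sketch below polls /health until ready is true. It assumes that a non-null startup_error is terminal and that a few retries are enough for a cold start; the function names, retry count, and delay are illustrative choices, not part of the API.

```python
import json
import time
import urllib.request

BASE_URL = "https://coconutmocha-capa.hf.space"


def fetch_health(base_url=BASE_URL, timeout=10):
    """GET /health and return the parsed JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.load(resp)


def wait_until_ready(fetch=fetch_health, attempts=5, delay=2.0, sleep=time.sleep):
    """Poll /health until `ready` is true; raise if a startup error is reported."""
    for attempt in range(attempts):
        health = fetch()
        if health.get("ready"):
            return health
        if health.get("startup_error"):
            # The status code is 200 even here, so the body is the only signal.
            raise RuntimeError(f"model failed to load: {health['startup_error']}")
        if attempt < attempts - 1:
            sleep(delay)
    raise TimeoutError(f"model not ready after {attempts} attempts")
```

Calling `wait_until_ready()` once at the top of a batch script is usually enough. The `fetch` and `sleep` parameters exist only so the loop can be exercised without network access.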
/predict
Takes donor HLA typing, recipient HLA typing, and optional clinical covariates. Returns competing-risk CIF curves for GvHD, relapse, and TRM. Requires at least one HLA locus on each side; missing loci are filled with zero embeddings.
Returns 503 if the model checkpoint has not loaded; 422 if the donor or the recipient typing contains no HLA locus.
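A client can turn these two status codes into distinct errors so that a retryable condition (503) is not confused with a bad request (422). The sketch below uses only the standard library. The exception classes and the `interpret` helper are hypothetical names, and the shape of the error body is not documented here, so it is passed through as-is.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "https://coconutmocha-capa.hf.space"


class ModelNotReady(Exception):
    """Raised on HTTP 503: the checkpoint has not loaded."""


class InvalidRequest(Exception):
    """Raised on HTTP 422: the request body failed validation."""


def interpret(status, body):
    """Map a /predict status code to a result or a descriptive exception."""
    if status == 200:
        return body
    if status == 503:
        raise ModelNotReady("model checkpoint not loaded; check /health")
    if status == 422:
        raise InvalidRequest(f"request rejected: {body}")
    raise RuntimeError(f"unexpected HTTP {status}: {body}")


def predict(payload, base_url=BASE_URL, timeout=30):
    """POST a payload to /predict and return the parsed prediction."""
    req = urllib.request.Request(
        f"{base_url}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return interpret(resp.status, json.load(resp))
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx; the error object still carries the body.
        raw = err.read().decode("utf-8", errors="replace")
        return interpret(err.code, raw)
```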
/compare
Accepts a recipient and a list of 2–20 candidate donors with optional labels. Runs /predict for each donor-recipient pair and returns the donors ranked by composite acute-risk score (GvHD risk + TRM risk, lower is better). Useful for donor selection simulations.
Request format
All endpoints accept and return JSON (Content-Type: application/json).
HLA typing object
All fields are optional strings. Supply whichever loci you have — the model runs on any subset. Use standard IMGT allele notation (A*02:01, not A2).
| Field | Type | Example | Notes |
|---|---|---|---|
| A | string | "A*02:01" | HLA-A allele |
| B | string | "B*07:02" | HLA-B allele |
| C | string | "C*07:02" | HLA-C allele |
| DRB1 | string | "DRB1*15:01" | HLA-DRB1 allele |
| DQB1 | string | "DQB1*06:02" | HLA-DQB1 allele |
| DPB1 | string | "DPB1*04:01" | HLA-DPB1 (optional 6th locus) |
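Checking the notation before sending a request can catch serological names such as A2 early. The sketch below is a client-side check only. The regular expression is an assumption about what counts as well-formed (locus, an asterisk, two to four colon-separated numeric fields, and an optional expression suffix); the server's own validation rules are not documented here and may differ. `clean_typing` is a hypothetical helper name.

```python
import re

LOCI = ("A", "B", "C", "DRB1", "DQB1", "DPB1")

# Assumed pattern: e.g. A*02:01 or DRB1*15:01:01, optionally ending in an
# expression suffix letter. Serological names such as "A2" do not match.
ALLELE_RE = re.compile(r"^(?P<locus>[A-Z0-9]+)\*\d{2,3}(?::\d{2,3}){1,3}[NLSCAQ]?$")


def clean_typing(typing):
    """Drop empty loci and check the remaining alleles; returns a new dict."""
    cleaned = {}
    for locus, allele in typing.items():
        if allele is None or not allele.strip():
            continue  # omitted loci are allowed
        if locus not in LOCI:
            raise ValueError(f"unknown locus: {locus!r}")
        allele = allele.strip()
        match = ALLELE_RE.match(allele)
        if match is None or match.group("locus") != locus:
            raise ValueError(f"{allele!r} is not a valid allele for locus {locus}")
        cleaned[locus] = allele
    if not cleaned:
        raise ValueError("at least one HLA locus is required")
    return cleaned
```

For example, `clean_typing({"A": "A*02:01", "B": None})` keeps only the A locus, while `clean_typing({"A": "A2"})` raises a ValueError.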
Clinical covariates object (optional)
All fields are optional. Missing values are imputed with zeros or the "unknown" category — the model will still run, just with less information.
| Field | Type | Example | Notes |
|---|---|---|---|
| age_recipient | number | 12 | Years |
| age_donor | number | 34 | Years |
| cd34_dose | number | 5.2 | ×10⁶ cells/kg |
| sex_mismatch | 0 or 1 | 1 | 1 = donor/recipient sex differ |
| disease | string | "ALL" | ALL · AML · CML · MDS · NHL · HD · AA · MM · other |
| conditioning | string | "MAC" | MAC · RIC · NMA |
| donor_type | string | "MUD" | MSD · MUD · MMUD · haplo · cord |
| stem_cell_source | string | "BM" | BM · PBSC · cord |
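The categories in the table can be checked on the client before a request is sent. This is a sketch under assumptions: it treats the category strings as case-sensitive, rejects negative numbers, and rejects field names not listed in the table, none of which is confirmed server behaviour. `clean_clinical` is a hypothetical helper name.

```python
ALLOWED = {
    "disease": {"ALL", "AML", "CML", "MDS", "NHL", "HD", "AA", "MM", "other"},
    "conditioning": {"MAC", "RIC", "NMA"},
    "donor_type": {"MSD", "MUD", "MMUD", "haplo", "cord"},
    "stem_cell_source": {"BM", "PBSC", "cord"},
}
NUMERIC = ("age_recipient", "age_donor", "cd34_dose")


def clean_clinical(clinical):
    """Drop None values and check the rest against the documented fields."""
    cleaned = {}
    for key, value in clinical.items():
        if value is None:
            continue  # missing values are imputed by the server
        if key in ALLOWED:
            if value not in ALLOWED[key]:
                raise ValueError(f"{key}: {value!r} not in {sorted(ALLOWED[key])}")
        elif key in NUMERIC:
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not is_number or value < 0:
                raise ValueError(f"{key}: expected a non-negative number")
        elif key == "sex_mismatch":
            if value not in (0, 1):
                raise ValueError("sex_mismatch must be 0 or 1")
            value = int(value)
        else:
            raise ValueError(f"unknown clinical field: {key!r}")
        cleaned[key] = value
    return cleaned
```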
Full /predict request body
{
"donor_hla": { // required — at least one locus
"A": "A*02:01",
"B": "B*07:02",
"C": "C*07:02",
"DRB1": "DRB1*15:01",
"DQB1": "DQB1*06:02"
},
"recipient_hla": { // required — at least one locus
"A": "A*01:01",
"B": "B*08:01",
"C": "C*07:01",
"DRB1": "DRB1*03:01",
"DQB1": "DQB1*02:01"
},
"clinical": { // optional — all sub-fields optional
"age_recipient": 12,
"age_donor": 34,
"disease": "ALL",
"conditioning": "MAC",
"donor_type": "MUD",
"stem_cell_source": "BM",
"sex_mismatch": 0
}
}
Response format
/predict response
{
"gvhd": {
"cumulative_incidence": [0.0, 0.003, …, 0.331], // 100 values, 0–730 days
"risk_score": 0.331, // = CIF at day 730
"time_points": [0.0, 7.37, …, 730.0] // 100 day values
},
"relapse": { /* same shape */ },
"trm": { /* same shape */ },
"attention_weights": [ // n_loci × n_loci matrix
[0.42, 0.12, 0.11, 0.21, 0.14],
…
],
"mismatch_count": 3, // allele-level loci mismatches
"model_version": "model"
}
| Field | Type | Description |
|---|---|---|
| gvhd / relapse / trm | object | Competing-risk event block — CIF array, risk score, time points |
| cumulative_incidence | float[100] | Monotone CIF values in [0, 1] at each of the 100 time points |
| risk_score | float | 2-year cumulative incidence probability (CIF at day 730) |
| time_points | float[100] | Day values from 0 to 730 corresponding to each CIF entry |
| attention_weights | float[][] or null | n_loci × n_loci cross-attention matrix (donor→recipient, last layer) |
| mismatch_count | int | Number of loci where donor and recipient alleles differ |
| model_version | string | Identifier of the loaded checkpoint |
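The invariants in the table lend themselves to a quick client-side sanity check. The sketch below verifies the array lengths, the [0, 1] range, monotonicity, and that risk_score equals the final CIF value. The tolerance of 1e-6 is an assumption (the server may round values differently), and `check_prediction` is a hypothetical helper name.

```python
EVENTS = ("gvhd", "relapse", "trm")


def check_prediction(pred, n_points=100, tol=1e-6):
    """Return {event: risk_score} after verifying the documented invariants."""
    risks = {}
    for event in EVENTS:
        block = pred[event]
        cif = block["cumulative_incidence"]
        times = block["time_points"]
        if len(cif) != n_points or len(times) != n_points:
            raise ValueError(f"{event}: expected {n_points} points")
        if any(not 0.0 <= v <= 1.0 for v in cif):
            raise ValueError(f"{event}: CIF outside [0, 1]")
        if any(b < a - tol for a, b in zip(cif, cif[1:])):
            raise ValueError(f"{event}: CIF is not monotone non-decreasing")
        if abs(block["risk_score"] - cif[-1]) > tol:
            raise ValueError(f"{event}: risk_score differs from final CIF value")
        risks[event] = block["risk_score"]
    return risks
```

Passing the parsed /predict response to `check_prediction` returns the three risk scores keyed by event name, or raises if the response does not look as documented.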
/compare response
Returns all donors ranked by ascending composite score (GvHD risk + TRM risk). The best donor is first and identified by best_donor_label.
{
"donors": [
{
"label": "Donor A",
"rank": 1,
"gvhd_risk": 0.28,
"relapse_risk": 0.31,
"trm_risk": 0.19,
"mismatch_count": 1,
"full_prediction": { /* full /predict response */ }
},
…
],
"best_donor_label": "Donor A",
"model_version": "model"
}
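The ranking rule can be reproduced on the client, for instance to re-rank with a different criterion. The sketch below sorts donor entries by gvhd_risk + trm_risk in ascending order. How the server breaks ties is not documented; this version keeps the input order for equal scores. The function names and the sample values are illustrative.

```python
def composite_score(donor):
    """Composite acute-risk score: GvHD risk plus TRM risk (lower is better)."""
    return donor["gvhd_risk"] + donor["trm_risk"]


def rank_donors(donors):
    """Return copies of the donor entries sorted by composite score, with ranks."""
    ordered = sorted(donors, key=composite_score)  # stable: ties keep input order
    return [{**donor, "rank": i} for i, donor in enumerate(ordered, start=1)]


# Made-up values for illustration; relapse_risk does not affect the ranking.
sample = [
    {"label": "Donor A", "gvhd_risk": 0.28, "relapse_risk": 0.31, "trm_risk": 0.19},
    {"label": "Donor B", "gvhd_risk": 0.35, "relapse_risk": 0.22, "trm_risk": 0.10},
    {"label": "Donor C", "gvhd_risk": 0.40, "relapse_risk": 0.15, "trm_risk": 0.20},
]
ranked = rank_donors(sample)
best_donor_label = ranked[0]["label"]
```

In this sample Donor B ranks first despite a higher GvHD risk than Donor A, because its lower TRM risk gives it the smaller composite score.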
Run locally
Two options: Docker (self-contained, matches the HF Space exactly) or uv (faster iteration during development).
Option A — Docker
- Clone the backend repo
  The Docker image lives in the HuggingFace Space repository, not the main GitHub repo.
  git clone https://huggingface.co/spaces/coconutmocha/capa capa-backend
  cd capa-backend
- Build the image
  docker build -t capa-backend .
- Run the container
  docker run -p 7860:7860 capa-backend
  The server starts on http://localhost:7860. Visit /health to confirm it's up.
Option B — uv (development)
- Clone the backend repo and install
  git clone https://huggingface.co/spaces/coconutmocha/capa capa-backend
  cd capa-backend
  uv sync
  Requires Python 3.11+ and uv. Install uv with:
  curl -LsSf https://astral.sh/uv/install.sh | sh
- Checkpoint is already included
  The repo ships with a bundled checkpoint at runs/best/model.pt. The server finds it automatically, so no extra config is needed. To override, set CAPA_CHECKPOINT to any valid .pt path.
- Start the server
  uv run uvicorn capa.api.predict:app \
    --reload --host 0.0.0.0 --port 8000
  The API is now at http://localhost:8000. The --reload flag restarts the server on code changes.
- Point the frontend at your local server
  Edit web/config.js and change the apiUrl:
  window.CAPA_CONFIG = { apiUrl: 'http://localhost:8000' };
  Then open web/predict.html in a browser. The prediction UI now calls your local server.
Without a pre-built HDF5 embedding cache the model uses zero vectors for unknown alleles and logs a warning. This is fine for smoke-testing. To build the cache with real ESM-2 embeddings, run uv run python scripts/preprocess.py — this downloads IPD-IMGT/HLA sequences and runs ESM-2 inference (requires ~4 GB RAM and takes a few minutes on CPU).
Configuration
All runtime settings are controlled via environment variables. No config files to edit — set variables before starting the server or pass them to docker run -e.
| Variable | Default | Description |
|---|---|---|
| CAPA_CHECKPOINT | runs/best/model.pt | Absolute or relative path to the .pt checkpoint file. |
| CAPA_EMBED__CACHE_PATH | data/processed/hla_embeddings.h5 | HDF5 embedding cache. If absent, zero vectors are used for all alleles. |
| CAPA_EMBED__DEVICE | cpu | PyTorch device for ESM-2 inference: cpu, cuda, or mps. |
| CAPA_CORS_ORIGINS | * | Comma-separated list of allowed CORS origins, e.g. https://your-frontend.vercel.app. |
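To make the table concrete, the sketch below shows one way these variables and their defaults could be resolved into a settings structure. It is not the backend's actual loader. It assumes the double underscore in CAPA_EMBED__CACHE_PATH denotes a nested "embed" group, and `load_settings` is a hypothetical name.

```python
import os

DEFAULTS = {
    "CAPA_CHECKPOINT": "runs/best/model.pt",
    "CAPA_EMBED__CACHE_PATH": "data/processed/hla_embeddings.h5",
    "CAPA_EMBED__DEVICE": "cpu",
    "CAPA_CORS_ORIGINS": "*",
}


def load_settings(env=None):
    """Resolve the documented variables from `env`, falling back to defaults."""
    env = os.environ if env is None else env
    raw = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    device = raw["CAPA_EMBED__DEVICE"]
    if device not in ("cpu", "cuda", "mps"):
        raise ValueError(f"unsupported device: {device!r}")
    # Comma-separated origins become a list; "*" stays a single wildcard entry.
    origins = [o.strip() for o in raw["CAPA_CORS_ORIGINS"].split(",") if o.strip()]
    return {
        "checkpoint": raw["CAPA_CHECKPOINT"],
        "embed": {"cache_path": raw["CAPA_EMBED__CACHE_PATH"], "device": device},
        "cors_origins": origins,
    }
```

Calling `load_settings({})` returns the defaults from the table, which can be handy for checking what a container started without any -e flags would use.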
Switching between the live and local backend
The frontend reads web/config.js at runtime. Change apiUrl there — no rebuild needed, just refresh the page.
window.CAPA_CONFIG = {
// Live HF Space (production)
apiUrl: 'https://coconutmocha-capa.hf.space'
// Local dev server
// apiUrl: 'http://localhost:8000'
};