API Docs — CAPA

Overview

CAPA's backend is a FastAPI application that loads a trained PyTorch model on startup and exposes a REST API for inference. It runs inside Docker on HuggingFace Spaces — always-on, CPU-only, free tier.

The model pipeline: donor and recipient HLA alleles are resolved to protein sequences, embedded with frozen ESM-2 vectors (or looked up from a pre-built cache), passed through a cross-attention interaction network, and decoded by a DeepHit competing-risks survival head that jointly predicts the time-to-event distribution for graft-versus-host disease (GvHD), relapse, and transplant-related mortality (TRM).

Live endpoint

Base URL: https://coconutmocha-capa.hf.space
No authentication. CORS is open (*). Rate limiting may apply under heavy load.

What the model returns

For every prediction you get three cumulative incidence functions (CIFs), one per competing event, evaluated at 100 evenly spaced time points from 0 to 730 days post-transplant. Each CIF is a monotone non-decreasing curve in [0, 1]. The scalar risk_score is the last value of that curve (cumulative_incidence[-1]): the estimated probability of the event occurring within two years.
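The time grid and the risk score described here can be sketched in a few lines. This is an illustration only: the curve below is made up rather than model output, and `cif_at_day` is a hypothetical helper, not part of the API.

```python
import math
from bisect import bisect_right

N_POINTS, HORIZON = 100, 730.0
# 100 evenly spaced days from 0 to 730 (step of roughly 7.37 days).
time_points = [HORIZON * i / (N_POINTS - 1) for i in range(N_POINTS)]

# Made-up monotone curve for illustration only (not model output).
scale = 0.331 / (1.0 - math.exp(-HORIZON / 200.0))
cif = [scale * (1.0 - math.exp(-t / 200.0)) for t in time_points]

def cif_at_day(cif, time_points, day):
    """Linearly interpolate the CIF at a day inside the grid range."""
    if not time_points[0] <= day <= time_points[-1]:
        raise ValueError("day is outside the prediction horizon")
    hi = min(bisect_right(time_points, day), len(time_points) - 1)
    lo = hi - 1
    frac = (day - time_points[lo]) / (time_points[hi] - time_points[lo])
    return cif[lo] + frac * (cif[hi] - cif[lo])

risk_score = cif[-1]                               # CIF at day 730
one_year = cif_at_day(cif, time_points, 365.0)     # CIF read off at day 365
print(f"2-year risk: {risk_score:.3f}, 1-year risk: {one_year:.3f}")
```

Linear interpolation between grid points is a convenience for reading intermediate days; the model itself only reports values on the 100-point grid.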

Quick start

The fastest way to try the API is a single curl call. You need at least one HLA locus for both donor and recipient.

Shell — check server health

curl https://coconutmocha-capa.hf.space/health

Shell — minimal prediction (3 mismatched loci)

curl -s -X POST https://coconutmocha-capa.hf.space/predict \
  -H "Content-Type: application/json" \
  -d '{
    "donor_hla":     { "A": "A*02:01", "B": "B*07:02", "DRB1": "DRB1*15:01" },
    "recipient_hla": { "A": "A*01:01", "B": "B*08:01", "DRB1": "DRB1*03:01" }
  }'

Python — full example with clinical covariates

import requests

payload = {
    "donor_hla": {
        "A": "A*02:01", "B": "B*07:02", "C": "C*07:02",
        "DRB1": "DRB1*15:01", "DQB1": "DQB1*06:02"
    },
    "recipient_hla": {
        "A": "A*01:01", "B": "B*08:01", "C": "C*07:01",
        "DRB1": "DRB1*03:01", "DQB1": "DQB1*02:01"
    },
    "clinical": {
        "age_recipient": 12,
        "age_donor": 34,
        "disease": "ALL",
        "conditioning": "MAC",
        "donor_type": "MUD",
        "stem_cell_source": "BM",
        "sex_mismatch": 0
    }
}

r = requests.post(
    "https://coconutmocha-capa.hf.space/predict",
    json=payload,
    timeout=30,
)
r.raise_for_status()
data = r.json()

print(f"GvHD 2-yr risk:    {data['gvhd']['risk_score']:.3f}")
print(f"Relapse 2-yr risk: {data['relapse']['risk_score']:.3f}")
print(f"TRM 2-yr risk:     {data['trm']['risk_score']:.3f}")
print(f"Mismatches:        {data['mismatch_count']}")

JavaScript (fetch)

const res = await fetch("https://coconutmocha-capa.hf.space/predict", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    donor_hla:     { A: "A*02:01", DRB1: "DRB1*15:01" },
    recipient_hla: { A: "A*01:01", DRB1: "DRB1*03:01" },
  }),
});
const data = await res.json();
console.log(data.gvhd.risk_score);  // e.g. 0.331

Endpoints

GET /health Liveness + readiness probe

Always returns HTTP 200. Check the ready field to know if the model loaded successfully. Use this before sending predictions in scripts to avoid silent 503s.

Response

{
  "status":         "ok",
  "model_version":  "model",
  "ready":          true,      // false if checkpoint missing or corrupt
  "startup_error":  null,      // human-readable error string if ready=false
  "uptime_seconds": 1382.4,
  "device":         "cpu"
}
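A script can gate its predictions on the ready field. The sketch below assumes that polling a few times is reasonable while the container starts; `wait_until_ready` and its parameters are hypothetical names, and the fetch function is injectable so the logic can be exercised without network access.

```python
import json
import time
import urllib.request

BASE_URL = "https://coconutmocha-capa.hf.space"

def fetch_health(base_url=BASE_URL, timeout=10):
    """GET /health and return the decoded JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.load(resp)

def wait_until_ready(fetch=fetch_health, attempts=5, delay=2.0):
    """Poll until `ready` is true; fail fast if a startup error is reported."""
    for _ in range(attempts):
        health = fetch()
        if health.get("ready"):
            return health
        if health.get("startup_error"):
            raise RuntimeError(f"model failed to load: {health['startup_error']}")
        time.sleep(delay)
    raise TimeoutError("server did not become ready in time")
```

Calling `wait_until_ready()` before the first /predict request turns a later 503 into an immediate, descriptive error.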

POST /predict Single donor–recipient pair

Takes donor HLA typing, recipient HLA typing, and optional clinical covariates. Returns competing-risk CIF curves for GvHD, relapse, and TRM. Requires at least one HLA locus on each side; missing loci are filled with zero embeddings.

Returns 503 if the model checkpoint has not loaded; 422 if the donor or the recipient has no HLA locus.

POST /compare Rank multiple donors for one recipient

Accepts a recipient and a list of 2–20 candidate donors with optional labels. Runs /predict for each pair and returns them ranked by composite acute-risk score (GvHD + TRM, lower is better). Useful for donor selection simulations.

Request format

All endpoints accept and return JSON (Content-Type: application/json).

HLA typing object

All fields are optional strings. Supply whichever loci you have — the model runs on any subset. Use standard IMGT allele notation (A*02:01, not A2).

Field	Type	Example	Notes
A	string	`"A*02:01"`	HLA-A allele
B	string	`"B*07:02"`	HLA-B allele
C	string	`"C*07:02"`	HLA-C allele
DRB1	string	`"DRB1*15:01"`	HLA-DRB1 allele
DQB1	string	`"DQB1*06:02"`	HLA-DQB1 allele
DPB1	string	`"DPB1*04:01"`	HLA-DPB1 (optional 6th locus)
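A client can catch notation mistakes such as A2 before sending a request. The following is a minimal sketch: the regular expression is an assumption that approximates colon-separated allele names, and the server's own validation may be stricter or looser. `check_hla` is a hypothetical helper.

```python
import re

LOCI = ("A", "B", "C", "DRB1", "DQB1", "DPB1")
# Assumption: locus, "*", then 2 to 4 colon-separated numeric fields,
# optionally followed by a single expression-suffix letter.
ALLELE_RE = re.compile(r"^(A|B|C|DRB1|DQB1|DPB1)\*\d{2,4}(:\d{2,4}){1,3}[A-Z]?$")

def check_hla(typing):
    """Return a list of problems found in one HLA typing object."""
    problems = []
    if not typing:
        problems.append("at least one locus is required")
    for locus, allele in typing.items():
        if locus not in LOCI:
            problems.append(f"unknown locus {locus!r}")
            continue
        match = ALLELE_RE.match(allele) if isinstance(allele, str) else None
        if match is None:
            problems.append(f"{locus}: {allele!r} is not in IMGT notation")
        elif match.group(1) != locus:
            problems.append(f"{locus}: allele {allele!r} belongs to locus {match.group(1)}")
    return problems

print(check_hla({"A": "A*02:01", "DRB1": "DRB1*15:01"}))
print(check_hla({"A": "A2"}))
```

An empty list means the typing object passed these client-side checks; it does not guarantee that the allele exists in the server's embedding cache.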

Clinical covariates object (optional)

All fields are optional. Missing values are imputed with zeros or the "unknown" category — the model will still run, just with less information.

Field	Type	Example	Notes
age_recipient	number	`12`	Years
age_donor	number	`34`	Years
cd34_dose	number	`5.2`	×10⁶ cells/kg
sex_mismatch	0 or 1	`1`	1 = donor/recipient sex differ
disease	string	`"ALL"`	`ALL` · `AML` · `CML` · `MDS` · `NHL` · `HD` · `AA` · `MM` · `other`
conditioning	string	`"MAC"`	`MAC` · `RIC` · `NMA`
donor_type	string	`"MUD"`	`MSD` · `MUD` · `MMUD` · `haplo` · `cord`
stem_cell_source	string	`"BM"`	`BM` · `PBSC` · `cord`

Full /predict request body

JSON schema

{
  "donor_hla": {          // required — at least one locus
    "A":    "A*02:01",
    "B":    "B*07:02",
    "C":    "C*07:02",
    "DRB1": "DRB1*15:01",
    "DQB1": "DQB1*06:02"
  },
  "recipient_hla": {      // required — at least one locus
    "A":    "A*01:01",
    "B":    "B*08:01",
    "C":    "C*07:01",
    "DRB1": "DRB1*03:01",
    "DQB1": "DQB1*02:01"
  },
  "clinical": {            // optional — all sub-fields optional
    "age_recipient":    12,
    "age_donor":        34,
    "disease":          "ALL",
    "conditioning":     "MAC",
    "donor_type":       "MUD",
    "stem_cell_source": "BM",
    "sex_mismatch":     0
  }
}

Response format

/predict response

JSON

{
  "gvhd": {
    "cumulative_incidence": [0.0, 0.003, …, 0.331],  // 100 values, 0–730 days
    "risk_score":          0.331,                       // = CIF at day 730
    "time_points":         [0.0, 7.37, …, 730.0]        // 100 day values
  },
  "relapse": { /* same shape */ },
  "trm":     { /* same shape */ },
  "attention_weights": [                               // n_loci × n_loci matrix
    [0.42, 0.12, 0.11, 0.21, 0.14],
    …
  ],
  "mismatch_count": 3,                                // allele-level loci mismatches
  "model_version":  "model"
}

Field	Type	Description
gvhd / relapse / trm	object	Competing-risk event block — CIF array, risk score, time points
cumulative_incidence	float[100]	Monotone CIF values in [0, 1] at each of the 100 time points
risk_score	float	2-year cumulative incidence probability (CIF at day 730)
time_points	float[100]	Day values from 0 to 730 corresponding to each CIF entry
attention_weights	float[][] | null	n_loci × n_loci cross-attention matrix (donor→recipient, last layer)
mismatch_count	int	Number of loci where donor and recipient alleles differ
model_version	string	Identifier of the loaded checkpoint
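Once a response is parsed, the documented invariants are easy to check on the client. This sketch uses a tiny hand-made fragment (4 points instead of 100) rather than real model output; `summarize_event` and `top_attention_pair` are hypothetical helper names.

```python
def summarize_event(block, threshold=0.10):
    """Check one event block and find the first day its CIF reaches `threshold`."""
    cif, days = block["cumulative_incidence"], block["time_points"]
    if len(cif) != len(days):
        raise ValueError("CIF and time_points differ in length")
    if any(b < a for a, b in zip(cif, cif[1:])):
        raise ValueError("CIF is not monotone non-decreasing")
    crossing = next((d for d, p in zip(days, cif) if p >= threshold), None)
    return {"risk_score": block["risk_score"], "first_day_at_threshold": crossing}

def top_attention_pair(weights):
    """Return (row, col, weight) of the largest entry, or None if absent."""
    if weights is None:
        return None
    cells = ((i, j, w) for i, row in enumerate(weights) for j, w in enumerate(row))
    return max(cells, key=lambda cell: cell[2])

# Hand-made fragment for illustration only (not model output).
sample = {
    "gvhd": {"cumulative_incidence": [0.0, 0.05, 0.12, 0.20],
             "risk_score": 0.20,
             "time_points": [0.0, 243.3, 486.7, 730.0]},
    "attention_weights": [[0.6, 0.4], [0.3, 0.7]],
}
print(summarize_event(sample["gvhd"]))
print(top_attention_pair(sample["attention_weights"]))
```

Because attention_weights may be null, the helper returns None instead of failing in that case.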

/compare response

Returns all donors ranked by ascending composite score (GvHD risk + TRM risk). The best donor is first and identified by best_donor_label.

JSON

{
  "donors": [
    {
      "label":          "Donor A",
      "rank":           1,
      "gvhd_risk":      0.28,
      "relapse_risk":   0.31,
      "trm_risk":       0.19,
      "mismatch_count": 1,
      "full_prediction": { /* full /predict response */ }
    },
    …
  ],
  "best_donor_label": "Donor A",
  "model_version":    "model"
}
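The ranking rule can be reproduced from the per-donor fields in the response. This is a sketch of the rule as described (GvHD risk + TRM risk, ascending), not the server's code; the risk values for Donor B and Donor C are invented for illustration.

```python
def rank_donors(donors):
    """Sort donors by GvHD risk + TRM risk (ascending) and assign 1-based ranks."""
    ordered = sorted(donors, key=lambda d: d["gvhd_risk"] + d["trm_risk"])
    return [{**d, "rank": i} for i, d in enumerate(ordered, start=1)]

# Illustrative values; only "Donor A" mirrors the sample response.
donors = [
    {"label": "Donor B", "gvhd_risk": 0.35, "trm_risk": 0.22},
    {"label": "Donor A", "gvhd_risk": 0.28, "trm_risk": 0.19},
    {"label": "Donor C", "gvhd_risk": 0.30, "trm_risk": 0.25},
]
ranked = rank_donors(donors)
best_donor_label = ranked[0]["label"]
for d in ranked:
    print(d["rank"], d["label"], round(d["gvhd_risk"] + d["trm_risk"], 2))
```

Note that relapse_risk is reported per donor but does not enter the composite score, so a client that wants a different trade-off can re-rank with its own key function.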

Run locally

Two options: Docker (self-contained, matches the HF Space exactly) or uv (faster iteration during development).

Option A — Docker

Clone the backend repo
The Docker image lives in the HuggingFace Space repository, not the main GitHub repo.
```
git clone https://huggingface.co/spaces/coconutmocha/capa capa-backend
cd capa-backend
```
Build the image
```
docker build -t capa-backend .
```
Run the container
```
docker run -p 7860:7860 capa-backend
```
The server starts on http://localhost:7860. Visit /health to confirm it's up.

Option B — uv (development)

Clone the backend repo and install
```
git clone https://huggingface.co/spaces/coconutmocha/capa capa-backend
cd capa-backend
uv sync
```
Requires Python 3.11+ and uv. Install uv with curl -LsSf https://astral.sh/uv/install.sh | sh.
Checkpoint is already included
The repo ships with a bundled checkpoint at runs/best/model.pt. The server finds it automatically — no extra config needed. To override, set CAPA_CHECKPOINT to any valid .pt path.
Start the server
```
uv run uvicorn capa.api.predict:app \
  --reload --host 0.0.0.0 --port 8000
```
The API is now at http://localhost:8000. The --reload flag restarts on code changes.
Point the frontend at your local server
Edit web/config.js and change the apiUrl:
```
window.CAPA_CONFIG = {
  apiUrl: 'http://localhost:8000'
};
```
Then open web/predict.html in a browser — the prediction UI now calls your local server.

Embedding cache

Without a pre-built HDF5 embedding cache, the model falls back to zero vectors for every allele and logs a warning. This is fine for smoke-testing. To build the cache with real ESM-2 embeddings, run uv run python scripts/preprocess.py. The script downloads IPD-IMGT/HLA sequences and runs ESM-2 inference (requires ~4 GB RAM and takes a few minutes on CPU).
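The zero-vector fallback can be pictured as a lookup with a default. The sketch below is an assumption about the behaviour, not the backend's code: a plain dict stands in for the HDF5 cache, `lookup_embedding` is a hypothetical name, and the 1280 width is the ESM-2 dimension quoted elsewhere in this document.

```python
import logging

EMBED_DIM = 1280  # ESM-2 embedding width quoted elsewhere in this document
log = logging.getLogger("capa.sketch")

def lookup_embedding(allele, cache):
    """Return the cached vector, or a zero vector (with a warning) if missing."""
    vector = cache.get(allele)
    if vector is None:
        log.warning("no embedding for %s; using zero vector", allele)
        return [0.0] * EMBED_DIM
    return vector

cache = {"A*02:01": [0.1] * EMBED_DIM}   # stand-in for the HDF5 cache
known = lookup_embedding("A*02:01", cache)
missing = lookup_embedding("A*99:99", cache)
print(len(known), len(missing), any(missing))
```

With an empty cache every allele takes the fallback path, so all inputs look identical to the network; that is why the zero-vector mode is only suitable for smoke tests.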

Configuration

All runtime settings are controlled via environment variables. No config files to edit — set variables before starting the server or pass them to docker run -e.

Variable	Default	Description
CAPA_CHECKPOINT	`runs/best/model.pt`	Absolute or relative path to the `.pt` checkpoint file.
CAPA_EMBED__CACHE_PATH	`data/processed/hla_embeddings.h5`	HDF5 embedding cache. If absent, zero vectors are used for all alleles.
CAPA_EMBED__DEVICE	`cpu`	PyTorch device for ESM-2 inference: `cpu`, `cuda`, or `mps`.
CAPA_CORS_ORIGINS	`*`	Comma-separated list of allowed CORS origins, e.g. `https://your-frontend.vercel.app`.
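For readers who want to see how these variables could be consumed, here is a minimal sketch that merges the environment over the documented defaults. It is not the backend's actual settings code; `load_settings` is a hypothetical helper.

```python
import os

DEFAULTS = {
    "CAPA_CHECKPOINT": "runs/best/model.pt",
    "CAPA_EMBED__CACHE_PATH": "data/processed/hla_embeddings.h5",
    "CAPA_EMBED__DEVICE": "cpu",
    "CAPA_CORS_ORIGINS": "*",
}

def load_settings(env=None):
    """Merge environment variables over the documented defaults."""
    env = os.environ if env is None else env
    settings = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    # Split the comma-separated origin list and drop stray whitespace.
    settings["CAPA_CORS_ORIGINS"] = [
        origin.strip()
        for origin in settings["CAPA_CORS_ORIGINS"].split(",")
        if origin.strip()
    ]
    return settings

print(load_settings({"CAPA_EMBED__DEVICE": "cuda"}))
```

Passing a dict instead of reading `os.environ` directly keeps the helper easy to test.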

Switching between the live and local backend

The frontend reads web/config.js at runtime. Change apiUrl there — no rebuild needed, just refresh the page.

web/config.js

window.CAPA_CONFIG = {
  // Live HF Space (production)
  apiUrl: 'https://coconutmocha-capa.hf.space'

  // Local dev server
  // apiUrl: 'http://localhost:8000'
};

Baselines & ablations

CAPA ships a full competing-risks baseline suite in capa/model/baselines.py. Every model exposes the same fit / predict_cif → (n, K, T) interface so comparisons are trivial to add.

Statistical baselines

Model	CLI key	What it tests
Fine-Gray	`finegray`	Standard subdistribution hazard regression (IPCW-weighted Cox, Geskus 2011). The clinical paper-review baseline.
Cox PH (cause-specific)	`cox`	Separate Cox model per event; competing events treated as censored. Simpler but statistically approximate.
Random Survival Forest	`rsf`	Non-parametric tree ensemble. Tests whether non-linear tabular interactions explain anything beyond the Cox models.
Gradient Boosting	`gbm`	`GradientBoostingSurvivalAnalysis` (scikit-survival). The strongest tabular ML baseline — what CAPA must beat to claim the embedding is doing real work.
Eplet Proxy (Cox)	`eplet`	Amino acid mismatches at known eplet positions (HLAMatchmaker / PIRCHE-II approximation) as Cox covariates. The actual clinical state-of-the-art; the hardest baseline to beat.

Deep / ablation baselines

Model	CLI key	What it tests
CAPA-OneHot	`capa_onehot`	Full CAPA architecture with learned allele embeddings — no ESM-2. Isolates whether the cross-attention architecture itself is doing work.
CAPA-BLOSUM	`blosum`	BLOSUM62 mean-pool allele embeddings (20-dim) fed into the same cross-attention network. Tests whether biochemical similarity captures what ESM-2 does, or whether evolutionary/structural context is necessary.
CAPA (full)	`capa`	ESM-2 1280-dim embeddings + cross-attention + DeepHit. The proposed model.

Running a comparison

Synthetic smoke-test (no data needed)

# All 8 models, 200 subjects, 50 time bins
uv run python scripts/compare_baselines.py --synthetic

# Specific subset — faster
uv run python scripts/compare_baselines.py --synthetic \
  --models finegray cox gbm eplet capa_onehot blosum capa \
  --n 500 --epochs 30 --n-bootstrap 200

# Skip bootstrap CIs for quick iteration
uv run python scripts/compare_baselines.py --synthetic --n-bootstrap 0

# GBM and RSF require scikit-survival
pip install scikit-survival

Output columns

Model                           C-idx GvHD   C-idx Relapse   C-idx TRM
                                IBS GvHD    IBS Relapse     IBS TRM
────────────────────────────────────────────────────────────────────
Fine-Gray                          0.5431        0.4892       0.5017
                                   0.2801        0.2903       0.2411
Eplet Proxy (cause-specific Cox)   0.6012        0.5234       0.5678
                                   0.2312        0.2701       0.2201
CAPA-BLOSUM (ablation)             0.5892        0.5104       0.5511
                                   0.2401        0.2819       0.2344
CAPA (full)                        0.6423        0.5701       0.6012
                                   0.2101        0.2612       0.2089

Key ablation logic

CAPA vs. CAPA-BLOSUM isolates ESM-2's contribution. CAPA-BLOSUM vs. CAPA-OneHot isolates biochemical priors vs. random initialisation. CAPA vs. Eplet Proxy is the clinical comparison. All three deltas need to be positive and significant for the paper's claims to hold.
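The sign check on the deltas is simple arithmetic. The sketch below uses the C-index values from the example output table in this section; CAPA-OneHot does not appear in that table, so only two of the three comparisons are shown. It checks sign only: significance would additionally require bootstrap confidence intervals, which are not computed here.

```python
# C-index values copied from the example output table (GvHD, relapse, TRM).
c_index = {
    "eplet":  (0.6012, 0.5234, 0.5678),
    "blosum": (0.5892, 0.5104, 0.5511),
    "capa":   (0.6423, 0.5701, 0.6012),
}
EVENTS = ("gvhd", "relapse", "trm")

def deltas(model, reference):
    """Per-event C-index difference: model minus reference."""
    pairs = zip(EVENTS, c_index[model], c_index[reference])
    return {event: round(m - r, 4) for event, m, r in pairs}

for reference in ("blosum", "eplet"):
    d = deltas("capa", reference)
    print(f"capa vs {reference}: {d}  all positive: {all(v > 0 for v in d.values())}")
```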

Evaluation & calibration

All metrics live in capa/training/evaluate.py. The master function evaluate_all(cif, event_times, event_types, ...) returns C-index, Brier scores, IBS, and calibration curves in one call.

Discrimination: C-index

Harrell's concordance index: among comparable patient pairs, the probability that the patient with the higher predicted risk experienced the event earlier. 0.5 = random, 1.0 = perfect. Reported with a 1000-replicate bootstrap 95% CI.
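To make the definition concrete, here is a small pure-Python sketch of Harrell's C with a percentile bootstrap. It is not the code in capa/training/evaluate.py: it handles a single event type, treats everything else as censored, and the function names are hypothetical.

```python
import random

def harrell_c(risk, time, event):
    """Fraction of comparable pairs ordered correctly (risk ties count 0.5)."""
    concordant, comparable = 0.0, 0
    n = len(risk)
    for i in range(n):
        if not event[i]:
            continue  # a pair is anchored on a subject with an observed event
        for j in range(n):
            if time[i] < time[j]:  # i failed first, so i should carry higher risk
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")

def bootstrap_ci(risk, time, event, n_boot=1000, seed=0):
    """Percentile 95% CI from resampling subjects with replacement."""
    rng = random.Random(seed)
    n, stats = len(risk), []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        c = harrell_c([risk[k] for k in idx], [time[k] for k in idx],
                      [event[k] for k in idx])
        if c == c:  # skip replicates with no comparable pairs (NaN)
            stats.append(c)
    stats.sort()
    return stats[int(0.025 * len(stats))], stats[int(0.975 * len(stats)) - 1]

# Toy data: one pair (subjects 0 and 1) is ranked in the wrong order.
risk = [0.7, 0.9, 0.4, 0.2, 0.1]
time = [30, 90, 200, 400, 730]
event = [1, 1, 0, 1, 0]
print(harrell_c(risk, time, event), bootstrap_ci(risk, time, event, n_boot=200))
```

A pair is comparable only when the earlier time belongs to a subject with an observed event, which is why censored subjects never anchor a pair.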

Calibration: Brier score and decomposition

The Brier score measures mean squared error between predicted CIF and observed event indicator. CAPA decomposes it following Murphy (1973):

BS = REL − RES + UNC

Component	Formula	Interpretation
UNC (uncertainty)	ō(1 − ō)	Base-rate difficulty — fixed for a dataset, not model-dependent.
REL (reliability)	Σ n_k/n · (f̄_k − ō_k)²	Calibration penalty — how far mean predictions deviate from observed rates within quantile bins. Lower is better.
RES (resolution)	Σ n_k/n · (ō_k − ō)²	Discrimination — how much predicted risk separates high- from low-risk patients. Higher is better.
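The three components in the table can be computed directly for binary outcomes. This is a sketch under simplifying assumptions: it ignores censoring and any weighting the real brier_decomposition may apply, and `murphy_decomposition` is a hypothetical name. The identity BS = REL − RES + UNC is exact only when all forecasts inside a bin are identical, as in the toy data below; otherwise a within-bin variance term remains.

```python
def murphy_decomposition(pred, outcome, n_bins=4):
    """Murphy decomposition with equal-count (quantile) bins; no censoring handling."""
    n = len(pred)
    order = sorted(range(n), key=lambda i: pred[i])
    base_rate = sum(outcome) / n
    rel = res = 0.0
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not idx:
            continue
        f_bar = sum(pred[i] for i in idx) / len(idx)      # mean forecast in bin
        o_bar = sum(outcome[i] for i in idx) / len(idx)   # observed rate in bin
        rel += len(idx) / n * (f_bar - o_bar) ** 2
        res += len(idx) / n * (o_bar - base_rate) ** 2
    unc = base_rate * (1.0 - base_rate)
    brier = sum((p - o) ** 2 for p, o in zip(pred, outcome)) / n
    return {"brier_score": brier, "reliability": rel,
            "resolution": res, "uncertainty": unc}

# Toy data: two forecast levels, so each bin holds identical forecasts.
pred = [0.1, 0.1, 0.1, 0.1, 0.6, 0.6, 0.6, 0.6]
outcome = [0, 0, 0, 1, 1, 1, 0, 1]
d = murphy_decomposition(pred, outcome, n_bins=2)
print(d, d["reliability"] - d["resolution"] + d["uncertainty"])
```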

Python

from capa.training.evaluate import brier_decomposition, plot_calibration_curve

# cif_k: (n, T) CIF for one event; event_times: (n,); observed: (n,) bool;
# bins: the time bins the CIF columns correspond to
decomp = brier_decomposition(cif_k, event_times, observed, eval_time=365, time_bins=bins)
# {'brier_score': 0.21, 'reliability': 0.03, 'resolution': 0.08, 'uncertainty': 0.26}

# Reliability diagram (calib_result: calibration-curve output for this event, computed beforehand)
fig = plot_calibration_curve(calib_result, model_name='CAPA', event_name='GvHD')
fig.savefig('calibration_gvhd.pdf')

Interpretability: residue attribution & biology alignment

After training, two functions in capa/interpret/attention_maps.py let you ask whether the model learned immunologically meaningful patterns.

Python

from capa.interpret.attention_maps import (
    residue_gradient_attribution, biology_alignment_score,
    PEPTIDE_BINDING_GROOVE,
)

# donor_pos_embs: (n_loci, n_positions, 1280) — per-residue ESM-2, NOT mean-pooled
d_attr, r_attr = residue_gradient_attribution(
    model, donor_pos_embs, recip_pos_embs, clinical,
    event_k=0,  # GvHD
)

# Score against known HLA-A peptide-binding groove positions
score = biology_alignment_score(d_attr[0], locus='A')
# {'enrichment': 2.1, 'auroc': 0.72, 'mean_rank_ratio': 0.28}
# AUROC > 0.65 means the model preferentially attends known groove residues

Known positions reference

PEPTIDE_BINDING_GROOVE is a curated dict of critical residues per locus (HLA-A: positions 9, 24, 44, 45, 62, 66, 74–77, 80–81, 95, 99, 116, 143, 147, 152, 156, 163; DRB1: 13, 26, 28, 30–31, 37, 47, 57, 60, 67, 70–71, 74, 86). Sources: HLAMatchmaker v3, Duquesnoy 2002/2008, Siu 2020 NEJM Evid.
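An AUROC-style alignment score can be illustrated with the HLA-A positions listed here. This is a sketch, not the implementation of biology_alignment_score: the helper names are hypothetical, and treating positions as 1-based indices into the attribution vector is an assumption.

```python
def expand(spec):
    """Turn [9, (74, 77)] style specs into a set of positions (ranges inclusive)."""
    out = set()
    for item in spec:
        out.update(range(item[0], item[1] + 1) if isinstance(item, tuple) else [item])
    return out

# HLA-A positions as listed in this section; 1-based numbering is assumed.
GROOVE_A = expand([9, 24, 44, 45, 62, 66, (74, 77), (80, 81), 95, 99,
                   116, 143, 147, 152, 156, 163])

def alignment_auroc(attribution, known_positions):
    """AUROC that attribution ranks known positions above the rest (ties count 0.5)."""
    pos = [a for i, a in enumerate(attribution, start=1) if i in known_positions]
    neg = [a for i, a in enumerate(attribution, start=1) if i not in known_positions]
    if not pos or not neg:
        raise ValueError("need both known and other positions")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Synthetic attributions over 180 residues: perfect focus on the groove vs. flat.
focused = [1.0 if i in GROOVE_A else 0.0 for i in range(1, 181)]
flat = [0.5] * 180
print(alignment_auroc(focused, GROOVE_A), alignment_auroc(flat, GROOVE_A))
```

A value near 0.5 means attribution is unrelated to the known positions, while values well above it indicate that groove residues receive systematically higher attribution.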
