Paper — CAPA

Abstract

The outcome of haematopoietic stem cell transplantation (HSCT) depends critically on human leukocyte antigen (HLA) compatibility, which is conventionally encoded as a binary match/mismatch count that discards most immunological information. Even the continuous distance metrics proposed as replacements share a subtler flaw: a Euclidean mismatch distance is symmetric and therefore direction-blind, whereas graft-versus-host alloreactivity is intrinsically directional. We propose CAPA, which represents each HLA allele with a frozen protein language model (ESM-2, 650M parameters) and learns donor–recipient alloreactivity end-to-end via attention over signed difference embeddings. The resulting representation feeds a DeepHit head that jointly predicts the cumulative incidence of graft-versus-host disease (GvHD), relapse, and transplant-related mortality (TRM) as competing risks. On the public UCI Bone Marrow Transplant cohort (n = 187), CAPA's ESM-2 step cannot be run end-to-end because the cohort lacks per-allele HLA typing, so we report tabular competing-risks baselines instead: on a single split the best of them, Fine–Gray, reaches a C-index of 0.84 for relapse and 0.66 for TRM, but repeated 5×5 cross-validation of the cause-specific Cox model gives markedly lower estimates (relapse 0.60, TRM 0.56). In a controlled directional simulation, a Cox model on scalar distances collapses to near-chance (C = 0.58) while CAPA reaches C = 0.87, recovering 93% of the gap to a direction-aware oracle with non-overlapping confidence intervals on every seed; CAPA thus captures directional information that no symmetric scalar distance can encode. We release all code and weights as an open, reproducible proof-of-concept and discuss the small-cohort and simulation-based limitations frankly.

1 Introduction

Human leukocyte antigen (HLA) matching is the strongest modifiable predictor of haematopoietic stem cell transplantation (HSCT) outcome. The standard representation, an integer count of matched alleles across loci, assumes all mismatches are equal and discards the protein-level differences that actually drive alloreactivity.^[1] A single amino-acid substitution in the peptide-binding groove can change immunogenicity dramatically, while many substitutions are functionally silent.

We ask whether a continuous, learned representation of HLA sequences can recover this lost signal without hand-engineered mismatch features, and in particular whether it can capture the direction of a donor–recipient mismatch, which a symmetric distance cannot.

2 Methods

2.1 · Sequence embedding

For each allele we retrieve the full protein sequence from IPD-IMGT/HLA and embed it with frozen ESM-2 (esm2_t33_650M_UR50D), mean-pooling the final-layer (layer 33) per-residue representations into a single vector e ∈ ℝ¹²⁸⁰.^[2]
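A minimal sketch of this embedding step, assuming the `fair-esm` package (which provides `esm.pretrained.esm2_t33_650M_UR50D`) and assuming the BOS/EOS special tokens are excluded from the mean, a common convention rather than one stated here. The helper names `mean_pool_residues` and `embed_alleles` are hypothetical; the pooling function can be exercised without downloading the model.

```python
import numpy as np


def mean_pool_residues(token_reps: np.ndarray, lengths: list[int]) -> np.ndarray:
    """Average per-residue embeddings, skipping special tokens.

    Assumes the ESM token layout: position 0 is BOS, positions 1..n hold the
    n residues, position n + 1 is EOS, followed by padding.
    token_reps has shape (batch, max_len + 2, dim).
    """
    pooled = [token_reps[i, 1 : n + 1].mean(axis=0) for i, n in enumerate(lengths)]
    return np.stack(pooled)


def embed_alleles(seqs: dict[str, str]) -> dict[str, np.ndarray]:
    """Hypothetical helper: embed allele sequences with frozen ESM-2 650M.

    Requires the fair-esm and torch packages (imported lazily here).
    """
    import esm
    import torch

    model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
    model.eval()  # frozen: inference only, no gradient updates
    batch_converter = alphabet.get_batch_converter()
    names = list(seqs)
    _, _, tokens = batch_converter([(name, seqs[name]) for name in names])
    with torch.no_grad():
        out = model(tokens, repr_layers=[33], return_contacts=False)
    reps = out["representations"][33].numpy()  # (batch, max_len + 2, 1280)
    pooled = mean_pool_residues(reps, [len(seqs[name]) for name in names])
    return dict(zip(names, pooled))
```

Excluding BOS/EOS keeps the pooled vector a function of the residues alone, so alleles of different lengths are averaged over comparable positions.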

2.2 · Directional interaction

For each locus ℓ, CAPA forms the signed difference e^D_ℓ − e^R_ℓ between the donor and recipient allele embeddings and self-attends over these per-locus differences, preserving the direction of mismatch that a symmetric distance discards; the resulting interaction representation feeds the survival head. A higher-capacity bidirectional cross-attention variant is also implemented, but it is under-determined at the cohort sizes available, so only the compact signed-difference model (469K parameters) is evaluated.
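Why a signed difference retains direction can be seen in a stripped-down, single-head version of this interaction. The sketch below is hypothetical and is not the CAPA module: it assumes five loci, random projection matrices, no biases or normalisation, and mean pooling over loci.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def signed_diff_attention(e_donor, e_recip, w_q, w_k, w_v):
    """Single-head self-attention over per-locus signed differences.

    e_donor, e_recip: (n_loci, dim) allele embeddings, one row per locus.
    Returns a (d_model,) interaction vector, mean-pooled over loci.
    """
    x = e_donor - e_recip                     # signed difference keeps direction
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (n_loci, n_loci) attention logits
    return (softmax(scores) @ v).mean(axis=0)


rng = np.random.default_rng(0)
n_loci, dim, d_model = 5, 1280, 64            # hypothetical: 5 loci, ESM-2 width
e_d, e_r = rng.normal(size=(2, n_loci, dim))  # toy donor / recipient embeddings
w_q, w_k, w_v = rng.normal(size=(3, dim, d_model)) / np.sqrt(dim)

h_dr = signed_diff_attention(e_d, e_r, w_q, w_k, w_v)  # donor -> recipient
h_rd = signed_diff_attention(e_r, e_d, w_q, w_k, w_v)  # roles swapped
```

Swapping donor and recipient negates x, which leaves the attention weights unchanged (the sign cancels in q kᵀ) and negates v, so in this toy h_rd = −h_dr exactly, whereas a Euclidean distance returns the same value for both orderings. With biases, normalisation, or a non-linear head the relation is no longer an exact sign flip, but the two orderings generally still map to different representations.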

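$$\mathrm{CIF}_k(t \mid x) = P(T \le t,\ \varepsilon = k \mid x) \qquad (1)$$

where T is the time to the first event, ε ∈ {1, 2, 3} indexes the cause (GvHD, relapse, TRM), and x denotes the donor–recipient inputs.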
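Under a discrete-time formulation such as DeepHit's, Eq. (1) is evaluated on a grid of time bins: one softmax over all cause-by-bin cells gives the joint probability of each (cause, bin), and cumulative sums over time give the CIFs. The sketch below assumes, as in the original DeepHit, that every patient experiences one of the causes within the horizon, so the final CIFs sum to one; the function name is hypothetical.

```python
import numpy as np

CAUSES = ("GvHD", "relapse", "TRM")


def cif_from_logits(logits: np.ndarray) -> np.ndarray:
    """Discrete-time CIFs for one patient from DeepHit-style output logits.

    logits: (len(CAUSES), n_bins) raw network outputs.
    """
    flat = logits.ravel()
    p = np.exp(flat - flat.max())               # stable softmax over all cells
    pmf = (p / p.sum()).reshape(logits.shape)   # P(T = bin s, eps = k | x)
    return np.cumsum(pmf, axis=1)               # CIF_k(bin s | x), Eq. (1)


rng = np.random.default_rng(1)
cif = cif_from_logits(rng.normal(size=(len(CAUSES), 10)))
```

Each row of `cif` is non-decreasing in time, and the three rows together account for all probability mass at the last bin.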

2.3 · DeepHit competing-risks head

We model the three causes jointly with DeepHit, optimising the sum of a log-likelihood term and a ranking loss that penalises incorrectly ordered pairs of event times.^[3] The competing-risks formulation treats the three events as mutually exclusive, so each patient contributes at most one observed cause.
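The two loss terms can be sketched in NumPy as follows. This is a simplified reading of Lee et al. (2018), assuming integer time bins, a single ranking weight `alpha` in place of per-cause weights, and the pairwise penalty η(a, b) = exp(−(a − b)/σ). It is not the CAPA training code, and all names are hypothetical.

```python
import numpy as np


def deephit_loss(pmf, times, events, alpha=1.0, sigma=0.1):
    """Simplified DeepHit loss: negative log-likelihood plus ranking penalty.

    pmf:    (n, K, T) joint probabilities P(T = s, eps = k | x_i); each
            patient's K x T array sums to 1 (e.g. after a softmax).
    times:  (n,) integer time-bin index of the event or censoring.
    events: (n,) cause index 1..K, or 0 if censored.
    """
    n = pmf.shape[0]
    cif = np.cumsum(pmf, axis=2)   # CIF_k(s | x_i) for every bin s
    idx = np.arange(n)
    tiny = 1e-12
    obs = events > 0

    # L1 (uncensored): log-probability of the observed cause at the observed bin.
    nll = -np.log(pmf[idx[obs], events[obs] - 1, times[obs]] + tiny).sum()
    # L1 (censored): log-probability of remaining event-free beyond the bin.
    surv = 1.0 - cif[idx[~obs], :, times[~obs]].sum(axis=1)
    nll += -np.log(np.clip(surv, tiny, None)).sum()

    # L2: a patient with cause k at bin s should have a higher CIF_k(s) than
    # every patient j whose event or censoring time is later (times[j] > s).
    rank = 0.0
    for i in idx[obs]:
        k, s = events[i] - 1, times[i]
        later = times > s
        diff = cif[i, k, s] - cif[later, k, s]
        rank += np.exp(-diff / sigma).sum()
    return nll + alpha * rank
```

A smaller σ sharpens the ranking penalty, so mis-ordered pairs dominate the loss; `alpha` trades this ordering objective against calibration of the likelihood term.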

Fig. 1 Model schematic of the end-to-end architecture. The protein language model is frozen; only the signed-difference interaction module and DeepHit head are trained.

3 Results

3.1 · Tabular baselines on UCI BMT. CAPA's ESM-2 step cannot be evaluated end-to-end on the UCI Bone Marrow Transplant cohort,^[4] which records aggregate mismatch counts rather than per-allele typing, so we report tabular competing-risks baselines as reference points. On a single held-out test split (n = 29), the best baseline, Fine–Gray, reaches a time-dependent C-index of 0.84 for relapse and 0.66 for TRM. These single-split numbers are optimistic: repeated 5×5 cross-validation of the cause-specific Cox model gives relapse 0.60 ± 0.14 and TRM 0.56 ± 0.06, against 0.75 and 0.65 for the same model on the single split. GvHD was not evaluable because the test fold contained only 2 GvHD events.

Table 1 — Time-dependent C-index on the single held-out UCI BMT test split (n = 29), tabular baselines only. CAPA requires registry-scale data with per-allele HLA typing, which this cohort lacks. Higher is better; — denotes not evaluable.
Model	Relapse	TRM	GvHD
Cox-PH (cause-specific)	0.75	0.65	—
Fine–Gray	0.84	0.66	—
Random Survival Forest	0.48	0.65	—
DeepHit (tabular HLA)	0.65	0.57	—
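To make the evaluation protocol concrete, here is a hypothetical sketch of repeated 5×5 cross-validation splits together with a simple Harrell-style cause-specific concordance. It is a simplification: the paper reports a time-dependent C-index, which this pairwise version approximates but does not reproduce, and patients with a competing event before tᵢ are simply treated as non-comparable.

```python
import numpy as np


def cause_specific_c_index(risk, times, events, cause):
    """Harrell-style concordance for one cause (simplified, not time-dependent).

    Pair (i, j) is comparable when i had `cause` at times[i] and j was still
    under observation afterwards (times[j] > times[i]); it is concordant when
    risk[i] > risk[j]. Ties in risk count one half.
    """
    num = den = 0.0
    for i in np.flatnonzero(events == cause):
        later = times > times[i]
        den += later.sum()
        num += (risk[i] > risk[later]).sum() + 0.5 * (risk[i] == risk[later]).sum()
    return num / den if den else float("nan")


def repeated_kfold(n, n_splits=5, n_repeats=5, seed=0):
    """Yield (repeat, train_idx, test_idx) for repeated, reshuffled K-fold CV."""
    rng = np.random.default_rng(seed)
    for r in range(n_repeats):
        folds = np.array_split(rng.permutation(n), n_splits)
        for f in range(n_splits):
            train = np.concatenate([folds[g] for g in range(n_splits) if g != f])
            yield r, train, folds[f]
```

With n = 187, each patient lands in exactly one test fold per repeat, and the 25 fold-level C-indices can be summarised as a mean ± SD, one common way of obtaining figures such as 0.60 ± 0.14.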

3.2 · Directional simulation: where CAPA wins. In a simulated cohort (N = 10,000, 6 seeds) in which GvHD hazard is driven by the direction of the donor–recipient difference, a Cox model on symmetric scalar distances collapses to near-chance (C = 0.58), whereas CAPA, learning from signed difference embeddings, reaches 0.87 and recovers about 93% of the gap to a direction-aware oracle (0.89), with non-overlapping confidence intervals on every seed. On TRM, a control endpoint driven by scalar distance, CAPA is weaker than the scalar-distance Cox model (0.53 vs 0.67), as expected, confirming that its advantage is specific to directional structure rather than a generic capacity effect.
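The reason a scalar distance collapses can be reproduced in a toy setting. The sketch below is a hypothetical simulation of our own, not the paper's protocol: the embedding size, the direction `w`, and the effect size 1.5 are assumptions. Hazard depends only on the projection of the signed difference onto `w`, so swapping donor and recipient inverts the hazard while leaving the Euclidean distance unchanged.

```python
import numpy as np


def spearman(a: np.ndarray, b: np.ndarray) -> float:
    """Spearman rank correlation (no ties expected for continuous data)."""
    ra, rb = a.argsort().argsort(), b.argsort().argsort()
    return float(np.corrcoef(ra, rb)[0, 1])


rng = np.random.default_rng(42)
n, dim = 10_000, 16                      # hypothetical toy sizes
e_d = rng.normal(size=(n, dim))          # toy donor embeddings
e_r = rng.normal(size=(n, dim))          # toy recipient embeddings
w = rng.normal(size=dim) / np.sqrt(dim)  # hypothetical alloreactive direction


def signed_score(donor, recip):
    return (donor - recip) @ w           # flips sign when donor and recipient swap


def scalar_distance(donor, recip):
    return np.linalg.norm(donor - recip, axis=1)  # identical under a swap


# Toy GvHD hazard depends only on the direction of the difference.
hazard = np.exp(1.5 * signed_score(e_d, e_r))
event_time = rng.exponential(1.0 / hazard)  # exponential times with rate = hazard

rho_signed = spearman(signed_score(e_d, e_r), -event_time)
rho_dist = spearman(scalar_distance(e_d, e_r), -event_time)
```

In this toy, `rho_signed` is strongly positive while `rho_dist` stays close to zero: swapping donor and recipient maps the hazard h to 1/h but leaves the distance unchanged, so the distance cannot separate the high-risk ordering from the low-risk one.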

Table 2 — Directional GvHD simulation (N = 10,000; mean ± SD over 6 seeds). The oracle is given the true directional features; TRM is a scalar-distance control.
Model	GvHD	Relapse	TRM
Cox (binary mismatch)	0.53	0.77	0.59
Cox (scalar distances)	0.58	0.78	0.67
Cox (oracle direction)	0.89	0.78	0.67
CAPA (signed diff)	0.87	0.77	0.53
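The gap-recovery figure quoted for the simulation follows from the GvHD column of Table 2, taking the scalar-distance Cox model as the floor and the oracle as the ceiling:

```latex
\frac{C_{\text{CAPA}} - C_{\text{scalar}}}{C_{\text{oracle}} - C_{\text{scalar}}}
  = \frac{0.87 - 0.58}{0.89 - 0.58}
  = \frac{0.29}{0.31}
  \approx 0.935
```

The rounded table entries thus give roughly 93 to 94%, in line with the 93% quoted in the abstract up to rounding.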

Fig. 2 Illustrative predicted cumulative incidence functions for the three competing risks (panels: GvHD, relapse, TRM).

4 Discussion & limitations

The directional simulation supports the central claim: a learned representation of the signed difference embeddings captures directional alloreactivity that any symmetric scalar distance provably cannot, since such a distance assigns the same value to a donor–recipient pair and to its reverse. This evidence is limited to a simulation that plants the directional signal by construction (a capability demonstration) and to comparisons with tabular baselines; the UCI BMT cohort is small (n = 187), single-source, and several event types are too rare to evaluate. The one real-data test (International Histocompatibility Working Group (IHWG) data, composite OS endpoint) shows ESM-2 distances performing only comparably to binary mismatch counts overall. We make no clinical claims; CAPA is a methodological proof-of-concept. External validation on large multi-centre registries with per-allele typing and event-specific endpoints is the necessary next step.

5 References

[1] Dehn, J. et al. Selection of unrelated donors and cord blood units for HSCT. Blood, 2019.

[2] Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model (ESM-2). Science, 2023.

[3] Lee, C. et al. DeepHit: a deep learning approach to survival analysis with competing risks. AAAI, 2018.

[4] Sikora, M. et al. Bone Marrow Transplant: children. UCI Machine Learning Repository, 2020.