The landscape of antigen-specific T cells in human cancers

Bo Li, Longchao Liu, Jian Zhang, Jiahui Chen, Jianfeng Ye, Alexander Filatenkov, Sachet Shukla, Jian Qiao, Xiaowei Zhan, Catherine Wu, Yang-Xin Fu

Preprint posted on 1 November 2018

Identifying the specific targets of T cells that can destroy cancer cells is a priority in the field of immunotherapy. Here, the authors use a clustering approach in pan-cancer RNAseq datasets to discover new recurrent targets.

Selected by Rob Hynds


Antigen-specific tumour-infiltrating lymphocytes (TILs) can mediate tumour destruction, but identifying what they recognise is challenging for three reasons: the variety of possible antigen sources (including single nucleotide variants (SNVs), insertions or deletions (indels), re-expression of developmental antigens and intron retention), the diversity of the T cell receptor (TCR) antigen-binding region, and the promiscuous binding of antigenic peptides. This study develops and tests an improved strategy for studying cancer-associated TCR repertoires by leveraging publicly available datasets.


  • A new pipeline to identify clusters of recurrent TCR complementarity-determining regions (CDRs; also known as hypervariable regions) from TCGA RNAseq that are not present in healthy controls.
  • Gene expression analysis shows CDR3 (which is the most variable of the three CDRs) clusters are from samples with an enriched T cell activation signature.
  • Clustering CDR3s reveals a CD8+ T cell subset with a Trm signature. The differentiation pathway of these cells is mapped using trajectory analysis.
  • Clustered CDR3s are used to identify neoantigens and cancer-associated antigens.
  • Cancer-associated CDR3s can be detected in early breast cancer samples, so this protocol may be relevant for early detection of cancers.

Detecting cancer-associated CDR3s

Using an algorithm (TRUST), the authors identified 1.5 x 10CDR3 sequences in over 9,000 (pan-cancer) TCGA RNAseq samples. Around 170,000 of these were complete productive sequences and, after excluding sequences that are also found in a healthy population, the remaining 82,000 sequences were dubbed ‘cancer-associated’. These CDR3 sequences were clustered (using ‘iSMART’), generating 4501 clusters containing 15,254 sequences. Because sequences that are absent from the healthy population are unlikely to co-occur in multiple individuals by chance, the authors assume that clustered sequences share antigen specificity.
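To make the filter-then-cluster logic concrete, here is a minimal sketch. It is not iSMART: it assumes, purely for illustration, that two CDR3s belong together when they have the same length and differ at no more than one position, and that a cluster is only kept when it spans at least two samples. The function names, the mismatch threshold and the toy sequences are all hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def cluster_cdr3s(tumour, healthy, max_mismatch=1):
    """tumour: dict of sample id -> set of CDR3 amino-acid strings.
    healthy: iterable of CDR3s seen in healthy donors.
    Returns a list of clusters (sorted lists of CDR3s).
    Illustrative stand-in for a real clustering tool, not iSMART itself."""
    healthy = set(healthy)

    # Step 1: drop anything also seen in healthy donors, remembering
    # which samples each remaining sequence came from.
    owners = {}
    for sample, seqs in tumour.items():
        for seq in seqs:
            if seq not in healthy:
                owners.setdefault(seq, set()).add(sample)

    # Step 2: single-linkage clustering with a union-find structure.
    seqs = sorted(owners)
    parent = {s: s for s in seqs}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for a, b in combinations(seqs, 2):
        if len(a) == len(b) and hamming(a, b) <= max_mismatch:
            parent[find(a)] = find(b)

    groups = {}
    for s in seqs:
        groups.setdefault(find(s), []).append(s)

    # Step 3 (assumption): keep clusters that span two or more samples.
    return [
        sorted(g)
        for g in groups.values()
        if len(set().union(*(owners[s] for s in g))) >= 2
    ]


toy_tumour = {
    "p1": {"CASSLGQETQYF", "CASSPDRGYEQYF"},
    "p2": {"CASSLGQDTQYF"},
    "p3": {"CASSIRSSYEQYF"},
}
toy_healthy = {"CASSIRSSYEQYF"}
clusters = cluster_cdr3s(toy_tumour, toy_healthy)
```

The all-pairs comparison is quadratic in the number of sequences, so it only suits toy inputs; a dataset of 82,000 sequences would need the indexing tricks that dedicated tools provide.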

Relationship of clusters with gene expression

The authors go on to correlate gene expression with the number of clustered CDR3s per sample and find that genes involved in the immune response (including T cell activation) are positively correlated, while a number of immune inhibitory genes are negatively correlated. They conclude that samples with clustered CDR3s are enriched for activated TILs. Next, single cell RNAseq data is used to examine the phenotype of T cell clonotypes with clustered CDR3s, revealing a CD8+ subpopulation with high T cell cytotoxicity and exhaustion scores that resembles a recently described Trm cell signature. Trajectory analysis sheds some light on Trm differentiation: early in pseudotime, cells express precursor markers and low levels of exhaustion markers, whereas later two populations emerge, one with low and one with high metabolic activity. The highly metabolic population arises from differentiated cells with low metabolic activity rather than directly from progenitors. Analysis of individual clonotypes shows that which path a cell takes depends on its receptor sequence.
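As a rough illustration of the per-gene correlation step, the sketch below computes a rank correlation between a gene's expression and the number of clustered CDR3s across samples. The choice of Spearman correlation is an assumption (the statistic used in the preprint is not stated here), and the expression values and counts are invented toy numbers.

```python
from statistics import mean


def ranks(values):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy data: clustered CDR3 counts per sample and two hypothetical genes.
cluster_counts = [0, 1, 2, 5, 9]
activation_gene = [1.0, 2.5, 2.7, 8.0, 20.0]   # rises with cluster count
inhibitory_gene = [9.0, 7.0, 6.0, 2.0, 1.0]    # falls with cluster count

rho_up = spearman(cluster_counts, activation_gene)
rho_down = spearman(cluster_counts, inhibitory_gene)
```

In a real analysis this would be repeated for every gene, followed by multiple-testing correction, which the sketch omits.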

Figure 2C from the preprint shows the pseudotime trajectory plot that depicts the inferred Trm differentiation pathway.

Neoantigen identification

The authors sought associations between recurrent SNVs (>3) and CDR3 clusters to see whether genomic aberrations generate abnormal proteins that are recognised by the immune system. Six such associations were significant, four of which involve peptides predicted to bind to HLA molecules. Two patients with these mutations also had a matched HLA genotype. As well as SNVs, the authors looked at indels, identifying 10 significant pairs despite a five-fold lower number of indels compared to SNVs. Four were confirmed in RNAseq data, which is important as mutant transcripts can be degraded by nonsense-mediated decay. Interestingly, two hits are Wnt pathway genes, and all of these samples are from stomach cancer patients with high microsatellite instability and indels in short-tandem repeat regions. The validity of this approach is supported by the fact that these patients have a good clinical response to checkpoint inhibition.
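One simple way to score whether a mutation and a CDR3 cluster co-occur more often than chance would predict is a one-sided Fisher's exact test on a 2x2 table of samples. This is an assumed stand-in for illustration; the test actually used in the preprint is not described here, and the counts below are made up.

```python
from math import comb


def fisher_right_tail(both, mut_only, cluster_only, neither):
    """One-sided Fisher's exact test for enrichment of co-occurrence.

    both:         samples carrying the mutation AND the CDR3 cluster
    mut_only:     samples with the mutation only
    cluster_only: samples with the cluster only
    neither:      samples with neither
    Returns P(overlap >= both) under the hypergeometric null.
    """
    n = both + mut_only + cluster_only + neither
    with_mut = both + mut_only
    with_cluster = both + cluster_only
    max_overlap = min(with_mut, with_cluster)
    total = comb(n, with_cluster)
    tail = sum(
        comb(with_mut, k) * comb(n - with_mut, with_cluster - k)
        for k in range(both, max_overlap + 1)
    )
    return tail / total


# Toy example: 10 samples, 3 carry the mutation, and the same 3 carry the cluster.
p_perfect = fisher_right_tail(both=3, mut_only=0, cluster_only=0, neither=7)

# Toy example: no overlap at all, so there is no evidence of enrichment.
p_none = fisher_right_tail(both=0, mut_only=3, cluster_only=3, neither=4)
```

With thousands of clusters and many recurrent mutations, each pair would be one test among a very large number, so any real use of such a score would need multiple-testing correction.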

Cancer-associated antigen identification

Differential gene expression analysis for CDR3 clusters identified two clusters with distinct CDR3 patterns that targeted the gene HSFX1. These were derived from colon and endometrial cancer samples. HSFX1 is expressed at low levels in normal tissues but is expressed in 13% of colorectal cancers (CRC) and 75% of endometrial cancers, and high HSFX1 expression is a positive prognostic indicator in endometrial cancer, making it a promising cancer-associated antigen candidate. Strong binding of HSFX1 peptides was predicted to three common HLA alleles (including HLA-A*02:01), so the authors synthesised 9-mer peptides and injected them into humanised mice with HLA-A*02:01. IFNg ELISPOT on splenocytes from these mice demonstrated a response, suggesting that these peptides are immunoreactive in vivo. The authors conclude that these peptides likely escape central tolerance and could act as cancer-associated antigens in these cancer types.
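The first step in any such peptide screen is enumerating the candidate 9-mers from a protein sequence, which can be sketched as a sliding window. The sequence below is an arbitrary placeholder, not the HSFX1 sequence, and the function name is hypothetical; HLA binding prediction itself requires a dedicated tool and is not attempted here.

```python
def kmers(protein, k=9):
    """Return every overlapping k-residue peptide in a protein sequence.

    A protein of length n yields n - k + 1 peptides, or none if n < k.
    """
    return [protein[i:i + k] for i in range(len(protein) - k + 1)]


# Placeholder 11-residue sequence, for illustration only.
candidates = kmers("ACDEFGHIKLM")
```

Each candidate would then be passed to a binding predictor for the HLA allele of interest, and only the predicted strong binders synthesised.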

Early detection relevance

The authors ask whether it’s possible to identify CDR3 clusters in PBMCs (rather than TILs), as this could allow early detection of cancer using peripheral blood samples. There were more cancer-associated CDR3s in 16 early-stage breast cancer PBMC samples than in healthy controls. Interestingly, these are fewer in number than in an advanced melanoma cohort, but those that are found are detected at higher abundance. The authors suggest this is due to Trm differentiation reducing clonotype frequencies in more advanced tumours.

Questions for the authors

Q1) Could the use of healthy controls to filter out ‘non-cancer’ CDR3 sequences mean that co-morbidities contribute to the ‘cancer-associated’ list? For example, are cancer patients more likely to have viral-reactive TCRs?

Q2) Few neoantigen-TCR pairs are detected. Do the authors think that sequencing factors or analysis factors are more important in this? Have they tried the pipeline in TCR-enriched datasets?

Q3) Do the authors have a hypothesis for how receptor sequence influences Trm differentiation?

Tags: cancer, early detection, immunology, immunotherapy, t cells

Posted on: 19 November 2018



Author's response

Bo Li shared

Thank you for highlighting our work!

Q1: The existence of common, or ‘public’ TCRs between different individuals, is a known phenomenon caused by biased V(D)J recombination during T cell development. Some experts believe that public TCRs help to protect against some common viral infections, such as HCMV. Their presence in the tumor microenvironment could be due to circulation.

Q2: Detecting tumor neoantigens with this approach requires a large sample size, as we rely on statistical co-occurrence of hypervariable CDR3s and rare somatic mutations. Both parts are very diverse, making the detection power low. So far there are no TCR-enriched large cancer datasets available.

Q3: We speculate that T cells differentiated from precursor to Trm1 were newly entered T cells that respond to new cancer antigens (could be cancer-associated antigens or neoantigens), whereas T cells differentiated from Trm1 to Trm2 were cells that already resided in the tissue, responding to tissue-specific antigens. But of course, this is pure speculation.
