Final Report Summary - RECOGNIZE (Physical principles of recognition in the immune system.)
How can our adaptive immune system be prepared for the many pathogens we constantly encounter? A diverse repertoire of receptor proteins on the surfaces of B and T cells interacts with pathogens, recognizes them and initiates an immune response. In this project we quantitatively described the generation and self-organization of the immune repertoire at the molecular and evolutionary level.
Diversity in the immune system is created via a series of stochastic events involving gene choices and random nucleotide insertions between, and deletions from, genes. Because the insertions and deletions are random, we cannot say deterministically how a given sequence was formed. We developed statistical inference algorithms to describe the generation process probabilistically from high-throughput immune repertoire sequencing data, and quantified the diversity of both generated and functional repertoires. We found that the potential diversity, which comes mostly from random insertions and deletions, is much larger than the number of sequences in any individual. Characterizing the generation process gave us a baseline to quantify the selection and hypermutation processes that follow generation. Selection showed a significant correlation with biases induced by VDJ recombination and a reduction of diversity, suggesting that natural selection acting on the recombination process anticipated somatic selection pressures. Interestingly, we found that the number of sequences shared between people was well predicted by the generation probability, indicating a purely stochastic origin of such “public” sequences. Only identical twins shared larger numbers of rare clonotypes than predicted by chance, potentially due to in-utero sharing of actual T cells. We showed that T-cell clones generated before birth persist and maintain high abundances in adult organisms for decades, and quantified the rapid increase in repertoire diversity in developing mice.
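As a rough illustration of how a generation probability can serve as a null model for sharing, the sketch below computes the expected number of sequences found in both of two repertoires drawn independently from the same generation distribution. The function name `expected_shared`, the toy probabilities and the repertoire sizes are hypothetical choices made for this example. It is not the inference software developed in the project, and it ignores selection.

```python
def expected_shared(pgen, n1, n2):
    """Expected number of distinct sequences present in both samples.

    pgen   : generation probability of each possible sequence (toy values)
    n1, n2 : number of independent generation events in each individual
    Assumes both individuals sample independently from the same distribution.
    """
    total = 0.0
    for p in pgen:
        present_in_1 = 1.0 - (1.0 - p) ** n1  # sequence drawn at least once
        present_in_2 = 1.0 - (1.0 - p) ** n2
        total += present_in_1 * present_in_2
    return total


# Hypothetical toy distribution over six possible sequences.
toy_pgen = [0.4, 0.3, 0.2, 0.05, 0.03, 0.02]
print(expected_shared(toy_pgen, n1=5, n2=5))
```

In this picture, sequences with a high generation probability are shared simply because each individual is likely to produce them, with no need to invoke a common antigen.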
The repertoire is made up not of unique receptors but of clones of different sizes. We showed that fluctuations in the antigen-related fitness of the receptor clones (not of individual cells) are the essential ingredient needed to generate the universally observed long-tailed clone size distributions, and we characterized the scale of fluctuations in antigenic environments.
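The role of clone-level fluctuations can be illustrated with a toy simulation. The sketch below is a deliberately simplified stand-in, not the population model analyzed in the project: each clone's log-size follows a random walk with a negative average drift and a noise term shared by all cells of the clone, and a clone that falls below one cell is replaced by a newly generated clone of size one. The function name and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def simulate_clone_sizes(n_clones=1000, steps=300, drift=-0.05,
                         noise=0.3, seed=0):
    """Toy model: clone sizes under fluctuating clone-level fitness.

    Each step adds `drift` plus Gaussian noise of scale `noise` to the
    log-size of every clone. The noise acts on the whole clone, so it is
    multiplicative in clone size. Clones dropping below size 1 are reset
    to size 1, mimicking extinction and replacement by a new clone.
    """
    rng = random.Random(seed)
    log_sizes = [0.0] * n_clones
    for _ in range(steps):
        for i in range(n_clones):
            x = log_sizes[i] + drift + noise * rng.gauss(0.0, 1.0)
            log_sizes[i] = x if x > 0.0 else 0.0
    return [math.exp(x) for x in log_sizes]


sizes = sorted(simulate_clone_sizes())
median_size = sizes[len(sizes) // 2]
print("median clone size:", median_size)
print("largest clone size:", sizes[-1])
```

Most clones in this toy model stay close to the minimal size while a few grow very large, which is the qualitative signature of a long-tailed distribution.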
We then took a completely different theoretical perspective, predicting the optimal repertoire that minimizes the cost of infections contracted from a given distribution of pathogens, to explain the relatively low repertoire overlap between different individuals in terms of repertoire sparseness and cross-reactivity of receptors. More broadly, biological organisms have evolved a wide range of complex strategies to defend themselves against pathogens: from CRISPR to adaptive immunity. Using a common evolutionary framework that balances the benefits and costs involved in protection against pathogenic environments we linked the basic forms of known immunity in different species to the statistics of the pathogenic environments in which they function.
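To make the idea of a cost-minimizing repertoire concrete, the sketch below works through the simplest case one can write down. It assumes no cross-reactivity (one receptor type per pathogen) and takes the cost of an infection to be inversely proportional to the frequency of the matching receptor. Both are assumptions made for this example rather than the framework used in the project. Under them, minimizing the expected cost with the receptor frequencies constrained to sum to one gives receptor frequencies proportional to the square root of the pathogen frequencies.

```python
import math


def expected_cost(pathogen_freq, receptor_freq):
    """Expected infection cost, assuming cost of pathogen a is 1 / P_a."""
    return sum(q / p for q, p in zip(pathogen_freq, receptor_freq))


def sqrt_allocation(pathogen_freq):
    """Receptor frequencies proportional to sqrt(pathogen frequency)."""
    weights = [math.sqrt(q) for q in pathogen_freq]
    norm = sum(weights)
    return [w / norm for w in weights]


# Hypothetical pathogen distribution.
pathogens = [0.7, 0.2, 0.09, 0.01]
uniform = [1.0 / len(pathogens)] * len(pathogens)

print("proportional:", expected_cost(pathogens, pathogens))
print("uniform:     ", expected_cost(pathogens, uniform))
print("square root: ", expected_cost(pathogens, sqrt_allocation(pathogens)))
```

The square-root allocation sits between copying the pathogen distribution and ignoring it: rare pathogens receive more coverage than their frequency alone would suggest.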
Lastly, in a joint experimental-theoretical approach named Tite-Seq, we mapped out the quantitative relationship between an antibody’s sequence and its antigen binding affinity. We reported a large amount of beneficial epistasis, which enlarges the space of high-affinity antibodies as well as their mutational accessibility, and we linked mutations that stabilize the antibody structure to affinity.
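A binding affinity is typically read off a titration curve: the signal from bound antigen is measured at several antigen concentrations and a dissociation constant is fitted. The sketch below assumes a simple one-site binding curve, signal = background + amplitude * c / (c + K_D), and recovers K_D by a grid search in log space. The function names, the concentration range and the synthetic noise-free data are all hypothetical, and this is not the Tite-Seq analysis pipeline.

```python
import math


def bound_fraction(conc, kd):
    """Fraction of antibody bound at antigen concentration `conc`."""
    return conc / (conc + kd)


def fit_kd(concs, signals, log10_kd_grid):
    """Grid search for K_D; amplitude and background by least squares."""
    n = len(concs)
    y_mean = sum(signals) / n
    best_kd, best_sse = None, float("inf")
    for log_kd in log10_kd_grid:
        kd = 10.0 ** log_kd
        x = [bound_fraction(c, kd) for c in concs]
        x_mean = sum(x) / n
        sxx = sum((xi - x_mean) ** 2 for xi in x)
        sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, signals))
        amplitude = sxy / sxx
        background = y_mean - amplitude * x_mean
        sse = sum((background + amplitude * xi - yi) ** 2
                  for xi, yi in zip(x, signals))
        if sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd


# Synthetic titration: concentrations in molar units (assumed values).
concs = [10.0 ** e for e in range(-11, -4)]
true_kd = 1e-8
signals = [0.1 + 2.0 * bound_fraction(c, true_kd) for c in concs]
grid = [k / 10 - 12 for k in range(81)]  # log10(K_D) from -12 to -4

kd_fit = fit_kd(concs, signals, grid)
print("fitted K_D:", kd_fit)
```

Repeating such a fit for every sequence variant in a library is what turns a set of titration measurements into a sequence-to-affinity map.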
In summary, our analyses give a quantitative description of T- and B-cell generation and selection in living organisms. We showed that many so-called “public” sequences can be attributed simply to chance, and developed a new baseline for further analysis of repertoire dynamics, implemented in new software. We also showed that quantitative analysis of data-driven population models can discriminate between different evolutionary scenarios in immunology. Exploring ideas of optimality in recognition space, we found that the immune system is phenotypically redundant, allowing for many different functional solutions, and that the diversity of immune systems found in nature can be understood from an evolutionary perspective. Lastly, we developed a method for linking genotype and phenotype measurements of immune repertoires in a high-throughput yet quantitative way, and used it to identify the importance of epistasis in antibodies.