ISSN 1866-8836
Клеточная терапия и трансплантация

Immunoinformatics in COVID-19 Vaccine Development: The Role of HLA System

Anna Yu. Anisenkova1, Aleksander S. Golota1, Dmitry A. Vologzhanin1, Tatyana A. Kamilova1, Stanislav V. Makarenko1,2, Olga V. Shneider1, Oleg S. Glotov1,3, Yury A. Serov4, Sergey V. Mosenko1, Sergey V. Azarenko1, Konstantin V. Smanzerev1, Dmitry N. Khobotnikov1, Tatyana V. Gladisheva1, Sergey G. Shcherbak1,2

1 St. Petersburg City Hospital №40, St. Petersburg, Russia
2 St. Petersburg State University School of Medicine, St. Petersburg, Russia
3 D.O. Ott Research Institute of Obstetrics, Gynecology and Reproductology, St. Petersburg, Russia
4 Pavlov University, St.Petersburg, Russia

Dr. Aleksander S. Golota, Ph.D., Head, Clinical Research Branch for Medical Rehabilitation, City Hospital №40, Borisova St. 9B, Sestrorezk, St.Petersburg, Russia

doi 10.18620/ctt-1866-8836-2021-10-1-13-23
Submitted 25 December 2020
Accepted 12 March 2021


Individual genetic variation may help to explain why immune responses to the SARS-CoV-2 coronavirus differ across a population. In silico computer simulation provides the experimental community with a more complete list of SARS-CoV-2 immunogenic peptides presented by the antigens of the HLA system. This review considers an array of computationally predicted immunogenic peptides from SARS-CoV-2 proposed for in vitro functional validation and potential vaccine development. Several independent studies conducted with different approaches showed a high degree of confidence and reproducibility of the results. Computer-assisted prediction is instrumental as a quick and cost-effective way to help prevent the spread of the infection and ultimately eliminate it.

Most efforts to develop vaccines and drugs against SARS-CoV-2 target the spike glycoprotein (protein S), the major inducer of neutralizing antibodies. Several candidates have been shown to be effective in in vitro studies and have progressed to randomized trials in animals or humans against COVID-19 infection. This article highlights current advances in the development of subunit vaccines to combat COVID-19 that are reducing the time and costs of vaccine development.


Coronavirus, SARS-CoV-2, COVID-19, immunogenic peptides, antigen, HLA, vaccine, epitope, computational prediction, computer simulation in silico, immunoinformatics.


Human leukocyte antigens (HLA) on the cell surface play a pivotal role in recognizing intra- and extracellular proteins as "self" or "non-self". A unique set of HLA alleles defines our immunological identity. The immune response of the host organism is based on the detection of "non-self" protein epitopes. For example, in viral infections, class I HLA molecules bind and present viral peptides produced inside the infected cells, which leads to the recruitment of CD8+ cytotoxic T cells and destruction of the target cells.

Class II antigens on antigen-presenting immune cells mostly determine the risk of graft rejection after allogeneic transplantation of organs or bone marrow, thus requiring maximal HLA similarity between recipient and donor. Since the 1970s, high-throughput and precise HLA typing techniques have been introduced for allogeneic hematopoietic stem cell transplantation (HSCT). More than 30,000 variants of HLA proteins have been registered and sequenced by the classical Sanger technique or by next-generation sequencing (NGS), which are used for HLA typing worldwide [1]. The resulting databases contain extensive data on HLA polymorphic regions and provide excellent opportunities for in silico modeling of the best matches between HLA molecules and any foreign proteins, including fragments of viral antigens.

Our review aims to discuss common bioinformatic approaches to the search for target HLA epitopes and immunity-related proteins when designing novel antiviral vaccines, in particular against SARS-CoV-2. This new beta-coronavirus (family Coronaviridae, genus Betacoronavirus) was identified as a pathogen causing severe acute respiratory infection. On February 11, 2020, the World Health Organization (WHO) assigned the official name COVID-19 ("COronaVIrus Disease 2019") to the disease, whereas the International Committee on Taxonomy of Viruses (ICTV) designated the causative agent of this infection as SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2). Soon afterwards, WHO declared the SARS-CoV-2 outbreak a pandemic.

The clinical course of SARS-CoV-2 infection is quite variable, ranging from mild or symptom-free forms to severe or extremely severe COVID-19 with high mortality. Acute respiratory distress syndrome was registered in some patients, and 11% of the patients died of multi-organ failure within a short time [2]. Most patients with COVID-19 present with fever (90%) and cough (80%); other manifestations include apnea (55%), loss of smell and taste (50%), myalgia and fatigue (44%), chest pressure and/or pain (20%), headache (8%), hemoptysis (5%), diarrhea and nausea (3%) [3]. SARS-CoV-2 exhibits pronounced lung tropism, causing severe respiratory failure in some patients with COVID-19-associated pneumonia; these patients require invasive mechanical ventilation and have a mortality risk of up to 60% [4]. The specific clinical course of COVID-19 suggests a unique immune dysfunction with pronounced lymphopenia, excessive IL-6 production, and enhanced blood clotting [5].

The results of computer modeling (in silico experiments) show that there are genetic differences, especially in the human immune system, which may explain the different ability to respond to SARS-CoV-2 infection, differences in symptoms, and severity of COVID-19.

Following infection of human cells by a coronavirus, the organism responds with antiviral signaling. These alarm signals identify the virus and mobilize the immune system to send cytotoxic T cells to destroy the infected cells. To find out whether different alleles of this signaling system may explain the diversity of immune responses to SARS-CoV-2, numerous research teams use computer algorithms to analyze all coronavirus proteins and to predict how different versions of the antiviral signaling system recognize them, in particular to obtain data on the role of the HLA haplotype in the clinical course of the infection, its diagnostics, and therapy [6].

HLA alleles predispose to differential susceptibility to viral disease and influence its clinical course. Genetic variability of the main HLA genes may influence the severity of COVID-19. In particular, understanding this variability may help to identify persons at high risk of the disease. Combining HLA typing with SARS-CoV-2 testing could improve risk assessment in the general population. Once an anti-SARS-CoV-2 vaccine is developed, persons with high-risk HLA alleles should have priority for vaccination.

When binding a viral fragment, the HLA molecule exposes it on the cell surface, thus tagging the cell as infected and promoting its killing by specific immune cells. Generally, the more viral peptides can be presented by the HLA system, the more pronounced the immune response. The results of computer modeling predict that some HLA alleles bind numerous peptides of SARS-CoV-2 proteins whereas others bind only a few, thus determining the extent and efficiency of the immune response. Therefore, a consensus has been reached on the biological significance of HLA differences: these variations may partially explain the wide variability in the severity of COVID-19. HLA differences are probably not the only genetic factor influencing COVID-19 severity. However, understanding their actual effects on the clinical course of COVID-19 may help to identify persons at higher risk of the disease and to develop vaccines against SARS-CoV-2.

Biology and genetic properties of SARS-CoV-2

SARS-CoV-2 belongs to the coronavirus family of single-stranded RNA viruses. Its genome is a single-stranded positive-sense RNA about 30 kb long that encodes a series of structural and non-structural proteins. It contains two flanking untranslated regions, one long open reading frame (ORF1ab) encoding a polyprotein that includes the replicase complex, and the genes of four structural proteins: spike glycoprotein (S), membrane glycoprotein (M), envelope protein (E), and nucleocapsid phosphoprotein (N). The ORF1ab sequence encodes 16 non-structural proteins [7]. Two SARS-CoV-2 proteins, ORF3a and ORF7a, are suggested determinants of T cell recognition. Both proteins are important for viral replication and may influence the pathogenesis and dissemination of the disease [8].

Clinical features of COVID-19

COVID-19 in SARS-CoV-2-positive patients is classified as mild, severe, or critically severe disease. The absolute and relative numbers of CD4+ T cells, CD8+ T cells, and B cells decrease with increasing severity of the disease. The levels of the pro-inflammatory cytokines IL-2, TNF-α, and IL-6, as well as of CRP, in blood plasma are increased, whereas activation of dendritic cells and B cells is decreased in severe clinical cases [9].

Lymphopenia is among the typical characteristics of SARS-CoV-2 infection. Lymphocyte migration from the blood to the lungs in response to antigenic stimulation may be a reason for the lymphocyte deficiency in peripheral blood. If CD8+ T cells are unable to eliminate the virus, CD4+ T cells are activated to further enhance the immune response. Continuous and excessive inflammatory reactions finally cause apoptosis and anergy of lymphocytes. Hence, lymphocyte function can differ completely between stages of infection [9].

Severe complications develop in 20% of COVID-19 patients and are connected with an uncontrolled systemic hyperinflammatory immune response to SARS-CoV-2 infection, the so-called "cytokine storm". Excessive IL-6 production is a trigger of this life-threatening condition. This inflammatory immune response could therefore be considered a target for anti-inflammatory and immune-modulating therapy in severe COVID-19 [10].

Biology of an anti-COVID immune response

Distinct HLA haplotypes are associated with different susceptibility to infections, mainly because T cell receptors recognize the conformational structure of the HLA antigen-binding domain in complex with the corresponding antigenic peptides. Hence, expressing HLA molecules that bind SARS-CoV-2 peptides with increased specificity on the surface of antigen-presenting cells (APCs) is an advantage for the immune response. Identification of class I and II HLA alleles associated with the immune response against SARS-CoV-2 is therefore important for the development of diagnostic test systems and the evaluation of vaccine efficiency [6, 11].

Following viral penetration into the target cell, viral antigens presented by HLA molecules are recognized by virus-specific cytotoxic T lymphocytes (CTLs). Several HLA alleles, e.g., HLA-B*46:01, HLA-B*07:03, HLA-DRB1*12:02, and HLA-C*08:01 (found most frequently in Chinese and Indonesian populations), are associated with susceptibility to SARS-CoV, whereas HLA-DR*03:01, HLA-C*15:02, and HLA-A*02:01 (more typical of European populations) correlate with protection against SARS-CoV infection. HLA-II molecules, such as HLA-DRB1*11:01 and HLA-DQB1*02:02, are associated with susceptibility to MERS-CoV infection. These data provide valuable clues for studying the pathogenetic mechanisms of COVID-19. Antigen presentation stimulates humoral and cellular immune responses mediated by virus-specific B and T cells, respectively. The number of CD4+ and CD8+ T cells in the peripheral blood of SARS-CoV-2-infected patients was considerably decreased, while the cells were excessively activated, as evidenced by high proportions of HLA-DR-positive CD4+ T cells and CD38+ CD8+ T cells [12]. SARS-CoV-2 epitope screening identified 2013 and 1399 peptide epitopes with high affinity for HLA class I and class II molecules, respectively. These epitopes are distributed across structural proteins (S, M, E, and N) and non-structural proteins and are able to induce CD8+ and CD4+ T cell responses. Several regions are enriched in high-affinity epitopes. These data are important for the development of vaccines against SARS-CoV-2 and for monitoring of the T cell response [13].

The signaling pathway of HLA-G and its receptors, expressed on the surface of immune cells, participates in viral infection by downregulating T, B, and natural killer (NK) cells. Increased HLA-G expression is a strategy of viral escape from the immune response; the main evasion mechanism is the binding of HLA-G to immuno-inhibitory CTL receptors. Comparison of the expression of HLA-G and its receptors on peripheral blood lymphocytes between the day of the initial positive SARS-CoV-2 test and the day of a negative test showed a positive relation between HLA-G expression on B cells and IFN-γ levels, while HLA-G expression on monocytes was negatively related to IL-2 levels. The percentage of HLA-G+ T cells and the expression of the immuno-inhibitory ILT4 receptor on B cells correlated negatively with TNF-α levels, whereas its expression on monocytes correlated positively with IL-6, IL-10, and IFN-γ levels. The HLA-G expression pattern on peripheral immune cells may reflect three phases of the disease: primary infection, replication, and clearance of SARS-CoV-2. The time course of HLA-G expression on peripheral immune cells, described as high/low/high dynamics from the SARS-CoV-2-positive to the SARS-CoV-2-negative state, suggests that the course of SARS-CoV-2 infection is connected with cytokine regulation of HLA-G expression. Given that HLA-G is an antigen-presenting molecule, suppression of HLA-G expression by SARS-CoV-2 can disrupt virus recognition by CD8+ T cells and support immune evasion [14].

HLA genotyping of COVID-19 patients has shown that the frequencies of the HLA-C*07:29, C*08:01G, B*15:27, B*40:06, DRB1*04:06, and DPB1*36:01 alleles (all found mostly in Asian populations) are higher, whereas the frequencies of the DRB1*12:02 and DPB1*04:01 alleles are lower in COVID-19 patients than in the control population. After statistical correction, only the differences in the HLA-C*07:29 and B*15:27 frequencies remained significant. These data suggest that some HLA alleles are associated with COVID-19 [15].

The unique pattern of altered immune regulation in SARS-CoV-2 patients is characterized, first, by increased levels of circulating pro-inflammatory cytokines (especially IL-6) and, second, by a functional lymphoid defect connected with an IL-6-mediated decrease in HLA-DR expression. As mentioned above, HLA class II molecules, especially HLA-DR, are constitutively expressed mainly by antigen-presenting cells (APCs), B cells, and T cell subpopulations. IFN-γ is able to induce expression of the HLA-DR gene as well as of the pro-inflammatory cytokines TNF-α, IL-1β, and IL-6. HLA class II activation enhances HLA-restricted antigen presentation and the adaptive immune response [5].

All patients with SARS-CoV-2-associated pneumonia who develop severe respiratory failure exhibit excessive inflammatory responses with immune dysregulation caused by IL-6, macrophage activation syndrome (MAS) caused by IL-1β, or very low HLA-DR expression accompanied by a sharp decrease in CD4+ T lymphocytes and NK cells. Blood plasma from SARS-CoV-2 patients has been shown to inhibit HLA-DR expression, which could be partially restored by tocilizumab, an IL-6-blocking antibody. Tocilizumab treatment is accompanied by an increase in circulating lymphocyte numbers. Hence, the unique pattern of immune dysregulation in severe COVID-19 is characterized by IL-6-mediated suppression of HLA-DR expression and by lymphopenia due to permanently enhanced cytokine production and excessive inflammation [5].

In COVID-19, patients show decreased HLA-DR expression on CD14+ monocytes, whereas blood ferritin levels considerably exceed normal values, a pattern revealed only in SARS-CoV-2 patients; increased haemophagocytosis is of high diagnostic value. In patients with bacterial pneumonia, HLA-DR expression on CD14+ monocytes is below normal values, whereas in SARS-CoV-2-associated pneumonia it is close to normal; when it drops suddenly, severe respiratory failure develops. The circulating IFN-γ concentrations were very low in all patients with SARS-CoV-2 infection. On the contrary, IL-6 and CRP levels were considerably higher in patients with immune dysregulation than in those with intermediate immune activation. IL-6 inhibits HLA-DR expression in COVID-19 patients. Accordingly, negative correlations were found between the serum level of IL-6 and the absolute number of HLA-DR molecules on CD14+ monocytes, as well as between absolute lymphocyte counts and HLA-DR content on CD14+ monocytes in COVID-19 patients [5]. The role of IL-6 as a factor decreasing HLA-DR on CD14+ monocytes is confirmed by an increase in HLA-DR+ circulating cells upon recovery from COVID-19 [16]. Tocilizumab can partially restore HLA-DR expression on monocytes and increase the number of circulating lymphocytes [17].

Development of vaccines against SARS-CoV-2

To develop epitope-based subunit vaccines against SARS-CoV-2, studies are performed using reverse vaccinology and immunoinformatics. In reverse vaccinology, bioinformatics tools are used to analyze the genome and proteome of the pathogen and to identify and analyze neoantigens (Fig. 1). This approach allows choosing the viral antigenic segments that should be targeted when developing vaccines, and it is a more efficient, simple, time-saving, and cost-effective method of vaccine design.


Figure 1. Step-by-step strategies of reverse vaccinology approach of vaccine development [23]

During viral infection, the viral proteins are processed into small peptides within the proteasomes of infected cells. As mentioned above, the viral peptides are presented by HLA molecules at the surface of infected cells and recognized by T cells. The epitopes potentially recognizable by the T cells could belong to any viral structural or non-structural proteins.

Prediction of SARS-CoV-2 epitopes and the corresponding immune responses is important for designing a vaccine against SARS-CoV-2 and evaluating its immunogenicity. Predicting the antigenicity of candidate vaccines involves calculating a numeric index of the ability of the vaccine to bind B and T cell receptors and augment the immune response. Only highly antigenic sequences are chosen to construct a vaccine. At present, however, little is known about which SARS-CoV-2 sequences may induce a strong immune response.

CD4+ helper T cells activated by viral antigen stimulate B cells to produce large amounts of specific antibodies. Macrophages and CD8+ CTLs are also activated by T helper cells, leading to the final destruction of the target antigen. T cell epitopes interact with HLA class I and II alleles. In silico epitope prediction is used in modern vaccine design since it saves time and cost compared with the classical "trial and error" strategy of "wet" labs (as opposed to "dry" labs, i.e., in silico experiments using mathematical or computer analysis). T cell epitopes have been identified which can elicit a stable immune response against SARS-CoV-2 in the general human population. However, the suitability of these epitopes as vaccine candidates should be analyzed and validated in molecular biology laboratories. Prediction of B cell epitopes seems less reliable than that of T cell epitopes since the B cell epitopes do not elicit a strong humoral response. For this reason, most in silico studies analyzed only T cell epitopes able to generate long-lasting CD4+ and CD8+ T cell responses in order to choose epitopes for subsequent testing in a traditional ("wet") laboratory [8].

K. Kiyotani et al. [13] performed immunoinformatic epitope prediction for the S, E, N, M, and ORF proteins of the SARS-CoV-2 reference sequence. They selected the HLA-A, HLA-B, and HLA-C alleles present at frequencies of more than 5% in the Japanese population. To predict HLA class II epitopes, the HLA-DPA1-DPB1 and HLA-DQA1-DQB1 haplotypes and the HLA-DRB1 alleles that occur in Japan at frequencies of 5 to 38% were used. The work involved a total of 6421 SARS-CoV-2 sequences isolated in various regions, including 587 sequences from Asia, 1918 from North America, 3190 from Europe, and 726 from the Pacific Region, deposited in the Global Initiative on Sharing Avian Influenza Data (GISAID) database.
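As a quick consistency check, the regional sequence counts quoted above can be tallied. The short Python sketch below only restates the numbers given in the text; the variable names are illustrative.

```python
# Sequence counts per region as quoted in the text above.
sequences_by_region = {
    "Asia": 587,
    "North America": 1918,
    "Europe": 3190,
    "Pacific Region": 726,
}

total_sequences = sum(sequences_by_region.values())

# Share of each region in the analyzed data set.
regional_share = {
    region: count / total_sequences
    for region, count in sequences_by_region.items()
}

for region, share in regional_share.items():
    print(f"{region}: {share:.1%}")
print("Total:", total_sequences)
```

The regional counts add up to the reported total of 6421 sequences, with European isolates making up the largest share.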

The next step was screening for high-affinity peptide epitopes from SARS-CoV-2 proteins that could be presented by HLA class I and II molecules. Two T cell epitopes in the ORF1ab protein, ORF1ab2168-2176 and ORF1ab4089-4098, which were predicted to have strong affinity for HLA-A*24:02, HLA-A*02:01, and HLA-A*02:06, showed the broadest coverage of the Japanese population (83.8%). ORF1ab2168-2176, a predicted epitope binding HLA-C molecules (C*01:02, C*08:01, C*12:02, and C*14:02), covers 76.5% of the population; the S268-277 and S448-457 epitopes of the S protein cover 70% of Japanese individuals. The complexes of HLA with these peptides are useful for monitoring CD8+ T cell responses in patients and symptom-free infected individuals. No mutations were found in the sequences of the above epitopes. Therefore, the authors believe that these candidate epitopes can contribute to the development of rationally designed epitope-based peptide vaccines against SARS-CoV-2 [13].
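Population coverage of this kind can be approximated from allele frequencies. The sketch below is a simplified illustration, not the procedure used in [13]: it assumes a single HLA locus in Hardy-Weinberg proportions, so that the fraction of individuals carrying at least one of the epitope-presenting alleles is 1 - (1 - F)^2, where F is the summed frequency of those alleles. The function name and the frequencies are hypothetical.

```python
def population_coverage(allele_frequencies):
    """Fraction of individuals carrying at least one of the given alleles.

    Assumes one locus in Hardy-Weinberg proportions: a person lacks all
    listed alleles only if both chromosomes carry some other allele.
    """
    total = sum(allele_frequencies)
    if not 0.0 <= total <= 1.0:
        raise ValueError("summed allele frequency must lie between 0 and 1")
    return 1.0 - (1.0 - total) ** 2


# Hypothetical frequencies of three alleles predicted to present one epitope.
hypothetical_frequencies = [0.3, 0.1, 0.1]

coverage = population_coverage(hypothetical_frequencies)
print(f"Estimated coverage: {coverage:.1%}")
```

Combining several loci (HLA-A, -B, and -C) would require multiplying the non-coverage terms across loci, which again assumes independence between loci.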

A comprehensive in silico analysis of the binding affinity of peptides to HLA class I molecules for 145 HLA-A, -B, and -C genotypes across the whole SARS-CoV-2 proteome, which also considered cross-protective immunity resulting from prior exposure to the four widespread human coronaviruses, showed that HLA-B*46:01 binds the smallest number of predicted SARS-CoV-2 peptides. This suggests that individuals carrying this allele could be especially susceptible to COVID-19, as was previously shown for SARS-CoV, in line with clinical data that associate this allele with severe disease. Conversely, the HLA-B*15:03 allele demonstrated the greatest ability to present highly conserved SARS-CoV-2 peptide sequences [18]. Hence, distinct HLA genotypes may differentially induce the antiviral T cell response, thus influencing the clinical course of infection and its transmission.
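The comparison of alleles by the number of peptides they are predicted to bind can be sketched as a simple tally. The example below is illustrative only: the allele and peptide names and the predicted IC50 values are made up, and the 500 nM cut-off is a widely used convention that is assumed here rather than taken from [18].

```python
from collections import Counter

# Assumed cut-off: a predicted IC50 of 500 nM or lower counts as a binder.
BINDER_IC50_NM = 500.0


def count_binders(predictions, threshold_nm=BINDER_IC50_NM):
    """Count predicted binders per allele.

    predictions: iterable of (allele, peptide, predicted_ic50_nm) tuples.
    Alleles without any binder are kept with a count of zero.
    """
    counts = Counter()
    for allele, _peptide, ic50_nm in predictions:
        counts[allele] += 1 if ic50_nm <= threshold_nm else 0
    return counts


# Hypothetical predictions for two made-up alleles and three made-up peptides.
toy_predictions = [
    ("allele_X", "peptide_1", 35.0),
    ("allele_X", "peptide_2", 420.0),
    ("allele_X", "peptide_3", 9000.0),
    ("allele_Y", "peptide_1", 2500.0),
    ("allele_Y", "peptide_2", 7000.0),
    ("allele_Y", "peptide_3", 480.0),
]

binder_counts = count_binders(toy_predictions)
ranking = binder_counts.most_common()  # alleles from most to fewest binders
print(ranking)
```

In a real analysis the predictions would come from a peptide-HLA binding predictor run over the whole viral proteome, and the ranking would cover all 145 alleles.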

For 32257 unique 8- to 12-mer SARS-CoV-2 peptides predicted to pass through the proteasomal processing pathway, a SARS-CoV-2-specific distribution of presentation by HLA class I molecules was shown. The predicted ability to present SARS-CoV-2 peptides is unrelated to the population frequency of HLA alleles. When reporting global frequency maps for the 145 studied HLA alleles, the authors highlight the global distribution of the three best (A*02:02, B*15:03, C*12:03) and three worst (A*25:01, B*46:01, C*01:02) alleles in terms of their ability to present the SARS-CoV-2 epitope repertoire and thus support the T cell immune response. These differences remain significant at the haplotype level. SARS-CoV-2 evolution in the population may modify the repertoire of presented viral epitopes or otherwise modulate HLA-independent epitopes. Nevertheless, the authors recommend integrating HLA typing into clinical trials and combining it with COVID-19 testing in order to use it as a potential predictor of disease severity in the population and, probably, to adapt future vaccination strategies to genotypic risk groups. This approach could be used to control a broad spectrum of other viruses [18].
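The starting set of candidate peptides in such an analysis is obtained by sliding windows of 8 to 12 residues along each viral protein. The sketch below shows only this enumeration step on a short made-up sequence; it does not model proteasomal processing, and the function name is illustrative.

```python
def unique_peptides(protein_sequences, min_len=8, max_len=12):
    """Return the set of all distinct peptides of min_len..max_len residues."""
    peptides = set()
    for sequence in protein_sequences:
        for length in range(min_len, max_len + 1):
            # A sequence shorter than the window yields an empty range.
            for start in range(len(sequence) - length + 1):
                peptides.add(sequence[start:start + length])
    return peptides


# Made-up 16-residue sequence used only to illustrate the windowing.
toy_proteome = ["MKTAYIAKQRQISFVK"]

candidates = unique_peptides(toy_proteome)
print(len(candidates), "unique 8- to 12-mers")
```

Applied to the full SARS-CoV-2 proteome and followed by a proteasomal cleavage filter, the same enumeration would give the kind of peptide set described above.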

Aiming at epitope-based vaccine development against SARS-CoV-2 through analysis of the viral proteome with immunoinformatic tools, A. Joshi et al. chose antigenic, non-toxic, non-allergenic peptides from non-allergenic proteins based on their interactions with sets of HLA alleles. Among the identified T cell epitopes, ITLCFTLKR is a candidate for an anti-SARS-CoV-2 vaccine, exhibiting better binding scores in epitope-HLA complexes as well as acceptable stability, toxicity, and population coverage. This epitope should now undergo laboratory verification [8].

The spike protein (S protein) is most commonly analyzed to detect immunogenic epitopes because it binds the cellular ACE2 receptor with high affinity during penetration into human cells. On this basis, the S protein is considered a potential target for a coronavirus vaccine. Using immunoinformatics and computer modeling tools, M. Bhattacharya et al. [19] revealed 34 linear B cell epitopes in the S protein of SARS-CoV-2 and identified among the antigenic S protein epitopes 13 HLA I-binding and 3 HLA II-binding epitopes, which are recognized by Toll-like receptor 5 (TLR5). The antigenic epitopes were joined into a single vaccine component using a linker peptide, which promoted stability and assembly of the modeled and validated vaccine component. Molecular docking between the vaccine component and TLR5 indicated a spontaneous interaction in the receptor-ligand complex, which activates immune cascades for the destruction of viral antigens. Hence, the selected antigenic SARS-CoV-2 epitopes are good candidates for the design of an immunogenic multi-epitope peptide vaccine against SARS-CoV-2.

A similar immunoinformatic approach was used by V. Baruah and S. Bose to reveal significant epitopes in the S protein of SARS-CoV-2. The authors identified five CTL epitopes (YLQPRTFLL, GVYFASTEK, EPVLKGVKL, VVNQNAQAL, WTAGAAAYY) with high affinity for the corresponding class I HLA alleles (A*02:01, A*03:01, B*07:02, B*07:02, and B*15:01, respectively). Molecular dynamics simulations showed that these epitopes bind the peptide-binding groove of the HLA-I molecule through multiple contacts, indicating their potential to generate an immune response. Some of these epitopes are candidates for the design of anti-SARS-CoV-2 vaccines. In addition to activating CTLs, successful immunogens should generate persistent humoral immunity. This study revealed three such B cell epitopes unique to SARS-CoV-2 [20].

D. Santoni et al. [21] used a bioinformatic methodology based on selecting viral peptides that are at a distance of more than 3 mutational steps from human sequences, presuming that the probability of binding to HLA antigens increases with evolutionary distance from humans. In other words, the researchers sought viral peptides that are absent in humans (nullomers). Identifying the peptides most distant from human ones is also important to minimize the risk of autoimmunity. Of the 27 nullomers, 25 were common to all known SARS-CoV-2 strains. These peptides, called third-order nullomers and belonging to the class most distant from humans, were subjected to three additional selection steps to choose those with the highest probability of exposure on the cell surface, i.e., peptides that (1) are products of proteasomal cleavage, (2) are transported to the cell surface, and (3) bind strongly to HLA molecules. A set of nine peptides was identified for subsequent experimental studies. The SARS-CoV-2 peptides selected in silico represent potential targets for the immune system and should be tested experimentally to confirm their immunogenicity. According to in silico prediction, the YVMHANYIF peptide binds strongly to 27 different HLA antigens, the FLCWHTNCY and YIKWPWYIW peptides may bind 11 and 10 different HLA molecules, respectively, and the YYHKNNKSW peptide may interact with 8 HLA allelic variants.
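The idea of selecting viral peptides by their distance from host sequences can be illustrated with a brute-force sketch. It is not the method of [21]: here a "mutational step" is assumed to be a single amino acid substitution (Hamming distance), the sequences are made up, and the distance threshold is left as a parameter because the exact cut-off depends on how the nullomer order is defined.

```python
def hamming_distance(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_distance_to_host(peptide, host_sequences):
    """Smallest Hamming distance from the peptide to any host window.

    Returns len(peptide) if no host sequence is long enough to compare.
    """
    k = len(peptide)
    best = k
    for sequence in host_sequences:
        for start in range(len(sequence) - k + 1):
            distance = hamming_distance(peptide, sequence[start:start + k])
            if distance < best:
                best = distance
                if best == 0:
                    return 0  # exact match found, cannot get closer
    return best


def distant_peptides(peptides, host_sequences, min_distance=3):
    """Keep peptides at least min_distance substitutions from the host."""
    return [
        peptide for peptide in peptides
        if min_distance_to_host(peptide, host_sequences) >= min_distance
    ]


# Made-up host and viral sequences used only for illustration.
toy_host = ["ACDEFGHIK"]
toy_viral = ["ACDEF", "ACDWW", "WWWWW"]

selected = distant_peptides(toy_viral, toy_host)
print(selected)
```

Scanning a real human proteome this way would be slow; a practical implementation would index host k-mers instead of comparing every window.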

The concept of a multi-epitope vaccine involves identification and assembly of B and T cell epitopes into a single immunogen able to induce an effective response of both arms of immunity. Peptides and epitopes have proved to be preferred candidates for vaccine development owing to simpler production technology, chemical stability, and absence of infectious potential. Multi-epitope vaccines may contain epitopes for B cells, CD8+ CTLs, and CD4+ helper T cells.
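Assembly of such a construct amounts to concatenating the selected epitopes with short linker sequences. The sketch below is a hypothetical illustration: the linkers (AAY, GPGPG, KK) are commonly used in the multi-epitope design literature but are an assumption here, the B cell epitope is a made-up placeholder, and the grouping of peptides does not reproduce any construct from the studies reviewed.

```python
# Assumed linkers, chosen for illustration only.
CTL_LINKER = "AAY"      # between CTL epitopes
HTL_LINKER = "GPGPG"    # between helper T cell epitopes
B_CELL_LINKER = "KK"    # between B cell epitopes
BLOCK_LINKER = "GPGPG"  # between the three epitope blocks


def assemble_construct(ctl_epitopes, htl_epitopes, b_cell_epitopes):
    """Join epitopes into one sequence, block by block, skipping empty blocks."""
    blocks = [
        CTL_LINKER.join(ctl_epitopes),
        HTL_LINKER.join(htl_epitopes),
        B_CELL_LINKER.join(b_cell_epitopes),
    ]
    return BLOCK_LINKER.join(block for block in blocks if block)


# Peptides quoted in this review are used as example strings;
# the B cell epitope is a made-up placeholder.
construct = assemble_construct(
    ctl_epitopes=["YLQPRTFLL", "GVYFASTEK"],
    htl_epitopes=["LIRQGTDYKHWP"],
    b_cell_epitopes=["KLMNPQRST"],
)
print(construct, len(construct))
```

A real design would additionally evaluate the assembled sequence for antigenicity, allergenicity, and structural stability before any experimental work.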

Using an immunoinformatic approach, five domains rich in shared T and B cell epitopes were selected and connected by linkers in order to generate a more diverse and stable immune response. The resulting multi-epitope candidate vaccine NOM (nucleocapsid, ORF3a, membrane protein) is a recombinant multi-epitope protein constructed and validated with bioinformatics tools which includes, as the name suggests, antigenic epitopes from three SARS-CoV-2 proteins. The optimized NOM protein structure is adapted to interactions with specific immune receptors. Molecular dynamics simulations showed maximal stability for NOM binding to TLR4 and HLA-A*11:01 (the NOM–TLR4 and NOM–HLA-A*11:01 models). In silico testing showed that interaction of the chimeric protein with TLR4 and HLA-A*11:01 induces both humoral and cellular immune responses owing to the composite nature of the construct, which includes B and T cell epitopes [22].

C.H. Lee et al. reported in silico identification of a comprehensive list of SARS-CoV-2 immunogenic peptides that could be used as targets for vaccine development. Among them are 48 viral peptides showing a high degree of similarity to immunogenic peptides deposited in the Immune Epitope Database (IEDB) and 63 new peptides with high immunogenic potential that could be recognized by T cell receptors. In searching for immunogenic SARS-CoV-2 peptides, the authors focused on the haplotypes common in Europe and China. The 28 most promising SARS-CoV-2 peptides are predicted to bind different HLA class I and HLA class II alleles (HLA-A*01:01, DRB1*07:01, DRB1*04:01) and could serve as targets for CD8+ and CD4+ T cells, respectively. Based on HLA presentation and immunogenicity prediction, five peptides were selected for subsequent experimental validation (VQMAPISAM, AMYTPHTVL, TLDSKTQSL, KVDGVVQQL, KVDGVDVEL); they are predicted to bind specifically four different HLA variants (HLA-A*01:01, HLA-B*07:02, HLA-B*40:01, and, with the highest affinity, HLA-A*02:01) [23].

To predict and cluster the HLA alleles likely to interact with SARS-CoV-2 epitopes, B. Sarkar et al. [24] used the HLAcluster 2.0 online tool. After predicting the three-dimensional structure of highly antigenic, non-allergenic, and nontoxic T- and B-cell epitopes, they compared the ability of these epitopes to bind HLA molecules. The HLA-A*11:01 allele was used as the receptor for docking with HLA class I epitopes, whereas HLA-DRB1*04:01 was used for HLA class II epitopes. For the viral N protein, the best docking results were obtained with the QLESKMSGK peptide (HLA class I) and the LIRQGTDYKHWP peptide (HLA class II). For the S protein (the viral surface glycoprotein), the GVLTESNKK peptide was the best HLA class I epitope, whereas the TSNFRVQPTESI peptide showed the best binding to HLA class II.

Based on successful modeling of peptide/protein docking, three anti-SARS-CoV-2 vaccine constructs built from the selected epitopes were proposed: CV-1, CV-2, and CV-3. CV-2 showed maximal binding efficiency with HLA-DRB3*02:02 and HLA-DRB1*03:01, and CV-3 had better binding energies with most HLA alleles (DRB5*01:01, DRB1*01:01, and DRB3*01:01). However, CV-1 showed the best results in protein/protein docking and was therefore recognized as the best of the three constructs. Molecular dynamics modeling and in vitro adaptation studies were performed with this variant only. The proposed vaccine constructs could be used for SARS-CoV-2 vaccination if validation trials yield satisfactory results [24].
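How selected epitopes are joined into a single multi-epitope sequence can be sketched as follows. This is an illustration only, not a reconstruction of CV-1, CV-2, or CV-3: the linker sequences (`AAY`, `GPGPG`) and the function name `build_construct` are assumptions that do not come from the text, and adjuvant sequences are omitted. The epitopes are the four N- and S-protein peptides reported above for docking.

```python
from typing import Sequence

VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # 20 standard amino acids


def build_construct(
    ctl_epitopes: Sequence[str],
    htl_epitopes: Sequence[str],
    ctl_linker: str = "AAY",    # assumed linker, not taken from the source
    htl_linker: str = "GPGPG",  # assumed linker, not taken from the source
) -> str:
    """Concatenate epitope blocks into one linear multi-epitope sequence."""
    for peptide in (*ctl_epitopes, *htl_epitopes):
        if not peptide or not set(peptide) <= VALID_RESIDUES:
            raise ValueError(f"not a valid peptide: {peptide!r}")
    blocks = [
        ctl_linker.join(ctl_epitopes),  # HLA class I (CTL) epitopes
        htl_linker.join(htl_epitopes),  # HLA class II (T-helper) epitopes
    ]
    # Join the non-empty blocks with the HTL linker.
    return htl_linker.join(block for block in blocks if block)


construct = build_construct(
    ctl_epitopes=["QLESKMSGK", "GVLTESNKK"],
    htl_epitopes=["LIRQGTDYKHWP", "TSNFRVQPTESI"],
)
print(construct)
print(len(construct))  # 55 residues: 42 from epitopes, 13 from linkers
```

In practice, the order of epitopes and the choice of linkers are themselves design variables, and each candidate sequence would be re-evaluated for antigenicity, allergenicity, and docking behavior.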


Figure 2. Scheme of design of a multi-epitope subunit candidate vaccine using epitopes of B-cells, CTLs, and T-helper cells [25]

The unique construct of a recombinant multi-peptide subunit vaccine against COVID-19 contains 18 epitopes for CTLs, 6 epitopes for T-helper cells, and 9 epitopes for B cells from three SARS-CoV-2 proteins that participate in the recognition of cellular receptors and viral entry into target cells (Fig. 2). To enhance immunogenicity, the construct is supplemented with a sequence from human β-defensin (a TLR3 agonist), which serves as an adjuvant joined to the epitopes by linkers. Computational studies suggest that the vaccine is non-allergenic and stable and can elicit humoral and cellular immune responses. Synthesis and experimental evaluation of this vaccine are still needed to determine its immunogenic activity. The authors hope that this vaccine will be synthesized and used in public healthcare [25].

SARS-CoV-2 possesses self-amplifying RNA in the cytosol, which allows the development of an RNA-based vaccine. These vaccines use the mRNA sequence of the recombinant target protein, rather than the sequence of the target antibody. The vaccine mRNA is delivered by lipid nanoparticles to the cytoplasm, where the immunogenic protein is translated. After release from the cell, this protein is quickly captured and processed by APCs and presented on their surface by HLA molecules. This activates B and T cells and, accordingly, antigen-specific humoral and cytotoxic responses.

Vaccines based on the cytoplasmic expression of chimeric viral mRNA are potentially superior to protein-based vaccines: they are safer, more efficient, and easier to produce, and they do not integrate into host chromosomes. The manufacturing of RNA-based vaccines is the most promising trend in vaccinology because it can be scaled up rapidly, thus saving time during pandemics. Following injection, the vaccine RNA is taken up by immune cells and translated directly into the specific protein, followed by activation of other immune cells and antibody synthesis (Fig. 3) [7].


Figure 3. Schematic diagram of the mRNA-based vaccine targeted to the S protein of SARS-CoV-2

The mRNA-based vaccine targeted to the S protein of SARS-CoV-2 works by active immunization. This technique uses mRNA of the S protein encapsulated in lipid nanoparticles for effective delivery. Once the vaccine is injected into the muscle, myocytes take up the lipid nanoparticles and release the mRNA into the cytoplasm for translation into S protein. The endogenously synthesized S protein is secreted and activates both humoral and cellular immune responses. S protein – spike protein; IM – intramuscular; LNP – lipid nanoparticle; DC – dendritic cell; MHC – major histocompatibility complex; Ag – antigen [7].

The lipid nanoparticle (LNP)-encapsulated mRNA-1273 vaccine produced by Moderna Inc. (USA) encodes the SARS-CoV-2 S protein. Once sufficient titers of antibodies to the S protein have formed, a double therapeutic effect may occur: the host immune system could eliminate antigen-antibody complexes, thus promoting virus clearance and reducing its contagiousness [7]. Preliminary results of clinical trials have shown that the mRNA-1273 vaccine induced high levels of both virus-binding and neutralizing antibodies, as well as a strong cytokine response involving type 1 CD4+ helper T cells [26], and that mRNA-1273 immunogenicity persisted for at least 3 months [27].

Prefusion-stabilized protein immunogens that preserve neutralization-sensitive epitopes are an effective vaccine strategy for enveloped viruses. Structural studies have led to the search for mutations that stabilize Betacoronavirus spike proteins in the prefusion state, improving their expression and increasing immunogenicity. This principle has been applied to design mRNA-1273, an mRNA vaccine that encodes a SARS-CoV-2 spike protein stabilized in the prefusion conformation. mRNA-1273 was shown to induce potent neutralizing antibody responses to both wild-type (D614) and D614G mutant SARS-CoV-2, as well as CD8+ T cell responses, and to protect against SARS-CoV-2 infection in the lungs and noses of mice without evidence of immunopathology [28]. mRNA-1273 is currently in a phase III trial to evaluate its efficacy.

K.S. Corbett et al. [28] identified two proline substitutions (2P) at positions 986 and 987 of the S protein that effectively stabilize it in the prefusion conformation. The authors performed structural analysis and developed serological tests in silico, without additional experimental verification. Similar to other prefusion-stabilized fusion proteins, the SARS-CoV-2 S(2P) protein was more immunogenic at lower doses than the wild-type S protein [28]. The 2P mutation has similar effects on the stability of S proteins from other betacoronaviruses, suggesting a generalizable approach to designing S protein antigens for vaccination. Such generalizability is fundamental to pandemic preparedness [29].
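At the sequence level, a design such as 2P amounts to replacing the residues at two given positions with proline. The sketch below shows this operation on a toy sequence; the helper name `apply_substitutions` and the six-residue sequence are hypothetical, and positions 3 and 4 merely stand in for positions 986 and 987 of the full-length S protein.

```python
from typing import Mapping


def apply_substitutions(sequence: str, substitutions: Mapping[int, str]) -> str:
    """Return a copy of `sequence` with residues replaced at 1-based positions."""
    residues = list(sequence)
    for position, new_residue in substitutions.items():
        if not 1 <= position <= len(residues):
            raise IndexError(f"position {position} is outside the sequence")
        if len(new_residue) != 1:
            raise ValueError("each substitution must be a single residue")
        residues[position - 1] = new_residue
    return "".join(residues)


# Toy 6-residue sequence (hypothetical); positions 3 and 4 stand in for 986 and 987.
toy_sequence = "ADKVEE"
print(apply_substitutions(toy_sequence, {3: "P", 4: "P"}))  # ADPPEE
```

Applied to a real S protein sequence, the same call would use the positions 986 and 987 named in the text.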

Production of the mRNA encoding the SARS-CoV-2 S(2P) protein (mRNA-1273) was started in parallel with the preclinical evaluation. This led to the first phase I human clinical trial, which began on March 16, 2020. Thus, concepts based on new technologies, such as synthetic vaccinology, can facilitate and accelerate a vaccine development program based on pathogen sequences.

Protein glycosylation is a very common biological process, a form of post-translational protein modification that regulates protein localization and function. Glycosylation of viral structural proteins is closely related to viral replication and cell invasion and helps the virus escape the host immune response. The unusually high degree of glycosylation of SARS-CoV-2 suggests a high mutation rate, which significantly complicates vaccine development. However, mRNA-based vaccine technologies targeting the S protein may result in the production of antibodies to the S protein only, irrespective of its glycosylation status. Given the useful features of mRNA-based vaccines, such as lack of integration into the genome, lack of autoantibody induction, the feasibility of large-scale production, and high purity, mRNA-based vaccines are a promising choice for combating COVID-19 [7].


Vaccine development is a long and costly process with high failure rates, and it takes several years to yield a commercial product. Optimization of the manufacturing process allows subunit vaccines to be tested and released to the market sooner. Protein-based vaccines include only antigenic parts of the pathogen, which elicit an immediate immune response. Moreover, such vaccines do not contain a live pathogen and could therefore be used in immunocompromised patients. Computer-assisted studies suggest that the multi-epitope subunit vaccine is safe and effective against SARS-CoV-2 infection.

Conventional vaccines, whether attenuated or inactivated, are not always able to provide immunity to the target antigen. In addition, the generally accepted approach to vaccine development raises multiple safety concerns during pre-clinical and clinical trials. Subunit vaccines designed using in silico modeling may overcome these difficulties. Numerous groups conduct and publish studies in this field. Independent studies carried out with different computational techniques increase the reliability and reproducibility of the results.

Individual variation in HLA alleles may explain the diverse immune responses to the virus and the variable clinical outcomes in the population. Moreover, a detailed analysis of the common HLA alleles will help to develop more effective anti-COVID-19 vaccines, since the search for optimal vaccine constructs depends strongly on using bioinformatics methods to detect significant epitopes in the spike glycoprotein and other SARS-CoV-2 proteins that are recognized by antigen-presenting cells, cytotoxic T cells, and B cells. This review presents a profile of in silico predicted SARS-CoV-2 immunogenic peptides for functional validation and vaccine development. Computer-assisted prediction plays an important role in rapid and cost-effective decision making for preventing the spread of infection and, ultimately, overcoming the pandemic. Most efforts in SARS-CoV-2 prevention and treatment are focused on the viral S protein, the main inducer of virus-neutralizing antibodies. The article covers current achievements in the development of modern SARS-CoV-2 vaccines which, taken together, may stop this new viral infection.

Conflict of Interest

None declared.


  1. Glotov OS, Romanova OV, Eismont YA, Sarana AM, Scherbak SG, Kuzmich EV, Alyansky AL, Ivanova NE, Teplyashina VV, Serov YA, Zubarovskaya LS, Afanasyev BV. Comparative analysis of NGS and Sanger sequencing methods for HLA typing at a Russian university clinic. Cell Ther Transplant. 2018; 7(2): 72-82. doi: 10.18620/ctt-1866-8836-2018-7-4-72-82.
  2. Tian X, Li C, Huang A, Xia S, Lu S, Shi Z, Lu L, Jiang S, Yang Z, Wu Y, Ying T. Potent binding of 2019 novel coronavirus spike protein by a SARS coronavirus-specific human monoclonal antibody. Emerg Microb Infect. 2020;9(1):382-385. doi: 10.1080/22221751.2020.1729069.
  3. Pathological anatomy of lungs in COVID-19. Histology atlas. Ryazan, 2020. 52 p. (In Russian).
  4. Arabi YM, Murthy S, Webb S. COVID-19: a novel coronavirus and a novel challenge for critical care. Intensive Care Med. 2020;46(5):833-836. doi: 10.1007/s00134-020-05955-1.
  5. Giamarellos-Bourboulis EJ, Netea MG, Rovina N, Akinosoglou K, Antoniadou A, Antonakos N, Damoraki G, Gkavogianni T, Adami ME, Katsaounou P. Complex immune dysregulation in COVID-19 patients with severe respiratory failure. Cell Host Microbe. 2020; 27(6):992-1000.e3. doi: 10.1016/j.chom.2020.04.009.
  6. Forouzesh M, Rahimi A, Valizadeh R, Dadashzadeh N, Mirzazadeh A. Clinical display, diagnostics and genetic implication of novel Coronavirus (COVID-19) epidemic. Eur Rev Med Pharmacol Sci. 2020;24(8):4607-4615. doi: 10.26355/eurrev_202004_21047.
  7. Wang F, Kream RM, Stefano GB. An evidence based perspective on mRNA-SARS-CoV-2 vaccine development. Med Sci Monit. 2020 May 5;26:e924700. doi: 10.12659/MSM.924700.
  8. Joshi A, Joshi BC, Mannan MA, Kaushik V. Epitope based vaccine prediction for SARS-COV-2 by deploying immuno-informatics approach. Inform Med Unlocked. 2020 Apr 29:100338. doi: 10.1016/j.imu.2020.100338.
  9. Wang F, Hou H, Luo Y, Tang G, Wu S, Huang M, Liu W, Zhu Y, Lin Q, Mao L et al. The laboratory tests and host immunity of COVID-19 patients with different severity of illness. JCI Insight. 2020 May 21;5(10):e137799. doi: 10.1172/jci.insight.137799.
  10. Alijotas-Reig J, Esteve-Valverde E, Belizna C, Selva-O'Callaghan A, Pardos-Gea J, Quintana A, Mekinian A, Anunciacion-Llunell A, Miró-Mur F et al. Immunomodulatory therapy for the management of severe COVID-19. Beyond the anti-viral therapy: A comprehensive review. Autoimmun Rev. 2020 Jul;19(7):102569. doi: 10.1016/j.autrev.2020.102569.
  11. Shi Y, Wang Y, Shao C, Huang J, Gan J, Huang X, Bucci E, Piacentini M, Ippolito G, Melino G. COVID-19 infection: the perspectives on immune responses. Cell Death Differ. 2020;27(5):1451-1454. doi: 10.1038/s41418-020-0530-3.
  12. Li X, Geng M, Peng Y, Meng L, Lu S. Molecular immune pathogenesis and diagnosis of COVID-19. J Pharm Anal. 2020; 10(2):102-108. doi: 10.1016/j.jpha.2020.03.001.
  13. Kiyotani K, Toyoshima Y, Nemoto K, Nakamura Y. Bioinformatic prediction of potential T cell epitopes for SARS-Cov-2. J Hum Genet. 2020 Jul;65(7):569-575. doi: 10.1038/s10038-020-0771-5.
  14. Zhang S, Gan J, Chen BG, Zheng D, Zhang JG, Lin RH, Zhou YP, Yang WY, Lin A, Yan WH. Dynamics of peripheral immune cells and their HLA-G and receptor expressions in a patient suffering from critical COVID-19 pneumonia to convalescence. Clin Transl Immunology. 2020 May 10;9(5):e1128. doi: 10.1002/cti2.1128.
  15. Wang W, Zhang W, Zhang J, He J, Zhu F. Distribution of HLA allele frequencies in 82 chinese individuals with Coronavirus disease-2019. HLA. 2020; 96(2):194-196. doi: 10.1111/tan.13941.
  16. Thevarajan I, Nguyen THO, Koutsakos M, Druce J, Caly L, van de Sandt CE, Jia X, Nicholson S, Catton M, Cowie B et al. Breadth of concomitant immune responses prior to patient recovery: a case report of non-severe COVID-19. Nat Med. 2020; 26(4):453-455. doi: 10.1038/s41591-020-0819-2.
  17. Jamilloux Y, Henry T, Belot A, Viel S, Fauter M, El Jammal T, Walzer T, François B, Sève P. Should we stimulate or suppress immune responses in COVID-19? Cytokine and anti-cytokine interventions. Autoimmun Rev. 2020 Jul;19(7):102567. doi: 10.1016/j.autrev.2020.102567.
  18. Nguyen A, David JK, Maden SK, Wood MA, Weeder BR, Nellore A, Thompson RF. Human leukocyte antigen susceptibility map for SARS-CoV-2. J Virol. 2020 Jun 16;94(13):e00510-e00520. doi: 10.1128/JVI.00510-20.
  19. Bhattacharya M, Sharma AR, Patra P, Ghosh P, Sharma G, Patra BC, Lee SS, Chakraborty C. Development of epitope-based peptide vaccine against novel coronavirus 2019 (SARS-COV-2): Immunoinformatics approach. J Med Virol. 2020;92(6):618-631. doi: 10.1002/jmv.25736.
  20. Baruah V, Bose S. Immunoinformatics-aided identification of T cell and B cell epitopes in the surface glycoprotein of SARS-CoV-2. J Med Virol. 2020;92(5):495-500. doi: 10.1002/jmv.25698.
  21. Santoni D, Vergni D. In the search of potential epitopes for Wuhan seafood market pneumonia virus using high order nullomers. J Immunol Methods. 2020;481-482:112787. doi: 10.1016/j.jim.2020.112787.
  22. Enayatkhani M, Hasaniazad M, Faezi S, Guklani H, Davoodian P, Ahmadi N, Einakian MA, Karmostaji A, Ahmadi K. Reverse vaccinology approach to design a novel multi-epitope vaccine candidate against COVID-19: an in silico study. J Biomol Struct Dyn. 2020 May 2:1-16. doi: 10.1080/07391102.2020.1756411.
  23. Lee CH, Pinho MP, Buckley PR, Woodhouse IB, Ogg G, Simmons A, Napolitani G, Koohy H. Potential CD8+ T Cell Cross-Reactivity Against SARS-CoV-2 Conferred by Other Coronavirus Strains. Front Immunol. 2020 Nov 5;11:579480. doi: 10.3389/fimmu.2020.579480.
  24. Sarkar B, Ullah A, Johora FT, Taniya MA, Araf Y. Immunoinformatics-guided designing of epitope-based subunit vaccine against the SARS Coronavirus-2 (SARS-CoV-2). Immunobiology. 2020 May 11;225(3):151955. doi: 10.1016/j.imbio.2020.151955.
  25. Kalita P, Padhi AK, Zhang KYJ, Tripathi T. Design of a peptide-based subunit vaccine against novel coronavirus SARS-CoV-2. Microb Pathog. 2020 May 4;145:104236. doi: 10.1016/j.micpath.2020.104236.
  26. Anderson EJ, Rouphael NG, Widge AT, Jackson LA, Roberts PC, Makhene M, Chappell JD, Denison MR, Stevens LJ, Pruijssers AJ et al. Safety and immunogenicity of SARS-CoV-2 mRNA-1273 vaccine in older adults. N Engl J Med. 2020; 383(25):2427-2438. doi: 10.1056/NEJMoa2028436.
  27. Widge AT, Rouphael NG, Jackson LA, Anderson EJ, Roberts PC, Makhene M, Chappell JD, Denison MR, Stevens LJ, Pruijssers AJ et al. Durability of responses after SARS-CoV-2 mRNA-1273 vaccination. N Engl J Med. 2021; 384(1):80-82. doi: 10.1056/NEJMc2032195.
  28. Corbett KS, Edwards DK, Leist SR, Abiona OM, Boyoglu-Barnum S, Gillespie RA, Himansu S, Schäfer A, Ziwawo CT, DiPiazza AT et al. SARS-CoV-2 mRNA vaccine design enabled by prototype pathogen preparedness. Nature. 2020;586(7830):567-571. doi: 10.1038/s41586-020-2622-0.
  29. Graham BS, Corbett KS. Prototype pathogen approach for pandemic preparedness: world on fire. J Clin Invest. 2020;130(7):3348-3349. doi: 10.1172/JCI139601.

Volume 10, Number 1
