Mapping the ACE2 binding site on the SARS-CoV-2 spike protein S1: molecular recognition pattern

Coronavirus SARS-CoV-2 enters the host cell via binding with the angiotensin-converting enzyme 2 (ACE2), and here we used computational modelling to study the molecular recognition pattern of this interaction. The fragment of the N-terminal part of the enzyme containing amino acids 19-45 was used as the lead peptide in this study. The structure of this peptide was systematically modified by successive replacement of its amino acids with alanine, serine, glycine, and phenylalanine. Docking energies were then calculated for all these mutant peptides. These docking energies were correlated with physical descriptors, proposed for the modelling of peptide-protein interactions, that characterize hydrophilicity and volume-related properties of amino acid side chains. From these correlations the corresponding specificity factors were obtained for all amino acid positions, and thus a full description of the molecular recognition pattern of the ACE2 α1 domain by the virus S1 protein binding site was obtained.


INTRODUCTION
Coronavirus SARS-CoV-2, responsible for the COVID-19 pandemic, enters human cells through the interaction of the virus surface spike protein S1 with the human angiotensin-converting enzyme 2 (ACE2) on the host cell membrane [1,2]. The initial binding step of the virus particle on the surface of the host cell is followed by the fusion of the viral and cell membranes and entry of the viral RNA into the host cell [2-4]. This mechanism of virus entry suggests that inhibition of the interaction between the S1 protein and ACE2 may be a promising option to combat the infection. A rather straightforward way to achieve this goal is inhibiting one of the binding sites involved in this interaction by using peptides that mimic the counterpart protein. As ACE2 is physiologically important for the functioning of the host cell, inhibition of the virus binding site on this enzyme by using peptides derived from the S-protein structure, as proposed in [5], has understandable shortcomings. However, these do not apply in the case of peptides that mimic the ACE2 structure and inhibit the virus-receptor interaction by binding with the receptor-binding domain (RBD) on the S1 protein [6,7]. Therefore, this study, as well as our previous work [7], was focused on the design of such inhibitory peptides by using computational methods.
It can be suggested that the most straightforward design of effective peptide inhibitors that target the binding site on the S1 protein can be made proceeding from the peptide sequence of the virus binding domain of the ACE2 molecule [7]. The viability of this approach is based on the availability of structural data for proteins S1 and ACE2 and their complex [3,5], published as '6LZG' in the Worldwide Protein Data Bank (PDB) database (www.pdb.org). Using these data, we modelled the structure of the S1 and ACE2 complex as illustrated in Fig. 1.
Initially, these structural data were used for computational analysis of the interaction of the RBD on the S1 protein with ACE2 and its fragments [6]. That study demonstrated effective interaction of the peptides derived from the ACE2 structure with the SARS-CoV-2 S1 protein, as even a single α1-helix domain peptide that contains amino acids 21-55 of the ACE2 N-terminal sequence binds with the S1 protein with almost the same effectiveness as the full-size protein [6]. It was also revealed that this sequence includes 12 amino acid residues of ACE2 that seem to interact with the RBD of SARS-CoV-2, whereas 10 other amino acid residues participating in the virus binding process come from other parts of the ACE2 molecule [6]. Thus, the interaction of the spike protein with its receptor site on the ACE2 molecule is clearly focused on binding the α1 domain with the S1 protein (Fig. 1).
Following [6], in our earlier study [7] we mapped the α1 domain binding site on the S1 protein using computational docking analysis; in that study the peptide sequence 19-45 of the N-terminal part of ACE2 (STIEEQAKTFLDKFNHEAEDLFYQSS) was truncated from both ends, and the docking of 200 peptide fragments at the binding site on the S1 protein was analysed. We found that the α1 domain sequence can be shortened to a certain extent without significant reduction of the docking energy, which is 'good news' for therapeutic peptide development [8]. This conclusion about the influence of peptide length was confirmed by results published in [9], where the binding of three peptides, also derived from the same ACE2 α1 domain and containing 26, 23, and 20 amino acids, with the S1 protein was investigated computationally.
Thereafter, a similar docking analysis of peptides derived from the α1 domain of ACE2 and containing 23 and 19 amino acids was reported [10]. These results demonstrated that alteration of peptide length from 19 to 23 amino acids has practically no effect on the positioning of these compounds in the binding site on the S1 protein, although the binding effectiveness is somewhat reduced by peptide shortening. In conclusion, all these results support the idea that short peptides can effectively bind with the S1 protein and therefore can be used for developing antiviral drugs. To achieve this goal, it seems to be important to improve the effectiveness of peptide binding, first and foremost, through directed modification of the peptide primary structure. Therefore, we continued mapping the S1 binding site for ACE2-derived peptides. In this paper we analyse the molecular recognition pattern of this interaction interface by combining computational docking calculations with methods of quantitative structure-activity analysis.

METHODS
The input files used for modelling ACE2 and the receptor-binding domain of the SARS-CoV-2 spike protein S1 (amino acids 333 to 527), as well as the complex formed between these proteins, were built from data on the spatial structure of these proteins, obtained by X-ray structure analysis [3,5] and listed as '6LZG' in the PDB database (www.pdb.org).
The peptides used for the docking study were derived from the α1 domain of the ACE2 protein. The lead peptide sequence STIEEQAKTFLDKFNHEAEDLFYQSSL was systematically modified by successively replacing each amino acid with alanine, serine, glycine, and phenylalanine. Thus, the recognized procedure of alanine scanning [11] was extended to describe the binding properties of serine, glycine, and phenylalanine mutants by using computational data.
Fig. 1. Computer modelling of the complex formed between the SARS-CoV-2 spike protein S1 (right molecule, cyan) and human angiotensin-converting enzyme 2 (left molecule, green) by using structure data listed as '6LZG' in the PDB database (www.pdb.org).
The lead peptide was mutated at each amino acid position while the main scaffold of the peptide chain was conserved. The best-scoring results of peptide positioning were picked for the peptide-S1 complex. The docking energy values were calculated and further processed by using conventional quantitative structure-activity analysis methods. The parameters and the procedure of the MD simulations were described in detail in our previous work [7].
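The scanning scheme described above can be sketched in a few lines of Python. This is only an illustration of how the single-point mutant sequences might be enumerated, not the authors' actual workflow. The function and variable names are hypothetical, and the choice to skip substitutions that reproduce the native residue is an assumption.

```python
# Sketch of the four-residue scan; all names here are illustrative.
LEAD = "STIEEQAKTFLDKFNHEAEDLFYQSSL"  # ACE2 residues 19-45 (from the text)
FIRST_RESIDUE = 19                     # ACE2 numbering of the first lead residue
SCAN_RESIDUES = "ASGF"                 # alanine, serine, glycine, phenylalanine


def scan_mutants(lead=LEAD, first=FIRST_RESIDUE, scan=SCAN_RESIDUES):
    """Return {label: sequence} for every single-point substitution.

    Labels use the usual notation, e.g. 'Q24A'. Substitutions that would
    reproduce the native residue (e.g. 'A25A') are skipped; this is an
    assumption made for the sketch.
    """
    mutants = {}
    for offset, native in enumerate(lead):
        for new in scan:
            if new == native:
                continue  # identical to the lead peptide
            label = f"{native}{first + offset}{new}"
            mutants[label] = lead[:offset] + new + lead[offset + 1:]
    return mutants


mutants = scan_mutants()
print(len(LEAD) * len(SCAN_RESIDUES))  # nominal number of substitutions
print(len(mutants))                    # substitutions that change the sequence
print(mutants["Q24A"])
```

The nominal count, 27 positions times 4 scanning residues, equals the 108 peptides quoted in the Conclusions. Whether substitutions identical to the native residue were docked separately is not stated in the text.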
Recently one more computational analysis of this interaction interface was published [17]; there, the participation of other 'contact interactions' is mentioned in addition to hydrogen bonds. The list of contact interactions includes van der Waals and hydrophobic bonds as well as π/π and π/cation interactions. Based on [17], formation of hydrogen bonds with participation of amino acids Q24, K31, E35, E37, D38, Y41, and Q42 can be expected in the case of the lead peptide, while contact interactions can be suggested in the case of amino acids Q24, T27, D30, K31, H34, D38, Y41, and Q42. As can be seen, these lists significantly overlap, demonstrating that the actual specificity pattern that governs the interaction interface can be rather complex. This is a good justification of the following analysis.
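The overlap between the two residue lists quoted from [17] can be made explicit with a short set comparison. The snippet below is a minimal sketch that uses only the residue labels given in the text; the variable names are illustrative.

```python
# Residue lists as quoted in the text from ref. [17]
h_bond = {"Q24", "K31", "E35", "E37", "D38", "Y41", "Q42"}
contact = {"Q24", "T27", "D30", "K31", "H34", "D38", "Y41", "Q42"}


def residue_number(label):
    """Sort key: numeric position, e.g. 'Q24' -> 24."""
    return int(label[1:])


shared = sorted(h_bond & contact, key=residue_number)
only_h_bond = sorted(h_bond - contact, key=residue_number)
only_contact = sorted(contact - h_bond, key=residue_number)

print("both lists:  ", shared)
print("H-bond only: ", only_h_bond)
print("contact only:", only_contact)
```

Five of the seven hydrogen-bonding residues also appear in the contact list, which is the overlap referred to above.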

Computational site-directed mutagenesis of the lead peptide
The contribution of individual amino acid residues to the effectiveness of peptide binding was studied by combining the site-directed mutagenesis method with computational docking analysis as suggested in [11]. First, we consecutively replaced all amino acids with alanine; as the methyl group of this amino acid cannot be involved in polar interactions, this method is often used for the determination of 'hot spots' of the protein-peptide interaction interface. Second, we mapped the polar and hydrophobic properties of the binding site by scanning the lead peptide with serine, glycine, and phenylalanine. The results of this analysis are summarized in Fig. 3, where the docking energies for the alanine, serine, glycine, and phenylalanine mutants of the lead peptide are compared. As can be seen in Fig. 3, in many cases the docking effectiveness of the lead peptide, E_dock = -12.6 kcal/mol, is not affected by the replacement of the initial amino acid. This means that these amino acids are probably not involved in the peptide interaction with the S1 protein.
On the other hand, Fig. 3 also reveals several hot-spot positions, as the alanine scan caused a weakening of the peptide binding in the following positions: Q24, T27, F28, D30, K31, E35, D38, Y41, and L45. Importantly, interaction of all these amino acids with the S1 protein has been suggested in the structural studies cited above. Therefore, it can be concluded that the computational docking study adequately describes the peptide-protein interactions in the case of the formation of the ACE2-S1 complex.
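As a minimal sketch of how a scan result might be turned into a hot-spot call, the snippet below computes the change in docking energy relative to the lead peptide. It uses the lead value (-12.6 kcal/mol) and the Q24 mutant energies quoted in the 'Molecular recognition pattern' section. The cut-off of 0.5 kcal/mol and all names are hypothetical; the text does not state a numerical criterion.

```python
E_LEAD = -12.6  # kcal/mol, docking energy of the lead peptide (from the text)

# Docking energies of the Q24 mutants quoted in the text, kcal/mol
Q24_SCAN = {"A": -12.0, "S": -12.2, "G": -11.8, "F": -10.9}

HOT_SPOT_CUTOFF = 0.5  # kcal/mol; hypothetical threshold, not from the paper


def docking_penalties(scan, e_lead=E_LEAD):
    """Energy change per substitution; positive means weaker docking."""
    return {res: round(e - e_lead, 2) for res, e in scan.items()}


def is_hot_spot(scan, e_lead=E_LEAD, cutoff=HOT_SPOT_CUTOFF):
    """Flag a position when its alanine mutant loses at least `cutoff`."""
    return docking_penalties(scan, e_lead)["A"] >= cutoff


penalties = docking_penalties(Q24_SCAN)
print(penalties)
print(is_hot_spot(Q24_SCAN))
```

With these values every Q24 substitution weakens docking, and the phenylalanine mutant shows the largest loss.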
However, the influence of alanine mutations on the docking energy is rather different along the peptide chain; moreover, these effects are not similar to the changes caused by other mutations. This means that the interplay of different specificity factors governs, indeed, the binding process, as was suggested in [17].
It is interesting to note that all hot-spot amino acids are located on the same side of the helical structure of the α1 domain, facing the S1 protein. This situation is illustrated in Fig. 4, where the mutual positioning of the hot-spot amino acids in the α1 domain and its binding site on the S1 protein is shown.

Molecular recognition pattern
The molecular recognition pattern of the peptide binding interface can be characterized in terms of structure-activity relationships, assuming that the contribution of each amino acid can be presented as the sum of interactions quantified by certain specificity descriptors [18]. The possibility of encoding these interactions in terms of two orthogonal sets of descriptors, which characterize the volume-related (ϖ) and hydrophilicity-related (η) effects of amino acid side groups [19], simplifies this analysis and opens new perspectives for converting structural data to the property space. These descriptors, listed in Table 1, demonstrate that the hydrophilicity parameter η has a negative value for alkyl and nonpolar groups and a positive value for polar and ionic groups, independently of the sign of the net charge on the group. The volume descriptors ϖ vary from -4.04 for glycine, which has the side group of minimal size, up to 4.28 in the case of tryptophan, which is the bulkiest.
It is important to mention that the parameters ϖ are well correlated with the conventional scale of the molar refractivity (MR) values (R² = 0.9634), commonly used for the characterization of the volume-related properties of amino acid side chains [20]. At the same time, the correlation between the hydrophilicity parameters η and the classical hydrophobicity parameters [21] is weaker (R² = 0.7153); still, these values show obvious similarities.
The systematic scans of peptide binding properties by using alanine, serine, glycine, and phenylalanine, together with the amino acid in the parent peptide structure, provide five data points for most amino acid positions. This is sufficient for structure-activity correlation using the descriptors in Table 1. For example, in the case of mutants Q24A, Q24S, Q24G, and Q24F the E_dock values were -12.0, -12.2, -11.8, and -10.9 kcal/mol, respectively. Together with the E_dock value of -12.6 kcal/mol for Q24, the following correlation was obtained:
E_dock = (-11.9 ± 0.22) + (0.06 ± 0.08)ϖ + (-0.49 ± 0.21)η. (1)
This interrelationship describes the influence of the amino acid side group on the docking energy in terms of two specificity factors, ϖ and η. In this case the volume-related effects, characterized by ϖ, play no statistically relevant role, whereas the docking energy is governed by the hydrophilicity (η) of the amino acid side group, quantified by the specificity factor -0.49.
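A two-descriptor correlation of the form of Eq. (1) can be obtained by ordinary least squares. The sketch below assumes NumPy and shows one way such a fit and its standard errors might be computed; it is not the authors' procedure, and the names are hypothetical. Because the Table 1 descriptor values are not reproduced here, the demonstration uses made-up placeholder descriptors and synthetic energies generated from the coefficients of Eq. (1), so it checks only that the fit recovers known coefficients.

```python
import numpy as np


def fit_specificity(e_dock, omega, eta):
    """Least-squares fit of E_dock = c + a*omega + b*eta.

    Returns (coefficients, standard_errors), both ordered as (c, a, b).
    Standard errors use the residual variance with n - 3 degrees of freedom.
    """
    e_dock = np.asarray(e_dock, dtype=float)
    X = np.column_stack([np.ones_like(e_dock), omega, eta])
    dof = len(e_dock) - X.shape[1]
    if dof <= 0:
        raise ValueError("need more data points than fitted parameters")
    coef, *_ = np.linalg.lstsq(X, e_dock, rcond=None)
    resid = e_dock - X @ coef
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef, np.sqrt(np.diag(cov))


# Synthetic check only: these descriptors are placeholders, NOT Table 1 values.
omega_demo = np.array([-1.0, -3.0, -2.0, 2.5, 0.5])
eta_demo = np.array([0.8, -0.5, 0.3, -1.2, 1.0])
e_demo = -11.9 + 0.06 * omega_demo - 0.49 * eta_demo  # coefficients of Eq. (1)

coef, err = fit_specificity(e_demo, omega_demo, eta_demo)
print(np.round(coef, 2))
```

With five data points and three fitted parameters only two degrees of freedom remain, so the uncertainties on the specificity factors are necessarily broad. Whether the ± values in Eq. (1) are standard errors or another interval is not stated in the text.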
Similar correlations were obtained for all positions of the peptide sequence. Therefore, we suggest that these data characterize the molecular recognition pattern of the binding interface. All results of this analysis are summarized in Fig. 5. It is important to emphasize that the physical meaning of the interactions depends on the selection of the descriptor sets.
A hydrophilicity effect is revealed for most polar amino acids in the lead peptide (Fig. 5). At the same time, the contribution of this specificity factor varies significantly between positions. On the other hand, there are also volume-related effects (see Fig. 5), and these interactions are also distributed throughout the peptide. In most cases these effects support peptide binding.

Hot spots and the molecular recognition pattern
The location of the hot-spot amino acids in the lead peptide was identified by structural studies as described above as well as by alanine scan as shown in Fig. 2. In summary, this list of amino acids includes S19, Q24, T27, D30, K31, E35, E37, D38, Y41, and L45. As Fig. 5 shows, these positions are indeed important for peptide binding because at least one specificity factor makes a significant contribution in these cases. Therefore, the molecular recognition pattern provides the same information as other approaches. In addition, however, the recognition pattern also characterizes these interactions quantitatively.
In this study computational docking energy values were used for analysing the peptide-protein interaction interface; therefore, the effects of entropy on the recognition pattern were not analysed. Irrespective of this drawback, we believe that the presented results may still be useful for further optimization of the peptide structure and will enable improving its binding properties. Most interestingly, the scan results already reveal some point mutations that increased the peptide binding effectiveness (Fig. 3).

CONCLUSIONS
Docking of 108 peptides derived from the ACE2 binding domain sequence (19-45) STIEEQAKTFLDKFNHEAEDLFYQSSL with the receptor binding site of the SARS-CoV-2 virus S1 protein was analysed and the molecular recognition pattern of the peptide-protein interaction interface was mapped quantitatively. The studied peptides were obtained by systematic scanning of the lead peptide sequence with alanine, serine, glycine, and phenylalanine. The results revealed that replacement of amino acids in the lead peptide reduced its binding effectiveness in certain critical positions (Q24, T27, D30, K31, E35, E37, D38, Y41, and L45), which agree with the sites where the formation of hydrogen bonds and a salt bridge can be predicted from structural data. The scanning results were used for correlation analysis of the influence of the amino acid side chain structure on peptide binding, and the contributions of distinct interactions were quantified at each amino acid position. The set of orthogonal descriptors that characterize volume-related and hydrophilic properties of the amino acid side groups was used for this analysis. The set of these correlations was used to characterize quantitatively the molecular recognition pattern of the peptide binding site on the SARS-CoV-2 virus S1 protein.