Introduction

Modification of proteins is widespread throughout nature, increasing the diversity of protein structure and hence function by up to two orders of magnitude1. Yet, our ability to synthetically mimic nature’s capacity to install such modifications is essentially limited by the chemistry that is available. Reaction at a single amino acid or site, among a sea of reactive carboxylic acids, amides, amines, alcohols and thiols, is a significant and exciting challenge in both chemo- and regioselectivity. Potential transformations, if they are to be relevant, are moulded by the need for biologically ambient conditions (that is, <37 °C, pH 6–8, aqueous solvent) so as not to disrupt protein architecture and/or function. Ideally, such transformations should proceed with near-total conversion to generate homogeneous constructs2,3,4. The applications of modified proteins are many; they range from the in vivo tracking of protein–fluorophore conjugates5 to the polyethylene glycol (PEG)ylation of therapeutic proteins to reduce immunogenicity6, and from the production of materials with novel properties7 to probing the mechanism of pathological enzymes8.

While many past examples of so-called ‘bioconjugation’ exist (and even dedicated journals), those that teach a strong strategic lesson are rarer. The rigour of the chemical approach (including proper characterization) has been lacking, supplanted perhaps by a pragmatic desire for useful product. In an era now hungry for precise molecular knowledge of protein function, previously rare (historical) examples of precise protein chemistry become vital.

We (subjectively) consider that a seminal example can be found in the work of Wilchek9 and later Bender10 and Koshland11. Their chemical conversions of serine to cysteine, singularly early examples of site-directed protein mutagenesis, have, we believe, still not been fully appreciated. They set the stage for approaches that are only now coming to fruition in a broadly applied manner. In a post-genomic era that is more conversant with the limitations of ‘gene-only’ methods, such chemical approaches will likely prove uniquely powerful12.

Over the past two decades, a number of methodologies have emerged (for example, Fig. 1) for undertaking modification at both natural and non-standard amino-acid residues, in vitro and in vivo, building on a previously limited toolkit for modification primarily at cysteine and lysine13. In this review, we will focus on these key developments that have greatly expanded the protein chemistry reaction palette. In particular, we will highlight merits and current limitations of reactions, in the context of the protein-conjugate bond-type formed.

Figure 1: A ‘tag-and-modify’ approach to protein modification.

A uniquely reactive amino-acid ‘tag’171 is installed on a protein surface for reaction with a desired ‘modification’. Examples include (a) fluorophores5,125 (b) glycosylation35 (c) prenylation37 (d) PEGylation87 (e) attachment to solid surfaces172 (f) peptides44 (g) biotinylation90 and (h) antibody conjugation159.

Modifications of natural amino acids

Natural amino acids bring with them immediate accessibility without the need for more specialist techniques. Yet, the palette of functional groups is more limited and abundance may play a critical role in determining selectivity and hence precision. Low abundance (for example, cysteine) may allow (re)positioning of the residue to enable site-selective strategies; this is effectively a reassignment of the associated codons to encode the site of reaction, giving a complete and hence precise alteration. With high abundance, however, full reaction of all residues may be unlikely, and partial reaction may generate (often statistical) mixtures of many products. Moreover, many of the functional groups in proteins are nucleophiles. This creates both a limitation on their use alone (differentiation based on selectivity may be more difficult) and an opportunity (unnatural amino acids (UAAs), see below, may then be designed with very different properties that are chemically distinguishable from this natural nucleophilic set).
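The ‘statistical mixtures’ that arise at abundant residues can be made concrete with a simple binomial model. The sketch below is illustrative only and is not taken from the work reviewed here: it assumes that every site reacts independently and with the same probability, which real proteins rarely satisfy, and the site count and per-site conversions are hypothetical values.

```python
from math import comb


def product_distribution(n_sites: int, p_modified: float) -> list[float]:
    """Fraction of protein molecules carrying k = 0..n_sites modifications.

    Assumes (for illustration) that all sites react independently and with
    the same per-site probability p_modified, i.e. a binomial model.
    """
    if n_sites < 0 or not 0.0 <= p_modified <= 1.0:
        raise ValueError("need n_sites >= 0 and 0 <= p_modified <= 1")
    return [
        comb(n_sites, k) * p_modified**k * (1.0 - p_modified) ** (n_sites - k)
        for k in range(n_sites + 1)
    ]


# Hypothetical protein with 10 equally reactive lysines, each 50% converted.
multi_site = product_distribution(10, 0.5)
for k, fraction in enumerate(multi_site):
    print(f"{k:2d} modifications: {fraction:6.1%}")

# Hypothetical single engineered site driven to 99% conversion.
single_site = product_distribution(1, 0.99)
print(f"single site, modified fraction: {single_site[1]:.1%}")
```

Under these assumptions, no single loading level accounts for more than about a quarter of the population in the ten-site case, and each loading level is itself a mixture of positional isomers; the single-site case gives one dominant product.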

Cysteine Sprotein–C bonds

As the most robustly nucleophilic of the 20 canonical amino acids, the thiol of cysteine offers a unique reactive handle within proteins, a property exploited extensively in nature1. Although pH may need to be controlled, selective reaction at cysteine over other nucleophilic residues such as lysine and histidine can be achieved14, while the low abundance (<2%) of cysteine in proteins often allows for facile modification at a single site3. In addition to functionalization of a protein of interest, this can also allow ready mutational repositioning and codon reassignment (through Cys→Ser/Gly mutation)15.

The selective reactions of cysteine with electrophiles such as α-halocarbonyls and maleimides have been suggested for almost a century (Fig. 2b,c)16. Indeed, iodoacetamide is used routinely for capping before digestion for protein sequencing17. Notably, some derivatives react with N-nucleophiles18. However, commercial availability and ease of use and synthesis of maleimide derivatives13,19 have led to widespread use (for example, vaccine candidates20 or modified enzymes21)13. Their use results in a reaction typically considered irreversible, yet it has been suggested that this can be reversed by competitive thiols22. Hydrolysis of the maleimide adduct moiety can also lead to subsequent decomposition of protein conjugates22,23; interestingly, this may advantageously reduce cited reversibility22. Thus, potential degradation may be an important consideration, particularly when instability may give rise to an unwanted mixture of products. Interestingly, use of bromomaleimides has opened up the possibility of reversible conjugation, allowing modulation of activity and in vivo monitoring, while also allowing the bridging and stabilisation of native disulfides19,24. The rare 21st amino acid, selenocysteine, can also be engineered into proteins and used to react with maleimides; greater Se nucleophilicity can allow conjugation selectivity over cysteine residues25.

Figure 2: Chemical modifications at cysteine.

(a) Aminoethylation26,27 (b) iodoacetamides17 (c) maleimides19,173 (d) Dha formation166 (e) disulfide formation35 (f) reaction of Dha with thiols167 and (g) desulfurization of disulfides.

Recently, aminoethylating agents have been used; the resulting thioether products (‘thia-lysines’) can mimic lysines bearing post-translational modifications (PTMs) such as methylation and acetylation, modifications with key roles in eukaryotic cells, particularly in the histone proteins that package DNA (Fig. 2a)26,27. More recently, transition metal-catalysed modifications of cysteine have been reported, such as rhodium-catalysed reaction with diazo compounds, although potential side reactions with tryptophan have been noted as a limitation28. Thiyl radicals at Cys can also be utilized for unique chemistries (see Box 1).

The use of cysteine nucleophiles has traditionally been limited to in vitro application due to the concentration of free thiols in cells (for example, glutathione). Some attempts have been made to conduct selective cellular labelling through introduction of Cys at particularly reactive sites of cell-surface proteins29.

Cysteine Sprotein–S bonds

The ability of sulfur to alter oxidation state is often exploited in natural redox reactions. Sulfur–sulfur bonds are key in maintaining protein tertiary and quaternary structure via interchain bridges. This property of sulfur can be exploited in the synthetic modification of proteins; formation of disulfides between thiol and cysteine can occur under an ambient atmosphere (Fig. 2e). However, the rate of reaction is often slow, and disulfide exchange (ideally with kinetic control) is a favoured method for installing modifications; Ellman’s reagent has long been used to quantify free thiols.

Additional thermodynamic driving force can be exploited to favour formation of disulfide on-protein but can be unpredictable. Thus, reagents that allow kinetic control have been developed, typically relying upon thiol-specific electrophilicity. The use of methanethiosulfonates and phenylthiosulfonates has allowed quantitative, selective modification of proteins, giving labelled proteins30, modified enzymes31 and glycoconjugates32,33,34. van Kasteren et al.35 utilized such reagents to generate a mimic of P-selectin glycoprotein ligand-1, via a dual modification with a methanethiosulfonate (to generate a sulfotyrosine mimic) and triazole-forming reaction (to install a mimic of the sugar sialyl-Lewis x). Selenenyl–sulfides show enhanced reactivity (Fig. 2e); S–Se intermediates can either be installed on-protein, allowing reaction with the desired thiol (electrophilic strategy), or via addition of a selenenyl–sulfide reagent to cysteine directly (nucleophilic strategy)36,37. Mechanistically, both routes appear to exploit electrophilic sulfur in the S–Se bond as a source of their chemoselectivity.

Lysine and N-terminus Nprotein–C bonds

Despite high natural abundance, lysine remains a popular choice for modification due to the number of successful reactions that can be applied to highly nucleophilic primary amines4. This is particularly the case where selectivity (as opposed to reactivity) for the site of modification is perceived not to be important, or where multiple conjugations are desired, for example, in the display of multiple antigens for creation of conjugates as putative vaccines38. Preferential conjugation with amines, over the nucleophilic thiol of cysteine, can, in principle, be achieved through use of ‘harder’ electrophiles such as activated esters39, sulfonyl chlorides13 or isothiocyanates40. Indeed, Edman degradation, the classical reaction of N-terminal protein sequencing, relies on N-terminal modification with phenyl-isothiocyanates. Unsaturated aldehyde esters are also finding increasing favour due to their ability to undergo selective irreversible aza-electrocyclizations, such as in the installation of positron-emitting-metal-binding ligands41.

An alternative, well-established reaction for modification at lysine residues is through reductive alkylation using aldehydes in the presence of sodium cyanoborohydride42. The higher stability of this reagent over sodium borohydride allows selective reaction at an appropriate pH, although the rates of reaction may be sluggish due to slow imine/iminium formation in water. Iridium catalysis utilising formate as reductant can accelerate the reaction43.

While lysine is typically the most nucleophilic amine in proteins, the N-terminus may display unique reactivity. N-terminal modification in the seminal coupling of an N-terminal cysteine and a C-terminal thioester via the ‘native chemical ligation’ (NCL) reaction is one of the most important methods for the synthetic construction of the backbone of polypeptides. Since its introduction by Kent8,44, NCL has been used extensively for the generation of synthetic proteins45. The intermolecular formation of an intermediate thioester is followed by a rapid S→N-acyl shift46, resulting in the formation of a native peptide bond. Applications in fully synthetic proteins have been reviewed elsewhere45,47 and many amino acids can now be generated at the ligation site in place of cysteine48. NCL may also be utilized in the ‘semi-synthesis’ or ‘expressed protein ligation’ of proteins through the site-selective ligation of recombinantly derived thioesters and synthetic peptides49. In these latter methods thioesters are usually generated by exploiting the protein self-splicing activity of inteins50,51. Recent work by Vila-Perelló et al.52 has greatly improved the generation and purification of recombinant thioesters, allowing the semi-synthesis of proteins to gain increasing utility as a method for protein modification. The NCL reaction is perhaps illustrative of the features of success in protein chemistry since it relies in essence on enhanced, synergistic chemoselectivity derived from more than one functional group in combination (proximal amine and thiol, see Box 2). The reaction of an N-terminal Cys with cyanobenzothiazole, which ‘borrows’ from the chemoselectivity in the last step of luciferin formation, can be viewed similarly53. The N-terminus has also been used to generate uniquely reactive ketones via biomimetic transamination mediated by the co-factor pyridoxal-5′-phosphate (PLP; see section entitled Manipulating carbonyls to Cprotein=N bonds). The different pKa of the N-terminus can also be used to exploit pH-dependent chemistry with resulting differences in reactivity and selectivity.
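The pH dependence of amine reactivity can be illustrated with the Henderson–Hasselbalch relationship, which gives the fraction of a group present in its deprotonated (nucleophilic) form. The following is a minimal sketch; the pKa values are approximate textbook figures assumed for illustration, not values from the work reviewed here, and real pKa values shift with the local protein environment.

```python
def fraction_deprotonated(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch: fraction of a group in its deprotonated form
    (thiolate for Cys, free amine for Lys or the N-terminus)."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Approximate, assumed pKa values; actual values depend on the protein.
ASSUMED_PKA = {
    "N-terminal amine": 8.0,
    "Cys thiol": 8.5,
    "Lys side-chain amine": 10.5,
}

for ph in (6.0, 7.0, 8.0):
    fractions = ", ".join(
        f"{name}: {fraction_deprotonated(ph, pka):.2%}"
        for name, pka in ASSUMED_PKA.items()
    )
    print(f"pH {ph:.1f} -> {fractions}")

# Ratio of deprotonated N-terminal amine to deprotonated lysine amine at pH 7.
ratio_at_ph7 = fraction_deprotonated(7.0, 8.0) / fraction_deprotonated(7.0, 10.5)
print(f"N-terminus / Lys deprotonated ratio at pH 7: {ratio_at_ph7:.0f}")
```

With these assumed values, the N-terminal amine is a few hundred times more deprotonated than a lysine side chain at pH 7, which is one reason N-terminal selectivity can be achievable near neutral pH. The deprotonated fraction is only part of the picture, since intrinsic nucleophilicity and accessibility also differ between groups.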

Modifications at UAAs

UAAs present an immediately obvious opportunity to provide potentially unique chemical handles at which to undertake site-selective modification of proteins in a broader and more free-ranging way than at natural residues. Yet, modification of such residues can be limited by the methods for their installation and the chemistry available for reaction. Here we try to give illustrative examples of both and aim to do so in a representative manner. It should be noted, however, that many hybrid strategies ultimately exploit the same functional groups and bond-forming processes but are not exhaustively discussed here; one example is the use of chemoenzymatic methods, in which an enzyme first attaches a functional group or ‘tag’ that is then reacted.

Staudinger amide Nprotein–C bonds

The Staudinger ligation between an azide and triarylphosphine, an extension of the Staudinger reaction54, was a conceptually exciting development. Here, elegantly, an electrophilic trap was used to divert hydrolysis of the intermediate aza-ylide, generating instead a stable amide bond (Fig. 3a)55. This has allowed the modification of azido-glycoproteins incorporated into the cell-surface glycocalyx of eukaryotic cells via the sialic acid biosynthetic and metabolic machinery. Indeed, the modification of sugars in glycoproteins was subsequently shown to be applicable to the remodelling of cell surfaces in living animals, demonstrating an apparently good level of biocompatibility5. With the development of techniques for site-specifically incorporating azido-amino acids, this ligation became applicable to the direct modification of protein side chains, first at auxotroph-incorporated azidohomoalanine56 and subsequently at 4-azidophenylalanine incorporated by amber-stop codon suppression57 (see Box 3). Applications of Staudinger ligation in bioconjugation have been reviewed58, with uses as diverse as fluorogenic labelling59, epitope tagging of G-protein-coupled receptors60 and the installation of photoswitches61.

Figure 3: The Staudinger ligation.

(a) A ‘bio-orthogonal’ labelling of protein azides55. (b) ‘Traceless’ Staudinger variants involve loss of the phosphorus-containing prosthetic group63.

Soon after the initial report, a ‘traceless’ variant was reported by the groups of Raines62 and Bertozzi63. Thus, an amide bond can be generated without residual phosphine oxide (Fig. 3b). Other variants such as a three-component Staudinger ligation have allowed the site-specific installation of amide-bonded glycomimetics64,65. More recently, Serwa et al.66 reported a phosphite-Staudinger reaction with a stalled intermediate phosphoramidate. In addition to negating the problem of possible phosphine oxidation67, this ‘traceless’ reaction valuably allows installation of phosphate mimics66.

Despite its obvious strengths, the use of the Staudinger ligation has diminished recently due to slower associated kinetics, the retained triarylphosphine oxide appendage in ‘non-traceless’ variants and problems related to phosphine oxidation and possible side reactions of phosphines in ‘traceless’ variations65,67.

Heterocycles from formal or concerted cycloadditions

In 2002, the groups of Sharpless68 and Meldal69 independently reported a stepwise modification of a classical reaction of organic chemistry, the Huisgen70–Dimroth71–Michael72 1,3-dipolar-cycloaddition between an azide and alkyne. They found that triazole formation was dramatically accelerated by the use of copper(I) even at room temperature, and was highly tolerant of both water and oxygen (Fig. 4a). The reaction was rapidly embraced by those seeking a more modular and less bespoke approach to molecular additions and has since found widespread use in the pharmaceutical and material industries, but its impact on bioconjugation has been particularly telling, initiating a number of formal and actual cycloadditions that have expanded the ability to label biological systems73.

Figure 4: Reactive handles used for formal and actual cycloadditions on proteins.

(a) CuAAC68 and SPAAC90, and (b) IEDDA101 and ‘photo-click’112 reactions for site-selective protein modification with reported rates for small-molecule models (‘on-protein’ rates are given in brackets where reported). No rate is given for CuAAC due to the additional dependence on copper concentration.

The absence and reasonable inertness of azides and alkynes in biology has made the copper-catalysed azide–alkyne ‘cycloaddition’ (CuAAC) an excellent candidate for undertaking modification of biomolecules (although it should be noted that the CuAAC is not a true cycloaddition, rather proceeding via a metallocyclic intermediate74). In early demonstrations, Wang et al.75 coated cowpea mosaic virus with azide or alkyne moieties via nonspecific lysine labelling and then undertook CuAAC reactions to fluorescently label the capsid. Despite some limitations, such as need for organic co-solvent, unspecific labelling, incomplete conversions and breakdown in capsid structure, the selectivity of triazole formation gave a major indication of the potential power of the reaction. Soon after Speers et al.76 reported that CuAAC could be undertaken highly selectively in cellular lysates for activity-based profiling of intracellularly labelled proteins, thereby demonstrating potential tolerance towards cellular components. As with the Staudinger ligation, the ability to site-specifically incorporate UAAs into proteins allowed expanded application of the CuAAC. Having previously reported the incorporation of azide- and alkyne-containing UAAs into proteins in Escherichia coli via amber-stop codon suppression, the group of Schultz77 reported the incorporation of both O-propargyltyrosine and p-azidophenylalanine into proteins in Saccharomyces cerevisiae. They went on to demonstrate selective labelling of these UAAs using CuSO4 and copper wire (as indirect sources of Cu(I)) at 37 °C. Despite only partial conversions, these examples represented the first site-specific CuAAC on protein surfaces.

In a series of papers, the group of Tirrell78,79,80 reported that azide-containing amino acids could act as substrate surrogates for E. coli methionyl transfer RNA (tRNA) synthetase, resulting in their incorporation into cellular proteins. It was demonstrated that these ‘tagged’ proteins could undergo CuAAC to label the cell surface of E. coli for the first time78,79, as well as be used to identify newly synthesized proteins from stable cell line lysates80. This work importantly identified copper(I) bromide as a more efficient source of Cu(I) negating the need for an added reductant, yet a cellular toxicity of the metal catalyst to E. coli was suggested, with some exposed cells showing an unusual phenotype78,81. Although not yet precisely delineated, this toxicity has been attributed to the generation of reactive oxygen species that may cause intracellular damage82, implying that control of the Cu(I) oxidation state may prove key to effective reactions.

Perhaps at odds with a notion of general toxicity is the fact that several essential proteins in organisms utilize copper, leading in some cases to a relatively high cellular content83. For example, in yeast, estimates of over 10⁵ atoms per cell have been made with a relatively low associated toxicity as judged by MIC50 of >0.7 mM84. Lack of toxicity is attributed to a highly conserved biological system for maintaining Cu(I) bound to a series of carriers, preventing the release of free copper ions that could generate reactive oxygen species. Therefore, the use of ligands such as THPTA83, BTTES85 and histidine82, which generate Cu(I) complexes that maintain and stabilize the metal oxidation state while also negating the potential toxicity of exogenous reductants, seems a logical approach. Indeed, these have allowed the labelling of living systems including the labelling of glycans in developing zebrafish embryos85. Moreover, an alternative approach has recently been reported by Uttamapinant et al. Rather than reducing the toxicity of the catalyst, they used chelating azides, leading to a significant reduction in the required metal loading; this too allowed cell-compatible labelling86. Despite these developments, the use of CuAAC for intracellular modification in live cells is still to be reported and it is worth considering that eukaryotic cells are likely to offer additional unique challenges, which may be highly dependent on cell type.

Despite possible cellular limitations, CuAAC remains invaluable for in vitro protein modification, due to its high specificity, reasonably fast reaction rate and ease-of-use. This has led to a wide range of CuAAC reagents now being commercially available for bioconjugation. Indeed, the CuAAC can be performed site-selectively with complete conversion34 and has been used in many significant applications, such as the generation of PEGylated proteins87, the generation of dual PTM glycoprotein mimics due to its orthogonality to existing cysteine chemistry34,35, cellular proteomic analysis (BONCAT)80, a quantitative method for primary cell proteomics (QuaNCAT)88, and the construction of highly-valent protein nanoparticles89. Despite this compatibility, the perceived toxicity of copper has led to the exploration of alternative cycloaddition-type reactions.

The group of Bertozzi90 has removed copper from the equation entirely, first reporting a strain-promoted azide–alkyne cycloaddition (SPAAC) in 2004. Building on work by Wittig and Krebs91 in the 1960s, they found that highly strained cyclooctynes reacted rapidly at room temperature with azide-‘tagged’ glycoproteins in a reaction requiring no exogenous ligands or catalysts. No toxicity was observed during the reaction on the surface of mammalian cells. In its original format, the SPAAC reaction displayed similar, relatively slow, kinetics to the Staudinger ligation (Fig. 4a)67. To improve the rate, both difluorinated cyclooctynes92 (DIFO) and dibenzocyclooctynes93 were independently reported, allowing the visualisation of dynamic processes. In a particularly striking example, Laughlin et al.94 utilized DIFOs to visualize the development of glycans during zebrafish embryo growth, demonstrating a high degree of specificity and ‘bio-orthogonality’ at slightly faster rates than previously reported using the Staudinger ligation. Further enhancements in rates have been reported through the generation of biarylazacyclooctynones95 and cyclopropyl-fused bicyclononynes96 (Fig. 4a). The site-specific incorporation of cyclooctynes97 and bicyclononynes98 into proteins by amber-stop codon suppression has also recently been reported. However, limitations remain, such as occasional difficulty in the synthesis and handling of strained and unstable compounds, while crucially a degree of incompatibility towards cysteine has been reported99,100. In addition, these reactions remain relatively slow (Fig. 4a). As a general comment, many reported rate constants used to compare protein reactions have typically been calculated under conditions that often vary significantly, and most in fact not even on proteins but on small-molecule models; thus, direct comparison of rates should be undertaken with some caution and different derivatives may be more applicable to certain situations than others.

Inspired by the development of SPAAC reactions, the groups of Fox101 and Hilderbrand102 began to investigate the use of inverse-electron-demand Diels–Alder (IEDDA) reactions as a method for bioconjugation. It was found that the strained dienophiles trans-cyclooctene101 and norbornene102 react relatively rapidly with suitable tetrazine dienes (with nitrogen released irreversibly in a subsequent retro-[4+2]-cycloaddition), allowing protein labelling at rates up to 1,000 times faster than SPAAC in the case of trans-cyclooctene (Fig. 4b). Inspired by the work of Dommerholt et al.96, it was found that trans-bicyclononene reacted at yet faster rates (while noting the caveats on rate determinations given above)103. In addition to allowing labelling of highly dynamic processes, such rapid and efficient reactions allow the concentrations of reactive partners to be lowered significantly, reducing background labelling particularly in cases where it is implausible to wash away excess reagent, such as intracellularly or in animal models104.
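The link between rate constant and usable reagent concentration can be sketched with pseudo-first-order kinetics (labelling reagent in large excess over protein). The rate constants below are hypothetical placeholders chosen only to reflect a 1,000-fold difference in rate; they are not measured values, and the caveats on comparing reported rate constants still apply.

```python
import math


def time_to_conversion(k2: float, reagent_conc: float, conversion: float = 0.95) -> float:
    """Seconds needed to label a given fraction of protein.

    Pseudo-first-order approximation (reagent in large excess):
    [protein](t) = [protein](0) * exp(-k2 * reagent_conc * t),
    with k2 in M^-1 s^-1 and reagent_conc in M.
    """
    if not 0.0 < conversion < 1.0:
        raise ValueError("conversion must be between 0 and 1 (exclusive)")
    return -math.log(1.0 - conversion) / (k2 * reagent_conc)


def reagent_needed(k2: float, time_s: float, conversion: float = 0.95) -> float:
    """Reagent concentration (M) needed to reach the conversion within time_s."""
    if not 0.0 < conversion < 1.0:
        raise ValueError("conversion must be between 0 and 1 (exclusive)")
    return -math.log(1.0 - conversion) / (k2 * time_s)


# Hypothetical second-order rate constants (M^-1 s^-1), for illustration only.
K2_SLOW = 1.0     # placeholder for a slower strain-promoted reaction
K2_FAST = 1000.0  # placeholder for a reaction 1,000 times faster

one_hour = 3600.0
conc_slow = reagent_needed(K2_SLOW, one_hour)
conc_fast = reagent_needed(K2_FAST, one_hour)
print(f"reagent for 95% labelling in 1 h: slow {conc_slow * 1e6:.1f} uM, "
      f"fast {conc_fast * 1e6:.3f} uM")
```

In this model the required reagent concentration scales inversely with the rate constant, so a 1,000-fold faster reaction needs 1,000-fold less reagent for the same labelling time. The approximation only holds while the reagent remains in clear excess over the protein.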

To generally enable IEDDA protein reactions, key reactive UAAs (tetrazine105, norbornene106,107,108, cyclooctene98,107 and bicyclononene98) have been incorporated into proteins by amber-stop codon suppression (see Box 3). Such strain-promoted cycloadditions offer intriguing and exciting possibilities for future protein labelling where the speed of labelling is vital, and recent developments suggest that the cyclooctyne-azide and cyclooctene-tetrazine reactions may have a degree of mutual compatibility, thereby allowing multi-site labelling109,110. Some limitations remain, such as the isomerisation of trans-cyclooctenes in the presence of thiols98 (cf. reaction of thiols with cyclooctynes noted above) and the potential instability of tetrazines110. Also, in many variants of these SPAAC and IEDDA reactions, mixtures of regioisomers are formed (unlike the CuAAC, which is highly 1,4-selective) and in some cases bulky linkages may prevent effective syntheses of functional structures (as opposed to those that have simply been labelled) such as PTM mimics, which can often be quite small and structurally subtle.

An intriguing alternative approach to cycloadditions on proteins has been reported in a series of papers by Lin. Some tetrazoles can act as latent sources of nitrile imines, which can undergo [3+2]-cycloadditions with unactivated alkenes (Fig. 4b)111. Their generation requires irradiation with ultraviolet light (this is termed a ‘photo-click’). The reaction has been used to modify a number of alkenyl-UAAs site specifically, such as homoallylglycine112 and cyclopropenes113, while the genetic incorporation of tetrazoles has also been achieved as a reactive handle for undertaking ‘photo-click’ reactions114. While the rates of reaction are now approaching levels seen for norbornene–tetrazine conjugations, the reaction is still somewhat slower than cyclooctene–tetrazine reactions. Nonetheless, the ability to spatially and temporally control the reaction through light makes the ‘photo-click’ an attractive alternative for the site-specific labelling of proteins113.

Metal-mediated Cprotein–C or Cprotein=C bonds

Transition metal (TM) catalysis has revolutionized organic synthesis with the ability to tune reactivity by careful choice of metal, ligand and reaction conditions, allowing the generation of previously inaccessible carbon and heteroatom-containing scaffolds. Many of the factors that make such reactions appealing to the synthetic chemist also make them attractive for protein modification115. Such reactions are often associated with excellent functional group tolerance and high yields under mild conditions, while the reactive handles utilized are often inert in biological systems. However, restrictions such as a need to proceed efficiently at low protein loadings, solely in aqueous media, and to tolerate potential nonspecific binding to the multitude of possible Lewis-basic residues on protein surfaces, until recently, hindered the use of such reactions for site-specific protein modification.

Arguably, the most widely used TM-catalysed reactions in organic synthesis are the series of palladium-catalysed sp2–sp2 coupling reactions between aryl/alkenyl halides and a variety of coupling partners such as boronic acids (Suzuki–Miyaura), alkenes (Mizoroki–Heck) and alkynes (Sonogashira). Early examples of couplings on short synthetic peptides required high temperatures, or the presence of organic solvents, yet demonstrated tolerance of Pd towards some amino-acid functional groups116,117. Despite the UAA p-iodophenylalanine having been incorporated into proteins by amber-stop codon suppression and proposed as a Pd coupling partner as early as 2002 (ref. 118), it was not until 2006 that this was partially realized by Kodama et al. Both Heck and Sonogashira reactions were demonstrated, albeit in low yields (2% Heck, 25% Sonogashira), representing early examples of Pd-catalysed couplings on polypeptidic substrates119,120. Brustad et al.121 then went on to demonstrate that p-boronophenylalanine could be used to undertake Suzuki couplings, although again low yields (30%) and high temperatures (70 °C) that caused protein denaturation limited the usefulness of the reaction. It was not until 2009 that Chalker et al.122 demonstrated the first efficient Pd-mediated reaction on a protein, through the discovery of a water-and-air-stable ligand, 2-amino-4,6-dihydroxypyrimidine (ADHP, L1), for undertaking Suzuki–Miyaura cross-couplings at 37 °C in water at pH 8 (Fig. 5a). This allowed a variety of boronic acids to be coupled to a model cysteine-linked aryl iodide, with the benefit that even hydrophobic moieties could be transferred to the protein surface due to the water-solubilizing effect of the boronate group.
To generalize this reaction to genetically incorporated amino acids, Spicer and Davis123 (and later Liu et al.124) demonstrated that through amber-stop codon suppression, p-iodophenylalanine could be used as a reactive handle for protein Suzuki–Miyaura cross-coupling. During this work, previously hypothesized weak, nonspecific binding of TMs to Lewis-basic amino acids was encountered under some conditions, leading to ambiguity in reaction analysis; this was circumvented by the identification of a suitable palladium scavenger. The group of Davis has since gone on to demonstrate that the Suzuki–Miyaura reaction is applicable to couplings on the cell surface of E. coli, demonstrating a negligible catalyst toxicity125, and that the coupling of carbohydrate–boronic acids to cell surfaces can be used to mimic glycoproteins in a cellular synthetic glycocalyx125,126.

Figure 5: TM-mediated protein chemistry.

Use of (a) Suzuki122,123 and Sonogashira couplings127, and (b) olefin metathesis131,135 for protein modification.

Although ADHP is an efficient catalyst for undertaking Suzuki–Miyaura cross-couplings, it can be less effective for other Pd-catalysed reactions. Li et al.127 have since shown that by simple methylation of the ligand, the same catalytic system could be used to promote Sonogashira reactions (L2 in Fig. 5a), while the minimal motif guanidine-based ligands (L3 and L4 in Fig. 5a) can significantly enhance the rate of Suzuki reactions relative to ADHP128 and allow efficient couplings even at low stoichiometries and concentrations suitable when labelling with scarce reagents129. It was also shown that PEG chains offer new reactivity and a significant enhancement in rate as self-liganding (internally chelating) boronic acids when used with Pd(OAc)2 to give a high-yielding site-specific PEGylation of proteins128. This self-liganded effect has more recently been exploited by Li et al.130, who found that PEG-conjugated fluorophores with Pd(NO3)2 could efficiently catalyse the Sonogashira cross-coupling, even intracellularly, in E. coli and Shigella.

Olefin metathesis has also found recent application in the site-selective modification of proteins, due to the discovery by Lin et al.131 that allyl sulfides are privileged substrates for undertaking aqueous cross-metathesis with Hoveyda–Grubbs II catalyst, via a proposed sulfur-relayed mechanism (Fig. 5b). The subsequent use of a variety of olefinic amino-acid side chains containing allylic heteroatoms suggested a breadth for this allylic chalcogen effect132. This allowed the installation of a number of olefin substrates including PEG and allyl glycosides at an S-allyl cysteine residue, introduced into proteins via a number of chemical routes133. Determination of sensitivity to accessibility, self-metathesis and reagent reactivity has delineated predictive rules for this protein reaction134. Moreover, tuning of heteroatom (S→Se) in Se-allylselenocysteine led to a significant increase in reaction rate and expanded substrate scope135; this was applied to a chemically controlled ‘write-read-erase’ histone protein modification cycle.

A further example of intriguing TM catalysis was first reported by Antos and Francis136, utilizing rhodium-generated carbenoids formed from diazo reagents for modification of tryptophan residues. While this reaction initially required quite harsh acidic conditions (pH 3), it was subsequently found that this was primarily to denature early protein substrates and hence to expose the reactive tryptophan residue; conjugations at pH 6 are now possible137. Recent reports by the group of Ball138,139 have used rhodium-bound metallopeptides to catalyse modification of tryptophan by using a structure-directed approach. Despite this elegantly designed rate enhancement, the need for a highly specific interaction to direct the reaction will likely limit its general applicability. However, it represents an impressive example of molecular recognition to override inherent reactivity.

Given the strength and ubiquity of carbon–carbon bonds, there will be continuing utility in their formation, despite the potential of heteroatom–carbon bond-forming chemistry. We have focused in this section on metal-mediated processes since, thus far, they have dominated current strategies, yet it should be noted that other possible strategies exist that exploit non-metal-mediated processes, such as aldol140 or Wittig chemistry141, the use of (formal) cycloaddition chemistry or as the result of a relay from a prior bond-forming event (for example, Pictet–Spengler).

Manipulating carbonyls to Cprotein=N bonds

Despite being widespread throughout nature, the carbonyl groups of aldehydes and ketones are almost entirely absent from native proteins142. Yet, the diversity of unique chemistry that they can undergo in the presence of the natural functional groups of proteins makes them an attractive handle for protein modification. They have found particular use in the reaction of hydrazines and hydroxylamines to form hydrazones and oximes, respectively, under acidic conditions (in part due to reagent availability and ease of use). These reactions can be accelerated by nucleophilic catalysts such as aniline143.

To install aldehydes and ketones into proteins, a number of methods have been identified. Among the earliest was the discovery that periodate oxidation (cleavage) of N-terminal Ser/Thr residues led to a terminal aldehyde, which could then react selectively with a fluorescent hydrazine to allow site-specific protein tagging144. The group of Francis145 has utilized a biomimetic PLP-mediated transamination to generate N-terminal ketones. Investigation of the reaction conditions indicates that a range of amino acids are tolerated by this reaction146, allowing the selective modification of antibodies147 and filamentous phage148.

The genetic incorporation of a ketone-containing amino acid by amber-stop codon suppression was first reported by Cornish et al.149 via chemical acylation of a suppressor tRNA. This residue, once installed, again reacted selectively with a range of fluorescent hydrazines. The ketone amino acids p-acetylphenylalanine142 and m-acetylphenylalanine150 were subsequently incorporated into proteins, without the need for chemical acylation, in both E. coli142,150 and eukaryotic cells151. More recently, Huang et al.152 reported the incorporation of aliphatic ketone-containing amino acids that showed improved reaction kinetics, while diketone-containing amino acids have been reported to give increased stability in oxime products153. Alternatively, Carrico et al.154 have exploited a six-residue sequence tag that directs a natural formylglycine-generating enzyme, in both prokaryotic and eukaryotic cells, to convert a cysteine within the tag into an aldehyde-bearing formylglycine residue.

The subsequent formation of hydrazones and oximes from these carbonyls has found widespread use in the conjugation of functional handles and probes to proteins. For example, hydroxylamines have been used to generate glycoprotein mimics155, to label G-protein-coupled receptors156, antibodies157 and therapeutic proteins for improved pharmacokinetics6, and for dual protein tagging and formation of bifunctional antibodies158,159. However, despite their widespread use, the reactions of aldehydes and ketones suffer from a number of key drawbacks. Most importantly, due to the presence of a range of carbonyl-containing substrates in cells, this chemistry is not necessarily suitable for in vivo applications. In addition, the hydrazone and oxime linkages are inherently unstable, leading to hydrolysis over the course of hours, particularly under acidic conditions, although hydroxylamines are reported to generate more stable linkages160. To avoid such instability, Sasaki et al.161 recently reported the use of a modified Pictet–Spengler reaction for aldehyde modification (which ultimately leads to the subsequent creation of C–C and C–N bonds), with Agarwal et al.162 subsequently reporting greatly improved reaction kinetics. Although the fundamental limitation of a lack of bioapplicability still remains, the facile use of aldehyde/ketone chemistry remains an attractive tool for in vitro modifications.

Cprotein–S/Se bonds

The UAA dehydroalanine (Dha) can be used as a Michael acceptor and has found extensive use in protein modification, reacting rapidly with sulfur nucleophiles to generate alkyl cysteine analogues, offering an electrophilic alternative to nucleophilic reaction of cysteine3. This is particularly useful in examples where use of electrophilic alkylation of Cys is difficult to control or where appropriate electrophiles cannot be generated, and leads to a greater level of selectivity. Dha can be accessed via a number of routes: elimination of active-site serines, the oxidative elimination of unnatural selenocysteine amino acids163,164 or through the milder oxidative elimination of cysteine with sulfonylhydroxylamine reagents165. All can prove efficient, but occasional side reactions in each recently prompted the development of a bis-alkylation method: cyclic sulfoniums can be eliminated to form Dha under strikingly mild conditions166.

The addition of functionalized thiols to Dha takes place rapidly and selectively under mild conditions. This reaction has been used to install a number of thioether mimics of natural protein modifications such as lipidation, glycosylation, phosphorylation and lysine methylation/acetylation164,165,167,168, as well as installing reactive handles for further modification such as S-allyl cysteine for olefin metathesis131. These reactions typically (but not always) proceed with low substrate control in their diastereoselection and so a mixture of D/L-epimers is produced at the site of modification. Dha also provides a viable method for chemically creating selenocysteines in proteins135. Although only shown for a single example (Se-allyl cysteine), the discovery of conditions for creating suitable Se nucleophiles for this addition may enable broader methods.

While the olefin in Dha displays unique conjugate electrophile reactivity, isolated olefins in, for example, homoallylglycine can serve as useful UAAs for modifications using radical chemistry (see Box 1)169.

Outlook and future directions

The introduction of site-directed gene mutagenesis as a powerful method for altering protein structure at a genetic level revolutionized the study and application of proteins. The ability to switch between natural amino acids at virtually any desired residue site in a recombinant protein has allowed unparalleled progress in the manipulation of proteins for scientific discovery. Yet, this ability to access and alter functionality is limited by the 20 typical proteinogenic amino acids and a limited palette of chemical functional groups. As such, there is a powerful need for chemical modification of proteins and the installation of non-natural functionality as a strategy for more free-ranging protein synthesis or design. New methods should aspire to the widespread success and applicability of gene mutagenesis as a tool in the biological sciences12.

Over the past 15 years, the field of chemical protein modification has been dramatically revitalized, from one that focused on the use of natural cysteine and lysine residues to one that now utilizes a wide range of chemical handles, coupling partners and conditions, many of which are mutually compatible (‘orthogonal’) not just with (to) each other, thereby allowing multiple modifications to be undertaken, but also with (to) living systems, allowing them to be utilized in vivo170.

The use of such reactions is only beginning to be exploited as improvements in selectivity, kinetics, compatibility and ease of use are made. The potential applications of modified proteins are virtually limitless, whether it be for the in vivo tracking of dynamic processes, the conjugation of therapeutic agents, the elucidation of biosynthetic/metabolic pathways or the use of modified protein-based materials with novel functionality and structure. The development of these reactions has been reviewed here from the viewpoint of applicable chemistry, rather than the biological uses of modified proteins, and it is likely that with an expanding toolkit of chemical reactions for installing a range of modifications, chemists and biologists will discover exciting applications as yet unexplored. This could truly become an unlimited form of Synthetic Biology.

Since such protein methods have become a ‘hot topic’ over the past decade, it has been easy to forget some critical principles in developing chemical reactions for modifying proteins. Sadly, increasing numbers of reports are now simply undertaking reactions ‘because one can’ with little or no regard for improving (either strategically or functionally) on the plethora of reactions already available. We see a particular need to develop chemical reactions that allow: the selective installation of a desired functionality in a manner that allows the mimicry of a natural modification, the rapid labelling of a biologically relevant site or new in vivo reactivity. We see less value in the discovery of those reactions that may not have been performed on a protein previously but in fact offer no benefit compared with the existing ‘toolkit’. Put more succinctly, there is not necessarily a need for new chemistry for modifying proteins; there is a need for better chemistry for modifying proteins. It is important to remember that proteins should not be seen merely as a substrate for undertaking a reaction. Rather, this chemistry should be increasingly seen as a method for testing a hypothesis, developing the technology or creating a functional probe.

To this extent, a number of key challenges can be envisaged that must be addressed in future developments in the field. While the reactions discussed here represent useful discoveries that will undoubtedly make a large impact on protein science, they are not without their limitations. A reaction that combines the ‘selling points’ of each must be seen as highly desirable: the mimicry and minimal linker afforded by cysteine, the ease of modification at lysine, the bio-orthogonality of the Staudinger ligation, the unparalleled speed of cycloadditions, the tunability of transition metal-mediated reactions, the potential reversibility and switching of carbonyl chemistry. A reaction that combines these favourable characteristics may represent an ideological end goal in the development of new chemistries. When judged by these criteria of utility, protein chemistry is still in its infancy. As it matures, it will likely revolutionize the molecular analysis of Biology.

Additional information

How to cite this article: Spicer, C. D. and Davis, B. G. Selective chemical protein modification. Nat. Commun. 5:4740 doi: 10.1038/ncomms5740 (2014).