Polypeptide elongation proceeds by the sequential addition of single amino acids to the growing polypeptide chain. Each amino acid is brought to the ribosome by a molecular complex with three constituents: aminoacyl-tRNA (aa-tRNA), elongation factor Tu (EF-Tu), and GTP. This so-called ternary complex bears the correct (cognate) anticodon for the mRNA codon in the ribosomal A site (Figure 1).
There are three general steps to the elongation cycle: tRNA selection, peptidyl transfer, and translocation. Accommodation is the movement of the amino acid portion of the aa-tRNA in the A site closer to the peptidyl tRNA in the P site for peptidyl transfer to occur [ 1 ]. The elongation cycle continues as the codon in the newly vacant ribosomal A site awaits the next tRNA arrival.
Interestingly, the ribosomal A site is likely seldom vacant and is instead sampled by cognate, near-cognate, and non-cognate tRNAs [ 17 ].
The terms near-cognate and non-cognate have conventionally been assigned to tRNAs which have single or multiple base mismatches with a given codon, respectively. However, Plant et al. have argued that a functional definition, namely the ability to form a minihelix with the codon in the ribosomal A site, better distinguishes a near-cognate from a non-cognate tRNA [ 18 ]. It is important to note that, because peptidyl transfer and translocation occur much faster, tRNA selection appears to be the rate-limiting step of ribosomal progression along the mRNA during polypeptide elongation [ 10 , 19 , 20 ].
Independently, two groups have observed large rate differences between the steps of polypeptide elongation by performing high-resolution kinetic studies of the bacterial ribosome in vitro. They determined that the rate of ternary complex GTPase activation in response to codon recognition is the rate-limiting step of peptidyl transfer.
They found that GTP hydrolysis by the cognate ternary complex occurs fold [ 16 ] or approximately fold [ 21 ] faster than by the near-cognate complex (one base mismatch in the 1st codon position in these studies).
The other measurable rates were similar between cognate and near-cognate tRNAs, with the exception of a faster dissociation of the near-cognate tRNA during codon recognition [ 16 ]. Modeling of these kinetic data agrees with a competition for the A site, whereby the binding and rejection of a number of near-cognate tRNAs, prior to the binding and accommodation of the cognate tRNA, delays translation [ 17 , 22 ]. The faster rate of cognate anticodon recognition, combined with the rapid rejection of the near-cognate anticodon, emphasizes the role of tRNA selection in determining the rate of polypeptide elongation.
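The competition for the A site described above can be illustrated with a toy calculation. This is only a sketch under stated assumptions, not the kinetic model used in [ 17 , 22 ]: it assumes that each arriving ternary complex is cognate with a fixed probability, independently of earlier arrivals, that every rejected arrival costs the same time, and that the accepted cognate costs a fixed time. The function name and all parameters are hypothetical.

```python
def expected_decoding_time(cognate_fraction, t_reject, t_accept):
    """Mean time to decode one codon in a toy A-site competition model.

    Assumptions (illustrative only, not taken from the cited studies):
      - each arrival is cognate with probability `cognate_fraction`,
        independently of all previous arrivals;
      - each rejected (near- or non-cognate) arrival costs `t_reject`;
      - the accepted cognate arrival costs `t_accept`.
    """
    if not 0 < cognate_fraction <= 1:
        raise ValueError("cognate_fraction must be in (0, 1]")
    # Mean number of failures before the first success of a geometric trial.
    mean_rejections = (1 - cognate_fraction) / cognate_fraction
    return mean_rejections * t_reject + t_accept


if __name__ == "__main__":
    # Time units are arbitrary; only the trend matters in this sketch.
    for fraction in (0.5, 0.25, 0.125):
        print(fraction, expected_decoding_time(fraction, t_reject=1.0, t_accept=1.0))
```

Under these assumptions, the mean decoding time grows as the cognate share of arrivals shrinks, which is the qualitative point of the competition argument.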
Since the binding of the aa-tRNA-containing ternary complex to the ribosome is essentially a binding reaction, the concentration of the cognate tRNA for a particular codon should influence the rate at which the ribosome translates that codon.
This has indeed been shown by examining the correlation between codon translation rates and cognate tRNA concentrations [ 10 ]. Increasing the concentration of tRNA(Trp) four-fold by overexpression results in a three-fold increase in the translation rate of the corresponding codon, UGG [ 8 ] (tryptophan is one of only two amino acids which are encoded by a single codon). Most codons can be read by more than one isoacceptor tRNA due to wobble pairing between the third position of the codon and the first position of the anticodon [ 4 ].
Conversely, a single tRNA anticodon can decode various synonymous codons, and these can vary in translation rates. Similar to GAA and GAG, other in vivo measured translation rates of synonymous codons read by identical aa-tRNAs show that those with Watson-Crick pairing in the wobble position are translated faster than those with wobble pairing in every instance [ 8 , 9 ].
When more than one codon is translated by a single tRNA, the only difference is the nature of the base pairing and base stacking between the third codon position and the first anticodon position. The different rates observed clearly demonstrate that base pairing in the wobble position, in addition to tRNA concentration, determines codon translation rate. Recent ribosomal profiling has solidly corroborated this effect on in vivo rates in C. This result agrees well with what has been reported previously in E.
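The distinction between Watson-Crick and wobble pairing at the third codon position can be made concrete with a short sketch. It is a simplification with labeled assumptions: only the four standard RNA bases are considered, modified anticodon nucleosides are ignored, and the unmodified anticodon UUC used in the example is purely illustrative. The function and constant names are hypothetical.

```python
# Standard Watson-Crick RNA pairs and the G-U wobble pair, written as
# (codon base, anticodon base).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
GU_WOBBLE = {("G", "U"), ("U", "G")}


def wobble_position_pairing(codon, anticodon):
    """Classify the pairing between codon base 3 and anticodon base 1.

    Both sequences are RNA written 5'->3'. Because the strands pair
    antiparallel, the third codon base faces the first anticodon base.
    Modified nucleosides are NOT modeled (simplifying assumption).
    """
    if len(codon) != 3 or len(anticodon) != 3:
        raise ValueError("codon and anticodon must each have three bases")
    pair = (codon[2], anticodon[0])
    if pair in WATSON_CRICK:
        return "Watson-Crick"
    if pair in GU_WOBBLE:
        return "wobble"
    return "other"


if __name__ == "__main__":
    # Two synonymous glutamate codons read by the same illustrative anticodon.
    for codon in ("GAA", "GAG"):
        print(codon, wobble_position_pairing(codon, "UUC"))
```

With this illustrative anticodon, GAA is classified as Watson-Crick and GAG as wobble at the third position, so the two synonymous codons differ only in that one interaction.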
What might be the advantages that organisms derive from being capable of modulating their translation elongation rates? It is well known that the process of translation is not absolutely accurate [ 24 ]. Yet, various mutations in the bacterial translational apparatus can result in so-called hyperaccurate protein synthesis, where significantly fewer mistakes are made during translation [ 24 ]. However, these mutations result in considerably slower rates of polypeptide elongation.
In other words, in these mutants, accuracy is achieved at the expense of speed. Thus, it can be concluded that wild type polypeptide elongation rates are a compromise between accuracy and velocity. In circumstances where nutrient availability is limited and growth is restricted, the cell might need to decrease the production of proteins, yet ensure that those that are synthesized are relatively error free. In opposite circumstances, cells might take advantage of ample nutrients and not be gravely affected by amino acid misincorporation, as errors would be diluted as cells grow and divide.
As discussed in the above section, it is likely that polypeptide elongation rates depend both on the nature of the anticodon-codon interaction and on actual aa-tRNA concentrations. The concentrations of tRNA molecules have been experimentally determined for several organisms and cell types, although these measurements do not distinguish between charged and uncharged tRNAs.
Regardless, the concentration of particular sets of tRNAs has been shown to correlate relatively well with corresponding tRNA gene numbers. For example, in E. In the eukaryote S. Additionally, it is known that there exists some variation in expression of tRNA as a function of growth conditions in both bacteria [ 28 ] and unicellular eukaryotes [ 29 ].
Regardless of these caveats, tRNA gene number has been largely accepted as a means to estimate relative aa-tRNA concentrations in multiple organisms. It is important to note that correlations have indeed been found between tRNA gene number and the nonrandom use of synonymous codons in highly expressed genes in several unicellular organisms.
This has led to the hypothesis that in organisms whose growth rates are largely dependent on the overall rate of protein production, the translation process has been accelerated, and thus optimized, by evolving codon usage in highly expressed genes to match the most abundant tRNAs [ 11 ].
In other words, evolving highly expressed genes to largely contain codons read by abundant tRNA would increase the rate of essential protein production and thus increase growth rates in these organisms.
Genes with low expression in these organisms, such as those encoding regulatory proteins, were found to be encoded by less biased usage of optimal and non-optimal codons. These results have led to the generalized assumption that frequently used codons are translated fast, and infrequently used codons are translated slowly across organisms, even though the inverse has been shown to occur for some codons [ 8 ].
For example, highest codon usage frequency and highest tRNA gene number agree only in 11 codons in human and 6 codons in E. Furthermore, in most organisms, there are examples in which the most frequently used codon for a particular amino acid across the genome has zero Watson-Crick-decoding tRNA genes and thus must rely on a tRNA that decodes via non-Watson-Crick interactions, which, as mentioned above, is generally slower.
Furthermore, there are several instances where there are vastly more tRNA genes for a particular codon, but the frequency with which that codon is used is only slightly higher (for example, the codons for Asn in humans, Figure 2). The original studies derived codon frequencies from only highly expressed genes, whereas modern databases, such as the one utilized to generate Figure 2, tabulate frequencies based on the total appearance of codons across entire genomes.
There would undoubtedly be more agreement between high tRNA abundance and high usage frequency for E.

Differences in tRNA gene content across organisms. Codons boxed in blue denote tRNA genes often absent in bacteria and eukaryotes, while codons boxed in green denote genes mostly absent only in bacteria. Actual tRNA gene numbers and codon usage frequencies for humans and E. Numbers in red denote the most frequent codons for which there is no cognate tRNA gene in each organism. Data were obtained from [ 7 ].

The correlation between tRNA abundance and codon usage is maintained for the previously discussed glutamate codons of E. However, in the same study, one frequent codon, CCG (Pro), and one rare codon, CGA (Arg), were translated in vivo at very similarly slow rates.
This is likely due to the low availability of tRNAs to decode these codons (there are 1 and 0 cognate tRNA genes corresponding to these codons, respectively; Figure 2). These findings and others of the time [ 11 , 30 , 31 ] cultivated an increased emphasis on biased codon usage frequencies in translation speed and evolution studies.
Absolute codon frequency is the number of times a given codon is present in a given gene, set of genes, or an entire genome [ 33 ]. An important caveat of this method is that individual amino acids are not equally present in the coding sequences and may introduce an amino acid-related bias in the observed codon usage frequency patterns.
In order to represent codon usage bias independently of amino acid bias, relative frequencies can be calculated. Relative codon frequency is the ratio that results from dividing the absolute codon frequency of a particular codon by the sum of the absolute codon frequencies of all codons in a synonymous block [ 32 ].
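The two definitions above can be sketched in a few lines. This is a minimal illustration, assuming an in-frame DNA sequence made up only of A, C, G and T and the standard genetic code; the function names are hypothetical and do not come from the cited work.

```python
from collections import Counter
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering;
# '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(bases): aa for bases, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def absolute_codon_frequencies(orf):
    """Count how many times each codon occurs in an in-frame sequence."""
    usable = len(orf) - len(orf) % 3  # ignore a trailing partial codon
    return Counter(orf[i:i + 3] for i in range(0, usable, 3))


def relative_codon_frequencies(orf):
    """Divide each codon count by the total count of its synonymous block."""
    counts = absolute_codon_frequencies(orf)
    block_totals = Counter()
    for codon, n in counts.items():
        block_totals[CODON_TABLE[codon]] += n
    return {codon: n / block_totals[CODON_TABLE[codon]] for codon, n in counts.items()}


if __name__ == "__main__":
    example = "GGTGGTGGCGAA"  # Gly, Gly, Gly, Glu (made-up sequence)
    print(absolute_codon_frequencies(example))
    print(relative_codon_frequencies(example))
```

Because each count is divided by the total for its own amino acid, an amino acid that is common in the sequence no longer inflates the values of its codons, which is the amino acid bias the relative measure is meant to remove.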
Another codon usage metric, Relative Synonymous Codon Usage (RSCU) [ 35 ], takes the calculation one step further by normalizing equal codon usage frequencies within a synonymous block to 1. As stated above, highly expressed genes in bacteria and unicellular eukaryotes tend to be encoded by frequent codons.
However, there is no evidence for such bias in the highly expressed genes of vertebrates [ 11 , 14 ]. Interestingly, in C. Therefore, the adequacy of codon bias for relative translation rate predictions is limited to highly expressed genes in some unicellular and simple multicellular organisms.
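The RSCU metric described above can be sketched as the observed count of a codon divided by the count expected if all codons in its synonymous block were used equally, so that equal usage gives a value of 1. This is a minimal illustration, assuming the standard genetic code and an in-frame DNA sequence of A, C, G and T; the function name is hypothetical, and amino acids absent from the sequence are simply skipped.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering;
# '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(bases): aa for bases, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group codons into synonymous blocks, one block per amino acid (or stop).
SYNONYMOUS_BLOCKS = defaultdict(list)
for _codon, _aa in CODON_TABLE.items():
    SYNONYMOUS_BLOCKS[_aa].append(_codon)


def rscu(orf):
    """Relative synonymous codon usage for every codon of each block present.

    RSCU = observed count * (block size) / (total count of the block),
    so a block used uniformly yields 1.0 for each of its codons.
    """
    usable = len(orf) - len(orf) % 3  # ignore a trailing partial codon
    counts = Counter(orf[i:i + 3] for i in range(0, usable, 3))
    values = {}
    for codons in SYNONYMOUS_BLOCKS.values():
        block_total = sum(counts[c] for c in codons)
        if block_total == 0:
            continue  # amino acid not present in this sequence
        for c in codons:
            values[c] = counts[c] * len(codons) / block_total
    return values


if __name__ == "__main__":
    print(rscu("GGTGGTGGCGGA"))  # four glycine codons, made-up sequence
```

Values above 1 mark codons used more often than expected under equal usage, and values below 1 mark codons used less often.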
To become biologically active, the great majority of proteins must fold into precise three-dimensional conformations. Invaluable insights regarding how protein chains acquire their so-called native states have come from in vitro refolding experiments [ 36 ] and computational biology approaches [ 37 ]. These studies have demonstrated that the amino acid sequence of a protein encodes in its entirety the necessary information to attain its native state.
The concept of codons was first described by Francis Crick and his colleagues in During the same year, Marshall Nirenberg and Heinrich Matthaei performed experiments that began deciphering the genetic code. Following this discovery, Nirenberg, Philip Leder, and Gobind Khorana identified the rest of the genetic code and fully described each three-letter codon and its corresponding amino acid.
There are 64 possible three-letter nucleotide sequences that can be made from the four nucleotides. Of these 64 codons, 61 represent amino acids, and three are stop signals. Although each codon is specific for only one amino acid (or one stop signal), the genetic code is described as degenerate, or redundant, because a single amino acid may be coded for by more than one codon.
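The counts quoted above can be checked by enumerating the standard genetic code directly. The sketch below assumes the standard code written with DNA letters in the conventional T, C, A, G ordering; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

# Standard genetic code; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(bases): aa for bases, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Codons that terminate translation and codons that specify an amino acid.
STOP_CODONS = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
SENSE_CODONS = [c for c, aa in CODON_TABLE.items() if aa != "*"]

# Degeneracy: number of codons specifying each amino acid.
DEGENERACY = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

if __name__ == "__main__":
    print("total codons:", len(CODON_TABLE))
    print("sense codons:", len(SENSE_CODONS))
    print("stop codons:", STOP_CODONS)
    print("codons per amino acid:", dict(DEGENERACY))
```

The degeneracy table makes the redundancy explicit: most amino acids are specified by two, four or six codons, while methionine and tryptophan have one each.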
The corresponding wetware of DNA and the cell allows deciphering of independent messages. Layer 1 is the linear sequential prescription of amino acids that defines the protein's primary structure (amino acid sequencing). This is represented as Domain X. Rule 1 (amino acid mapping) is applied to identify all codons for a given amino acid (AA) and is shown in Domain Y.
Layer 2 is the translation pausing (TP) sequence, symbolically representing information necessary to modulate protein folding during the elongation process. Rule 2 defines the various TP commands as a function of Domain Y. This result is represented as Domain Z, the set of TP commands. A given TP requirement results in the final selection of codons for a given protein prescription, shown in Domain A. The superposition of layers 1 and 2 on the DNA strand contributes to the protein synthesis process.
This must occur within the same nucleotide sequence space of the ORF gene element without interfering with each other. The arrangement of nucleotides that code the sequential arrangement of amino acids appears as prescribed data to the embedded algorithm i. The arrangement of nucleotides that define the message of TP appears as another set of formal instructions and controls not mere physical constraints.
Sequencing is arbitrarily selected and rule-based, free from the constraints of initial conditions and law.

Figure 2. DNA codon selection that illustrates the multi-layer requirements. Requirement 1: Amino acid selection to prescribe protein requirement. Requirement 2: Folding requirement in terms of the pausing of the translation process.

In computer science, a thread is the smallest sequence of programmed instructions that an operating system scheduler can manage independently.
Multi-threading is a widespread executing and programming model that allows multiple threads to co-exist within the context of a single process. Multi-threads are able to share the resources of a given process while executing concurrently. We posit that the operation of the ribosome can be viewed as a type of physical multi-core processor in terms of concurrently executing amino acid elongation and pausing control to enable protein folding.
Within the context of the protein synthesis process, we posit that the two independent threads of information co-exist within the same nucleotide sequence because of the redundancy of the genetic code as shown in the previous sections.
Specifically, having multiple codon codes prescribing the same amino acid allows any one of those redundant codonic prescriptions to have an alternate coded meaning which can produce a completely different biofunction. Because of the contingency of the glycine representative codon code, we can assign alternative functions to any of the four codons (thread 2). In conventional computers, multi-threaded code is written using the same machine language. When only a single processor is used, the code is written in nested form, allowing a scheduler to determine when parts of the nested code are to be executed.
Only one nested thread can be executed at a time. Thus, multiple threads are executed in series (scheduled one thread at a time). This permits multi-threaded computation. With multi-cores (multiple CPUs), the code must be written such that the scheduler can parse the code to each CPU for parallel execution. However, our biological system uses the same nucleotide tokens, residing in the same space, that contain multiple meanings. We want to distinguish and emphasize here that we are not talking about nested code, but coincident code, i.
Deductively, these messages cannot use the same language because they occupy the same space and tokens (bases). Different languages or mappings must be used to distinguish the two threads. In order to interpret these co-existing multi-threaded languages, there must be two independent decoding mechanisms (multi-cores) that can read and decipher the linear sequence of codons carried by the mRNA.
Such a mechanism must be synchronized with the same starting point of both messages. This places a further constraint on the control methodologies used to synchronize the start and end points of both independent messages in the DNA.
Having the two messages out of sync with each other would be analogous to an out-of-frame reading error. The way in which the prescriptive information space is compressed and utilized may represent an optimized approach of data compression. This represents a departure from the conventional way multi-threading is done in the computer world.
We now posit a model that explains the ribosome behavior regarding the decomposition of the genetic code presented in the input mRNA. The ribosome functions as a multi-core processing protein synthesis machine. It is multi-core in the sense that it can simultaneously process two threads of encoded information independently, as discussed in the Multi-thread section above. Core A is defined as the machinery governed by the rules of codon to amino acid mapping.
These mappings are determined by the tRNA definitions which themselves are governed by rules outside of the ribosome.
These two cores operate independently of each other, meaning that there is no communication or feedback between these processes i. Despite this, the ribosome acts holistically in concert with the protein synthesis process to produce a prescribed nascent protein. Since the two threads work independently and blindly with respect to the nascent protein, this suggests that the information to co-actively synchronize and coherently adjudicate these two threads must originate outside of the ribosome.
The duality in the coding function acts to remove the redundancy in the genetic code when viewed holistically. We now posit a model of how multi-dimensional information, consisting of both translation pausing time data and DNA to amino acid mappings, could be accomplished. The requirements in the example below are assumed to be known a priori. A simple example of the selection process is shown in Figure 3 below for a simple glycine to glycine sequence.
It is required that a glycine amino acid must follow the current glycine amino acid in our simple ORF example. The DNA sequence process would begin by selecting the codon representations for both members of the glycine-glycine pair. Next, the process would filter the codon sets for these two contiguous representations for those codons that meet the timing pause requirement. Filtering through the redundant codons results in a selection of candidate codons, as shown in row four (post codon selection).
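The filtering step for a single glycine to glycine junction can be sketched as follows. The pausing rule used here is entirely hypothetical, because the actual TP rules are not reproduced in this text: the placeholder predicate `causes_pause` treats a codon pair as pausing when the last base of the first codon equals the first base of the second codon. The function names are illustrative.

```python
from itertools import product

# The four synonymous glycine codons of the standard genetic code.
GLYCINE_CODONS = ["GGT", "GGC", "GGA", "GGG"]


def causes_pause(first_codon, second_codon):
    """HYPOTHETICAL placeholder rule, invented for illustration only.

    It is NOT the TP rule set of the source; substitute the real rule here.
    """
    return first_codon[2] == second_codon[0]


def filter_codon_pairs(first_codons, second_codons, pause_required, rule=causes_pause):
    """Keep the codon pairs whose pausing behavior matches the requirement."""
    return [
        (a, b)
        for a, b in product(first_codons, second_codons)
        if rule(a, b) == pause_required
    ]


if __name__ == "__main__":
    candidates = filter_codon_pairs(GLYCINE_CODONS, GLYCINE_CODONS, pause_required=True)
    for pair in candidates:
        print(pair)
```

Whatever the real rule is, the structure is the same: all codon pairs that encode the required amino acids are enumerated, and only those meeting the pause requirement survive as candidates.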
In general there could exist additional filters which could further discriminate the post codon selection.

Figure 3. The top row gives the required partial amino acid series for a given protein (left to right). The second row specifies the standard codon map for each amino acid. The pause requirement row specifies that a translation pause is necessary at the glycine to glycine junction for proper pre-folding. The rules for pausing are invoked to filter the redundant glycine codons. The post codon selection row results from filtering the codon map above for the specified TP condition. The final codon selection row is the final selection from the row immediately above (it may or may not be a function of other filters).
This results in writing the ORF of a hypothetical gene. The question arises as to how codon selection would proceed in a continuous chain of a prescribed amino acid representation for a given protein. Figure 4 illustrates the selection process for such a case. In this hypothetical case, it is required that within the ORF of a hypothetical protein, the following amino acids are to be sequenced with the following requirement: … Glycine1 Glycine2 Arginine3 Glycine4 Serine5 … in consecutive order, as shown in row 1 (the subscript denotes the sequential amino acid position within the ORF).
The codons for each amino acid are assembled and shown in row 2. A requirement exists for a set amount of pausing for each amino acid boundary, defined as [glycine1 glycine2] followed by [glycine2 arginine3] followed by [arginine3 glycine4] and [glycine4 serine5], as shown in row 3.
Filtering the codon map of row 2 yields the subset of codons shown in row 4. Notice that the selection of the codon for glycine2 is dependent on the next boundary condition immediately following the current boundary condition. This dependency establishes an iterative selection process. A codon pair that meets the current boundary condition may not reside in the solution space for the next boundary condition.
In other words, the selection of the hexamer sequence for a given boundary condition must be the prerequisite condition for the next boundary condition. This becomes a constraint in our selection process that must be accounted for. This results in an iterative process to successfully filter this chain of code. Finally, additional filtering effects (undefined here) are allowed for, as shown in row 4, and the final selection of codons is written into the DNA ORF.
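The iterative selection described above can be sketched as a backtracking search: a codon is chosen for each position, and the choice is undone whenever no codon at the next position can satisfy the following boundary condition. The pausing rule is again a hypothetical placeholder (the real TP rules are not reproduced in this text), and the function and variable names are illustrative.

```python
# Synonymous codon sets from the standard genetic code.
GLY = ["GGT", "GGC", "GGA", "GGG"]
ARG = ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"]
SER = ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"]


def causes_pause(first_codon, second_codon):
    """HYPOTHETICAL placeholder rule, invented for illustration only.

    It is NOT the TP rule set of the source; substitute the real rule here.
    """
    return first_codon[2] == second_codon[0]


def select_codons(codon_sets, pause_required, rule=causes_pause):
    """Pick one codon per position so that every boundary condition holds.

    codon_sets[i]     : candidate codons for amino acid i
    pause_required[i] : whether the boundary between positions i and i+1
                        must pause (length is len(codon_sets) - 1)
    Returns the first valid chain found, or None if none exists.
    """
    if len(pause_required) != len(codon_sets) - 1:
        raise ValueError("need exactly one requirement per boundary")

    def extend(chosen):
        position = len(chosen)
        if position == len(codon_sets):
            return chosen
        for codon in codon_sets[position]:
            boundary_ok = (
                position == 0
                or rule(chosen[-1], codon) == pause_required[position - 1]
            )
            if boundary_ok:
                result = extend(chosen + [codon])
                if result is not None:
                    return result
        return None  # dead end: the caller tries its next codon (backtrack)

    return extend([])


if __name__ == "__main__":
    # Glycine1 Glycine2 Arginine3 Glycine4 Serine5, pausing at every boundary.
    chain = select_codons([GLY, GLY, ARG, GLY, SER], [True, True, True, True])
    print(chain)
```

In this example, the first codon tried for glycine2 satisfies the glycine1 to glycine2 boundary under the placeholder rule but leaves no arginine codon that satisfies the next boundary, so the search backs up and tries another glycine2 codon. That is the dependency between consecutive boundary conditions noted in the text.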
Figure 4. The pause requirement row specifies that a translation pause is necessary for the glycine to glycine junction, the glycine to arginine junction, the arginine to glycine junction, and the glycine to serine junction for proper pre-folding.

We posit that the TP effect allows time for upstream mechanisms to control the folding process of the elongating protein. The regulation processes correct for misfolding due to external environmental effects.
Heat stress is a prime example. Examples of the upstream products were discussed in the previous section (Mechanistic view). In eukaryotes, they include chaperones, binding proteins, and tunnel interactions with ribosome-associated factors. This ribosome architecture works in conjunction with protein signaling feedback to control translational folding.
In prokaryotes, co-translational folding involves trigger factors and chaperones that are coupled to codon mappings that initiate TP, allowing the upstream folding process to function. We posit that misfolding of the nascent protein or unregulated chaperone could result without proper TP functionality. The effects of point mutations on redundant codons can be seen when such mutations affect the timing pause as dictated by the rules we posit for TP.
This would occur without affecting the proper amino acid prescription. Redundancy in the primary genetic code allows for additional independent codes. Coupled with the appropriate interpreters and algorithmic processors, multiple dimensions of meaning and function can be instantiated into the same codon string. We have shown a secondary code superimposed upon the primary codonic prescription of amino acid sequence in proteins. Dual interpretations enable the assembly of the protein's primary structure while enabling additional folding controls via pausing of the translation process.
TP provides for temporal control of the translation process allowing the nascent protein to fold appropriately as per its defined function. This duality in the coding function acts to reduce the redundancy in the genetic code when viewed holistically. The ribosome can be thought of as an autonomous functional processor of data that it sees at its input. This data has been shown to be PI o in the form of prescribed data D'Onofrio et al. Choices must be made with intent to select the best branch of each bifurcation point, in advance of computational halting.
The arrangement of codons has embodied in it a prescribed sequential series of both amino acid code and time-based TP code necessary for protein assembly and nascent pre-folding that defines protein functionality. We have shown that the TP coding schema follows distinct and consistent rules.
We have demonstrated that these rules are logical and unambiguous. This conditional selection is shown in the algorithm section. Such an iterative process lends itself nicely to an algorithmic process should geneticists experiment with writing their own genetic code. Understanding the dual mappings between the amino acid and TP code will allow algorithmically computed solutions that simultaneously fulfill this dual requirement using the same written code.
It has been shown that the genetic code and the TP code are decoupled, allowing simultaneous decoding and dual functionality within the ribosome using the same alphabet (nucleotides) but different languages. With other languages such as French, we share the same alphabet, but employ different semantic and grammatical rules. The same is true of the codon alphabet being used by the cell to generate more than one language. The TP code exhibits distinct meaning in relation to mappings between codons and pausing units.
The TP code also exhibits a syntax or grammar that obeys strict codon relationships that demonstrate language properties. Because of the redundancy of the genetic code, it could be argued that the TP language is a subset of the genetic language. The subspace of the TP language resides within, and thus appears to have a dependency on, the primary genetic code. Within this subspace, however, we argue that the TP language is decoupled from and remains independent of the protein-coding language.
Hypothetically, in a non-redundant codon to amino acid mapping, once the codon sequence is selected, thereby defining the prescribed amino acid chain, this prescription would preclude additional information from occupying the same space (ORF) to prescribe TP. The only way the physical constraints could be removed from formal PI o instantiation of additional TP controls using the same code would be to build redundancy into the genetic language.
Thus, having redundant contingency in the genetic code is both a necessary and sufficient condition to represent multiple languages using the same alphabet of the genetic code. More research is needed to determine, with a higher level of fidelity, the specific timing of pauses relative to TP codons and nascent folding in order to understand its impact on disease.
According to Shalgi et al., misfolding stress, and the heat shock response pathway in particular, play specific developmental roles and are implicated in a variety of diseases. Upregulation of chaperones is frequently observed in cancer. Chaperone inhibitors hold promise as antitumor agents (Whitesell and Lindquist; Calderwood et al.). Overexpression of eukaryotic proteins with strong internal SD sites would sequester ribosomes and compromise protein yield (Li et al.).
David J. D'Onofrio conceived the overall concept including generation of figures and tables. D'Onofrio managed iterative refinement of this review from input provided by co-author. David L. Abel provided major insight into subject matter contributing to the technical content and refinement of the manuscript.
All authors contributed to writing the manuscript. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Abel, D. Palyi, L. Caglioti, and C. Zucchi Modena: University of Modena , 4. Complexity, self-organization, and emergence at the edge of chaos in life-origin models. The biosemiosis of prescriptive information.
Semiotica , 1—. The capabilities of chaos and complexity. The GS Genetic selection principle. Constraints vs. Open Cyber. Is life unique? Life 2, — Seckbach and R. Gordon Dordrecht: Springer , — Three subsets of sequence complexity and their relevance to biopolymeric information.
Self-organization vs. Life Rev. Agashe, V. Function of trigger factor and DnaK in multidomain protein folding: increase in yield at the expense of folding speed. Cell , — Alp, K. A comparison of sign and symbol their contents and boundaries. Andersson, S. Codon preferences in free-living microorganisms.
Arrowsmith, C. Epigenetic protein families: a new frontier for drug discovery. Nature Rev. Drug Discov. Babu, M. Versatility from protein disorder. Science , — Barbieri, M. BioSemiotic Research Trends.
Barbieri Dordrecht: Springer , — Biosemiotics: a new understanding of life. Naturwissenschaften 95, — Bedau, M. An aristotelian account of minimal chemical life. Astrobiology 10, — Benner, S. Defining life. Bopry, J. The give and take between semiotics and second-order cybernetics.
Semiotica , 31— Brier, S. Cybersemiotics: a transdisciplinary framework for information studies. Biosystems 46, — Bucher, E. The return of Lamarck? Calderwood, S. Heat shock proteins in cancer: chaperones of tumorigenesis. Trends Biochem. Carothers, J. Informational complexity and functional activity of RNA structures.
Chen, H. Determination of the optimal aligned spacing between the Shine-Dalgarno sequence and the translation initiation codon of Escherichia coli mRNAs. Nucleic Acids Res. Chiba, S. Recruitment of a species-specific translational arrest module to monitor different cellular processes. Church, G. Toward synthetic life: scientists create ribosomes — cell protein machinery. Science Daily. Code, C. Dictionary of Engineering [Online].
Computer, C. Computer, P. What is Computer Programming? Contessoto, V. Analyzing the effect of homogeneous frustration in protein folding. Proteins 81, — Craig, J.
Epigenetics: a Reference Manual Book. Norfolk: Caister Academic Press. Crombie, T. Protein folding within the cell is influenced by controlled rates of polypeptide elongation. Derrien, T. D'Onofrio, D. Dichotomy in the definition of prescriptive information suggests both prescribed data and prescribed algorithms: biosemiotics applications in genomic systems.
Duan, Z. A three-dimensional model of the yeast genome. Nature , — El-Hani, C. A semiotic analysis of the genetic information system. Evans, M. Cotranslational folding promotes beta-helix formation and avoids aggregation in vivo. Frank, J.