Cracking the Genetic Code

in #steemstem6 years ago (edited)

What we are is largely due to what we are made of, in a strictly genetic-material sense. Every day, 50-70 billion cells in an adult human die by apoptosis, nature's way of replacing cells that are no longer in good shape to do whatever they are supposed to do. Most of the cells I have today might not exist by next year; nonetheless, I continue to exist in my entirety! So, without getting very philosophical, my genetic material, which hardly changes throughout my life, is the best proof of identity I will ever have. People have long known that parents pass something on to their children that makes the children resemble them, the stuff that makes sure the apple doesn't fall far from the tree.

It took a lot of great men and women just to figure out what that stuff is. Eventually it boiled down to two major suspects: proteins and DNA. Although Oswald Avery, C. M. MacLeod, and M. McCarty presented evidence in 1944 that DNA is the genetic material, they couldn't convince many protein "fanboys" and "fangirls". Finally, in 1952, Alfred Hershey and Martha Chase established once and for all that DNA is the genetic material. They were equipped with a Geiger counter, a blender, bacteriophages, and some radioactive phosphorus and sulfur. The Hershey-Chase experiment is one of those really kickass biological experiments that one should definitely read about!

Everyone wanted to understand DNA better, since DNA is the blueprint of an individual. Certain "machines" inside the cell use the information coded in DNA to produce proteins. Those proteins carry out the many tasks a living organism must accomplish on a regular basis to survive.

Fig-1: Caricature depicting the synthesis of mRNA, which is later used by the ribosome and tRNA to produce proteins (source: figure drawn by me using Inkscape)

In 1953, J. D. Watson and F. H. C. Crick discovered the double-helix structure of DNA. Others, such as Rosalind Franklin, contributed significantly through the experiments that produced the X-ray diffraction patterns of DNA.

Fig-2 : Nature article dated April 25, 1953 in which Watson and Crick published their discovery/deduction. (source: Watson, J. D., & Crick, F. H. (1953). Molecular structure of nucleic acids. Nature, 171(4356), 737-738.)

In the years following the discovery, people wanted to know three things. What was the language in which the blueprint of life was written? How did the machines read it? And how exactly was information written in the language of DNA translated into the language of proteins? Often the simplest questions are the hardest to answer, and these were no exception.

In 1954, James D. Watson and the renowned physicist George Gamow formed a scientific "gentlemen's club" called the RNA Tie Club. It had 20 members from various backgrounds (e.g., Richard Feynman and Edward Teller), and its goal was to "solve the riddle of the RNA structure and to understand how it built proteins". Each member was given a black wool-knit tie with a green and yellow RNA helix emblazoned on it. Each of the 20 members was allotted one of the 20 amino acids. Each was also supposed to receive a tie pin bearing the three-letter abbreviation of that amino acid. They met twice a year and otherwise communicated by mail.

Fig-3 : RNA Tie Club members (source: https://en.wikipedia.org/wiki/RNA_Tie_Club)

One can see from Fig-1 that all the machinery required to produce proteins from the mRNA blueprint lies outside the nucleus, in the cytoplasm. The idea of a cell-free system is an ingenious one. Often, to study a particular cellular process, one doesn't need an entire cell. The subcellular fractions that provide the molecular machinery for that process are enough. For protein synthesis, the contents of the cytoplasm can essentially reproduce the whole activity in a test tube. This works as long as they include the raw ingredients that are produced in the nucleus, such as the RNAs.

Marshall Nirenberg, working at the National Institutes of Health (NIH), was interested in the flow of information from nucleic acids (DNA, RNA) to protein. He wanted to understand the factors affecting the rate of cell-free protein synthesis. DNase, short for deoxyribonuclease, is an enzyme that degrades DNA. Several contemporary scientists had observed that DNase inhibited protein production in vitro. In other words, suppose you put the contents of cells, which include everything required to make proteins, in a test tube and add DNase. Protein synthesis is then disrupted. This led many to conclude that DNA templates play a crucial role in protein synthesis.

Nirenberg, with the help of his colleague Heinrich Matthaei, set out to identify the role of RNA templates in protein production. They observed that adding externally prepared RNA to the test tube led to protein production. They also noticed protein production even when no RNA was added. This was expected, since the extract itself contained some RNA. So they added DNase to make sure no further RNA was produced from the DNA already in the test tube. After a while, no new RNA was being made, and protein production in the DNase-treated extract levelled off. When RNA templates were then added externally, proteins were produced again.

Fig-4: Without external mRNA and in the presence of DNase, protein production saturates after a while (bottommost curve). Even in the presence of DNase, adding mRNA externally leads to protein production (middle curve). In the absence of both external mRNA and DNase, there is protein production because of internally produced mRNA. (source: http://www.pnas.org/content/47/10/1588.long)

Proteins are essentially heteropolymers built from 20 different amino acids. Amino acids are the building blocks of proteins, and the mRNA template provides the blueprint for their construction. The mRNA template is written in the language of nucleotides, which has 4 letters (A, U, G, C). Proteins are written in the language of amino acids, which has 20 letters. Marshall Nirenberg wanted to know how a message written in the language of nucleotides is translated into the language of amino acids.

There are 4^2 = 16 possible two-letter combinations of (A, U, G, C) and 4^3 = 64 three-letter combinations. Since there are 20 amino acids, one can rule out two letters (a doublet codon) per amino acid. Three letters leave a surplus (64 - 20 = 44). But a codon must contain a whole number of letters, so three is the smallest length that works. Gamow and many others were convinced that it had to be 3 letters per amino acid.

An interesting side note about the RNA "blueprint": it is actually heavier than the final product. That is generally not the case with the blueprints we come across in day-to-day life.
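For readers who like to check such counting arguments, here is a minimal Python sketch. It is my own illustration, and the function names are made up for this post. It only shows that 3 is the smallest codon length that can cover 20 amino acids. It says nothing about whether nature actually uses 3; that had to be established experimentally.

```python
# Counting argument for codon length: with an alphabet of 4 bases,
# a codon of length n can spell 4**n distinct "words".
BASES = "AUGC"
N_AMINO_ACIDS = 20


def words_of_length(n: int) -> int:
    """Number of distinct codons of length n over the 4-letter RNA alphabet."""
    return len(BASES) ** n


def minimal_codon_length(n_symbols: int) -> int:
    """Smallest whole-number codon length that gives at least n_symbols words."""
    n = 1
    while words_of_length(n) < n_symbols:
        n += 1
    return n


for n in (1, 2, 3):
    print(n, words_of_length(n), words_of_length(n) >= N_AMINO_ACIDS)

length = minimal_codon_length(N_AMINO_ACIDS)
print("minimal codon length:", length)
print("surplus codons:", words_of_length(length) - N_AMINO_ACIDS)
```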

Simple calculation:

  • average molecular weight of a nucleotide = 330 Da

  • average molecular weight of an amino acid = 110 Da

  • 3 nucleotides code for 1 amino acid, hence 3 × 330 = 990 Da of mRNA encode 110 Da of protein, and

  • the blueprint is 990 / 110 = 9 times (almost 10 times) as heavy as the final product!
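Here is the same arithmetic in Python. The sketch assumes the average masses above and ignores everything except the coding triplets: no untranslated regions, no stop codon, and no water lost when residues join. The 300-residue protein is an arbitrary example, since the ratio does not depend on length.

```python
# Mass of the mRNA "blueprint" versus the protein it encodes,
# using the average residue masses quoted above (in daltons).
NUCLEOTIDE_DA = 330
AMINO_ACID_DA = 110
NUCLEOTIDES_PER_CODON = 3


def blueprint_to_product_ratio(n_amino_acids: int) -> float:
    """Ratio of coding-mRNA mass to protein mass for a chain of n residues."""
    mrna_mass = n_amino_acids * NUCLEOTIDES_PER_CODON * NUCLEOTIDE_DA
    protein_mass = n_amino_acids * AMINO_ACID_DA
    return mrna_mass / protein_mass


print(blueprint_to_product_ratio(300))  # 9.0
```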

Poly-U experiment :

Nirenberg and Matthaei added RNA templates made of just one kind of nucleotide, uridine (U). Every time, the poly-U template led to a peptide chain containing one and only one amino acid, phenylalanine. In short, poly-U produced polyphenylalanine, every single time! They also observed that only single-stranded poly-U did this, whereas double- or triple-stranded poly-U produced nothing. This clearly established three things:

- RNA is the template for protein.
- Residues of U correspond to phenylalanine in the protein.
- Translation is affected by both the primary structure of the template (just the sequence) and its secondary structure (linear, circular, single-stranded, double-stranded...).

In essence, the first triplet codon to be deciphered was UUU, which codes for phenylalanine. In a similar way they discovered that poly-C and poly-A yielded polypeptide chains containing just proline and lysine, respectively. If one could synthesise RNA sequences covering all 64 possible triplets, determining the entire code would seem easy. But synthesising such sequences was not easy at all!
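Here is a toy illustration of why homopolymer templates were so easy to interpret: whichever base you start reading from, poly-U only ever presents the triplet UUU. The lookup table below holds only the three assignments mentioned in this paragraph, and the helper names are hypothetical. It is a sketch, not a model of the ribosome.

```python
# Reading homopolymer templates three bases at a time, using only the three
# codon assignments from the homopolymer experiments described above.
HOMOPOLYMER_CODONS = {"UUU": "Phe", "CCC": "Pro", "AAA": "Lys"}


def read_triplets(rna: str) -> list[str]:
    """Split an RNA string into consecutive, non-overlapping triplets."""
    return [rna[i:i + 3] for i in range(0, len(rna) - 2, 3)]


def translate_homopolymer(rna: str) -> list[str]:
    """Look up each triplet in the tiny homopolymer table."""
    return [HOMOPOLYMER_CODONS[codon] for codon in read_triplets(rna)]


print(translate_homopolymer("U" * 12))  # ['Phe', 'Phe', 'Phe', 'Phe']
print(translate_homopolymer("C" * 9))   # ['Pro', 'Pro', 'Pro']
print(translate_homopolymer("A" * 6))   # ['Lys', 'Lys']
```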

H. Gobind Khorana :

Har Gobind Khorana, working at the University of Wisconsin, Madison, had developed a chemical method to prepare polynucleotides of specified sequence. Let me quote him from his Nobel lecture:

"By using a combination of purely chemical methods, which are required to produce new and specified information, and then following through with the two enzymes, DNA polymerase and RNA polymerase, which are beautifully precise copying machines, we have at our disposal a variety of high-molecular-weight ribo-polynucleotides of known sequences. Mistake levels, if they occur at all, are insignificant"

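Note: They are called ribo-polynucleotides because they are made of ribonucleotides (nucleotides containing the sugar ribose), i.e. they are RNA. Khorana's method first made deoxyribo-polynucleotides (DNA): short pieces were synthesised chemically and then copied by DNA polymerase. RNA polymerase then produced the RNA from this DNA.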

The next natural step was to cover all 64 possible triplets. A long polynucleotide made of a repeated dimer (AG, UA, CA, CG, ...) contains two different codons. These alternate in whichever frame the sequence is read, so it can code for two different amino acids. Similarly, a repeated trimer and a repeated 4-mer contain 3 and 4 different codons, respectively (a repeated trimer gives one codon per reading frame). The figure below illustrates this simple fact:

Fig-5 : Polynucleotides made up of repeated AG (red), repeated CGA (blue) and repeated UUAC (black) (source: drawn by me using Inkscape)
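To see the figure's point in code, the following Python sketch reads a long repeat of AG, CGA and UUAC starting at each of the three possible frames. The helper names are hypothetical, and only the standard library is used.

```python
def codons_in_frame(repeat: str, frame: int, n_codons: int = 4) -> list[str]:
    """First n_codons triplets of a long repeat (repeat * many), read from `frame` (0, 1 or 2)."""
    chain = repeat * (3 * n_codons)  # long enough for any frame
    return [chain[frame + 3 * i: frame + 3 * i + 3] for i in range(n_codons)]


def distinct_codons(repeat: str) -> set[str]:
    """All different codons that appear across the three reading frames."""
    return {codon for frame in range(3) for codon in codons_in_frame(repeat, frame)}


for repeat in ("AG", "CGA", "UUAC"):
    for frame in range(3):
        print(repeat, "frame", frame, codons_in_frame(repeat, frame))
    print(repeat, "distinct codons:", sorted(distinct_codons(repeat)))
```

- A repeated AG gives AGA and GAG alternating in every frame.
- A repeated CGA gives a single codon per frame (CGA, GAC or ACG).
- A repeated UUAC cycles through UUA, CUU, ACU and UAC in every frame.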

So, in general:

- A repeated 2-mer yields a polypeptide of two alternating amino acids (a repeating dipeptide).
- A repeated 3-mer yields up to three different homopolypeptides, one for each reading frame.
- A repeated 4-mer yields a polypeptide made of a repeating tetrapeptide.

But in some interesting cases, the polynucleotide chains yielded only short amino acid fragments instead of long polypeptide chains. These cases allowed researchers to confirm the existence of terminator codons, i.e. codons that act as stop signals.

Fig-6 : Different polynucleotides made of repeating tetramers and their corresponding final products. Poly-GUAA and poly-AUAG failed to yield long chains because every one of their reading frames contains a stop codon (source: https://www.nobelprize.org/nobel_prizes/medicine/laureates/1968/khorana-lecture.html)
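Every reading frame of a repeated tetramer cycles through the same four codons. So a single stop codon among them ends chains in all frames. Here is a sketch that checks this, assuming the three stop codons UAA, UAG and UGA identified by the code-cracking work. The helper name is hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def frames_with_stop(repeat: str) -> list[int]:
    """Reading frames (0, 1, 2) of a long repeat that contain at least one stop codon."""
    chain = repeat * 12
    n_codons = 3 * len(repeat)  # more than enough to cover one full period of codons
    hits = []
    for frame in range(3):
        codons = {chain[frame + 3 * i: frame + 3 * i + 3] for i in range(n_codons)}
        if codons & STOP_CODONS:
            hits.append(frame)
    return hits


for repeat in ("UUAC", "GUAA", "AUAG"):
    print(repeat, frames_with_stop(repeat))
```

Poly-UUAC has no stop codon in any frame. Poly-GUAA (via UAA) and poly-AUAG (via UAG) hit a stop in all three frames.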

Khorana summarises the results of these experiments nicely in his Nobel lecture:

"The results summarized above lead to the following general conclusions:

(1) DNA does, in fact, specify the sequence of amino acids in proteins and this information is relayed through an RNA. (This was the first time that a direct sequence correlation between DNA and a protein had been established.)

(2) All the results prove the 3-letter and non-overlapping properties of the code.

(3) Finally, information on codon assignments can also be derived from these results."

The above experiments on their own don't let one assign codons to amino acids uniquely. For example, a polynucleotide of repeated UC yields a polypeptide of alternating serine and leucine. But it doesn't tell you which of UCU and CUC stands for serine and which for leucine.

Remember that tRNA (transfer RNA) is a necessary tool/machine for translation. tRNAs carry specific amino acids and add them to the growing polypeptide chain during elongation. tRNAs are very codon-specific machines, and every one of the 61 (64 - 3) sense codons is read by some tRNA. The three stop codons have no corresponding tRNAs, and that is precisely why they act as stop signals. When the ribosome encounters a stop codon on an mRNA strand, no tRNA can bind it. Instead, proteins called release factors recognise the stop codon. The finished polypeptide is released, and the ribosome is freed to deal with some other mRNA strand.
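Here is a simplified sketch of translation that stops at a stop codon. The tiny codon table holds just four real assignments picked for the example. Unlike a real ribosome, the function starts reading at the first base instead of searching for a start codon. The names are hypothetical.

```python
# Hypothetical mini codon table: only the entries needed for this example.
MINI_TABLE = {"AUG": "Met", "UUU": "Phe", "AAA": "Lys", "GGC": "Gly"}
STOP_CODONS = {"UAA", "UAG", "UGA"}


def translate(mrna: str) -> list[str]:
    """Translate codon by codon from the first base, releasing the chain at a stop codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break  # no tRNA matches; in cells a release factor ends translation here
        peptide.append(MINI_TABLE[codon])
    return peptide


print(translate("AUGUUUAAAGGCUAAUUUUUU"))  # ['Met', 'Phe', 'Lys', 'Gly']
```

Everything after the UAA is ignored, just as a finished polypeptide is released when the ribosome reaches a stop codon.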

Using the fact that tRNAs are very codon-specific, one can figure out the exact association of codons with amino acids. A polynucleotide made of repeated AAAG contains 4 different codons: AAA, AAG, AGA and GAA. Khorana used [¹⁴C]lysyl-tRNA, i.e. tRNA carrying the amino acid lysine. He found that its binding to ribosomes was very sensitive to which of these 4 codons was added. Codons AAA and AAG significantly increased the binding, while AGA and GAA did not. By the way, the lysine carried by these tRNAs was labelled with the radioactive isotope carbon-14 ([¹⁴C]) instead of regular carbon, so that one could keep tabs on it.


Fig-7 : The amount of ribosome-bound lysyl-tRNA increases very significantly as you add more AAA or AAG and remains unchanged on adding more GAA or AGA (source: https://www.nobelprize.org/nobel_prizes/medicine/laureates/1968/khorana-lecture.html)
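Here is the logic of the lysine experiment, sketched in Python. The assignments hard-coded in `ASSIGNMENTS` are today's textbook answers for these four triplets. They are the result the binding assay was designed to reveal, not something the code derives. The point is just to show which triplets of the AAAG repeat should light up with lysyl-tRNA. The names are hypothetical.

```python
# Standard assignments of the four triplets in an AAAG repeat (textbook values, hard-coded).
ASSIGNMENTS = {"AAA": "Lys", "AAG": "Lys", "AGA": "Arg", "GAA": "Glu"}


def triplets_of_repeat(repeat: str) -> set[str]:
    """Every triplet that can be read from a long repeat, whatever the starting base."""
    chain = repeat * 4
    return {chain[i:i + 3] for i in range(len(repeat))}


codons = triplets_of_repeat("AAAG")
lysine_codons = sorted(c for c in codons if ASSIGNMENTS[c] == "Lys")
print(sorted(codons))   # ['AAA', 'AAG', 'AGA', 'GAA']
print(lysine_codons)    # ['AAA', 'AAG']
```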

The method described above for inferring the exact association of codons with amino acids was not as smooth in practice. The degree of binding was not always clean enough to infer the associations with great confidence, but this was definitely a very powerful method. At the end of the day, researchers around the globe managed to map 61 codons to 20 amino acids. They also identified the three stop codons (UAA, UGA, UAG).

Fig-8 : The Genetic Code (source : https://www.nobelprize.org/nobel_prizes/medicine/laureates/1968/khorana-lecture.html)

The genetic code has its own beauty. Codons that code for the same amino acid (synonym codons) usually differ only in the base at the third position. To quote Nirenberg, " These synonym codons are systemically related to one another". A close look at the code shows five patterns of codon degeneracy:

Fig-9 : Five kinds of degeneracy that exists in the genetic code (source: http://www.pnas.org/content/47/10/1588.long)
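For the curious, here is a Python sketch that builds the standard code and groups amino acids by how many codons they have. The 64-letter string follows the widely used NCBI "translation table 1" ordering, with bases in the order U, C, A, G. This grouping by codon count is my own choice and may not be exactly how Nirenberg defined the five kinds in Fig-9, although it also happens to give five classes. The sketch also checks the claim that synonym codons usually share their first two bases.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code in NCBI "translation table 1" order (UUU, UUC, UUA, UUG, UCU, ..., GGG),
# one-letter amino acid codes, "*" = stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# How many codons each amino acid has (stop codons excluded).
codons_per_aa = Counter(aa for aa in CODE.values() if aa != "*")
degeneracy_classes = sorted(set(codons_per_aa.values()))
print(degeneracy_classes)  # [1, 2, 3, 4, 6]
for size in degeneracy_classes:
    print(size, "codon(s):", sorted(aa for aa, n in codons_per_aa.items() if n == size))

# Amino acids whose synonym codons all share the same first two bases.
prefixes: dict[str, set[str]] = {}
for codon, aa in CODE.items():
    if aa != "*":
        prefixes.setdefault(aa, set()).add(codon[:2])
same_first_two = sorted(aa for aa, p in prefixes.items() if len(p) == 1)
print(len(same_first_two), "of", len(prefixes))  # 17 of 20
```

Only leucine, serine and arginine, the three amino acids with six codons each, have synonyms that differ outside the third position.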

One advantage of this systematic degeneracy is that it safeguards protein production from mutations that change the nucleotide at the third position. To quote Nirenberg,

"Many mutations, therefore, are silent ones. The code appears to be arranged so that the effects of base replacements in DNA, or erroneous translations of bases in mRNA, often are minimized".

I must admit I am not yet convinced on one point. Suppose one were optimising the code for robustness against single-nucleotide mutations. Would this particular pattern of degeneracy offer the maximum robustness?
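One way to put numbers on Nirenberg's remark is to take every sense codon, change one base at a time, and count how often the amino acid stays the same. The sketch below does exactly that, again using the standard table. The helper names are mine. It measures robustness position by position but cannot settle my doubt above, which would require comparing against alternative codes.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, NCBI "translation table 1" order; "*" = stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def silent_substitutions() -> dict[int, tuple[int, int]]:
    """For codon positions 1-3: (silent, total) single-base changes starting from the 61 sense codons."""
    counts = {}
    for pos in range(3):
        silent = total = 0
        for codon, aa in CODE.items():
            if aa == "*":
                continue  # start only from sense codons
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                total += 1
                if CODE[mutant] == aa:
                    silent += 1
        counts[pos + 1] = (silent, total)
    return counts


for pos, (silent, total) in silent_substitutions().items():
    print(f"position {pos}: {silent}/{total} single-base changes are silent ({silent / total:.0%})")
```

With the standard table, it finds the following silent changes:

- 8 of 183 first-position changes
- none of 183 second-position changes
- 126 of 183 third-position changes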
tRNA and Robert Holley :

In the mid-1950s, Francis Crick proposed what is called the "adaptor hypothesis". Crick proposed that each amino acid is first attached to its own specific "adaptor", a piece of nucleic acid with a specific sequence. The order of assembly of the amino acids is then determined by specific recognition between the adaptor and the nucleic acid serving as the informational template. In this way the template could line up the amino acids in a specific order. Coupling between adjacent amino acids would then lead to the synthesis of a polypeptide whose sequence is determined by the template nucleic acid. Evidence for the adaptor hypothesis came with the discovery of tRNA. Robert Holley at Cornell is the man who determined the structure of tRNA. I found all the chemistry that went into determining the structure a bit hard to follow, but I can totally appreciate the beauty of the final structure.

Robert Holley extracted alanine transfer RNA (alanine-tRNA) from yeast. Obtaining a pure sample of alanine-tRNA was an ordeal in itself. Using various methods he managed to obtain the primary sequence of alanine-tRNA. Remember that this was the first nucleotide sequence determined for a naturally occurring RNA. Knowing an RNA sequence lets you work out the DNA sequence (gene) that codes for it. So one can also say this was the first naturally occurring gene to be sequenced. It's very easy to miss the importance of this work, since sequencing is a routine laboratory procedure today. That wasn't the case when the genetic code was being cracked.

Although tRNA is a single-stranded piece of RNA, it has a 3-dimensional structure. 3-dimensional structures are hard to figure out and often require X-ray diffraction to unravel. But Robert Holley managed to work out the secondary structure. He noticed that tRNA hardly behaves like a single-stranded piece of RNA because of the extensive base pairing between complementary bases (A-U, G-C). The final structure has 3 loops. One of the loops contains the all-important triplet, the anticodon. The anticodon binds specifically to the complementary codon on the mRNA and thereby determines which amino acid the tRNA delivers. The loops are formed by bases that couldn't find complementary partners to hook up with.

Fig-10 : Secondary structure of alanine-tRNA, thanks to Robert Holley (source: https://www.nobelprize.org/nobel_prizes/medicine/laureates/1968/holley-lecture.html)
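Codon and anticodon pair in opposite orientations. So under strict Watson-Crick rules, the anticodon matching a codon is its reverse complement. Here is a small Python sketch that writes both sequences 5'->3'. The function name is hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon: str) -> str:
    """Anticodon (written 5'->3') that pairs with `codon` by strict Watson-Crick rules.

    Codon and anticodon pair antiparallel, so the anticodon is the reverse complement.
    """
    return "".join(COMPLEMENT[base] for base in reversed(codon))


print(anticodon_for("GCC"))  # GGC
print(anticodon_for("AUG"))  # CAU
```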

The systematic degeneracy in the genetic code makes more sense in light of the wobble hypothesis, proposed by Francis Crick in 1966. According to the wobble hypothesis, the Watson-Crick base-pairing rules (A-U, G-C) are not always obeyed when tRNA binds to the mRNA template. The first two nucleotides of the codon obey the Watson-Crick rules strictly, but the third need not. The pairing at the third position can therefore be looser, letting the tRNA "wobble". This allows a single tRNA to read more than one synonymous codon. The exact mechanism itself is very involved, and needless to say I am not competent enough to summarise it without messing it up.
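Crick's wobble rules can be written down as a small lookup table. The base at the 5' end of the anticodon decides which third-position bases of the codon it tolerates:

- U pairs with A or G.
- G pairs with C or U.
- Inosine (I) pairs with U, C or A.

Holley's yeast alanine-tRNA has the anticodon IGC, which contains the modified base inosine. So the sketch below shows a single tRNA reading three alanine codons. The helper name is hypothetical, and the sketch ignores the many other modified bases that real tRNAs carry.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
# Crick's 1966 wobble rules: base at the anticodon's 5' end -> codon 3' bases it can pair with.
WOBBLE = {"U": {"A", "G"}, "G": {"C", "U"}, "I": {"U", "C", "A"}, "C": {"G"}, "A": {"U"}}


def codons_read_by(anticodon: str) -> list[str]:
    """Codons (5'->3') that an anticodon (5'->3') can read under the wobble rules."""
    wobble_base, middle, last = anticodon
    first = COMPLEMENT[last]      # codon position 1 pairs strictly with the anticodon's 3' base
    second = COMPLEMENT[middle]   # codon position 2 pairs strictly with the anticodon's middle base
    return sorted(first + second + third for third in WOBBLE[wobble_base])


print(codons_read_by("IGC"))  # ['GCA', 'GCC', 'GCU']
print(codons_read_by("GAA"))  # ['UUC', 'UUU']
```

All three codons read by IGC code for alanine, and both codons read by GAA code for phenylalanine.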

The genetic code is shared by all known living species, with only minor variations (for example in mitochondria), which makes it a nearly universal code. The abundance of particular codons in mRNA and of each tRNA can vary across species and individuals, but the fundamental code remains the same. There are tons of philosophical questions one can ask, like:

- Why is the code the way it is?
- At what point in time did this code become universal?
- What other kinds of coding systems existed before this particular one became universal?
- Is this code optimised for certain properties, or is it as good as any other?

... I have no answers to any of them, but I noticed that a lot of papers have been written exploring these questions. Hopefully at some point I will go through a few of them and in the process discover some more interesting stuff.

Before that, we need to acknowledge the hundreds of scientists who contributed to cracking the genetic code, especially Marshall Nirenberg, Har Gobind Khorana and Robert Holley. It should be noted that none of these three belonged to the RNA Tie Club, which makes this a great underdog story. Nirenberg was working on unravelling the genetic code at NIH in an era when nobody from NIH had ever won a Nobel Prize. A large number of colleagues realised the importance of his work and lent a helping hand, and their work really paid off in every sense. Marshall Nirenberg became the first person from NIH to win a Nobel Prize.

The RNA Tie Club members, especially Francis Crick and George Gamow, contributed a lot to cracking the genetic code. It is said that Nirenberg first spoke about his poly-U experiment to a small audience in Moscow in 1961. Francis Crick, who was in the audience, realised the massive importance of the work and asked Nirenberg to present it to a larger audience during the same meeting. I consider it the most important of the numerous experiments that led to the cracking of the genetic code. Finally, seven years after that talk, Marshall Nirenberg, Har Gobind Khorana and Robert Holley were jointly awarded the 1968 Nobel Prize in Physiology or Medicine "for their interpretation of the genetic code and its function in protein synthesis".


Fig-11 : Robert Holley (L), Marshall Nirenberg (M) and Har Gobind Khorana (R) (source : https://www.nobelprize.org/nobel_prizes/medicine/laureates/1968/)


P.S. This was originally written by me on my WordPress blog at https://kuyyamudi.wordpress.com/

