What Type Of Structure And Language Does Our DNA Has : Why Are We Being Measured By Genes
The fugu fish genome is about 8 times smaller than the human genome and 330 times smaller than the genome of the lungfish, the protopter. What "ghosts" live in the "graveyard of genomes" and how much debris in our DNA?
Some molecular genetics
Recall that the transfer of hereditary information is based on a double-stranded DNA molecule . It is a polymer of four types of monomers (nucleotides): adenine (A), thymine (T), cytosine (C) and guanine (G), and is laid on chromosomes. In humans, 23 pairs of chromosomes (22 pairs of non-sexual and one pair of sexual) located in the nucleus, they form the basis of our genome. If we took one human cell, sewed all the chromosomes together and pulled them into a thread, we would get a molecule two meters long, consisting of six billion base pairs (nucleotides). Three billion from the pope and three billion from the mother (diploid set of chromosomes).
The most studied type of DNA functional sequences are genes encoding proteins. From such genes, an RNA molecule is read, which then plays the role of a matrix for the synthesis of proteins and determines their amino acid sequence. The coding part of the RNA molecule can be divided into triplets of nucleotides (codons), which either correspond to some amino acid, or determine the place of termination of protein synthesis (stop codons). The rule of codon matching amino acids is called a genetic code. For example, the GCC codon encodes the amino acid alanine.
Sometimes in the media you can hear the incorrect phrase "the genetic code mutated." But mutations do not occur in the code, but in the DNA molecule (in the genome). As a result, the nucleotide sequences change. This can be compared with the replacement of a letter in some word. For example, the phrase "Masha was riding a motorcycle" turns into the phrase "Sasha was riding a motorcycle", if one letter M "mutated" in the letter C. Change of the genetic code is much more serious - this is like changing the alphabet. Imagine that in the whole text the letter M suddenly turned into the letter K. Now we have "Kasha rode on the kototsike". It is clear that such changes lead to significant consequences and therefore occur very rarely in nature. But they are happening! For example, in some infusoria (unicellular protozoa), one of the stop codons can encode the amino acid glutamine . In addition, it was easy to small artificial change in the genetic code of some modern organisms, for example, E. coli . But this is more an exception than the rule. The majority of organisms have the same genetic code: a person has the same genetic code as a worm or cucumber. But the genomes of these organisms differ very much. The same alphabet, but different text.
We are being measured by genes
Once they thought that such a complex organism as a person should have a lot of genes. Before the human genome was read , scientists even arranged sweepstakes: how many genes will be detected? Numbers were called up to hundreds of thousands. Many scientists were surprised when it turned out that the number of genes in humans and a small round worm Caenorhabditis elegans is about the same. The worm has about 20,000 genes, and we have 20-25 thousand , that for the "crown of creation" the fact is rather offensive. Especially when you consider that there are many organisms with a larger genome (double-boiling fish Protopterus aethiopicus has a genome 40 times greater than a human) and with a larger number of genes (Oryza sativa has 32,000-50000 genes ).
But in fact in humans less than 2% of the genome is encoded by any proteins. Why do the rest 98%? Maybe there is a secret of our complexity? It turned out that there are important non-coding regions of DNA. For example, these are sites of promoters on which the enzyme RNA polymerase sits down and where the synthesis of the RNA molecule begins. These are sites of binding of transcription factors - proteins that regulate the work of genes. These are telomeres that protect the ends of chromosomes, and centromeres, necessary for the correct divergence of chromosomes at different poles of cells during division. Some regulatory RNA molecules (for example, microRNAs ) are known, as well as RNA molecules that are part of important enzyme complexes, for example, ribosomal RNA. There are other examples of important non-coding regions of DNA.
But, alas, it turned out that most of our genome is like a desert : repeating sequences, the remains of "dead" viruses, which were once integrated into the genomes of our ancestors, the so-called "selfish mobile elements" DNA sequences capable of jumping from one genome to another, different pseudogenes are nucleotide sequences that have lost the ability to code proteins as a result of mutations but still retain some of the characteristics of the genes. This is not a complete list of "ghosts" that inhabit the "graveyard of the genome."
In connection with the foregoing, there is a view that most of the human genome is not functional. In 2004 Nature magazine published an article describing mice, from the genome of which were cut out significant fragments of non-coding DNA measuring 1.5 million and 0.8 million nucleotides. It was shown that these mice do not differ from the usual body structure, development, life span or the ability to leave offspring . Of course, some differences might have gone unnoticed in the laboratory, but on the whole it was a serious argument in favor of the existence of "junk DNA" , which you can get rid of without serious consequences. Of course, it would be interesting to cut out not a couple of million nucleotides, but a billion, leaving only the predicted gene sequences and known functional elements.
Will it be possible to derive such a "minimal mouse", and will it be able to function normally? Can a person get by with a genome "only half a meter long"? Perhaps someday we will find out about it. Meanwhile, another important argument in favor of the existence of garbage DNA is the presence of fairly close organisms with very different genome sizes. The fugu fish genome is about 8 times smaller than the human genome (although there are approximately as many genes in it) and 330 times less than the genome of the already mentioned fish protopter. If every nucleotide in the genome were functional, then the following question would be appropriate: why is the onion genome five times larger than we are with you?
On the enormous differences in the size of genomes of similar organisms drew the attention of the evolutionary biologist Susumu Ohno. It is believed that It was she who introduced the term "Junk DNA" . It turns out that back in 1972, long before the human genome was read, It had plausible representations of both the number of genes in the human genome and the amount of "garbage" in it. In his article "So much garbage DNA in our genome" , he notes that in the human genome there should be about 30,000 genes. This number is close to the truth, as we learned dozens of years later, but at that time it was not at all obvious. In addition, It provides an estimate of the functional proportion of the genome (6%), declaring more than 90% of the human genome as debris.
What for one - a find, for another - garbage
The ENCODE project (Encyclopedia of DNA Elements) called the idea of the existence of a junk DNA. Having obtained numerous experimental data on which parts of the human genome interact with different proteins, participate in transcription or other biochemical processes, the authors concluded that more than 80% of the human genome is somehow functional . Of course, this thesis caused heated discussion in the scientific community .
One of the most ironic articles critical to this conclusion of the ENCODE consortium is called: "On the immortality of televisions: the" function "in the human genome by the evolutionary gospel of ENCODE" . The article begins with an epigraph, which I dragged to the beginning of the text. Its authors, Professor Dan Graur (Dan Graur) and colleagues note that some members of the consortium ENCODE disagree on which part of the genome is functional. So, one of them later specified that it was not about 80% of the functional sequences in the genome, but about 40% , and the other even reduced the figure to 20% , but continued to insist that the term " junk DNA "must be" removed from the lexicon. " Above this joked that a new arithmetic was invented, according to which 20% more than 80% .
According to Graur and colleagues , members of the consortium ENCODE pretty freely interpreted the term "function". For example, there are proteins called histones. They can bind a DNA molecule and help it to fit in compactly. Histones can undergo certain chemical modifications. According to ENCODE, the hypothetical function of one of these histone modifications is the "preference to be at the 5 'end of the genes" (the 5' end is the end of the gene from which the DNA and RNA polymerase enzymes move when copying DNA or transcription, respectively). "This is roughly how to say that the function of the White House is to occupy the area of the land at 1600, Pennsylvania Avenue, Washington, DC," the scientists note .
The problem also arises with the attribution of the function to sections of DNA. Suppose that some DNA segment binds an important protein, and therefore ENCODE ascribes to this site a "function". It is known that some protein (transcription factor) binds to the following nucleotide sequence: TATAAA. Consider two identical TATAAA sequences in different parts of the genome. After the transcription factor binds to the first sequence, the synthesis of the RNA molecule, serving as a template for the synthesis of some important protein, begins. Mutations in this sequence will cause the RNA to be read poorly, the protein will not be synthesized, and this is likely to negatively affect the survival of the body. Therefore, such a sequence of TATAAA will be maintained in the genome through natural selection, and in this case it is appropriate to talk about the presence of a function. The second sequence of TATAAA originated in the genome for random reasons. Since it is identical to the first, a transcription factor is also associated with it. But there is no gene near, so binding does not lead to anything. If there is a mutation in this area, nothing will change, the body will not suffer. In this case, there is no point in talking about the function of the TATAAA site. However, it may be that the presence of a large number of TATAAA sequences in the genome away from genes is necessary simply to bind the transcription factor and reduce its effective concentration. In such a case, the number of such sequences in the genome will be selected by selection.
To prove that some part of DNA is functional, it is not enough to show that in this area there is a certain biological process (for example, DNA binding). Members of the ENCODE consortium write that DNA regions that are involved in transcription have a function. "But why it is necessary to emphasize that 74.7% of the genome is transcribed, while it can be said that 100% of the genome takes part in the reproduced biochemical process - replication!" Graur and colleagues again joke.
A good criterion for the functionality of the DNA site is that the mutations in it are quite harmful and significant changes in this area are not observed from generation to generation. How to determine such areas? Here, bioinformatics, modern science at the junction of biology and mathematics about the analysis of sequences of genes and proteins come to the rescue. We can take the genome of a person and a mouse and find all the parts of DNA that are similar between them. It turns out that in these two species some sections of the nucleotide sequences are very similar. For example, the genes necessary for the synthesis of ribosomal proteins are rather conservative, i. E. mutations in them are sufficiently harmful that the carriers of new mutations die out without leaving offspring. About such genes say that they are under negative selection, cleansing of harmful mutations. Other parts of the genomes will have significant discrepancies between the species, which indicates that the mutations in these areas are most likely harmless, and therefore their functional role is at least not large or determined by a specific sequence of nucleotides. A number of studies evaluated the proportion of human DNA sites under the pressure of negative selection. It turned out that only about 6.5-10% of the human genome are under this effect , and the noncoding regions, in contrast to the coding regions, are much less subject to negative selection . It turns out that from the standpoint of evolutionary criteria, less than 10% of the human genome is functional. Notice how close it was in 1972!
Does this mean that the remaining 90% of the human genome is just rubbish, which is better to get rid of? Not certainly in that way. There are considerations that the large size of the genome can be useful in itself. In bacteria, the replication of the genome is a serious limiting factor that prevents effective reproduction. Therefore, their genomes, as a rule, are small, and from all superfluous they get rid. In large organisms, as a rule, DNA replication of dividing cells does not contribute so much to the total amount of energy expenditure of an organism against the background of the costs of working the brain, muscles, organs of excretion, maintaining body temperature, and so on. At the same time, a large genome can be an important source of genetic diversity, increasing the chances of the emergence of new functional areas from non-functional ones due to mutations in them in the course of evolution. Mobile elements can carry regulatory elements, creating genetic diversity in the regulation of gene work. Thus, organisms with large genomes can theoretically be able to adapt more quickly to environmental conditions, paying relatively small additional costs for the replication of a larger genome. We will not find a similar effect on a separate organism, but it can play an important role at the population level.
The presence of a large genome can also reduce the likelihood that a virus will build into a functional gene (which can lead to gene breakdown and in some cases to cancer). In other words, it is possible that natural selection can act not only to maintain specific sequences in the genome, but to maintain certain sizes of the genome, the nucleotide composition in some of its sections, and so on.
It is worth giving an adequate assessment of the work of the consortium ENCODE. Yes, the idea that 80% or even 20% of the human genome is functional is controversial, but this does not mean that the entire ENCODE project is subject to criticism. Within ENCODE, a huge amount of data was received on how different proteins bind to DNA, information about gene regulation, and so on. These data are of great interest to specialists and widely in demand. But hardly in the near future it will be possible to get rid of the "garbage" in the genome - both from the concept and from the unnecessary sequences themselves.
References for Text and Images:
Support @steemstem and the #steemstem
project - curating and supporting quality STEM
related content on Steemit