Genomes contain recipes for microRNAs, which are small RNA molecules that help regulate which proteins should be made in a cell.

New tool can quickly reveal special parts of a genome

More and more animals and plants have had their entire genome mapped. But what do all the letters really mean? Norwegian researchers have created a new tool that can locate microRNAs.

It has been more than 20 years since the entire human genome was mapped. This means that we can read the full sequence of base pairs, or ‘letters’, in our DNA.

DNA is the ‘recipe book’ for how we are put together. Genes tell us how everything in the body, like skin, muscles and organs, is supposed to work and look.

But researchers found that only one to two per cent of our genome actually consists of genes that code for proteins. Much of the genome therefore does not directly describe how the body should be built.

“That finding was a bit strange,” says Bastian Fromm, a researcher at UiT The Arctic University of Norway, who has his laboratory at the Arctic University Museum in Tromsø. “Why do we have such large amounts of non-coding, or ‘meaningless’, DNA?”

The other surprise, says Fromm, was that humans do not necessarily have many more genes than other organisms, despite the fact that we consider ourselves rather complex.

Humans have 25 000 genes, while sweet corn has 32 000, according to the Great Norwegian Encyclopedia.

What’s what?

Mapping the entire genome of an organism has become easier. Scientists have now sequenced the genomes of over 6000 organisms.

It is also important to work out what the letters in the sequence mean: where the genes are, what they do, and what functions the other regions of the genome have. This is called annotation.

Non-coding DNA, the parts of the genome that are not used to make proteins, nevertheless has important functions.

Recipes for microRNAs are among the things hidden in this jumble of letters. MicroRNAs are small RNA molecules that help regulate which genes are active in a cell.

Mapping microRNA recipes in a genome can be a time-consuming process.

Fromm and Sinan Ugur Umu at the University of Oslo, as well as other researchers in Oslo, Tromsø and internationally, have created a tool that can uncover recipes for microRNAs in genomes.

RNA

RNA molecules are found in cells, where they have important tasks in protein production and gene regulation. RNA is made of almost the same building blocks as DNA.

DNA has two strands joined by rungs, like a ladder, whereas RNA has only one strand, like half a ladder.


Source: The Great Norwegian Encyclopedia

Trained on database

The tool is called MirMachine and is a program based on machine learning.

In an article in Cell Genomics, the researchers present the tool and the results of testing it on 100 mammalian genomes.

They trained the tool on a database containing genomes from 75 animal species whose microRNAs have been mapped in detail.

The database has taken almost ten years to create and fine-tune, says Fromm, who is an expert on microRNA.

The machine learning tool has learned to recognize what microRNA is. It can see patterns that would be difficult for humans to pick up.

“To demonstrate the power of our tool, we’ve successfully used MirMachine on a number of genomes from extinct organisms such as the mammoth and the giant genomes of salamanders and lungfish, where genome annotation is particularly difficult,” says Fromm.

Comparing the machine with manual work

How can the researchers know that the ‘machine’ is answering correctly, and not just making things up?

“Of course, it’s difficult to know 100 per cent,” says Fromm.

“But when the algorithm was completed, we tested it on all 75 organisms that we have in the database. We said ‘we know nothing about this organism, try to find the microRNA’.”

“Afterwards, we compared what we’d done manually for several years and what MirMachine did over a long weekend. It was almost impossible to find differences between them. We found some, but they were hard to see.”

Hope it will be used

Fromm hopes that many researchers will use the tool. He believes it would be particularly useful for researchers who are mapping the entire genomes of new organisms.

Several such whole-genome sequencing projects are under way today, he says. They include the Earth BioGenome Project and the Darwin Tree of Life.

Pål Sætrom is a professor at NTNU who studies the role of non-coding RNA in gene regulation and disease. He has reviewed the new study.

“I think the work is useful, because it solves some of the challenges associated with identifying and annotating which microRNA genes are found in various species that have available genome sequences,” Sætrom writes in an e-mail.


Better alternative

The main limitations of the method are that it can currently only be used on animals and can only identify evolutionarily conserved microRNA genes, says Sætrom.

‘Evolutionarily conserved’ means that the genes have been preserved through evolution, so that they are shared by many related species alive today.

By contrast, microRNA genes that are found in only one particular species are an example of genes that exist in animals today but are not evolutionarily conserved, says Sætrom.

“This means that the method is well suited to automatically mapping and annotating known, evolutionarily conserved microRNA genes in newly sequenced animal species.”

“However, the method could miss microRNA genes that are specific to a species or a subgroup of species. In this case, the method depends on such microRNA genes being found using other methods, like sequencing and bioinformatics analysis of small RNAs,” Sætrom says.

He adds that tools similar to the one the researchers have created already exist.

“It’s not like this work is currently done by hand, one microRNA gene at a time. But MirMachine is an improvement over these alternative tools,” he says.

Octopus study

Fromm says that one of the interesting questions to investigate is whether the number of microRNA genes is related to an organism’s complexity and intelligence.

As mentioned, the number of protein-coding genes is not what makes the difference here.

Last year, Fromm and colleagues carried out a study of the octopus. In terms of the number of protein-coding genes and genome size, the octopus fitted the pattern described above. But the octopus is a very intelligent animal.

“Then it was a surprise to find that the octopus has much more microRNA than birds, fish and reptiles, and almost as much as mammals and humans,” says Fromm.

So microRNA may be related to the development of intelligence.

Prevents proteins from being made

MicroRNA helps regulate genes and is the most recently discovered type of gene regulator. It was discovered 30 years ago, says Fromm.

He explains how the small RNA molecules help to control which genes should be active in a cell.

Every cell in the body has the entire DNA in it, the entire ‘cookbook’. But each cell only reads some of the recipes, so eye cells won’t suddenly start making hair.

When a gene is read, a working copy is made in the form of RNA, called messenger RNA (mRNA). The mRNA is transported to the ribosomes, where a protein is produced from the recipe.

MicroRNA can put an end to this.

“It finds a complementary sequence on the mRNA and inserts itself, preventing a protein from being made,” says Fromm.

This acts as a safeguard against making proteins that absolutely should not be made, says Fromm.
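For readers curious about what ‘finds a complementary sequence’ means in practice, here is a minimal Python sketch. It is a simplification and not how MirMachine works: it assumes the common textbook rule that a microRNA’s ‘seed’ (its letters 2 to 8) pairs with a perfectly matching stretch of the mRNA, using the RNA pairing rules A with U and G with C. The function names and the short mRNA sequence are hypothetical, made up for illustration; the microRNA is human let-7a.

```python
# Simplified illustration only (assumption): real microRNA targeting also
# involves partial pairing and other sequence context.

# RNA base-pairing rules: A pairs with U, G pairs with C.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the RNA sequence (read 5' to 3') that base-pairs with `rna`."""
    return "".join(PAIR[base] for base in reversed(rna.upper()))


def find_seed_sites(mirna: str, mrna: str) -> list[int]:
    """Return 0-based positions in `mrna` that perfectly match the reverse
    complement of the microRNA seed (nucleotides 2 to 8, an assumed rule)."""
    seed = mirna.upper()[1:8]          # letters 2-8 of the microRNA
    site = reverse_complement(seed)    # what the matching mRNA stretch looks like
    mrna = mrna.upper()
    return [
        i
        for i in range(len(mrna) - len(site) + 1)
        if mrna[i:i + len(site)] == site
    ]


if __name__ == "__main__":
    let7a = "UGAGGUAGUAGGUUGUAUAGUU"   # human let-7a microRNA
    toy_mrna = "AAACUACCUCAAA"          # hypothetical mRNA fragment
    print(find_seed_sites(let7a, toy_mrna))  # → [3]
```

Here the let-7a seed GAGGUAG pairs with the mRNA stretch CUACCUC, which the sketch finds at position 3 of the toy sequence. In real cells such sites usually lie in the untranslated tail end of the mRNA, and pairing outside the seed is often imperfect.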

Relevant for cancer research

Fromm has previously researched microRNA in cancer cells and found that they have less microRNA than normal cells.

“That is part of the explanation for why cancer cells can become different things that normal cells can’t,” he says.

“MicroRNA is really important for maintaining a stable cell type that does its job and nothing else,” says Fromm.

Humans have approximately 550–600 microRNA genes.

Reference:

Sinan Ugur Umu, Bastian Fromm et al.: Accurate microRNA annotation of animal genomes using trained covariance models of curated microRNA complements in MirMachine. Cell Genomics, 2023.

———

Read the Norwegian version of this article at forskning.no
