As many of you know, a good percentage of our customers are doing microbiome research. Microbiomes, those localized communities of microorganisms that exist symbiotically with their immediate environment, can be found virtually anywhere: inside human colons, around plant roots, inside coral reefs and even within ant colonies. These little microbial microcosms are a hot topic right now. Recent studies have implicated them in the wellbeing of people, animals, plants and entire oceans. Variations in microbial numbers and diversity within animals have been linked to auto-immune disease, obesity, acne, tooth decay, pregnancy and even brain chemistry. Plant studies have shown that symbiotic organisms can impart resistance to drought, increase nutrient absorption and help prevent attack by pathogens.


Today, we will discuss what happens after nucleic acids are extracted using QIAGEN’s sample prep kits. How are these conglomerations of microbial DNA and RNA converted into useful, quantifiable information?


Amplicon and Whole Genome Sequencing

First and foremost, to understand how microbiomes might be important, we need to know what microbes are actually in there. Not too long ago, the only reasonable way to do this was to culture the microbes and grow up enough of them that cell morphology, staining, antibodies, protein analysis or something similar could be used to identify the species. There are two major problems with this method. First, you lose all information about the relative concentrations of the microbes that were in there to begin with. Second, many microbes will not grow under typical laboratory conditions, so you would miss their existence entirely. With the invention of faster and cheaper DNA sequencing, the culturing step can be eliminated. Microbial DNA and RNA can be extracted and isolated directly from a sample of soil, water, biofilm or stool, among other interesting bio-substances. Once the microbial nucleic acids are isolated, they can be sequenced to determine which species were present in the sample.


There are two common sequencing methods currently being used for this purpose. The first, referred to as targeted amplicon sequencing, requires knowing a little something about the community of microbes that might be in the sample to begin with. Scientists make use of the gene for the 16S ribosomal RNA. (1) It is one of the most highly conserved genes in bacteria and archaea, but it also contains a number of hypervariable regions that have diverged over time. By selecting primers that target the conserved regions flanking a variable region, PCR can be used to amplify that divergent section of the microbial genomic DNA, and the sequence differences within it make it possible to identify particular microbes. But while the 16S ribosomal RNA gene differs at least slightly between most species, no single hypervariable region can distinguish all bacteria. So scientists do the best they can by selecting the sets of primers that will characterize the most common bacteria in their sample. These are often referred to as “universal” primers, but in reality no primers are 100% universal. This is one drawback of the method.
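The primer idea can be sketched in a few lines of Python. This is only a toy illustration, not a real PCR simulation: the primer sequences, the two "species" and the lookup table are all made up for the example, and real 16S primers are longer and often contain degenerate bases.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def extract_amplicon(genome: str, fwd_primer: str, rev_primer: str):
    """Return the region between the two primer sites, or None if a site is missing.

    The reverse primer anneals to the opposite strand, so on the forward
    strand we look for its reverse complement.
    """
    start = genome.find(fwd_primer)
    if start == -1:
        return None
    region_start = start + len(fwd_primer)
    end = genome.find(reverse_complement(rev_primer), region_start)
    if end == -1:
        return None
    return genome[region_start:end]


# Hypothetical primers and genomes, invented for this sketch.
FWD_PRIMER = "ACGTACGT"
REV_PRIMER = "GGATCCTA"
REV_SITE = reverse_complement(REV_PRIMER)  # how the site reads on the forward strand

species_a = "CC" + FWD_PRIMER + "AAATTT" + REV_SITE + "GG"
species_b = "CC" + FWD_PRIMER + "GGGCCC" + REV_SITE + "GG"
no_primer_site = "CCTTTTTTTTAAATTT" + REV_SITE + "GG"  # forward site has diverged

# Hypothetical lookup table from variable region to species name.
KNOWN_REGIONS = {"AAATTT": "species A", "GGGCCC": "species B"}

region_a = extract_amplicon(species_a, FWD_PRIMER, REV_PRIMER)
region_b = extract_amplicon(species_b, FWD_PRIMER, REV_PRIMER)
missed = extract_amplicon(no_primer_site, FWD_PRIMER, REV_PRIMER)

print(KNOWN_REGIONS.get(region_a), KNOWN_REGIONS.get(region_b), missed)
```

The third genome shows the drawback mentioned above: if a microbe's primer site has diverged, the primers never bind and that microbe drops out of the results.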


In the second method, whole genome sequencing (often called shotgun metagenomic sequencing), DNA or RNA is isolated directly from environmental samples and sequenced without first amplifying a target gene by PCR. Fragments sampled from across the genomes or transcriptomes are used to create libraries, which are then sequenced in a pool. The resulting reads are assembled into longer sequences where possible and compared to databases of known microbial sequences, like MBGD or KEGG, using sequence comparison programs like BLAST. A disadvantage of this method is that not all microbial species have been sequenced or accurately cataloged in these databases. So again, some microbes can be missed.
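To make the database comparison step concrete, here is a minimal sketch that assigns reads to reference sequences by counting shared k-mers (short subsequences of length k). This is a deliberately simple stand-in, not how BLAST works internally, and the reference sequences, reads and function names are invented for the example.

```python
from collections import Counter


def kmers(seq: str, k: int) -> set:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify_read(read: str, references: dict, k: int = 4):
    """Return the reference sharing the most k-mers with the read.

    Returns None when the read shares no k-mer with any reference,
    which is how an uncataloged microbe would look to this toy classifier.
    """
    read_kmers = kmers(read, k)
    best_name, best_shared = None, 0
    for name, ref_seq in references.items():
        shared = len(read_kmers & kmers(ref_seq, k))
        if shared > best_shared:
            best_name, best_shared = name, shared
    return best_name


# Hypothetical reference "database" and sequencing reads.
REFERENCES = {
    "species_A": "ATGGCGTACGTTAGCCGATA",
    "species_B": "TTTTCCCCAAAAGGGGTTCA",
}
reads = ["GCGTACGTTAG", "CCCCAAAAGG", "ACACACACAC", "TACGTTAGCC"]

counts = Counter(classify_read(read, REFERENCES) for read in reads)
print(counts)
```

The read that matches nothing is counted under `None`, which mirrors the limitation described above: a microbe that is missing from the database cannot be identified this way.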


Analysis of Similarity

Once microbial populations are identified through nucleic acid isolation, sequencing, and analysis, scientists need to turn the results into quantifiable information about what differences between populations might mean. To achieve this goal, studies compare the microbial populations of very well-defined groups. Sometimes the groups are defined by their state of health, for example, people with diabetes compared to those without. They could also be defined by where the samples were collected, like soil from the Antarctic versus soil from the Russian tundra.


Scientists compare the microbial population data from these well-defined groups and do what’s called an analysis of similarity (ANOSIM) to determine whether samples from different groups differ from one another more than samples within the same group do, beyond what would be expected by chance. This type of analysis can get quite tricky and, in research, it’s not just applied to microbial populations. But basically, one first measures how dissimilar every pair of samples is and ranks those dissimilarities. From the ranks, a statistic called the R value is computed, which compares the average dissimilarity between groups to the average dissimilarity within groups. To judge whether the observed R could have arisen by pure chance, the group labels are shuffled at random many times and R is recalculated each time; these shuffled results are then compared to the actual data. When applied to microbial communities, the R value, a number between -1 and +1, indicates how well separated the groups are. A value near +1 indicates that samples within a group are much more similar to each other than to samples from the other group, that is, the groups harbor distinct microbial communities. A value near zero indicates that the grouping tells you nothing: samples are no more alike within groups than between them. Negative values are uncommon and imply that samples are more dissimilar within groups than between groups, which can point to a problem with how the groups were defined or sampled. Whether any of these values is statistically significant is decided by the permutation test, not by the size of R alone.
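For readers who like to see the arithmetic, here is a minimal pure-Python sketch of an ANOSIM calculation. It assumes the standard rank-based definition, R = (mean between-group rank - mean within-group rank) / (M / 2), where M is the number of sample pairs, and it uses Bray-Curtis dissimilarity as the distance measure. The abundance counts and function names are invented for the example; established statistics packages should be preferred for real data.

```python
import itertools
import random


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors (0 = identical)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (sum(a) + sum(b))


def average_ranks(values):
    """Rank values starting at 1, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def anosim_r(samples, labels):
    """Compute R from ranked pairwise dissimilarities and group labels."""
    pairs = list(itertools.combinations(range(len(samples)), 2))
    ranks = average_ranks([bray_curtis(samples[i], samples[j]) for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if labels[i] == labels[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if labels[i] != labels[j]]
    mean_between = sum(between) / len(between)
    mean_within = sum(within) / len(within)
    return (mean_between - mean_within) / (len(pairs) / 2)


def anosim(samples, labels, permutations=999, seed=0):
    """Return (R, p), where p is the share of label shuffles with R >= observed."""
    rng = random.Random(seed)
    observed = anosim_r(samples, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if anosim_r(samples, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical counts of two bacterial species in six samples.
samples = [[90, 10], [85, 15], [95, 5], [10, 90], [20, 80], [5, 95]]
labels = ["healthy", "healthy", "healthy", "diabetic", "diabetic", "diabetic"]

r_value, p_value = anosim(samples, labels)
print(r_value, p_value)
```

In this toy data set every between-group pair is more dissimilar than every within-group pair, so R comes out at 1. With only three samples per group, though, there are so few distinct ways to shuffle the labels that the p-value cannot get much below 0.1, a reminder that a large R on a small study is not automatically significant.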


Here is a simple example to illustrate. Imagine drawing two large, identically sized circles in the sand, circle A and circle B. You now have two different populations of bean bags: 20 red and 20 green. Let’s say these represent two species of bacteria. Now stand equally distant from the two circles and throw the red and green bean bags into the circles in the sand. You should end up with something close to 10 red and 10 green within each circle. This is your random population, because there is nothing biasing the two populations either way. Now do something different. Use only your left hand to throw the green bean bags and only your right hand to throw the red bean bags into the two circles. Now compare the new population of bean bags in the two circles to the “random” one you did before. If you see a difference, then you’ve discovered something. Perhaps you don’t toss very well with your left hand. If you don’t see any difference between the two sets of circles, well, then you’ve discovered something too: there is nothing particularly significant about your throwing method. Scientists have a fancy name for this starting assumption. It’s called the “null hypothesis,” meaning there is no relationship between the two groups of measurements; that is, right- and left-handed throwing have no effect. If we were talking about two bacterial populations, perhaps we’d be testing whether a difference in temperature affected the similarity of two microbial groups, or whether being in close proximity affected the similarity.
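The bean bag experiment is easy to simulate. The sketch below is an illustration only: it assumes each bag lands in circle A or circle B (never outside both), it measures "difference" as the gap between the number of red and green bags in circle A, and the observed counts at the end are made up.

```python
import random


def toss(n_bags, p_circle_a, rng):
    """Toss n_bags; each lands in circle A with probability p_circle_a, else in B.

    Returns how many bags ended up in circle A.
    """
    return sum(rng.random() < p_circle_a for _ in range(n_bags))


def imbalance(red_in_a, green_in_a):
    """How differently the two colours split between the circles."""
    return abs(red_in_a - green_in_a)


def null_distribution(n_bags=20, trials=10_000, seed=1):
    """Imbalances seen when neither colour is biased (the null hypothesis)."""
    rng = random.Random(seed)
    return [
        imbalance(toss(n_bags, 0.5, rng), toss(n_bags, 0.5, rng))
        for _ in range(trials)
    ]


def empirical_p(observed, null_values):
    """Share of unbiased experiments at least as lopsided as the observed one."""
    hits = sum(value >= observed for value in null_values)
    return (hits + 1) / (len(null_values) + 1)


null_values = null_distribution()

# Hypothetical outcome: right hand puts 10 of 20 red bags in circle A,
# left hand manages only 3 of 20 green bags.
observed = imbalance(10, 3)
p_value = empirical_p(observed, null_values)
print(observed, p_value)
```

A small p-value says that unbiased tossing rarely produces a split this lopsided, so the null hypothesis (handedness has no effect) looks doubtful. A large p-value says the result is what random tossing would often give you anyway.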


Check out our newest products for your library prep needs:

  • QIAseq 1-Step Amplicon Library Kit - The QIAseq 1-Step Amplicon Library Kit combines end repair and ligation into a single 30-minute room temperature incubation for the simplest, easiest way to prepare your amplicons for Illumina sequencing.


  • QIAseq FX DNA Library Kit - The QIAseq FX DNA Library Kit takes you from 1 ng – 1 μg genomic DNA to sequencer-ready, whole genome libraries in just 2.5 hours.


  • QIAseq Ultralow Input Library Kit - The QIAseq Ultralow Input Library Kit enables the generation of high-quality libraries starting from just 10 pg – 100 ng fragmented DNA and offers a robust solution for a wide range of research applications.



 (1) Weisburg WG, Barns SM, Pelletier DA, Lane DJ (January 1991). “16S ribosomal DNA amplification for phylogenetic study”. J Bacteriol. 173 (2): 697–703.