Glossary

The Barcode Index Number (BIN) system combines a pre-checked registry of barcodes with an online database of specimens to identify specimens and support their taxonomic placement. Using graph-theory algorithms, specimen sequences are clustered into distinct OTUs (operational taxonomic units) that resemble biological species as closely as possible. Each cluster is given a unique number, the BIN, which can then be linked to other studies as well as to other information in BOLD or other systems. BINs are not permanently stable: they can merge as further sequence data are added.
Bioinformatics is the use of computational and statistical tools to process and interpret biological data. It typically relies on mathematical models that describe the scenario in which the data were collected, how the data can be interpreted and which factors may have affected the outcome of the experiment. Various software tools and pipelines have been developed that apply complex algorithms to analyze and assemble data, mainly from molecular studies such as high-throughput sequencing.
BLAST (Basic Local Alignment Search Tool) is a fast bioinformatic method, which can be run locally or online, for finding similarities between a query DNA (or protein) sequence and other sequences registered in a database (e.g. NCBI GenBank) and for aligning them. The BLAST algorithms calculate the statistical significance of each retrieved hit to determine whether the alignment could have arisen by chance or reflects a genuine biological relationship: BLAST reports an expectation value (e-value) for each aligned sequence pair, and if the e-value is extremely small (e.g. from 10^-10 to 10^-200) the alignment is pertinent and unlikely to be random.
BOLD (Barcode of Life Data System) is a database and workbench for DNA barcode data. It also includes Barcode Index Number (BIN) information for the deposited barcodes; each BIN consists of similar barcode sequences clustered into a unit that closely approximates a biological species. BOLD also stores further metadata, e.g. specimen and sequence records (primers, electropherograms, images and sequences) and collection information (collector, habitat, locality, GPS coordinates, date).
A clade or monophyletic group is a group of organisms that comprises a common ancestor and all of its lineal descendants (e.g. the anthropoid apes and humans) and thus represents a single branch in the “tree of life”.
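The e-value described above can be sketched in its textbook (Karlin-Altschul) form, E = K * m * n * exp(-lambda * S), where S is the raw alignment score and m and n are the query and database lengths. This is a minimal illustration, not BLAST's actual implementation: K and lambda depend on the scoring system, and the values used below, like the function names and the cutoff, are hypothetical placeholders.

```python
import math

def expect_value(raw_score, query_len, db_len, k, lam):
    """Karlin-Altschul estimate: E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)

def unlikely_random(e_value, cutoff=1e-10):
    """True if the e-value is at or below the chosen cutoff (assumed 1e-10)."""
    return e_value <= cutoff

# Hypothetical parameters, for illustration only.
e = expect_value(raw_score=200, query_len=650, db_len=1_000_000, k=0.1, lam=0.3)
print(e, unlikely_random(e))
```

The sketch shows why the e-value grows with database size and shrinks exponentially with the alignment score.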
Clustering is an algorithmic method that attempts to find patterns in a dataset and then groups these patterns into smaller partitions, e.g. based on similarity (OTU clustering, where barcode sequences are grouped according to similarity). Examples are K-Means clustering, Mean-Shift clustering, hierarchical clustering, OPTICS and DBSCAN clustering algorithms.
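As an illustration of similarity-based OTU clustering, the following is a minimal greedy sketch. It assumes the sequences are already aligned to equal length, uses simple per-position identity, and takes a 97 % threshold as an illustrative default; the function names are hypothetical and dedicated clustering tools work differently in detail.

```python
def identity(a, b):
    """Fraction of matching positions between two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_otu_clustering(sequences, threshold=0.97):
    """Assign each sequence to the first cluster whose centroid it matches.

    The first member of each cluster acts as its centroid; a sequence that
    matches no existing centroid starts a new cluster.
    """
    clusters = []
    for seq in sequences:
        for cluster in clusters:
            if identity(seq, cluster[0]) >= threshold:
                cluster.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters

barcodes = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC"]
print(greedy_otu_clustering(barcodes, threshold=0.9))
```

Because the result depends on input order, such greedy approaches usually process sequences sorted by abundance or length.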
Community DNA is the whole DNA present in a community; it therefore represents the whole biological diversity within a sample. Community DNA profiles are used in forensic biology and soil investigations because, unlike standard culture techniques, they can reveal the total microbiome in a sample.
Droplet-digital PCR (ddPCR) is a quantitative PCR method in which the PCR reaction is partitioned into droplets and amplification is detected with fluorescent probes. This so-called “third-generation” PCR detects amplification in each of the thousands of droplets, allowing direct quantification of the target DNA without a standard curve from a reference. ddPCR is generally more precise than qPCR: it can detect differences as low as 1.25-fold, whereas qPCR detects differences of 2-fold and above.
DNA barcoding is a system for species identification that uses a short, standardized genetic region as a “barcode”, in much the same way that supermarket scanners use Universal Product Codes (UPCs) to distinguish commercial products.
The prefix ‘meta’ refers to the simultaneous collection of barcode sequences from multiple specimens (and species/taxa). DNA metabarcoding is the application of DNA barcoding methods to complex environmental samples (such as water, soil, feces or sediments): short regions of one or a few informative genes are amplified and sequenced with HTS (high-throughput sequencing) to identify the taxa present. It requires competence in PCR, bioinformatics and biostatistics to analyze the sequencing results.
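The standard-curve-free quantification mentioned above is commonly based on Poisson statistics: if a fraction p of droplets is positive, the mean number of target copies per droplet is -ln(1 - p). The sketch below assumes this model; the function names are hypothetical and the droplet volume is instrument-specific, so it must be supplied by the user.

```python
import math

def copies_per_droplet(positive, total):
    """Mean target copies per droplet from the fraction of positive droplets."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total")
    return -math.log(1 - positive / total)

def copies_per_microliter(positive, total, droplet_volume_nl):
    """Target concentration in the reaction, in copies per microliter."""
    return copies_per_droplet(positive, total) / (droplet_volume_nl * 1e-3)

# Hypothetical run: 5000 of 20000 droplets positive, assumed 0.85 nL droplets.
print(copies_per_microliter(5000, 20000, droplet_volume_nl=0.85))
```

The Poisson correction accounts for droplets that contain more than one target copy, which is why the estimate fails when every droplet is positive.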
Environmental DNA (eDNA) refers to DNA that can be extracted from environmental samples (such as soil, water or air), without first isolating any target organism. This DNA originates from cells of organisms or their waste products (e.g. saliva, urine, feces). In certain cases, DNA extracted from organismal tissue can be seen as eDNA (e.g. for parasite or bacterial communities).
Free DNA in ecology refers to free DNA molecules that can be found in various ecosystems, terrestrial or aquatic. Such molecules are extracellular DNA freely existing outside organisms, either bound to the sediments or carried by water over large distances. Also the term “ExDNA” (extracellular DNA) is used in this context. Free DNA can originate from organisms of any taxon and consequently can be used in detection and monitoring studies. However, when referring to microorganisms (such as bacteria, microalgae etc.), it is not straightforward to separate free DNA from cellular DNA.
A haplotype (haploid genotype) is a group of alleles in an organism that are inherited together from a single parent. From the perspective of diploid organisms such as humans, haplotypes are non-recombining regions of one chromosome. The term also refers to haploid markers, e.g. mitochondrial or chloroplast markers, or to a set of linked single-nucleotide polymorphisms (SNPs) that tend to always occur together.
An HPC (high-performance computing) system or supercomputer is a particularly powerful computer with a high level of computational capacity, which uses parallel processing techniques to solve complex computational problems.
High-Throughput Sequencing (HTS) is a set of different sequencing technologies that can produce thousands to millions of DNA sequences from many samples in a short amount of time. They replace older, slower and more expensive (in terms of cost per base pair) technologies such as Sanger sequencing. An older term for HTS is Next-Generation Sequencing (NGS).
A metagenome is the entirety of the genetic material of the different organisms present in an environmental sample.
Metagenomics is the analysis of genetic material gathered from the environment, as opposed to analysing single specimens.
Molecular species delimitation is DNA-based species identification using different methods for defining OTUs that should resemble biological species. Tree-based methods such as GMYC (generalized mixed Yule-coalescent) and PTP (Poisson tree processes) are often used for single-locus analyses, while BPP (Bayesian Phylogenetics and Phylogeography) focuses on multiple loci. Distance-based methods include ABGD (Automatic Barcode Gap Discovery) and BINs (Barcode Index Numbers).
A MOTU (molecular operational taxonomic unit) is a group of similar organisms defined by sequence data comparisons. See also “OTU (Operational Taxonomic Unit)”.
Multiplex polymerase chain reaction (PCR) is used to amplify multiple targets in a single PCR. In a multiplex assay, more than one target sequence is amplified by using multiple primer pairs in one reaction mixture. As an extension of standard PCR, this technique can save considerable time and effort in the laboratory without compromising the utility of the experiment. There are two types of multiplex PCR: (1) single-template PCR, which uses a single template (e.g. genomic DNA) together with several pairs of forward and reverse primers to amplify specific regions within that template; and (2) multiple-template PCR, which uses multiple templates and several primer sets in the same reaction tube. The presence of multiple primers may lead to cross-hybridization between primers and to mis-priming on other templates. Multiplex PCR has applications in a range of studies, including pathogen identification, high-throughput SNP genotyping, mutation analysis, gene deletion analysis, template quantification, linkage analysis, RNA detection and forensic studies.
Next-generation sequencing (NGS) is a DNA sequencing technology that was introduced in 2005 and largely replaced the previously used Sanger (chain termination) method. NGS revolutionized genomic research by enabling the production of large volumes of sequence data in parallel. The term NGS is now often replaced by High-Throughput Sequencing.
PCR (polymerase chain reaction) was developed by Kary Mullis in 1983. During PCR, a selected DNA fragment (DNA sequence, DNA barcode) is amplified so that an appropriate quantity of this DNA sequence is available for further analysis (e.g. gel electrophoresis, sequencing). The whole process is based on thermal cycling, which enables the different steps of the reaction to take place in order.
PCR requires:
● the target DNA fragment
● a pair of short, single-stranded oligonucleotides, called primers (forward and reverse primers), that are complementary to the two ends of the target DNA sequence
● a DNA polymerase, the enzyme that carries out the DNA replication, using the primers as starting points and the target DNA as the template
● free nucleotides (dNTPs) as the building blocks for the new copies of the target DNA fragment
● a reaction medium (buffer or water) containing magnesium ions
Steps of PCR:
● denaturation: at ~96°C the two strands of DNA separate from each other
● annealing: at a lower temperature (50–65°C, depending on primer GC content and length) the primers bind to their complementary sequences on the single-stranded DNA
● extension: usually at 67–72°C (depending on the polymerase) the DNA polymerase synthesizes the new DNA strand, starting from the primer and using the available free nucleotides
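Because each cycle of denaturation, annealing and extension can at most double the number of target copies, amplification is exponential. The following sketch assumes an idealized model with a constant per-cycle efficiency and no plateau phase; the function name and the example values are illustrative only.

```python
def amplicon_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized copy number after a number of PCR cycles.

    efficiency is the fraction of templates copied per cycle
    (1.0 means perfect doubling); real reactions eventually plateau.
    """
    if not 0 <= efficiency <= 1:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1 + efficiency) ** cycles

# One template molecule after 30 ideal cycles, and at 90 % efficiency.
print(amplicon_copies(1, 30))
print(amplicon_copies(1, 30, efficiency=0.9))
```

Even a modest drop in efficiency reduces the final yield considerably over 30 cycles, which is one reason reaction conditions are optimized per primer pair.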
Phylogenetics is the study of evolutionary relatedness among groups of organisms. Classic phylogenetics deals mainly with physical or morphological features. More recent advances have led to the development of modern molecular phylogenetics, which uses DNA or amino acid sequence data to infer these relationships by comparing homologous sequences.
Phylogenomics can be considered as a part of phylogenetics, referring to the analysis of genome data – large sets of sequence data – to infer evolutionary relationships between the sequences and the organisms they are derived from based on the degree of genomic similarity. Using large data sets overcomes the problem of stochastic (or sampling) errors encountered when reconstructing phylogenies from single or few nucleotide or amino acid sequences.
PlutoF is a web-based platform for storing and managing biodiversity data, which provides database and computing services for taxonomic, ecological and genetic research.
A primer is a short single strand of RNA or DNA of approximately 20 nucleotides. After the double-stranded DNA has been separated, the primer binds to its complementary sequence and thereby defines the starting point for DNA synthesis. The DNA polymerase needs this short stretch of bound bases to start the replication reaction along the opposite DNA strand.
Quantitative polymerase chain reaction (qPCR) or real-time PCR is a molecular technique that follows the PCR workflow while monitoring the amount of amplified DNA at each cycle. qPCR can determine the absolute or relative quantity of a specific DNA sequence.
Sequence Read Archive (SRA) makes biological sequence data available to all researchers. It is one of the 35 NCBI (National Center for Biotechnology Information) databases. It stores raw HTS (high-throughput sequencing) data and alignment information. It does not contain taxonomic information. SRA accepts data from all kinds of sequencing projects, including clinically important studies that involve human subjects or their metagenomes.
Shotgun sequencing is a method for determining the sequence of long DNA strands, based on randomly breaking up DNA into numerous small fragments that can be sequenced separately. Sequenced fragments are then reassembled into the complete sequence by finding overlapping regions. This was one of the early technologies that enabled full genome sequencing.
Species distribution models (SDMs) use modeling software to estimate the likely occurrence (or likely absence) of taxa over geographic space and time. They combine presence/absence data of taxa with environmental data, i.e. they compare the conditions at an unknown site with the conditions at sites where the taxa are known to occur. Species distribution modeling is also known under other names, including climate envelope modeling, habitat modeling and (environmental or ecological) niche modeling.
Tags are short nucleotide stretches used to individually label amplicon libraries. Typically, one tag combination is used for one sample. Tag-switching, also known as tag-jumping, refers to the phenomenon of tags occurring in combinations that were not used, causing potential errors that you need to consider when interpreting your NGS results. Tag-switching can occur if tags are added in the final PCR during library building and the samples are then pooled prior to sequencing. After the tagging PCR, individual samples might contain single-stranded amplicons, and when the samples are pooled these may form heteroduplexes, thereby mixing different tags and making tag-switching likely to occur. Tag-switching can also result from simple cross-contamination or arise during the sequencing process, which is why longer tags are advisable. The likelihood of tag-switching is typically about 1 per 1000 reads. You can identify where tag-switching has occurred by including tagging PCR replicates and then account for it in your data analysis.
Alpha, beta and gamma diversity describe species diversity at different scales in ecological communities. Alpha diversity deals with the smallest scale, the local species diversity, and is measured as the mean species diversity within local habitats. The other terms refer to larger ecological units: beta diversity describes how species composition changes between habitats, and gamma diversity is the total species diversity in the landscape.
An assemblage is a group of taxonomically related organisms from different species that share similar habitat requirements. It is also considered a subset of the total species that coexist in a community and occur together in space and time. An assemblage differs from a community: the latter describes the total set of organisms that share the same ecological habitat, without any distinction regarding their taxonomic relationship.
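One simple way to account for tag-switching in data analysis is to count reads that carry tag combinations never assigned to a sample. The sketch below assumes each read has already been reduced to its (forward tag, reverse tag) pair; the function names and tag labels are hypothetical.

```python
from collections import Counter

def split_by_tag_validity(read_tags, used_combinations):
    """Count reads per tag pair, split into used and unused combinations."""
    used = set(used_combinations)
    counts = Counter(read_tags)
    valid = {combo: n for combo, n in counts.items() if combo in used}
    switched = {combo: n for combo, n in counts.items() if combo not in used}
    return valid, switched

def unused_combination_rate(read_tags, used_combinations):
    """Fraction of reads carrying a tag combination that was never used."""
    _, switched = split_by_tag_validity(read_tags, used_combinations)
    total = len(read_tags)
    return sum(switched.values()) / total if total else 0.0

reads = [("F1", "R1"), ("F1", "R1"), ("F2", "R2"), ("F1", "R2")]
used = [("F1", "R1"), ("F2", "R2")]
print(split_by_tag_validity(reads, used))
print(unused_combination_rate(reads, used))
```

This only reveals switches into unused combinations; a switch that happens to produce a combination assigned to another sample remains invisible, so the rate is a lower bound.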
The term benthic is used when relating to the bottom of a body of water, for example a lake bottom, from the shore to the deep parts. It is also used when relating to, or occurring in the depths of the ocean.
Benthos is a community of organisms that inhabit the benthic zone of freshwater and marine systems.
In ecology, beta diversity (β-diversity or true beta diversity) is the ratio between regional and local species diversity. It indicates how much communities differ from each other in species composition. Together with alpha diversity (α-diversity), the mean species diversity at the habitat level, it is used to estimate gamma diversity (γ-diversity), the total species diversity in a landscape.
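The ratio described above can be sketched with species richness as the diversity measure, which is an assumption made here for simplicity (other diversity measures can be used instead). Each site is represented as a set of species names; the function names and example data are hypothetical.

```python
def alpha_diversity(sites):
    """Mean species richness across sites (local diversity)."""
    return sum(len(site) for site in sites) / len(sites)

def gamma_diversity(sites):
    """Total species richness across all sites (regional diversity)."""
    return len(set().union(*sites))

def beta_diversity(sites):
    """Ratio of regional to mean local richness (gamma / alpha)."""
    return gamma_diversity(sites) / alpha_diversity(sites)

sites = [{"Baetis", "Gammarus"}, {"Gammarus", "Hydropsyche"}]
print(alpha_diversity(sites), gamma_diversity(sites), beta_diversity(sites))
```

A value of 1 means all sites share the same species; the value rises towards the number of sites as the sites share fewer species.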
Bioassessment refers to a process of evaluating the ecological integrity of both terrestrial and aquatic environments by measuring characteristics of organisms or assemblages of organisms that inhabit those environments. For aquatic environments, bioassessment refers to the assessment of ecological integrity of a waterbody by measuring attributes of the assemblage of organisms inhabiting the waterbody. In conjunction with biological measurements, bioassessment of aquatic environments usually includes measurements of instream and riparian zone physical habitat constituents. Common assemblages of aquatic organisms used for bioassessment include fish, macroinvertebrates and algae.
A bioindicator is an organism restricted to one or a few habitat types, which potentially makes it a better ecological indicator of environmental change than a habitat generalist. Organisms of this kind are used to monitor environmental changes and to assess the impacts of disturbance on an ecosystem.
Biomonitoring is the practice of recording species diversity and abundance across different locations and times. The comparison of data with reference datasets is used to observe biological responses to influences and to assess the ecological condition and changes in the environment. The aim of biomonitoring is to assess the ecological status and to characterize the quality of biological environment.
Brackish water is a broad term for water whose salinity lies between that of fresh water and marine water; it often occurs in transitional areas where such waters mix. An estuary, where a river meets the sea, is the best-known example of brackish water.
Calcareous describes material composed mainly of calcium salts (calcium carbonate), or soil containing or derived from lime. The term is used in different scientific disciplines.
Conductivity is a measure of the ability of water to conduct electricity. It increases as the amount of dissolved minerals (ions) increases. These conductive ions originate from dissolved salts and inorganic materials such as alkalis, chlorides, sulfides and carbonate compounds.
Current is a movement in a body of water caused by major ocean circulation or tides, by waves along shorelines or by gravity-induced flow in rivers.
Community refers to all organisms, i.e. populations of two or more different species that interact in a common area, while assemblage refers to taxonomically related groups of species populations.
A cryptic species is a species whose diagnostic morphological characters are not easily perceived (or are absent) and that does not hybridize under normal conditions.
Detritivore is an animal that eats organic material (detritus) found in sediments, originating from dead and decaying plant and animal matter.
An ecosystem is a dynamic complex of the biotic and abiotic components of an environment, which are linked together through nutrient cycles and energy flows.
Environment is defined as the physical, chemical and biological components that surround an organism. The difference between environment and habitat is that a habitat is an environment that is specific to a certain organism or group of organisms and contains all the components required by those organisms.
EPT is an acronym for the collective reference to the insect orders Ephemeroptera, Plecoptera and Trichoptera.
Eutrophication is a consequence of nutrient overload in aquatic ecosystems usually noticed by the bloom of opportunistic green algae or cyanobacteria.
A false positive is a result indicating that a given condition is present when it is not. It is analogous to a type I error in statistics.
A false negative error is a type II error occurring when the absence or non-existence of something is falsely inferred when it is real or does exist (i.e. a conclusion based on false information).
Gamma diversity (γ-diversity) represents the overall species diversity of a range of habitats or communities within a region. It is determined by the number of species occupying individual habitats (alpha diversity, α) and by the rate of change in species composition across habitats or among communities (beta diversity, β). The term was introduced by R. H. Whittaker in 1960.
Habitat refers to all the environmental conditions, both biotic and abiotic, in which a species lives and reproduces. A habitat is part of the environment, which encompasses all biotic and abiotic components and within which several habitats exist.
Intercalibration is a procedure or state achieved by a group of researchers or laboratories engaged in collecting and analyzing a certain type of data (such as in a monitoring program) in which they produce and maintain compatible data outputs. Intercalibration overcomes the problems resulting from methodological differences (use of different field and laboratory protocols) and allows direct comparison of the results. If the sampling and analytical methods used by different researchers are not identical, results obtained from different protocols must be standardized (i.e. intercalibrated) to allow for meaningful comparison of the data.
Invasive alien species (IAS) are defined as non-indigenous species that, once introduced into an environment, successfully overcome the acclimatization period, establish a self-sustaining population and show a good capacity for expansion in the new environment. They cause ecological, economic or social damage and represent a major problem for local biodiversity.
Kick-sampling is a technique used to sample benthic invertebrates in a stream or a river, in which a net is held underwater and the substrate upstream of the net is disturbed by kicking for a predetermined period of time. In order to sample different habitats in the stream, it is important to move around the site during sampling.
Lentic is a term used to describe a body of standing water such as a lake, pond, seasonal pool, ditch or seep. Lentic systems are characterized by slow or non-existent movement of water. This makes the ecosystem stratified, because light and oxygen availability vary with water depth, so the biotic elements found in each layer are often different.
Lotic is a term used to describe continuously flowing water bodies such as rivers and streams, from rapid torrent to slow-moving waters. These environments are usually shallower than lentic ones (such as lakes and ponds), with water flow and temperature as key abiotic factors for the ecology. These environments are also greatly affected by seasonal precipitation and snow melting.
Macrohabitat is a habitat big enough to contain multiple environments within it and support multiple types of organisms.
Macroinvertebrates are macroscopic invertebrates that are large enough to be seen with the naked eye (>0.5 mm). For aquatic macroinvertebrates, at least one life stage is bound to water (e.g. streams and rivers, lakes, groundwater or marine environments).
The Marine Strategy Framework Directive, adopted in 2008, aims to protect the marine environment across Europe more effectively. The Commission produced a set of detailed criteria and a standard methodology to help achieve Good Environmental Status (GES) in the EU's marine waters by 2020 and to protect the resource base upon which marine-related economic and social activities depend. The directive requires the use of several indicators, including species diversity, seafloor integrity and food web structure, as well as non-indigenous and commercial species.
Meiofauna are organisms with intermediate size between the microbes and macrofauna. The term meio is a Greek term for “smaller”. Meiofauna are generally classified as protists and invertebrates between 50 and 1000 μm, although some researchers use a 500 μm upper size limit. Meiofauna occur in all aquatic ecosystems (marine to freshwater) and include representatives from two-thirds of the known animal phyla.
Microhabitat is a small or limited habitat differentiated from extensive adjacent habitat by distinct flora and fauna and environmental characteristics, such as substrate type, water velocity, light, temperature or depth.
Mock communities are constructed mixtures of individuals of known identity or taxonomic composition, used as positive controls or to perform pilot studies.
Multi-habitat sampling is a sampling technique that aims to sample all habitats encountered at an aquatic sampling site in proportion to their share of the site. A typical example in the context of the WFD (Water Framework Directive): 20 sampling units are taken, each of which represents 5 % substrate coverage of the bottom of the waterbody, and these are pooled into one multi-habitat sample.
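The proportional allocation in the example above (20 units, each standing for 5 % coverage) can be sketched as follows. The rounding rule used here (largest remainder) is an assumption chosen so that the units always sum to the total; actual field protocols may treat minor habitats differently. Habitat names and the function name are hypothetical.

```python
def allocate_sampling_units(coverage_percent, total_units=20):
    """Distribute sampling units across habitats in proportion to coverage.

    coverage_percent maps habitat name to estimated substrate coverage.
    Fractional quotas are resolved with the largest-remainder rule.
    """
    total = sum(coverage_percent.values())
    if total <= 0:
        raise ValueError("total coverage must be positive")
    quotas = {h: c * total_units / total for h, c in coverage_percent.items()}
    units = {h: int(q) for h, q in quotas.items()}
    leftover = total_units - sum(units.values())
    by_remainder = sorted(quotas, key=lambda h: quotas[h] - units[h], reverse=True)
    for habitat in by_remainder[:leftover]:
        units[habitat] += 1
    return units

print(allocate_sampling_units({"gravel": 50, "sand": 30, "macrophytes": 20}))
```

With coverage values that are multiples of 5 %, each habitat simply receives one unit per 5 % of the stream bottom it covers.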
The Nagoya Protocol on Access to Genetic Resources and the Fair and Equitable Sharing of Benefits Arising from their Utilization to the Convention on Biological Diversity, also known as the Nagoya Protocol on Access and Benefit Sharing (ABS), is a 2010 supplementary agreement to the 1992 Convention on Biological Diversity (CBD). Its aim is the implementation of one of the three objectives of the CBD: the fair and equitable sharing of benefits arising out of the utilization of genetic resources, thereby contributing to the conservation and sustainable use of biodiversity. However, there are concerns that the added bureaucracy and legislation will, overall, be damaging to the monitoring and collection of biodiversity, to conservation, to the international response to infectious diseases and to research.
Non-indigenous species (NIS), also called non-native, alien or exotic species, are species introduced intentionally or accidentally by anthropogenic disturbance outside their original, past or present, area of distribution, that might survive and subsequently reproduce. NIS might eventually become invasive alien species if they become sufficiently abundant, causing adverse effects on native plants and animals.
See “Non-indigenous”.
Operational taxonomic unit (OTU; also molecular operational taxonomic unit, MOTU) is a concept used to classify and group organisms based on similarities of their DNA sequence within the selected marker region of the genome.
Pelagic organisms are micro- and macroorganisms of flora and fauna that live in the water column of lakes, streams and oceans.
Phytobenthos refers to photosynthetic organisms (plants, algae and some prokaryotes) living at the bottom of aquatic environments.
Rare species are those that are represented by only a few individuals or restricted to particular habitats, to a level that is demonstrably lower than for the majority of other organisms of comparable taxonomic entities. The actual cut-off for rarity is a subjective decision.
Saprobic index is one of the biological indicators for water quality assessment. It was introduced by Pantle and Buck in 1955 and it is used to measure the impact of organic pollution. Saprobic indices are based on a selection of aquatic macroinvertebrate taxa or aquatic microorganisms that are proven to be either sensitive or tolerant to organic pollution. Several saprobic indices are currently used in Europe, e.g. the Austrian, Czech and German indices.
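In the form commonly attributed to Pantle and Buck, the saprobic index is an abundance-weighted mean of the saprobic values of the indicator taxa found at a site: S = sum(s_i * h_i) / sum(h_i). The sketch below assumes this basic form; national indices add further terms such as indicator weights. The saprobic values and abundance classes in the example are hypothetical.

```python
def saprobic_index(taxa):
    """Abundance-weighted mean saprobic value.

    taxa is a sequence of (saprobic_value, abundance) pairs,
    one pair per indicator taxon found at the site.
    """
    taxa = list(taxa)
    total_abundance = sum(h for _, h in taxa)
    if total_abundance <= 0:
        raise ValueError("total abundance must be positive")
    return sum(s * h for s, h in taxa) / total_abundance

# Hypothetical site: (saprobic value, abundance class) per indicator taxon.
site = [(1.0, 3), (2.0, 1)]
print(saprobic_index(site))
```

Higher values indicate a community dominated by taxa that tolerate organic pollution.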
An ecological diversity index is a quantitative measure that reflects how many different species there are in a community, and how evenly the individuals are distributed. The Shannon index has been a popular diversity index in the ecological literature, where it is also known as Shannon's diversity index, the Shannon–Wiener index, the Shannon–Weaver index or the Shannon entropy. The measure was originally proposed by Claude Shannon.
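The Shannon index is computed from the proportions p_i of individuals belonging to each species as H' = -sum(p_i * ln(p_i)). A minimal sketch, assuming abundance counts per species as input and the natural logarithm as the base (other bases are also used); the function name is illustrative.

```python
import math

def shannon_index(counts):
    """Shannon diversity H' from a list of per-species abundance counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one individual")
    # Species with zero individuals contribute nothing to the sum.
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)

print(shannon_index([10, 10]))
print(shannon_index([19, 1]))
```

Two equally abundant species give a higher value than the same two species at very uneven abundances, which reflects the evenness component of the index.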
Species concept deals with the definition of the term “species” and the criteria for delimiting species. These have changed continuously over time and also depend on the field of study in which they are used. The species concept is one of the most debated topics in biology, and today there are at least 26 different species concepts. Perhaps the simplest definition is that the species is the basic unit of classification of living organisms. A species is often referred to as the largest group of organisms in which two individuals of the appropriate sexes or mating types can engage in sexual reproduction and produce offspring. This definition applies only to organisms that reproduce sexually and does not fit, for instance, bacteria.
Taxonomic assignment is one of the principal tasks in analyzing sequences obtained from samples containing DNA from unknown organisms (e.g. environmental DNA). It is the identification of the taxonomic affiliation of the DNA sequences in the sample, preferably to species or genus level. Sequence reads from a sample are compared (aligned) to a set of sequences in reference databases whose species/taxon assignment is reliably known. A sequence read can be equally similar to more than one reference sequence; such ambiguities are resolved by assigning the read to a consensus taxon, such as the lowest common ancestor of all the candidate sequences in a reference taxonomy, or to some more specific candidate reference sequence. Reference databases and taxonomies are essential for determining the phylogenetic affiliation of sequence reads. The precision of identification depends on the number of identified sequences in the reference databases. In the absence of a reference database, or in the presence of environmental sequences from unknown organisms, taxonomic assignment is not possible and sequence reads are instead grouped into clusters of related sequences, usually called operational taxonomic units (OTUs).
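The lowest-common-ancestor rule mentioned above can be sketched as follows. It assumes each candidate reference hit comes with its full lineage, listed from the highest rank down to species, and returns the part of the lineage shared by all candidates. The function name is hypothetical and the example lineages are shortened for illustration.

```python
def lowest_common_ancestor(lineages):
    """Return the shared leading part of several taxonomic lineages.

    Each lineage is a list of taxon names ordered from the highest
    rank to the lowest; the last element of the result is the LCA.
    """
    common = []
    for ranks in zip(*lineages):
        if all(name == ranks[0] for name in ranks):
            common.append(ranks[0])
        else:
            break
    return common

# Shortened, illustrative lineages of two equally good reference hits.
hits = [
    ["Animalia", "Arthropoda", "Insecta", "Ephemeroptera", "Baetis", "Baetis rhodani"],
    ["Animalia", "Arthropoda", "Insecta", "Ephemeroptera", "Baetis", "Baetis vernus"],
]
print(lowest_common_ancestor(hits)[-1])
```

In this example the read would be assigned to the genus rather than to either species, which is the conservative outcome the rule is designed to produce.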
The Water Framework Directive 2000/60/EC is an EU directive that commits European Union member states to achieve good qualitative and quantitative status of all water bodies (including marine waters up to one nautical mile from shore) by 2015.