Species Modelling

Species modelling (also known as niche modelling) plays an important role in the prediction of species distributions. It provides a way to study biodiversity distribution, past and present, to understand its causes, and to propose scenarios and strategies for sustainable use and for preservation initiatives.

Many different algorithms are used for ecological niche modelling. Some of the more commonly used ones are:

Generalized Linear Model (GLM)
Generalized Additive Model (GAM)
Classification and Regression Tree (CART)
BIOCLIM Model
Climate Space Model (CSM)
Distance to Average Model
Minimum Distance Model
Environmental Distance Model
Domain Model (DOMAIN)
Genetic Algorithm for Rule-set Production (GARP)
Ecological Niche Factor Analysis (ENFA)
Maximum Entropy Modelling (MaxEnt)

Species modelling can be used as a tool to assist in a variety of ecosystem-related studies and assessments:

to predict which types of organisms are likely to be associated with a particular type of terrain
to map out areas where sensitive species may be located
to map out areas of high species diversity
to locate potential areas for restoration or habitat banking

Ocean Ecology uses a number of freely available programs to carry out species modelling.

EcoSim

EcoSim is an interactive computer program for null model analysis in community ecology. EcoSim uses the following models:

Co-Occurrence
Macroecology
Niche Overlap
Size Overlap
Species Diversity
Standard Tests
Guild Structure

The kinds of questions you can ask with EcoSim are:

Are the species richness and evenness of unpolluted habitats significantly different from those of polluted habitats?
Is there evidence of microhabitat segregation among co-occurring species?
Does the taxonomic diversity in satellite communities differ from that of the adjacent source pool?
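EcoSim is used through its own interface, so the following Python is only a minimal sketch of the idea behind a co-occurrence null model analysis, written under stated assumptions rather than taken from EcoSim. It scores a species-by-site presence-absence matrix with the C-score (the mean number of "checkerboard units" over all species pairs) and compares that score with scores from randomized matrices in which every species' total and every site's total is held fixed. The swap-based randomization, the function names, the iteration counts and the example matrix are all illustrative choices.

```python
import random
from itertools import combinations


def c_score(matrix):
    """Mean C-score over all species pairs.

    matrix: one row per species, one column per site, entries 0 or 1.
    For a pair with row totals ra, rb that share s sites, the number of
    checkerboard units is (ra - s) * (rb - s).
    """
    units = []
    for a, b in combinations(matrix, 2):
        shared = sum(x & y for x, y in zip(a, b))
        units.append((sum(a) - shared) * (sum(b) - shared))
    return sum(units) / len(units)


def swap_randomize(matrix, n_attempts, rng):
    """Shuffle a 0/1 matrix with 2x2 checkerboard swaps.

    Each successful swap keeps every row total and column total unchanged
    (a "fixed-fixed" null model); attempts that land on a 2x2 submatrix
    that cannot be swapped are skipped.
    """
    m = [row[:] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    for _ in range(n_attempts):
        r1, r2 = rng.sample(range(n_rows), 2)
        c1, c2 = rng.sample(range(n_cols), 2)
        # Only [[1, 0], [0, 1]] and [[0, 1], [1, 0]] submatrices can be swapped
        if m[r1][c1] == m[r2][c2] and m[r1][c2] == m[r2][c1] and m[r1][c1] != m[r1][c2]:
            m[r1][c1], m[r1][c2] = m[r1][c2], m[r1][c1]
            m[r2][c1], m[r2][c2] = m[r2][c2], m[r2][c1]
    return m


def co_occurrence_test(matrix, n_null=999, n_attempts=1000, seed=0):
    """Observed C-score, mean null C-score and an upper-tail p-value."""
    rng = random.Random(seed)
    observed = c_score(matrix)
    null = [c_score(swap_randomize(matrix, n_attempts, rng)) for _ in range(n_null)]
    p_upper = (1 + sum(s >= observed for s in null)) / (n_null + 1)
    return observed, sum(null) / n_null, p_upper


# Hypothetical 4-species x 6-site matrix, for illustration only
sites = [
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 1],
    [0, 0, 1, 1, 1, 0],
]
print(co_occurrence_test(sites, n_null=99, n_attempts=500))
```

An observed C-score well above the null mean (a small `p_upper`) is usually read as a sign that species co-occur less often than expected by chance, which bears on questions like the second one above.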

Example of a plot generated by EcoSim. Image from EcoSim, null model software for ecologists.

Biomapper

Biomapper is a kit of GIS and statistical tools designed to build habitat suitability (HS) models and maps for organisms. It is based on Ecological Niche Factor Analysis (ENFA), which enables HS models to be created without requiring absence data (i.e., data documenting locations where the organism is not present). ENFA determines which environmental factors are most responsible for the ecological distribution of a species.
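ENFA describes how a species' habitat differs from the study area as a whole using two quantities: marginality (how far the species' mean on a variable lies from the global mean) and specialization (how much narrower the species' spread of values is than the spread available). The Python below is a simplified, per-variable sketch of those two quantities, assuming the commonly cited univariate definitions M = |m_G − m_S| / (1.96 σ_G) and S = σ_G / σ_S. It does not perform the factor extraction that Biomapper carries out, and the layer names and values are hypothetical.

```python
from statistics import mean, pstdev


def marginality_specialization(global_values, species_values):
    """Univariate ENFA-style marginality and specialization.

    global_values: the variable's values over all cells in the study area
    species_values: its values at cells where the species was recorded
    (needs at least two distinct species values so the spread is non-zero)
    """
    m_g, s_g = mean(global_values), pstdev(global_values)
    m_s, s_s = mean(species_values), pstdev(species_values)
    marginality = abs(m_g - m_s) / (1.96 * s_g)  # 0 when the species sits at the global mean
    specialization = s_g / s_s  # > 1 when the species uses a narrower range than is available
    return marginality, specialization


def rank_variables(layers, presence_cells):
    """Rank variables by marginality; layers maps name -> list of per-cell values."""
    scores = {}
    for name, values in layers.items():
        species_values = [values[i] for i in presence_cells]
        scores[name] = marginality_specialization(values, species_values)
    return sorted(scores.items(), key=lambda kv: kv[1][0], reverse=True)


# Hypothetical layers over ten cells; the species was recorded in cells 1, 2 and 3
layers = {
    "depth": [2, 4, 6, 8, 10, 12, 14, 16, 18, 20],
    "temperature": [15, 16, 15, 17, 16, 15, 17, 16, 15, 16],
}
for name, (m, s) in rank_variables(layers, [1, 2, 3]):
    print(f"{name}: marginality={m:.2f}, specialization={s:.2f}")
```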

DesktopGarp

DesktopGarp is a software package for biodiversity and ecological research that allows the user to predict and analyze the distributions of wild species. GARP (Genetic Algorithm for Rule-set Production) is a genetic algorithm that creates ecological niche models for species. The models describe the environmental conditions under which the species should be able to maintain populations. As input, GARP uses a set of point localities where the species is known to occur and a set of geographic layers representing the environmental parameters that might limit the species’ ability to survive.

The following image is a probability distribution map, created by DesktopGarp, of a bird species (Cerulean warbler, Dendroica cerulea). Red indicates high probability, green intermediate and blue low.
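The core idea behind GARP, evolving environmental rules with a genetic algorithm, can be illustrated with a much-reduced Python sketch. This is not DesktopGarp's algorithm: real GARP mixes several rule types and resamples its training and test data. Here, as simplifying assumptions, every rule is a single "range rule" (a [low, high] interval per environmental variable), fitness is the mean of the fraction of presence points predicted present and the fraction of background points predicted absent, and the sample points, bounds and GA settings are invented for illustration.

```python
import random


def predicts(rule, point):
    """A range rule predicts presence if every variable lies inside its [lo, hi] range."""
    return all(lo <= x <= hi for (lo, hi), x in zip(rule, point))


def fitness(rule, presences, background):
    """Mean of the presence hit rate and the background rejection rate (0 to 1)."""
    hit = sum(predicts(rule, p) for p in presences) / len(presences)
    reject = sum(not predicts(rule, b) for b in background) / len(background)
    return (hit + reject) / 2


def random_rule(bounds, rng):
    return [tuple(sorted(rng.uniform(lo, hi) for _ in range(2))) for lo, hi in bounds]


def mutate(rule, bounds, rng, scale=0.1):
    """Jitter both ends of every range, clipped to the variable's bounds."""
    new_rule = []
    for (a, b), (lo, hi) in zip(rule, bounds):
        step = scale * (hi - lo)
        a2 = min(max(a + rng.gauss(0, step), lo), hi)
        b2 = min(max(b + rng.gauss(0, step), lo), hi)
        new_rule.append(tuple(sorted((a2, b2))))
    return new_rule


def crossover(r1, r2, rng):
    """Take each variable's range from one parent or the other at random."""
    return [rng.choice(pair) for pair in zip(r1, r2)]


def evolve(presences, background, bounds, pop_size=40, generations=60, seed=1):
    rng = random.Random(seed)

    def score(rule):
        return fitness(rule, presences, background)

    population = [random_rule(bounds, rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]  # elitism: the best half survives unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), bounds, rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=score)


# Hypothetical training data: (water temperature in °C, depth in m)
presences = [(19, 8), (20, 10), (21, 12), (18.5, 9), (20.5, 14)]
background = [(5, 40), (25, 3), (10, 20), (28, 45), (15, 30), (3, 5), (20, 35), (27, 10)]
bounds = [(0, 30), (0, 50)]
best = evolve(presences, background, bounds)
print(best, fitness(best, presences, background))
```

Keeping the best half of each generation unchanged means the best fitness can never decrease from one generation to the next.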

GraspeR

Generalized Regression Analysis and Spatial Prediction for R (GraspeR) is a general method for making spatial predictions about a variable of interest (VOI; e.g., the presence or absence of a particular species) using point surveys of the VOI and spatial coverages of important environmental drivers (IEDs; e.g., environmental conditions that might affect the distribution of the species of interest). GraspeR runs in R, a free software environment and programming language for statistical computing and graphics, and can be downloaded from SourceForge.
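The approach described above, relating point surveys of the VOI to the IEDs with a regression model and then predicting the VOI wherever the IEDs are mapped, can be sketched with a binomial GLM (logistic regression). GraspeR itself runs in R and uses R's regression functions; this Python stand-in fits the model by plain gradient ascent so that it needs only the standard library, and the single standardized IED and the survey values are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic (inverse logit) function, written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_predictor(w, x):
    """Intercept w[0] plus the weighted sum of the IED values in x."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def fit_logistic_glm(X, y, lr=0.5, n_iter=5000):
    """Fit presence (1) / absence (0) against IEDs by gradient ascent
    on the mean binomial log-likelihood."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(linear_predictor(w, xi))
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


def predict_surface(w, grid):
    """Predicted probability of presence for each grid cell's IED vector."""
    return [sigmoid(linear_predictor(w, cell)) for cell in grid]


# Hypothetical point surveys against one standardized IED
X = [[-2.0], [-1.5], [-1.0], [-0.5], [0.0], [0.5], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 1, 0, 1, 1, 1, 1]
w = fit_logistic_glm(X, y)
print(predict_surface(w, [[-1.0], [0.0], [1.0]]))
```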

openModeller

openModeller provides a flexible, user-friendly, cross-platform environment in which the entire process of a fundamental niche modelling experiment can be carried out. The software includes facilities for reading species occurrence and environmental data, selecting the environmental layers on which the model should be based, creating a fundamental niche model and projecting the model into an environmental scenario. A number of algorithms are provided as plugins, including:

BIOCLIM Model
Climate Space Model
Distance to Average Model
Environmental Distance Model
GARP (Genetic Algorithm for Rule-set Production)
Minimum Distance Model

A number of models generated using openModeller with different algorithms.
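Two of the simpler algorithms in the list above can be sketched in a few lines of Python. The BIOCLIM version below uses only the minimum-maximum envelope of the presence records, and the Distance to Average version scores a cell by its Euclidean distance from the mean environmental conditions at the presence points, falling linearly to zero at a chosen maximum distance. These are simplified illustrations, not openModeller's plugin code (its plugins have their own parameters and output scaling); the data are hypothetical, and in practice variables measured in different units would usually be standardized before distances are computed.

```python
import math
from statistics import mean


def bioclim_envelope(presences):
    """Per-variable (min, max) envelope of the presence records."""
    return [(min(col), max(col)) for col in zip(*presences)]


def bioclim_predict(envelope, cell):
    """1.0 if the cell lies inside the envelope on every variable, else 0.0."""
    inside = all(lo <= x <= hi for (lo, hi), x in zip(envelope, cell))
    return 1.0 if inside else 0.0


def distance_to_average(presences, cell, max_dist):
    """Suitability falling linearly from 1.0 at the presence centroid to 0.0
    at max_dist (Euclidean distance in environmental space)."""
    centroid = [mean(col) for col in zip(*presences)]
    return max(0.0, 1.0 - math.dist(centroid, cell) / max_dist)


# Hypothetical presence records: (temperature in °C, depth in m)
presences = [(18, 10), (20, 12), (22, 14)]
env = bioclim_envelope(presences)
for cell in [(20, 12), (25, 12)]:
    print(cell, bioclim_predict(env, cell), distance_to_average(presences, cell, max_dist=10))
```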

Biodiversity-R

Biodiversity-R is a graphical user interface (GUI) for R, provided through R Commander. It provides utility functions (often based on the vegan package, an R package for vegetation ecologists) for the statistical analysis of biodiversity and ecological communities, including:

species accumulation curves. These are plots in which the accumulated number of species [i.e. the number of new species found in each successive quadrat added to the total already found] is plotted on the Y axis against the quadrats [in the order tallied] on the X axis. Species accumulation curves are useful tools for determining the number of quadrats needed to sample a single stratum of a community.
diversity indices
Renyi profiles (a series of diversity measures which depend on how much weight is given to rare species)
Generalized Linear Models (GLMs) for analysis of species abundance and presence-absence
distance matrices. A distance matrix is a two-dimensional array containing the distances, taken pairwise, between members within a set of points. In ecology, ecological distance measures are constructed to compare differences in species composition among communities or sites (e.g., how “far” apart communities are from each other with respect to species composition).
Mantel tests (tests used to compare the similarity of two distance matrices)
cluster analysis and constrained and unconstrained ordination. Communities can be represented in a multidimensional space in which each species forms a separate axis and each community is plotted as a point according to its species abundance values. Ordination methods reduce this space to fewer dimensions while retaining as much information as possible about the ecological distances between communities.
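Biodiversity-R computes these quantities with vegan's R functions. Purely to illustrate the arithmetic, the Python below sketches two of the listed analyses: a species accumulation curve built from quadrats in the order tallied, and a Rényi diversity profile. The Rényi diversity of order α is H_α = ln(Σ p_i^α) / (1 − α), where p_i is the proportional abundance of species i; order 0 gives the logarithm of species richness, order 1 (taken as a limit) the Shannon index and order 2 the logarithm of the inverse Simpson index, so higher orders give less weight to rare species. The species names and counts are hypothetical.

```python
import math


def accumulation_curve(quadrats):
    """Cumulative number of distinct species after each quadrat, in tally order."""
    seen, curve = set(), []
    for quadrat in quadrats:
        seen.update(quadrat)
        curve.append(len(seen))
    return curve


def renyi(abundances, alpha):
    """Rényi diversity of order alpha, in natural-log units."""
    total = sum(abundances)
    p = [a / total for a in abundances if a > 0]
    if alpha == 1:
        return -sum(pi * math.log(pi) for pi in p)  # Shannon index (limit as alpha -> 1)
    if alpha == math.inf:
        return -math.log(max(p))  # log of the inverse Berger-Parker index
    return math.log(sum(pi ** alpha for pi in p)) / (1 - alpha)


quadrats = [{"A", "B"}, {"B", "C"}, {"A"}, {"C", "D"}, {"B"}]
print(accumulation_curve(quadrats))  # [2, 3, 3, 4, 4]

community = [70, 10, 10, 10]
print([round(renyi(community, a), 3) for a in (0, 0.5, 1, 2, math.inf)])
```

A perfectly even community gives a flat profile (every order equals the log of the species count); the steeper the profile falls, the more the community is dominated by a few species.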

A screenshot of Biodiversity-R. Image from Chapter 3: Doing biodiversity analysis with Biodiversity-R.
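The distance matrix and Mantel test entries in the Biodiversity-R list can be sketched the same way. The Python below, an illustration rather than vegan's implementation, builds a Bray-Curtis dissimilarity matrix from species abundances (one common ecological distance measure among several) and then runs a simple permutation Mantel test against a second matrix of environmental distances: it correlates the upper triangles of the two matrices and compares that correlation with the correlations obtained after randomly relabelling the sites of one matrix. The site data are hypothetical.

```python
import random
from itertools import combinations


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors (0 = identical)."""
    return sum(abs(a - b) for a, b in zip(x, y)) / sum(a + b for a, b in zip(x, y))


def distance_matrix(items, metric):
    """Full pairwise distance matrix for a list of sites."""
    return [[metric(a, b) for b in items] for a in items]


def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / (sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v)) ** 0.5


def upper_triangle(m, order):
    """Upper-triangle entries of m after relabelling the sites by order."""
    return [m[order[i]][order[j]] for i, j in combinations(range(len(m)), 2)]


def mantel(d1, d2, n_perm=999, seed=0):
    """Mantel r and a one-tailed permutation p-value (positive association)."""
    sites = list(range(len(d1)))
    x = upper_triangle(d1, sites)
    r_obs = pearson(x, upper_triangle(d2, sites))
    rng = random.Random(seed)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        perm = sites[:]
        rng.shuffle(perm)
        if pearson(x, upper_triangle(d2, perm)) >= r_obs:
            hits += 1
    return r_obs, hits / (n_perm + 1)


# Hypothetical abundances of three species at four sites, and site depths (m)
abundances = [[10, 0, 5], [8, 2, 5], [0, 10, 1], [1, 9, 0]]
depths = [2, 3, 20, 22]
community_d = distance_matrix(abundances, bray_curtis)
depth_d = distance_matrix(depths, lambda a, b: abs(a - b))
print(mantel(community_d, depth_d))
```

With only four sites there are just 24 possible relabellings, so the p-value here is coarse; real analyses use many more sites.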

Marine Geospatial Ecology Tools

Marine Geospatial Ecology Tools (MGET), also known as the GeoEco Python package, is an open-source geoprocessing toolbox designed for coastal and marine researchers and GIS analysts who work with spatially explicit ecological and oceanographic data in scientific or management fields. MGET includes over 150 tools for a variety of tasks, such as converting oceanographic data to ArcGIS formats, identifying fronts in sea surface temperature images, fitting and evaluating statistical models such as GAMs and GLMs by integrating ArcGIS with the R statistics program, analyzing coral reef connectivity by simulating hydrodynamic larval dispersal, and building grids that summarize fishing effort, catch per unit effort (CPUE) and other statistics. Tools currently under development include ones for identifying rings and eddy cores in sea surface height images, analyzing connectivity networks, estimating fishing effort when no effort data are available, and predicting hard-bottom habitat from coarse-grained bathymetry.

A predicted species distribution map generated using ArcRstats (HabMod), a component of MGET.
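One of the MGET tasks mentioned above, building grids that summarize fishing statistics, comes down to binning records into cells. The Python below sketches that calculation for CPUE, taken here as total catch divided by total effort in each square cell. It is only an illustration: MGET performs this kind of gridding as ArcGIS geoprocessing tools, and the record layout, cell size and coordinate units are assumptions.

```python
import math
from collections import defaultdict


def cpue_grid(records, cell_size):
    """Total catch / total effort for each square grid cell.

    records: iterable of (x, y, catch, effort) in projected map units
    Returns {(column, row): cpue}; cells with zero effort are skipped.
    """
    catch = defaultdict(float)
    effort = defaultdict(float)
    for x, y, c, e in records:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        catch[cell] += c
        effort[cell] += e
    return {cell: catch[cell] / effort[cell] for cell in catch if effort[cell] > 0}


# Hypothetical fishing sets: x, y (m), catch (kg), effort (hours fished)
records = [(120, 40, 30, 3), (180, 90, 10, 2), (260, 40, 8, 4)]
print(cpue_grid(records, cell_size=100))  # {(1, 0): 8.0, (2, 0): 2.0}
```

Summing catch and effort separately before dividing weights each set by its effort, rather than averaging per-set CPUE values.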

Diva-GIS

Diva-GIS is a geographic information system (GIS) that can be used for many different purposes. It is particularly useful for mapping and analyzing biodiversity data, such as the distribution of species, or other ‘point distributions’.

With Diva-GIS you can:

Make large- or small-scale maps that integrate political boundaries, rivers, satellite images, locations of sites where an animal species was observed, and other data.
Make grid maps of the distribution of biological diversity to identify diversity “hotspots”.
Map and query climate data.
Predict species distributions using the BIOCLIM or DOMAIN models.

A screenshot of Diva-GIS. Image from DIVA GIS, International Potato Centre (CIP).
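The grid-map task listed for Diva-GIS, counting how many different species have been recorded in each grid cell in order to locate diversity hotspots, can be sketched in Python as follows. Diva-GIS does this through its own interface; the record format (species name, longitude, latitude), the cell size in degrees and the data are assumptions for illustration, and treating longitude and latitude as a flat grid ignores the shrinking of cells toward the poles.

```python
import math
from collections import defaultdict


def richness_grid(observations, cell_size):
    """Number of distinct species recorded in each grid cell.

    observations: iterable of (species, lon, lat) point records
    cell_size: cell width and height in the same units as lon/lat
    """
    cells = defaultdict(set)
    for species, lon, lat in observations:
        cells[(math.floor(lon / cell_size), math.floor(lat / cell_size))].add(species)
    return {cell: len(names) for cell, names in cells.items()}


def hotspots(grid, top_n=1):
    """The top_n cells ranked by species richness."""
    return sorted(grid, key=grid.get, reverse=True)[:top_n]


observations = [
    ("A", 0.2, 0.3), ("B", 0.7, 0.1), ("A", 0.4, 0.9),
    ("C", 1.5, 0.5), ("C", 1.6, 0.2), ("D", -0.5, 0.4),
]
grid = richness_grid(observations, cell_size=1.0)
print(grid)  # {(0, 0): 2, (1, 0): 1, (-1, 0): 1}
print(hotspots(grid))  # [(0, 0)]
```

Storing a set of names per cell, rather than a running count, keeps repeated records of the same species from inflating richness.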

Maximum Entropy for Species Distribution Modeling

Maximum Entropy for Species Distribution Modeling (MaxEnt) is a program for maximum entropy modelling of species geographic distributions. The model for a species is determined from a set of environmental or climate layers (or “coverages”) for a set of grid cells in a landscape, together with a set of sample locations where the species has been observed. The model expresses the suitability of each grid cell as a function of the environmental variables at that grid cell. A high value of the function at a particular grid cell indicates that the grid cell is predicted to have suitable conditions for that species. The computed model is a probability distribution over all the grid cells. Of all the distributions that satisfy a set of constraints, the one chosen is the one with maximum entropy; the constraints require that the expected value of each feature (derived from the environmental layers) equal its average over the sample locations.
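The maximum entropy distribution described above has a known form: p(x) = exp(λ · f(x)) / Z, where f(x) is the feature vector of grid cell x, λ is a vector of weights and Z is the sum of exp(λ · f(x)) over all cells. The weights that satisfy the expectation constraints are the ones that maximize the average log-likelihood of the sample locations. The Python below is a minimal sketch of that idea under simplifying assumptions: it uses plain gradient ascent with no regularization and only the raw features supplied, whereas the MaxEnt program adds regularization and derived feature types, and the six-cell landscape is hypothetical.

```python
import math


def gibbs(features, lam):
    """p(x) proportional to exp(lam . f(x)) over all grid cells."""
    scores = [sum(l * fj for l, fj in zip(lam, f)) for f in features]
    top = max(scores)  # subtract the maximum before exponentiating, for stability
    weights = [math.exp(s - top) for s in scores]
    z = sum(weights)
    return [w / z for w in weights]


def expectation(p, features):
    """Expected value of each feature under the distribution p."""
    k = len(features[0])
    return [sum(pi * f[j] for pi, f in zip(p, features)) for j in range(k)]


def fit_maxent(features, sample_cells, lr=0.5, n_iter=2000):
    """Gradient ascent on the mean log-likelihood of the sample cells.

    The gradient is (feature mean over samples) - (expected feature under p),
    so at convergence the model matches the sample averages: the MaxEnt constraint.
    """
    k = len(features[0])
    target = [sum(features[i][j] for i in sample_cells) / len(sample_cells) for j in range(k)]
    lam = [0.0] * k
    for _ in range(n_iter):
        expected = expectation(gibbs(features, lam), features)
        lam = [l + lr * (t - e) for l, t, e in zip(lam, target, expected)]
    return lam, gibbs(features, lam)


# Hypothetical landscape: one feature (scaled depth, 0 = shallow) for six cells,
# with the species observed in cells 0, 1 and 2
features = [[0.0], [0.2], [0.4], [0.6], [0.8], [1.0]]
lam, p = fit_maxent(features, [0, 1, 2])
print(lam, [round(pi, 3) for pi in p])
```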

The figure below shows a distribution model, generated by MaxEnt, for the brown-throated three-toed sloth, Bradypus variegatus.

Image from A Brief Tutorial on Maxent.

R-Biomod

R-Biomod is an ecological GLM/GAM tool for R and S-PLUS that can also generate models using CART (classification and regression trees), ANN (artificial neural networks) and BRT (boosted regression trees). BRT has been shown to be among the most powerful of these techniques.

An Example of Species Modelling

A wide variety of information may be input into a species model, including data such as habitat parameters, temperatures, currents, and tides. The variables entered into the model should relate in some way to the distribution of the species being modelled (i.e., they should limit or control the distribution of the species in some manner). An example of some typical input data is shown in the stacked plot below. The data sets shown are, from top to bottom, species presence, seafloor hardness, bathymetric position index, rugosity, slope, depth, and substrate type.

A model for the distribution of this species was created using a maximum entropy modelling program. This model is shown below as a map and as a 3D overlay on the depth of the site. Species presence is shown as a probability (a value of 1.0 indicates that the species is present, a value of 0 indicates that it is absent, and values between 0 and 1.0 indicate the likelihood of the species being present).

Note that this particular organism’s preferred habitat was in the shallow subtidal flats.

Now that a model has been developed that relates the organism’s distribution to specific habitat features, it can be used to predict the possible presence of the organism at other locations where habitat data have been collected but no population surveys for the organism have been carried out. The inputs for the predictive stage of the model are the same as for the developmental stage, except that there are no species presence data.
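The prediction stage described above amounts to stacking the same environmental layers for the new site, cell by cell, and applying the already-fitted model to each cell. The Python below sketches that step. The layer names, the 2 x 3 grids and the coefficients of the logistic function standing in for the fitted model are all hypothetical; the maximum is printed because that is the statistic compared between the two sites in this example.

```python
import math


def stack_layers(layers, names):
    """Combine same-shaped raster layers into one feature vector per cell."""
    first = layers[names[0]]
    return [
        [[layers[name][r][c] for name in names] for c in range(len(first[0]))]
        for r in range(len(first))
    ]


def project(model, layers, names):
    """Apply a fitted per-cell model to every cell of a site's layer stack."""
    return [[model(cell) for cell in row] for row in stack_layers(layers, names)]


def hypothetical_model(cell):
    """Stand-in for a fitted model: a logistic function of depth (m) and
    slope (degrees) with made-up coefficients that favour shallow, flat cells."""
    depth, slope = cell
    return 1.0 / (1.0 + math.exp(-(2.0 - 0.5 * depth - 0.2 * slope)))


# New site: the same habitat layers as in the training stage, but no presence layer
site_b = {
    "depth": [[4.0, 6.0, 8.0], [5.0, 7.0, 9.0]],
    "slope": [[1.0, 3.0, 5.0], [2.0, 4.0, 6.0]],
}
surface = project(hypothetical_model, site_b, ["depth", "slope"])
print(max(max(row) for row in surface))
```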

Stacked plot of data for model prediction.

From these inputs, a map showing the possible locations of the species is generated. The highest probability of species presence on this map is 0.2, much lower than the highest value of 0.8 on the original model map, which indicates that this site is a much less likely habitat for the study organism than the first site.

Species distribution predicted by the model.
