A biodiversity simulation framework
We developed a simulation framework modelling biodiversity loss to optimize and validate conservation policies (in this context, decisions about data gathering and area protection across a landscape) using an RL algorithm. We implemented a spatially explicit individual-based simulation to assess long-term biodiversity changes based on natural processes of mortality, replacement and dispersal. Our framework also incorporates anthropogenic processes such as habitat modification, selective removal of a species, rapid climate change and existing conservation efforts. The simulation can include hundreds of species and tens of millions of individuals and track population sizes and species distributions and how they are affected by anthropogenic activity and climate change (for a detailed description of the model and its parameters see Supplementary Methods and Supplementary Table 1).
In our model, anthropogenic disturbance has the effect of altering the natural mortality rates on a species-specific level, which depends on the sensitivity of the species. It also affects the total number of individuals (the carrying capacity) of any species that can inhabit a spatial unit. Because sensitivity to disturbance differs among species, the relative abundance of species in each cell changes after adding disturbance and upon reaching the new equilibrium. The effect of climate change is modelled as locally affecting the mortality of individuals based on species-specific climatic tolerances. As a result, more tolerant or warmer-adapted species will tend to replace sensitive species in a warming environment, thus inducing range shifts, contractions or expansions across species depending on their climatic tolerance and dispersal ability.
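As an illustration of how these processes interact, the sketch below computes a per-species, per-cell mortality rate that increases with disturbance (in proportion to the species' sensitivity) and with the mismatch between local climate and the species' climatic optimum. All variable names and the quadratic form of the climate term are illustrative assumptions, not the exact functions of the Supplementary Methods.

```python
import numpy as np

rng = np.random.default_rng(0)

n_species, n_cells = 5, 100
base_mortality = 0.05                          # natural per-step death probability
sensitivity = rng.uniform(0, 1, n_species)     # species-specific sensitivity to disturbance
climate_opt = rng.uniform(10, 25, n_species)   # species-specific climatic optimum
climate_tol = rng.uniform(2, 5, n_species)     # climatic tolerance (width)

disturbance = rng.uniform(0, 1, n_cells)       # anthropogenic disturbance per cell
temperature = rng.uniform(12, 22, n_cells)     # local climate per cell

# Mortality per species (rows) and cell (columns): disturbance raises mortality
# according to sensitivity; climate raises it with the distance from the optimum.
climate_stress = (np.abs(temperature - climate_opt[:, None]) / climate_tol[:, None]) ** 2
mortality = base_mortality * (1 + sensitivity[:, None] * disturbance) * (1 + climate_stress)
mortality = np.clip(mortality, 0.0, 1.0)
```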
We use time-forward simulations of biodiversity in time and space, with increasing anthropogenic disturbance through time, to optimize conservation policies and assess their performance. Alongside a representation of the natural and anthropogenic evolution of the system, our framework includes an agent (that is, the policy maker) taking two types of actions: (1) monitoring, which provides information about the current state of biodiversity in the system, and (2) protecting, which uses that information to select areas for protection from anthropogenic disturbance. The monitoring policy defines the level of detail and temporal resolution of biodiversity surveys. At a minimal level, these include species lists for each cell, whereas more detailed surveys provide counts of population size for each species. The protection policy is informed by the results of monitoring and selects protected areas in which further anthropogenic disturbance is maintained at an arbitrarily low value (Fig. 1). Because the total number of areas that can be protected is limited by a finite budget, we use an RL algorithm42 to optimize how to perform the protecting actions based on the information provided by monitoring, such that species loss or other criteria are minimized, depending on the policy.
We provide a full description of the simulation system in the Supplementary Methods. In the sections below we present the optimization algorithm, describe the experiments carried out to validate our framework and demonstrate its use with an empirical dataset.
Conservation planning in a reinforcement learning framework
In our model we use RL to optimize a conservation policy under a predefined policy objective (for example, to minimize the loss of biodiversity or maximize the extent of protected area). The CAPTAIN framework includes a space of actions, namely monitoring and protecting, that are optimized to maximize a reward R. The reward defines the optimality criterion of the simulation and can be quantified as the cumulative value of species that do not go extinct throughout the timeframe evaluated in the simulation. If the value is set equal across all species, the RL algorithm minimizes overall species extinctions. However, different definitions of value can be used to minimize loss based on the evolutionary distinctiveness of species (for example, minimizing phylogenetic diversity loss), or on their ecosystem or economic value. Alternatively, the reward can be set equal to the amount of protected area, in which case the RL algorithm maximizes the number of cells protected from disturbance, regardless of which species occur there. The amount of area that can be protected through the protecting action is determined by a budget Bt and by the cost of protection $C_t^c$, which can vary across cells c and through time t.
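As a concrete example of the first criterion, the end-of-episode reward can be computed as the summed value of the species that survived; this is a minimal sketch with assumed array names, not the package's implementation:

```python
import numpy as np

def episode_reward(final_abundance, species_value):
    """Cumulative value of species that did not go extinct.

    final_abundance: array (n_species, n_cells) of individuals at the last step.
    species_value:   array (n_species,) of per-species values; all ones simply
                     counts surviving species, while evolutionary distinctiveness
                     or economic values weight the loss differently.
    """
    surviving = final_abundance.sum(axis=1) > 0
    return float(np.sum(species_value[surviving]))
```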
The granularity of the monitoring and protecting actions is based on spatial units that may include one or more cells and which we define as the protection units. In our system, protection units are adjacent, non-overlapping areas of equal size (Fig. 1) that can be protected at a cost that cumulates the costs of all cells included in the unit.
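Assuming a regular grid in which each protection unit is a k × k block of cells, the unit cost is simply the block sum of the underlying cell costs; a toy sketch:

```python
import numpy as np

def unit_costs(cell_cost, k):
    """Sum cell costs within each non-overlapping k x k protection unit."""
    n_rows, n_cols = cell_cost.shape
    assert n_rows % k == 0 and n_cols % k == 0
    blocks = cell_cost.reshape(n_rows // k, k, n_cols // k, k)
    return blocks.sum(axis=(1, 3))   # one cumulative cost per protection unit

cell_cost = np.ones((10, 10))        # e.g. uniform cost across cells
print(unit_costs(cell_cost, 5))      # 2 x 2 units, each costing 25
```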
The monitoring action collects information within each protection unit about the state of the system St, which includes species abundances and geographic distribution:
$$S_t=\{H_t,D_t,F_t,T_t,C_t,P_t,B_t\}$$
(1)
where Ht is the matrix with the number of individuals across species and cells, Dt and Ft are matrices describing anthropogenic disturbance on the system, Tt is a matrix quantifying climate, Ct is the cost matrix, Pt is the current protection matrix and Bt is the available budget (for more details see Supplementary Methods and Supplementary Table 1). We define as feature extraction the result of a function X(St), which returns for each protection unit a set of features summarizing the state of the system in the unit. The number and selection of features (Supplementary Methods and Supplementary Table 2) depend on the monitoring policy πX, which is decided a priori in the simulation. A predefined monitoring policy also determines the temporal frequency of this action throughout the simulation, for example, only at the first time step or repeated at each time step. The features extracted for each unit represent the input upon which a protecting action can take place, if the budget allows for it, following a protection policy πY. These features (listed in Supplementary Table 2) include the number of species that are not currently protected in other units, the number of rare species and the cost of the unit relative to the remaining budget. Different subsets of these features are used depending on the monitoring policy and on the optimality criterion of the protection policy πY.
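A minimal sketch of what such a feature extraction function might return per protection unit is shown below; the feature names, the rarity threshold and the exact definitions are illustrative assumptions rather than the variables of Supplementary Table 2:

```python
import numpy as np

def extract_features(abundance_by_unit, protected, cost, budget, rare_threshold=10):
    """Return a (n_units, n_features) matrix summarizing the monitored state.

    abundance_by_unit: array (n_units, n_species) of monitored individuals.
    protected:         boolean array (n_units,), True for already protected units.
    cost:              array (n_units,) of protection costs.
    budget:            remaining budget (scalar).
    """
    present = abundance_by_unit > 0
    covered = present[protected].any(axis=0)                       # species already in the protected network
    n_unprotected_species = (present & ~covered).sum(axis=1)       # species not protected elsewhere
    n_rare_species = (present & (abundance_by_unit < rare_threshold)).sum(axis=1)
    relative_cost = cost / max(budget, 1e-12)                      # cost relative to remaining budget
    return np.column_stack([n_unprotected_species, n_rare_species,
                            relative_cost, protected.astype(float)])
```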
We do not assume species-specific sensitivities to disturbance (parameters ds, fs in Supplementary Table 1 and Supplementary Methods) to be known features, because a precise estimation of these parameters in an empirical case would require dedicated experiments, which we consider unfeasible across a large number of species. Instead, species-specific sensitivities can be learned from the system through the observation of changes in the relative abundances of species (feature x3 in Supplementary Table 2). The features used across the different policies are specified in the subsection Experiments below and in the Supplementary Methods.
The protecting action selects a protection unit and resets the disturbance in the included cells to an arbitrarily low level. A protected unit is also immune from future increases in anthropogenic disturbance, but protection does not prevent climate change within the unit. The model can include a buffer area along the perimeter of a protected unit, in which the level of protection is lower than in the centre, to mimic the typically negative edge effects in protected areas (for example, increased vulnerability to extreme weather). Although protecting a disturbed area theoretically allows it to return to its initial biodiversity levels, population growth and species composition of the protected area will still be controlled by the death–replacement–dispersal processes described above, as well as by the state of neighbouring areas. Thus, protecting an area that has already undergone biodiversity loss may not result in the restoration of its original biodiversity levels.
The protecting action has a cost determined by the cumulative cost of all cells in the selected protection unit. The cost of protection can be set equal across all cells and constant through time. Alternatively, it can be defined as a function of the current level of anthropogenic disturbance in the cell. The cost of each protecting action is taken from a predetermined finite budget, and a unit can be protected only if the remaining budget allows it.
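The budget bookkeeping for a single protecting action can be summarized as follows (a hedged sketch with assumed names; the actual implementation may differ):

```python
def protect_unit(unit, unit_cost, budget, protected, disturbance_by_unit, low=1e-3):
    """Protect one unit if the remaining budget allows it; return the updated budget.

    Protection resets the unit's disturbance to an arbitrarily low value and
    marks it as immune to further anthropogenic increases.
    """
    if protected[unit] or unit_cost[unit] > budget:
        return budget                       # action not possible, budget unchanged
    protected[unit] = True
    disturbance_by_unit[unit] = low         # reset disturbance in the protected cells
    return budget - unit_cost[unit]         # cost is deducted from the finite budget
```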
Policy definition and optimization algorithm
We frame the optimization problem as a stochastic control problem where the state of the system St evolves through time as described in the section above (see also Supplementary Methods), but is also influenced by a set of discrete actions determined by the protection policy πY. The protection policy is a probabilistic policy: for a given set of policy parameters and an input state, the policy outputs an array of probabilities associated with all possible protecting actions. While optimizing the model, we sample actions according to the probabilities produced by the policy to ensure that we explore the space of actions. When we run experiments with a fixed policy instead, we choose the action with the highest probability. The input state is transformed by the feature extraction function X(St) defined by the monitoring policy, and the features are mapped to a probability through a neural network with the architecture described below.
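In practice, this means that during optimization a unit is drawn from the distribution output by the policy, whereas a fixed, trained policy acts greedily; a brief sketch of the two modes (names assumed):

```python
import numpy as np

def choose_unit(probabilities, training, rng=np.random.default_rng()):
    """Select a protection unit from the policy output.

    probabilities: array (n_units,) returned by the policy (sums to 1).
    training:      if True, sample to explore the action space; if False, act greedily.
    """
    if training:
        return int(rng.choice(len(probabilities), p=probabilities))  # exploration
    return int(np.argmax(probabilities))                             # greedy evaluation
```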
In our simulations, we fix the monitoring policy πX, thereby predefining the frequency of monitoring (for example, at each time step or only at the first time step) and the amount of information produced by X(St), and we optimize πY, which determines how to best allocate the available budget to maximize the reward. Each action A has a cost, defined by the function Cost(A, St), which here we set to zero for the monitoring action (X) across all monitoring policies. The cost of the protecting action (Y) is instead set to the cumulative cost of all cells in the selected protection unit. In the simulations presented here, unless otherwise specified, the protection policy can only add one protected unit at each time step, if the budget allows, that is if Cost(Y, St) < Bt.
The protection policy is parametrized as a feed-forward neural network with a hidden layer using a rectified linear unit (ReLU) activation function (Eq. (3)) and an output layer using a softmax function (Eq. (5)). The input of the neural network is a matrix x of J features extracted through the most recent monitoring across U protection units. The output, of size U, is a vector of probabilities, which provides the basis to select a unit for protection. Given a number of nodes L, the hidden layer h(1) is a matrix U × L:
$$h_{ul}^{(1)}=g\left(\sum_{j=1}^{J}x_{uj}W_{jl}^{(1)}\right)$$
(2)
where u ∈ 1, …, U identifies the protection unit, l ∈ 1, …, L indicates the hidden nodes and j ∈ 1, …, J the features, and where
$$g(x)=\max(0,x)$$
(3)
is the ReLU activation function. We indicate with W(1) the matrix of J × L coefficients (shared among all protection units) that we are optimizing. Additional hidden layers can be added to the model between the input and the output layer. The output layer takes h(1) as input and gives an output vector of U variables:
$$h_{u}^{(2)}=\sigma\left(\sum_{l=1}^{L}h_{ul}^{(1)}W_{l}^{(2)}\right)$$
(4)
where σ is a softmax function:
$$\sigma(x_i)=\frac{\exp(x_i)}{\sum_{u}\exp(x_u)}$$
(5)
We interpret the output vector of U variables as the probability of protecting the unit u.
This architecture implements parameter sharing across all protection units when connecting the input nodes to the hidden layer; this reduces the dimensionality of the problem at the cost of losing some spatial information, which we encode in the feature extraction function. The natural next step would be to use a convolutional layer to discover relevant shape and space features instead of using a feature extraction function. To define a baseline for comparisons in the experiments described below, we also define a random protection policy ($\hat{\pi}$), which sets a uniform probability to protect units that have not yet been protected. This policy does not include any trainable parameter and relies on feature x6 (an indicator variable for protected units; Supplementary Table 2) to randomly select the proposed unit for protection.
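The forward pass defined by Eqs. (2)-(5) can be written compactly; the sketch below mirrors the equations (weights W(1) shared across units, softmax over units) and is illustrative rather than the package's code:

```python
import numpy as np

def policy_forward(x, W1, W2):
    """Map monitored features to protection probabilities.

    x:  array (U, J)  features per protection unit
    W1: array (J, L)  hidden-layer weights, shared across all units (Eq. 2)
    W2: array (L,)    output-layer weights (Eq. 4)
    Returns an array (U,) of probabilities (Eq. 5).
    """
    h1 = np.maximum(0.0, x @ W1)        # ReLU hidden layer, Eqs. (2)-(3)
    h2 = h1 @ W2                        # one scalar per protection unit, Eq. (4)
    e = np.exp(h2 - h2.max())           # numerically stable softmax over units, Eq. (5)
    return e / e.sum()

# example: 4 units, 3 features, 8 hidden nodes
rng = np.random.default_rng(1)
x = rng.random((4, 3))
probs = policy_forward(x, rng.normal(size=(3, 8)), rng.normal(size=8))
```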
The optimization algorithm implemented in CAPTAIN optimizes the parameters of a neural network such that they maximize the expected reward resulting from the protecting actions. With this aim, we implemented a combination of standard algorithms using a genetic strategies algorithm43 and incorporating aspects of classical policy gradient methods such as an advantage function44. Specifically, our algorithm is an implementation of the Parallelized Evolution Strategies43, in which two phases are repeated across several iterations (hereafter, epochs) until convergence. In the first phase, the policy parameters are randomly perturbed and then evaluated by running one full episode of the environment, that is, a full simulation with the system evolving for a predefined number of steps. In the second phase, the results from different runs are combined and the parameters updated following a stochastic gradient estimate43. We performed several runs in parallel on different workers (for example, processing units) and aggregated the results before updating the parameters. To improve the convergence we followed the standard approach used in policy optimization algorithms44, where the parameter update is linked to an advantage function A as opposed to the return alone (Eq. (6)). Our advantage function measures the improvement of the running reward (weighted average of rewards across different epochs) with respect to the last reward. Thus, our algorithm optimizes a policy without the need to compute gradients and allowing for easy parallelization. Each epoch in our algorithm works as:
for every worker p do
$\epsilon_p \leftarrow \mathcal{N}(0,\sigma)$, with diagonal covariance and dimension W + M
for t = 1,…,T do
Rt ← Rt−1 + rt(θ + ϵp)
end for
end for
R ← average of RT across workers
Re ← αR + (1 − α)Re−1
for every coefficient θ in W + M do
θ ← θ + λA(Re, RT, ϵ)
end for
where $\mathcal{N}$ is a normal distribution and W + M is the number of parameters in the model (following the notation in Supplementary Table 1). We indicate with rt the reward at time t, with R the cumulative reward over T time steps. Re is the running average reward calculated as an exponential moving average where α = 0.25 represents the degree of weighting decrease and Re−1 is the running average reward at the previous epoch. λ = 0.1 is a learning rate and A is an advantage function defined as the average of final reward increments with respect to the running average reward Re on every worker p weighted by the corresponding noise ϵp:
$$A(R_e,R_T,\epsilon)=\frac{1}{P}\sum_{p}\left(R_e-R_T^{p}\right)\epsilon_p.$$
(6)
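Putting the pseudocode and Eq. (6) together, one epoch of this gradient-free update could look like the following sketch (serial rather than parallel for clarity; run_episode and the default hyperparameters are assumptions mirroring the values given above):

```python
import numpy as np

def es_epoch(theta, run_episode, running_reward, n_workers=8,
             sigma=0.1, alpha=0.25, lam=0.1, rng=np.random.default_rng()):
    """One epoch of the parallelized evolution-strategies-style policy update.

    theta:          current policy parameters, array of length W + M.
    run_episode:    function(parameters) -> cumulative reward R_T of one full simulation.
    running_reward: exponential moving average R_e from previous epochs.
    """
    noise, rewards = [], []
    for _ in range(n_workers):                       # in practice run on parallel workers
        eps = rng.normal(0.0, sigma, size=theta.shape)
        noise.append(eps)
        rewards.append(run_episode(theta + eps))     # one full episode with perturbed parameters
    rewards = np.array(rewards)

    running_reward = alpha * rewards.mean() + (1 - alpha) * running_reward
    # advantage-weighted parameter update following Eq. (6)
    advantage = np.mean([(running_reward - r) * e for r, e in zip(rewards, noise)], axis=0)
    theta = theta + lam * advantage
    return theta, running_reward
```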
Experiments
We used our CAPTAIN framework to explore the properties of our model and the effect of different policies through simulations. Specifically, we ran three sets of experiments. The first set aimed at assessing the effectiveness of different policies optimized to minimize species loss based on different monitoring strategies. We ran a second set of simulations to determine how policies optimized to minimize value loss or maximize the amount of protected area may impact species loss. Finally, we compared the performance of the CAPTAIN models against the state-of-the-art method for conservation planning (Marxan25). A detailed description of the settings we used in our experiments is provided in the Supplementary Methods. Additionally, all scripts used to run CAPTAIN and Marxan analyses are provided as Supplementary Information.
Analysis of Madagascar endemic tree diversity
We analysed a recently published33 dataset of 1,517 tree species endemic to Madagascar, for which presence/absence data had been approximated through species distribution models across 22,394 units of 5 × 5 km spanning the entire country (Supplementary Fig. 5a). Their analyses included a spatial quantification of threats affecting the local conservation of species and assumed the cost of each protection unit as proportional to its level of threat (Supplementary Fig. 5b), similarly to how our CAPTAIN framework models protection costs as proportional to anthropogenic disturbance.
We re-analysed these data within a limited budget, allowing for a maximum of 10% of the units with the lowest cost to be protected (that is, 2,239 units); the actual number of protected units can be lower if the optimized solution includes units with higher cost. We did not include temporal dynamics in our analysis, instead choosing to simply monitor the system once to generate the features used by CAPTAIN and Marxan to place the protected units. Because the dataset did not include abundance data, the features only included species presence/absence information in each unit and the cost of the unit.
Because the presence of a species in the input data represents a theoretical expectation based on species distribution modelling, it does not consider the fact that strong anthropogenic pressure on a unit (for example, clearing a forest) might result in the local disappearance of some of the species. We therefore considered the potential effect of disturbance in the monitoring step. Specifically, in the absence of more detailed data about the actual presence or absence of species, we initialized the sensitivity of each species to anthropogenic disturbance as a random draw from a uniform distribution $d_s \sim \mathcal{U}(0,1)$ and we modelled the presence of a species s in a unit c as a random draw from a binomial distribution with a parameter set equal to $p_s^c = 1 - d_s \times D^c$, where $D^c \in [0, 1]$ is the disturbance (or ‘threat’ sensu Carrasco et al.33) in the unit. Under this approach, most of the species expected to live in a unit are considered to be present if the unit is undisturbed. Conversely, many (especially sensitive) species are assumed to be absent from units with high anthropogenic disturbance. This resampled diversity was used for feature extraction in the monitoring steps (Fig. 1c). While this approach is an approximation of how species might respond to anthropogenic pressure, the use of additional empirical data on species-specific sensitivity to disturbance can provide a more realistic input in the CAPTAIN analysis.
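This resampling amounts to an independent Bernoulli draw per species and unit; a sketch under the stated assumptions (uniform sensitivities, threat values in [0, 1], toy matrix sizes):

```python
import numpy as np

rng = np.random.default_rng(0)

n_species, n_units = 100, 500            # toy sizes; the real dataset has 1,517 species and 22,394 units
predicted = rng.random((n_species, n_units)) < 0.05   # stand-in for the SDM presence/absence matrix
threat = rng.random(n_units)                           # D^c, disturbance ('threat') per unit

sensitivity = rng.uniform(0, 1, n_species)             # d_s ~ U(0, 1)
p_present = 1.0 - sensitivity[:, None] * threat        # p_s^c = 1 - d_s * D^c
# a species is retained in a unit only where the SDM predicts it, with probability p_s^c
resampled = predicted & (rng.random((n_species, n_units)) < p_present)
```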
We repeated this random resampling 50 times and analysed the resulting biodiversity data in CAPTAIN using the one-time protection model, trained through simulations in the experiments described in the previous section and in the Supplementary Methods. We note that it is possible, and perhaps desirable, in principle to train a new model specifically for this empirical dataset or at least fine-tune a model pretrained through simulations (a technique known as transfer learning), for instance, using historical time series and future projections of land use and climate change. Yet, our experiment shows that even a model trained solely using simulated datasets can be successfully applied to empirical data. Following Carrasco et al.33, we set as the target of our policy the protection of at least 10% of each species range. To achieve this in CAPTAIN, we modified the monitoring action such that a species is counted as protected only when at least 10% of its range falls within already protected units. We ran the CAPTAIN analysis for a single step, in which all protection units are established.
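The modified monitoring criterion reduces to checking, for each species, whether at least 10% of its occupied units fall within the protected set; a minimal sketch with assumed names:

```python
import numpy as np

def species_target_met(presence, protected, target=0.10):
    """Check which species have at least `target` of their range protected.

    presence:  boolean array (n_species, n_units), resampled presence/absence.
    protected: boolean array (n_units,), units selected for protection.
    """
    range_size = presence.sum(axis=1)
    protected_range = (presence & protected).sum(axis=1)
    frac = np.divide(protected_range, range_size,
                     out=np.zeros(len(range_size), float), where=range_size > 0)
    return frac >= target
```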
We analysed the same resampled datasets using Marxan with the initial budget used in the CAPTAIN analyses and under two configurations. First, we used a boundary length modifier (BLM = 0.1) to penalize the establishment of non-adjacent protected units, following the settings used in Carrasco et al.33. After some testing, as suggested in Marxan's manual45, we set penalties on exceeding the budget, such that the cost of the optimized results does not exceed the total budget (THRESHPEN1 = 500, THRESHPEN2 = 10). For each resampled dataset we ran 100 optimizations (with Marxan settings NUMITNS = 1,000,000, STARTTEMP = −1 and NUMTEMP = 10,000 (ref. 45)) and used the best of them as the final result. Second, because the BLM adds a constraint that does not have a direct equivalent in the CAPTAIN model, we also repeated the analyses without it (BLM = 0) for comparison.
To assess the performance of CAPTAIN and compare it with that of Marxan, we computed the fraction of replicates in which the target was met for all species, the average number of species for which the target was missed and the number of protected units (Supplementary Table 4). We also calculated the fraction of each species range included in protected units to compare it with the target of 10% (Fig. 6c,d and Supplementary Fig. 6c,d). Finally, we calculated the frequency at which each unit was selected for protection across the 50 resampled datasets as a measure of its relative importance (priority) in the conservation plan.
Reporting Summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.