| Title: | Build Species Distribution Modeling using 'caret' |
|---|---|
| Description: | Use machine learning algorithms and advanced geographic information system tools to build Species Distribution Modeling in an extensible and modern fashion. |
| Authors: | Luíz Fernando Esser [aut, cre, cph] (ORCID: <https://orcid.org/0000-0003-2982-7223>), Reginaldo Ré [aut] (ORCID: <https://orcid.org/0000-0001-6452-3466>), Marcos R. Lima [aut] (ORCID: <https://orcid.org/0000-0002-5901-0911>), Edivando Couto [aut] (ORCID: <https://orcid.org/0000-0003-4264-8449>), José Hilário Delconte Ferreira [aut] (ORCID: <https://orcid.org/0000-0002-7116-2600>), Valéria Batista [aut] (ORCID: <https://orcid.org/0000-0002-6574-7338>), Dayani Bailly [aut] (ORCID: <https://orcid.org/0000-0002-6954-9902>) |
| Maintainer: | Luíz Fernando Esser <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.9.5 |
| Built: | 2026-06-08 05:43:19 UTC |
| Source: | https://github.com/luizesser/caretsdm |
sdm_area
This function includes new predictors to the sdm_area object.
    add_predictors(sa, pred, variables_selected = NULL, gdal = TRUE,
                   lines_as_sdm_area = FALSE)

    get_predictors(i)
sa |
A |
pred |
|
variables_selected |
|
gdal |
Boolean. Force the use or not of GDAL when available. See details. |
lines_as_sdm_area |
Boolean. If |
i |
|
add_predictors returns an sdm_area object with the new predictors bound to its grid.
There are two ways to build the grid and resample the variables in sdm_area: with and
without GDAL. By default, if GDAL is available on your machine it is used (gdal = TRUE);
otherwise sf/stars is used. The lines_as_sdm_area and gdal parameters are passed
to the sdm_area function, so they are used in grid creation and in the resampling of
predictors. They are retrieved automatically from the sdm_area object.
For add_predictors, the same input sdm_area object is returned with the
pred data bound to the existing grid.
get_predictors retrieves the grid from the i object.
Luíz Fernando Esser ([email protected]) and Reginaldo Ré. https://luizfesser.wordpress.com
sdm_area get_predictor_names bioc
    # Create sdm_area object:
    sa <- sdm_area(parana, cell_size = 25000, output_crs = 6933)

    # Include predictors:
    sa <- add_predictors(sa, bioc)

    # Retrieve predictors data:
    get_predictors(sa)
sdm_area
This function includes scenarios in the sdm_area object.
    add_scenarios(sa, scen = NULL, scenarios_names = NULL, pred_as_scen = TRUE,
                  variables_selected = NULL, stationary = NULL, crop_area = NULL)

    set_scenarios_names(i, scenarios_names = NULL)

    scenarios_names(i)

    get_scenarios_data(i)

    select_scenarios(i, scenarios_names = NULL)
sa |
A |
scen |
|
scenarios_names |
Character vector with names of scenarios. |
pred_as_scen |
Logical. If |
variables_selected |
Character vector with variables names in |
stationary |
Names of variables from |
crop_area |
A |
i |
A |
The function add_scenarios adds scenarios to the sdm_area or input_sdm
object. If scen has variables that are not present as predictors, the function uses
only the variables present in both objects. stationary variables are those that do not change
across scenarios. This is useful, for example, for hydrological variables in fish habitat
modeling (see the examples below). When multiple scenarios are added in multiple runs, the
function always adds a new "current" scenario. To avoid that, set pred_as_scen = FALSE.
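The variable matching described above can be pictured with plain data frames. This is only a sketch of the idea, not the package's internal code: the tables, their values, and the helper name align_scenario are hypothetical, and real inputs would be sf or stars objects rather than data.frames.

```r
# Hypothetical predictor table: two climate variables plus one river attribute.
predictors <- data.frame(
  cell_id = 1:4,
  bio1 = c(18.2, 19.1, 20.4, 21.0),
  bio12 = c(1400, 1450, 1520, 1600),
  LENGTH_KM = c(2.1, 3.4, 1.8, 5.0)
)

# Hypothetical future scenario: it has bio4, which is not a predictor,
# and lacks LENGTH_KM, which does not change through time.
future <- data.frame(
  cell_id = 1:4,
  bio1 = c(20.0, 21.2, 22.5, 23.1),
  bio4 = c(310, 305, 298, 290),
  bio12 = c(1350, 1380, 1490, 1550)
)

# Hypothetical helper (not part of caretSDM).
align_scenario <- function(pred, scen, stationary = NULL) {
  # Keep only the variables present in both tables.
  shared <- intersect(names(pred), names(scen))
  out <- scen[, shared, drop = FALSE]
  # Copy stationary variables unchanged from the predictors, matched by cell.
  for (v in stationary) {
    out[[v]] <- pred[[v]][match(out$cell_id, pred$cell_id)]
  }
  out
}

future_aligned <- align_scenario(predictors, future, stationary = "LENGTH_KM")
names(future_aligned)
```

In this toy case bio4 is dropped because it is not a predictor, and LENGTH_KM keeps its present-day values in the future scenario.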
add_scenarios returns the input sdm_area or input_sdm object with a
new slot called scenarios with scen data as a list, where each slot of the
list holds a scenario and each scenario is a sf object.
set_scenarios_names sets new names for scenarios in sdm_area/input_sdm
object.
scenarios_names returns scenarios' names.
get_scenarios_data retrieves scenarios data as a list of sf objects.
select_scenarios selects scenarios from sdm_area/input_sdm object.
Luíz Fernando Esser ([email protected]) https://luizfesser.wordpress.com
    # Create sdm_area object:
    sa <- sdm_area(parana, cell_size = 100000, output_crs = 6933)

    # Include predictors:
    sa <- add_predictors(sa, bioc)

    # Include scenarios:
    sa <- add_scenarios(sa, scen[1:2]) |>
      select_predictors(c("bio1", "bio12"))

    # Set scenarios names:
    sa <- set_scenarios_names(sa, scenarios_names = c(
      "future_1", "future_2", "current"
    ))
    scenarios_names(sa)

    # Get scenarios data:
    scenarios_grid <- get_scenarios_data(sa)
    scenarios_grid

    # Select scenarios:
    sa <- select_scenarios(sa, scenarios_names = c("future_1"))

    # Setting stationary variables in scenarios:
    sa <- sdm_area(rivs[c(1:200), ],
                   cell_size = 100000, output_crs = 6933,
                   lines_as_sdm_area = TRUE) |>
      add_predictors(bioc) |>
      add_scenarios(scen, stationary = c("LENGTH_KM", "DIST_DN_KM"))
A data.frame with characteristics of each algorithm available in caretSDM. Each
column is a different characteristic. This can help more experienced modelers select
algorithms. See the source for a selection method using this data.
algorithms
## 'algorithms'
A data.frame with 230 rows and 60 columns:
Algorithms names
Algorithms attributes
<https://topepo.github.io/caret/models-clustered-by-tag-similarity.html>
This function obtains background data given a set of predictors.
    background(occ, pred = NULL, method = "random", n_set = 1, n_bg = 10000,
               proportion = NULL)

    n_background(i)

    background_method(i)

    background_data(i)
occ |
A |
pred |
A |
method |
Method to obtain the background data. One of: "random" or a custom function (see details). |
n_set |
|
n_bg |
|
proportion |
|
i |
A |
background is used in the SDM workflow to obtain background data, a step required by the
MaxEnt algorithm. This function helps avoid a very common mistake: using pseudoabsence data in
background algorithms and background data in pseudoabsence algorithms.
If the user provides a custom function, it must have the arguments env_sf and occ_sf,
which are two sf objects. The first has the predictor values for the whole study
area, while the second has the presence records for the species. The function must return a
vector with the cell_ids of the background data.
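A custom background function following the contract above might look like the sketch below. The function name, its extra n argument, and the assumption that occ_sf carries a cell_id column are hypothetical. Plain data.frames stand in for the two sf objects so the logic can be run without the package.

```r
# Hypothetical custom sampler: draw background cells at random, but never
# from cells that already hold a presence record.
bg_outside_presences <- function(env_sf, occ_sf, n = 100) {
  # Assumption: both inputs expose a cell_id column.
  candidates <- setdiff(env_sf$cell_id, occ_sf$cell_id)
  # Return everything if there are not more candidates than requested.
  if (length(candidates) <= n) {
    return(candidates)
  }
  sample(candidates, n)
}

# Stand-ins for the sf inputs (sf objects are data.frames with a geometry column).
env <- data.frame(cell_id = 1:50, bio1 = seq(15, 25, length.out = 50))
occ <- data.frame(cell_id = c(3, 7, 7, 20), species = "Araucaria angustifolia")

set.seed(42)
bg <- bg_outside_presences(env, occ, n = 10)
bg
```

The returned vector of cell_ids is what the text says background expects back from a custom method.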
n_background returns the number of background records obtained per species.
background_data returns a list named by species. Each species entry holds a
list of background data of class sf.
An occurrences_sdm or input_sdm object with background data.
Luíz Fernando Esser ([email protected]) https://luizfesser.wordpress.com
input_sdm pseudoabsences occurrences_sdm
get_occurrences get_predictors
    # Create sdm_area object:
    sa <- sdm_area(parana, cell_size = 25000, output_crs = 6933)

    # Include predictors:
    sa <- add_predictors(sa, bioc) |>
      select_predictors(c("bio1", "bio4", "bio12"))

    # Include scenarios:
    sa <- add_scenarios(sa)

    # Create occurrences:
    oc <- occurrences_sdm(occ, occ_crs = 6933)

    # Create input_sdm:
    i <- input_sdm(oc, sa)

    # Background generation:
    # (with proportion = 1, all available data is obtained as background data)
    i <- background(i, proportion = 1)
A stars object with bioclimatic variables (bio1, bio4 and bio12) for the Parana state in Brazil.
Data obtained from WorldClim 2.1 at 10 arc-min resolution.
bioc
## 'bioc'
A stars with 1 attribute and 3 bands:
bio1: Annual Mean Temperature
bio4: Temperature Seasonality
bio12: Annual Precipitation
<https://www.worldclim.org/>
Create buffer around records in occ_data to be used as study area
    buffer_sdm(occ_data, size = NULL, occ_crs = NULL, mcp = FALSE)
occ_data |
A |
size |
|
occ_crs |
|
mcp |
|
A sf buffer around occ_data records.
Luíz Fernando Esser ([email protected]) https://luizfesser.wordpress.com
    # Create a buffer around occurrence records to use as study area:
    study_area <- buffer_sdm(occ, size = 50000, occ_crs = 6933)
    plot(study_area)
This function computes the correlation between the outputs of different algorithms. By default it uses the predictions for the current scenario, but other scenarios can be tested.
    correlate_sdm(i, scenario = "current")
i |
A |
scenario |
A |
A data.frame with the Pearson correlation between projections.
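To make the returned table concrete, here is a minimal base R sketch of a Pearson correlation table between projections. The prediction values are simulated and the column names are only illustrative; this is not the internal code of correlate_sdm.

```r
set.seed(1)

# Hypothetical predictions for 100 cells from three algorithms.
base <- runif(100)
projections <- data.frame(
  naive_bayes = base,
  # Similar to the first algorithm, with some noise, clamped to [0, 1].
  kknn = pmin(pmax(base + rnorm(100, sd = 0.1), 0), 1),
  # Unrelated to the other two.
  mda = runif(100)
)

# Pairwise Pearson correlation between algorithm outputs.
cor_table <- as.data.frame(cor(projections, method = "pearson"))
round(cor_table, 2)
```

High off-diagonal values indicate algorithms that produce similar projections, which is useful to know before building an ensemble.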
Luíz Fernando Esser ([email protected]) https://luizfesser.wordpress.com
    if (interactive()) {
      # Create sdm_area object:
      set.seed(1)
      sa <- sdm_area(parana, cell_size = 100000, output_crs = 6933)

      # Include predictors:
      sa <- add_predictors(sa, bioc) |>
        select_predictors(c("bio1", "bio12"))

      # Include scenarios:
      sa <- add_scenarios(sa)

      # Create occurrences:
      oc <- occurrences_sdm(occ, occ_crs = 6933)

      # Create input_sdm:
      i <- input_sdm(oc, sa)

      # Pseudoabsence generation:
      i <- pseudoabsences(i, method = "random", n_set = 2)

      # Custom trainControl:
      ctrl_sdm <- caret::trainControl(
        method = "boot",
        number = 1,
        repeats = 1,
        classProbs = TRUE,
        returnResamp = "all",
        summaryFunction = summary_sdm,
        savePredictions = "all"
      )

      # Train models:
      i <- train_sdm(i, algo = c("naive_bayes"), ctrl = ctrl_sdm) |>
        suppressWarnings()

      # Predict models:
      i <- predict_sdm(i, th = 0.8)

      # Check correlations:
      correlate_sdm(i)
    }
Data cleaning wrapper using CoordinateCleaner package.
    data_clean(occ, pred = NULL, species = NA, lon = NA, lat = NA,
               capitals = TRUE, centroids = TRUE, duplicated = TRUE,
               identical = TRUE, institutions = TRUE, invalid = TRUE,
               terrestrial = TRUE, independent_test = TRUE, fun = NULL)
occ |
A |
pred |
A |
species |
A |
lon |
A |
lat |
A |
capitals |
Boolean to turn on/off the exclusion from countries capitals coordinates (see |
centroids |
Boolean to turn on/off the exclusion from countries centroids coordinates (see |
duplicated |
Boolean to turn on/off the exclusion from duplicated records (see |
identical |
Boolean to turn on/off the exclusion from records with identical lat/long values (see |
institutions |
Boolean to turn on/off the exclusion from biodiversity institutions coordinates (see |
invalid |
Boolean to turn on/off the exclusion from invalid coordinates (see |
terrestrial |
Boolean to turn on/off the exclusion from coordinates falling on sea (see |
independent_test |
Boolean. If |
fun |
Function. A custom function to apply to occurrence data. It must receive a |
If the user did not use the GBIF_data function to obtain species records, the function may
have trouble finding which columns of the presences table hold species, longitude and latitude
information. For this reason, the parameters species, lon and
lat let the user state explicitly which columns to use. If they remain NA
(the default), the function tries to guess the correct columns.
An occurrences_sdm or input_sdm object with cleaned presence data.
Luíz Fernando Esser ([email protected]) https://luizfesser.wordpress.com
GBIF_data occurrences_sdm sdm_area input_sdm
get_predictor_names
    # Create sdm_area object:
    sa <- sdm_area(parana, cell_size = 50000, output_crs = 6933)

    # Include predictors:
    sa <- add_predictors(sa, bioc) |>
      select_predictors(c("bio1", "bio12"))

    # Create occurrences:
    oc <- occurrences_sdm(occ, occ_crs = 6933)

    # Create input_sdm:
    i <- input_sdm(oc, sa)

    # Clean coordinates (terrestrial is set to FALSE to make the run quicker):
    i <- data_clean(i, terrestrial = FALSE)
Calculates ensemble predictions for species distribution models using custom or implemented methods.
    ensemble_sdm(m, scen = NULL, method = "average", metric = NULL, fun = NULL)

    get_ensembles(i, type = "matrix", spp_name = NULL, scenario = NULL,
                  ensemble_type = NULL)

    add_ensembles(e1, e2)
m |
A |
scen |
A |
method |
Character or a function. Which ensembles should be calculated? See details. |
metric |
Character. Used with |
fun |
Function. If |
i |
A |
type |
Character. Output format desired. One of |
spp_name |
Character or |
scenario |
Character or |
ensemble_type |
Character or |
e1 |
A |
e2 |
A |
Ensembles can be computed with three different strategies or with a custom function.
The three implemented strategies are:
average is the mean occurrence probability, a simple mean of predictions;
weighted_average is the same average, but weighted by a metric, which must be
set using the metric argument (see mean_validation_metrics for the metrics available);
committee_average is the committee average, also known as majority rule, where predictions
are binarized and then averaged. To binarize predictions, the user can pass a custom
function in the fun argument to calculate a threshold for each model. By default, the
committee average uses the caret::thresholder function to find the threshold that
maximizes the sum of sensitivity and specificity (through caretSDM:::.MaxSeSp).
Custom function (fun) must use the argument mod, which is the model output from
caret package (see get_models) and must return a numeric value (see example).
method can also be set to a custom function, which must receive the argument pred_mat,
which is a matrix of predictions (columns are models and rows are cells) and return a vector of
predictions (one value per cell). See the median example below for a custom function.
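The three strategies can be worked through by hand on a small prediction matrix. The sketch below is one plausible reading of the descriptions above, not the package's internal code; the predictions, the metric values (auc) and the thresholds (th) are all made up for illustration.

```r
# Hypothetical predictions: rows are cells, columns are models.
pred_mat <- matrix(
  c(0.9, 0.8, 0.4,
    0.2, 0.6, 0.1,
    0.7, 0.3, 0.5),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("cell_", 1:3), c("m1", "m2", "m3"))
)

# Hypothetical validation metric and binarization threshold per model.
auc <- c(m1 = 0.9, m2 = 0.7, m3 = 0.8)
th <- c(m1 = 0.5, m2 = 0.5, m3 = 0.45)

# average: simple mean of predictions per cell.
average <- rowMeans(pred_mat)

# weighted_average: mean of predictions weighted by the metric.
weighted_average <- as.vector(pred_mat %*% auc) / sum(auc)

# committee_average: binarize each model with its threshold, then average.
binary <- sweep(pred_mat, 2, th, FUN = ">=")
committee_average <- rowMeans(binary)

cbind(average, weighted_average, committee_average)
```

For cell_1, two of the three models exceed their thresholds, so its committee average is 2/3, while its simple average is 0.7.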
get_predictions returns the list of all predictions to all scenarios, all species,
all algorithms and all repetitions. Useful for those who wish to implement their own ensemble
methods.
get_ensembles returns a matrix of data.frames, where each column is a
scenario and each row is a species.
scenarios_names returns the scenarios names in a sdm_area or input_sdm
object.
get_scenarios_data returns the data from scenarios in a sdm_area or
input_sdm object.
An input_sdm or a predictions object.
Luíz Fernando Esser ([email protected]) https://luizfesser.wordpress.com
sdm_area input_sdm mean_validation_metrics
    if (interactive()) {
      # Create sdm_area object:
      set.seed(1)
      sa <- sdm_area(parana, cell_size = 100000, output_crs = 6933)

      # Include predictors:
      sa <- add_predictors(sa, bioc) |>
        select_predictors(c("bio1", "bio12"))

      # Include scenarios:
      sa <- add_scenarios(sa)

      # Create occurrences:
      oc <- occurrences_sdm(occ, occ_crs = 6933)

      # Create input_sdm:
      i <- input_sdm(oc, sa)

      # Pseudoabsence generation:
      i <- pseudoabsences(i, method = "random", n_set = 2)

      # Custom trainControl:
      ctrl_sdm <- caret::trainControl(
        method = "boot",
        number = 1,
        repeats = 1,
        classProbs = TRUE,
        returnResamp = "all",
        summaryFunction = summary_sdm,
        savePredictions = "all"
      )

      # Train models:
      i <- train_sdm(i, algo = c("naive_bayes"), ctrl = ctrl_sdm) |>
        suppressWarnings()

      # Predict models:
      i <- predict_sdm(i, th = 0.8)

      # Ensemble:
      i <- ensemble_sdm(i, method = "average")
      i
    }

    # Example of a custom function to obtain the threshold that maximizes
    # the sensitivity plus specificity:
    MaxSeSp <- function(mod) {
      th <- caret::thresholder(mod,
        threshold = seq(0, 1, by = 0.001),
        final = TRUE,
        statistics = c("Sensitivity", "Specificity")
      )
      th <- th$prob_threshold[which.max(th$Sensitivity + th$Specificity)]
      if (length(th) > 1) mean(th) else th
    }

    # Example of a custom function to obtain ensembles using the median
    # instead of the mean:
    median_ensemble <- function(pred_mat) {
      apply(pred_mat, 1, median, na.rm = TRUE)
    }
This function is a wrapper to get records from GBIF using rgbif and return a
data.frame ready to be used in caretSDM.
    GBIF_data(s, file = NULL, as_df = FALSE, ...)
s |
|
file |
|
as_df |
Should the output be a |
... |
Arguments to pass on |
A data.frame with species occurrences data, or an occurrences object if
as_df = FALSE.
Luíz Fernando Esser ([email protected]) https://luizfesser.wordpress.com
https://www.gbif.org
    ## Select species names:
    # s <- c("Araucaria angustifolia", "Salminus brasiliensis")

    ## Run function:
    # oc <- GBIF_data(s)
An ensembling method to group different GCMs into one SSP scenario
    gcms_ensembles(i, gcms = NULL)
i |
A |
gcms |
GCM codes in |
An input_sdm object with grouped GCMs.
Luíz Fernando Esser ([email protected]) https://luizfesser.wordpress.com
GBIF_data occurrences_sdm sdm_area input_sdm
get_predictor_names
    if (interactive()) {
      # Create sdm_area object:
      set.seed(1)
      sa <- sdm_area(parana, cell_size = 100000, output_crs = 6933)

      # Include predictors:
      sa <- add_predictors(sa, bioc)

      # Include scenarios:
      sa <- add_scenarios(sa, scen) |>
        select_predictors(c("bio1", "bio12"))

      # Create occurrences:
      oc <- occurrences_sdm(occ, occ_crs = 6933)

      # Create input_sdm:
      i <- input_sdm(oc, sa)

      # Pseudoabsence generation:
      i <- pseudoabsences(i, method = "random", n_set = 2)

      # Custom trainControl:
      ctrl_sdm <- caret::trainControl(
        method = "boot",
        number = 1,
        classProbs = TRUE,
        returnResamp = "all",
        summaryFunction = summary_sdm,
        savePredictions = "all"
      )

      # Train models:
      i <- train_sdm(i,
        algo = c("naive_bayes"),
        ctrl = ctrl_sdm,
        variables_selected = c("bio1", "bio12")
      ) |>
        suppressWarnings()

      # Predict models:
      i <- predict_sdm(i, th = 0.8)

      # Ensemble:
      i <- ensemble_sdm(i, method = "average")
      i

      # Ensemble GCMs:
      i <- gcms_ensembles(i, gcms = c("ca", "mi"))
      i
    }
input_sdm
This function creates a new input_sdm object.
    input_sdm(...)

    add_input_sdm(i1, i2)
... |
Data to be used in SDMs. Can be a |
i1 |
A |
i2 |
A |
If an sdm_area is used, it can include predictors and scenarios. In this case,
input_sdm detects them and includes them as scenarios and predictors in its
output. Objects can be passed in any order, since the function works by
detecting their classes.
The returned object is used throughout the whole workflow to apply functions.
An input_sdm object containing:
grid |
|
bbox |
Four corners for the bounding box (class |
cell_size |
|
epsg |
|
predictors |
|
Luiz Fernando Esser ([email protected]) https://luizfesser.wordpress.com
    # Create sdm_area object:
    sa <- sdm_area(parana, cell_size = 50000, output_crs = 6933)

    # Include predictors:
    sa <- add_predictors(sa, bioc) |>
      select_predictors(c("bio1", "bio4", "bio12"))

    # Include scenarios:
    sa <- add_scenarios(sa, scen)

    # Create occurrences:
    oc <- occurrences_sdm(occ, occ_crs = 6933)

    # Create input_sdm:
    i <- input_sdm(oc, sa)
is_class functions to check caretSDM data classes.
These functions return a boolean indicating whether an object belongs to a given caretSDM class.
    is_input_sdm(x)

    is_sdm_area(x)

    is_occurrences(x)

    is_models(x)

    is_predictions(x)
x |
Object to be tested. |
Boolean.
Luíz Fernando Esser ([email protected]) https://luizfesser.wordpress.com
    # Create sdm_area object:
    sa <- sdm_area(parana, cell_size = 25000, output_crs = 6933)
    is_sdm_area(sa)
    is_input_sdm(sa)
Join cell_id data from an sdm_area to an occurrences object.
    join_area(occ, pred)
occ |
A |
pred |
A |
This function is key in the SDM workflow. It attaches cell_id values to occ, deletes
records outside pred and allows the use of pseudoabsences. It also checks whether the
CRS of occ and pred are equal; if they are not, the CRS of pred is used to
convert occ.
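The idea of attaching a cell_id to each record and dropping records outside the study area can be sketched in base R for a regular square grid. This is a deliberate simplification under stated assumptions: the grid definition, the row-major cell numbering and the helper assign_cell_id are hypothetical, both inputs are taken to share the same CRS already, and the package itself works with sf geometries instead.

```r
# Hypothetical regular grid: 4 columns x 3 rows of 50 km cells, origin at (0, 0).
grid <- list(xmin = 0, ymin = 0, cell_size = 50000, ncol = 4, nrow = 3)

# Hypothetical helper (not part of caretSDM).
assign_cell_id <- function(x, y, grid) {
  col <- floor((x - grid$xmin) / grid$cell_size) + 1
  row <- floor((y - grid$ymin) / grid$cell_size) + 1
  inside <- col >= 1 & col <= grid$ncol & row >= 1 & row <= grid$nrow
  # Row-major cell numbering; records outside the grid get NA.
  ifelse(inside, (row - 1) * grid$ncol + col, NA)
}

# Hypothetical records in projected coordinates (meters).
records <- data.frame(
  species = "Araucaria angustifolia",
  x = c(10000, 120000, 199999, 250000),
  y = c(10000, 60000, 149999, 20000)
)

records$cell_id <- assign_cell_id(records$x, records$y, grid)

# Drop records that fall outside the study area.
records <- records[!is.na(records$cell_id), ]
records
```

The last record lies east of the grid, so it is removed; the remaining three keep the id of the cell that contains them.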
An occurrences object with a cell_id for each record.
Luíz Fernando Esser ([email protected]) https://luizfesser.wordpress.com
occurrences_sdm sdm_area input_sdm
pseudoabsences
    # Create sdm_area object:
    sa <- sdm_area(parana, cell_size = 50000, output_crs = 6933)

    # Include predictors:
    sa <- add_predictors(sa, bioc) |>
      select_predictors(c("bio1", "bio4", "bio12"))

    # Include scenarios:
    sa <- add_scenarios(sa, scen)

    # Create occurrences:
    oc <- occurrences_sdm(occ, occ_crs = 6933) |>
      join_area(sa)
Apply multicollinearity calculation on predictors.
    multicollinearity_sdm(pred, method = NULL, variables_selected = NULL,
                          cumulative_proportion = 0.99, th = 0.5, ...)

    selected_variables(i)
pred |
A |
method |
Which method should be used to detect multicollinearity. Can be a |
variables_selected |
A vector with pre-selected variables names to filter variables. |
cumulative_proportion |
A |
th |
Threshold to be applied in VIF routine. See ?usdm::vifcor. |
... |
Further arguments to be passed to the applied method. |
i |
A |
multicollinearity_sdm is a wrapper function to run usdm::vifcor, usdm::vifstep or a pca
in caretSDM, but also provides a way to implement custom functions to reduce multicollinearity.
If user provides a custom function, it must have the arguments env_sf and occ_sf,
which will consist of two sfs. The first has the predictor values for the whole study
area, while the second has the presence records for the species. The function must return a
vector with selected variables.
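For readers unfamiliar with the statistic behind the VIF routines, the variance inflation factor of a variable is 1 / (1 - R²), where R² comes from regressing that variable on all the others. The base R sketch below computes it on simulated predictors; the data and the helper name vif_table are hypothetical, and this is not the usdm implementation.

```r
set.seed(1)

# Hypothetical predictors: bio4 is almost a linear function of bio1,
# while bio12 is independent of both.
n <- 200
bio1 <- rnorm(n, mean = 20, sd = 3)
pred <- data.frame(
  bio1 = bio1,
  bio4 = 2 * bio1 + rnorm(n, sd = 0.5),
  bio12 = rnorm(n, mean = 1500, sd = 200)
)

# Hypothetical helper: VIF of each column, from a regression on the others.
vif_table <- function(df) {
  vapply(names(df), function(v) {
    f <- reformulate(setdiff(names(df), v), response = v)
    r2 <- summary(lm(f, data = df))$r.squared
    1 / (1 - r2)
  }, numeric(1))
}

vif_values <- vif_table(pred)
vif_values
```

Collinear variables such as bio1 and bio4 get large values, whereas bio12 stays close to 1, which is the kind of signal a VIF-based selection acts on.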
An input_sdm or predictors object with VIF data.
Luíz Fernando Esser ([email protected]) https://luizfesser.wordpress.com
vif_predictors pca_predictors get_predictor_names
    # Create sdm_area object:
    sa <- sdm_area(parana, cell_size = 25000, output_crs = 6933)

    # Include predictors:
    sa <- add_predictors(sa, bioc) |>
      select_predictors(c("bio1", "bio4", "bio12"))

    # Include scenarios:
    sa <- add_scenarios(sa, scen)

    # Create occurrences:
    oc <- occurrences_sdm(occ, occ_crs = 6933)

    # Create input_sdm:
    i <- input_sdm(oc, sa)

    # VIF calculation:
    i <- multicollinearity_sdm(i, method = "vifcor", th = 0.5)
    i

    # Retrieve information about vif:
    vif_summary(i)
    selected_variables(i)

    # Example of custom function:
    custom_function <- function(env_sf, occ_sf) {
      env_df <- dplyr::select(sf::st_drop_geometry(env_sf), -"cell_id")
      correlations <- cor(env_df)
      col <- caret::findCorrelation(correlations, cutoff = 0.7)
      selected <- colnames(correlations)[-col]
      return(selected)
    }
A data.frame object with Araucaria angustifolia occurrence data obtained from GBIF and
filtered with Parana state sf.
occ
## 'occ'
A data.frame with 420 rows and 3 columns (EPSG:6933):
Species name
Longitude in meters
Latitude in meters
<https://www.gbif.org>
This function creates and manages occurrences objects.
    occurrences_sdm(occ, independent_test = NULL, p = 0.1, occ_crs = NULL,
                    independent_test_crs = NULL, crs = NULL, ...)

    n_records(i)

    species_names(i)

    get_coords(i)

    get_occurrences(i)

    occurrences_as_df(i)

    add_occurrences(oc1, oc2)
occ |
A |
independent_test |
Boolean. If |
p |
Numeric. Fraction of data to be used as independent test. Standard is 0.1. |
occ_crs |
Numeric. CRS of |
independent_test_crs |
Numeric. CRS of |
crs |
Deprecated. Use occ_crs instead. |
... |
A vector with column names addressing the columns with species names, longitude and
latitude, respectively, in |
i |
|
oc1 |
A |
oc2 |
A |
occ must have three columns: species, decimalLongitude and decimalLatitude. When occ is an sf
object, only a species column is necessary.
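As an illustration of the expected input shape, the sketch below builds a table with the three column names listed above and sets aside a fraction p of the records. The coordinates are simulated, and the splitting rule (a simple random sample of about p * n rows) is an assumption made for illustration; the text does not state how occurrences_sdm selects the independent test records.

```r
set.seed(1)

# Hypothetical occurrence table using the three expected column names.
occ_df <- data.frame(
  species = rep("Araucaria angustifolia", 50),
  decimalLongitude = runif(50, min = -54.6, max = -48.0),
  decimalLatitude = runif(50, min = -26.7, max = -22.5)
)

# Fraction of records held out as independent test data.
p <- 0.1

# Assumption: the held-out set is a simple random sample of rows.
test_rows <- sample(nrow(occ_df), size = round(p * nrow(occ_df)))
independent_test <- occ_df[test_rows, ]
training <- occ_df[-test_rows, ]

c(training = nrow(training), independent_test = nrow(independent_test))
```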
n_records returns the number of presence records for each species.
species_names returns the species names.
get_coords returns a data.frame with the coordinates of species records.
get_occurrences returns an sf with the coordinates of species records, species names and
cell_ids.
add_occurrences returns an occurrences object. This function combines two occurrences objects.
It can also combine an occurrences object with a data.frame object.
occurrences_as_df returns a data.frame with species names and coordinates.
An occurrences object.
Luíz Fernando Esser ([email protected]) https://luizfesser.wordpress.com
    # Create occurrences:
    oc <- occurrences_sdm(occ, occ_crs = 6933)
An sf object with a polygon for the Paraná state in Brazil. This is a subset of the
Brazilian map provided by the official government agency (IBGE).
parana
## 'parana'
A sf with 1 row and 5 columns:
State code
State's phone code
Name of the state
Abbreviation of the state's name
Geometry column of the sf
<https://www.ibge.gov.br/geociencias/cartas-e-mapas/bases-cartograficas-continuas/15759-brasil.html>
Transform predictors data into PCA-axes.
    pca_predictors(i, cumulative_proportion = 0.99)

    pca_summary(i)

    get_pca_model(i)
i |
A |
cumulative_proportion |
A |
pca_predictors transforms predictors data into PCA axes. If the user wants to use PCA axes
as future scenarios, then scenarios should be added after the PCA transformation (see examples).
pca_summary returns the summary of the prcomp function. See ?stats::prcomp.
get_pca_model returns the model built to calculate PCA axes.
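The transformation can be sketched with stats::prcomp directly. The example below assumes that cumulative_proportion means "keep the first axes whose cumulative share of variance reaches this value", which is an interpretation rather than something stated above. The predictor values are simulated, and the future table is a hypothetical scenario projected with the same fitted model.

```r
set.seed(1)

# Hypothetical predictor values for 200 cells.
n <- 200
bio1 <- rnorm(n, mean = 20, sd = 3)
pred <- data.frame(
  bio1 = bio1,
  bio4 = 2 * bio1 + rnorm(n, sd = 0.5),
  bio12 = rnorm(n, mean = 1500, sd = 200)
)

# Fit the PCA on centered and scaled predictors.
pca <- stats::prcomp(pred, center = TRUE, scale. = TRUE)

# Cumulative proportion of variance explained by the axes.
cum_prop <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

# Keep the first axes that reach the requested cumulative proportion.
cumulative_proportion <- 0.99
n_axes <- which(cum_prop >= cumulative_proportion)[1]
pca_axes <- pca$x[, seq_len(n_axes), drop = FALSE]

# Project a hypothetical future scenario onto the same axes.
future <- transform(pred, bio1 = bio1 + 2)
future_axes <- predict(pca, newdata = future)[, seq_len(n_axes), drop = FALSE]

dim(pca_axes)
```

Projecting scenarios with the model fitted on current predictors keeps the axes comparable across scenarios, which is why the fitted model is worth retrieving.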
An input_sdm object with variables from both predictors and scenarios
transformed into PCA axes.
Luíz Fernando Esser ([email protected]) https://luizfesser.wordpress.com
vif_predictors sdm_area add_scenarios add_predictors
# Create sdm_area object:
sa <- sdm_area(parana, cell_size = 50000, output_crs = 6933)
# Include predictors:
sa <- add_predictors(sa, bioc) |> select_predictors(c("bio1", "bio12"))
# Create occurrences:
oc <- occurrences_sdm(occ, occ_crs = 6933)
# Create input_sdm:
i <- input_sdm(oc, sa)
# PCA transformation:
i <- pca_predictors(i)
Obtain the Partial Dependence Plots (PDP) for each variable.
pdp_sdm(i, spp = NULL, algo = NULL, variables_selected = NULL, mean.only = FALSE)
get_pdp_sdm(i, spp = NULL, algo = NULL, variables_selected = NULL)
i |
A |
spp |
A |
algo |
A |
variables_selected |
A |
mean.only |
Boolean. Should only the mean curve be plotted, or should a curve for each run be included? Default is FALSE. |
A plot (for pdp_sdm) or a data.frame (for get_pdp_sdm) with PDP values.
Luíz Fernando Esser ([email protected]) https://luizfesser.wordpress.com
# Create sdm_area object:
sa <- sdm_area(parana, cell_size = 100000, output_crs = 6933)
# Include predictors:
sa <- add_predictors(sa, bioc) |> select_predictors(c("bio1", "bio12"))
# Include scenarios:
sa <- add_scenarios(sa)
# Create occurrences:
oc <- occurrences_sdm(occ, occ_crs = 6933)
# Create input_sdm:
i <- input_sdm(oc, sa)
# Pseudoabsence generation:
i <- pseudoabsences(i, method = "random", n_set = 3)
# Custom trainControl:
ctrl_sdm <- caret::trainControl(
  method = "repeatedcv",
  number = 2,
  repeats = 1,
  classProbs = TRUE,
  returnResamp = "all",
  summaryFunction = summary_sdm,
  savePredictions = "all"
)
# Train models:
i <- train_sdm(i, algo = c("naive_bayes"), ctrl = ctrl_sdm)
# PDP plots:
pdp_sdm(i)
get_pdp_sdm(i)
This function creates different plots depending on the input.
plot_occurrences(i, spp_name = NULL, pa = TRUE, pa_id = 1, ...)
plot_grid(i, ...)
plot_predictors(i, variables_selected = NULL, ...)
plot_scenarios(i, variables_selected = NULL, scenario = NULL, ...)
plot_predictions(i, spp_name = NULL, scenario = NULL, id = NULL, ...)
plot_ensembles(
  i,
  spp_name = NULL,
  scenario = NULL,
  id = NULL,
  ensemble_type = NULL,
  ...
)
mapview_grid(i)
mapview_occurrences(i, spp_name = NULL, pa = TRUE)
mapview_predictors(i, variables_selected = NULL)
mapview_scenarios(i, variables_selected = NULL, scenario = NULL)
mapview_predictions(i, spp_name = NULL, scenario = NULL, id = NULL)
mapview_ensembles(
  i,
  spp_name = NULL,
  scenario = NULL,
  id = NULL,
  ensemble_type = NULL
)
plot_background(i, variables_selected = NULL, ...)
plot_niche(
  i,
  spp_name = NULL,
  variables_selected = NULL,
  scenario = NULL,
  id = NULL,
  ensemble_type = NULL,
  raster = FALSE,
  ...
)
i |
Object to be plotted. Can be a |
spp_name |
A character with species to be plotted. If NULL, the first species is plotted. |
pa |
Boolean. Should pseudoabsences be plotted together? (not implemented yet.) |
pa_id |
The id of pseudoabsences to be plotted (only used when |
... |
Plotting arguments to pass to ggplot2 function. |
variables_selected |
A character vector with names of variables to be plotted. |
scenario |
description |
id |
The id of models to be plotted (only used when |
ensemble_type |
Character of the type of ensemble to be plotted. |
raster |
Should the niche be extrapolated to a raster covering all possible values in the environmental space? |
We implemented a bestiary of plots to help visualize the process and results. If you are not familiar with mapview, consider using it to better visualize maps.
The plot or mapview desired.
Luíz Fernando Esser ([email protected]) https://luizfesser.wordpress.com
This function projects SDM models to new scenarios.
predict_sdm(m, scen = NULL, metric = "ROC", th = 0.9, tp = "prob", file = NULL,
  add.current = TRUE)
get_predictions(i)
add_predictions(p1, p2)
m |
A |
scen |
A |
metric |
A character containing the metric in which the |
th |
Thresholds for metrics. Can be numeric or a function. |
tp |
Type of output to be retrieved. See details. |
file |
File to save predictions. |
add.current |
If current scenario is not available, predictors will be used as the current scenario. |
i |
A |
p1 |
A |
p2 |
A |
tp is a parameter passed on to caret to retrieve either the class probabilities
(tp="prob") or the raw output (tp="raw"), which can vary depending on the algorithm used, but
is usually one of the classes (a factor vector with presences and pseudoabsences).
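The difference between the two output types can be sketched in base R. This is only a stand-in: it uses stats::glm rather than caret, the data are simulated, and the 0.5 cutoff used to derive classes is an assumption made for the illustration.

```r
# Sketch: class probabilities ("prob"-like) versus predicted classes ("raw"-like).
set.seed(7)

# Hypothetical training data with a two-level response.
train <- data.frame(bio1 = rnorm(100))
train$occ <- factor(
  ifelse(train$bio1 + rnorm(100, sd = 0.5) > 0, "presence", "pseudoabsence"),
  levels = c("pseudoabsence", "presence")
)

# With a factor response, glm models the probability of the second level ("presence").
fit <- stats::glm(occ ~ bio1, data = train, family = stats::binomial())

# Hypothetical cells to project to.
new_cells <- data.frame(bio1 = c(-2, 0.1, 2))

# "prob"-like output: one probability column per class.
prob_presence <- stats::predict(fit, newdata = new_cells, type = "response")
prob <- data.frame(presence = prob_presence, pseudoabsence = 1 - prob_presence)

# "raw"-like output: a factor holding the predicted class (assumed 0.5 cutoff).
raw <- factor(
  ifelse(prob_presence >= 0.5, "presence", "pseudoabsence"),
  levels = levels(train$occ)
)

prob
raw
```

Probabilities keep the information needed for thresholding and ensembles later on, while class output has already discarded it.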
get_predictions returns the list of all predictions for all scenarios, all species,
all algorithms and all repetitions. Useful for those who wish to implement their own ensemble
methods.
scenarios_names returns the scenario names in a sdm_area or input_sdm
object.
get_scenarios_data returns the data from scenarios in a sdm_area or
input_sdm object.
An input_sdm or a predictions object.
Luíz Fernando Esser ([email protected]) https://luizfesser.wordpress.com
sdm_area input_sdm mean_validation_metrics
if (interactive()) {
  # Create sdm_area object:
  set.seed(1)
  sa <- sdm_area(parana, cell_size = 100000, output_crs = 6933)
  # Include predictors:
  sa <- add_predictors(sa, bioc) |> select_predictors(c("bio1", "bio12"))
  # Include scenarios:
  sa <- add_scenarios(sa)
  # Create occurrences:
  oc <- occurrences_sdm(occ, occ_crs = 6933)
  # Create input_sdm:
  i <- input_sdm(oc, sa)
  # Pseudoabsence generation:
  i <- pseudoabsences(i, method = "random", n_set = 2)
  # Custom trainControl:
  ctrl_sdm <- caret::trainControl(
    method = "boot",
    number = 1,
    repeats = 1,
    classProbs = TRUE,
    returnResamp = "all",
    summaryFunction = summary_sdm,
    savePredictions = "all"
  )
  # Train models:
  i <- train_sdm(i, algo = c("naive_bayes"), ctrl = ctrl_sdm) |> suppressWarnings()
  # Predict models:
  i <- predict_sdm(i, th = 0.8)
  i
}
Provides an automated way to visualize gain, loss, and stability of projections between different scenarios.
prediction_change_sdm(i, scenario = NULL, ensemble_type = NULL, species = NULL, th = 0.5)
i |
A |
scenario |
Character. One of the scenarios that were projected. Can be ensembles as well. |
ensemble_type |
Character. Type of ensemble to be used. Standard is NULL, but will return the average. |
species |
Character. Species to be analyzed. Standard is NULL. |
th |
Numeric. Threshold to binarize the ensemble. |
A plot comparing the current scenario with another scenario.
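The idea of binarizing with th and then comparing scenarios can be sketched with plain vectors. The suitability values, the class labels, and the use of >= for the cutoff are assumptions for illustration; the function itself works on the ensembles stored in an input_sdm object and returns a plot.

```r
# Sketch: classify cells as gain, loss, or stable from two suitability vectors.

# Hypothetical ensemble suitability per cell, current and future.
current <- c(0.10, 0.70, 0.80, 0.30, 0.55)
future  <- c(0.60, 0.20, 0.90, 0.10, 0.45)
th <- 0.5

# Binarize both scenarios with the same threshold (assumed: >= th is presence).
current_bin <- current >= th
future_bin  <- future >= th

# Compare the two binary maps cell by cell.
change <- ifelse(!current_bin & future_bin, "gain",
  ifelse(current_bin & !future_bin, "loss",
    ifelse(current_bin & future_bin, "stable presence", "stable absence")
  )
)

table(change)
```

Note that the last cell moves only from 0.55 to 0.45 yet is classified as a loss, so the choice of th can strongly affect the reported change.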
Luíz Fernando Esser ([email protected]) https://luizfesser.wordpress.com
if (interactive()) {
  # Create sdm_area object:
  set.seed(1)
  sa <- sdm_area(parana, cell_size = 100000, output_crs = 6933)
  # Include predictors:
  sa <- add_predictors(sa, bioc)
  # Include scenarios:
  sa <- add_scenarios(sa, scen) |> select_predictors(c("bio1", "bio12"))
  # Create occurrences:
  oc <- occurrences_sdm(occ, occ_crs = 6933)
  # Create input_sdm:
  i <- input_sdm(oc, sa)
  # Pseudoabsence generation:
  i <- pseudoabsences(i, method = "random", n_set = 2)
  # Custom trainControl:
  ctrl_sdm <- caret::trainControl(
    method = "boot",
    number = 1,
    classProbs = TRUE,
    returnResamp = "all",
    summaryFunction = summary_sdm,
    savePredictions = "all"
  )
  # Train models:
  i <- train_sdm(i,
    algo = c("naive_bayes"),
    ctrl = ctrl_sdm,
    variables_selected = c("bio1", "bio12")
  ) |> suppressWarnings()
  # Predict models:
  i <- predict_sdm(i, th = 0.8)
  # Ensemble:
  i <- ensemble_sdm(i, method = "average")
  # Ensemble GCMs:
  i <- gcms_ensembles(i, gcms = c("ca", "mi"))
  i
  # Change Analysis
  prediction_change_sdm(i, scenario = "_ssp585_2090", ensemble_type = "mean_occ_prob")
}
Print method for ensembles
## S3 method for class 'ensembles'
print(x, ...)
x |
ensembles object |
... |
passed to other methods |
Concatenate structured characters to showcase what is stored in the object.
Print method for input_sdm
## S3 method for class 'input_sdm'
print(x, ...)
x |
input_sdm object |
... |
passed to other methods |
Concatenate structured characters to showcase what is stored in the object.
Print method for models
## S3 method for class 'models'
print(x, ...)
x |
models object |
... |
passed to other methods |
Concatenate structured characters to showcase what is stored in the object.
Print method for occurrences
## S3 method for class 'occurrences'
print(x, ...)
x |
occurrences object |
... |
passed to other methods |
Concatenate structured characters to showcase what is stored in the object.
Print method for predictions
## S3 method for class 'predictions'
print(x, ...)
x |
predictions object |
... |
passed to other methods |
Concatenate structured characters to showcase what is stored in the object.
This function obtains pseudoabsences given a set of predictors.
pseudoabsences(occ, pred = NULL, method = "random", n_set = 10, n_pa = NULL,
  variables_selected = NULL, th = 0, size = 1, size_crs = 4326, mcp = FALSE)
n_pseudoabsences(i)
pseudoabsence_method(i)
pseudoabsence_data(i)
occ |
A |
pred |
A |
method |
Method to create pseudoabsences. One of: "random", "bioclim", "mahal.dist" or "buffer_sdm". User can also provide a custom function (see details). |
n_set |
|
n_pa |
|
variables_selected |
A vector with variables names to be used while building pseudoabsences. Only used when method is not "random". |
th |
|
size |
|
size_crs |
|
mcp |
|
i |
A |
pseudoabsences is used in the SDM workflow to obtain pseudoabsences, a step necessary for
most of the algorithms to run. We implemented four methods: "random", which is
self-explanatory, "buffer_sdm", "mahal.dist" and "bioclim". The last two are
built on the idea that pseudoabsences should be environmentally different from presences. Thus,
we implemented two presence-only methods to infer the distribution of the species. "bioclim"
uses an envelope approach (bioclimatic envelope), while "mahal.dist" uses a distance
approach (Mahalanobis distance). The th parameter enters here as a threshold to binarize those
results. Pseudoabsences are retrieved outside the projected distribution of the species. If the user
provides a custom function, it must have the arguments env_sf and occ_sf, which will
be two sf objects. The first has the predictor values for the whole study area, while
the second has the presence records for the species. The function must return a vector with
the cell_ids of the pseudoabsences; cell_id is the first column of both objects. For buffer_sdm,
the user needs to specify the size of the buffer in units compatible with the buffer CRS.
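The distance-based idea described above can be sketched with stats::mahalanobis on a plain data.frame. Everything here is hypothetical: the environmental table, the presence cells, and the use of a 95% quantile of presence distances as the binarization cutoff. It is not the package's "mahal.dist" implementation, only an illustration of sampling pseudoabsences outside an environmentally defined distribution.

```r
# Sketch: pseudoabsences drawn from cells environmentally distant from presences.
set.seed(1)

# Hypothetical environmental table for the study area (one row per grid cell).
env <- data.frame(
  cell_id = 1:500,
  bio1    = runif(500, 10, 30),
  bio12   = runif(500, 500, 2500)
)

# Hypothetical presences: cells in the warm and wet corner of the environmental space.
presence_ids <- env$cell_id[env$bio1 > 22 & env$bio12 > 1600]
pres <- env[env$cell_id %in% presence_ids, c("bio1", "bio12")]

# Squared Mahalanobis distance of every cell to the presence centroid.
d2 <- stats::mahalanobis(
  env[, c("bio1", "bio12")],
  center = colMeans(pres),
  cov = stats::cov(pres)
)

# Assumed binarization: cells beyond the 95% quantile of presence distances
# are treated as outside the projected distribution.
cutoff <- stats::quantile(d2[env$cell_id %in% presence_ids], probs = 0.95)
candidates <- env$cell_id[d2 > cutoff & !(env$cell_id %in% presence_ids)]

# Draw as many pseudoabsences as presences, limited by the candidates available.
n_pa <- min(length(presence_ids), length(candidates))
pa_ids <- sample(candidates, n_pa)

head(pa_ids)
```

A custom function passed to method would follow the same pattern but receive env_sf and occ_sf and return the sampled cell_ids.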
n_pseudoabsences returns the number of pseudoabsences obtained per species.
pseudoabsence_method returns the method used to obtain pseudoabsences.
pseudoabsence_data returns a list named by species. Each species entry holds a
list with pseudoabsence data of class sf.
An occurrences_sdm or input_sdm object with pseudoabsence data.
Luíz Fernando Esser ([email protected]) https://luizfesser.wordpress.com
input_sdm background occurrences_sdm get_occurrences
get_predictors
# Create sdm_area object:
sa <- sdm_area(parana, cell_size = 25000, output_crs = 6933)
# Include predictors:
sa <- add_predictors(sa, bioc) |> select_predictors(c("bio1", "bio4", "bio12"))
# Include scenarios:
sa <- add_scenarios(sa)
# Create occurrences:
oc <- occurrences_sdm(occ[1:50, ], occ_crs = 6933)
# Create input_sdm:
i <- input_sdm(oc, sa)
# Pseudoabsence generation:
i0 <- pseudoabsences(i, method = "random")
# Custom method example:
buffer_pa_custom <- function(env_sf, occ_sf, buffer_dist = 3) {
  # Create buffer around occurrence points
  buffer <- sf::st_buffer(occ_sf, dist = buffer_dist)
  # Union buffers into a single geometry
  buffer_union <- sf::st_union(buffer)
  # Identify cells outside the buffer
  outside_buffer <- sf::st_difference(env_sf, buffer_union)[, 1]
  # Randomly extract cell_ids outside the buffer
  pa_ids_sample <- sample(outside_buffer$cell_id, nrow(occ_sf))
  return(pa_ids_sample)
}
i1 <- pseudoabsences(i, method = buffer_pa_custom)
An sf LINESTRING object with hydrologic variables (LENGTH_KM and DIST_DN_KM) for the state of
Paraná, Brazil. Data obtained from HydroSHEDS for river flows >= 10 m³/s.
rivs
## 'rivs'
An sf with 1031 features and 2 fields:
LENGTH_KM: Length of the river reach segment, in kilometers.
DIST_DN_KM: Distance from the reach outlet, i.e., the most downstream pixel of the reach, to the final downstream location along the river network, in kilometers. This downstream location is either the pour point into the ocean or an endorheic sink.
<https://www.hydrosheds.org/>
A data.frame object with Salminus brasiliensis occurrence data obtained from GBIF and
filtered with Parana state sf.
salm
## 'salm'
A data.frame with 46 rows and 3 columns (EPSG:6933):
Species name
Longitude in meters
Latitude in meters
<https://www.gbif.org>
A stars object with bioclimatic variables (bio1, bio4 and bio12) and four future scenarios for
the Parana state in Brazil. Data from MIROC6 GCM from WorldClim 2.1 at 10 arc-min resolution.
scen
## 'scen'
A stars with 4 attributes and 3 bands:
Intermediate scenario for the year 2090 and GCM CanESM5
Extreme scenario for the year 2090 and GCM CanESM5
Intermediate scenario for the year 2090 and GCM MIROC6
Extreme scenario for the year 2090 and GCM MIROC6
bio1: Annual Mean Temperature
bio4: Temperature Seasonality
bio12: Annual Precipitation
<https://www.worldclim.org/>
A stars object with bioclimatic variables (bio1, bio4 and bio12) and four future scenarios for
the Rio Grande do Sul state in Brazil. Data from MIROC6 GCM from WorldClim 2.1 at 10 arc-min
resolution.
scen_rs
## 'scen_rs'
A stars with 5 attributes and 3 bands:
Current scenario with the average values for the years 1970-2000
Intermediate scenario for the year 2090 and GCM CanESM5
Extreme scenario for the year 2090 and GCM CanESM5
Intermediate scenario for the year 2090 and GCM MIROC6
Extreme scenario for the year 2090 and GCM MIROC6
bio1: Annual Mean Temperature
bio4: Temperature Seasonality
bio12: Annual Precipitation
<https://www.worldclim.org/>
sdm_area object
This function creates a new sdm_area object.
sdm_area(x, cell_size = NULL, output_crs = NULL, variables_selected = NULL,
  gdal = TRUE, crop_by = NULL, lines_as_sdm_area = FALSE, crs = NULL)
get_sdm_area(i)
add_sdm_area(sa1, sa2)
x |
A shape or a raster. Usually a shape from |
cell_size |
|
output_crs |
|
variables_selected |
A |
gdal |
Boolean. Force the use or not of GDAL when available. See details. |
crop_by |
A shape from |
lines_as_sdm_area |
Boolean. If |
crs |
Deprecated. Use output_crs instead. |
i |
A |
sa1 |
A |
sa2 |
A |
The function returns a sdm_area object with a grid built upon the x parameter.
There are two ways to make the grid and resample the variables in sdm_area: with and
without gdal. By default, if gdal is available on your machine it will be used
(gdal = TRUE);
otherwise sf/stars will be used.
get_sdm_area returns the grid built by sdm_area.
add_sdm_area combines two sdm_area objects. As geoprocessing in caretSDM is
performed using sf objects, add_sdm_area simply applies rbind to the two
different areas.
A sdm_area object containing:
grid |
|
cell_size |
|
Luíz Fernando Esser ([email protected]) and Reginaldo Ré. https://luizfesser.wordpress.com
WorldClim_data parana input_sdm, add_predictors
# Create sdm_area object:
sa_area <- sdm_area(parana, cell_size = 50000, output_crs = 6933)
# Create sdm_area using a subset of rivs (lines):
sa_rivers <- sdm_area(rivs[c(1:100), ],
  cell_size = 100000, output_crs = 6933,
  lines_as_sdm_area = TRUE
)
sdm_as_X functions to transform caretSDM data into other classes.
These functions transform data from a caretSDM object to be used in other packages.
sdm_as_stars(x, what = NULL, spp = NULL, scen = NULL, id = NULL, ens = NULL)
sdm_as_raster(x, what = NULL, spp = NULL, scen = NULL, id = NULL, ens = NULL)
sdm_as_terra(x, what = NULL, spp = NULL, scen = NULL, id = NULL, ens = NULL)
x |
A |
what |
Sometimes multiple data inside |
spp |
|
scen |
|
id |
|
ens |
|
The output is the desired class.
Luíz Fernando Esser ([email protected]) https://luizfesser.wordpress.com
if (interactive()) {
  # Create sdm_area object:
  sa <- sdm_area(parana, cell_size = 100000, output_crs = 6933)
  # Include predictors:
  sa <- add_predictors(sa, bioc) |> select_predictors(c("bio1", "bio12"))
  # Include scenarios:
  sa <- add_scenarios(sa)
  # Create occurrences:
  oc <- occurrences_sdm(occ, occ_crs = 6933)
  # Create input_sdm:
  i <- input_sdm(oc, sa)
  # Pseudoabsence generation:
  i <- pseudoabsences(i, method = "random", n_set = 2)
  # Custom trainControl:
  ctrl_sdm <- caret::trainControl(
    method = "boot",
    number = 1,
    classProbs = TRUE,
    returnResamp = "all",
    summaryFunction = summary_sdm,
    savePredictions = "all"
  )
  # Train models:
  i <- train_sdm(i, algo = c("naive_bayes"), ctrl = ctrl_sdm) |> suppressWarnings()
  # Predict models:
  i <- predict_sdm(i, th = 0.8)
  # Transform in stars:
  sdm_as_stars(i)
}
Set of functions to facilitate the use of caretSDM through tidyverse grammar.
select_predictors(x, ...)
## S3 method for class 'sdm_area'
select(.data, ...)
## S3 method for class 'input_sdm'
select(.data, ...)
## S3 method for class 'sdm_area'
mutate(.data, ...)
## S3 method for class 'input_sdm'
mutate(.data, ...)
## S3 method for class 'sdm_area'
filter(.data, ..., .by, .preserve)
## S3 method for class 'input_sdm'
filter(.data, ..., .by, .preserve)
## S3 method for class 'occurrences'
filter(.data, ..., .by, .preserve)
filter_species(x, spp = NULL, ...)
x |
|
... |
|
.data |
Data to pass to tidyr function. |
.by |
See ?dplyr::filter. |
.preserve |
See ?dplyr::filter. |
spp |
Species to be filtered. |
The transformed sdm_area/input_sdm object.
# Create sdm_area object:
sa <- sdm_area(parana, cell_size = 25000, output_crs = 6933)
# Include predictors:
sa <- add_predictors(sa, bioc) |> select_predictors(c("bio1", "bio4", "bio12"))
These functions manage predictor names in sdm_area objects.
get_predictor_names(x)
## S3 method for class 'input_sdm'
set_predictor_names(x, new_names)
## S3 method for class 'sdm_area'
set_predictor_names(x, new_names)
test_variables_names(sa, scen)
set_variables_names(s1 = NULL, s2 = NULL, new_names = NULL)
x |
A |
new_names |
A |
sa |
A |
scen |
A |
s1 |
A |
s2 |
A |
These functions are available so users can modify predictor names to better represent them. Use
them carefully to avoid giving wrong names to the predictors. They are useful to make sure the
predictor names match the names in scenarios.
test_variables_names tests if the variables in a stars object (scen argument)
match those in the given sdm_area object (sa argument).
set_variables_names sets the variable names of the s1 object to those of the s2 object
OR assigns new names to it.
get_predictor_names returns a character vector with predictor names.
test_variables_names returns a logical indicating whether all variables are equal in both
objects (TRUE) or not (FALSE).
set_variables_names returns the s1 object with new names provided by s2 or
new_names.
Luíz Fernando Esser ([email protected]) https://luizfesser.wordpress.com
# Create sdm_area object:
sa <- sdm_area(parana, cell_size = 50000, output_crs = 6933)
# Include predictors:
sa <- add_predictors(sa, bioc)
# Check predictors' names:
get_predictor_names(sa)
This function builds a meta-model (Layer 2) using the out-of-fold predictions from models trained in Layer 1.
stack_sdm(m, meta_algo = "glm", ctrl = NULL, ...)
m |
A |
meta_algo |
A character string specifying the algorithm for the meta-learner. |
ctrl |
A |
... |
Additional arguments passed to |
A stacked_models object.
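The two-layer idea (base models produce out-of-fold predictions, a meta-model learns from them) can be sketched in base R. This is a simplified stand-in, not the stack_sdm implementation: the data are simulated, both Layer 1 learners and the meta-learner are stats::glm fits, and the 5-fold split is an arbitrary choice.

```r
# Sketch: stacking with out-of-fold (OOF) predictions, base R only.
set.seed(123)

# Hypothetical presence/pseudoabsence data with two predictors.
n <- 300
dat <- data.frame(bio1 = rnorm(n), bio12 = rnorm(n))
dat$presence <- rbinom(n, 1, plogis(1.5 * dat$bio1 - 1.0 * dat$bio12))

# Layer 1: two hypothetical base learners, each a glm on a single predictor.
base_formulas <- list(m1 = presence ~ bio1, m2 = presence ~ bio12)

# Assign every row to one of k folds.
k <- 5
fold <- sample(rep(seq_len(k), length.out = n))

# Each row is predicted only by models that did not see it during fitting.
oof <- matrix(NA_real_,
  nrow = n, ncol = length(base_formulas),
  dimnames = list(NULL, names(base_formulas))
)
for (f in seq_len(k)) {
  held_out <- fold == f
  for (m in names(base_formulas)) {
    fit <- stats::glm(base_formulas[[m]],
      data = dat[!held_out, ], family = stats::binomial()
    )
    oof[held_out, m] <- stats::predict(fit, newdata = dat[held_out, ], type = "response")
  }
}

# Layer 2: the meta-model uses the OOF predictions as its predictors.
meta_data <- data.frame(presence = dat$presence, oof)
meta_model <- stats::glm(presence ~ m1 + m2, data = meta_data, family = stats::binomial())

stats::coef(meta_model)
```

Using out-of-fold rather than in-sample predictions keeps the meta-model from simply rewarding whichever base learner overfits the training data most.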
Luíz Fernando Esser ([email protected]) https://luizfesser.wordpress.com
input_sdm sdm_area algorithms train_sdm
# Create sdm_area object:
sa <- sdm_area(parana, cell_size = 100000, output_crs = 6933)
# Include predictors:
sa <- add_predictors(sa, bioc) |> select_predictors(c("bio1", "bio12"))
# Include scenarios:
sa <- add_scenarios(sa)
# Create occurrences:
oc <- occurrences_sdm(occ, occ_crs = 6933)
# Create input_sdm:
i <- input_sdm(oc, sa)
# Pseudoabsence generation:
i <- pseudoabsences(i, method = "random")
# Custom trainControl:
ctrl_sdm <- caret::trainControl(
  method = "repeatedcv",
  number = 2,
  repeats = 1,
  classProbs = TRUE,
  returnResamp = "all",
  summaryFunction = summary_sdm,
  savePredictions = "all"
)
# Train models:
i <- train_sdm(i, algo = c("naive_bayes", "kknn"), ctrl = ctrl_sdm) |> suppressWarnings()
# Train stacked ensemble:
i <- stack_sdm(i, meta_algo = "nnet", ctrl = ctrl_sdm)
This function is used in caret::trainControl(summaryFunction=summary_sdm) to calculate
performance metrics across resamples.
summary_sdm(data, lev = NULL, model = NULL, custom_fun = NULL)
summary_sdm_presence_only(data, lev, threshold)
validate_on_independent_data(model, data_independent, obs_col_name)
data |
A |
lev |
A |
model |
Models names taken from |
custom_fun |
A custom function to be applied in models (not yet implemented). |
threshold |
Threshold for presence-only models. |
data_independent |
Independent data.frame used to calculate metrics. |
obs_col_name |
The name of the column with observed values. |
See ?caret::defaultSummary for more details and options to pass on to
caret::trainControl.
An input_sdm or a predictions object.
Luíz Fernando Esser ([email protected]) https://luizfesser.wordpress.com
# Create sdm_area object:
sa <- sdm_area(parana, cell_size = 100000, output_crs = 6933)
# Include predictors:
sa <- add_predictors(sa, bioc) |> select_predictors(c("bio1", "bio12"))
# Include scenarios:
sa <- add_scenarios(sa)
# Create occurrences:
oc <- occurrences_sdm(occ, occ_crs = 6933)
# Create input_sdm:
i <- input_sdm(oc, sa)
# Pseudoabsence generation:
i <- pseudoabsences(i, method = "random")
# Custom trainControl:
ctrl_sdm <- caret::trainControl(
  method = "repeatedcv",
  number = 2,
  repeats = 1,
  classProbs = TRUE,
  returnResamp = "all",
  summaryFunction = summary_sdm,
  savePredictions = "all"
)
# Train models:
i <- train_sdm(i, algo = c("naive_bayes"), ctrl = ctrl_sdm) |> suppressWarnings()
This function is a wrapper to fit models in caret using caretSDM data.
train_sdm(occ, pred = NULL, algo, ctrl = NULL, variables_selected = NULL,
  parallel = FALSE, ...)
get_tune_length(i)
algorithms_used(i)
get_models(i)
get_validation_metrics(i)
mean_validation_metrics(i)
models_hyperparameters(i)
add_models(m1, m2)
occ |
A |
pred |
A |
algo |
A |
ctrl |
A |
variables_selected |
A |
parallel |
Should a parallelization method be used (not yet implemented)? |
... |
Additional arguments to be passed to |
i |
A |
m1 |
A |
m2 |
A |
The algorithms object holds a table comparing the available algorithms. If the function
detects that the packages required by an algorithm are not installed, it will ask to install
them. This happens only the first time you use the algorithm.
caret::trainControl offers multiple resources for validation and model tuning. Make sure
to understand its parameters beforehand. As it is a key function in the modeling process, we also
implemented spatial cross-validation for it. You can set the method to cv_spatial or
cv_cluster, and train_sdm will detect that and apply the method according to the
blockCV package.
get_tune_length returns the length used in the grid search for tuning.
algorithms_used returns the names of the algorithms used in the modeling process.
get_models returns a list with the trained models (class train) for each species.
get_validation_metrics returns a list with a data.frame for each species
containing the complete values of ROC, Sensitivity and Specificity, with their respective
standard deviations (SD), and TSS for each of the algorithms and pseudoabsence datasets used.
mean_validation_metrics returns a list with a tibble for each species
summarizing the values of ROC, Sensitivity, Specificity and TSS for each of the algorithms used.
models_hyperparameters returns the hyperparameters that gave the best tuning for each
model and each species.
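To show how the metrics listed above relate to each other, the following base R sketch computes Sensitivity, Specificity and TSS (the True Skill Statistic, commonly defined as Sensitivity + Specificity - 1) from a small confusion matrix. The observed and predicted vectors are invented for illustration, and this is not necessarily how summary_sdm computes the metrics internally.

```r
# Hypothetical observed and predicted classes (illustration only).
observed <- factor(
  c(rep("presence", 4), rep("pseudoabsence", 4)),
  levels = c("presence", "pseudoabsence")
)
predicted <- factor(
  c("presence", "presence", "presence", "pseudoabsence",
    "pseudoabsence", "pseudoabsence", "pseudoabsence", "presence"),
  levels = c("presence", "pseudoabsence")
)

# Confusion matrix: rows are predictions, columns are observations.
cm <- table(predicted, observed)

# Share of observed presences predicted as presences.
sensitivity <- cm["presence", "presence"] / sum(cm[, "presence"])
# Share of observed pseudoabsences predicted as pseudoabsences.
specificity <- cm["pseudoabsence", "pseudoabsence"] / sum(cm[, "pseudoabsence"])

# True Skill Statistic.
tss <- sensitivity + specificity - 1
tss
```

TSS ranges from -1 to 1, with values near 0 indicating performance no better than random.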
A models or an input_sdm object.
Luíz Fernando Esser ([email protected]) https://luizfesser.wordpress.com
# Create sdm_area object:
sa <- sdm_area(parana, cell_size = 100000, output_crs = 6933)
# Include predictors:
sa <- add_predictors(sa, bioc) |>
  select_predictors(c("bio1", "bio12"))
# Include scenarios:
sa <- add_scenarios(sa)
# Create occurrences:
oc <- occurrences_sdm(occ, occ_crs = 6933)
# Create input_sdm:
i <- input_sdm(oc, sa)
# Pseudoabsence generation:
i <- pseudoabsences(i, method = "random")
# Custom trainControl:
ctrl_sdm <- caret::trainControl(
  method = "repeatedcv",
  number = 2,
  repeats = 1,
  classProbs = TRUE,
  returnResamp = "all",
  summaryFunction = summary_sdm,
  savePredictions = "all"
)
# Train models:
i <- train_sdm(i, algo = c("naive_bayes"), ctrl = ctrl_sdm) |>
  suppressWarnings()
This function calculates t-SNE with presence and pseudoabsence data and returns a list of plots.
tsne_sdm(occ, pred = NULL, variables_selected = NULL)
occ |
A |
pred |
A |
variables_selected |
Variable to be used in t-SNE. It can also be 'vif', if previously calculated. |
A list of plots, where each plot is a t-SNE for a given pseudoabsence dataset.
Luíz Fernando Esser ([email protected]) https://luizfesser.wordpress.com
This function retrieves the tune grid used to build models.
tuneGrid_sdm(i)
i |
A |
A list of data.frames, each one holding the tune grid of a given model.
Luíz Fernando Esser ([email protected]) https://luizfesser.wordpress.com
# Create sdm_area object:
set.seed(1)
sa <- sdm_area(parana, cell_size = 100000, output_crs = 6933)
# Include predictors:
sa <- add_predictors(sa, bioc) |>
  select_predictors(c("bio1", "bio12"))
# Include scenarios:
sa <- add_scenarios(sa)
# Create occurrences:
oc <- occurrences_sdm(occ, occ_crs = 6933)
# Create input_sdm:
i <- input_sdm(oc, sa)
# Pseudoabsence generation:
i <- pseudoabsences(i, method = "random", n_set = 2)
# Custom trainControl:
ctrl_sdm <- caret::trainControl(
  method = "boot",
  number = 1,
  repeats = 1,
  classProbs = TRUE,
  returnResamp = "all",
  summaryFunction = summary_sdm,
  savePredictions = "all"
)
# Train models:
i <- train_sdm(i, algo = c("naive_bayes"), ctrl = ctrl_sdm) |>
  suppressWarnings()
# Retrieve tuneGrid from model:
tuneGrid_sdm(i)
This function sets the parameters to run an ESM when running train_sdm.
use_esm(i, spp = NULL, n_records = 20)
i |
A |
spp |
A vector with the names of the species to which the ESM must be applied. Standard is NULL. |
n_records |
Numeric. Record-count threshold: the ESM is applied to species with fewer records than this value. Standard is 20. |
We supply two different ways to apply the ESM. If species names are provided, the ESM will be
applied only to the given species. If a number of species records is provided, the ESM will be
applied to every species with a number of records below the given threshold. As standard,
use_esm will be applied to every species with fewer than 20 records.
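The threshold mode described above can be pictured with a few lines of base R. This is only a sketch of the selection logic as the text describes it, using an invented table of records; use_esm itself may implement the selection differently.

```r
# Hypothetical occurrence records: one row per record (illustration only).
records <- data.frame(
  species = c(rep("Species a", 35), rep("Species b", 12), rep("Species c", 19))
)

# Same default threshold as use_esm(n_records = 20).
n_records <- 20

# Count records per species and keep the species below the threshold.
counts <- table(records$species)
esm_species <- names(counts)[counts < n_records]
esm_species
```

Passing species names through spp would correspond to skipping the counting step and supplying the selection directly.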
An input_sdm or occurrences object with ESM parameters.
Luíz Fernando Esser ([email protected]) https://luizfesser.wordpress.com
# Create sdm_area object:
sa <- sdm_area(parana, cell_size = 100000, output_crs = 6933)
# Include predictors:
sa <- add_predictors(sa, bioc) |>
  select_predictors(c("bio1", "bio4", "bio12"))
# Include scenarios:
sa <- add_scenarios(sa)
# Create occurrences:
oc <- occurrences_sdm(occ, occ_crs = 6933)
# Create input_sdm:
i <- input_sdm(oc, sa)
# Use ESM:
i <- use_esm(i, n_records = 999)
This function sums all species records into one. Should be used before the data cleaning routine.
use_mem(i, add = TRUE, name = "MEM")
i |
A |
add |
Logical. Should the new MEM records be added to the pool ( |
name |
How should the new records be named? Standard is "MEM". |
An input_sdm or occurrences object with MEM data.
Luíz Fernando Esser ([email protected]) https://luizfesser.wordpress.com
# Create sdm_area object:
sa <- sdm_area(parana, cell_size = 25000, output_crs = 6933)
# Include predictors:
sa <- add_predictors(sa, bioc) |>
  select_predictors(c("bio1", "bio4", "bio12"))
# Include scenarios:
sa <- add_scenarios(sa)
# Create occurrences:
oc <- occurrences_sdm(occ, occ_crs = 6933)
# Create input_sdm:
i <- input_sdm(oc, sa)
# Use MEM:
i <- use_mem(i)
This function retrieves variable importance as a function of ROC curves for each predictor.
varImp_sdm(m, id = NULL, ...)
m |
A |
id |
Vector of model ids to filter varImp calculation. |
... |
Parameters passed to caret::varImp(). |
A data.frame with variable importance data.
Luíz Fernando Esser ([email protected]) https://luizfesser.wordpress.com
# Create sdm_area object:
sa <- sdm_area(parana, cell_size = 100000, output_crs = 6933)
# Include predictors:
sa <- add_predictors(sa, bioc) |>
  select_predictors(c("bio1", "bio12"))
# Include scenarios:
sa <- add_scenarios(sa)
# Create occurrences:
oc <- occurrences_sdm(occ, occ_crs = 6933)
# Create input_sdm:
i <- input_sdm(oc, sa)
# Pseudoabsence generation:
i <- pseudoabsences(i, method = "random")
# Custom trainControl:
ctrl_sdm <- caret::trainControl(
  method = "repeatedcv",
  number = 2,
  repeats = 1,
  classProbs = TRUE,
  returnResamp = "all",
  summaryFunction = summary_sdm,
  savePredictions = "all"
)
# Train models:
i <- train_sdm(i, algo = c("naive_bayes"), ctrl = ctrl_sdm) |>
  suppressWarnings()
# Variable importance:
varImp_sdm(i)
Apply Variance Inflation Factor (VIF) calculation.
vif_predictors(pred, area = "all", th = 0.5, maxobservations = 5000,
  variables_selected = NULL)
vif_summary(i)
pred |
A |
area |
Character. Which area should be used in vif selection? Standard is |
th |
Threshold to be applied in VIF routine. See ?usdm::vifcor. |
maxobservations |
Max observations to use to calculate the VIF. |
variables_selected |
If there is a subset of predictors that should be used in this
function, it can be informed using this parameter. If set to |
i |
A |
vif_predictors is a wrapper function to run usdm::vifcor in caretSDM.
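For readers unfamiliar with the statistic itself, the sketch below computes the Variance Inflation Factor in base R as 1 / (1 - R²), where R² comes from regressing each predictor on all the others. The predictors are simulated and hypothetical, and the sketch does not reproduce the variable selection procedure of usdm::vifcor; it only illustrates what a high or low VIF looks like.

```r
set.seed(1)
n <- 200

# Hypothetical predictors: x2 is nearly collinear with x1, x3 is independent.
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
x3 <- rnorm(n)
preds <- data.frame(x1, x2, x3)

# VIF of a predictor = 1 / (1 - R^2), with R^2 taken from the linear
# regression of that predictor on all remaining predictors.
vif <- sapply(names(preds), function(v) {
  others <- setdiff(names(preds), v)
  fit <- lm(reformulate(others, response = v), data = preds)
  1 / (1 - summary(fit)$r.squared)
})
round(vif, 2)
```

In this simulated case x1 and x2 show large VIF values because each is almost fully explained by the other, while x3 stays close to 1.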
An input_sdm or predictors object with VIF data.
Luíz Fernando Esser ([email protected]) https://luizfesser.wordpress.com
# Create sdm_area object:
sa <- sdm_area(parana, cell_size = 25000, output_crs = 6933)
# Include predictors:
sa <- add_predictors(sa, bioc) |>
  select_predictors(c("bio1", "bio4", "bio12"))
# Include scenarios:
sa <- add_scenarios(sa, scen)
# Create occurrences:
oc <- occurrences_sdm(occ, occ_crs = 6933)
# Create input_sdm:
i <- input_sdm(oc, sa)
# VIF calculation:
i <- vif_predictors(i)
i
# Retrieve information about vif:
vif_summary(i)
selected_variables(i)
This function downloads data from WorldClim v.2.1 (https://www.worldclim.org/data/index.html), considering multiple GCMs, time periods and SSPs.
WorldClim_data(path = NULL, period = "current", variable = "bioc", year = "2090",
  gcm = "mi", ssp = "585", resolution = 10)
path |
Directory path to save downloads. |
period |
Can be "current" or "future". |
variable |
Which variables to retrieve. Possible entries are: "tmax", "tmin", "prec" and/or "bioc". |
year |
Specify the year you want to retrieve data. Possible entries are: "2030", "2050", "2070" and/or "2090". You can use a vector to provide more than one entry. |
gcm |
GCMs to be considered in future scenarios. You can use a vector to provide more than one entry. |
ssp |
SSPs for future data. Possible entries are: "126", "245", "370" and/or "585". You can use a vector to provide more than one entry. |
resolution |
You can select one resolution from the following alternatives: 10, 5 or 2.5 (arc-minutes), or 30 (arc-seconds). |
This function creates a folder in which all downloaded data is stored. Although it is possible to retrieve a lot of data at once, this is not recommended, since the files are very large.
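To get a feeling for how quickly a request grows, the sketch below enumerates the combinations implied by vectors of years, GCMs and SSPs. It assumes that every year/GCM/SSP combination corresponds to a separate download, which is an assumption made for illustration, and the GCM codes other than "mi" are placeholders rather than verified options.

```r
# Hypothetical request: two years, three GCMs, two SSPs.
# "gcm_b" and "gcm_c" are placeholder codes, not verified GCM options.
request <- expand.grid(
  year = c("2050", "2090"),
  gcm  = c("mi", "gcm_b", "gcm_c"),
  ssp  = c("126", "585"),
  stringsAsFactors = FALSE
)

# Number of combinations that such a request would imply.
n_downloads <- nrow(request)
n_downloads
```

Inspecting such a table before calling WorldClim_data makes it easier to split a large request into smaller ones.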
If the data has not been downloaded yet, the function downloads it. It has no return value.
Luíz Fernando Esser ([email protected]) https://luizfesser.wordpress.com
https://www.worldclim.org/data/index.html
## download data from multiple periods:
# year <- c("2050", "2090")
# WorldClim_data(path = "",
#                period = "future",
#                variable = "bioc",
#                year = year,
#                gcm = "mi",
#                ssp = "126",
#                resolution = 10)
## download data from one specific period
# WorldClim_data(path = "",
#                period = "future",
#                variable = "bioc",
#                year = "2070",
#                gcm = "mi",
#                ssp = "585",
#                resolution = 10)
This function exports caretSDM data.
write_ensembles(x, path = NULL, ext = ".tif", centroid = FALSE)
write_predictions(x, path = NULL, ext = ".tif", centroid = FALSE)
write_predictors(x, path = NULL, ext = ".tif", centroid = FALSE)
write_models(x, path = NULL)
write_gpkg(x, file_path, file_name)
## S3 method for class 'sdm_area'
write_gpkg(x, file_path, file_name)
write_occurrences(x, path = NULL, grid = FALSE, ...)
write_pseudoabsences(x, path = NULL, ext = ".csv", centroid = FALSE)
write_background(x, path = NULL, ext = ".csv", centroid = FALSE)
write_grid(x, path = NULL, centroid = FALSE)
write_validation_metrics(x, path = NULL)
x |
Object to be written. Can be of class |
path |
A path with filename and the proper extension (see details) or the directory to save files in. |
ext |
How should it be saved? |
centroid |
Should coordinates for the centroids of each cell be included? Standard is FALSE. |
file_path |
A path to save the |
file_name |
The name of the |
grid |
Boolean. Return a grid. |
... |
Arguments to pass to |
ext can be set according to the desired output. Possible values are .tif and .asc for
rasters and .csv for a spreadsheet, but also one of: c("bna", "csv", "e00", "gdb", "geojson",
"gml", "gmt", "gpkg", "gps", "gtm", "gxt", "jml", "map", "mdb", "nc", "ods", "osm", "pbf", "shp",
"sqlite", "vdv", "xls", "xlsx").
path should ideally provide only the folder. We recommend using:
results/what_are_you_writing. So, for writing ensembles, users are advised to run:
path = "results/ensembles"
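The recommended folder layout can be prepared in base R before writing. The sketch below builds it inside a temporary directory so that it leaves no files in the working directory. Whether the write_* functions create missing folders on their own is not stated here, so the folder is created explicitly, and the final call is shown commented out because it assumes an input_sdm object named i with ensembles already computed.

```r
# Build the recommended layout under a temporary directory
# (tempdir() is used only so the sketch does not touch the project folder).
out_dir <- file.path(tempdir(), "results", "ensembles")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# Confirm that the folder exists before writing to it.
dir.exists(out_dir)

# The folder could then be handed to a writer (not run; assumes an
# input_sdm object `i` that already holds ensembles):
# write_ensembles(i, path = out_dir)
```

In a real project, replacing tempdir() with the project root gives the results/ensembles path suggested above.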
No return value, called for side effects.
Luíz Fernando Esser ([email protected]) https://luizfesser.wordpress.com