Into the palaeoverse

A community-driven R package

Development team

Funders

  • European Union’s Horizon 2020 research and innovation program (MAPAS project)
    • Grant number: 947921
  • The Royal Society
    • Grant numbers: RF_ERE_210013, RGF_R1_180020, and RGF_EA_180318
  • Juan de la Cierva-formación fellowship
    • FJC2020-044836-I / MCIN / AEI / 10.13039/501100011033
  • ETH+ Research Grant (BECCY project)
  • FAPESP Postdoctoral fellowship
    • Grant number: 2022/05697-9
  • Population Biology Program of Excellence Postdoctoral Fellowship (the University of Nebraska-Lincoln)
  • Lerner-Gray Postdoctoral Research Fellowship (American Museum of Natural History)

Introduction

The long and the short of it 📏

What is Palaeoverse?



Palaeoverse is a project that aims to bring the palaeobiology community together.

What is the palaeoverse R package?

palaeoverse provides auxiliary functions to support data preparation and exploration.

Improve code readability, reusability and reproducibility.

What makes palaeoverse different?

  • Community-informed development
    • Authors (n = 13)
    • Survey participants (n = 35)
  • Well-documented & peer-reviewed code
    • Formal review process

Functionality

A whistle-stop tour of palaeoverse 🚋

What’s available?

  • axis_geo
  • bin_lat
  • bin_time
  • data
  • group_apply
  • lat_bins
  • look_up
  • palaeorotate
  • phylo_check
  • tax_check
  • tax_expand_lat
  • tax_expand_time
  • tax_range_space
  • tax_range_time
  • tax_unique
  • time_bins

Expected input

A lot of data, a lot of sources, and a lot of unique features.






Data structure, not source.

occdf \(\rightarrow\) function(x) \(\rightarrow\) df

Occurrence dataframe*
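
As a rough illustration of "data structure, not source", the sketch below builds a tiny occurrence dataframe by hand in base R. The column names (`taxon`, `max_ma`, `min_ma`, `lng`, `lat`) and every value are invented for this example. They resemble the columns in the example datasets shown later, but they are assumptions, not names the package requires.

```r
# A hypothetical occurrence dataframe: one row per fossil occurrence.
# All names and values are made up for illustration.
occdf <- data.frame(
  taxon  = c("Taxon A", "Taxon B", "Taxon A"),
  max_ma = c(259.5, 254.1, 251.9),  # oldest age estimate (Ma)
  min_ma = c(254.1, 251.9, 247.2),  # youngest age estimate (Ma)
  lng    = c(25.3, -104.7, 58.2),   # longitude (decimal degrees)
  lat    = c(-32.1, 35.6, 51.4)     # latitude (decimal degrees)
)

# Basic sanity checks that occurrences from any source should pass
stopifnot(
  is.data.frame(occdf),
  all(occdf$max_ma >= occdf$min_ma),
  all(abs(occdf$lat) <= 90),
  all(abs(occdf$lng) <= 180)
)

# Inspect the structure
str(occdf)
```

Any table with this general shape could be passed along, whichever database or spreadsheet it originally came from.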

Getting started

Let’s dive in 🤿…

Installation

palaeoverse can be installed from CRAN using:

install.packages("palaeoverse")


The development version can be installed using devtools:

devtools::install_github("palaeoverse-community/palaeoverse")


Once installed, load the package in the usual manner:

library(palaeoverse)

Example datasets

Two example occurrence datasets are available.

Carboniferous–Early Triassic tetrapods (n = 5270, Paleobiology Database).

# Get details on dataset
?tetrapods
# Load dataset
data("tetrapods")
# Available variables
colnames(tetrapods)
##  [1] "occurrence_no"     "collection_no"     "identified_name"  
##  [4] "identified_rank"   "accepted_name"     "accepted_rank"    
##  [7] "early_interval"    "late_interval"     "max_ma"           
## [10] "min_ma"            "phylum"            "class"            
## [13] "order"             "family"            "genus"            
## [16] "abund_value"       "abund_unit"        "lng"              
## [19] "lat"               "collection_name"   "cc"               
## [22] "formation"         "stratgroup"        "member"           
## [25] "zone"              "lithology1"        "environment"      
## [28] "pres_mode"         "taxon_environment" "motility"         
## [31] "life_habit"        "diet"

Phanerozoic reef occurrences (n = 4363, PaleoReefs Database).

# Get details on dataset
?reefs
# Load dataset
data("reefs")
# Available variables
colnames(reefs)
##  [1] "r_number"   "name"       "formation"  "system"     "series"    
##  [6] "interval"   "biota_main" "biota_sec"  "lng"        "lat"       
## [11] "country"    "authors"    "title"      "year"

Reference datasets

Two reference datasets are available.

Geological Time Scale 2012 & 2020 (Gradstein et al. 2012; 2020).

# Get details on dataset
?GTS2012
?GTS2020
# Load dataset
data("GTS2012")
data("GTS2020")
# Increase output width
options(width = 120)
# Print first few rows
head(GTS2012, n = 3)
##   interval_number      interval_name  rank max_ma mid_ma min_ma duration_myr  font  colour abbr
## 1               1           Holocene stage 0.0117 0.0059 0.0000       0.0117 black #FDEDEC <NA>
## 2               2  Upper Pleistocene stage 0.1260 0.0688 0.0117       0.1143 black #FFF2D3 <NA>
## 3               3 Middle Pleistocene stage 0.7810 0.4535 0.1260       0.6550 black #FFF2C7 <NA>
head(GTS2020, n = 3)
##   interval_number interval_name  rank max_ma  mid_ma min_ma duration_myr  font  colour abbr
## 1               1    Meghalayan stage 0.0042 0.00210 0.0000       0.0042 black #FDEDEC <NA>
## 2               2 Northgrippian stage 0.0082 0.00620 0.0042       0.0040 black #FDECE4 <NA>
## 3               3  Greenlandian stage 0.0117 0.00995 0.0082       0.0035 black #FEECDB <NA>

Stratigraphic time bins

# Get stage-level time bins
bins <- time_bins(interval = "Phanerozoic", rank = "stage", plot = TRUE)
# Get first few rows
head(bins, n = 3)
##   bin interval_name  rank max_ma mid_ma min_ma duration_myr abbr  colour  font
## 1   1     Fortunian stage    541  535.0    529           12   Fo #99B575 black
## 2   2       Stage 2 stage    529  525.0    521            8   S2 #A6BA80 black
## 3   3       Stage 3 stage    521  517.5    514            7   S3 #A6C583 black

Macrostrat time bins

# Get North American Land Mammal Ages
bins <- time_bins(scale = "North American Land Mammal Ages", plot = TRUE)
# Get first few rows
head(bins, n = 3)
##   bin interval_name                            rank max_ma mid_ma min_ma duration_myr abbr  colour  font
## 1   1       Puercan North American Land Mammal Ages  66.00 65.375  64.75         1.25    P #FDB469 black
## 2   2   Torrejonian North American Land Mammal Ages  64.75 63.500  62.25         2.50   To #FEBA64 black
## 3   3     Tiffanian North American Land Mammal Ages  62.25 59.875  57.50         4.75   Ti #FEBF6A black

Near-equal-length time bins

# Get near-equal-length time bins (~15 Myr) built by grouping stages
bins <- time_bins(interval = "Phanerozoic", rank = "stage", size = 15, plot = TRUE)
# Get first few rows
head(bins, n = 3)
##   bin max_ma mid_ma min_ma duration_myr grouping_rank                 intervals  colour  font
## 1   1    541 535.00  529.0         12.0         stage                 Fortunian #80cdc1 black
## 2   2    529 521.50  514.0         15.0         stage          Stage 3, Stage 2 #80cdc1 black
## 3   3    514 507.25  500.5         13.5         stage Drumian, Wuliuan, Stage 4 #80cdc1 black

Temporal occurrence binning

Five temporal binning methods for age range data:

# Use tetrapod example data
occdf <- tetrapods

# Get stage-level time bins
bins <- time_bins(interval = "Phanerozoic", rank = "stage")

# Assign via midpoint age of fossil occurrence data
ex1 <- bin_time(occdf = occdf, bins = bins, method = "mid")

# Assign to all bins that age range covers
ex2 <- bin_time(occdf = occdf, bins = bins, method = "all")

# Assign via majority overlap based on fossil occurrence age range
ex3 <- bin_time(occdf = occdf, bins = bins, method = "majority")

# Randomly assign to overlapping bins based on fossil occurrence age range
ex4 <- bin_time(occdf = occdf, bins = bins, method = "random", reps = 10)

# Randomly assign point estimates (e.g. uniform distribution) based on fossil occurrence age range
ex5 <- bin_time(occdf = occdf, bins = bins, method = "point", reps = 10)
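
To make the idea behind two of these methods concrete, here is a minimal base R sketch of midpoint-style and point-style assignment. It is not the package's implementation. The toy bins, the toy occurrences, and the helper `assign_bin()` are all hypothetical, and treating each bin as the half-open range [min_ma, max_ma) is an assumption made for this example.

```r
# Toy bins (ages in Ma); values are illustrative, not taken from GTS2020
bins <- data.frame(
  bin    = 1:3,
  max_ma = c(30, 20, 10),
  min_ma = c(20, 10, 0)
)

# Toy occurrences with age ranges (Ma)
occdf <- data.frame(
  max_ma = c(28, 22, 9),
  min_ma = c(24, 12, 3)
)

# Hypothetical helper: bin whose [min_ma, max_ma) range contains each age
assign_bin <- function(age, bins) {
  vapply(age, function(a) {
    hit <- which(a < bins$max_ma & a >= bins$min_ma)
    if (length(hit) == 0) NA_integer_ else bins$bin[hit[1]]
  }, integer(1))
}

# Midpoint-style assignment: use the midpoint of each age range
occdf$mid_ma  <- (occdf$max_ma + occdf$min_ma) / 2
occdf$bin_mid <- assign_bin(occdf$mid_ma, bins)

# Point-style assignment: draw one age uniformly from each range
set.seed(1)
occdf$point_ma  <- runif(nrow(occdf), min = occdf$min_ma, max = occdf$max_ma)
occdf$bin_point <- assign_bin(occdf$point_ma, bins)

occdf
```

The second occurrence spans two bins (12 to 22 Ma), so its point-style assignment can change between draws while its midpoint-style assignment cannot. Repeating the draw, as `reps` does above, is one way to carry that age uncertainty forward.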

Latitudinal occurrence binning

Generate and bin latitudinal data:

# Generate latitudinal bins
bins <- lat_bins(size = 10, plot = TRUE)
# Use reef example data
occdf <- reefs
# Bin occurrences
occdf <- bin_lat(occdf = occdf, bins = bins, lat = "lat")
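
The same generate-then-assign logic can be sketched in base R. This is an illustration under stated assumptions, not how `lat_bins()` or `bin_lat()` work internally. The name `lat_bins_sketch`, the toy latitudes, and the south-to-north bin numbering are all choices made for this example, so bin numbers may not match the package's output.

```r
# Build equal-width latitudinal bins of a given size (degrees)
size <- 10
breaks <- seq(from = -90, to = 90, by = size)
lat_bins_sketch <- data.frame(
  bin = seq_len(length(breaks) - 1),
  min = head(breaks, -1),
  max = tail(breaks, -1)
)
lat_bins_sketch$mid <- (lat_bins_sketch$min + lat_bins_sketch$max) / 2

# Toy latitudes (decimal degrees); values are illustrative
lat <- c(-45.2, 0.5, 12.0, 89.9)

# Assign each latitude to the bin whose interval contains it;
# rightmost.closed = TRUE keeps a latitude of exactly 90 in the last bin
bin_id <- findInterval(lat, breaks, rightmost.closed = TRUE)

data.frame(lat = lat, bin = bin_id, mid = lat_bins_sketch$mid[bin_id])
```

A bin size that divides 180 evenly keeps all bins the same width. Other sizes would need a decision about how to handle the leftover band.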

Spatial occurrence binning

Generate and bin spatial data:

# Get reef data
occdf <- reefs[1:500, ]
# Bin data using a hexagonal equal-area grid
occdf <- bin_space(occdf = occdf, spacing = 250, return = TRUE)
# Plot world and grid using ggplot2
library(ggplot2)
library(rnaturalearth)
world <- ne_countries(scale = "small", returnclass = "sf")
ggplot() +
  geom_sf(data = world, colour = "black", fill = "lightgrey") +
  geom_sf(data = occdf$grid, fill = "orange", colour = "black") +
  theme_void()