Deployment Summary Tables

Understanding Our Source Data

Now that we’ve set up a data structure for efficient access, imported the source data into R, and converted that data into a spatial data set, it’s time to explore what we have to work with. This is an important step: it lets you recognize problems or inconsistencies in the data that need further investigation.

Summary tables are a good way of splitting large data into components of interest and learning how our data might be distributed across those components. One example might be a simple calculation of the number of location observations within each month across the individual animals. This might identify anomalies such as locations in months prior to deployment or missing data when it was expected.

We’ll first want to group our location records by deployment and month.

library(sf)
library(lubridate)
library(dplyr)

dat <- akbs_locs %>% 
  sf::st_drop_geometry() %>% 
  mutate(month = lubridate::month(date)) %>% 
  group_by(deploy_id, month) %>% 
  count(name = "num_locs")
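Grouping on month alone pools records from different calendar years and sorts January ahead of June, even when a deployment began in June. As a minimal sketch of one alternative, the block below groups by the first day of each calendar month so that rows sort chronologically. The `toy_locs` data frame and its values are made up for illustration and are not part of `akbs_locs`; only `deploy_id` and `date` are assumed to match the real column names.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for the non-spatial columns of akbs_locs
toy_locs <- data.frame(
  deploy_id = "toy_01",
  date = as.POSIXct(
    c("2009-06-20 12:00:00", "2009-06-25 08:00:00",
      "2009-12-01 00:00:00", "2010-01-15 06:00:00"),
    tz = "UTC"
  )
)

toy_monthly <- toy_locs %>%
  # floor_date() keeps the year, so June 2009 sorts before January 2010
  mutate(month_start = lubridate::floor_date(date, unit = "month")) %>%
  count(deploy_id, month_start, name = "num_locs") %>%
  arrange(deploy_id, month_start)

toy_monthly
```

The trade-off is that the table no longer lines up the same calendar month across deployments from different years, so the month-only grouping remains useful for seasonal comparisons.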

To create a sensible table, let’s focus on a single deploy_id, EB2009_3000_06A1346:

library(knitr)

dat %>% 
  dplyr::filter(deploy_id == "EB2009_3000_06A1346") %>% 
  dplyr::arrange(month) %>% 
  knitr::kable()

deploy_id	month	num_locs
EB2009_3000_06A1346	1	307
EB2009_3000_06A1346	2	446
EB2009_3000_06A1346	3	567
EB2009_3000_06A1346	4	153
EB2009_3000_06A1346	6	183
EB2009_3000_06A1346	7	778
EB2009_3000_06A1346	8	1007
EB2009_3000_06A1346	9	764
EB2009_3000_06A1346	10	748
EB2009_3000_06A1346	11	753
EB2009_3000_06A1346	12	476

One oddity that immediately stands out is the lack of locations in May. This is to be expected, however: these deployments started in June, and this one stopped transmitting in April of the following year, which matches expectations for battery life.
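A month with no locations is easy to overlook because it simply has no row in the table. One way to make such gaps explicit is to fill in the missing months with a count of zero. The sketch below assumes the tidyr package is available and uses a small made-up `toy_counts` data frame shaped like `dat`; the `ungroup()` call is included because `dat` is still grouped after `count()`.

```r
library(dplyr)
library(tidyr)

# Hypothetical monthly counts shaped like `dat`; only months 4, 6 and 7 have rows
toy_counts <- data.frame(
  deploy_id = c("toy_01", "toy_01", "toy_01"),
  month = c(4, 6, 7),
  num_locs = c(153, 183, 778)
)

toy_full <- toy_counts %>%
  ungroup() %>%
  # add a row for every month 1-12, with zero locations where none were recorded
  tidyr::complete(deploy_id, month = as.numeric(1:12),
                  fill = list(num_locs = 0))

# Months with no locations for each deployment
toy_full %>%
  filter(num_locs == 0)
```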

If we look at another deployment, EB2009_3001_06A1332, we can see that this deployment ended in March.

dat %>% 
  dplyr::filter(deploy_id == "EB2009_3001_06A1332") %>% 
  dplyr::arrange(month) %>% 
  knitr::kable()

deploy_id	month	num_locs
EB2009_3001_06A1332	1	476
EB2009_3001_06A1332	2	458
EB2009_3001_06A1332	3	63
EB2009_3001_06A1332	6	119
EB2009_3001_06A1332	7	735
EB2009_3001_06A1332	8	641
EB2009_3001_06A1332	9	792
EB2009_3001_06A1332	10	789
EB2009_3001_06A1332	11	581
EB2009_3001_06A1332	12	677

Lastly, deployment EB2011_3001_10A0552 appears to have stopped transmitting in January of the following year, which is two to three months earlier than the two deployments shown above.

dat %>% 
  dplyr::filter(deploy_id == "EB2011_3001_10A0552") %>% 
  dplyr::arrange(month) %>% 
  knitr::kable()

deploy_id	month	num_locs
EB2011_3001_10A0552	1	675
EB2011_3001_10A0552	6	482
EB2011_3001_10A0552	7	993
EB2011_3001_10A0552	8	1062
EB2011_3001_10A0552	9	1134
EB2011_3001_10A0552	10	904
EB2011_3001_10A0552	11	896
EB2011_3001_10A0552	12	1019

An early end like this is not unusual in itself, but it is worth investigating further to rule out issues with the data processing or with other important deployment metadata.
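Rather than reading each monthly table in turn, a per-deployment summary of the first and last location can surface short deployments directly. The following is only a sketch on made-up data: `toy_locs`, its dates, and the `days_active` column are hypothetical, and the real `akbs_locs` would need `sf::st_drop_geometry()` applied first, as above.

```r
library(dplyr)

# Hypothetical locations for two deployments of different length
toy_locs <- data.frame(
  deploy_id = c("toy_01", "toy_01", "toy_02", "toy_02"),
  date = as.POSIXct(
    c("2009-06-20 12:00:00", "2010-04-10 00:00:00",
      "2011-06-18 12:00:00", "2012-01-25 00:00:00"),
    tz = "UTC"
  )
)

toy_span <- toy_locs %>%
  group_by(deploy_id) %>%
  summarise(
    first_loc = min(date),
    last_loc = max(date),
    num_locs = n(),
    .groups = "drop"
  ) %>%
  # time between first and last location, in days
  mutate(days_active = as.numeric(difftime(last_loc, first_loc, units = "days"))) %>%
  # shortest deployments first
  arrange(days_active)

toy_span
```

Sorting by `days_active` puts the shortest deployments at the top, which are the ones to check against the deployment metadata.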