14 Remote Data Extraction

Up to this point, we’ve been using locally downloaded datasets derived from the course datasets folder. We downloaded these files in bulk or individually like so…

download.file("https://raw.githubusercontent.com/jsimkins2/geog473-673/master/datasets/TreeData.csv", destfile = "/Users/james/Downloads/TreeData.csv", mode = 'wb')

While Github is excellent for code, it’s not a cloud service for datasets. THREDDS and ERDDAP are the future of environmental data repositories.

14.1 THREDDS

THREDDS (Thematic Real-time Environmental Distributed Data Services) is an efficient way to extract specific areas or time periods of a dataset. For example, if you’re studying 2000-2020 water temperatures of the Delaware Bay, you don’t want a water temperature dataset covering the entire Atlantic Ocean from 1960-2020. It’s a waste of time to download, store, and process all of that data just to sub-select the Delaware Bay from 2000-2020. THREDDS makes it possible to download your desired subset from the get-go, saving you time and hard-drive space.

Here are some NASA/NOAA/UD THREDDS servers:

  1. https://thredds.jpl.nasa.gov/thredds/catalog.html - NASA Jet Propulsion Laboratory
  2. https://thredds.daac.ornl.gov/thredds/catalog.html - Oak Ridge National Lab
  3. https://pae-paha.pacioos.hawaii.edu/thredds/catalog.html - Pacific Islands Ocean Observing System
  4. http://thredds.demac.udel.edu/thredds/catalog.html - UDEL DEMAC
  5. http://basin.ceoe.udel.edu/thredds/catalog.html - UDEL Satellite Receiving Station
  6. http://www.smast.umassd.edu:8080/thredds/catalog.html - UMASS Thredds

If you have a dataset or type of data you’re interested in, Google it with “thredds” or “thredds server” appended to your search.

Today we’ll use UD’s Satellite Receiving Station THREDDS (#5 on the list). It’s located at this URL - http://basin.ceoe.udel.edu/thredds/catalog.html

Here’s what that looks like:

If we click on JPL SST, we see we have some different avenues for data extraction.

OPeNDAP (Open-source Project for a Network Data Access Protocol) is a great way to subselect the data. OPeNDAP offers HTML pages of the data (BAD IDEA, THIS WILL CRASH YOUR BROWSER) or netCDF files of the data (great idea).

Now you can use this page to download subset datasets, or we can make this really easy and use R to accomplish the task. This is a high temporal resolution dataset, so let’s say we want Delaware Bay data from July 14 - July 16, 2019. All we need to make this happen is the URL of the OPeNDAP page - http://basin.ceoe.udel.edu/thredds/dodsC/GOESJPLSST.nc.html - and the ncdf4 package (note that we drop the .html suffix when we open the URL in R).

14.1.1 Opening a Channel to the Database

library(ncdf4)
goes.nc = nc_open("http://basin.ceoe.udel.edu/thredds/dodsC/GOESJPLSST.nc")
goes.nc
## File http://basin.ceoe.udel.edu/thredds/dodsC/GOESJPLSST.nc (NC_FORMAT_CLASSIC):
## 
##      8 variables (excluding dimension variables):
##         float projection[]   
##             proj4_string: +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
##             epsg_code: 4326
##         float sea_surface_temperature[longitude,latitude,time]   
##             long_name: sea surface sub-skin temperature
##             standard_name: sea_surface_subskin_temperature
##             comment: SST obtained by regression with buoy measurements, sensitive to skin SST. Further information at (Petrenko et al., JGR, 2014; doi:10.1002/2013JD020637)
##             units: kelvin
##         float quality_level[longitude,latitude,time]   
##             long_name: quality level of SST pixel
##             flag_values: 1
##              flag_values: 2
##              flag_values: 3
##              flag_values: 4
##              flag_values: 5
##             flag_meanings: no_data bad_data not_used not_used not_used best_quality
##         float dt_analysis[longitude,latitude,time]   
##             long_name: deviation from SST reference
##             comment: Deviation from reference SST, i.e., dt_analysis = SST - reference SST
##             units: kelvin
##         float satellite_zenith_angle[longitude,latitude,time]   
##             long_name: satellite zenith angle
##             comment: satellite zenith angle
##             units: degrees
##         float sses_bias[longitude,latitude,time]   
##             long_name: SSES bias estimate
##             comment: Bias is derived against Piecewise Regression SST produced by local regressions with buoys. Subtracting sses_bias from sea_surface_temperature produces more accurate estimate of SST at the depth of buoys. Further information at (Petrenko et al., JTECH, 2016; doi:10.1175/JTECH-D-15-0166.1)
##             units: kelvin
##         float sses_standard_deviation[longitude,latitude,time]   
##             long_name: SSES standard deviation
##             units: kelvin
##             comment: Standard deviation of sea_surface_temperature from SST measured by drifting buoys. Further information at (Petrenko et al., JTECH, 2016; doi:10.1175/JTECH-D-15-0166.1)
##         float wind_speed[longitude,latitude,time]   
##             long_name: wind speed
##             units: m s-1
##             comment: Typically represents surface winds (10 meters above the sea surface)
## 
##      3 dimensions:
##         latitude  Size:1800
##             standard_name: latitude
##             units: degrees_north
##         longitude  Size:2500
##             standard_name: longitude
##             units: degrees_east
##         time  Size:27278
##             standard_name: time
##             long_name: EPOCH Time
##             units: seconds since 1970-01-01T00:00:00Z
## 
##     9 global attributes:
##         _NCProperties: version=2,netcdf=4.7.1,hdf5=1.10.5
##         creator_name: James Simkins
##         creator_email: simkins@udel.edu
##         institution: University of Delaware Ocean Exploration, Remote Sensing and Biogeography Group (ORB)
##         url: http://orb.ceoe.udel.edu/
##         source: NOAA/NESDIS/STAR GHRSST- http://www.star.nesdis.noaa.gov - SST 
##         groundstation: University of Delaware, Newark, Center for Remote Sensing
##         summary: NOAA/NESDIS/STAR GHRSST GOES16 SST product, fit to UDel ORB lab NW Atlantic extent, reprojected to EPSG:4326.
##         acknowledgement: These data were provided by Group for High Resolution Sea Surface Temperature (GHRSST) and the National Oceanic and Atmospheric Administration (NOAA)

With that one line of code, we’ve opened a connection to the GOES-R dataset on the THREDDS server. Printing the netCDF dataset provides some metadata. Let’s use this metadata to extract the time period and spatial extent that we want.

# print out the names of the variables in our dataset
names(goes.nc$var)
## [1] "projection"              "sea_surface_temperature"
## [3] "quality_level"           "dt_analysis"            
## [5] "satellite_zenith_angle"  "sses_bias"              
## [7] "sses_standard_deviation" "wind_speed"
# how is the time stored?
goes.nc$dim$time$units
## [1] "seconds since 1970-01-01T00:00:00Z"

Seconds since 1970-01-01 is referred to as epoch time. This reference datetime is the Unix epoch - the point from which computers count time. Computers are very good at storing information in this format, which is why we use it. Let’s take out the last value -

lastVal = length(goes.nc$dim$time$vals)
lastVal
## [1] 27278
epoch_val = goes.nc$dim$time$vals[lastVal]

There you go, that’s an EPOCH time value. Let’s convert it to a human timestamp…

human_time = as.POSIXct(epoch_val, origin="1970-01-01")
human_time
## [1] "2021-03-03 11:00:00 EST"

as.POSIXct() is a base R function for datetime conversion. It’s a gold standard and you’ll see it often as you gain more experience playing with datetime conversions. You can also use the anytime package.

library(anytime)
anytime(epoch_val)
## [1] "2021-03-03 11:00:00 EST"

At this point, all we have to do is convert our human dates to epoch time so we can extract the data. This just means converting a datetime object to a numeric - R handles it for us…

start_time = "2019-07-14" # year dash month dash day
epoch_start_time = as.numeric(as.POSIXct(start_time, format="%Y-%m-%d")) # %Y-%m-%d is telling the computer the format of our datestring is year dash month dash day

end_time = "2019-07-16" # year dash month dash day
epoch_end_time = as.numeric(as.POSIXct(end_time, format="%Y-%m-%d")) # %Y-%m-%d is telling the computer the format of our datestring is year dash month dash day

14.1.2 Indexing

We have the time values converted to the format of the dataset, but now we need to find the index - i.e. where that value lies in the dataset. We can find this using this code…

which.min(abs(array - value))

Which reads as -

  • which.min() - returns the index of the minimum value of this array
  • abs() - absolute value - we take the absolute value so that differences on either side of the target are treated the same
index_start_time = which.min(abs(goes.nc$dim$time$vals - epoch_start_time))
index_start_time
## [1] 13384

That’s the index! This is just a big matching game essentially.

goes.nc$dim$time$vals[index_start_time]
## [1] 1563076800
epoch_start_time
## [1] 1563076800

Alright, we have our start time index! What about the latitudes and longitudes? We’ll need to find the index of the lat/lon grid we want. Delaware Bay is approximately bounded by 77W, 73W, 36N, and 42N - i.e., longitudes -77 to -73 and latitudes 36 to 42.

# print out a few longitude values - notice that the entire dataset is on this grid right here. 
head(goes.nc$dim$longitude$vals)
## [1] -99.99 -99.97 -99.95 -99.93 -99.91 -99.89
# notice how we extract those values using indexing - the 100th value 
goes.nc$dim$longitude$vals[100]
## [1] -98.01

The 100th lon value is -98.01. In other words, a lon index of 100 returns -98.01.

So…which value is the minimum of the absolute value of the array of values minus the specific value? Let’s plug it in…

west_lon = -77
index_west_lon = which.min(abs(goes.nc$dim$longitude$vals - west_lon))
index_west_lon
## [1] 1150
goes.nc$dim$longitude$vals[index_west_lon]
## [1] -77.01

So our desired west_lon is -77, and the closest value within our longitude array is -77.01. Not bad…let’s run the rest.

east_lon = -73
index_east_lon = which.min(abs(goes.nc$dim$longitude$vals - east_lon))

north_lat = 42
index_north_lat = which.min(abs(goes.nc$dim$latitude$vals - north_lat))
south_lat = 36
index_south_lat = which.min(abs(goes.nc$dim$latitude$vals - south_lat))

Everything is indexed! Now we can use these indexes to extract the exact data that we want via ncvar_get from the ncdf4 package. We can enter in arguments named start and count. start tells ncvar_get at what point in space / time to begin grabbing values; count tells it how many values to grab along each dimension. For example, if the resolution of our data is hourly, we start at 12:00:00, and the count is 4, we grab data at 12:00:00, 13:00:00, 14:00:00, and 15:00:00. The same goes for lat and lon - the count depends on the resolution of your data. If you have a spatial dataset at 5 degree resolution, each count will bring in another 5 degree lat/lon value.
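To make the count arithmetic concrete, here’s a minimal sketch with a made-up hourly time vector (the dates are just for illustration):

# build a hypothetical hourly time vector for one day
hours <- seq(as.POSIXct("2019-07-14 00:00:00", tz = "UTC"), by = "hour", length.out = 24)
start <- 13  # the index of 12:00:00 (hours[1] is 00:00:00)
count <- 4
hours[start:(start + count - 1)]  # 12:00, 13:00, 14:00, 15:00 - four values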

We know our start values, but so far we only have end indices, not count values…let’s figure those out.

time_count = which.min(abs(goes.nc$dim$time$vals - epoch_end_time)) - which.min(abs(goes.nc$dim$time$vals - epoch_start_time))
time_count
## [1] 48

So, if we count 48 time values from the start_time of 2019-07-14, we’ll arrive at the end_time of 2019-07-16. Let’s do the same for our lat/lons.

# latitude and longitude counts
lat_count = abs(index_north_lat - index_south_lat)
lon_count = abs(index_west_lon - index_east_lon)

Why do we take the absolute value? Because sometimes data is stored backwards. We don’t care which way it’s stored; we just need a positive number to count up from the starting index. Longitude values can be confusing because they are often stored as negative values west of the prime meridian, and latitude values are negative south of the equator.
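Here’s a quick demonstration with the indices we just computed - either ordering yields the same positive count:

abs(index_north_lat - index_south_lat)  # 300
abs(index_south_lat - index_north_lat)  # 300 - same count either way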

Now that we have our count values, we can proceed. Remember, here is what the start argument is looking for (via ?ncvar_get):

A vector of indices indicating where to start reading the passed values (beginning at 1). The length of this vector must equal the number of dimensions the variable has. Order is X-Y-Z-T (i.e., the time dimension is last). If not specified, reading starts at the beginning of the file (1,1,1,...).

So x is longitude, y is latitude, and t is time. Whichever value is lower between index_west_lon and index_east_lon is our start value, and the same rule applies to the latitude indices.

index_west_lon
## [1] 1150
index_east_lon
## [1] 1350
index_south_lat
## [1] 800
index_north_lat
## [1] 500

Wait just a second…the southern index is greater than the northern index. Typically, values are stored from least to greatest. However, we’ve seen before that netCDF files can be stored upside-down. So, instead of counting from our southern latitude to our northern latitude, we need to count from our northern latitude to our southern latitude. Here is what the count argument is looking for…

A vector of integers indicating the count of values to read along each dimension (order is X-Y-Z-T). The length of this vector must equal the number of dimensions the variable has. If not specified and the variable does NOT have an unlimited dimension, the entire variable is read. As a special case, the value "-1" indicates that all entries along that dimension should be read.

We tell ncvar_get() where to start its longitude, latitude, and time, and it counts until we have our lat/lon box and time slices selected. We already have our count values, so now we plug them in and run - note this will take a few seconds to run and even longer if you have a poor internet connection. This is data being extracted over the internet in real time.
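As an aside, a defensive pattern worth knowing (a small sketch, not part of the workflow below) is to take the smaller index of each pair as the start - it works no matter which way the grid is stored:

# the start index along a dimension is the smaller of its two bounding indices
lon_start <- min(index_west_lon, index_east_lon)    # 1150 here
lat_start <- min(index_north_lat, index_south_lat)  # 500 here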

14.1.3 Downloading the Selected Data

# cool, let's grab sea_surface_temperature for Delaware Bay
sst.c <- ncvar_get(goes.nc, "sea_surface_temperature",start = c(index_west_lon,index_north_lat,index_start_time), 
                   count = c(lon_count,lat_count,time_count))
dim(sst.c)
## [1] 200 300  48

And there is the data! It comes back in the order we requested it - lon, lat, time. We have a 3-dimensional array: a 200 x 300 (lon x lat) grid with 48 time slices. Let’s convert one slice to a raster and plot it really quickly.

library(maptools) # also loads sp package
library(ncdf4)
library(raster)
library(rasterVis)
arr.sst = sst.c[,,4]            # pull the 4th time slice (a lon x lat matrix)
arr.sst[arr.sst < 273] = NA     # mask unrealistically cold values (below 273 K)
test.sst = raster(x = arr.sst)  # convert the matrix to a RasterLayer
test.sst
## class      : RasterLayer 
## dimensions : 200, 300, 60000  (nrow, ncol, ncell)
## resolution : 0.003333333, 0.005  (x, y)
## extent     : 0, 1, 0, 1  (xmin, xmax, ymin, ymax)
## crs        : NA 
## source     : memory
## names      : layer 
## values     : 294.54, 302.97  (min, max)

What’s missing? Well, we just gave it the raw data - we still need to plug in the extent! We also need to plug in the CRS - luckily we know this data is in lat/lon because of how awesome this metadata is!

library(ggplot2)
# we need to transpose this dataset just like we did in week 4 of the R intro course
test.sst = t(test.sst)

# define the projection
sst.crs = ncatt_get(goes.nc, "projection")
extent(test.sst) = c(west_lon, east_lon, south_lat, north_lat)
crs(test.sst) = sst.crs$proj4_string
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded datum WGS_1984 in CRS definition,
##  but +towgs84= values preserved
# convert raster to dataframe for ggplot
df <- as.data.frame(test.sst, xy = TRUE)
# Throw together the usa spatial polygons data frame
ggplot() +
  geom_raster(data = df , aes(x = x, y = y, fill = layer)) + 
  borders(fill="white", xlim = c(-77,-73), ylim=c(36,42),alpha = 0.5) +
  coord_quickmap(xlim = c(-77,-73), ylim=c(36,42),expand = FALSE)    

There’s the data we wanted and we didn’t have to leave R to get it. We also saved a ton of time and hard drive space.
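One last bit of housekeeping (a small addition to the workflow above): once you’re done extracting data, it’s good practice to close the remote connection.

nc_close(goes.nc)  # release the connection to the THREDDS server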

14.2 ERDDAP

THREDDS is very fast and great when you’re already familiar with a dataset. ERDDAP, on the other hand, is built with human browsing in mind. ERDDAP is a data server that gives you a simple, consistent way to download subsets of gridded and tabular scientific datasets in common file formats, and to make graphs and maps. ERDDAP can generate maps on the fly so you can check out the data before you proceed to download. Here are some NASA / NOAA / UDEL ERDDAP pages:

  1. https://upwell.pfeg.noaa.gov/erddap/index.html - NOAA Global Earth Observation - Over 10,000 datasets available here
  2. https://coastwatch.pfeg.noaa.gov/erddap/index.html - NOAA Ocean ERDDAP - over 1400 datasets available here
  3. https://gliders.ioos.us/erddap/index.html - IOOS Ocean Glider Data - Over 600 datasets here
  4. http://www.neracoos.org/erddap/index.html - NERACOOS Ocean/Met - Over 200 Datasets
  5. http://basin.ceoe.udel.edu/erddap/index.html - UDEL Satellite Receiving Station ERDDAP

Let’s check out the UDEL Satellite Receiving Station ERDDAP page - here’s what it looks like:

Find the Dataset ID named jpl_goesr_sst.

Click the graph button along this row.

You can use the Data Access Form at the top of the page to select and download data, or we can streamline this and use R to accomplish the task. In order to do this, we just need the rerddap package.

14.2.1 rerddap

install.packages("rerddap")

library("rerddap")
## Registered S3 method overwritten by 'hoardr':
##   method           from
##   print.cache_info httr
?rerddap
  • the list of servers rerddap knows about - servers()
  • search an ERDDAP server for terms - ed_search(query, page = NULL, page_size = NULL, which = "griddap", url = eurl(), ...)
  • get a list of datasets on an ERDDAP server - ed_datasets(which = "tabledap", url = eurl())
  • obtain information about a dataset - info(datasetid, url = eurl(), ...)
  • extract data from a griddap dataset - griddap(x, ..., fields = "all", stride = 1, fmt = "nc", url = eurl(), store = disk(), read = TRUE, callopts = list())
  • extract data from a tabledap dataset - tabledap(x, ..., fields = NULL, distinct = FALSE, orderby = NULL, orderbymax = NULL, orderbymin = NULL, orderbyminmax = NULL, units = NULL, url = eurl(), store = disk(), callopts = list())

Be careful when using the functions ed_search() and ed_datasets(). The default ERDDAP server has over 9,000 datasets, most of which are grids, so a list of all the gridded datasets can be quite long.
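A minimal sketch of a more focused search (the page_size argument comes from the ed_search() signature listed above; the exact matches depend on the live server):

# search the default ERDDAP server for gridded MUR SST datasets,
# capping the number of results returned per page
res <- rerddap::ed_search(query = "SST MUR", which = "griddap", page_size = 10)
res$info  # a data frame of matching titles and dataset IDs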

14.2.2 Finding the Data You Want

The first way to find a dataset is to browse the built-in web page for a particular ERDDAP server. A list of some of the publicly available ERDDAP servers can be obtained from the rerddap command:

servers()
#>                                                                                             name
#> 1                                                          Marine Domain Awareness (MDA) - Italy
#> 2                                                                     Marine Institute - Ireland
#> 3                                                       CoastWatch Caribbean/Gulf of Mexico Node
#> 4                                                                     CoastWatch West Coast Node
#> 5                     NOAA IOOS CeNCOOS (Central and Northern California Ocean Observing System)
#> 6  NOAA IOOS NERACOOS (Northeastern Regional Association of Coastal and Ocean Observing Systems)
#> 7                                         NOAA IOOS NGDAC (National Glider Data Assembly Center)
#> 8    NOAA IOOS PacIOOS (Pacific Islands Ocean Observing System) at the University of Hawaii (UH)
#> 9                     NOAA IOOS SECOORA (Southeast Coastal Ocean Observing Regional Association)
#> 10                            NOAA NCEI (National Centers for Environmental Information) / NCDDC
#> 11                                                NOAA OSMC (Observing System Monitoring Center)
#> 12                                                           NOAA UAF (Unified Access Framework)
#> 13                                                                   ONC (Ocean Networks Canada)
#> 14                    UC Davis BML (University of California at Davis, Bodega Marine Laboratory)
#> 15                                                                            R.Tech Engineering
#> 16                                     French Research Institute for the Exploitation of the Sea
#>                                         url
#> 1  https://bluehub.jrc.ec.europa.eu/erddap/
#> 2           http://erddap.marine.ie/erddap/
#> 3       http://cwcgom.aoml.noaa.gov/erddap/
#> 4  https://coastwatch.pfeg.noaa.gov/erddap/
#> 5     http://erddap.axiomalaska.com/erddap/
#> 6           http://www.neracoos.org/erddap/
#> 7       http://data.ioos.us/gliders/erddap/
#> 8       http://oos.soest.hawaii.edu/erddap/
#> 9            http://129.252.139.124/erddap/
#> 10   http://ecowatch.ncddc.noaa.gov/erddap/
#> 11             http://osmc.noaa.gov/erddap/
#> 12     https://upwell.pfeg.noaa.gov/erddap/
#> 13           http://dap.onc.uvic.ca/erddap/
#> 14    http://bmlsc.ucdavis.edu:8080/erddap/
#> 15            http://meteo.rtech.fr/erddap/
#> 16  http://www.ifremer.fr/erddap/index.html

The second way to find and obtain the desired data is to use functions in rerddap. The basic steps are:

  1. Find the dataset on an ERDDAP server (rerddap::servers(), rerddap::ed_search(), rerddap::ed_datasets() ).
  2. Get the needed information about the dataset (rerddap::info() )
  3. Think about what you are going to do.
  4. Make the request for the data (rerddap::griddap() or rerddap::tabledap() ).

14.2.3 Tabledap

rerddap::tabledap() - useful for downloading point datasets, like buoy or weather station data. Here is an example of finding datasets that contain the word buoy:

whichBUOYS <- rerddap::ed_search(query = "buoy")

Although it may not show up near the top of the printed search results, further down is a Dataset ID called cwwcNDBCMet. This Dataset ID points to NOAA’s National Data Buoy Center (NDBC), which collects buoy data from around the world. We can browse this dataset using the browse() function, which takes a recognized Dataset ID as its argument.

rerddap::browse('cwwcNDBCMet')

That’s the ERDDAP site that we’re going to pull data from…

library(rerddap)
library(ggplot2)
library(mapdata)

# obtain the info of this Dataset ID
info('cwwcNDBCMet')

# save the info
BuoysInfo <- info('cwwcNDBCMet')

# use the tabledap function to grab station, latitudes, and longitudes of the buoys within our bounding box
locationBuoys <- tabledap(BuoysInfo, distinct = TRUE, fields = c("station", "longitude", "latitude"), "longitude>=-76", "longitude<=-74", "latitude>=38", "latitude<=40")

# subset to the top 4 buoys
locationBuoys = locationBuoys[1:4,]

# convert the lat/lons from character to numeric
locationBuoys$latitude <- as.numeric(locationBuoys$latitude)
locationBuoys$longitude <- as.numeric(locationBuoys$longitude)

# create our limits aka the bounding geographic box
xlim <- c(-76, -74)
ylim <- c(38, 40)

# save map data of the bounding box to use in ggplot2
coast <- map_data("worldHires", ylim = ylim, xlim = xlim)

# plot with ggplot2
ggplot() +
  geom_point(data = locationBuoys, aes(x = longitude , y = latitude, colour = factor(station) )) +
  geom_polygon(data = coast, aes(x = long, y = lat, group = group), fill = "grey80") +
  theme_bw() + ylab("latitude") + xlab("longitude") +
  coord_fixed(1.3, xlim = xlim, ylim = ylim) +
  ggtitle("Delaware Bay Buoys")

14.2.4 Griddap

rerddap::griddap() - Useful for downloading spatial datasets, like sea surface temperature for the Atlantic Ocean.

whichSST <- ed_search(query = "SST")

returns about 1,000 responses. Let’s use a more focused query of SST MUR - MUR (Multi-scale Ultra-high Resolution) is an analysis sea surface temperature product that has SST data for the entire world.

whichSST <- ed_search(query = "SST MUR")

Awesome - let’s use the dataset_id jplMURSST41!

info('jplMURSST41')
## <ERDDAP info> jplMURSST41 
##  Base URL: https://upwell.pfeg.noaa.gov/erddap/ 
##  Dataset Type: griddap 
##  Dimensions (range):  
##      time: (2002-06-01T09:00:00Z, 2021-03-02T09:00:00Z) 
##      latitude: (-89.99, 89.99) 
##      longitude: (-179.99, 180.0) 
##  Variables:  
##      analysed_sst: 
##          Units: degree_C 
##      analysis_error: 
##          Units: degree_C 
##      mask: 
##      sea_ice_fraction: 
##          Units: 1

Alright, let’s extract some of this data and plot it. For the time argument, we can pass 'last' to grab the latest time step instead of plugging in a time string.

require("ggplot2")
require("mapdata")
## Loading required package: mapdata
require("rerddap")

# grab the latest MUR SST info
sstInfo <- info('jplMURSST41')

# get latest daily sst with griddap
murSST <- griddap(sstInfo, latitude = c(30, 45), longitude = c(-80., -70), time = c('last','last'), fields = 'analysed_sst')
## info() output passed to x; setting base url to: https://upwell.pfeg.noaa.gov/erddap/
# select colors from rerddap color palettes
mycolor <- colors$temperature

# grab boundary data from our limits
w <- map_data("worldHires", ylim = c(30, 45), xlim = c(-80., -70))

# plot
ggplot(data = murSST$data, aes(x = lon, y = lat, fill = analysed_sst)) + 
    geom_polygon(data = w, aes(x = long, y = lat, group = group), fill = "grey80") +
    geom_raster(interpolate = FALSE) +
    scale_fill_gradientn(colours = mycolor, na.value = NA) +
    theme_bw() + ylab("latitude") + xlab("longitude") +
    coord_fixed(1.3, xlim = c(-80, -70),  ylim = c(30, 45)) + ggtitle("Latest MUR SST")

14.2.5 Custom ERDDAP URL

What if we want to use a custom URL that’s not pre-loaded into rerddap? Let’s use Ireland’s Marine Institute ERDDAP. Follow these steps:

  1. Go to http://erddap.marine.ie/erddap/ and select datasets
  2. Look to the far right column named Dataset ID
  3. Locate this Dataset ID - IMI_NEATL
  4. Select Graph under ‘Make a Graph’ Column for that Dataset ID
  5. Look at the variables - see sea_surface_temperature
# define the base URL 
urlBase <- "http://erddap.marine.ie/erddap/"

# select Dataset ID from the URL
dataInfo <- rerddap::info("IMI_NEATL", url = urlBase)

# define parameters
parameter <- "sea_surface_temperature"
sstTimes <- c("last", "last")
sstLats <- c(48.00625, 57.50625)
sstLons <- c(-17.99375, -1.00625)

# define limits
xlim <- c(-17.99375, -1.00625)
ylim <- c(48.00625, 57.50625)

# use griddap to grab this gridded dataset
NAtlsst <- griddap(dataInfo, longitude = sstLons, latitude = sstLats, time = sstTimes, fields = parameter, url = urlBase)
## info() output passed to x; setting base url to: http://erddap.marine.ie/erddap/
str(NAtlsst$data)
## 'data.frame':    1034960 obs. of  4 variables:
##  $ time                   : chr  "2021-01-25T00:00:00Z" "2021-01-25T00:00:00Z" "2021-01-25T00:00:00Z" "2021-01-25T00:00:00Z" ...
##  $ lat                    : num  48 48 48 48 48 ...
##  $ lon                    : num  -18 -18 -18 -18 -17.9 ...
##  $ sea_surface_temperature: num  11.9 11.9 11.9 11.9 12.5 ...
# select the temperature colorbar from rerrdap
my.col <- colors$temperature 

# grab map data for our domain
w <- map_data("worldHires", ylim = ylim, xlim = xlim)

#  plot
myplot <- ggplot() + 
    geom_raster(data = NAtlsst$data, aes(x = lon, y = lat, fill = sea_surface_temperature), interpolate = FALSE) + 
    geom_polygon(data = w, aes(x = long, y = lat, group = group), fill = "grey80") +
    theme_bw() + scale_fill_gradientn(colours = my.col, na.value = NA, limits = c(5,15), name = "Temperature") +
    ylab("latitude") + xlab("longitude") +
    coord_fixed(1.3, xlim = xlim, ylim = ylim, expand=FALSE) + 
    ggtitle(paste("Sea Surface Temperature : ", NAtlsst$data$time[1]))
myplot

14.3 Assignment

Deliverables:

  • R Code/Script
  • Images/Plots

  1. Select any dataset from any THREDDS server. Create a script that loads in desired data (specific lat/lons/time etc.) and plot it. In your script, explain why you chose that dataset, detail each step and describe what is happening. Turn in your R script and the Images/Plots you’ve made to UD Canvas.

  2. Select any dataset from any ERDDAP server. Create a script that loads in desired data (specific lat/lons/time etc.) and plot it. In your script, explain why you chose that dataset, detail each step and describe what is happening. Turn in your R script and the Images/Plots you’ve made to UD Canvas.

14.3.1 Extra Credit - 2 Points

Find a dataset hosted on a tabledap instance of ERDDAP. Create a routine that downloads and extracts data using the tabledap() function from rerddap.