15 Functions and Code Presentation

In this tutorial, we’ll focus on functions and displaying R code & results with style. Functions are important for code speed and documentation. Presenting R code is important if you need to explain how to do something using R (i.e. this tutorial). We’ll start by explaining functions and proceeding through a complex example. Afterwards, we’ll show how RMarkdown works so you can present a complex function of your own.

15.1 Functions

We create functions to speed up and organize code. So far, we’ve only been coding things via scripting and occasionally using for loops. While this is fine, it’s slower than writing functions would be. Writing or creating functions is considered much cleaner and more proper. As a recap, whenever we use the phrase library(ncdf4) we are loading the ncdf4 package. Within that package are a number of .R scripts/files that contain functions that someone has been kind enough to write for the world. By using library(ncdf4), R reads in all of those functions and thus knows what we want when we execute a function like nc_open(). Instead of re-writing the pure function of nc_open() over and over again every time we want to open a netcdf file, we are able to just load the function. Before we get further into ncdf4, let’s write some R code to square a sequence of values.

# square a sequence of values with start of 5 and stop of 10
arg1 = 1
arg2 = 5
my.seq = seq(arg1, arg2, 1)
my.seq.squared = my.seq ** 2
my.seq.squared

## [1]  1  4  9 16 25

This code is straight-forward and light. Let’s now assume, however, that we want to run this code on multiple sets of start and stop values. If we turn the code above into a function, this is streamlined.

# create a function called seq_squared
seq_squared = function(arg1, arg2){
  my.seq = seq(arg1, arg2, 1)
  my.seq.squared = my.seq ** 2
  return(my.seq.squared)
}

seq_squared(arg1 = 5, arg2 = 10)

## [1]  25  36  49  64  81 100

Above we have a function that takes in start and stop values of a sequence, uses these values to make a sequence, and returns the squares of each of these values. We create a new function in R using the function R call and name it seq_squared. When we write a function of our own, we can add in our own arguments as we see fit. In this case, we create arguments named arg1 and arg2. These arguments are used to create my.seq which is a sequence within our xfunc function. We create a my.seq.squared variable which is a vector of my.seq values squared. Ultimately, my.seq.squared is the output of our function. It’s what we include in our return statement. Every function requires something to be returned, and my.seq.squared is the output of our seq_squared.

15.2 NetCDF Processing Function

Loading in all of the weather data from Willow Creek, WI is cumbersome. Let’s write a function that reads in the data and converts everything to a dataframe for us.

library(ncdf4)
read.wcr = function(fname) {
  fullname = strsplit(fname, "/")
  dataset_str = fullname[[1]][length(fullname[[1]])]
  datname = strsplit(dataset_str, "_")[[1]][1]
  data.year = substr(dataset_str, nchar(dataset_str)-6, nchar(dataset_str)-3)
  data.date = seq(from=as.POSIXct(paste0(data.year,"-1-1 0:00", tz="UTC")),to=as.POSIXct(paste0(data.year,"-12-31 23:00", tz="UTC")),by="hour")
  vars.info <- data.frame(CF.name = c("date", "air_temperature", "precipitation_flux", "surface_downwelling_shortwave_flux_in_air",
                                      "specific_humidity", "surface_downwelling_longwave_flux_in_air", "air_pressure",
                                      "eastward_wind", "northward_wind", "wind_speed"))
  df <- list()
  tem <- ncdf4::nc_open(fname)
  dim <- tem$dim
  for (j in seq_along(vars.info$CF.name)) {
    if (exists(as.character(vars.info$CF.name[j]), tem$var)) {
      df[[j]] <- ncdf4::ncvar_get(tem, as.character(vars.info$CF.name[j]))
    } else {
      df[[j]] = NA
    }
  }
  names(df) <- vars.info$CF.name
  df <- data.frame(df)
  nc_close(tem)
  if(all(is.na(df$date))){
    df$date = data.date
  }
  return(df)
}

In the function above, we’ve refactored code that we previously wrote to read in a Willow Creek netCDF file. read.wcr is our function name, the function() command tells R we’re declaring a function. fname is one of the arguments that the function takes. For example, the function what we use for opening netCDF files - nc_open() has the following arguments…

nc_open( filename, write=FALSE, readunlim=TRUE, verbose=FALSE, auto_GMT=TRUE, suppress_dimvals=FALSE )

The function we’ve coded above only takes the filename, but if our routine was more complex we could add more arguments in. Once the argument is taken in, we generalize naming schemes that are unimportant. Notice how we use object names like tem for example. This is short for temporary, and is set up to be overwritten if we were to use this function within a for loop to open multiple files.

Finally, focus in on the return(df) line - return is what we get out of a function. Now, let’s actually use the read.wcr() function we wrote above.

wcr_df = read.wcr(fname = "~/Documents/Github/geog473-673/datasets/WCr_1hr.2010.nc")

summary(wcr_df)

##       date                     air_temperature precipitation_flux 
##  Min.   :2010-01-01 00:00:00   Min.   :248.4   Min.   :0.000e+00  
##  1st Qu.:2010-04-02 06:45:00   1st Qu.:270.5   1st Qu.:0.000e+00  
##  Median :2010-07-02 12:30:00   Median :280.9   Median :0.000e+00  
##  Mean   :2010-07-02 12:30:00   Mean   :279.8   Mean   :2.564e-05  
##  3rd Qu.:2010-10-01 18:15:00   3rd Qu.:289.4   3rd Qu.:0.000e+00  
##  Max.   :2010-12-31 23:00:00   Max.   :304.0   Max.   :9.102e-03  
##  surface_downwelling_shortwave_flux_in_air specific_humidity  
##  Min.   :   0.00                           Min.   :0.0001864  
##  1st Qu.:   0.00                           1st Qu.:0.0025622  
##  Median :   0.00                           Median :0.0051646  
##  Mean   :  78.04                           Mean   :0.0065146  
##  3rd Qu.:  12.62                           3rd Qu.:0.0100803  
##  Max.   :1045.39                           Max.   :0.0258134  
##  surface_downwelling_longwave_flux_in_air  air_pressure   eastward_wind     
##  Min.   :139.7                            Min.   :91108   Min.   :-10.6414  
##  1st Qu.:263.0                            1st Qu.:94818   1st Qu.: -1.7298  
##  Median :301.7                            Median :95236   Median : -0.2718  
##  Mean   :304.7                            Mean   :95191   Mean   : -0.2817  
##  3rd Qu.:351.6                            3rd Qu.:95650   3rd Qu.:  1.2213  
##  Max.   :444.1                            Max.   :96918   Max.   : 13.7162  
##  northward_wind     wind_speed    
##  Min.   :-12.3305   Mode:logical  
##  1st Qu.: -1.5300   NA's:8760     
##  Median : -0.2183                 
##  Mean   : -0.1984                 
##  3rd Qu.:  1.0876                 
##  Max.   :  6.5341

It’s as simple as that. This is why programming can become a black box and why we don’t always look under the hood. Code and functions like the one above can be messy and as long as we understand the output we don’t always care how it gets done.

Remember that we have Willow Creek weather data for 2010, 2011, and 2012. Let’s set up a for loop for this to show how we might read in all 3 years of data. We’re going to use the rbind() function to latch on extra data here.

15.3 Functions in For Loops

# create a sequence of years for the years we have data
yearSeq = seq(2010,2012)

# begin the for loop to open each year of data
for (y in yearSeq){
  # use our read.wcr() function and paste in the filename that changes for each year of data
  tem_df = read.wcr(fname = paste0("~/Documents/Github/geog473-673/datasets/WCr_1hr.",y,".nc"))
  # != means DOES NOT EQUAL. This statement reads as - if y DOES NOT EQUAL the first year in yearSeq, proceed to use rbind
  # otherwise, if y is indeed the first year in yearSeq, we need to initialize wcr_df BEFORE we overwrite tem_df
  if (y != yearSeq[1]){
    wcr_df = rbind(wcr_df, tem_df)
  } else {
    wcr_df = tem_df
  }
}

length(wcr_df$date)

## [1] 26304

summary(wcr_df)

##       date                     air_temperature precipitation_flux 
##  Min.   :2010-01-01 00:00:00   Min.   :245.0   Min.   :0.000e+00  
##  1st Qu.:2010-10-02 00:45:00   1st Qu.:270.4   1st Qu.:0.000e+00  
##  Median :2011-07-03 00:30:00   Median :280.0   Median :0.000e+00  
##  Mean   :2011-07-03 00:30:00   Mean   :279.6   Mean   :1.937e-05  
##  3rd Qu.:2012-04-02 00:15:00   3rd Qu.:289.4   3rd Qu.:0.000e+00  
##  Max.   :2012-12-31 23:00:00   Max.   :305.5   Max.   :9.102e-03  
##  surface_downwelling_shortwave_flux_in_air specific_humidity 
##  Min.   :   0.00                           Min.   :0.000001  
##  1st Qu.:   0.00                           1st Qu.:0.002611  
##  Median :   0.00                           Median :0.004790  
##  Mean   :  79.62                           Mean   :0.006324  
##  3rd Qu.:  13.01                           3rd Qu.:0.009615  
##  Max.   :1045.39                           Max.   :0.027334  
##  surface_downwelling_longwave_flux_in_air  air_pressure   eastward_wind     
##  Min.   :132.8                            Min.   :91108   Min.   :-12.3726  
##  1st Qu.:266.0                            1st Qu.:94838   1st Qu.: -1.7470  
##  Median :304.8                            Median :95287   Median : -0.3203  
##  Mean   :305.5                            Mean   :95222   Mean   : -0.2160  
##  3rd Qu.:349.3                            3rd Qu.:95676   3rd Qu.:  1.3130  
##  Max.   :465.8                            Max.   :97104   Max.   : 13.7162  
##  northward_wind     wind_speed    
##  Min.   :-12.3305   Mode:logical  
##  1st Qu.: -1.7196   NA's:26304    
##  Median : -0.3798                 
##  Mean   : -0.4407                 
##  3rd Qu.:  0.8609                 
##  Max.   :  8.0738

Remember, != means DOES NOT EQUAL. This statement reads as - if y DOES NOT EQUAL the first year in yearSeq, proceed to use rbind(). Otherwise, if y is indeed the first year in yearSeq, we need to initialize wcr_df BEFORE we overwrite tem_df.

Based on the summary we see that we have successfully merged the 3 datasets into one continuous dataset. Since we defined the function above, we don’t need to make a messy for loop with all of the function bits in it. Functions speed up our program and also make for more sophisticated coding.

15.4 RMarkdown

Code is often messy and difficult to explain. One R feature that can make explaining code or a function easier is RMarkdown. This is how this textbook has been built. RMarkdown is a combination of HTML like syntax with R. Here is how you make an RMarkdown document.

Open a new RMarkdown document - File -> New File -> R Markdown...

Enter in details

The initial RMarkdown should look like this

We tell RMarkdown we want to write and run R code with 3 backticks and {r} - ```{r}
We end our R code section with 3 backticks - ```
Thus, for our my.seq.squared function above, it would be coded like this in RMarkdown.

```{r}

# create a function called seq_squared
seq_squared = function(arg1, arg2){
  my.seq = seq(arg1, arg2, 1)
  my.seq.squared = my.seq ** 2
  return(my.seq.squared)
}

```

In our example here, R code is keyed in the R Markdown document like so.

Notice the cars inside of {r cars}. This names the coding section but has no impact on the code itself. We do not need to name each coding section - {r} alone will suffice.

Anything outside of the R code sections will show up as plain text. When we are finished with our R Markdown document, we need to Knit the document. See the blue yarn ball at the top of your script? This will create a new PDF file.

Et Voila! A professional looking RMarkdown PDF that makes your R code easier to explain and present. Here’s what the output above should look like

15.5 Assignment

Create a function that extracts and plots a spatial dataset given a time. Follow the steps below:

Choose an online dataset like we did last week.
Create a function that extracts the data given a time or time range, creates a data frame and plots the dataset.
Include arguments within the function such as time.
Create an RMarkdown PDF that details how the function works.
Include an example of the function working to plot the given dataset
Submit resulting PDF to UD Canvas week 7 assignment