3 Definitions & Rules
This is a reference guide to look back on when you’re stuck. These definitions and rules are not expected to be understood right now but it’s important you know you can look back on this as a quick-reference.
3.1 Important R Programming Definitions
Read through these now to get familiar and refer back to these whenever you need a refresher. You’re not expected to have these memorized or even understood at this moment. These will make more sense as we progress through the course.
Coding Name | Example | Definition |
---|---|---|
syntax | R code | the nomenclature and structure of a programming language |
debugging | Failed R run | debugging involves fixing R code that is written incorrectly and doesn’t run |
variable | my_var |
Variables are used to store data, whose value can be changed according to our need. Variables can be declared using <- (tradiational way) or by = (conventional way) |
package | library(ggplot2) |
A collection of functions prewritten in R |
function | print() |
A function is a set of statements organized together to perform a specific task. R has a set of preloaded functions that are part of the base package. If a function cannot be found as part of the base package, the function has likely already been built under another package that needs to be loaded in. Functions can be identified due to their enclosing parantheses () |
arguments | read.csv(file = "datasets/tv_shows.csv", header = FALSE) |
Components of a function that are separated by commas and declared using the = sign. Arguments in this example are file = and header = |
index | tv_data[3,55] |
The position of data within a data frame, matrix, list, vector, etc. In R, data is indexed as [row,column] and indexing is done via brackets [] |
loop | for (n in names){print(n)} |
Repeats a task for a specified number of times. Saves a programmer from repeating codelines with different parameters. |
logical | TRUE, FALSE |
TRUE and FALSE logical operators are declared using all caps |
arithmetic operators | +,-,*,/,^ |
Math operators used for addition, subtraction, multiplication, division, exponent, respectively. |
comparison operators | ==, <, >, <=, >=, != |
Is equal to, less than, greater than, less than or equal to, greater than or equal to, is NOT equal to, respectively |
and/or operators | &, | |
AND, OR |
string | a_string = "anythign within quotes, single or double" |
Any value written within a pair of single quote or double quotes in R is treated as a string. |
numeric | 1 |
Any number - integer, float, etc. |
vector | as.vector(x = c(1,2,3,4)) |
Vectors are the most basic R data objects and there are six types of atomic vectors. They are logical, integer, double, complex, character and raw. |
lists | list('Peter', 'Sarah', 'Tom', 'Helen') |
Lists are the R objects which contain elements of different types like − numbers, strings, vectors and another list inside it |
matrix | matrix(c(1:5), nrow = 3, byrow = TRUE) |
Matrices are the R objects in which the elements are arranged in a two-dimensional rectangular layout. |
array | array(data = c(1,2,3)) |
Arrays are the R data objects which can store data in more than two dimensions. For example − If we create an array of dimension (1, 2, 3) then it creates 3 rectangular matrices each with 1 rows and 2 columns. Arrays can store only one data type. |
data frame | data.frame(tv_data) |
R version of Excel Spreadsheet. A data frame is a table or a two-dimensional array-like structure in which each column contains values of one variable and each row contains one set of values from each column. |
factor | factor() |
Factors are the data objects which are used to categorize the data and store it as levels. They can store both strings and integers. They are useful in the columns which have a limited number of unique values. Like “Male,”Female" and True, False etc. They are useful in data analysis for statistical modeling. |
help | help(read.csv) |
Default helper function in R. Opens up documentation on a particular function in the lower right quadrant of R. |
class | class(tv_data) |
Tells us what R is recognizing something as |
concatenate (c ) |
c(“a”, “b”, “c”) | A quick utility for concatenating strings together |
filepath | “/Users/james/Downloads/” | The location on your computer where a file is stored. A filepath with a leading slash (akak “/” ) is also referred to as root. Root is the furthest back you can go on your computer. Think of a filepath like this - “/Earth/UnitedStates/Delaware/Newark/” |
install.packages() |
install.packages("ggplot2") |
R base function for installing packages |
3.2 Rules
- Variable names must be assigned.
<- list('Peter', 'Sarah', 'Tom', 'Helen') names_list
#
is the Comment Operator - anything on the same line of the # comment operator will not be run by R.
# comments can be above
<- list('Peter', 'Sarah', 'Tom', 'Helen') # comments can be outside
names_list
## comments can be anywhere.
- Parantheses (), Brackets [], Curly brackets {}, Quotations "" must be used in pairs
= list('Peter', 'Sarah', 'Tom', 'Helen')
names_list
for (n in names_list){
print(n)
}
- If you don’t have a package, you must install that package. (After you install once, you don’t need to install again.)
install.packages('ggplot2')
- Packages MUST be loaded for each R session.
library(ggplot2)
- Variable names should not replicate function/package names.
3.3 General Recommendations
Comment, comment, comment. A comment is a brief note on what you were doing when you wrote a line of code. For example, if you write some R code that edits part of a dataframe (R’s version of an Excel Spreadsheet), comment what you were thinking here and why you did it this way. Once you become comfortable coding in R, you’ll be able to churn out new R scripts at a faster rate. It’s very important that you comment on what you’re doing at each step in the script so if you need to look back on something you wrote you can reference what you were doing there. A comment in R is declared using the pound symbol (#).
Keep raw data raw. An advantage of R is being able to read in an original spreadsheet and output a new spreadsheet as a separate file. In other words, when you read in a dataset (for example,
tv_shows.csv
) and make changes to this file, do not save it wastv_shows.csv
- thus overwriting the file. Instead, name it something liketv_shows_edited.csv
. Also, notice how we use underscores (_
) in between words of a filename - this is good practice that should be replicated (spaces are bad, see 4)When in doubt, Google your R question - look for StackOverflow links. StackOverflow is a web-forum where programmers can post questions for help. This is an incredible tool that even advanced programmers and developers use daily. There are other helpful forums out there - StackOverflow is the most popular.
Spaces in variable/file names are BAD. A variable is an object or column that you create in R. For example, if you have a list of student names (
student_names = list("John", "Peter", "Sebastian")
, the variable here would bestudent_names
. Let’s get into the habit of using underscores ’_’ or dashes ‘-’ or periods ‘.’ to separate words instead of spaces. From the computers side of variable name storage, it’s much safer to declare a variable name such as data_file as opposed to data file
Keep in mind, these will make more sense after we get more familiar with R - it’s alright if they’re confusing right now!