2 Some Basics

“Learning to write programs stretches your mind, and helps you think better.”

- Bill Gates, 1955-

2.1 First Steps

Upon opening R in Windows, two things will appear in the console of the R Graphical User Interface (R-GUI)¹⁸. These are the license disclaimer (blue text at the top of the console) and the command line prompt, i.e., $\boldsymbol{>}$ (Fig 2.1). The prompt indicates that R is ready for a command. All commands in R must begin at $\boldsymbol{>}$.

The default appearance of the R-GUI will vary slightly among operating systems. In Windows, the command line prompt and user commands are colored red (Fig 2.1), and output, including errors and warnings, is colored blue. In Mac OS, the command line prompt will be purple, user inputs will be blue, and output will be black. In Unix/Linux, wherein R will generally run from a shell command line, absent of any menus, all three will be black¹⁹. These console defaults can often be modified/customized when using R with an appropriate Integrated Development Environment (IDE) like RStudio (Section 2.8.3).

We can exit R at any time by typing q() in the console, closing the GUI window (non-Linux only), or by selecting Exit from the pulldown File menu (non-Linux only).

Figure 2.1: An aged, but still recognizable R console: R version 2.15.1, ‘Roasted Marshmallows’, ca. 2012.

2.2 First Operations

As an introduction we can use R to evaluate a simple mathematical expression. Type 2 + 2 and press Enter.

2 + 2

[1] 4

The output term [1] means, “this is the first requested element.” In this case there is just one requested element, $4$, the solution to $2 + 2$. If the output elements cannot be held on a single console line, then R would begin the second line of output with the element number comprising the first element of the new line. For instance, the command rnorm(20) will take 20 pseudo-random samples (see footnote in Section 9.5.10) from a standard normal distribution (see Ch 3 in Aho (2014)). We have:

rnorm(20)

 [1] -0.01191915  0.96290906  0.76284707 -0.48533212 -0.84082914
 [6]  2.66308935  0.11240932  0.45346593  1.97841253  1.27562307
[11] -0.60755132  1.19854595 -1.78489837 -1.23507048 -0.94929972
[16]  1.68012443  1.09905229  1.24485358 -0.90536880 -1.25759311

The reappearance of the command line prompt indicates that R is ready for another command. Multiple commands can be entered on a single line, separated by semicolons. Note, however, that this is considered poor programming style, as it may make your code more difficult to understand by a third party.

2 + 2; 3 + 2

[1] 4

[1] 5

R commands are generally insensitive to white spaces, including tabs. This allows the use of spaces to make code more legible. To my eyes, the command 2 + 2 is simply easier to read (and potentially debug) than 2+2.

2.2.1 Use Your Scroll Keys

As with many other command line environments, the scroll keys (Fig 2.2) provide an important shortcut in R. Instead of editing a line of code by tediously mouse-searching for an earlier command to copy, paste and then modify, you can simply scroll back through your earlier work using the upper scroll key, i.e., $\uparrow$ . Accordingly, scrolling down using $\downarrow$ will allow you to move forward through earlier commands.

Figure 2.2: Typical scroll direction keys on a keyboard.

2.2.2 Note to Self: `#`

R will not recognize commands preceded by #. As a result this is a good way for us to leave messages to ourselves.

# Note at beginning of line
2 + 2

[1] 4

We can even place comments in the middle of an expression, as long as the expression is finished on a new line.

2 +  # Note in middle of line
+ 2

[1] 4

In the “best” code writing style it is recommended that one place a space after # before beginning a comment, and insert two spaces following code before placing # in the middle of a line. This convention is followed above.

2.2.3 Unfinished Commands

R will be unable to move on to a new task when a command line is unfinished. For example, type

2 +

and press Enter. We note that the continuation prompt, +, is now where the command prompt should be. R is telling us the command is unfinished. We can get back to the command prompt by finishing the command, clicking Misc$>$Stop current computation or Misc$>$Stop all computations from the R-toolbar (non-Linux only), typing Ctrl + C (Linux), or by pressing the Esc key (all OS).

2.3 Expressions and Assignments

All entries in R are either expressions or assignments. If an entry is an expression, it will be evaluated, printed, and discarded. An example is: 2 + 2. Conversely, an assignment evaluates an expression, and binds the expression output to a name, thereby creating a referable R-object. This important activity has prompted the motto: “everything created or loaded in R is an object”[^02-ch2-1].

[^02-ch2-1]: Although everything created or loaded in R can be viewed as an object, not all R objects fit neatly into the object oriented programming (OOP) perspective of “object-oriented.” This is true because R base objects (which are not object oriented) come from S, which was developed before anyone considered the need for an S OOP system (see Wickham (2019) and Chambers (2008)).

To create an object, we use the assignment operator: <- . The operator represents an arrow that points toward the object's user-defined name.

Example 2.1 $\text{}$
To create an R-object named y, that contains the result of the expression 2 + 2, I can type:

y <- 2 + 2

The code: y <- 2 + 2 literally means: “$2 + 2$ is bound to the name y” (Wickham 2019).

The assignment operator can go on either side of an expression. Thus, as an alternative, I could have typed:

2 + 2 -> y

The leftward assignment operator, <-, is generally used instead of the rightward, ->, because it is easier to conceptualize the relationship object name <- object.

$\blacksquare$

Results of an assignment are generally not automatically printed. However, for most common object classes (see Section 2.3.5) summaries can be easily obtained²⁰.

Example 2.2 $\text{}$
To print the result of Example 2.1 (to see the object bound to the name y), I can simply type:

y

[1] 4

or, equivalently, use the function print():

print(y)

[1] 4

$\blacksquare$

The mathematical equals operator, =, can also be used as an assignment operator. Like <-, = assigns from right to left.

Example 2.3 $\text{}$
For instance, to obtain the assignment result shown in Example 2.1, I could have typed:

y = 2 + 2
y

[1] 4

$\blacksquare$

Notably, the equals sign has limited applicability as an assignment operator, compared to <-²¹. Thus, in this document, I use <- for object assignments, and save = for specifying arguments in R functions.
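The distinction between the two operators can be seen inside a function call. The following is a minimal sketch, assuming a fresh R session. It uses the base function round(), and the object names r1, r2, d, and digits_created are hypothetical, chosen only for illustration. Within a call, = matches a value to an argument name, whereas <- creates an object as a side effect.

```r
# Inside a call, = matches the value 2 to the argument named digits.
# No object called digits is created in the workspace.
r1 <- round(3.14159, digits = 2)
digits_created <- exists("digits", inherits = FALSE)

# Inside a call, <- first creates the object d (hypothetical name),
# and then passes its value to the second argument of round().
r2 <- round(3.14159, d <- 2)

r1              # 3.14
r2              # 3.14
d               # 2
digits_created  # FALSE in a fresh session
```

Because of this difference, placing <- inside a function call is rarely what one intends.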

R objects need not be numeric. In computer programming, a character string or string is a collective sequence of characters representing text²². Character strings in R are delimited with quotes: " " or ' '.

Example 2.4 $\text{}$
Here I define y to be a well-known character string.

y <- "Hello world"
y

[1] "Hello world"

$\blacksquare$

2.3.1 Functions and their Arguments

Importantly, the script print(y) in Example 2.2 provides one of our first clear uses of a special type of R object called a function. R functions generally require a user to specify arguments –that parameterize and control the function– within parentheses, following the function name. Thus, we use the following scripting framework to call an R function: function.name(argument1, argument2, argument3, etc). The function print() only requires one argument: the name of the object to be printed.

A list of function arguments, and their default values, can (generally) be obtained with the function formals().

formals(print)

$x


$...

The first argument in print(), x, refers to the name of the object to be printed. The second argument is the so-called triple dot placeholder, .... This (optional) argument, which is formally considered in Section 8.3.3, allows additional arguments to be passed from various printing methods that can be called using the generic function name print() (see Section 8.7).

Arguments in R functions can be set by users in two ways.

1. One can provide acceptable values for arguments, in the order that the arguments occur in the list reported by formals(). For example, for the function a_function, if I wish to assign the values x and y to the first and second arguments, respectively, I could type: a_function(x, y).
2. One can refer to an argument by its name, and specify values for the argument using the = operator. That is, for some function a_function, with some arguments foo and bar, that I wish to assign the values x and y, I could type: a_function(foo = x, bar = y). This approach should be used if one does not remember the order of arguments in a function (if you don’t remember whether foo is the first or fifth argument), or if one wishes to change/specify only certain arguments from a large number of arguments.

Example 2.5 $\text{}$
Under approach 2, we can print the object y, created in Example 2.4, by typing:

print(x = y)

[1] "Hello world"

Of course, data object names other than y can be supplied to the argument x in print(). For example, to print an object named z, I could use either print(z) or print(x = z).

$\blacksquare$

One can maintain the default value for an argument by simply ignoring that argument in the function call. For example, if a_function has argument defaults foo = x, bar = y, and I wished to change the value of foo to baz, while maintaining the default value for bar, I could type: a_function(foo = baz). Occasionally, a function’s defaults will allow it to run without user value specifications for any arguments. In this case, I could run a_function by typing: a_function().
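As a brief illustration of these conventions, the sketch below uses the base function seq(), which generates regular sequences and has the arguments from, to, and by (among others). The object names s1 through s4 are hypothetical, and the example is meant only as one possible demonstration.

```r
# Approach 1: values supplied in argument order (from, to, by)
s1 <- seq(1, 10, 3)

# Approach 2: values supplied by argument name, in any order
s2 <- seq(by = 3, to = 10, from = 1)

# Ignoring by leaves the default step size of one in place
s3 <- seq(1, 10)

# All arguments ignored: seq() runs entirely on its defaults
s4 <- seq()

s1                  # 1 4 7 10
identical(s1, s2)   # TRUE
s3                  # 1 2 3 4 5 6 7 8 9 10
s4                  # 1
```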

2.3.2 Naming Objects

When binding an R-object to a name, we should try to keep the name simple, and avoid names that already represent important definitions and functions. These include: TRUE, FALSE, NULL, NA, NaN, and Inf. In addition, we cannot have names:

beginning with a numeric value,
containing spaces, colons, or semicolons,
containing mathematical operators (e.g., *, +, -, ^, /, =),
containing important R metacharacters (e.g., @, #, ?, !, %, &, |).

However, even these “forbidden” names and characters can be used if one encloses them in backticks, also called accent grave characters. For example, the code, `?` <- 2 + 2 will create an object named `?`, containing the number 4.

Names should, if possible, be descriptive. Thus, for an object containing 20 random observations from a normal distribution, the name rN20 may be superior to the easily-typed, but anonymous name, x. Finally, we should remember that R is case sensitive. That is, each of the following $2^4$ combinations will be recognized as distinct: name, Name, nAme, naMe, namE, NAme, nAMe, naME, NaMe, nAmE, NamE, NAMe, nAME, NaME, NAmE, NAME.
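The naming points above can be tried directly. This is a small sketch, not a prescription. The names rN20, name, Name, and `my total` are used only for illustration.

```r
# A descriptive name for 20 pseudo-random normal observations
rN20 <- rnorm(20)
length(rN20)            # 20

# R is case sensitive: name and Name are different objects
name <- "lower case"
Name <- "capitalized"
identical(name, Name)   # FALSE

# Backticks permit an otherwise forbidden name (here, one with a space)
`my total` <- 2 + 2
`my total` + 1          # 5
```

Names requiring backticks are legal, but they must be quoted with backticks every time they are used, so they are generally best avoided.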

2.3.3 Listing Objects

The lexical scoping characteristics of R (Section 1.4.1.2) have important consequences when considering objects and their names. An object’s name will be assigned to a particular R environment –a specialized storage system whose features are formally considered, alongside R functions, in Ch 8. By default, an object will be assigned by R to the environment where it was defined, although this can be modified.

Only objects in the current environment can be directly accessed by calling their names²³. A list of objects assigned to particular environments can be obtained using the functions objects() or ls().

Example 2.6 $\text{}$
The R session itself is defined to be the so-called global environment: .GlobalEnv.

environment()

<environment: R_GlobalEnv>

Object searches from objects() and ls() are limited, by default, to the current environment –which, for this document, is the global environment. Currently, I only have the object y (which has been assigned and modified several times) in .GlobalEnv.

objects()

[1] "y"

Note that in this example I run environment() and objects() without arguments.

$\blacksquare$
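To see that ls() reports on one environment at a time, one can list the contents of an environment other than the current one. The sketch below assumes the base functions new.env() and assign(), which are not otherwise introduced in this chapter. The names e and w_local are hypothetical.

```r
# Create a separate, empty environment (hypothetical name e)
e <- new.env()

# Bind the value 10 to the name w_local inside e, not in the current environment
assign("w_local", 10, envir = e)

ls(envir = e)                         # "w_local"
exists("w_local", inherits = FALSE)   # FALSE: not in the current environment
get("w_local", envir = e)             # 10
```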

2.3.4 Combining Data

To combine a collection of numbers or other data into a single entity, one can use the important R function c(), which means “combine”.

Example 2.7 $\text{}$
To define the numbers 23, 34, and 10 collectively to be an object named x, I would type:

x <- c(23, 34, 10)

We could then do something like:

x + 7

[1] 30 41 17

Note that seven was added to each element in x.

$\blacksquare$
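The function c() can also combine existing objects with new values. In the following sketch the name x2 is hypothetical. Note that the result is a single flat collection, even when one call to c() is nested inside another.

```r
x <- c(23, 34, 10)

# Combine x with an additional number and a second, nested collection
x2 <- c(x, 5, c(1, 2))

x2           # 23 34 10  5  1  2
length(x2)   # 6
x2 * 2       # 46 68 20 10  2  4
```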

2.3.5 Object Classes

Under the idiom of object oriented programming (OOP), an object may have attributes that allow it to be evaluated correctly, and associated methods appropriate for those attributes (e.g., specific functions for plotting, printing, etc.)²⁴.

R objects will generally have a class, identifiable with the function class().

class(x)

[1] "numeric"

Objects in class numeric (and those in several other widely-used classes) can be evaluated mathematically. Some common R classes are shown in Table 2.1, along with several new functions used to create objects with those classes, including: raw(), expression(), list(), factor(), function(), matrix(), array(), and data.frame(). We will learn about these functions, and create objects representing all of these classes over the next few chapters. We will also learn how to create our own personalized classes and associated methods (Section 8.7).

Class	Example
`logical`	`x <- TRUE`
`numeric`	`x <- 2 + 2`
`integer`	`x <- 1:3`
`character`	`x <- c("a","b","c")`
`complex`	`x <- 5i`
`raw`	`x <- raw(2)`
`expression`	`x <- expression(x * 4)`
`list`	`x <- list()`
`factor`	`x <- factor(c("a","a","b"))`
`function`	`x <- function(y)y + 1`
`matrix`	`x <- matrix(nrow = 2, rnorm(4))`
`array`	`x <- array(rnorm(8), c(2, 2, 2))`
`data.frame`	`x <- data.frame(v1 = c(1,2), v2 = c("a","b"))`
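Several of the class designations in Table 2.1 can be confirmed by wrapping the example code in class(). The following is a quick check rather than an exhaustive one. The name cl_matrix is hypothetical.

```r
class(TRUE)                       # "logical"
class(2 + 2)                      # "numeric"
class(1:3)                        # "integer"
class(c("a", "b", "c"))           # "character"
class(5i)                         # "complex"
class(list())                     # "list"
class(factor(c("a", "a", "b")))   # "factor"
class(function(y) y + 1)          # "function"
class(data.frame(v1 = c(1, 2)))   # "data.frame"

# In R versions 4.0.0 and later, a matrix reports two classes
cl_matrix <- class(matrix(rnorm(4), nrow = 2))
cl_matrix                         # "matrix" "array"
```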

2.3.6 Object Base Types

All R objects will have so-called base types that define their underlying C language data structures. Specifically, R base types correspond to an underlying C-codified typedef, an alias framework for C data types. This internal process is referred to by the R-core development team as SEXPTYPE, meaning S-expression (SEXP) type (R Core Team 2024a). There are currently 24 SEXPTYPE variants (R Core Team 2024a), each corresponding to one of the 24 R base types (Table 2.2), and it is unlikely that more will be developed in the near future (Wickham 2019). The meaning and usage of some of the base types may seem clear, for instance, integer and character, which are also class designations (Table 2.1). Most of the base types are specifically addressed in later chapters, including list, complex, logical, integer, NULL, and symbol (Ch 3), character and language (Chs 4 and 5), closure, special, builtin, environment, pairlist, S4, and promise (Ch 8), and raw and double (Ch 12). Base types meant for C-internal processes, i.e., any, bytecode, promise, ..., weakref, externalptr, and char, are not easily accessible with interpreted R code (R Core Team 2024b). Underlying SEXP types are considered infrequently through the remainder of the book.

Base type	Example	Application	`SEXP`
`NULL`	`x <- NULL`	vectors	`NILSXP`
`logical`	`x <- TRUE`	vectors	`LGLSXP`
`integer`	`x <- 1L`	vectors	`INTSXP`
`complex`	`x <- 1i`	vectors	`CPLXSXP`
`double`	`x <- 1`	vectors	`REALSXP`
`list`	`x <- list()`	vectors	`VECSXP`
`character`	`x <- "a"`	vectors	`STRSXP`
`raw`	`x <- raw(2)`	vectors	`RAWSXP`
`closure`	`x <- function(y)y + 1`	closure functions	`CLOSXP`
`special`	`x <-` `[`	special functions	`SPECIALSXP`
`builtin`	`x <- sum`	builtin functions	`BUILTINSXP`
`expression`	`x <- expression(x * 4)`	expressions	`EXPRSXP`
`environment`	`x <- globalenv()`	environments	`ENVSXP`
`symbol`	`x <- quote(a)`	language components	`SYMSXP`
`language`	`x <- quote(a + 1)`	language components	`LANGSXP`
`pairlist`	`x <- formals(mean)`	language components	`LISTSXP`
`S4`	`x <- stats4::mle(function(x=1)x^2)`	non-simple objects	`OBJSXP`
`any`	No example	C-internal	`ANYSXP`
`bytecode`	No example	C-internal	`BCODESXP`
`promise`	No example	C-internal	`PROMSXP`
`...`	No example	C-internal	`DOTSXP`
`weakref`	No example	C-internal	`WEAKREFSXP`
`externalptr`	No example	C-internal	`EXTPTRSXP`
`char`	No example	C-internal	`CHARSXP`

Base types of numeric objects define their storage mode, i.e., the way R caches them in its primary memory²⁵. Base types can be identified using the function typeof().

Example 2.8 $\text{}$
For example, for our latest version of x from Example 2.7 we have:

typeof(x)

[1] "double"

We see that x has storage mode double, meaning that its numeric values are stored in double precision, with up to 53 bits of precision, resulting in recognizable and distinguishable positive values between approximately $5 \times 10^{-324}$ and $2 \times 10^{308}$ (see Ch 12 for more information).

The R session itself (the global environment) has base type environment:

typeof(.GlobalEnv)

[1] "environment"

$\blacksquare$
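Class and base type need not share a name, and different kinds of functions have different base types. The short sketch below applies typeof() to a few of the examples from Table 2.2.

```r
x <- c(23, 34, 10)
class(x)       # "numeric"
typeof(x)      # "double"

typeof(1L)     # "integer"
typeof("a")    # "character"

# Three base types for functions
typeof(mean)   # "closure"
typeof(sum)    # "builtin"
typeof(`[`)    # "special"
```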

2.3.7 Object Attributes

Many R-objects will also have attributes (i.e., characteristics particular to the object or object class).

Example 2.9 $\text{}$
Typing:

attributes(x)

NULL

indicates that x (as defined in Example 2.7) does not have additional attributes. However, using coercion (Section 3.3.4) we can define x to be an object of class matrix (a collection of data in a row and column format (see Section 3.1.2)).

attributes(as.matrix(x))

$dim
[1] 3 1

Now the coerced version of x has the attribute dim (i.e., dimension). Specifically, it is a three-celled matrix, with three rows and one column.

$\blacksquare$
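Attributes can also be examined individually, or added by the user. The following is a minimal sketch in which the names xm and xn, and the labels "a", "b", and "c", are hypothetical. It uses the base functions attr(), dim(), and names().

```r
x <- c(23, 34, 10)

# Store the matrix version of x under its own (hypothetical) name
xm <- as.matrix(x)
attr(xm, "dim")   # 3 1
dim(xm)           # 3 1, a shortcut for the same attribute

# Adding element names creates a names attribute
xn <- x
names(xn) <- c("a", "b", "c")
attributes(xn)    # a list containing $names
```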

Amazingly, underlying object characteristics allow R to simultaneously store and distinguish objects with the same name. For instance:

mean <- mean(c(1, 2, 3))
mean

[1] 2

mean(c(1, 2, 3))

[1] 2

In general, it is not advisable to name an object after a frequently used function. Nonetheless, the function mean(), which calculates the arithmetic mean of a collection of data, is distinguishable from the new user-created object mean, because these objects have different underlying characteristics. We can remove the user-created object mean, with the function rm(). This leaves behind only the function mean(), which I print below:

rm(mean)
mean

function (x, ...) 
UseMethod("mean")
<bytecode: 0x000002217824fdc0>
<environment: namespace:base>

The capacity of R to track and distinguish object names is a primary focus of Section 8.8.

2.4 Getting Help

There is no single perfect source for information/documentation for all aspects of R. Detailed manuals from CRAN are available concerning the R language definition, basic operations, and package development. These resources, however, often assume a familiarity with Unix/Linux operating systems and computer science terminology. Thus, they may not be particularly helpful to biologists who are new to R.

2.4.1 `help()` and `?`

A comprehensive help system is available for many R components including operators, and loaded package dataframes and functions. The system can be accessed via the question mark, ?, operator and the function help().

Example 2.10 $\text{}$
For instance, if I wanted to know more about the plot() function, I could type:

?plot

or

help(plot)

$\blacksquare$

Documentation for packaged R functions (Section 3.7) must include an annotated description of function arguments, along with other pertinent information, and documentation for packaged datasets must include descriptions of dataset variables²⁶. The quality of documentation will generally be excellent for functions from packages in the default R download (i.e., the R-distribution packages, see Section 3.7), but will vary from package to package otherwise.

For help and documentation concerning programming metacharacters used in R (for instance @, #, ?, !, %, &, |), one would enclose the metacharacters in quotes. For example, to find out more information about the logical operator & I could type help("&") or ? "&". Placing two question marks in front of a topic will cause R to search for help files concerning that topic across all packages installed on a workstation.

Example 2.11 $\text{}$
For instance, type:

??lm

or, alternatively

help.search("lm")

for a huge number of help files on linear model functions identified through fuzzy matching.

$\blacksquare$

Help for particular R-questions can often be found online using the search engine at http://search.r-project.org/. This link is provided in the Help pulldown menu in the R console (non-Linux only). Helpful online discussions can also be found at Stack Overflow, and Stats Exchange.

2.4.2 `demo()` and `example()`

The function demo() allows one access to coded examples that developers have worked out for a particular function or topic. For instance, type:

demo(graphics)

for a brief demonstration of R graphics. Typing

demo(persp)

will provide a demonstration of 3D perspective plots. And, typing:

demo(Hershey)

will provide a demonstration of available modifiable symbols from the Hershey family of fonts (see Ch 6 in Hershey (1967)). Finally, typing:

demo()

lists all of the demos available in the loaded libraries for a particular workstation. The function example() usually provides less involved demonstrations from the man package directories (short for user manual, see Ch 10) in an R package. For instance, type:

example(plotmath)

for a coded demonstration of mathematical graphics.

2.4.3 Vignettes

R packages often contain vignettes. These are short documents that generally describe the theory underlying algorithms and provide guidance on how to correctly use package functions. Vignettes can be accessed with the function vignette(). To view all vignettes for all installed packages (Section 3.7.1), type:

vignette(all = TRUE)

To view all vignettes available for loaded packages (see Section 3.7.2), type:

vignette(all = FALSE)

To view vignettes for the R contributed package asbio (following its installation), type:

vignette(package = "asbio")

To see the vignette simpson in package asbio, type:

vignette("simpson", package = "asbio")

The function browseVignettes() provides an HTML-browser that allows interactive vignette searches.

2.5 Keyboard Shortcuts

R contains a number of useful keyboard shortcuts. For example, at this point it may be evident that the R-console can quickly become cluttered and confusing. To remove console text (without actually getting rid of any of the objects created in a session) press Ctrl + L or, from the Edit pulldown menu, click Clear console (non-Linux only). A full list of keyboard shortcuts can be obtained by typing: Alt + Shift + K (Windows and Linux) or Option + Shift + K (Mac OS). Keyboard shortcuts can often be modified, or even created, if one is running R from a sophisticated IDE like RStudio (Section 2.10).

2.6 Options

To enhance an R session, we can adjust the appearance of the R-console and customize options that affect expression output. These include the characteristics of the graphics devices, the width of print output in the R-console, and the number of print lines and print digits. Changes to some of these parameters can be made by going to Edit$>$GUI Preferences in the R-toolbar. Many other parameters can be changed using the options() function. To see all alterable options one can type:

options()

The resulting list is extensive. To modify options, one would simply define the desired change within parentheses following a call to options. For instance, to see the default number of digits, I would type:

options("digits")

$digits
[1] 7

To change the default number of digits in output from 7 to 5 in the current session, I would type:

options(digits = 5)
# demonstration using pi
pi

[1] 3.1416

One can revert back to default options by restarting an R session.
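Options can also be restored within a session. The sketch below relies on the fact that options() invisibly returns the previous values of any options it changes. The names old and pi_short are hypothetical.

```r
# Change digits, saving the previous setting in old
old <- options(digits = 5)

# format() respects the digits option
pi_short <- format(pi)
pi_short               # "3.1416"

# Restore the earlier setting without restarting R
options(old)
getOption("digits")    # 7, if the default was in place beforehand
```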

2.6.1 Advanced Options

To store user-defined options and start up procedures, an .Rprofile file will exist in your R program etc directory. This location would be something like: $\ldots$R/R-version/etc. R will silently run commands in the .Rprofile file upon opening. Thus, by customizing the .Rprofile file one can “permanently” set session options, load installed packages, define a favorite package repository (Section 3.7), and even create aliases and defaults for frequently used functions.

The .Rprofile file located in the etc directory is the so-called Rprofile.site file. Additional .Rprofile files can be placed in the working directory (see below). R will check for these and run them after running the Rprofile.site file.

Example 2.12 $\text{}$
Here is the content of one of my current .Rprofile files.

options(repos = structure(c("http://ftp.osuosl.org/pub/cran/")))
.First <- function(){
  library(asbio)
  cat("\nWelcome to R Ken! ", date(), "\n")
}
.Last <- function(){
  cat("\nGoodbye Ken", date(), "\n")
}

The command options(repos = structure(c("http://ftp.osuosl.org/pub/cran/"))) (Line 1) defines my preferred CRAN repository mirror site (see Section 3.7). The function .First() (Lines 2-5) will be run at the start of the R session and .Last() (Lines 6-8) will be run at the end of the session. R functions will be formally introduced in Ch 8. As we go through this book it will become clear that these lines of code force R to say hello, to load the package asbio (R packages are formally considered in Section 3.7), and to print the date/time (using the function date()) when it opens, and to say goodbye and print the date/time when it closes (although the farewell will only be seen when running R from a shell interface, e.g., the Windows Command Prompt).

$\blacksquare$

One can create .Rprofile files, and many other types of R extension files using the function file.create(). For instance, the code:

file.create("defaults.Rprofile")

will place an empty, editable, .Rprofile file called defaults in the working directory.

2.7 The Working Directory

By default, the R working directory is set to be the home directory of the workstation. The command getwd() shows the current file path for the working directory.

The working directory can be changed with the command setwd(filepath), where filepath is the location of the desired directory, or by using pulldown menus, i.e., File$>$Change dir (non-Linux only). Because R was developed under Unix, we must specify directory hierarchies using forward slashes or by doubling backslashes.

Example 2.13 $\text{}$
Here is the actual working directory of this (GitHub-linked) manuscript.

getwd()

[1] "C:/Users/ahoken/Documents/GitHub/Amalgam"

To establish a working directory file path to the Windows directory: C:\Users\User\Documents, I would type:

setwd("C:/Users/User/Documents")

or

setwd("C:\\Users\\User\\Documents")

$\blacksquare$
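A common pattern is to store the current working directory before changing it, so that it can be restored later. The sketch below assumes the base functions tempdir(), which returns a session-specific scratch directory, and file.path(), which builds paths using forward slashes. The names old_wd, in_temp, and p are hypothetical.

```r
# Remember the starting directory
old_wd <- getwd()

# Move to a temporary directory, and record where we are
setwd(tempdir())
in_temp <- getwd()

# Return to the starting directory
setwd(old_wd)

# Build a path from its components; the separator is a forward slash
p <- file.path("C:", "Users", "User", "Documents")
p   # "C:/Users/User/Documents"
```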

2.8 Saving and Loading Your Work

As noted in Ch 1, an R session is allocated a fixed amount of memory that is managed in an on-the-fly manner. An unfortunate consequence of this is that if R crashes, all unsaved information from the work session will be lost. Thus, session work should be saved often. Note that R will not give a warning if you are writing over session files from the R console. The old file will simply be replaced. Three general approaches for saving non-graphics data are possible. These are: 1) saving the history, 2) saving objects, and 3) saving R script. All three of these operations can be greatly facilitated by using an R integrated development environment like RStudio (Section 2.10).

2.8.1 R History

To view the history (i.e., the commands that have been used in a session) one can use history(n) where n is the number of previous command lines one wishes to see²⁷. For instance, to see the last three commands, one would type²⁸:

history(3)

To save the session history in Windows one can use File$>$Save History or the function savehistory(). For instance, to save the session history to the working directory under the name history1, I could type:

savehistory(file = "history1.Rhistory")

We can view the code in this file from any text editor. To load the history from a previous session one can use File$>$Load History (non-Linux only) or the function loadhistory(). For instance, to load history1 I would type:

loadhistory(file = "history1.Rhistory")

To save the history at the end of (almost) every interactive Windows or Unix-alike R session, one can alter the .Last function in the .Rprofile file to include:

.Last <- function() if(interactive()) try(savehistory("~/.Rhistory"))

2.8.2 R Objects

To save all of the objects available in the current R-session one can use File$>$Save Workspace (non-Linux only), or simply type:

save.image()

This procedure saves session objects to the working directory as a nameless file with an .RData extension. The file will be opened, silently, at the inception of the next R-session, making objects used or created in the previous session available. Indeed, at startup R will automatically load a file named .RData if one is found in the working directory. Stored .RData files can also be loaded using File$>$Load Workspace (non-Linux only). One can also save .RData objects to a specific directory location, under a specific file name, using File$>$Save Workspace, or with the flexible function save(). R data files, including those with .rda and .RData extensions, can be read into R using the function load(). R scripts, which have an .R extension, are instead run with source() (Section 2.8.3). Users new to a command line environment will be reassured by typing:

load(file.choose())

The function file.choose() will allow one to browse interactively for files to load using dialog boxes. Detailed procedures for importing (reading) and exporting (saving) data with a row and column format, and an explicit delimiter (e.g. .csv files) are described in Ch 3.
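Saving and reloading particular objects with save() and load() might look like the following. This is a sketch under stated assumptions: the file is written to a temporary location given by the base function tempfile(), and the names f and loaded are hypothetical.

```r
x <- c(23, 34, 10)
y <- "Hello world"

# A temporary file path with an .RData extension
f <- tempfile(fileext = ".RData")

# Save only x and y, then remove them from the session
save(x, y, file = f)
rm(x, y)

# load() restores the objects, and invisibly returns their names
loaded <- load(f)
loaded   # "x" "y"
x        # 23 34 10
```

In practice one would replace f with a file name in the working directory, for instance "myobjects.RData".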

2.8.3 R Scripts

To save an R script as a source code file, it is best to use an Integrated Development Environment (IDE) compatible with R. R contains its own IDE, the R-editor, which is useful for writing, editing, and saving scripts as .r extension files (Fig 2.3). To access the R-editor go to File$>$New script (non-Linux only) or type the shortcut Ctrl + F + N (Windows or Linux) or Cmd + F + N (Mac OS). Code written in the R-editor IDE can be sent directly to the R-console by copying and pasting or by selecting code and using the shortcut Ctrl + R (Windows and Linux) or Cmd + R (Mac OS).

Figure 2.3: The R-editor providing code for a famous computational exercise.

Aside from the R-editor, a number of other IDEs outside of R allow straightforward generation of R script files, and a direct link between text editors that provide syntax highlighting for R code and the R-console itself. These include RWinEdt (an R package plugin for WinEdt); Tinn-R, a recursive acronym for Tinn is not Notepad; ESS (Emacs Speaks Statistics); Jupyter Notebook, a web-based IDE originally designed for Python, but useful for many languages; and particularly RStudio, which will be introduced later in this chapter²⁹.

Saved R scripts can be called and executed using the function source(). To browse interactively for source code files, one can type:

source(file.choose())

or go to File$>$Source R code.
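To see source() at work without leaving the console, one can write a small script to a temporary file and then execute it. The sketch below assumes the base functions tempfile() and writeLines(). The names script, z, and z_squared are hypothetical.

```r
# Create a two-line R script in a temporary file
script <- tempfile(fileext = ".R")
writeLines(c("z <- 2 + 2",
             "z_squared <- z^2"),
           script)

# Execute the script; its objects become available in the session
source(script)
z           # 4
z_squared   # 16
```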

2.9 Basic Mathematics

A large number of mathematical operators and functions are available with a conventional download of R.

Elementary mathematical operators, common mathematical constants, trigonometric functions, derivative functions, integration approaches, and basic statistical functions are shown in Tables 2.3 - 2.9.

2.9.1 Elementary Operations

Elementary mathematical operations and functions (Table 2.3), and even those for specialized processes, can generally be applied to a wide variety of numeric object classes. For instance, the expression log(x) could be applied if x was a scalar (e.g., x = 3), or a collection of numbers, e.g., x = c(3, 7, 8). In the latter case, the natural logarithm would be calculated for each element in x, and those transformed outcomes would be returned by the function. Notably, this form of intuitive scripting is a dramatic departure from approaches used by many other computer languages³⁰.

Operation	Function/Operator	To find:	We type:
addition	`+`	$2 + 2$	`2 + 2`
subtraction	`-`	$2 - 2$	`2 - 2`
multiplication	`*`	$2 \times 2$	`2 * 2`
division	`/`	$\frac{2}{3}$	`2/3`
modulo	`%%`	remainder of $\frac{5}{2}$	`5%%2`
integer division	`%/%`	$\frac{5}{2}$ without remainder	`5%/%2`
exponentiation	`^`	$2^3$	`2^3`
$\mid x \mid$	`abs(x)`	$\mid -23.7 \mid$	`abs(-23.7)`
round $x$ to $d$ digits	`round(x, digits = d)`	round $-23.71$ to 1 digit	`round(-23.71, 1)`
round $x$ up to closest whole num.	`ceiling(x)`	ceiling(2.3)	`ceiling(2.3)`
round $x$ down to closest whole num.	`floor(x)`	floor(2.3)	`floor(2.3)`
$\sqrt{x}$	`sqrt(x)`	$\sqrt{2}$	`sqrt(2)`
$\log_e{x}$	`log(x)`	$\log_e{5}$	`log(5)`
$\log_b{x}$	`log(x, base = b)`	$\log_{10}{5}$	`log(5, base = 10)`
$x!$	`factorial(x)`	$5!$	`factorial(5)`
$\binom{n}{x} = \frac{n!}{x!(n-x)!}$	`choose(n,x)`	$\binom{5}{2}$	`choose(5,2)`
$\Gamma(x)$	`gamma(x)`	$\Gamma(3.2)$	`gamma(3.2)`
$B(a,b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a + b)}$	`beta(a,b)`	$B(3,2)$	`beta(3,2)`
$\sum_{i=1}^{n}x_i$	`sum(x)`	sum of `x`	`sum(x)`
cumulative sum	`cumsum(x)`	cum. sum of `x`	`cumsum(x)`
$\prod_{i=1}^{n}x_i$	`prod(x)`	product of `x`	`prod(x)`
cumulative product	`cumprod(x)`	cum. prod. of `x`	`cumprod(x)`

2.9.2 Associativity and Precedence

Note that the operation:

2 + 6 * 5

[1] 32

is equivalent to $2 + (6 \cdot 5) = 32$. This is because the * operator gets higher priority (precedence) than +. Evaluation precedence can be modified with parentheses:

(2 + 6) * 5

[1] 40

Among operators with equal precedence, mathematical operations in R are (generally) read from left to right (that is, their associativity is from left to right) (Table 2.4). This corresponds to the conventional order of operations in mathematics. For instance:

2 + 2^(2 + 1)

[1] 10

Precedence	Operator	Description	Associativity
1	`^`	exponent	right to left
2	`%%`	modulo	left to right
3	`*` `/`	multiplication, division	left to right
4	`+` `-`	addition, subtraction	left to right
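
The rules in Table 2.4 can be confirmed with a few one-line checks. The particular numbers below are arbitrary.

```r
2^3^2        # right to left: 2^(3^2) = 2^9 = 512
(2^3)^2      # parentheses force left to right: 8^2 = 64
10 - 4 - 3   # left to right: (10 - 4) - 3 = 3
2 * 5 %% 3   # %% outranks *: 2 * (5 %% 3) = 2 * 2 = 4
```

When in doubt, parentheses make the intended order explicit at no computational cost.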

Example 2.14 $\text{}$
Here are some other simple mathematical examples. To solve $1/\sqrt{22!}$, I could type:

1/sqrt(factorial(22))

[1] 2.9827e-11

And to solve $\Gamma \left( \sqrt[3]{23\pi} \right)$, I could type:

gamma((23 * pi)^(1/3))

[1] 7.411

By default the function log() computes natural logarithms, i.e.,

log(exp(1))

[1] 1

The log() function can also compute logarithms to a particular base by specifying the base in an optional second argument called base. Without this argument, natural logarithms are returned:

log(3) + log(5)

[1] 2.7081

Thus, to solve the operation: $\log_{10}3 + \log_{3}5$, one could type:

log(x = 3, base = 10) + log(x = 5, base = 3)

[1] 1.9421

$\blacksquare$

2.9.3 Constants

R allows easy access to most conventional constants (Table 2.5).

Operation	Operator/Function	To find:	We type:
$-\infty$	`-Inf`	$-\infty$	`-Inf`
$\infty$	`Inf`	$\infty$	`Inf`
$\pi = 3.141593 \dots$	`pi`	$\pi$	`pi`
$e = 2.718282 \dots$	`exp(1)`	$e$	`exp(1)`
$e^x$	`exp(x)`	$e^3$	`exp(3)`

2.9.4 Trigonometry

R assumes that the inputs for trigonometric functions are in radians. Of course degrees can be obtained from radians using $Degrees = Radians \times 180/\pi$, or conversely $Radians = Degrees \times \pi /180$ (Table 2.6). Note that there are no base-R functions for cotangent, secant or cosecant. However, for some angle $x$, measured in radians, these are readily obtained as: $\cot(x) = \cos(x)/\sin(x)$, $\sec(x) = 1/\cos(x)$, and $\csc(x) = 1/\sin(x)$.

Operation	Operator/Function	To find:	We type:
$\text{cos}(x)$	`cos(x)`	$\text{cos}(3 \text{ rad.})$	`cos(3)`
$\text{sin}(x)$	`sin(x)`	$\text{sin}(45^{\circ})$	`sin(45 * pi/180)`
$\text{tan}(x)$	`tan(x)`	$\text{tan}(3 \text{ rad.})$	`tan(3)`
$\text{acos}(x)$	`acos(x)`	$\text{acos}(45^{\circ})$	`acos(45 * pi/180)`
$\text{asin}(x)$	`asin(x)`	$\text{asin}(3 \text{ rad.})$	`asin(3)`
$\text{atan}(x)$	`atan(x)`	$\text{atan}(45^{\circ})$	`atan(45 * pi/180)`
$\text{cosh}(x)$	`cosh(x)`	$\text{cosh}(3 \text{ rad.})$	`cosh(3)`
$\text{sinh}(x)$	`sinh(x)`	$\text{sinh}(45^{\circ})$	`sinh(45 * pi/180)`
$\text{tanh}(x)$	`tanh(x)`	$\text{tanh}(3 \text{ rad.})$	`tanh(3)`
$\text{cot}(x)$		$\text{cot}(3 \text{ rad.})$	`cos(3)/sin(3)`
$\text{sec}(x)$		$\text{sec}(3 \text{ rad.})$	`1/cos(3)`
$\text{csc}(x)$		$\text{csc}(3 \text{ rad.})$	`1/sin(3)`
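
The conversions and reciprocal identities above can be wrapped in small helper functions. The names deg2rad, rad2deg, cot, sec, and csc below are hypothetical conveniences, not base R functions.

```r
# Hypothetical helper functions (not part of base R)
deg2rad <- function(deg) deg * pi / 180
rad2deg <- function(rad) rad * 180 / pi
cot <- function(x) cos(x) / sin(x)
sec <- function(x) 1 / cos(x)
csc <- function(x) 1 / sin(x)

sin(deg2rad(30))      # sine of 30 degrees, approximately 0.5
rad2deg(asin(0.5))    # inverse sine returns radians; approximately 30 degrees
cot(deg2rad(45))      # approximately 1
```

Note that asin() and acos() take a ratio in the interval $[-1, 1]$ as input and return an angle in radians. Inputs outside this interval return NaN with a warning.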

2.9.5 Derivatives

The function D() finds symbolic and numerical derivatives of simple expressions. It requires two arguments, 1) a mathematical function specified as an object of class expression, and 2) the variable name in the differential (the denominator in the difference quotient).

Objects of class expression can be created using the function expression(), and evaluated with the function eval().

Example 2.15 $\text{}$
Here is an example of how the functions expression() and eval() can be used:

eval(expression(2 + 2))

[1] 4

Of course we wouldn’t bother to use expression() and eval() in such simple applications.

$\blacksquare$

Table 2.7 contains specific examples using D().

To find:	We type:
$\frac{d}{dx}5x$	`D(expression(5 * x), "x")`
$\frac{d^2}{dx^2} 5x^2$	`D(D(expression(5 * x^2), "x"), "x")`
$\frac{\partial}{\partial x} 5xy + y$	`D(expression(5 * x * y + y), "x")`

Example 2.16 $\text{}$
Thus, to solve: \[\frac{d}{dx} 20x^{-4}\] I could use:

e <- expression(20 * x^(-4))
D(e, "x")

20 * (x^((-4) - 1) * (-4))

Unfortunately, it is left to us to simplify the ugly output. That is, \[\begin{aligned} \frac{d}{dx}(20x^{-4}) &= 20 \times (x^{(-4) - 1} \times (-4))\\ &= -80x^{-5} \\ &= -\frac{80}{x^5} \end{aligned}\]

$\blacksquare$
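
Because D() returns an unevaluated R object, its output can be passed to eval() to obtain slopes at particular values of x. The sketch below assumes the expression from Example 2.16; the object name de is arbitrary.

```r
e <- expression(20 * x^(-4))
de <- D(e, "x")            # unevaluated derivative (a call object)

eval(de, list(x = 2))      # slope at x = 2: -80/2^5 = -2.5

# Slopes at several x values at once
eval(de, list(x = c(-1, 2, 3)))
```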

Several other R functions provide tidier derivative results compared to D(), although they require the installation and loading of additional packages, not included in a conventional download of R. See Section 3.7 for a thorough introduction to R packages. For instance, the function Deriv(), from the package Deriv can be applied using two approaches³¹.

Under the first approach, a differentiable function is defined as an R function (see Ch 8) whose one argument is the variable name in the differential. This function is then used as the single required argument in Deriv().
With the second approach, a differentiable function is defined as a character string. This is then used as the first argument in Deriv(). The variable name in the differential is defined in a second argument.

Example 2.17 $\text{}$
To obtain the derivative in Example 2.16 using Deriv() we would first install the Deriv package (for instance using: install.packages("Deriv")) and load the package using:

library(Deriv) # loads Deriv

Under the first approach we could then type:

d <- Deriv(function(x) 20 * x^(-4))
d

function (x) 
-(80/x^5)

Note that the output, d, is a function, allowing one to obtain instantaneous slopes for specified x values.

d(c(-1, 2, 3, 5.2))

[1] 80.000000 -2.500000 -0.329218 -0.021041

Under the second approach, we could specify

Deriv("20 * x^(-4)", "x")

[1] "-(80/x^5)"

Note that the output is a character string.

Both approaches allow one to obtain higher order derivatives and partial derivatives. For instance,

Deriv(d) # second derivative

function (x) 
400/x^6

Deriv(Deriv("20 * x^(-4)", "x")) # second derivative

[1] "400/x^6"

D() results can also be simplified directly with function Simplify() from the package Deriv. For the current Example, one could use:

e <- expression(20 * x^(-4))
Simplify(D(e, "x"))

-(80/x^5)

$\blacksquare$

2.9.6 Integration

The function integrate() numerically evaluates definite integrals. It requires three arguments. The first is an R function defining the integrand. The second and third are the lower and upper bounds of integration.

Example 2.18 $\text{}$
To solve: \[\int^4_2 3x^2dx\] we could type:

f <- function(x){3 * x^2}
integrate(f, 2, 4)

56 with absolute error < 6.2e-13

$\blacksquare$
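
The object returned by integrate() is a list, so the estimate can be extracted and compared against an analytic answer. The following sketch reuses the integrand from Example 2.18; the object name res is arbitrary.

```r
f <- function(x) 3 * x^2
res <- integrate(f, lower = 2, upper = 4)

res$value        # numerical estimate of the integral
res$abs.error    # estimated absolute error
4^3 - 2^3        # analytic answer from the antiderivative x^3

# Infinite bounds are allowed: the standard normal density integrates to 1
integrate(dnorm, lower = -Inf, upper = Inf)$value
```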

R functions are explicitly addressed in Ch 8.

2.9.7 Statistics

R, of course, contains a huge number of statistical functions. These will generally require sample data for summarization. Data can be brought into R from spreadsheet files or other data storage files (we will learn how to do this shortly). As we have learned, data can also be assembled in R. For instance,

x <- c(1, 2, 3)

Statistical estimators can be separated into point estimators, which estimate an underlying parameter that has a single true value (from a Frequentist viewpoint), and intervallic estimators, which estimate the bounds of an interval that is expected, preceding sampling, to contain a parameter at some probability (Aho 2014). Point estimators can be further classified as estimators of location, scale, shape, and order statistics (Table 2.8). Measures of location estimate the typical or central value from a sample. Examples include the arithmetic mean and the sample median. Measures of scale quantify data variability or dispersion. Examples include the sample standard deviation and the sample interquartile range (IQR). Shape estimators describe the shape (i.e., symmetry and peakedness) of a data distribution. Examples include the sample skewness and sample kurtosis. Finally, the $k$th order statistic of a sample is equal to its $k$th-smallest value. Examples include the data minimum, the data maximum, and other quantiles (including the median). Intervallic estimators include confidence intervals (Table 2.9). A huge number of other statistical estimating, modelling, and hypothesis testing algorithms are also available for the R environment. For guidance, see Venables and Ripley (2002), Aho (2014), and Fox and Weisberg (2019), among others.

Acronym	Function	Description	Estimator type
$\bar{x}$	`mean(x)`	arithmetic mean of $x$	location
	`mean(x, trim = t)`	trimmed mean of $x$ for $0 \leq t \leq 0.5$.	location
$GM$	`asbio::G.mean(x)`	geometric mean of $x$	location
$HM$	`asbio::H.mean(x)`	harmonic mean of $x$	location
$\tilde{x}$	`median(x)`	median of $x$	location order statistic
$mode(x)$	`asbio::Mode(x)`	mode of $x$	location
$s$	`sd(x)`	standard deviation of $x$	scale
$s^2$	`var(x)`	variance of $x$	scale
$cov(x,y)$	`cov(x, y)`	covariance of $x$ and $y$	scale
$r_{x,y}$	`cor(x, y)`	Pearson correlation of $x$ and $y$	scale
$IQR$	`IQR(x)`	interquartile range of $x$	scale order statistic
$MAD$	`mad(x)`	median absolute deviation of $x$	scale
$g_1$	`asbio::skew(x)`	skew of $x$	shape
$g_2$	`asbio::kurt(x)`	kurtosis of $x$	shape
$min(x)$	`min(x)`	min of $x$	order statistic
$max(x)$	`max(x)`	max of $x$	order statistic
$\hat{F}^{-1}(p)$	`quantile(x, probs = p)`	quantile of $x$ at lower-tailed probability $p$	order statistic

Function	Description
`asbio::ci.mu.z(x, conf, sigma)`	Conf. int. for $\mu$ at level `conf`. True SD = `sigma`.
`asbio::ci.mu.t(x, conf)`	Conf. int. for $\mu$ at level `conf`. $\sigma$ unknown.
`asbio::ci.median(x, conf)`	Conf. int. for true median at level `conf`.
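
Several of the estimators in Tables 2.8 and 2.9 are illustrated below using base R only, so that the asbio package is not required. The sample x is hypothetical, and t.test() is used here as a base R route to a t-based confidence interval for $\mu$.

```r
x <- c(2, 4, 4, 5, 7, 9, 30)   # hypothetical sample with one large value

mean(x)                 # arithmetic mean, pulled upward by 30
mean(x, trim = 0.2)     # drops one value from each end: mean of 4, 4, 5, 7, 9
median(x)               # middle order statistic
sd(x); IQR(x); mad(x)   # three measures of scale
quantile(x, probs = c(0.25, 0.75))

# A t-based 95% confidence interval for the true mean, from base R
t.test(x, conf.level = 0.95)$conf.int
```

Comparing mean(x) with the trimmed mean and median shows how strongly a single extreme observation can affect a location estimate.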

2.10 RStudio

RStudio is an open source IDE for R (Fig 2.4). RStudio greatly facilitates writing R code, saving and examining R objects and history, and many other processes. These include, but are not limited to, documenting session workflows (Section 2.10.2), writing R package documentation (Section 10.5), calling and receiving code from other languages (Section 9.1.5), and even developing web-based graphical user interfaces (Section 11.6). RStudio can currently be downloaded at (https://posit.co/products/open-source/rstudio/). Like R itself, RStudio can be used with Windows, Mac, and Unix/Linux operating systems. Unlike R, RStudio has both freeware and commercial versions³². We will use the former here.

Figure 2.4: The RStudio logo.

RStudio is generally implemented using a four pane workspace (Fig 2.5). These panes will contain: 1) the code editor, 2) the R-console, 3) the environment and histories panel, and 4) the plots and other miscellany panel. Tabs in panels may vary to a small degree depending on the underlying character of the source code being edited, and whether an RStudio project is open (Section 2.10.1).

Figure 2.5: Interfaces for RStudio 2023.06.2 Build 561.

The RStudio Code Editor panel (Fig 2.5, Panel 1) allows one to create R scripts and even scripts for other languages that can be called to and from R (Ch 9). The code panel can also be used to create and edit session documentation files (see Section 2.10.2 below) and other important R file types. A new R script can be created for editing within the code editor by going to File$>$New$>$R Script. Commands from an R script can be sent to the R console using the shortcut Ctrl + Enter (Windows and Linux) or Cmd + Enter (Mac).
The R-console panel (Fig 2.5, Panel 2) by default, is identical in functionality to the R console of the most recent version of R on your workstation (assuming that all of the paths and environments are set up correctly on your computer). Thus, the console panel can be used directly for typing and executing R code, or for receiving commands from the code editor (Panel 1).
The Environments and History panel (Fig 2.5, Panel 3) can be used to: 1) show a list of R objects available in your R session (the Environment tab), or 2) show, search, and select from the history of all previous commands (History tab). This panel also provides an interface for point and click import of data files including .csv, .xls, and many other file formats (Import Dataset pulldown within the Environment tab).
The Plots and Miscellany panel (Fig 2.5, Panel 4) can be used to show: 1) files in the working directory, 2) a scrollable history of plots and image files, and 3) a list of available packages (via the Packages tab), with facilities for updating and installing packages. Packages appearing in the GUI list are installed, and a checked box next to a package name indicates that the package is currently loaded. Packages and their installation, updating, and loading are formally introduced in Section 3.7. The panel’s Files pulldown tab allows straightforward establishment of working directories (although this can still be done at the command line using setwd()) (Fig 2.7). The panel’s Help tab opens automatically when one uses ? or help() for particular R topics (Section 2.4).

CAUTION!

Be very careful when managing files in the Plots and Miscellany panel, as you can permanently delete files without (currently) the possibility of recovery from a Recycling Bin.

2.10.1 RStudio Project

An RStudio project can be created via the File pulldown menu (Fig 2.7). A project allows all related files (data, figures, summaries, etc.) to be easily organized together by setting the working directory to be the location of the project .Rproj file.

2.10.2 Workflow Documentation

We can document workflow and simultaneously run/test R session code by either:

Creating an R Markdown³³ .rmd file that can be compiled to generate an .html, .pdf, or MS Word$^{\circledR}$ .docx file, or
Using Sweave, an approach that implements the LaTeX³⁴ document preparation system.

2.10.2.1 R Markdown

The R Markdown document processing workflow in RStudio is shown in Fig 2.6. These steps are highly modifiable, but can also be run in a more or less automated manner, requiring little understanding of underlying processes.

Figure 2.6: The process of document creation in R Markdown. Functions in the package rmarkdown control conversion of .rmd files to Markdown .md files, using utilities in the package knitr. Pandoc first creates a .tex file when rendering LaTeX PDF documents.

Use of R Markdown and .rmd files requires the package rmarkdown (Allaire et al. 2024), which comes pre-installed in RStudio.

As an initial step, all underlying .rmd files must include a brief YAML³⁵ header (see below) containing document metadata. A nice summary of YAML features and options in R Markdown is provided in this cheatsheet. The remainder of the .rmd document will contain text written in Markdown syntax, and code chunks. The knit() function from the package knitr (Xie 2015), also installed with RStudio, executes all evaluable code within chunks, and formats the code and output for processing within Pandoc, a program for converting markup files from one language to another³⁶. Pandoc uses the YAML header to guide this conversion. As an example, if one has requested HTML output, the simple Markdown text: This is a script will be converted to the HTML formatted: <p>This is a script</p>. One can also write HTML or CSS code³⁷ directly into an .rmd document (see Section 11.6). If the desired output is PDF, Pandoc will convert the .md file into a temporary .tex file, which is then processed by the LaTeX typesetting system. Support for LaTeX can be found at its official website, and at a large number of informal user-driven venues, including Stack Exchange and Overleaf, an online LaTeX application. LaTeX will compile the .tex file into a .pdf file. In this process, the tinytex package (Xie 2024), which installs the stripped-down LaTeX distribution TinyTeX, can be used.

Creating an R Markdown document is simple in RStudio. We first open an empty .rmd document by navigating to File $>$ New File $>$R Markdown (Fig 2.7).

Figure 2.7: Part of the RStudio File pulldown menu.

You will be delivered to the GUI shown in Fig 2.8. Note that by default Markdown compilation generates an HTML document.

Figure 2.8: RStudio GUI for creating an R Markdown document.

The GUI opens an R Markdown (.rmd) skeleton document with a tentative YAML header.

Figure 2.9: YAML header to an R Markdown (.rmd) skeleton document.

Among other options³⁸, the default HTML output can be changed to one of:

output: pdf_document

to create a LaTeX $\rightarrow$ PDF document, or

output: word_document

to create a Word$^{\circledR}$ document.

A potential concern with HTML documents is portability. Your R Markdown generated HTML may look fine when viewed from a browser program on the computer you used to create the document. This may not be true, however, if you export this file elsewhere, in the absence of a server host, and without a directory system containing necessary files and applications (see Garsiel (2018))³⁹. There are currently a number of inexpensive (or free) non-dynamic hosting services including GitHub.

2.10.2.1.1 Writing Text

Markdown is a relatively simple procedural markup language, allowing unformatted text to be written directly into an R Markdown document. There are particular scripting procedures, however, for creating headings, formatted text, and other content.

Pound signs (e.g., #, ##, ###) can be used as (increasingly nested) hierarchical section delimiters.
Italic, bold, and monospace code fonts can be specified by enclosing text in asterisks, double asterisks, and back ticks, respectively. That is, *italic*, **bold**, and `code` result in: italic, bold, and code.
Unordered lists can be created with newlines preceded with asterisks, *, and ordered lists can be specified with newlines beginning with numbers, e.g., 1., 2., etc.
Superscripts and subscripts can be generated using: ^script^ and ~script~, respectively. That is, `*r*^2^` and `CO~2~` produce: r² and CO₂.
Footnotes can be created using the format: `^[footnote]`.
Web hyperlinks can be created using: `[text](link)`. For instance, `[Amalgam of R](https://www.amalgamofr.org)` creates: Amalgam of R.

By default, RStudio shows R Markdown documents as raw source code. This format, however, can be changed to a more presentational markup (what you see is what you get) format by clicking on the Visual button that appears at the upper left hand side of RStudio Panel 1 (when an R Markdown document is open). The Visual interactive panel contains several interactive menus reminiscent of a word processor (Fig 2.10). These allow users to specify fonts, and to insert LaTeX equations (Section 2.10.2.1.3), section hierarchies, bulleted and numbered lists, and tables.

Figure 2.10: Additional RStudio menu options for an R Markdown document under the Visual viewing mode.

2.10.2.1.2 R Code in R Markdown Chunks

The knitr R package facilitates report-building in both HTML and LaTeX $\rightarrow$ PDF formats, within the framework of the rmarkdown package (Fig 2.6). Under knitr, R Markdown lines beginning ```{r } and ending ``` delimit an R code “chunk” to be potentially run in R.

Example 2.19 $\text{}$
For example, the chunk:

 ```{r }
 mean(c(1,2,3))
 ```

would prompt knitr to: 1) show the code in an appropriate highlighted style, 2) run the code in R (i.e., take the mean of the three numbers), and 3) print the evaluation result into a new output chunk.

$\blacksquare$

The chunk header, ```{r }, can be used to define additional options. These include the suppression of code evaluation, ```{r , eval = F}, suppression of code printing, ```{r , echo = F}, and/or elimination of the chunk and its output from the rendered document after running, ```{r , include = F}. For a complete list of chunk options, run

str(knitr::opts_chunk$get())

If desired, global knitr options for chunks can be set using an initial R chunk or script (generally with the local chunk option include = F) that defines the components of knitr::opts_chunk.

Example 2.20 $\text{}$
For example, to suppress the default insertion of pound signs in lines preceding chunk evaluation output, throughout the entire knitted document, one could include the following initial chunk:

 ```{r, include = F }
 knitr::opts_chunk$set(comment = NA)
 ```

$\blacksquare$
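
Global chunk options can also be inspected and changed from the console, which is a convenient way to experiment before placing a call in a setup chunk. The sketch below assumes that knitr is installed, and restores the previous settings when finished; the object name old is arbitrary.

```r
if (requireNamespace("knitr", quietly = TRUE)) {
  # Store the current values so they can be restored afterwards
  old <- knitr::opts_chunk$get(c("echo", "comment"))

  # Hide code and drop the leading '##' from output in all later chunks
  knitr::opts_chunk$set(echo = FALSE, comment = NA)
  str(knitr::opts_chunk$get(c("echo", "comment")))

  # Put the previous settings back
  knitr::opts_chunk$set(old)
}
```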

Code chunks can be generated by going to Code$>$Insert Chunk or by using the RStudio shortcut Ctrl + Alt + I (Windows and Linux) or Cmd + Alt + I (Mac).

R code can also be invoked inline in an R Markdown document using the format:

`r some code`

For instance, I could seamlessly place three random numbers generated from the continuous uniform distribution, $f(x) = UNIF(0,1)$, inline into text using:

`r runif(3)`

Here I run an iteration using “hidden” inline R code: 0.30689, 0.48092, 0.15112.
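
To see a YAML header, a chunk, and inline code work together outside of the RStudio menus, one could assemble a small .rmd file and render it from the console. This is only a sketch: the document title and contents are hypothetical, and rendering is attempted only if the rmarkdown package and Pandoc are both available.

```r
# Assemble a minimal .Rmd file: YAML header, text, one chunk, one inline call
fence <- strrep("`", 3)   # three back ticks, built here to keep this listing readable
rmd <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"A minimal example\"",
  "output: html_document",
  "---",
  "",
  "The mean of 1, 2, and 3 is `r mean(c(1, 2, 3))`.",
  "",
  paste0(fence, "{r, echo = FALSE}"),
  "plot(1:10)",
  fence
), rmd)

# Knit and convert the file, but only if rmarkdown and Pandoc are available
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  out <- rmarkdown::render(rmd, quiet = TRUE)
  out   # path to the rendered .html file
}
```

The Knit button in RStudio performs essentially the same steps for the document that is open in the code editor.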

2.10.2.1.3 Equations

Inline equations for both R Markdown and Sweave (discussed below) can be specified under the LaTeX system, which uses dollar signs, $, to delimit equations. For instance, to obtain the inline equation: $P(\theta|y) = \frac{P(y|\theta)P(\theta)}{P(y)}$, i.e., Bayes' theorem, I could type the following LaTeX script into R Markdown:

$P(\theta|y) = \frac{P(y|\theta)P(\theta)}{P(y)}$

Display-style equations can be specified with two dollar signs, $$. For instance, $$P(\theta|y) = \frac{P(y|\theta)P(\theta)}{P(y)}$$ results in:

\[P(\theta|y) = \frac{P(y|\theta)P(\theta)}{P(y)}\]

A cheatsheet for LaTeX equation writing can be found here.

2.10.2.1.4 Figures

Probably the simplest way to place external figures into a document is by applying the function knitr::include_graphics() from within a chunk. The following R Markdown code would insert Fig1.jpg (contained in the working directory) into an R Markdown document.

 ```{r }
 knitr::include_graphics("Fig1.jpg")
 ```

Figures can also be generated from the execution of R plotting functions (see Ch 6, 7). For instance, the following R Markdown code would place a simple R-generated scatterplot into the document:

 ```{r }
 plot(1:10)
 ```

2.10.2.1.5 Tables

R Markdown tables can be created by specifying the following format (outside of a chunk).

First Header  | Second Header
------------- | -------------
Content Cell  | Content Cell
Content Cell  | Content Cell

Tables, however, can also be generated by executing R functions within chunks. I generally use the function knitr::kable() to create R Markdown $\rightarrow$ Pandoc $\rightarrow$ HTML tables because it is relatively simple to use, and allows straightforward tabling of R output.

Example 2.21 $\text{}$
Table 2.10 shows data from the Loblolly dataset in the package datasets. The data track the growth of loblolly pine trees (Pinus taeda) with respect to seed type and age. The function head(), nested in kable(), allows one to access the first components of an R data storage object. By default, head() returns the first six values (in this case, the first six dataframe rows).

knitr::kable(head(Loblolly))

Table 2.10: Loblolly pine data.
	height	age	Seed
1	4.51	3	301
15	10.89	5	301
29	28.72	10	301
43	41.74	15	301
57	52.70	20	301
71	60.92	25	301

$\blacksquare$

I often use functions in the package xtable to build R Markdown $\rightarrow$ Pandoc $\rightarrow$ LaTeX $\rightarrow$ PDF tables. Under this approach, one could create Table 2.10 using:

print(xtable::xtable(head(Loblolly)))

This method would also require that one use the command results = 'asis' in the chunk options.

One can even call for different table approaches on the fly. For instance, I could use the command eval = knitr::is_html_output(), in the options of a Markdown chunk when using table code that optimizes HTML formatting, and use eval = knitr::is_latex_output() to create a table that optimizes LaTeX formatting.
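
The same two test functions can be used inside a single chunk to choose a table format. The following is a minimal sketch, assuming knitr is installed; the object names fmt and tab are arbitrary, and the "pipe" fallback is simply a plain-text format chosen here for use at the console.

```r
if (requireNamespace("knitr", quietly = TRUE)) {
  # Pick a table format that matches the document being knit;
  # outside of knitting neither test is expected to be TRUE,
  # so a plain-text pipe table is used
  fmt <- if (isTRUE(knitr::is_html_output())) {
    "html"
  } else if (isTRUE(knitr::is_latex_output())) {
    "latex"
  } else {
    "pipe"
  }
  tab <- knitr::kable(head(Loblolly), format = fmt,
                      caption = "Loblolly pine data.")
  tab
}
```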

Aside from knitr::kable() and xtable, there are many other R functions and packages that can be used to create R Markdown tables, particularly for HTML output. These include:

The kableExtra (Zhu et al. 2022) package extends knitr::kable() by including styles for fonts, features for specific rows, columns, and cells, and straightforward merging and grouping of rows and/or columns. Most kableExtra features extend to both HTML and PDF formats.
DT (Xie, Cheng, and Tan 2024), a wrapper for HTML tables that uses the JavaScript (see Section 11.4) library DataTables. Among other features, DT allows straightforward implementation in interactive Shiny apps (Section 11.6).
Like DT, the reactable package (Lin 2023) creates flexible, interactive HTML embedded tables. As with DT, reactable tables are interactive widgets, which complicates their treatment as conventional R Markdown tables with captions and referable labels.

Xie, Dervieux, and Riederer (2020) discuss several other alternatives.

Below I use the function reactable() from the reactable package to create a table with sortable columns and scrollable rows (Table 2.11).

# install.packages("reactable")
library(reactable)
reactable(Loblolly, pagination = FALSE, highlight = TRUE, height = 250)

Example 2.22 $\text{}$
An R Markdown (.rmd) skeleton file generated by RStudio (Figs 2.7-2.9) contains documentation text, interspersed with example R code in chunks. These have been modified below to create a simple R Markdown document for summarizing the Loblolly dataset (Fig 2.11).

Figure 2.11: An R Markdown (.rmd) file with documentation text and interspersed R code in chunks.

Note the use of echo = FALSE in the final chunk to suppress printing of R code. A snapshot of the knitted HTML is shown in Fig 2.12.

Figure 2.12: An HTML document knit from Markdown code in the previous figure. Note that code is displayed (by default) as well as executed.

$\blacksquare$

2.10.2.1.6 bookdown

A large number of useful auxiliary features are available for R Markdown, through the R package bookdown (Xie 2023). These include an extended capacity for figure, table, and section numbering and referencing. The bookdown package is not included with RStudio, and will require installation using the code below. See Section 9.5.2 for more information on loading and installing packages.

install.packages("bookdown") # install bookdown package

To use bookdown we must modify the output: designation in the YAML header to have a bookdown-specific output. For instance,

output: bookdown::html_document2

to create an HTML document, or

output: bookdown::pdf_document2

to create a LaTeX $\rightarrow$ PDF document, or

output: bookdown::word_document2

to create an MS Word$^{\circledR}$ document⁴⁰.

Numbering R-generated plots and tables in bookdown requires specification of a chunk label after the language reference, e.g., r, in the chunk generating the plot or table. Importantly, many table generating R functions (e.g., knitr::kable() and xtable::xtable(), see Section 2.10.2.1.5) also contain a label argument that allows referencing and numbering.

Example 2.23 $\text{}$
In the chunk header below I use the label lobplot. Note that a space is included after r. Captions can be specified in the chunk header using the chunk option fig.cap or tab.cap for figures and tables, respectively. The option fig.cap is used below:

```{r lobplot, echo=FALSE, fig.cap= "Loblolly pine height versus age."}

$\blacksquare$

Cross-references within the text can be made using the syntax \@ref(type:label), where label is the chunk label and type is the environment being referenced (e.g., fig, tab, or eq). For Example 2.23, we might want to type something like: “see Figure \@ref(fig:lobplot).” in some non-chunk component of the Markdown document.

Specification of a bookdown output format, will result in automated numbering of sections⁴¹. To turn this numbering off, one could modify the YAML output to be:

output:
  bookdown::html_document2:
    number_sections: false

The code indents shown above are important because YAML, like the language Python, uses significant indentation. To omit numbering for certain sections, one would retain the default bookdown output, and add {-} after the unnumbered section heading, e.g.,

# This section is unnumbered {-}

2.10.2.1.7 Additional Resources for R Markdown and Bookdown

The Posit website houses a number of useful R Markdown guides, including this brief introduction. Thorough descriptions of R Markdown are provided in Xie, Allaire, and Grolemund (2018) and Xie, Dervieux, and Riederer (2020). The latter text is currently available as an online resource. Thorough guidance for bookdown is provided in Xie (2016), which can be viewed as an open-source online document.

2.10.2.2 Sweave

Under the Sweave documentation approach, high quality PDF documents are generated from LaTeX .tex files, which in turn are created from Sweave .rnw files. A skeleton .rnw document can be generated in RStudio by going to File$>$New File$>$R Sweave⁴².

2.10.2.2.1 R code in Sweave chunks

Sweave chunks can be implemented using knitr-style formatting, or with formatting under the function Sweave() (Leisch 2002). Switching between these formats in RStudio requires altering options in Build$>$Configure Build Tools$>$Sweave.

In RStudio, Sweave code chunks are initiated with <<>>=, which serves as a chunk header, and are closed with @.

Example 2.24 $\text{}$
Including the chunk below in an .rnw file would: 1) cause the R source code to be printed in a LaTeX-rendered PDF, 2) run the code in R (the mean of the three numbers would be calculated), and 3) print the evaluated result in the output PDF.

 <<>>=
 mean(c(1,2,3))
 @

$\blacksquare$

Chunk options in Sweave() are often similar to those in knitr, but are more limited (see vignette("Sweave")).

Example 2.25 $\text{}$
In Fig 2.13 I create an .rnw file, based on an RStudio skeleton, with text and analyses reflecting those used with R Markdown in Example 2.22. We note that instead of the Markdown YAML header, we now have lines in the preamble defining the type of desired document (e.g., article) and the LaTeX packages needed for document compilation (e.g., amsmath). All non-chunk text, including figure and table captions and cross-referencing must follow LaTeX guidelines.

Figure 2.13: A Sweave (.rnw) file with documentation text and interspersed code in chunks.

Fig 2.14 shows a snapshot of the result, following automated .rnw $\rightarrow$ knitr $\rightarrow$ LaTeX $\rightarrow$ .pdf compilation in RStudio.

Figure 2.14: A .pdf document resulting from compilation of Sweave code in the previous figure.

$\blacksquare$

2.10.2.3 Purl

R chunk code can be extracted from an .rmd or an .rnw file using the function knitr::purl(). For instance, assume that the R Markdown loblolly pine summary shown in Fig 2.11 is saved in the working directory under the name lob.rmd. Code from the file will be extracted to a script file called lob.R, located in the working directory, if one types:

knitr::purl("lob.rmd") # the knitr:: prefix avoids the need to load knitr first
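
Because lob.rmd may not exist on a reader's machine, the sketch below demonstrates the same extraction on a small, hypothetical .rmd file written to a temporary location. It assumes that knitr is installed; the object names rmd and out are arbitrary.

```r
if (requireNamespace("knitr", quietly = TRUE)) {
  # Build a tiny R Markdown file containing one line of text and one chunk
  fence <- strrep("`", 3)   # three back ticks
  rmd <- tempfile(fileext = ".Rmd")
  writeLines(c("Some text.",
               "",
               paste0(fence, "{r}"),
               "mean(c(1, 2, 3))",
               fence), rmd)

  # Extract the chunk code to a script file and display the result
  out <- knitr::purl(rmd, output = tempfile(fileext = ".R"), quiet = TRUE)
  readLines(out)
}
```

The extracted script retains the chunk code but not the surrounding Markdown text.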

Exercises

1. Create an R Markdown document to contain your homework assignment. Modify the YAML header to allow numbering of figures and tables, but not sections. This will require use of the bookdown package (see Section 2.10.2.1.6). Install bookdown at the R console (not within a document chunk). To test the formatting, perform the following steps:
   1. Create a section header called Question 1 and a subsection header called (a). Under (a) type "completed".
   2. Under the subsection header (b), insert a chunk, and create a simple plot of points at the coordinates: $\{1,1\}$, $\{2,2\}$, $\{3,3\}$, by typing the code: plot(1:3) in the chunk. Create a label for the chunk, and create a caption for the plot using the knitr chunk option, fig.cap.
   3. Under the subsection header (c), create a cross reference for the plot from (b) (see Section 2.10.2.1.6).
   4. Under the subsection header (d), write the equation, $y_i = \hat{\beta}_0 + \hat{\beta}_1x_i + \hat{\varepsilon}_i$, using LaTeX. As noted earlier, a LaTeX equation cheatsheet can be found here.
   5. Render (knit) the final document as either an .html file or a .doc file. Include other assigned exercises for this chapter as directed, using the general formatting approach given in Question 1.
2. Perform the following operations.
   1. Leave a note to yourself.
   2. Create and examine an object called x that contains the numeric entries 1, 2, and 3.
   3. Make a copy of x called y.
   4. Show the class of y.
   5. Show the base type of y.
   6. Show the attributes of y.
   7. List the current objects in your work session.
   8. Identify your working directory.
3. Distinguish R expressions and assignments.
4. Sometimes R reports unexpected results for its classes and base types.
   1. Create x <- factor(c("a","a","b")) and show the class of x.
   2. Type ?factor. What is a factor in R?
   3. Show the base type of x. Is this surprising? Why? Type ?integer. What is an integer in R?
5. Solve the following mathematical operations using R.
   1. $1 + 3/10 + 2$
   2. $(1 + 3)/10 + 2$
   3. $\left(4 \cdot \frac{(3 - 4)}{23}\right)^2$
   4. $\log_2(3^{1/2})$
   5. $3\boldsymbol{x}^3 + 3\boldsymbol{x}^2 + 2$ where $\boldsymbol{x} = \{0, 1.5, 4, 6, 8, 10\}$
   6. $4(\boldsymbol{x} + \boldsymbol{y})$ where $\boldsymbol{x} = \{0, 1.5, 4, 6, 8\}$ and $\boldsymbol{y} = \{-2, 0.5, 3, 5, 8\}$.
   7. $\frac{d}{dx} \tan(x) 2.3 \cdot e^{3x}$
   8. $\frac{d^2}{dx^2} \frac{3}{4x^4}$
   9. $\int_3^{12} 24x + \ln(x)dx$
   10. $\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}dx$ (i.e., find the area under a standard normal pdf).
   11. $\int_{-\infty}^{\infty}\frac{x}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}dx$ (i.e., find $E(X)$ for a standard normal pdf).
   12. $\int_{-\infty}^{\infty}\frac{x^2}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}dx$ (i.e., find $E(X^2)$ for a standard normal pdf).
   13. Find the sum, cumulative sum, product, cumulative product, arithmetic mean, median and variance of the data x = c(0, 1.5, 4, 6, 8, 10).
6. The velocity of the earth’s rotation on its axis at the equator, $E$, is approximately 1674.364 km/h, or 1040.401 mi/h⁴³. We can calculate the rotational velocity of the earth at any latitude with the equation $V = \cos(\text{latitude}^{\circ}) \times E$. Using R, simultaneously calculate rotational velocities for latitudes of 0, 30, 60, and 90 degrees north or south (they will be the same). Remember, the function cos() assumes inputs are in radians, not degrees.

Operation	Function/Operator	To find:	We type:
addition	`+`	\(2 + 2\)	`2 + 2`
subtraction	`-`	\(2 - 2\)	`2 - 2`
multiplication	`*`	\(2 \times 2\)	`2 * 2`
division	`/`	\(\frac{2}{3}\)	`2/3`
modulo	`%%`	remainder of \(\frac{5}{2}\)	`5%%2`
integer division	`%/%`	\(\frac{5}{2}\) without remainder	`5%/%2`
exponentiation	`^`	\(2^3\)	`2^3`
\(\mid x \mid\)	`abs(x)`	\(\mid -23.7 \mid\)	`abs(-23.7)`
round \(x\) to \(d\) digits	`round(x, digits = d)`	round \(-23.71\) to 1 digit	`round(-23.71, 1)`
round \(x\) up to closest whole num.	`ceiling(x)`	\(\lceil 2.3 \rceil\)	`ceiling(2.3)`
round \(x\) down to closest whole num.	`floor(x)`	\(\lfloor 2.3 \rfloor\)	`floor(2.3)`
\(\sqrt{x}\)	`sqrt(x)`	\(\sqrt{2}\)	`sqrt(2)`
\(\log_e{x}\)	`log(x)`	\(\log_e{5}\)	`log(5)`
\(\log_b{x}\)	`log(x, base = b)`	\(\log_{10}{5}\)	`log(5, base = 10)`
\(x!\)	`factorial(x)`	\(5!\)	`factorial(5)`
\(\binom{n}{x} = \frac{n!}{x!(n-x)!}\)	`choose(n,x)`	\(\binom{5}{2}\)	`choose(5,2)`
\(\Gamma(x)\)	`gamma(x)`	\(\Gamma(3.2)\)	`gamma(3.2)`
\(B(a,b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a + b)}\)	`beta(a,b)`	\(B(3,2)\)	`beta(3,2)`
\(\sum_{i=1}^{n}x_i\)	`sum(x)`	sum of `x`	`sum(x)`
cumulative sum	`cumsum(x)`	cum. sum of `x`	`cumsum(x)`
\(\prod_{i=1}^{n}x_i\)	`prod(x)`	product of `x`	`prod(x)`
cumulative product	`cumprod(x)`	cum. prod. of `x`	`cumprod(x)`
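
Several of the operators and functions tabulated above can be tried directly at the console. The lines below are a small sketch; the vector x and the object names are arbitrary choices made for illustration.

```r
x <- c(1, 2, 3, 4)    # illustrative data

rem  <- 5 %% 2        # modulo: remainder of 5/2, i.e., 1
quo  <- 5 %/% 2       # integer division: 5/2 without remainder, i.e., 2
comb <- choose(5, 2)  # number of ways to choose 2 items from 5, i.e., 10
fact <- factorial(5)  # 5! = 120
bet  <- beta(3, 2)    # gamma(3) * gamma(2) / gamma(5) = 1/12

total   <- sum(x)      # 10
running <- cumsum(x)   # 1 3 6 10
product <- prod(x)     # 24
runprod <- cumprod(x)  # 1 2 6 24

print(running)
print(runprod)
```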

Operation	Operator/Function	To find:	We type:
\(-\infty\)	`-Inf`	\(-\infty\)	`-Inf`
\(\infty\)	`Inf`	\(\infty\)	`Inf`
\(\pi = 3.141593 \dots\)	`pi`	\(\pi\)	`pi`
\(e = 2.718282 \dots\)	`exp(1)`	\(e\)	`exp(1)`
\(e^x\)	`exp(x)`	\(e^3\)	`exp(3)`

Operation	Operator/Function	To find:	We type:
\(\text{cos}(x)\)	`cos(x)`	\(\text{cos}(3 \text{ rad.})\)	`cos(3)`
\(\text{sin}(x)\)	`sin(x)`	\(\text{sin}(45^{\circ})\)	`sin(45 * pi/180)`
\(\text{tan}(x)\)	`tan(x)`	\(\text{tan}(3 \text{ rad.})\)	`tan(3)`
\(\text{acos}(x)\)	`acos(x)`	\(\text{acos}(45^{\circ})\)	`acos(45 * pi/180)`
\(\text{asin}(x)\)	`asin(x)`	\(\text{asin}(3 \text{ rad.})\)	`asin(3)`
\(\text{atan}(x)\)	`atan(x)`	\(\text{atan}(45^{\circ})\)	`atan(45 * pi/180)`
\(\text{cosh}(x)\)	`cosh(x)`	\(\text{cosh}(3 \text{ rad.})\)	`cosh(3)`
\(\text{sinh}(x)\)	`sinh(x)`	\(\text{sinh}(45^{\circ})\)	`sinh(45 * pi/180)`
\(\text{tanh}(x)\)	`tanh(x)`	\(\text{tanh}(3 \text{ rad.})\)	`tanh(3)`
\(\text{cot}(x)\)		\(\text{cot}(3 \text{ rad.})\)	`cos(3)/sin(3)`
\(\text{sec}(x)\)		\(\text{sec}(3 \text{ rad.})\)	`1/cos(3)`
\(\text{csc}(x)\)		\(\text{csc}(3 \text{ rad.})\)	`1/sin(3)`

To find:	We type:
\(\frac{d}{dx}5x\)	`D(expression(5 * x), "x")`
\(\frac{d^2}{dx^2} 5x^2\)	`D(D(expression(5 * x^2), "x"), "x")`
\(\frac{\partial}{\partial x} 5xy + y\)	`D(expression(5 * x * y + y), "x")`
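
The function D() returns an unevaluated expression rather than a number. As a sketch of how the derivatives tabulated above might be evaluated at particular values, eval() can be given a list of variable values; the object names and the values x = 1 and y = 3 are arbitrary illustrations.

```r
# Symbolic derivatives from the table
d1  <- D(expression(5 * x), "x")               # first derivative of 5x
d2  <- D(D(expression(5 * x^2), "x"), "x")     # second derivative of 5x^2
dxy <- D(expression(5 * x * y + y), "x")       # partial derivative w.r.t. x

# Evaluate each result; the first two do not depend on x or y
v1  <- eval(d1)                         # 5
v2  <- eval(d2)                         # 10
vxy <- eval(dxy, list(x = 1, y = 3))    # 5 * y = 15

print(c(v1, v2, vxy))
```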

Acronym	Function	Description	Estimator type
\(\bar{x}\)	`mean(x)`	arithmetic mean of \(x\)	location
	`mean(x, trim = t)`	trimmed mean of \(x\) for \(0 \leq t \leq 0.5\).	location
\(GM\)	`asbio::G.mean(x)`	geometric mean of \(x\)	location
\(HM\)	`asbio::H.mean(x)`	harmonic mean of \(x\)	location
\(\tilde{x}\)	`median(x)`	median of \(x\)	location order statistic
\(mode(x)\)	`asbio::Mode(x)`	mode of \(x\)	location
\(s\)	`sd(x)`	standard deviation of \(x\)	scale
\(s^2\)	`var(x)`	variance of \(x\)	scale
\(cov(x,y)\)	`cov(x, y)`	covariance of \(x\) and \(y\)	scale
\(r_{x,y}\)	`cor(x, y)`	Pearson correlation of \(x\) and \(y\)	scale
\(IQR\)	`IQR(x)`	interquartile range of \(x\)	scale order statistic
\(MAD\)	`mad(x)`	median absolute deviation of \(x\)	scale
\(g_1\)	`asbio::skew(x)`	skew of \(x\)	shape
\(g_2\)	`asbio::kurt(x)`	kurtosis of \(x\)	shape
\(min(x)\)	`min(x)`	min of \(x\)	order statistic
\(max(x)\)	`max(x)`	max of \(x\)	order statistic
\(\hat{F}^{-1}(p)\)	`quantile(x, probs = p)`	quantile of \(x\) at lower-tailed probability \(p\)	order statistic
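
To see how the location and scale estimators above respond to an outlier, the sketch below applies several of the base R functions to a small made-up vector containing one extreme value. The data and object names are illustrative assumptions; the asbio functions are omitted so that no extra packages are needed.

```r
x <- c(2, 4, 4, 5, 7, 20)   # made-up data; 20 is an outlier

m      <- mean(x)               # arithmetic mean: 7
m_trim <- mean(x, trim = 0.2)   # drops 1 value from each end: 5
med    <- median(x)             # 4.5
v      <- var(x)                # sample variance: 43.2
s      <- sd(x)                 # square root of the variance
iqr    <- IQR(x)                # 6.5 - 4 = 2.5 (default quantile type)
md     <- mad(x)                # 1.4826 * median absolute deviation
q50    <- quantile(x, probs = 0.5)   # same value as the median

print(c(mean = m, trimmed = m_trim, median = med))
print(c(sd = s, IQR = iqr, MAD = md))
```

The trimmed mean and median sit well below the arithmetic mean here, and the IQR and MAD are much smaller than the standard deviation, because they are less affected by the single large value.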

Function	Description
`asbio::ci.mu.z(x, conf, sigma)`	Conf. int. for \(\mu\) at level `conf`. True SD = `sigma`.
`asbio::ci.mu.t(x, conf)`	Conf. int. for \(\mu\) at level `conf`. \(\sigma\) unknown.
`asbio::ci.median(x, conf)`	Conf. int. for true median at level `conf`.
