Preface

This book is contracted to Chapman & Hall/CRC, and will be officially published in 2026. It is currently a draft. Comments are welcome at GitHub or by email. The book is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

What this book is about

This book explores the ever-expanding universe of R. Specifically, it considers:

  • The historical development of the R language, the R engine, and the installation of R (Ch 1)

  • The creation of R objects and their fundamental characteristics (Ch 2)

  • R data storage entities, and the import and export of user data files (Ch 3)

  • Data management approaches using base R (Ch 4) and the tidyverse (Ch 5)

  • R approaches to graphics, including base plotting methods (Ch 6) and the ggplot2 package (Ch 7)

  • R functions (Ch 8) including loops, and the creation of user-defined classes and generic methods

  • Interfacing other languages (e.g., C, Fortran, C++, SQL, Python) and software environments to and from R (Ch 9)

  • Building custom R packages (Ch 10)

  • Interactive R interfaces and web applications, including approaches from the packages tcltk, plotly, and shiny (Ch 11)

  • The fundamental ways that R interacts with your computer (Ch 12)

While this book covers a lot of ground, clearly many other topics could be considered. Subjects explored are those I have found to be particularly useful or interesting during my 20+ years of using R as a biologist and statistician. Chapters concerning advanced topics (i.e., Chs 8-12) are intended to be starting points for further exploration, and the reader is directed to additional resources when necessary.

The book emphasizes R as an important computer language. While ignored in many phylogenies of computer languages (e.g., Boutin et al. 2002), R has had a large, devoted following for decades, and its computational engine and language can be clearly linked to seminal concepts and advances in computer science. Further, from its inception R has been a tool for metaprogramming, wherein code is shared and modified programmatically. For instance, R has many widely used APIs for languages like C, Fortran, C++, Java, and Python, among many others.

Individuals from the natural sciences, particularly biologists, are likely to find this book more useful than individuals from other backgrounds because coding examples and applications are generally biological. Non-biologists may find, however, that examples readily extend to other settings.

What this book is not about

Notably, although statistics is the primary purpose of R, the primary emphasis of this book is not statistics. Instead, I focus on the R language, and the characteristics, capabilities, and extensions of the R system. I take this approach because: 1) coverage of non-statistical topics is challenging in and of itself, and 2) the responsible introduction of statistical algorithms from any program or language (including R) should be accompanied by detailed information concerning the statistical procedures. Many pedagogic resources exist for the statistical application of R. These include: Aho (2014) (the pedagogic statistical companion to this book), Venables and Ripley (2002), Faraway (2004, 2016), Crawley (2012), and Fox and Weisberg (2019), among others. It should be noted that while this text does not focus on inferential statistical methods, it does emphasize methods for handling, summarizing, and displaying empirical data, and these steps serve as a companion and prerequisite to formal inferential analyses.

Distinguishing Characteristics of This Book

Many other sources have emphasized fundamental programming aspects of R, while largely ignoring statistics, including seminal texts (e.g., Chambers 2008, 2020; Wickham 2016, 2021), and definitive CRAN manuals (R Core Team 2024a, 2024b, 2024c), or have focused on particular, potentially non-statistical R attributes, including graphics (Wickham 2016; Murrell 2019) and web-based applications (Wickham 2021; Sievert 2020). This book is a brave/foolish attempt to amalgamate and distill this disparate information, while occasionally emphasizing topics earlier works have ignored. For instance, Wickham (2019) admirably emphasizes many foundational and advanced programming ideas in R, but does not thoroughly consider some important programming extensions, including powerful syntheses with Python and Tcl. Unlike many other texts, this book also adheres to the format of a textbook, with numerous worked (often biological) examples, and exercises at the end of each chapter.

Conventions

This document has been created with Windows users of R in mind. Windows is currently the most widely used operating system for desktop computers and laptops by a wide margin (Wikipedia 2025b). In the vast majority of cases, instructions and examples provided here will be extendable to other operating systems. When this is not the case, I note steps to address these inconsistencies.

Several conventions are followed throughout the text. R package names and important terms are italicized. R function names, function arguments, and objects are written in blocked Courier font. Functions and operations are often written into “chunks” whose contents are readily copied to a clipboard using an icon located at the top right of the chunk (HTML versions of the book only). For example:

print("hello world")

The output from an evaluated chunk is generally printed immediately below. For example:

[1] "hello world"

If you are reading an HTML version of this document generated using bs4_book(), then R function names will generally be hyperlinked to their documentation. For example: print().

Acknowledgements and Corrections

I thank individuals who have reviewed/edited this book in various forms including Lauren Tucker and Adam Zambie. Corrections and comments are welcome, and can be sent using the book’s GitHub site.