Learn R through examples (2024)

Xijin Ge, Jianli Qi, and Rong Fan

2022-03-21

Aimed at total beginners, this book is based on the philosophy that people learn faster from examples. Instead of explaining the rules, the book centers on analyzing several datasets from the very beginning. It is thus an alternative to traditional, more rigorous textbooks on R programming. We start with small, clean datasets and gradually transition to big, messy ones. With each dataset, we hope to tell a story through the analysis. We invite you, our courageous reader, to take this journey with us. Motivated readers, such as biologists, can quickly work through this book and learn by themselves. I encourage you to type in the example code and see the outputs, then work on the challenges and exercises.

This book originally started as materials for 2-hour hands-on workshops intended to give a quick introduction/demonstration for students and researchers new to R. The workshop has been given many times to different audiences ranging from high-school students to mathematicians. For a 2-hour session, I have to keep it gentle, interactive, and fun, sometimes at the expense of rigor. Instead of explaining all the rules, grammar, and syntax, I found it easier to focus on one dataset and walk the audience through some of the analyses possible with R. This material later evolved into a one-credit online class and then a three-credit course. We stick with the unconventional approach of focusing on datasets and examples.

Another feature of this book is that we review the statistical concepts involved. R is a language for statistical computing and thus cannot be detached from that context.


Coding and cooking.

If this is your first time coding, consider it a process of writing a recipe. Your goal is to provide clear, step-by-step instructions to help a 10-year-old turn raw ingredients (data) into delicious pasta (results). A good recipe can be used again to make pasta from the same materials, just as a computer program can process any data of the same specifications. Millions of people share their code on repositories like GitHub.

Free, powerful, and welcoming, the R programming environment is a wonderful kitchen. It is interactive and easy to learn for beginners. In this kitchen, you can find many tools (a knife) and complex appliances (a stove); in R, we call them functions, created by others (sometimes painstakingly over many years) and ready to be used to process data. We need to learn the commonly used R functions, just like we need to know how to use a knife. Each kitchen tool has its instruction manual, but people rarely read them. The same goes for R functions, as most people learn from example code provided by others on sites like StackOverflow.
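As a minimal sketch of what using ready-made functions looks like, the lines below call a few functions that ship with base R. The vector `weights` and its values are made up for illustration and are not one of the book's datasets.

```r
# A made-up vector of numbers, for illustration only
weights <- c(2.5, 3.1, 4.0, 3.6)

# Each function is a ready-made tool: hand it data, get a result back
avg <- mean(weights)       # arithmetic mean
biggest <- max(weights)    # largest value
n <- length(weights)       # number of elements

print(avg)
print(biggest)
print(n)

# The "instruction manual" of a function is its help page.
# In an interactive session, typing ?mean opens the manual for mean().
```

Most of the time, you only need to know what goes into a function and what comes out of it, just as you do with a knife.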

If you want to make a veggie smoothie, but the recipe requires a fancy blender, you can go to a marketplace such as Amazon to buy one. Similarly, with the R programming language, we can download additional R packages from The Comprehensive R Archive Network (CRAN), a FREE marketplace where tens of thousands of people contribute. The open and collaborative user community is uniquely productive. People build on top of each other, providing increasingly complex functionalities with simple interfaces. There are R packages that can help you create complex charts, write a book (like this one), host a website, or even find a girlfriend (just kidding). Imagine a free appliance that can turn uncooked chicken, vegetables, oil, and spices into delicious Kung Pao Chicken! That is precisely how I feel every time I use other people’s R packages to analyze genomic data.

In your kitchen, you also find jars, dishes, salt dispensers, pots, and so on; we use suitable containers for different ingredients or foods. Even though some containers are only needed to store intermediate products, it is essential to know what kinds of containers there are before starting to cook. In programming, we have several pre-defined data types. A scalar variable can hold one number, some text (strings), or just a true/false indicator (logical values). A vector contains a sequence of scalars of the same kind. With rows and columns like an Excel spreadsheet, a data frame can be considered multiple vectors of the same length. In computer programming, we need to learn these data structures. Common R data structures include scalars, vectors, matrices, data frames, and lists.
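The containers described above can be sketched in a few lines of base R. All object names and values here are invented for illustration.

```r
# Scalars: one number, one piece of text, one true/false value
# (in R, a scalar is simply a vector of length one)
age <- 10
name <- "pasta"
is_ready <- TRUE

# Vector: a sequence of values of the same kind
cook_minutes <- c(8, 10, 12)

# Data frame: several vectors of the same length, arranged as columns
recipes <- data.frame(
  dish = c("spaghetti", "penne", "fusilli"),
  minutes = cook_minutes
)

# Matrix: rows and columns holding values of a single kind
m <- matrix(1:6, nrow = 2)

# List: a container that can hold objects of different kinds
pantry <- list(label = name, times = cook_minutes, table = recipes)

print(class(age))
print(length(cook_minutes))
print(dim(recipes))   # rows, then columns
print(dim(m))
print(names(pantry))
```

Functions such as `class()`, `length()`, and `dim()` tell you what kind of container you are holding and how big it is.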

When I first started cooking, I always hated it when people or recipes said something like “a little” olive oil, like this recipe in the picture above. Without any experience, I had no idea whether that was one drip, a teaspoon, or a cup of olive oil! The 10-year-old we want to write a recipe for might not even know how small “small pieces” are or even what “boiling water” looks like. Computers are stupid machines that can run calculations faithfully and fast. They have no common sense whatsoever, unlike the “computers” in history, who were people who could calculate, either mentally or with mechanical calculators. When programming, we need to (1) provide clear, specific commands at each step and (2) define the correct sequence of operations, considering exceptional scenarios such as data being zero or missing.
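To make the point about zero or missing data concrete, here is a small sketch using made-up numbers. The helper `safe_ratio()` is a hypothetical function written for this example; it is not part of R or of any package.

```r
# Made-up measurements; NA marks a missing value
oil_cups <- c(0.5, NA, 1, 0)

# By default, one missing value makes the whole result missing
print(mean(oil_cups))

# We have to say explicitly that missing values should be dropped
avg_oil <- mean(oil_cups, na.rm = TRUE)
print(avg_oil)

# Dividing by zero does not stop R; it quietly returns Inf
ratio <- 1 / oil_cups
print(ratio)

# Hypothetical helper: return NA whenever the denominator is zero or missing
safe_ratio <- function(numerator, denominator) {
  ifelse(is.na(denominator) | denominator == 0, NA, numerator / denominator)
}

safe <- safe_ratio(1, oil_cups)
print(safe)
```

The computer does exactly what it is told in both cases. It is up to us to decide what should happen in the exceptional scenarios and to write that decision into the code.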

Just like writing a recipe, the process of programming can be frustrating. Patience and trial-and-error are the only solution. Suppose you asked your 6-year-old baby sister to help peel the carrots. Before moving on to the chopping step, you need to look at these carrots to see if they are appropriately peeled. One of the main things you can do in debugging is to stop and look at the intermediate products. The previous steps might not have been carried out correctly, even though you think your instructions are clear and correct. Sometimes we have typos in the code or forget to pass on the right inputs. We can print out the data and take a look. If the data is large, we examine the first few rows or even just the number of rows and columns. The intermediate data objects are created in the computer memory as you execute your code. The coding process is the step-by-step creation and modification of data objects in memory.
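A minimal sketch of this stop-and-look habit, using a small made-up data frame in place of a real dataset, might look like this:

```r
# A small made-up data frame standing in for a larger dataset
carrots <- data.frame(
  id = 1:6,
  length_cm = c(15.2, 14.8, NA, 16.1, 15.5, 13.9),
  peeled = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE)
)

# Step 1: keep only the peeled carrots
peeled_carrots <- carrots[carrots$peeled, ]

# Stop and look at the intermediate product before the next step
print(head(peeled_carrots, 3))   # the first few rows
print(dim(peeled_carrots))       # number of rows and columns
str(peeled_carrots)              # column types at a glance

# Step 2: only now compute on the data we have checked
mean_length <- mean(peeled_carrots$length_cm)
print(mean_length)
```

Each assignment creates a new object in memory (`carrots`, then `peeled_carrots`, then `mean_length`), and any of them can be printed and inspected at any point.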

Many students have contributed to this material. Notably, Quazi Irfan, who worked as a teaching assistant, fixed many errors and gave constructive feedback. In the fall of 2018, a group of highly motivated students in the STAT 442 Exploratory Data Analysis class worked on some of the datasets presented here. They are Samuel Ivanecky, Kory Heier, Audrey Bunge, Jacie McDonald, Shae Olson, Nathan Thirsten, and Alex Wieseler. Some of the plots in this book are inspired by them.

The best way to learn R is to use it. This is similar to learning a foreign language. After the first two chapters, it is entirely possible to start working on your own dataset. You do not need to learn everything first. That is nearly impossible, as the R community produces extremely useful and cool packages every day. You do not need to finish all the chapters of this book. Feel free to search for and steal some example code. Even programmers with 20 years of experience acknowledge that they still google basic functions daily. If you do not have a dataset, you can find one from sources such as TidyTuesday.

Chapter 3 contains more traditional R programming content that explains the data objects and common functions. If you are learning by yourself, you do not need to feel guilty about skipping it. You can go through the later chapters of the book quickly. There is no need to memorize the functions. Many students, even the authors, keep coming back to the later chapters to recall how a particular plot is generated. Once you go through this book, you can use it as a reference.

This is still a work in progress. The later chapters, in particular, need to be extensively edited. Any comments and suggestions to make this draft better are welcome. This includes typos, errors, and organizational issues. The best place to reach out is through the GitHub issues page. If you do not want to create yet another account, you can email us at Xijin.Ge@sdstate.edu.
