Workshop: Importing & Exporting Data in R

Author: Workshop Guide

1. Introduction: The Data “Commute”

In our last session, we got RStudio set up. But data rarely starts in R. It lives in Excel files, CSVs, SPSS files, or on the web.

Getting data into R (importing) and getting your results out of R (exporting) is a daily task for any data analyst.

The Problem We Saw Last Session

The biggest single frustration for new R users is the “File Not Found” error. This is almost always a Working Directory problem. You try to read a file, but R is “standing” in the wrong folder.
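
Before fixing the problem, it helps to see it. The short sketch below uses only base R to ask where R is currently “standing” and whether a file is visible from there. The path data/data.csv is a placeholder, not a file from this workshop.

```r
# Where is R "standing" right now?
current_dir <- getwd()
print(current_dir)

# Is the file visible from that folder? (placeholder path)
found <- file.exists("data/data.csv")
print(found)

# What does R actually see in the current folder?
print(list.files())
```

If found is FALSE, the file is not where R is looking, and any read function given that path will fail with a “File Not Found” style error.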

Today, we’re going to learn a workflow that solves this problem permanently and makes importing and exporting a breeze.

2. Setup: Install Your Toolset

We’ll need a few packages. Remember, you only need to run install.packages() once per computer (like downloading an app). We’re installing the whole tidyverse because it’s so widely used, plus rio and here.

# Run this in your Console, NOT your script
install.packages("tidyverse")
install.packages("rio")
install.packages("here")
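
If you are not sure whether a package is already on your computer, you can check before installing. This is a minimal sketch using base R’s requireNamespace(); the variable names are only for illustration, and the install line is left commented out so nothing is downloaded by accident.

```r
# Packages this workshop relies on
needed <- c("tidyverse", "rio", "here")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Not installed yet: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install the missing ones
} else {
  message("All packages are already installed.")
}
```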

Now, let’s load them in our script (like opening the apps).

# We'll load these at the top of our script
library(tidyverse) # Attaches readr, dplyr, ggplot2, and more (readxl is installed but NOT attached)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.2
✔ ggplot2   4.0.0     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(rio)
library(here)
here() starts at /Users/drpakhare/Dropbox/R Workshop 2025 AIIMS Bhopal

3. The “Good”: Base R Functions

R comes with built-in functions for reading data. The most common is read.csv().

  • Function: read.csv()
  • Pros: It’s built-in, no package needed.
  • Cons: It’s slower, can be fussy with data types, and only works for .csv files.
# THE BAD WAY: An "absolute" path
# This code is brittle and will BREAK on your computer.
my_data <- read.csv("C:/Users/Sarah/Desktop/My_Project/data/data.csv")

# THE "OK" WAY: A "relative" path
# This ONLY works if your Working Directory is correct.
my_data <- read.csv("data/data.csv")

This is fine, but if you have an Excel file, you’re stuck. This leads us to the next level.
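
To try read.csv() without needing any file on disk, here is a small self-contained sketch. It writes a tiny made-up CSV to a temporary file (the column names and values are invented for illustration) and then inspects what read.csv() guessed for each column.

```r
# Create a small CSV in a temporary location (made-up data)
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("id,group,score",
    "1,control,4.5",
    "2,treated,5.1",
    "3,treated,NA"),
  csv_path
)

# Read it back with base R
demo <- read.csv(csv_path)

# Always check the column types that were guessed
str(demo)
```

Running str() right after an import is a good habit: it is the quickest way to spot a number that was read as text.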


4. The “Better”: The Tidyverse Approach

The tidyverse provides a set of modern, fast, and consistent tools for data import.

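  • read_csv() (from the readr package) is the modern replacement for read.csv().
  • read_excel() (from the readxl package) is the standard for reading Excel files. readxl is installed with the tidyverse, but library(tidyverse) does not attach it, so you must run library(readxl) yourself.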

A. Reading CSVs with read_csv()

Notice the underscore! read_csv() is typically much faster than read.csv() on large files, returns a tibble, and tells you which column types it guessed.

# The Tidyverse way to read a CSV
my_csv_data <- read_csv("data/my_data.csv")
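
One thing read_csv() lets you do is state the column types yourself instead of relying on guesses. The sketch below assumes readr 2.0 or later (needed for the I() literal-data shortcut) and uses made-up columns, so it runs without any file.

```r
library(readr)

# I() tells read_csv() that this string is the data itself, not a file path
csv_text <- I("id,group,score\n1,control,4.5\n2,treated,5.1\n3,treated,NA\n")

demo <- read_csv(
  csv_text,
  col_types = cols(
    id    = col_integer(),
    group = col_character(),
    score = col_double()
  )
)

demo
```

With a real file, you would replace csv_text with the path to your CSV and keep the col_types argument.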

B. Reading Excel with read_excel()

This is the real workhorse. An Excel file can have multiple sheets, so you need to be specific.

# readxl is not attached by library(tidyverse), so load it explicitly
library(readxl)

# 1. Read the default sheet (the first one)
my_excel_data <- read_excel("data/my_workbook.xlsx")

# 2. Read a specific sheet by its name
my_sales_data <- read_excel(
  "data/my_workbook.xlsx",
  sheet = "Sales_Data"
)

# 3. Read a specific sheet by its position (e.g., the 3rd sheet)
my_inventory_data <- read_excel(
  "data/my_workbook.xlsx",
  sheet = 3
)
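
If you do not know which sheets a workbook contains, readxl can list them for you with excel_sheets(). The sketch below uses the example workbook that ships with readxl (found via readxl_example()), so it does not depend on my_workbook.xlsx existing.

```r
library(readxl)

# Path to an example workbook bundled with the readxl package
example_path <- readxl_example("datasets.xlsx")

# List the sheet names before deciding which one to read
sheet_names <- excel_sheets(example_path)
print(sheet_names)

# Read one sheet by name
mtcars_sheet <- read_excel(example_path, sheet = "mtcars")
head(mtcars_sheet)
```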

This is a ‘Better’ Workflow, But…

This is a huge improvement! The functions are fast and consistent.

But we still have two problems:

  1. We’re still vulnerable to the “File Not Found” error if our Working Directory is wrong.
  2. We still have to remember which function to use (read_csv, read_excel, read_spss from the haven package, etc.).


5. The “Best”: The rio + here Combo

This is the workflow we recommend for all your projects. It solves both problems at once.

  • here solves the “Where is my file?” problem.
  • rio solves the “Which function do I use?” problem.

Step 1: Solving “Where?” with here

The here package has one main job: find your project’s root folder (in an RStudio Project, the folder that contains your .Rproj file) and build paths from there. No matter which sub-folder your script lives in, here::here() always starts from the project’s “home base.”

Analogy: The ‘Home Base’ Button

Think of here() as a “Home Base” button in a video game. No matter where you are on the map (e.g., in a scripts/analysis sub-folder), here::here() instantly teleports you back to your home base (your .Rproj file).

From there, you just give simple directions: “go into the data folder and get my_data.csv.”
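
Those “simple directions” look like this in code. This is a small sketch: the data folder and my_data.csv are placeholder names, and the file does not need to exist for here() to build the path.

```r
library(here)

# The "home base": the project root that here has detected
project_root <- here()
print(project_root)

# Directions from home base: each folder or file name is a separate argument
csv_path <- here("data", "my_data.csv")
print(csv_path)

# here() only builds the path; check separately whether the file exists
print(file.exists(csv_path))
```

Passing each folder as its own argument means you never type a slash or backslash yourself, so the same code works on Windows, macOS, and Linux.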

Step 2: Solving “What?” with rio

The rio package has one “magic” function: import(). It’s a “Universal Translator” for data.

  • You give it any file path.
  • It looks at the file extension (like .csv, .xlsx, .sav, .json).
  • It automatically calls the right reader behind the scenes (for example, data.table for .csv, readxl for .xlsx, haven for .sav) to read the data.
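
To see the “Universal Translator” idea in action, the sketch below saves the same made-up data frame in two formats using base R, then reads both back with the same import() call. The data and variable names are invented for illustration.

```r
library(rio)

# A tiny made-up data frame
scores <- data.frame(
  id    = 1:3,
  group = c("control", "treated", "treated")
)

# Save it in two different formats using base R
csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")
write.csv(scores, csv_path, row.names = FALSE)
saveRDS(scores, rds_path)

# One function reads both; rio picks the reader from the extension
from_csv <- import(csv_path)
from_rds <- import(rds_path)

str(from_csv)
str(from_rds)
```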

Putting It All Together: The Golden Workflow

Now we combine them. This is the code you should use 99% of the time.

Your New Workflow: Robust & Simple

This code is robust, shareable, and works for (almost) any file type.

# --- YOUR NEW WORKFLOW ---

# 1. IMPORTING A CSV FILE
# rio::import() sees ".csv" and uses a fast CSV reader
my_csv <- rio::import(
  here::here("data", "my_data.csv")
)

# 2. IMPORTING AN EXCEL FILE (SPECIFIC SHEET)
# rio::import() sees ".xlsx" and uses read_excel()
# It's smart enough to pass the 'sheet' argument!
my_sales <- rio::import(
  here::here("data", "my_workbook.xlsx"),
  sheet = "Sales_Data"
)

# 3. IMPORTING AN SPSS FILE
# rio::import() sees ".sav" and uses haven::read_spss()
my_spss_data <- rio::import(
  here::here("data", "survey_data.sav")
)
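
If you want a friendlier message when a file is missing, you could wrap the two calls in a small helper. This is only a sketch: import_from_project() is a hypothetical function written for this guide, not part of rio or here, and the file name below is deliberately one that should not exist.

```r
# Hypothetical helper: build the path with here, check it, then import with rio
import_from_project <- function(path_parts, ...) {
  path <- do.call(here::here, as.list(path_parts))
  if (!file.exists(path)) {
    stop(
      "File not found: ", path,
      "\nProject root is: ", here::here(),
      call. = FALSE
    )
  }
  rio::import(path, ...)
}

# Try it on a file that should not exist, and capture the error message
result <- tryCatch(
  import_from_project(c("data", "no_such_file.csv")),
  error = function(e) conditionMessage(e)
)
cat(result, "\n")
```

Extra arguments such as sheet = "Sales_Data" are passed through to rio::import() via the dots.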

What about Exporting?

It’s just as easy with rio::export(). rio figures out the file type you want from the file extension in the name you provide. The target folder (here, output) must already exist.

# Take our 'my_sales' R data frame...

# ...and save it as a new, clean CSV file
rio::export(
  my_sales,
  here::here("output", "clean_sales_data.csv")
)

# ...or save it as a new Excel file
rio::export(
  my_sales,
  here::here("output", "clean_sales_data.xlsx")
)
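
An export will typically fail if the target folder does not exist yet, so it can help to create it first. The sketch below writes into a temporary folder so it is safe to run anywhere; in a real project you would set out_dir to here::here("output") instead. The built-in mtcars data stands in for my_sales.

```r
# In a real project: out_dir <- here::here("output")
out_dir <- file.path(tempdir(), "output")

# Create the folder if it is missing (no warning if it already exists)
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# Export a small built-in data set as a stand-in for my_sales
out_file <- file.path(out_dir, "clean_sales_data.csv")
rio::export(head(mtcars), out_file)

# Confirm the file was written
print(file.exists(out_file))
```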

6. Session Recap: The “Good, Better, Best”

  1. “Good” (Base R): Use read.csv(). It works, but it’s basic and fragile.
  2. “Better” (Tidyverse): Use read_csv() and read_excel(). These are fast and powerful, but you must manage file types and file paths manually.
  3. “Best” (Our Recommendation):
    • Always use RStudio Projects (.Rproj file).
    • Always build file paths using here::here().
    • Always use rio::import() and rio::export() to read and write your files.

This simple rio + here combo makes your code shareable, robust, and incredibly easy to read.