Importing Various Data Files In R
Hi there. This post is a quick overview about importing various data file types into R.

Sections
- Reading In .csv Files
- Importing .data Files
- The
haven
Package For Importing SPSS, STATA and SAS Files - Importing Excel (.xlsx) Files
- References
1) Reading In .csv Files
I think that one of the easiest file types to work with in terms of file importing is the .csv file type. For .csv files use the code
read.csv()
command in R.
> # .csv files
>
> # Ref: http://introcs.cs.princeton.edu/java/data/
>
> countries_data <- read.csv("http://introcs.cs.princeton.edu/java/data/countries.csv",
+ header = TRUE, sep = ",")
> head(countries_data)
UN.code.number X2.letter.ISO.abbreviation X3.letter.ISO.abbreviation name
1 4 AF AFG Afghanistan
2 8 AL ALB Albania
3 12 DZ DZA Algeria
4 20 AD AND Andorra
5 24 AO AGO Angola
6 28 AG ATG Antigua and Barbuda
capital
1 Kabul
2 Tirana
3 Algiers
4 Andorra la Vella
5 Luanda
6 St. John's
2) Importing .data Files
I do visit this website from time to time for some sample datasets to play around with in R and for Python. Some of the datasets do have .csv files and this weird data file type I have never seen before. This data file type is .data
.
### .data files
# Source: UCI Machine Learning Repository
# Read in the .data file:
# https://stackoverflow.com/questions/21101927/read-files-with-extension-data-into-r
url <- "http://archive.ics.uci.edu/ml/machine-learning-databases/balloons/adult-stretch.data"
balloons_data <- read.table(url, sep = "\t", dec = ",") # Reading in a .data file.
head(balloons_data)
V1
1 YELLOW,SMALL,STRETCH,ADULT,T
2 YELLOW,SMALL,STRETCH,CHILD,T
3 YELLOW,SMALL,DIP,ADULT,T
4 YELLOW,SMALL,DIP,CHILD,F
5 YELLOW,SMALL,DIP,CHILD,F
6 YELLOW,LARGE,STRETCH,ADULT,T
It appears that the data is all in one column. This could split into separate columns with the commas as the separators. From the tidyr
package in R, the separate()
function is used to split this V1 column into multiple columns.
# Need to split this V1 column into multiple columns and separate by the commas.
library(tidyr)
col_names <- c("Color", "size", "act", "age", "inflated")
sep_data <- separate(balloons_data, col = "V1", into = col_names, sep = ",")
head(sep_data)
Color size act age inflated
1 YELLOW SMALL STRETCH ADULT T
2 YELLOW SMALL STRETCH CHILD T
3 YELLOW SMALL DIP ADULT T
4 YELLOW SMALL DIP CHILD F
5 YELLOW SMALL DIP CHILD F
6 YELLOW LARGE STRETCH ADULT T
3) The haven
Package For Importing SPSS, STATA and SAS Files
There are a few places out there that do statistics with SPSS, STATA and SAS programs. The haven
package in R allows the user to read in SPSS/STATA/SAS files. (Use install.packages("haven")
to install the haven package into R/RStudio.)
SPSS files come in the
.sav
extension and could be imported into R with eitherread_sav()
orread_spss()
.STATA files come in as
.dta
files. You can read in STATA files with theread_stata()
function or with theread_dta()
function.SAS files can be read with the
read_sas()
command in R.
4) Importing Excel (.xlsx) Files
When I was working with Excel files, I had deal with a bit of trial and error. I was unable to find a solution where you can read an Excel (.xlsx) file from the web. My solution is to save the Excel file into a folder, set that folder as the working directory in R/RStudio and then read the file with the read_excel()
function from R's readxl
package. (Here is the excel file).
### Excel Files
library(readxl) # For excel files.
# Reference: http://www.milanor.net/blog/read-excel-files-from-r/
# Data: "www.superdatascience.com/wp-content/uploads/2015/06/SuperStoreUS_2015.xlsx"
# Can't load excel spreadsheet directly from url. Save the file first and then
# load it into R. Remember to set the working directory in RStudio.
excel_data = data.frame(read_excel("SuperStoreUS_2015.xlsx"))
head(excel_data, n = 10)
Row.ID Order.Priority Discount Unit.Price Shipping.Cost Customer.ID Customer.Name
1 20847 High 0.01 2.84 0.93 3 Bonnie Potter
2 20228 Not Specified 0.02 500.98 26.00 5 Ronnie Proctor
3 21776 Critical 0.06 9.48 7.29 11 Marcus Dunlap
4 24844 Medium 0.09 78.69 19.99 14 Gwendolyn F Tyson
5 24846 Medium 0.08 3.28 2.31 14 Gwendolyn F Tyson
6 24847 Medium 0.05 3.28 4.20 14 Gwendolyn F Tyson
7 24848 Medium 0.05 3.58 1.63 14 Gwendolyn F Tyson
8 18181 Critical 0.00 4.42 4.99 15 Timothy Reese
9 20925 Medium 0.01 35.94 6.66 15 Timothy Reese
10 26267 High 0.04 2.98 1.58 16 Sarah Ramsey
Ship.Mode Customer.Segment Product.Category Product.Sub.Category Product.Container
1 Express Air Corporate Office Supplies Pens & Art Supplies Wrap Bag
2 Delivery Truck Home Office Furniture Chairs & Chairmats Jumbo Drum
3 Regular Air Home Office Furniture Office Furnishings Small Pack
4 Regular Air Small Business Furniture Office Furnishings Small Box
5 Regular Air Small Business Office Supplies Pens & Art Supplies Wrap Bag
6 Regular Air Small Business Office Supplies Pens & Art Supplies Wrap Bag
7 Regular Air Small Business Office Supplies Rubber Bands Wrap Bag
8 Regular Air Small Business Office Supplies Envelopes Small Box
9 Regular Air Small Business Office Supplies Envelopes Small Box
10 Regular Air Small Business Office Supplies Rubber Bands Wrap Bag
Product.Name Product.Base.Margin
1 SANFORD Liquid Accent™ Tank-Style Highlighters 0.54
2 Global Troy™ Executive Leather Low-Back Tilter 0.60
3 DAX Two-Tone Rosewood/Black Document Frame, Desktop, 5 x 7 0.45
4 Howard Miller 12-3/4 Diameter Accuwave DS ™ Wall Clock 0.43
5 Newell 321 0.56
6 Newell 351 0.56
7 OIC Colored Binder Clips, Assorted Sizes 0.36
8 Grip Seal Envelopes 0.38
9 Tyvek ® Top-Opening Peel & Seel ® Envelopes, Gray 0.40
10 Staples Gold Paper Clips 0.39
Country Region State.or.Province City Postal.Code Order.Date Ship.Date
1 United States West Washington Anacortes 98221 2015-01-07 2015-01-08
2 United States West California San Gabriel 91776 2015-06-13 2015-06-15
3 United States East New Jersey Roselle 7203 2015-02-15 2015-02-17
4 United States Central Minnesota Prior Lake 55372 2015-05-12 2015-05-14
5 United States Central Minnesota Prior Lake 55372 2015-05-12 2015-05-13
6 United States Central Minnesota Prior Lake 55372 2015-05-12 2015-05-13
7 United States Central Minnesota Prior Lake 55372 2015-05-12 2015-05-13
8 United States East New York Smithtown 11787 2015-04-08 2015-04-09
9 United States East New York Smithtown 11787 2015-05-28 2015-05-28
10 United States East New York Syracuse 13210 2015-02-12 2015-02-15
Profit Quantity.ordered.new Sales Order.ID
1 4.5600 4 13.01 88522
2 4390.3665 12 6362.85 90193
3 -53.8096 22 211.15 90192
4 803.4705 16 1164.45 86838
5 -24.0300 7 22.23 86838
6 -37.0300 4 13.99 86838
7 -0.7100 4 14.26 86838
8 -59.8200 7 33.47 86837
9 261.8757 10 379.53 86839
10 2.6300 6 18.80 86836
References
- http://www.milanor.net/blog/read-excel-files-from-r/
- https://stackoverflow.com/questions/21101927/read-files-with-extension-data-into-r
- Loading A .JSON File Into R (From My Website)
- Longer Version Of This Post On My Website
- Data Import Cheatsheet From RStudio
(Apparently there is a file type extension of .fwf which stands for fixed width file. There are also .tsv files. I found out about this from the data import cheatsheet link.)
Congratulations @dkmathstats! You have completed some achievement on Steemit and have been rewarded with new badge(s) :
Click on any badge to view your own Board of Honor on SteemitBoard.
For more information about SteemitBoard, click here
If you no longer want to receive notifications, reply to this comment with the word
STOP