Importing Various Data Files In R

in #programming8 years ago

Hi there. This post is a quick overview about importing various data file types into R.


Featured Image Source

Sections


  • Reading In .csv Files
  • Importing .data Files
  • The haven Package For Importing SPSS, STATA and SAS Files
  • Importing Excel (.xlsx) Files
  • References

1) Reading In .csv Files


I think that one of the easiest file types to work with in terms of file importing is the .csv file type. For .csv files use the code
read.csv() command in R.

> # .csv files
> 
> # Ref: http://introcs.cs.princeton.edu/java/data/
> 
> countries_data <- read.csv("http://introcs.cs.princeton.edu/java/data/countries.csv", 
+                         header = TRUE, sep = ",")
> head(countries_data)
  UN.code.number X2.letter.ISO.abbreviation X3.letter.ISO.abbreviation                name
1              4                         AF                        AFG         Afghanistan
2              8                         AL                        ALB             Albania
3             12                         DZ                        DZA             Algeria
4             20                         AD                        AND             Andorra
5             24                         AO                        AGO              Angola
6             28                         AG                        ATG Antigua and Barbuda
           capital
1            Kabul
2           Tirana
3          Algiers
4 Andorra la Vella
5           Luanda
6       St. John's

2) Importing .data Files


I do visit this website from time to time for some sample datasets to play around with in R and for Python. Some of the datasets do have .csv files and this weird data file type I have never seen before. This data file type is .data.


###  .data files

# Source: UCI Machine Learning Repository
# Read in the .data file:
# https://stackoverflow.com/questions/21101927/read-files-with-extension-data-into-r

url <- "http://archive.ics.uci.edu/ml/machine-learning-databases/balloons/adult-stretch.data"

balloons_data <- read.table(url, sep = "\t", dec = ",") # Reading in a .data file.

head(balloons_data)
                            V1
1 YELLOW,SMALL,STRETCH,ADULT,T
2 YELLOW,SMALL,STRETCH,CHILD,T
3     YELLOW,SMALL,DIP,ADULT,T
4     YELLOW,SMALL,DIP,CHILD,F
5     YELLOW,SMALL,DIP,CHILD,F
6 YELLOW,LARGE,STRETCH,ADULT,T

It appears that the data is all in one column. This could split into separate columns with the commas as the separators. From the tidyr package in R, the separate() function is used to split this V1 column into multiple columns.

# Need to split this V1 column into multiple columns and separate by the commas.
  
library(tidyr)

col_names <- c("Color", "size", "act", "age", "inflated")

sep_data <- separate(balloons_data, col = "V1", into = col_names, sep = ",")

head(sep_data)
   Color  size     act   age inflated
1 YELLOW SMALL STRETCH ADULT        T
2 YELLOW SMALL STRETCH CHILD        T
3 YELLOW SMALL     DIP ADULT        T
4 YELLOW SMALL     DIP CHILD        F
5 YELLOW SMALL     DIP CHILD        F
6 YELLOW LARGE STRETCH ADULT        T

3) The haven Package For Importing SPSS, STATA and SAS Files


There are a few places out there that do statistics with SPSS, STATA and SAS programs. The haven package in R allows the user to read in SPSS/STATA/SAS files. (Use install.packages("haven") to install the haven package into R/RStudio.)

  • SPSS files come in the .sav extension and could be imported into R with either read_sav() or read_spss().

  • STATA files come in as .dta files. You can read in STATA files with the read_stata() function or with the read_dta() function.

  • SAS files can be read with the read_sas() command in R.

4) Importing Excel (.xlsx) Files


When I was working with Excel files, I had deal with a bit of trial and error. I was unable to find a solution where you can read an Excel (.xlsx) file from the web. My solution is to save the Excel file into a folder, set that folder as the working directory in R/RStudio and then read the file with the read_excel() function from R's readxl package. (Here is the excel file).

setwd_image.PNG

A Screenshot Image Of My RStudio Showing How To Set The Working Directory


### Excel Files

library(readxl) # For excel files.

# Reference: http://www.milanor.net/blog/read-excel-files-from-r/
# Data: "www.superdatascience.com/wp-content/uploads/2015/06/SuperStoreUS_2015.xlsx"


# Can't load excel spreadsheet directly from url. Save the file first and then
# load it into R. Remember to set the working directory in RStudio.

excel_data = data.frame(read_excel("SuperStoreUS_2015.xlsx"))

head(excel_data, n = 10)
   Row.ID Order.Priority Discount Unit.Price Shipping.Cost Customer.ID     Customer.Name
1   20847           High     0.01       2.84          0.93           3     Bonnie Potter
2   20228  Not Specified     0.02     500.98         26.00           5    Ronnie Proctor
3   21776       Critical     0.06       9.48          7.29          11     Marcus Dunlap
4   24844         Medium     0.09      78.69         19.99          14 Gwendolyn F Tyson
5   24846         Medium     0.08       3.28          2.31          14 Gwendolyn F Tyson
6   24847         Medium     0.05       3.28          4.20          14 Gwendolyn F Tyson
7   24848         Medium     0.05       3.58          1.63          14 Gwendolyn F Tyson
8   18181       Critical     0.00       4.42          4.99          15     Timothy Reese
9   20925         Medium     0.01      35.94          6.66          15     Timothy Reese
10  26267           High     0.04       2.98          1.58          16      Sarah Ramsey
        Ship.Mode Customer.Segment Product.Category Product.Sub.Category Product.Container
1     Express Air        Corporate  Office Supplies  Pens & Art Supplies          Wrap Bag
2  Delivery Truck      Home Office        Furniture   Chairs & Chairmats        Jumbo Drum
3     Regular Air      Home Office        Furniture   Office Furnishings        Small Pack
4     Regular Air   Small Business        Furniture   Office Furnishings         Small Box
5     Regular Air   Small Business  Office Supplies  Pens & Art Supplies          Wrap Bag
6     Regular Air   Small Business  Office Supplies  Pens & Art Supplies          Wrap Bag
7     Regular Air   Small Business  Office Supplies         Rubber Bands          Wrap Bag
8     Regular Air   Small Business  Office Supplies            Envelopes         Small Box
9     Regular Air   Small Business  Office Supplies            Envelopes         Small Box
10    Regular Air   Small Business  Office Supplies         Rubber Bands          Wrap Bag
                                                 Product.Name Product.Base.Margin
1              SANFORD Liquid Accent™ Tank-Style Highlighters                0.54
2              Global Troy™ Executive Leather Low-Back Tilter                0.60
3  DAX Two-Tone Rosewood/Black Document Frame, Desktop, 5 x 7                0.45
4      Howard Miller 12-3/4 Diameter Accuwave DS ™ Wall Clock                0.43
5                                                  Newell 321                0.56
6                                                  Newell 351                0.56
7                    OIC Colored Binder Clips, Assorted Sizes                0.36
8                                         Grip Seal Envelopes                0.38
9           Tyvek ® Top-Opening Peel & Seel ® Envelopes, Gray                0.40
10                                   Staples Gold Paper Clips                0.39
         Country  Region State.or.Province        City Postal.Code Order.Date  Ship.Date
1  United States    West        Washington   Anacortes       98221 2015-01-07 2015-01-08
2  United States    West        California San Gabriel       91776 2015-06-13 2015-06-15
3  United States    East        New Jersey     Roselle        7203 2015-02-15 2015-02-17
4  United States Central         Minnesota  Prior Lake       55372 2015-05-12 2015-05-14
5  United States Central         Minnesota  Prior Lake       55372 2015-05-12 2015-05-13
6  United States Central         Minnesota  Prior Lake       55372 2015-05-12 2015-05-13
7  United States Central         Minnesota  Prior Lake       55372 2015-05-12 2015-05-13
8  United States    East          New York   Smithtown       11787 2015-04-08 2015-04-09
9  United States    East          New York   Smithtown       11787 2015-05-28 2015-05-28
10 United States    East          New York    Syracuse       13210 2015-02-12 2015-02-15
      Profit Quantity.ordered.new   Sales Order.ID
1     4.5600                    4   13.01    88522
2  4390.3665                   12 6362.85    90193
3   -53.8096                   22  211.15    90192
4   803.4705                   16 1164.45    86838
5   -24.0300                    7   22.23    86838
6   -37.0300                    4   13.99    86838
7    -0.7100                    4   14.26    86838
8   -59.8200                    7   33.47    86837
9   261.8757                   10  379.53    86839
10    2.6300                    6   18.80    86836

References


(Apparently there is a file type extension of .fwf which stands for fixed width file. There are also .tsv files. I found out about this from the data import cheatsheet link.)

Sort:  

Congratulations @dkmathstats! You have completed some achievement on Steemit and have been rewarded with new badge(s) :

Award for the number of upvotes

Click on any badge to view your own Board of Honor on SteemitBoard.
For more information about SteemitBoard, click here

If you no longer want to receive notifications, reply to this comment with the word STOP

By upvoting this notification, you can help all Steemit users. Learn how here!

Coin Marketplace

STEEM 0.13
TRX 0.34
JST 0.036
BTC 108529.70
ETH 4405.54
USDT 1.00
SBD 0.82