Unix command line tricks for Linux, Mac, and Windows: cut

in #technology, 7 years ago

This is part of a series on Unix command line tools that are available on Linux, OS X, and Windows machines. We'll be looking at some of the most useful Unix commands because, as Data Science gets bigger and bigger, these old-school tools turn out to be more valuable than ever for dealing with the different kinds of data files that may show up. See Unix command line tricks for Linux, Mac, and Windows: grep for more on this series.

Today, we'll look at cut. Last week we looked at head and tail, which you can combine to take horizontal slices of files—for example, to take lines 145,234 through 145,238 of a 300,000 line file. cut lets you take vertical slices of files, which is especially useful when working with spreadsheet data or data exported from relational tables.

For our examples, we'll use this sampledata.csv file, exported in CSV format from a spreadsheet program like Excel or LibreOffice Calc:

Employee Number,Family Name,Given Name,Hire Date,Phone Extension
1001,Johnson,Emily,11/13/2016,x0023
1002,Smith,John,03/16/2017,x7225
1003,Baker,Debbie,03/23/2017,x8834
1004,Morales,Kermit,06/09/2017,x2643

The following command tells cut that our input file is delimited with commas and that we only want the third field:

cut -d ',' -f3 sampledata.csv

The command returns that third field for all the rows:

Given Name
Emily
John
Debbie
Kermit
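If you'd like to follow along, here is a minimal sketch that recreates the sample file and runs the same command. The second cut call is an extra illustration that is not part of the example above: it shows why the -d option matters, because cut splits on tab characters unless told otherwise.

```shell
# Recreate the sample file from this post.
cat > sampledata.csv <<'EOF'
Employee Number,Family Name,Given Name,Hire Date,Phone Extension
1001,Johnson,Emily,11/13/2016,x0023
1002,Smith,John,03/16/2017,x7225
1003,Baker,Debbie,03/23/2017,x8834
1004,Morales,Kermit,06/09/2017,x2643
EOF

# -d sets the field delimiter; -f picks the field number, counting from 1.
cut -d ',' -f3 sampledata.csv

# Without -d, cut splits on tabs. These lines contain no tabs,
# so cut passes each line through whole instead of picking a field.
cut -f3 sampledata.csv
```

If the second command ever prints entire lines when you expected one column, a missing or wrong delimiter is the first thing to check.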

We can also ask for multiple fields with a list of field numbers separated by commas. The following asks for the third and fifth fields:

cut -d ',' -f3,5 sampledata.csv

And here they are:

Given Name,Phone Extension
Emily,x0023
John,x7225
Debbie,x8834
Kermit,x2643
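Besides comma-separated lists, the -f option also accepts ranges. The sketch below feeds one row of the sample data to cut on standard input so that it runs without any file; the row variable is just a name chosen for this illustration.

```shell
# One row from the sample data, held in a variable for the demo.
row='1002,Smith,John,03/16/2017,x7225'

# Fields 2 through 4.
printf '%s\n' "$row" | cut -d ',' -f2-4    # Smith,John,03/16/2017

# Field 4 through the end of the line.
printf '%s\n' "$row" | cut -d ',' -f4-     # 03/16/2017,x7225

# The order of the list does not matter: cut always prints
# fields in the order they appear in the input.
printf '%s\n' "$row" | cut -d ',' -f5,1    # 1002,x7225
```

That last point means cut can't reorder columns by itself; it only selects them.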

Last week we also learned about combining commands into a pipeline, where each command sends its output to be used as input by the next command. The following uses the head command to take the first four lines of the input file and sends them to the tail command. tail passes along the last three of those lines, which drops the header row, to the cut command, which outputs the first and fourth columns of its input:

head -n 4 sampledata.csv | tail -n 3 | cut -d ',' -f1,4

Here is the result:

1001,11/13/2016
1002,03/16/2017
1003,03/23/2017

Again, this may not seem useful with an input file that is five lines long, but when you've exported a table from a massive database and have hundreds of thousands of lines that you can't just pull up in a text editor, you can combine these commands to perform a lot of very useful tasks. For example, you can pull a subset that has the parts that are most interesting to you and that will fit into a text editor or visualization tool.
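As a rough sketch of that kind of task, the script below wraps head, tail, and cut in a small shell function and saves a slice of a larger file to a new, smaller one. The slice_csv function, the demo_big.csv and subset.csv file names, and the generated rows are all made up for this illustration; the awk line is only there to create some stand-in data.

```shell
# Hypothetical helper (not a standard command): print the header row plus
# lines START through END of FILE, keeping only the comma-separated FIELDS.
# Assumes line 1 is the header, START is 2 or more, and FILE has at least END lines.
slice_csv() {
  start=$1 end=$2 fields=$3 file=$4
  head -n 1 "$file" | cut -d ',' -f"$fields"
  head -n "$end" "$file" | tail -n "$((end - start + 1))" | cut -d ',' -f"$fields"
}

# Made-up stand-in for a big export: a header plus 1,000 generated rows.
awk 'BEGIN { print "id,name,dept"; for (i = 1; i <= 1000; i++) print i ",name" i ",dept" i }' > demo_big.csv

# Save lines 500 through 502, columns 1 and 3, to a file small enough for an editor.
# Line 500 of the file holds id 499, because line 1 is the header.
slice_csv 500 502 1,3 demo_big.csv > subset.csv
cat subset.csv
```

Keeping the header row in the output is a design choice here: it makes the smaller file easier to load into a spreadsheet or visualization tool, which usually expects column names on the first line.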

As with other Unix commands, you can learn more about cut with the manual (man) command:

man cut

This is cool, keep up these videos. I use linux and windows. Needless to say, linux is MUCH better lol

I can't find an edit button for my post, so I'm adding a link to the table of contents for the series here: Unix command line tricks for Linux, Mac, and Windows: table of contents
