Getting Started with R - Things are heating up with government data! - defining variables from dataframes

in analysis •  last month

'Hunker in Place!'

As I continue to dive deeper in search of being able to do something useful with R, I will try to do short updates on my progress, however minor it may seem to me c;

Half the battle is developing the discipline to sit at the computer and not wander off onto the internet, never to be seen or heard from again.

R part 2.jpg
Rate my header in the comments....

So if you were paying attention, you may remember that I ran into a roadblock in my last article on R: I had imported data successfully, but I was having trouble accessing it. Today, while continuing my guided online learning at DataCamp, I made some progress on the side in understanding these imported data frames.

But not all at once....first I had to find new data:

Name three adjectives that describe the US government; they probably apply here too....

As a new learner, I don't want to pay for data to practice on while I learn. This is why I am not currently using Arcange's SQL database, which requires a subscription of 10 STEEM per month. Using it one day soon is the goal, but it's a little pricey while I practice.

Instead, I go to data.gov, which has tons of weird data schniblets that I can play with. Take, for example, the file that I am currently using - 2010 census data broken down by population. Now that's pretty cool:

1.jpg
Summarized Data!

Maybe some of you can already see what is weird about this data, but at this point I was very happy. The data I was using last time had very low population numbers, between 1 and 100 for each zip code. I think I had found some weird sample data or something, so when I found the 2010 census data and it worked, I was thrilled!
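For anyone following along, here is a minimal sketch of that import-and-summarize step. The column names (`zip`, `population`) are placeholders rather than the actual data.gov headers, and the four rows are invented so the snippet runs without a download.

```r
# Toy stand-in for the downloaded CSV; the rows below are invented.
# With a real download you would call read.csv("your_file.csv") instead
# (that file name is hypothetical).
csv_text <- "zip,population
90001,1000
90002,2500
90003,4000
90004,500"

census <- read.csv(text = csv_text)

str(census)      # column names and types (both columns are read as integer)
summary(census)  # min, quartiles, mean, and max for each column
```

`str()` is worth running before `summary()`, since it shows how R has typed each column.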

Back to trying to do something interesting with one of the variables:

2.jpg
Integers are numeric!!!!

Again, even with my fancy new data, I was getting hammered by errors. The program wasn't understanding me, or, more accurately, I wasn't yet speaking its language correctly. I seemed to be having trouble calling the variables that I saw right in front of me.

Lastly, when I got really verbose, it told me that my population variable was not numeric, even though it clearly said it was an integer!
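One common way to get a "not numeric" complaint about an integer column (not necessarily what happened in the screenshot) is to hand a function a one-column data frame instead of the column itself. A small sketch, using a made-up data frame and placeholder column names:

```r
# Invented toy data; `zip` and `population` are placeholder names.
census <- data.frame(
  zip = c(90001L, 90002L, 90003L, 90004L),
  population = c(1000L, 2500L, 4000L, 500L)
)

# Integers do count as numeric in R.
is.integer(census$population)   # TRUE
is.numeric(census$population)   # TRUE

# Single brackets return a one-column data frame, not a vector,
# so mean() warns "argument is not numeric or logical" and returns NA.
still_a_frame <- census["population"]
bad_mean <- suppressWarnings(mean(still_a_frame))

# `$` (or double brackets) returns the integer vector itself.
good_mean <- mean(census$population)   # 2000
```

The column really is numeric; it is the data frame wrapped around it that `mean()` refuses.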

See what can happen when we aren't careful with our communication? People are not so different XD

3.jpg
declare your variables

After a bit of research, and taking a break from my break to continue the DataCamp course, I realized that I needed to declare my variables from the data frame (that is, pull each column out into its own object) before I could do any neat tricks on them. Above you can see that I separate population into its own vector, on which I can then do stuff.
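In code, that step looks roughly like the sketch below. The data frame and its column names are invented for illustration; the idea is only that `$` copies a column into a stand-alone vector.

```r
# Invented toy data; `zip` and `population` are placeholder names.
census <- data.frame(
  zip = c(90001L, 90002L, 90003L, 90004L),
  population = c(1000L, 2500L, 4000L, 500L)
)

# Typing `population` alone at this point fails with
# "object 'population' not found": the column lives inside the
# data frame, not in the workspace.

# Copy the column into its own vector with `$`.
population <- census$population

total <- sum(population)     # 8000
biggest <- max(population)   # 4000

# Working on the column in place gives the same result.
same_total <- sum(census$population)
```

Strictly speaking, R has no separate declaration step; `population <- census$population` is an ordinary assignment, and `census$population` can be used directly wherever the vector is needed.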

But what I found out is that the population of the US is not what I thought it was.... In fact, the sum total is only 10.6 million! After mulling over certain conspiracy theories I have heard about world population sizes, I rechecked my data set and realized I was only dealing with data from the greater Los Angeles area. Oh. I guess that is more reasonable than a global conspiracy to overstate population levels, uncovered by a lone data trainee with open government data.
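A quick way to catch this kind of surprise earlier is to look at how much ground the file covers before trusting a grand total. A sketch on made-up rows (placeholder column names; the real file's headers may differ):

```r
# Invented toy data; `zip` and `population` are placeholder names.
census <- data.frame(
  zip = c(90001L, 90002L, 90003L, 90004L),
  population = c(1000L, 2500L, 4000L, 500L)
)

n_areas <- nrow(census)          # how many zip codes are in the file
zip_range <- range(census$zip)   # lowest and highest zip code present
total <- sum(census$population)  # grand total across all rows

cat("zip codes:", n_areas, "\n")
cat("zip range:", zip_range[1], "to", zip_range[2], "\n")
cat("total population:", total, "\n")
```

If the zip codes all fall in one narrow range, the total describes a region, not a country.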

For fun and laughs, I tried the plot() command:

4.jpg
Completely successful and completely meaningless!

And I got a meaningless graph plotting the size of each population against the order in which it appears in my dataset. There is still a long way to go before I do anything useful, but I feel like I made real progress this morning!
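The reason the graph comes out that way: given a single vector, `plot()` uses the row position as the x-axis. The sketch below reproduces that on invented data (placeholder names again) and adds two base R alternatives that do not depend on row order; treat them as suggestions, not as what the screenshot shows.

```r
# Invented toy data; `zip` and `population` are placeholder names.
census <- data.frame(
  zip = c(90001L, 90002L, 90003L, 90004L),
  population = c(1000L, 2500L, 4000L, 500L)
)
population <- census$population

# When run as a script, draw to a null device so no file or window appears.
if (!interactive()) pdf(NULL)

# With a single vector, plot() puts the position (1, 2, 3, ...) on the
# x-axis and the value on the y-axis.
plot(population)

# The same picture with the x-axis spelled out.
plot(seq_along(population), population, xlab = "Index", ylab = "population")

# Two views that do not depend on row order:
hist(population)                               # distribution of sizes
ranked <- sort(population, decreasing = TRUE)  # largest first
barplot(ranked)                                # ranked bar chart

if (!interactive()) dev.off()
```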

Thanks for tagging along c;

May your dreams come true.


Very interesting, I have no idea about data analysis but was always interested.

Had no idea the overall population within the US is 10.6 million :) hihi

Have a great week!!!!

Completely successful and completely meaningless....lol..
This got my mind rolling.

That's a whole load of data, and I do just that with anything technical like databases - keep trying commands until one finally works! SQL can be super frustrating at times, and it's also really fussy about context.

Who is ecoinstants - I thought it was you or is it an alt account?


Great to hear from you c0ff33a! Learning to talk to computers makes talking to humans almost seem easy!

@ecoinstats is an alt. I am going to use it for steepshot and dlike content, but I haven't gotten steepshot to work yet; it seems they are in the middle of an update.

I have a few other 'mispelled' accounts registered as well, @econstant , @ecosaint etc., one day perhaps I'll get them all doing something on scripts. :)
