'Hunker in Place!'
As I dive deeper into R in search of something useful to do with it, I will try to post short updates on my progress, however minor they may seem to me c;
Half the battle is developing the discipline to sit at the computer and not wander off onto the internet, never to be seen or heard from again.
So if you were paying attention, you may remember that I ran into a roadblock in my last article on R: after importing data successfully, I was having trouble accessing it. Today, while continuing with my aided online learning at DataCamp, I made some progress on the side in my understanding of these imported data frames.
But not all at once... first I had to find new data:
As a new learner, I don't want to pay for data to practice on while I learn. This is the reason that I am not currently using Arcange's SQL database, which requires a 10 steem per month subscription to access. One day soon that is the goal, but it's a little pricey while I practice.
Instead, I go to data.gov, which has tons of weird data schniblets that I can play with. Take, for example, the file that I am currently using - 2010 census data broken down by population. Now that's pretty cool:
Now maybe some of you can already notice what is weird about this data, but at this point I was very happy. The data I was using last time ended up having very low population numbers, between 1 and 100 for each zip code. I think I had found a weird sample data set or something, so when I found the 2010 census data and it worked, I was thrilled!
Back to trying to do something interesting with one of the variables:
Again, even with my fancy new data, I was getting hammered by errors. The program wasn't understanding me, or, more accurately, I wasn't yet speaking its language correctly. I seemed to be having trouble calling the variables that I saw right in front of me.
Lastly, when I got really verbose, it told me that my population variable was not numeric, even though it clearly said it was an integer!
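For anyone hitting the same wall, here is a small sketch of one common way to get a "not numeric" complaint on a column that is listed as an integer. It may not be exactly what I typed. The data frame `census`, its column names `Zip.Code` and `Population`, and all the values are made up for illustration; the real file's names will differ.

```r
# Hypothetical stand-in for the imported census file (names and values are made up)
census <- data.frame(
  Zip.Code   = c(90001L, 90002L, 90003L),
  Population = c(1000L, 2500L, 4000L)
)

# str() lists Population as an integer column
str(census)

# Single brackets return a one-column data frame, not a vector,
# so mean() warns "argument is not numeric or logical" and returns NA
as_frame <- census["Population"]
class(as_frame)
not_a_number <- suppressWarnings(mean(as_frame))

# $ (or double brackets) returns the integer vector itself
as_vector <- census$Population
class(as_vector)
mean(as_vector)
```

The column really is an integer, but the thing handed to `mean()` in the first attempt is still a whole data frame, which is what the warning is grumbling about.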
See what can happen when we aren't careful with our communication? People are not so different XD
After a bit of research, and taking a break from my break to continue the DataCamp course, I realized that I needed to pull my variables out of the data frame into their own vectors before I could do any neat tricks on them. Above you can see that I separate population into its own vector, on which I can then do stuff.
But what I found out is that the population of the US is not what I thought it was... In fact, the sum total is only 10.6 million! After mulling over certain conspiracy theories I have heard about world population sizes, I rechecked my data set and realized I was only dealing with data from the greater Los Angeles area. Oh. I guess that is more reasonable than a global conspiracy to overstate population levels that a lone data trainee uncovers with open government data.
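Roughly, the steps above look like the sketch below. The data frame `census`, the column names `Zip.Code` and `Population`, and the numbers are all placeholders I made up, not the actual data.gov file.

```r
# Hypothetical stand-in for the imported file (names and values are made up)
census <- data.frame(
  Zip.Code   = c(90001L, 90002L, 90003L),
  Population = c(1000L, 2500L, 4000L)
)

# Pull the column out into its own vector
population <- census$Population

# Add up every row
total <- sum(population)
total

# Quick sanity checks on what the file actually covers
n_zips    <- nrow(census)            # how many zip codes are in the file
zip_range <- range(census$Zip.Code)  # lowest and highest zip code present
n_zips
zip_range
```

Checking the row count and the range of zip codes before trusting a total would probably have saved me the detour into conspiracy theories.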
For fun and laughs, I tried the plot() command:
And I got a meaningless graph plotting the size of the population versus the order they appear in my dataset. There is still a long way to go before I do anything useful, but I feel like I made real progress this morning!
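As far as I can tell, this is why the graph came out that way: when `plot()` is handed a single numeric vector, it puts the position in the vector on the x-axis. A tiny sketch with made-up values (`population` here is a placeholder, not my real data):

```r
# Made-up values standing in for the real population vector
population <- c(1000L, 2500L, 4000L)

# With one numeric vector, plot() uses the position (1, 2, 3, ...) as x
plot(population)

# Spelling out the index explicitly gives the same picture
index <- seq_along(population)
plot(index, population, xlab = "Index", ylab = "population")
```

So the plot is not wrong, it is just answering a question nobody asked.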
Thanks for tagging along c;
May your dreams come true.