Machine Learning 101 for Startups — Part 1

in #technology, 7 years ago

This is the first post in a series on #machine-learning techniques that can help #startups and #business teams with decision making. The target audience for this post is business folks who want to understand how they can leverage #machine-learning in their business or product.

The problem statement: how to establish a #correlation or make a decision based on existing data.

Have you ever been in a situation where you have an Excel sheet with rows of data, and you want to know how the data is related and build a tool that uses it to look into the future?

#Regression analysis is a statistical technique used to find the relationships between two or more variables. In regression analysis, one variable is treated as the dependent variable, and the technique measures how it is affected by one or more independent variables.
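To make the dependent/independent distinction concrete, here is a minimal pure-Python sketch of simple linear regression (one independent variable) fitted by ordinary least squares. The function name `fit_simple_linear_regression` and the toy numbers are made up for illustration: `x` holds the independent variable and `y` the dependent one.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    x: values of the independent variable
    y: values of the dependent variable (same length as x)
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # How x and y move together, relative to their means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # How much x varies around its mean
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must vary to estimate a slope")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy data that lies exactly on y = 2x + 1
slope, intercept = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # 2.0 1.0
```

The slope reads as "how much the dependent variable changes for a one-unit change in the independent variable", which is what makes the model easy to interpret.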

Typical use cases for #regression-analysis include questions about product pricing changes, understanding why projects overrun their expenses, or determining inventory levels based on #stock analysis of past data.

A simple example

Below is a very simple example of the correlation between reported cancer incidents and income group. The data is a subset of the US Government census data found here. For the sake of this demonstration, we have taken a tiny snapshot of the data. It is purely meant to show how #linear-regression can quickly surface patterns in your data and help you spot correlations.

Check out this #ipython notebook:
cancer_income_correlation.ipynb
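The notebook's code is not reproduced in this post. As a rough sketch of how the strength of a linear relationship is usually measured, here is the Pearson correlation coefficient in plain Python. The values below are placeholders chosen for illustration, not the census figures used in the notebook.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient: +1 perfect positive, -1 perfect negative, 0 none."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Placeholder data, NOT from the census snapshot
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3))
```

A value of `r` close to +1 or -1 suggests a strong linear relationship; on its own it says nothing about cause and effect.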

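As shown here, linear regression is an algorithm that finds the best-fitting line, called the regression line, through the data points. "Best" usually means the line that minimizes the sum of the squared vertical distances between the points and the line (ordinary least squares). It is one of the most interpretable #machine-learning algorithms.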
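The sketch below illustrates what "best-fitting" means for a least-squares line, using NumPy's `np.polyfit` with `deg=1` on made-up points. It checks that nudging the fitted slope or intercept always increases the sum of squared residuals, then uses the line to make a prediction. The data and the helper `sum_squared_residuals` are hypothetical and exist only for illustration.

```python
import numpy as np

# Made-up illustrative points (not from the notebook)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# With deg=1, np.polyfit returns [slope, intercept] of the least-squares line
slope, intercept = np.polyfit(x, y, deg=1)


def sum_squared_residuals(m, b):
    """Total squared vertical distance between the points and the line y = m*x + b."""
    return float(np.sum((y - (m * x + b)) ** 2))


best = sum_squared_residuals(slope, intercept)

# Nudging the slope or intercept in any direction gives a worse fit
for dm, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    assert sum_squared_residuals(slope + dm, intercept + db) > best

# Use the fitted line to estimate y at a new x value
print(f"y = {slope:.2f}x + {intercept:.2f}; prediction at x=6: {slope * 6 + intercept:.2f}")
```

The prediction step is the "look into the future" part: once the line is fitted, you plug in a new value of the independent variable.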

One thing to keep in mind is that #Regression analysis of this kind works best with continuous numeric data. For example, in the notebook above we had a continuous set of data relating income to reported incidents.

However, if you are trying to #predict the relationship between weather and the number of cars on the street, you need a set of daily data for temperature or humidity and for the number of cars on the street. The more gaps you have in your data, the weaker the #regression #analysis will be.
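Here is a minimal sketch of the weather example, assuming hypothetical daily records where `None` marks a missing measurement. It uses one simple way to deal with gaps: drop any day that lacks either value before fitting. Interpolating the missing values is another option not shown here. Every gap shrinks the number of usable days, leaving less evidence for the fitted line.

```python
import numpy as np

# Hypothetical daily records: (temperature in °C, cars counted); None marks a gap
records = [
    (12.0, 340), (15.0, 360), (None, 355), (18.0, None),
    (21.0, 410), (24.0, 430), (None, None), (27.0, 455),
]

# Keep only days where both measurements are present
complete = [(t, c) for t, c in records if t is not None and c is not None]
print(f"usable days: {len(complete)} of {len(records)}")

temps = np.array([t for t, _ in complete])
cars = np.array([c for _, c in complete], dtype=float)

# Least-squares line: cars = slope * temperature + intercept
slope, intercept = np.polyfit(temps, cars, deg=1)
print(f"estimated change in cars per extra degree: {slope:.1f}")
```

Here three of the eight days are lost to gaps, so the line rests on only five observations.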

This was meant to be a very quick #introduction to how you can use #regression analysis in your daily #decision making. In the real world, you often combine multiple #statistical tools into a complete #Machine Learning toolkit customised for your data set. Over the next few articles we will discuss each of these tools to help you understand the scenarios where they apply.
