What are some algorithms and data structures that every data scientist should know?

in #algorithms, 7 years ago

Depending on the purpose of the application and the quantity of data, a first classification can be made as follows.

  1. Clustering: the problem of grouping the individuals in a population by the similarity of their attributes. A well-known clustering algorithm is k-means. Here is a SlideShare presentation on how k-means works: Algorithms presentation (http://www.slideshare.net/alketcecaj/algorithms-presentation)

  2. Classification algorithms: classification tries to predict, for each individual in a population, which of a set of classes that individual belongs to. Given a new individual, a classification task determines its class, and it may also assign a probability to that assignment. An example is KNN (k-nearest neighbours).
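
The KNN idea mentioned above can be sketched in a few lines of plain Python. This is only a minimal illustration, not a production classifier: the function name `knn_predict`, the toy training points, and the choice of Euclidean distance with a simple majority vote are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs (hypothetical toy format).
    Returns the winning label and the fraction of the k votes it received,
    which can be read as a rough probability for the assignment.
    """
    # Sort the training points by Euclidean distance to the query.
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    # Count the labels of the k closest points.
    votes = Counter(label for _, label in by_distance[:k])
    label, count = votes.most_common(1)[0]
    return label, count / k


# Toy data (assumed for illustration): two well-separated groups.
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]

print(knn_predict(train, (1.1, 1.0), k=3))
print(knn_predict(train, (5.0, 5.0), k=5))
```

With `k=5` the second query picks up two neighbours from the other group, so the vote fraction drops below 1; that is the "probability" the classification item refers to, in its simplest form.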

  1. Dimensionality reduction: takes a large data set and replaces it with a smaller one that retains much of the important information in the larger set. For example, you can use the reduced data set to understand the data better and to visualize it in 2D.

  2. PCA, or Principal Component Analysis, for identifying the directions (linear combinations of variables) that capture the most variance in your dataset. Many studies use PCA for data analysis, but this is one of the papers that applies it in an original way: Eigenbehaviors: identifying structure in routine (http://link.springer.com/article/10.1007%2Fs00265-009-0739-0)

  3. Collaborative filtering for building recommendation systems. It is a problem of similarity matching: for example, finding people who are similar to you in terms of the products they have liked or purchased, or finding products that are similar with respect to a set of attributes. In this latter case an "item-based" recommendation algorithm is used.

  4. Association rules, or co-occurrence grouping, for market basket analysis. A common question in this case is: what items are commonly purchased together? For example, analyzing purchase records from a supermarket may reveal that beer is frequently purchased together with chips.

  5. Regression methods for predicting the value of a numeric variable. For example: how much will a given customer use a certain service? The quantity to be predicted here is service usage, and a model could be built by looking at other, similar individuals in the population and their historical usage.

  6. The LDA algorithm for sentiment analysis and text mining, but also for many other applications. Here is a paper about it: Page on aaai.org (https://www.aaai.org/ocs/index.php/AAAI/AAAI10/paper/viewFile/1913/2215)

  7. Dijkstra's shortest path algorithm for finding the shortest path from one node to another in a graph with non-negative edge weights. It is one of the best-known algorithms in graph theory.

  8. Link prediction for predicting connections between data items, by suggesting that a link should exist and estimating the strength of that link. For example, in social networking: if you and John share 15 friends, maybe you and John could also be friends in real life.
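
Dijkstra's algorithm from the list above is short enough to sketch with Python's standard `heapq` module. This is a minimal version under stated assumptions: the graph is a dictionary of dictionaries with non-negative weights, and the function name `dijkstra` and the four-node example graph are made up for illustration.

```python
import heapq


def dijkstra(graph, source):
    """Return the shortest distance from `source` to every reachable node.

    `graph` maps each node to a dict of {neighbour: edge_weight}
    (an assumed representation; weights must be non-negative).
    """
    dist = {source: 0}
    heap = [(0, source)]  # priority queue of (distance so far, node)
    while heap:
        d, node = heapq.heappop(heap)
        # Skip stale queue entries that were superseded by a shorter path.
        if d > dist.get(node, float("inf")):
            continue
        for neighbour, weight in graph.get(node, {}).items():
            new_d = d + weight
            if new_d < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_d
                heapq.heappush(heap, (new_d, neighbour))
    return dist


# Hypothetical example graph.
graph = {
    "A": {"B": 1, "C": 4},
    "B": {"C": 2, "D": 5},
    "C": {"D": 1},
    "D": {},
}

print(dijkstra(graph, "A"))
```

In this example the direct edge from A to C costs 4, but the route through B costs 3, so the algorithm settles on the cheaper path and then reaches D through C.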

As for data structures, you have to be familiar with Lists and LinkedLists, HashMaps, TreeMaps, Sets, and their different variants. A good book on algorithms and data structures in Java is Data Structures and Algorithms in Java, 6th Edition (http://it-ebooks.info/book/4478/); for Python, see Data Structures and Algorithms in Python (http://it-ebooks.info/book/2467/)
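
For readers working in Python, here is a rough sketch of standard-library analogues of the structures named above. The mapping is an assumption for illustration and is not exact: `deque` only plays a role similar to a linked list, and Python has no built-in TreeMap, so a sorted key list maintained with `bisect` stands in for one.

```python
import bisect
from collections import deque

purchases = ["beer", "chips", "beer", "bread"]

# List: ordered, index-addressable, cheap appends at the end.
items = list(purchases)
items.append("milk")

# Linked-list-like queue: cheap inserts and removals at both ends.
queue = deque(purchases)
queue.appendleft("eggs")
first_out = queue.popleft()

# Hash map: average constant-time lookup by key.
counts = {}
for item in purchases:
    counts[item] = counts.get(item, 0) + 1

# Set: membership tests and duplicate removal.
unique_items = set(purchases)

# Sorted keys (stand-in for a tree map): kept in order on every insert.
sorted_keys = []
for key in [30, 10, 20]:
    bisect.insort(sorted_keys, key)

print(items, first_out, counts, sorted(unique_items), sorted_keys)
```

Note that `bisect.insort` finds the position by binary search but still shifts list elements on insert, so it does not match the logarithmic insert cost of a real balanced tree.
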
Good luck.
This post originates from my answer on Quora, which you can find here: https://www.quora.com/What-algorithms-and-data-structures-should-any-software-engineer-know/answer/Alket-Cecaj?srid=n9bS

good article, thanks for sharing :)

You're welcome!
