Data Mining and Applications: BIG DATA RULES THE WORLD
The current trend in industry is all about Business Intelligence, so it is no wonder that some human resource managers pay data scientists huge amounts of money to gather and analyze relevant data for them. Many people use data mining to analyze and forecast uncertain future trends. It aids in forecasting the stock exchange and cryptocurrency markets, for example predicting the rise and fall of STEEM and SBD, which in turn informs when to buy or sell, and it supports business strategies that bring more money to the company.
Advances in technology have made it possible for non-programmers, or people with no degree in computer science, to analyze and forecast stocks and to extract meaningful knowledge from a pool of records in a database. Many machine learning software tools help with this without any need to hire outside help. Before you begin, though, it is important to understand what data mining is and how it can help your business.
Data mining is an interdisciplinary research area that stems from statistics, machine learning, artificial intelligence and computer science. Put simply, it is the extraction of hidden predictive information from massive pools of data in databases. It is a powerful technology with great potential, used in commercial applications including the stock exchange, retail sales, education, e-commerce, remote sensing and bioinformatics.
The Knowledge Discovery in Databases (KDD) process comprises a few steps leading from raw data collections to some form of new knowledge.
The steps involved in data mining, as shown in the picture above, are:
- Selection: This is the process of selecting the data in the database that are relevant to the mining task.
- Preprocessing and cleaning: The raw data we collect are often not clean and may contain errors, missing values, noise or inconsistencies. Getting rid of such anomalies is very important.
- Feature selection and extraction: These give you refined data with a smaller number of attributes than the original set.
- Data Mining: This is the application of data mining techniques, such as association and clustering among many others, to the data in order to discover interesting patterns.
- Interpretation and Evaluation: This is where we visualize and transform the discovered patterns and use them for forecasting and prediction.
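The steps above can be sketched end to end in a few lines of Python. This is only a toy illustration under stated assumptions: the customer records are invented, and the "mining" step is a deliberately simple split around the mean rather than a real data mining technique.

```python
from statistics import mean

# Hypothetical raw records: (customer_id, age, monthly_spend, region).
# None marks a missing value; all figures are invented for illustration.
raw = [
    (1, 23, 120.0, "north"),
    (2, 35, None, "south"),
    (3, 41, 560.0, "north"),
    (4, None, 90.0, "east"),
    (5, 29, 150.0, "south"),
    (6, 52, 610.0, "east"),
]

# 1. Selection: keep only the columns relevant to the question.
selected = [(age, spend) for _, age, spend, _ in raw]

# 2. Preprocessing and cleaning: drop rows with missing values.
cleaned = [(age, spend) for age, spend in selected
           if age is not None and spend is not None]

# 3. Feature selection and extraction: reduce to a single attribute.
spend_values = [spend for _, spend in cleaned]

# 4. Data mining: a very simple "pattern", splitting customers
#    into low and high spenders around the mean spend.
threshold = mean(spend_values)
groups = {"low": [], "high": []}
for age, spend in cleaned:
    groups["high" if spend > threshold else "low"].append((age, spend))

# 5. Interpretation and evaluation: summarise each group.
summary = {name: {"count": len(rows), "avg_age": mean(a for a, _ in rows)}
           for name, rows in groups.items()}
print(summary)
```

In a real project each step would be far more involved, but the order of the stages stays the same.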
Researchers are applying data mining techniques in various application domains such as education, banking, fraud detection, network intrusion detection and telecommunications.
Science Applications
Scientists are using data mining techniques in many areas, such as medicine, molecular biology, astronomy, geology and many more. The application of data mining to scientific problems is referred to as scientific data mining. Examples include applications in bioengineering and medicine, such as brain scans, and multivariate time series signals such as electrocardiograms and magnetocardiograms.
Business Applications
Many business organizations are now applying data mining as a means to gain or keep a competitive edge. Data mining helps a business organization to reach its maximum potential. It is a way to evaluate how the business is affected by certain factors, and it can help business managers to maximize their profits and avoid unnecessary mistakes down the line.
Educational Data Mining
Data mining in higher education is a recent research field that is gaining popularity because of its potential for educational institutions. Educational data mining is a term for processes designed to analyze data from educational settings in order to better understand students and the settings in which they learn.
The diagram below shows the machine learning algorithms used in data mining in order to extract knowledge.
Machine learning algorithms used in data mining are generally classified into three types.
Supervised Learning: The main goal of supervised learning is to learn a model from labeled training data that lets us make predictions about future or unseen data. The term supervised refers to a set of samples in which the desired output signals (labels) are already known. Take e-mail spam filtering as an example: one can train a model using a supervised machine learning algorithm on a group of labeled e-mails, i.e. e-mails that are correctly marked as either spam or not-spam, in order to predict which of the two categories a new e-mail belongs to. Classification and regression are both examples of supervised learning.
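To make the spam example concrete, here is a minimal sketch of a supervised classifier. It assumes a Naive Bayes model with add-one smoothing, which is just one possible choice, and the six training e-mails are invented for illustration; `train` and `predict` are hypothetical helper names.

```python
import math
from collections import Counter

# Hypothetical labeled training e-mails (invented for illustration).
training = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("claim your free prize now", "spam"),
    ("meeting agenda for monday", "not-spam"),
    ("lunch on monday", "not-spam"),
    ("project report attached", "not-spam"),
]

def train(examples):
    """Count words per label and e-mails per label."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    return word_counts, label_counts

def predict(text, word_counts, label_counts):
    """Return the label with the highest log-probability for the text."""
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    total = sum(label_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        n_words = sum(counts.values())
        # Start from the prior probability of the label.
        score = math.log(label_counts[label] / total)
        for word in text.split():
            # Add-one (Laplace) smoothing so unseen words do not zero the score.
            score += math.log((counts[word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

word_counts, label_counts = train(training)
print(predict("free money now", word_counts, label_counts))
print(predict("monday project meeting", word_counts, label_counts))
```

The labels in `training` play the role of the "desired output signals" described above: the model only ever learns from e-mails whose category is already known.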
Unsupervised Learning: We use this kind of algorithm when we do not have any outcome variable to predict. It is used for clustering a population into different groups, which is widely applied to segmenting customers for specific interventions. Examples of unsupervised learning: the Apriori algorithm, K-means.
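As a rough illustration of customer segmentation, the sketch below runs plain K-means on six invented customers, each described by (visits per month, average basket value). The starting centroids are picked by hand so the run is repeatable; real implementations usually choose them at random, and `kmeans` here is a hypothetical helper, not a library function.

```python
def kmeans(points, centroids, iterations=10):
    """Plain K-means on 2-D points, starting from the given centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2
                         for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(
                    (sum(x for x, _ in cluster) / len(cluster),
                     sum(y for _, y in cluster) / len(cluster)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # nothing moved, so the clustering has settled
        centroids = new_centroids
    return centroids, clusters

# Invented customers: (visits per month, average basket value).
customers = [(1, 10), (2, 12), (1.5, 11), (8, 60), (9, 62), (8.5, 58)]
centroids, clusters = kmeans(customers, [customers[0], customers[3]])
print(centroids)
print(clusters)
```

Note that no labels are supplied anywhere: the two customer groups emerge from the data alone.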
Reinforcement Learning: The last type of machine learning is called reinforcement learning. In this technique, the main goal is to develop a system (agent) that improves its performance based on interactions with its environment. The information about the current state of the environment typically also includes a so-called reward signal, so we can think of reinforcement learning as an area related to supervised learning. In reinforcement learning, however, the feedback is not the correct ground-truth label but a measure, given by a reward function, of how good the action was. A typical and popular example of reinforcement learning is a chess engine. In this system, the agent decides upon a series of moves depending on the state of the board (the environment), and the reward can be defined as a win or a loss at the end of the game.
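A chess engine is far too large for a short example, so the sketch below swaps in a much smaller stand-in: an agent in a five-cell corridor that is rewarded only when it reaches the last cell. It uses tabular Q-learning, which is one common reinforcement learning method; the corridor, the learning rate and the discount factor are all assumptions chosen for illustration.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

N_STATES = 5                # corridor cells 0..4; cell 4 is the goal
ACTIONS = ["left", "right"]
ALPHA, GAMMA = 0.5, 0.9     # learning rate and discount factor (assumed values)

def step(state, action):
    """Environment: move one cell; reward 1 only on reaching the goal."""
    nxt = max(0, state - 1) if action == "left" else state + 1
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

# Q-table: estimated long-term reward of each action in each state.
q = [{a: 0.0 for a in ACTIONS} for _ in range(N_STATES)]

for episode in range(500):
    state = 0
    for _ in range(100):                    # cap on episode length
        action = random.choice(ACTIONS)     # explore by acting at random
        nxt, reward = step(state, action)
        # Q-learning update: nudge the estimate toward the reward
        # plus the discounted value of the best next action.
        target = reward + GAMMA * max(q[nxt].values())
        q[state][action] += ALPHA * (target - q[state][action])
        state = nxt
        if state == N_STATES - 1:           # goal reached, episode over
            break

# Greedy policy read off the learned table for the non-goal cells.
policy = [max(q[s], key=q[s].get) for s in range(N_STATES - 1)]
print(policy)
```

As in the chess example, the agent is never told which move is correct. It only sees a reward at the end and has to work backwards to which actions led there.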
List of Common Machine Learning Algorithms
Here is a list of commonly used machine learning algorithms. Between them, they can be applied to almost any data problem:
- Linear Regression
- Logistic Regression
- Decision Tree
- SVM
- Naive Bayes
- KNN
- K-Means
- Random Forest
- Dimensionality Reduction Algorithms
- Gradient Boosting algorithms
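To make the first item on the list concrete, here is a minimal sketch of simple linear regression fitted by ordinary least squares and used for a one-step forecast. The daily prices are invented, and `fit_line` is a hypothetical helper written for this example.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented prices for five consecutive days.
days = [1, 2, 3, 4, 5]
prices = [2.1, 3.9, 6.2, 7.8, 10.0]

slope, intercept = fit_line(days, prices)
forecast = slope * 6 + intercept   # extrapolate to day 6
print(slope, intercept, forecast)
```

A straight line is rarely enough for real market data, but the same fit-then-predict pattern carries over to the other algorithms on the list.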
The various algorithms used to mine data are shown above, and their applications will be explored in my coming posts.
Thanks for reading through, and I hope you have added to your knowledge!
Thanks @steemSTEM for the usual recognition. I promise to enrich you with original and well-informed content in the future!