Artificial Intelligence - A quick primer
Who has not heard about the opportunities, challenges, and threats of “Artificial Intelligence”?
Cars are learning to drive, your customer support is now a chatbot, and every startup nowadays is somehow AI-driven. A basic income is being seriously discussed to keep unemployed people from rioting, and eventually machines will take over the world and destroy humanity.
Instead of optimising this article for clicks by repeating apocalyptic assumptions (…I’ll keep that for another day…), I would rather focus on the current state of the technology and how it can be applied.
Also, whatever the future brings, always remember:
“The best way to predict your future is to create it.” - a quote often attributed to Abraham Lincoln
Keep focussed on your goal, not fancy technology buzzwords
Before I give you an overview, I feel tempted to offer a short disclaimer about new technologies in general:
Technology can help you reach a goal. What matters is to define the goal first and then use the best technology to reach it. New technologies and methods in particular are often thrown at problems where they just don’t make sense. We see this with overhyped blockchain solutions, where a database would do the job better, as well as with artificial intelligence.
Therefore, business people should define the goals, and engineers should try to reach them with the most straightforward method they have. Not the latest. Not the most advanced. Not the most complex.
Strong AI versus Weak AI
“Strong AI”, “Artificial General Intelligence” (AGI) and “human-level intelligence” are used synonymously by experts to describe machines that can abstract concepts from limited experience and transfer knowledge between domains; in other words, more or less what we call intelligence in humans and animals.
“Weak AI” or “Narrow AI”, on the other hand, refers to systems designed for a specific task whose capabilities are not easily transferable to other tasks. All systems developed so far are weak AIs, some of which can solve specific tasks better than humans.
Despite new techniques that solve particular tasks very efficiently, and weak AI being applied to more and more problems, we are still very far away from anything we could call “human-level intelligence” or strong AI. Therefore, we focus on weak AI: the techniques in our arsenal today.
Statistics is a branch of mathematics dealing with the collection, analysis, visualisation, and interpretation of data. It discovers relationships and properties within data sets and quantifies uncertainty.
Two main statistical methods are used in data analysis: “descriptive statistics” and “inferential statistics”.
Descriptive statistics summarise the sample and the observations that have been made. A classic business example: investors analyse the historical return behaviour of their investments (average return, volatility, and so on) to make better investment decisions in the future.
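To make the investor example concrete, here is a minimal sketch using Python’s standard statistics module. The yearly returns are made-up numbers for illustration only; the point is that descriptive statistics just summarise the observed data (average, middle value, spread) without claiming anything beyond it.

```python
import statistics

# Hypothetical yearly returns of a portfolio, in percent (illustrative numbers only)
returns = [4.0, -2.0, 7.5, 3.0, 12.5]

summary = {
    "mean": statistics.mean(returns),      # average return
    "median": statistics.median(returns),  # middle value, robust to outliers
    "stdev": statistics.stdev(returns),    # sample standard deviation ("volatility")
    "min": min(returns),
    "max": max(returns),
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```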
Inferential statistics, on the other hand, assumes that the data is a sample drawn from a larger population and draws conclusions about that population; it is used when the population is too large or too difficult to capture in full. Take election polling as an example. Naturally, these results cannot be 100% accurate; they come with a margin of error and a confidence level that depend, among other things, on the size of the sample.
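A minimal sketch of how sample size drives a poll’s uncertainty, assuming a simple random sample and the common normal approximation for a proportion (z = 1.96 for roughly 95% confidence). The helper poll_interval and the 42% figure are hypothetical.

```python
import math

def poll_interval(supporters: int, sample_size: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval for a proportion (normal approximation).

    z = 1.96 corresponds to roughly 95% confidence.
    """
    p = supporters / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p - margin, p + margin

# Hypothetical poll: 42% support a candidate, measured with different sample sizes
for n in (100, 1_000, 10_000):
    low, high = poll_interval(round(0.42 * n), n)
    print(f"n={n:>6}: {low:.1%} to {high:.1%}")
```

The interval narrows as the sample grows, but only with the square root of the sample size: a hundred times more respondents makes it roughly ten times narrower.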
Data mining is the semi-automatic or automatic analysis of large quantities of data to extract previously unknown patterns, such as groups of data records (cluster analysis), unusual records (anomaly detection), and dependencies (association rule mining).
Data mining often lays the foundation for further analysis, machine learning, or predictive analytics.
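One of the simplest ways to spot unusual records is a z-score check: flag every value that lies far from the mean, measured in standard deviations. This is a deliberately basic stand-in for real data mining techniques; the function name unusual_records, the threshold of 2 and the order data are assumptions for illustration.

```python
import statistics

def unusual_records(values: list[float], threshold: float = 2.0) -> list[int]:
    """Return the indices of values more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    return [i for i, v in enumerate(values) if abs(v - mean) > threshold * stdev]

# Hypothetical daily order amounts; one day stands out
orders = [120, 135, 128, 110, 142, 131, 980, 125, 118, 137]
print(unusual_records(orders))  # [6]
```

In real data sets, extreme outliers also inflate the standard deviation they are measured against, which is one reason more robust methods (for example, based on the median) are often preferred.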
Expert systems were among the first successful AI tools: they were developed in the 1970s and became popular in the 1980s. They emulate the decision-making ability of a human expert based on a “knowledge base” and a set of rules applied by an “inference engine”.
The knowledge base contains facts about the world, represented as classes, subclasses, and object instances.
An inference engine is an automated reasoning system that evaluates the current state of the knowledge base, applies the relevant rules, and then asserts new knowledge into the knowledge base. Additionally, it can explain to the user which rules and decisions led to an assertion.
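The sketch below is a toy forward-chaining inference engine: it repeatedly applies every rule whose conditions are satisfied, asserts the conclusion as a new fact, and records which rule produced it, so the system can explain its reasoning. The rules and facts are hypothetical, and the knowledge base is simplified to a flat set of facts instead of classes and object instances.

```python
# Hypothetical rules: (name, facts that must all be known, fact to assert)
RULES = [
    ("R1", {"has_fever", "has_cough"}, "suspect_flu"),
    ("R2", {"suspect_flu", "short_of_breath"}, "refer_to_doctor"),
    ("R3", {"has_rash"}, "suspect_allergy"),
]

def infer(facts: set[str], rules=RULES) -> tuple[set[str], dict[str, str]]:
    """Forward chaining: apply rules until no new facts can be asserted.

    Returns the extended knowledge base and, for every new fact,
    a description of the rule that asserted it (the "explanation").
    """
    known = set(facts)
    explanation: dict[str, str] = {}
    changed = True
    while changed:
        changed = False
        for name, conditions, conclusion in rules:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                explanation[conclusion] = f"{name}: {' AND '.join(sorted(conditions))} -> {conclusion}"
                changed = True
    return known, explanation

facts, why = infer({"has_fever", "has_cough", "short_of_breath"})
for fact, reason in why.items():
    print(f"{fact:<16} because {reason}")
```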
The knowledge base and the rules have to be entered by domain experts, so expert systems do not scale well and can only answer narrowly focussed questions. Nevertheless, newer approaches are exploring how to combine them with machine learning to improve performance.
Machine learning enables computers to become more accurate at predicting outcomes without being explicitly programmed to do so. Instead of following hand-written rules, they automatically build models (learned functions) that receive input data and use computational statistics to predict an output value within an acceptable range.
To build these models, the computer has to be trained, much like a toddler learning to distinguish everyday objects from each other. There are different training methods, which can be used alone or in combination, and which may run best on different kinds of hardware.
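As an illustration of training, the sketch below learns the parameters of a straight line y = w * x + b from examples instead of being told the rule. It uses plain gradient descent on the mean squared error, one common training method among many; the data, learning rate and number of epochs are assumptions chosen for this toy problem.

```python
def train_linear(xs, ys, learning_rate=0.01, epochs=5000):
    """Fit y ≈ w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Predictions with the current parameters
        preds = [w * x + b for x in xs]
        # Gradients of the mean squared error with respect to w and b
        grad_w = (2 / n) * sum((p - y) * x for p, x, y in zip(preds, xs, ys))
        grad_b = (2 / n) * sum(p - y for p, y in zip(preds, ys))
        # Step against the gradient to reduce the error
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Hypothetical examples generated by a rule the learner never sees: y = 3x + 2
xs = [0, 1, 2, 3, 4, 5]
ys = [3 * x + 2 for x in xs]
w, b = train_linear(xs, ys)
print(f"learned: y = {w:.2f} * x + {b:.2f}")
```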
In supervised learning, the function is inferred from labeled training data. One example would be a data set of 5,000 tweets by Donald Trump, each labeled Yes or No depending on whether the tweet insults someone. When talking about supervised learning, you should be aware of the difference between classification and regression problems. If you are trying to predict whether a new tweet is an insult, this is a classification problem: the output is one of a set of discrete classes, i.e. a group membership. If you are predicting a continuous value, e.g. the number of Twitter likes this insult will get, it is a regression problem.
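The difference between classification and regression can be shown with a single algorithm: k-nearest neighbours, a simple supervised method used here as a stand-in (the article does not prescribe a specific algorithm). The two features per tweet, the labels and the like counts are all invented for illustration; a real system would extract far richer features from the text.

```python
import math
from collections import Counter

# Hypothetical labeled tweets, reduced to two made-up features:
# (number of exclamation marks, number of words in ALL CAPS)
features = [(0, 0), (1, 0), (0, 1), (3, 2), (4, 3), (5, 2)]
is_insult = ["No", "No", "No", "Yes", "Yes", "Yes"]    # discrete labels -> classification
likes = [1_000, 1_500, 1_200, 20_000, 25_000, 22_000]  # continuous target -> regression

def nearest(query, k=3):
    """Indices of the k training examples closest to `query` (Euclidean distance)."""
    return sorted(range(len(features)), key=lambda i: math.dist(query, features[i]))[:k]

def classify(query, k=3):
    """Classification: majority vote over the neighbours' discrete labels."""
    return Counter(is_insult[i] for i in nearest(query, k)).most_common(1)[0][0]

def regress(query, k=3):
    """Regression: average of the neighbours' continuous values."""
    idx = nearest(query, k)
    return sum(likes[i] for i in idx) / len(idx)

new_tweet = (4, 2)
print(classify(new_tweet), regress(new_tweet))
```

The same neighbours answer both questions: a majority vote over their discrete labels gives a class, while averaging their continuous values gives a number.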
In unsupervised learning, the training data is unlabeled and the outcomes are unknown. Essentially, we go into the problem blind, and the computer is left alone to discover the inherent structure and patterns within the data.
One typical application of unsupervised learning is clustering, where input data is divided into groups based on a measure of similarity. Unsupervised learning can tackle problems for which no labeled data exists at all, and it can uncover structure nobody thought to look for. However, its results can be less predictable. While an unsupervised learning system might figure out on its own how to separate insulting tweets from the rest, it might also come up with unexpected and undesired categories…
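A minimal k-means sketch shows what clustering looks like in practice: no labels are provided, and the algorithm alone groups the points around k centroids. The points and the naive initialisation (simply taking the first k points) are assumptions; real implementations use better initialisation and a convergence check.

```python
import math

def k_means(points, k, iterations=10):
    """Minimal k-means clustering on 2-D points."""
    # Naive initialisation (an assumption): the first k points become the centroids
    centroids = list(points[:k])
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if empty)
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical unlabeled points: nobody tells the algorithm which group they belong to
points = [(1, 1), (8, 8), (1.5, 2), (9, 8.5), (2, 1.5), (8.5, 9)]
centroids, clusters = k_means(points, k=2)
print(centroids)
```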
Depending on the problem, unsupervised learning can become computationally intensive, especially deep unsupervised learning, which requires extensive data sets and the optimisation of millions of model parameters such as connection weights. This is where new hardware may become the next step: quantum computers, which are already available as cloud-based playgrounds (e.g. IBM Q).
When theory meets reality
In practice, engineers have to be more creative to solve specific problems, and the best solution is often a combination of supervised and unsupervised learning. In most cases, this means somehow getting feedback on how well the computer performed a specific task and using it to improve the function.
One example is semi-supervised learning, where only part of the training data is labeled (or some of the labels are wrong), as is the case with many real-world data sets. “Active learning” is one such method: the computer asks the user whether its output was right, much like when you hit the thumbs-up in Spotify’s recommendation engine.
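One common active learning strategy is uncertainty sampling: instead of asking the user about random examples, the system asks about the one it is least sure of. The sketch assumes a hypothetical, already-trained logistic model on a single numeric feature; retraining the model with the user’s answer is left out for brevity.

```python
import math

def predict_proba(x: float, weight: float, bias: float) -> float:
    """Probability that x belongs to the positive class (logistic model)."""
    return 1 / (1 + math.exp(-(weight * x + bias)))

def most_uncertain(pool: list[float], weight: float, bias: float) -> float:
    """Uncertainty sampling: pick the example whose prediction is closest to 50/50."""
    return min(pool, key=lambda x: abs(predict_proba(x, weight, bias) - 0.5))

# Hypothetical current model (decision boundary at x = 2.5) and pool of unlabeled examples
weight, bias = 2.0, -5.0
pool = [0.5, 1.0, 2.4, 4.0, 6.0]

query = most_uncertain(pool, weight, bias)
print(f"Please label this example: {query}")  # the answer would become new training data
```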
An AI engineer has several other methods in her arsenal, but choosing the right one, or inventing something new to tackle the problem, is what distinguishes a good AI engineer from an awesome one.
As soon as multi-layered artificial neural networks are involved, we speak of “deep learning”.
Neural networks, inspired by biological neurons, are mathematical structures whose origins go back to the 1940s and 1950s. With more and more data and computing power available, these mathematical principles can now be applied in fields where even experts did not expect computers to beat the human brain any time soon.
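To see why multiple layers matter, consider XOR (output 1 if exactly one input is 1). A single threshold neuron cannot compute it, but two layers can. The sketch below uses hand-picked weights to make the structure visible; in deep learning, these weights would be learned during training rather than set by hand.

```python
def step(z: float) -> int:
    """Threshold activation: the neuron 'fires' (1) if its input is positive."""
    return 1 if z > 0 else 0

def neuron(inputs, weights, bias):
    """A single artificial neuron: weighted sum of inputs plus bias, then activation."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def xor_network(a: int, b: int) -> int:
    """Two-layer network computing XOR with hand-picked (not learned) weights."""
    # Hidden layer: one neuron detects "a OR b", the other "a AND b"
    h_or = neuron((a, b), (1, 1), -0.5)
    h_and = neuron((a, b), (1, 1), -1.5)
    # Output layer: "OR but not AND" is exactly XOR
    return neuron((h_or, h_and), (1, -1), -0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```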
One milestone that surprised even the scientific community was Google DeepMind’s AlphaGo defeating Lee Sedol, one of the world’s top Go players, in 2016. And Go is not just another game like chess: after the first two moves of a chess game, there are 400 possible positions. In Go, there are close to 130,000. This means that traditional brute-force search was not feasible; the machine had to develop a kind of intuition for the game.
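The figures in the chess/Go comparison follow from simple counting: White has 20 possible first moves (16 pawn moves and 4 knight moves) and Black has 20 replies, while on a 19×19 Go board the first stone can be placed on any of 361 points and the reply on any of the remaining 360. The second part uses rough, commonly cited average branching factors (about 35 for chess and 250 for Go) to show how quickly the search tree grows with depth; treat those numbers as approximations.

```python
# Positions after the first two moves (one move per player)
chess_first_moves = 20   # 16 pawn moves + 4 knight moves
go_points = 19 * 19      # 361 intersections on a standard board

chess_after_two = chess_first_moves * chess_first_moves  # 20 * 20
go_after_two = go_points * (go_points - 1)               # 361 * 360

print(chess_after_two, go_after_two)  # 400 129960

# Rough growth of the game tree: branching_factor ** depth (approximate factors)
for depth in (2, 4, 6):
    print(f"depth {depth}: chess ~{35 ** depth:.1e}, Go ~{250 ** depth:.1e}")
```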
This sounds impressive and, of course, gets a lot of media attention. In practice, however, more conventional machine learning methods often outperform deep learning approaches, both in performance and in transparency. Another drawback in a business context is that deep learning requires large volumes of reliable, cleanly labeled data, which enterprises often lack.
Many other approaches exist that can be used alone or in combination with the methods described above to improve performance:
Genetic algorithms are used for generative design and, in combination with neural networks, to improve learning, for example by evolving network weights or architectures. Probabilistic programming makes it possible to build systems that make decisions in the face of uncertainty; it has already been used in applications such as medical imaging and financial prediction.
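Genetic algorithms borrow selection, crossover and mutation from evolution. The sketch below evolves bit strings towards a toy objective known as OneMax (maximise the number of 1-bits); the population size, mutation rate and seed are arbitrary assumptions, and real applications such as generative design would plug in a domain-specific fitness function instead.

```python
import random

def fitness(genome: list[int]) -> int:
    """Toy objective ("OneMax"): count the 1-bits; the optimum is all ones."""
    return sum(genome)

def evolve(length=20, population_size=30, generations=50, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(population_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        # Selection: the fitter half survives unchanged and serves as parents
        parents = population[: population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, length)  # one-point crossover
            child = mother[:cut] + father[cut:]
            # Mutation: flip each bit with a small probability
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    return best, history

best, history = evolve()
print(fitness(best), history[0], history[-1])
```

Because the fitter half always survives, the best fitness can never decrease from one generation to the next; crossover and mutation supply the new candidates that may improve on it.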
Thanks to Marcus for this great article on zauberware.com