The building blocks of Artificial Intelligence
This series of articles aims to make Artificial Intelligence more widely understood within the community. The first publication briefly introduced the kinds of learning algorithms used in artificial intelligence through an accessible real-life example, the self-driving vehicle.
This article looks more closely at these algorithms because of their importance: they form the building blocks of the overall machine learning process. To discover patterns in big data that lead to actionable insights, scientists use various learning algorithms. These algorithms can be classified into two groups based on how they learn to make predictions from data: supervised and unsupervised learning.
Supervised Machine Learning
Supervised machine learning is the most commonly used of the two. Algorithms such as linear and logistic regression, multi-class classification, and support vector machines belong to this high-level group. It is called supervised because engineers guide and teach the algorithms to recognize rules that map inputs to the corresponding outputs. This method requires that all possible outputs are already known and that the training data is already labeled with the correct answers, on the assumption that there is a relationship between the input and the output.
In this technique the groups are known, and the experience provided to the algorithm is the relationship between actual entities and the group they belong to. The machine is told who is what, a significant number of times, and then is expected to predict this on its own.
The most widely used forms of supervised learning are:
- Classification
In this type the output is discrete, for example either ‘yes’ or ‘no’. In some cases there are more than two options.
An example: A classification algorithm will learn to identify animals after being trained on a dataset of images that are properly labeled with the species of the animal and some identifying characteristics.
- Regression
Another form is the regression problem. In this type the output is continuous rather than discrete.
Take the example of analyzing the size of houses on the real estate market to predict their price. This gives a continuous output in which price is a function of size.
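The difference between the two forms can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a production method: the nearest-neighbour classifier, the least-squares line, the two numeric animal features (used here instead of images to keep the sketch short), and every data point are made up for this example.

```python
from collections import Counter
import math

# Hypothetical labeled training data: (weight_kg, height_cm) -> species.
animals = [
    ((4.0, 25.0), "cat"),
    ((5.0, 28.0), "cat"),
    ((30.0, 60.0), "dog"),
    ((35.0, 65.0), "dog"),
]

def classify(point, training, k=3):
    """k-nearest-neighbour classification: majority label among the k closest examples."""
    nearest = sorted(training, key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical house data: size in square metres and the matching price.
sizes = [50.0, 80.0, 100.0, 120.0]
prices = [150_000.0, 240_000.0, 300_000.0, 360_000.0]

def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(sizes, prices)

def predict_price(size):
    """Price as a function of size, using the fitted line."""
    return slope * size + intercept

print(classify((6.0, 30.0), animals))  # discrete output: one of the known labels
print(predict_price(90.0))             # continuous output: any real number
```

In both cases the algorithm only sees labeled examples first; the classifier can answer with nothing but the labels it was shown, while the regression can return a price that never appeared in the training data.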
Unsupervised learning
Unsupervised machine learning, on the other hand, is more closely aligned with what some call true artificial intelligence: the idea that a computer can learn to identify complex processes and patterns without a human providing guidance along the way. Although unsupervised learning can be too complex for some simpler use cases, it opens the door to solving problems that humans normally would not tackle.
While a supervised classification algorithm learns to assign the labels it was given to images of animals, its unsupervised counterpart will look at inherent similarities between the images and separate them into groups accordingly, assigning its own new label to each group. That is why unsupervised learning is often interpreted as a synonym for clustering. This technique is used when the groups (categories) of data are not known. It is called unsupervised because it is left to the learning algorithm to figure out patterns in the data provided.
In a practical example, this type of algorithm is useful for customer segmentation because it will return groups based on parameters that a human may not consider due to pre-existing biases about the company’s demographic distribution. A widely used form of unsupervised learning, which many also treat as a synonym for it, is clustering.
- Clustering
The Google News page uses this type of machine learning to create distinct clusters. It groups news on the same topic from different sites and can place any new input in the appropriate cluster. As a result, stories from many news sites can be found on a single webpage, grouped by related variables such as word frequency, sentence length, page count, and so on.
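The grouping idea can be sketched with k-means, one common clustering algorithm. This is only an illustration and not a description of how any news site actually works: the two features per article (hypothetical relative frequencies of finance-related and sports-related words), the data points, and the choice to seed the centroids with the first k points are all assumptions made for the example.

```python
import math

def k_means(points, k, iterations=10):
    """Plain k-means: returns (centroids, assignments)."""
    # Assumption for reproducibility: seed the centroids with the first k points.
    centroids = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its closest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments

def assign_cluster(point, centroids):
    """Place a new, unseen point in the cluster with the nearest centroid."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))

# Hypothetical articles: (finance word frequency, sports word frequency).
articles = [(0.9, 0.1), (0.1, 0.8), (0.8, 0.2), (0.2, 0.9), (0.85, 0.15)]

centroids, assignments = k_means(articles, k=2)
print(assignments)                           # cluster index per article, no labels supplied
print(assign_cluster((0.7, 0.3), centroids)) # where a new article would be placed
```

No labels are passed in at any point; the cluster indices are names the algorithm invents for the groups it finds.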
Anomaly detection is also part of unsupervised learning; it identifies data errors and takes them out of consideration.
Choosing between a supervised and an unsupervised machine learning algorithm typically depends on the structure and volume of your data and on the use case at hand. A well-rounded data science program will use both types of algorithms to build predictive data models that help to make decisions across a variety of business challenges.
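One simple way to flag and set aside suspicious values is the modified z-score, which measures how far each value lies from the median relative to the median absolute deviation. The sketch below is a minimal example of that one technique, not the only approach to anomaly detection; the readings are invented, and the constant 0.6745 and cutoff 3.5 are a commonly used convention rather than values taken from this article.

```python
from statistics import median

def split_anomalies(values, threshold=3.5):
    """Separate values into (clean, anomalies) using the modified z-score."""
    center = median(values)
    # Median absolute deviation (MAD): a spread estimate that a few outliers barely affect.
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # Assumption: with no measurable spread, nothing is flagged.
        return list(values), []
    clean, anomalies = [], []
    for v in values:
        score = 0.6745 * abs(v - center) / mad
        (anomalies if score > threshold else clean).append(v)
    return clean, anomalies

# Hypothetical sensor readings with one obvious data error.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 55.0, 10.2]
clean, anomalies = split_anomalies(readings)
print(anomalies)  # values taken out of consideration
print(clean)      # values kept for further analysis
```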
CoinAnalyst
CoinAnalyst’s technological solution makes active use of both methodologies. In addition to analyzing unstructured and raw datasets to create a classification, the smart algorithms are taught various kinds of terms and word classes so that the clustering also takes these characteristics into consideration.
The idea of content density helps algorithms to accurately classify articles, messages and any other social media publication across a wide range of domains in the crypto market when evaluated against a ground truth dataset of already correctly classified data. This enables the technology to identify which project a publication concerns, who made it and how reliable the content is.
The process starts with a big dataset in which each item is annotated with whether or not it falls within a particular category. Content was quantified using another existing dataset containing large lists of words that are more or less likely to convey content (high content density: “official”, “success”, “reliable”, “promising” and so on; low content density: “false”, “misleading”, “not worth”, “scam”).
So, each article, statement or publication gets a score. These evaluations are done both by an automated system (mostly) and by engineers themselves. In the end, we wind up with a large amount of data labeled as content dense or not and this is what gets fed to the machine learning algorithm, which basically builds its own internal representation of what is and isn’t content dense.
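As a rough sketch of the scoring step, the word lists quoted above can be turned into a simple lexicon-based score. This is not CoinAnalyst's actual scoring: the counting rule (high-density hits minus low-density hits), the cutoff at zero, the function names, and the two sample posts are all assumptions made for illustration.

```python
import re

# Word lists taken from the examples in this article.
HIGH_DENSITY = ["official", "success", "reliable", "promising"]
LOW_DENSITY = ["false", "misleading", "not worth", "scam"]

def count_terms(text, terms):
    """Count whole-word (or whole-phrase) occurrences of each term, ignoring case."""
    lowered = text.lower()
    return sum(
        len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
        for term in terms
    )

def density_score(text):
    """Hypothetical score: high-density hits minus low-density hits."""
    return count_terms(text, HIGH_DENSITY) - count_terms(text, LOW_DENSITY)

def label(text):
    """Hypothetical rule: a positive score counts as content dense."""
    return "content dense" if density_score(text) > 0 else "not content dense"

# Made-up posts for illustration.
posts = [
    "Official announcement: the audit was a success and the team looks reliable.",
    "This token is a scam, the roadmap is misleading and not worth your time.",
]

# Labeled pairs like these are what would be fed to a supervised learning algorithm.
labeled = [(post, label(post)) for post in posts]
for post, tag in labeled:
    print(density_score(post), tag)
```

A hand-written rule like this only produces the training labels; the learning algorithm is then expected to generalize beyond the fixed word lists.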
Of course, this task can be extended to more fine-grained levels. In our next publication we will dive further into text classification techniques and sentiment-level predictions. If you have ever wondered how Support Vector Machines and Bayesian Networks, both supervised machine learning techniques, function, then stay tuned for the next article. We will also provide you with some ‘early insights’ into CoinAnalyst’s already working product!
Originally published at medium.com on August 11, 2018.