5Ws and 1H of a Classification Algorithm
What is classification?
Classification is identifying categories that can be used to separate a dataset. This can be as simple as taking a bunch of clothes and separating them into tops and bottoms. In machine learning, we try to automate this process by creating a classification algorithm, also known as a classifier.
How do classifiers work?
There are many types of classifiers; here I will briefly describe the most common ones.
Logistic Regression
This classifier has a binary output, 1 or 0, corresponding to True or False. Logistic regression attempts to find the probability of whether an event will occur or not. For example, say you want to know whether tomorrow's weather is suitable for a walk: if it rains you will stay inside, otherwise you will go outside. You watch the news and the weather forecast says there is a 40% chance of rain tomorrow. This is the kind of situation logistic regression operates in: finding the probability between two possible outcomes.
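As a minimal sketch of the idea, logistic regression passes a weighted sum of features through the sigmoid function to produce a probability between 0 and 1. The weights, bias, and weather features below are made-up numbers for illustration, not values learned from real data:

```python
import math

def sigmoid(z):
    """Squash a real-valued score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical learned weights for two weather features:
# chance of rain (%) and wind speed (km/h). Both push against going out.
weights = [-0.08, -0.05]
bias = 3.0

def prob_walk(rain_pct, wind_kmh):
    # Linear score, then sigmoid to turn it into a probability
    score = bias + weights[0] * rain_pct + weights[1] * wind_kmh
    return sigmoid(score)

p = prob_walk(40, 10)  # 40% chance of rain, light wind
print(f"probability of a walk: {p:.2f}")  # → probability of a walk: 0.33
```

In a real model, the weights and bias would be fitted to labelled examples; the prediction step, however, is exactly this simple.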
Naive Bayes
This classifier is used when several characteristics, independent of each other, combine to describe an object. Let us look at an apple. We as humans can instinctively tell whether the object we are looking at is an apple, since our brains hold an imprint of the fruit's characteristics, but how can a machine detect it? Naive Bayes tries to imitate this by assigning independent characteristics to the apple.
The first thing we can see is that the apple has a spherical shape. The machine can't assume the object is an apple from that characteristic alone, since other objects, like a basketball, are also spherical. By adding more and more independent characteristics, like a red outer layer, a yellowish-white core, and a small brown stem, the machine grows more confident and outputs a higher probability that the thing it is looking at is an apple.
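The apple example can be sketched in a few lines. Because Naive Bayes assumes the characteristics are independent, their likelihoods simply multiply; the probability values below are invented for illustration, not measured:

```python
# Hypothetical likelihood of each characteristic, given apple / not-apple
p_given_apple = {"spherical": 0.9, "red_skin": 0.7, "brown_stem": 0.8}
p_given_other = {"spherical": 0.5, "red_skin": 0.05, "brown_stem": 0.02}
prior_apple = 0.5  # before seeing anything, a 50/50 guess

def posterior_apple(features):
    """P(apple | features) via Bayes' rule with the independence assumption."""
    like_apple = prior_apple
    like_other = 1.0 - prior_apple
    for f in features:
        like_apple *= p_given_apple[f]   # multiply independent likelihoods
        like_other *= p_given_other[f]
    return like_apple / (like_apple + like_other)  # normalise

print(posterior_apple(["spherical"]))                            # one clue: still fairly uncertain
print(posterior_apple(["spherical", "red_skin", "brown_stem"]))  # more clues: near certainty
```

Each added characteristic shifts the posterior further, which is exactly the "more clues, more confidence" behaviour described above.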
K-Nearest Neighbours
This algorithm determines the category of a new object based on the objects surrounding it. Say you are at a large grocery store and somehow manage to get lost: how will you know which aisle you are in? One of many ways is to look at the nearby products around you. If you see more cheeses than cabbages surrounding you, you can deduce that you are in the dairy aisle rather than the vegetable aisle.
K-Nearest Neighbours, or KNN for short, works similarly. When a new object is added to the dataset, the algorithm decides its category by looking at a certain number, denoted as k, of its nearest neighbours. KNN requires an already-labelled dataset so that it can categorise new objects.
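A bare-bones KNN can be written directly from that description: measure the distance to every labelled point, take the k closest, and let them vote. The store coordinates below are made up purely to mirror the aisle example:

```python
import math
from collections import Counter

# Hypothetical labelled "products" placed at (x, y) positions in the store
dataset = [((1, 1), "dairy"), ((1, 2), "dairy"), ((2, 1), "dairy"),
           ((8, 8), "vegetable"), ((8, 9), "vegetable"), ((9, 8), "vegetable")]

def knn_classify(point, k=3):
    # Sort labelled points by Euclidean distance to the new point,
    # then take a majority vote among the k nearest ones
    by_dist = sorted(dataset, key=lambda item: math.dist(point, item[0]))
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

print(knn_classify((2, 2)))  # surrounded by dairy points → dairy
```

Note that no "training" happens at all; the labelled dataset itself is the model, which is why KNN needs it up front.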
Decision Trees
To put it simply, a decision tree continuously splits the dataset into more and more precise categories. A decision tree has three parts: nodes, branches, and leaves, which represent the decision points, the possible answers, and the final outcomes respectively.
Above is an example of a simple decision tree that predicts whether a person is fit or not based on their lifestyle habits. Decision trees are great classifiers for supervised learning, where the person developing the model already knows the different categories of a dataset and their possible outcomes.
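A decision tree of this kind is really just a chain of questions, so it can be sketched as nested conditionals. The habits and thresholds below are hypothetical, standing in for whatever the figure's tree actually asks:

```python
# A hand-written tree in the spirit of the figure: each `if` is a node,
# each yes/no answer is a branch, each return value is a leaf.
def predict_fit(person):
    if person["exercises_weekly"]:            # root node: do they exercise?
        if person["eats_fast_food_daily"]:    # internal node
            return "unfit"
        return "fit"
    if person["age"] < 30:                    # no exercise: age matters more
        return "fit"
    return "unfit"

print(predict_fit({"exercises_weekly": True,
                   "eats_fast_food_daily": False,
                   "age": 45}))  # → fit
```

A learning algorithm such as CART would choose these questions and thresholds automatically from labelled data; the hand-written version just shows the structure being learned.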
Support Vector Machines
Support Vector Machines, or SVM, is similar to KNN in that it also requires a pre-labelled dataset. The difference is that, instead of checking the new object's nearest neighbours, SVM determines a category by drawing lines that separate each group in the dataset.
First, each data point is mapped onto a graph based on its values. These data points are already categorised. The algorithm then draws a line, also known as a hyperplane, that acts as a separator between the different categories. When a new object is added to the dataset, it is placed on the graph based on its own values and categorised according to which side of the line it falls on.
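Once the hyperplane is found, classifying a new point is just checking which side of it the point lands on, i.e. the sign of w·x + b. The weights and bias below are a made-up hyperplane, not one fitted by an actual SVM solver:

```python
# Hypothetical 2-D hyperplane: the line where w[0]*x + w[1]*y + b == 0
w = (1.0, 1.0)
b = -10.0

def classify(point):
    # Sign of the score tells us which side of the separating line we are on
    score = w[0] * point[0] + w[1] * point[1] + b
    return "class A" if score >= 0 else "class B"

print(classify((8, 7)))  # 8 + 7 - 10 = 5  ≥ 0 → class A
print(classify((2, 3)))  # 2 + 3 - 10 = -5 < 0 → class B
```

Training an SVM is the hard part: it searches for the hyperplane with the widest possible margin between the groups. Prediction, as shown, is a single dot product per point.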
When or Why would you use a classifier over other algorithms?
Looking at the different types of classifiers discussed above, what they share is that the labels or categories of their dataset are already known. Thus, classifiers are used when you have prior knowledge of the dataset you are dealing with.