AI/ML

AI applications

One of the best-known applications of AI is Google's predictive search. When you begin typing a search term and Google suggests completions for you to choose from, that is AI in action.

Predictive search is based on data that Google collects about you, such as your browser history, location, age and other personal details. Using AI, Google attempts to guess what you might be trying to find.

Finance sector: JPMorgan Chase's Contract Intelligence platform uses ML, AI and image recognition software to analyze legal documents. Manually reviewing 12,000 agreements takes about 36,000 hours, whereas the AI platform does the same work in a matter of seconds.

Healthcare: IBM developed AI software specifically for medicine.

AI is divided into three categories:

Artificial Narrow Intelligence (ANI): also known as Weak AI, involves applying AI only to specific tasks.
Artificial General Intelligence (AGI)
Artificial Super Intelligence (ASI)

AI programming languages:

- Python

- R

- Java

- Lisp (developed by John McCarthy)

- Prolog

Machine Learning: The difference between Machine Learning and AI is that ML is one of the techniques used in AI. It is a method where you feed a lot of data to a machine and make it learn from that data.

ML definitions

Algorithm: A set of rules and statistical techniques used to learn patterns from data.

Model: A model is trained using ML algorithms.

Predictor variable: It is a feature of the data that is used to predict the output. Ex: If we want to predict the height of a person from their weight, then weight is the predictor variable.

Response variable: It is the feature or output variable that needs to be predicted by using the predictor variables. In the above example, height is the response variable.

Training data: The ML model is built using the training data.

Testing data: The ML model is evaluated using the testing data.
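
To make the training/testing idea concrete, here is a minimal sketch that splits a small dataset into the two parts. The `train_test_split` helper, the 25% test fraction and the (weight, height) records are all made up for illustration; real projects usually rely on a library for this.

```python
import random

def train_test_split(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the records and split it into (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical (weight_kg, height_cm) pairs:
# weight is the predictor variable, height is the response variable.
records = [(50, 155), (58, 162), (63, 168), (70, 172),
           (76, 178), (82, 181), (90, 185), (95, 188)]

train, test = train_test_split(records)
print("training rows:", len(train))
print("testing rows:", len(test))
```

The model would be built only from `train` and then evaluated on `test`, which it has never seen.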

ML process: It involves building a predictive model that can be used to find a solution for a problem statement.

We have to choose the ML algorithm based on the problem statement we have. There are many ML algorithms available, but we have to choose one based on the problem we are solving.

Types of Machine Learning:

Supervised learning: is a technique in which we teach or train the machine using data which is well labelled. The entire training dataset is labelled.

Unsupervised learning: is the training of a machine using information that is unlabelled, allowing the algorithm to act on that information without guidance.

Reinforcement learning: is a part of ML where an agent is put in an environment and learns to behave in that environment by performing certain actions and observing the rewards it gets from those actions. This is used in advanced ML applications such as self-driving cars and AlphaGo.

Differences between ML types.

Types Of Problems Solved Using Machine Learning

In ML all problems are classified into 3 types.

Regression
Classification
Clustering

Supervised learning algorithm:

Linear Regression:

If you want to predict the price of a stock over a period of time, you can use linear regression to study the relationship between the stock price, which is the dependent variable, and time, which is the independent variable.

Stock price: dependent variable or output variable.
Time: predictor variable or independent variable.
Stock price is a continuous quantity because it can take an infinite number of values.
In the above diagram, Y is the dependent variable and X is the independent variable.

Equation of a straight line in math: Y = mX + c, where m is the slope and c is the intercept.
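
As a rough illustration of how m and c can be found from data, the sketch below fits a line with the ordinary least squares formulas. The `fit_line` helper and the day/price numbers are invented for the example (the prices are chosen to lie exactly on a line), so treat it as a sketch rather than a real stock model.

```python
def fit_line(xs, ys):
    """Return slope m and intercept c of the least squares line Y = mX + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of X and Y divided by the variance of X gives the slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c

# Hypothetical data: X is time in days, Y is the stock price.
days = [1, 2, 3, 4, 5]
prices = [12.0, 14.0, 16.0, 18.0, 20.0]

m, c = fit_line(days, prices)
predicted_day_6 = m * 6 + c
print("slope:", m, "intercept:", c, "prediction for day 6:", predicted_day_6)
```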

Logistic Regression:

Mathematical equation

The output ranges between 0 and 1, which is why we use the sigmoid curve.

In logistic regression, the output must be a probability between 0 and 1.

Why Use the Sigmoid Curve in Logistic Regression?

1. Converts any value → probability (between 0 and 1)

Perfect for binary classification:

Email is spam / not spam
Fraud / not fraud
Loan approved / rejected

2. Output interpretable as probability

Example:

Output = 0.89 → 89% chance of belonging to class 1

Output = 0.12 → 12% chance of class 1

Why Does Logistic Regression Use Sigmoid Instead of a Linear Function?

A linear model:

Can output negative numbers
Can output > 1
Not usable as probability

Sigmoid fixes this.
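
A small sketch of the idea: the linear part mX + c can be any number, and the sigmoid function 1 / (1 + e^(-z)) squeezes it into the range 0 to 1. The function names and the values of m and c below are assumptions made up for the example.

```python
import math

def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # Equivalent form that avoids overflow for large negative z.
    e = math.exp(z)
    return e / (1.0 + e)

def predict_probability(x, m, c):
    """Logistic regression: pass the linear output mX + c through the sigmoid."""
    return sigmoid(m * x + c)

# Hypothetical coefficients, not learned from real data.
m, c = 1.5, -3.0
for x in (0.0, 2.0, 4.0):
    linear_output = m * x + c
    print(x, linear_output, predict_probability(x, m, c))
```

For x = 0 the linear output is negative, which could not be a probability, but the sigmoid output still lies between 0 and 1.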

Decision Tree

ID3 algorithm: stands for Iterative Dichotomiser 3, which is one of the most effective algorithms used to build a decision tree.

Select the best attribute A as predictor variable.

Assign this predictor variable A as root node.

What is the best attribute?

It is the one that separates the data into different classes most effectively. In other words, it is the feature that best splits the data set.

How to best split the data?

There are two measures: one is information gain and the other is entropy.

Information gain and entropy are the two measures used to decide which variable is assigned to the root node of the decision tree.

Entropy: measures the impurity or uncertainty present in the data.

Information gain: indicates how much information a particular feature or variable gives us about the final outcome.

The variable with higher information gain best divides the data into desired output classes.

Calculating information gain for Road Type: for the right side the entropy is 0 because there is no uncertainty, but on the left side there is some uncertainty since there are two distinct values, slow and fast.

The entropy of the children is calculated as a weighted average.

Similarly, calculate the information gain for the other predictor variables, Obstruction and Speed Limit.

The predictor variable with the highest information gain becomes the root node.

Here Speed Limit has the highest information gain, so it is assigned to the root node.
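
The calculation above can be sketched in a few lines. The four rows below are a made-up dataset chosen only to be consistent with the description (Road Type leaves some uncertainty, Speed Limit splits the data perfectly); they are not the values from the original diagram, and the function names are my own.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, feature, target):
    """Parent entropy minus the weighted average entropy of the children."""
    parent = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    children = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return parent - children

# Hypothetical data: predict "speed" from three predictor variables.
rows = [
    {"road_type": "steep", "obstruction": "yes", "speed_limit": "yes", "speed": "slow"},
    {"road_type": "steep", "obstruction": "no",  "speed_limit": "yes", "speed": "slow"},
    {"road_type": "flat",  "obstruction": "yes", "speed_limit": "no",  "speed": "fast"},
    {"road_type": "steep", "obstruction": "no",  "speed_limit": "no",  "speed": "fast"},
]

features = ["road_type", "obstruction", "speed_limit"]
gains = {f: information_gain(rows, f, "speed") for f in features}
root = max(gains, key=gains.get)
print(gains)
print("root node:", root)
```

With this toy data the feature with the highest gain is the speed limit, which matches the choice of root node described above.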

Random Forest

Overfitting occurs when a model learns the training data so closely that it performs worse on new data. To reduce this problem, random forest uses bagging: it builds many decision trees on random samples of the training data and combines their predictions.

Naive Bayes

KNN (K Nearest Neighbor)

Support Vector Machine

A hyperplane is the decision boundary that best separates two classes.

SVM classifies data by using a hyperplane chosen so that the distance between the hyperplane and the support vectors is maximum. This distance is called the margin.

In the above picture, the data points nearest to the hyperplane are known as support vectors.

Problem Statement

Suppose we introduce a new data point (blue). We draw a hyperplane, choosing the one with the maximum distance between the hyperplane and the support vectors, and that is the optimal hyperplane.

Here our data is linearly separable, which means that you can draw a straight line to separate the two classes. But what do you do if the data looks like this?

This is exactly when non-linear SVM comes into the picture, and it is what the kernel trick is all about. A kernel is used to transform data into another dimension that has a clear dividing margin between the classes of data. In other words, a kernel function lets the user transform non-linear spaces into linear ones.

Until this point we were plotting the data in two-dimensional space, with x and y axes. A simple trick is to transform the two variables x and y into a new feature space that involves a new variable z, so that we are visualizing the data in three-dimensional space. When you transform the 2D space into 3D space you can clearly see a dividing margin between the two classes of data, and you can draw a separating plane in the middle. See below.
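
Here is a tiny sketch of that 2D to 3D idea. It assumes the new variable is z = x² + y² (a common textbook choice, not necessarily the one used in the picture) and uses made-up points: one class near the origin and one class on a ring around it. This is only the feature mapping, not a full SVM.

```python
def lift(point):
    """Map a 2D point (x, y) to 3D by adding z = x^2 + y^2."""
    x, y = point
    return (x, y, x * x + y * y)

# Hypothetical data: no straight line in the x-y plane separates these classes.
inner = [(0.5, 0.0), (0.0, -0.5), (-0.3, 0.4)]
outer = [(2.0, 0.0), (0.0, -2.0), (-1.5, 1.5)]

inner_z = [lift(p)[2] for p in inner]
outer_z = [lift(p)[2] for p in outer]

# In 3D a flat plane z = threshold now separates the two classes.
threshold = (max(inner_z) + min(outer_z)) / 2
print("inner z values:", inner_z)
print("outer z values:", outer_z)
print("separating plane at z =", threshold)
```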

Unsupervised Learning Algorithms

These are used to solve clustering problems

K-Means Clustering: groups similar elements or data points into clusters.

Group 1 is between ages 18 and 22

Group 2 is between ages 23 and 35

Group 3 is between ages 36 and 39

Let's say you want to cluster people based on their age. For such problems we can use this algorithm.

Randomly select one data point as the centroid of each cluster.

K denotes the number of clusters we form out of the data points.

Compute the distance from each centroid to every other data point, and assign each point to the cluster of its nearest centroid.

As you recompute the centroid from the data points in its cluster, the centroid keeps shifting, because you are trying to get to the average of that cluster.

The above is how K-Means works.
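
The steps above can be sketched for the age example as follows. This is a minimal one-dimensional version with a fixed number of iterations; the `k_means_1d` name, the seed and the list of ages are all assumptions for illustration.

```python
import random

def k_means_1d(values, k, iterations=20, seed=0):
    """Cluster numbers into k groups; returns (centroids, clusters)."""
    rng = random.Random(seed)
    # Step 1: randomly pick k data points as the starting centroids.
    centroids = rng.sample(values, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Step 2: assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Step 3: move each centroid to the average of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical ages of people we want to group.
ages = [18, 19, 21, 22, 25, 28, 31, 34, 36, 37, 38, 39]
centroids, clusters = k_means_1d(ages, k=3)
for centroid, members in zip(centroids, clusters):
    print(round(centroid, 1), sorted(members))
```

Because the starting centroids are random, a different seed can give a different grouping.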

The Elbow Method:

As the number of clusters increases, the distortion decreases. The idea of the Elbow method is to choose the K at which the distortion stops dropping sharply and starts to level off (the elbow of the curve). This is how we find the K value.
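
One simple way to locate the elbow programmatically is to look for the K where the curve bends the most, that is, where the drop before K is much larger than the drop after K. This is just one heuristic among several, and the distortion values below are invented for the example.

```python
def elbow_k(distortions):
    """distortions[i] is the distortion for K = i + 1. Returns the elbow K."""
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(distortions) - 1):
        drop_before = distortions[i - 1] - distortions[i]
        drop_after = distortions[i] - distortions[i + 1]
        bend = drop_before - drop_after
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k

# Hypothetical distortion values for K = 1, 2, 3, 4, 5, 6.
distortions = [400.0, 250.0, 60.0, 50.0, 45.0, 42.0]
best_k = elbow_k(distortions)
print("elbow at K =", best_k)
```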

Reinforcement learning

Basic terminologies of RL

Example usecase:

Exploration: is about exploring and capturing more information about an environment

Exploitation: is about using already known information to maximize the rewards.
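
A common way to balance the two is the epsilon-greedy rule: with a small probability epsilon the agent explores a random action, otherwise it exploits the action with the best known reward. The post does not name this rule, so take it as one illustrative option; the reward estimates below are made up.

```python
import random

def choose_action(estimated_rewards, epsilon, rng):
    """Epsilon-greedy choice over a list of estimated rewards per action."""
    if rng.random() < epsilon:
        # Exploration: try a random action to learn more about the environment.
        return rng.randrange(len(estimated_rewards))
    # Exploitation: pick the action with the highest known reward.
    return max(range(len(estimated_rewards)), key=lambda a: estimated_rewards[a])

rng = random.Random(0)
estimated_rewards = [1.0, 5.0, 2.5]  # hypothetical estimates for 3 actions

greedy_choice = choose_action(estimated_rewards, epsilon=0.0, rng=rng)
mixed_choices = [choose_action(estimated_rewards, epsilon=0.2, rng=rng)
                 for _ in range(10)]
print("always exploit:", greedy_choice)
print("mostly exploit, sometimes explore:", mixed_choices)
```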

Markov decision process:

Policy based learning

There are also value based learning and action based learning. Value based learning aims to maximize the rewards.

AI vs ML vs DL:

AI: is basically the science of getting the machines to mimic the behavior of human beings.

Machine Learning: is a subset of AI that focuses on getting machines to make decisions by feeding them data.

Deep learning: is a subset of ML that uses the concept of neural networks to solve complex problems.

All three are interconnected fields. ML and DL aid AI by providing a set of algorithms and neural networks to solve data-driven problems.

Limitations of ML:

ML is not well suited to handling high-dimensional data, that is, data where the input and output are very large. Handling and processing such data becomes very complex and takes up a lot of resources.

The first one is a one-dimensional entity (a straight line) where we need to search for a coin.

The second one is a two-dimensional entity (two square yards).

The third one is a cube where we need to search for the coin.

So as the number of dimensions increases, the problem becomes more complex.

High-dimensional data is common in image processing, natural language processing, image translation and so on. That is why ML is limited: it is hard to use for image processing, since images have a lot of pixels and therefore a lot of high-dimensional data.

Deep learning:

In deep learning, feature extraction happens automatically, and you need very little guidance from the programmer. Deep learning learns the model and works out which feature or variable is important in predicting the outcome. Let's say you have millions of predictor variables in a problem statement: it is not possible to sit down and understand the significance of all those predictor variables.

So if there is high-dimensional data and a lot of predictor variables, we use deep learning.

How does deep learning work?

The main aim was to re-engineer the human brain. Deep learning studies the basic unit of the brain, called a brain cell or neuron, so deep learning is inspired by our brain structure. The neurons in our brains are replicated in deep learning as artificial neurons, which are also called perceptrons.

Let's understand the functionality of biological neurons first.

Dendrites: These are used to receive inputs. These inputs are summed in the cell body and passed on to another biological neuron. Similarly, a perceptron or artificial neuron receives multiple inputs, applies various transformations and functions, and provides an output. These multiple inputs are nothing but the input variables: you feed input data to the artificial neuron, and it applies various functions and transformations to give you an output. We then build a network of artificial neurons, called an artificial neural network. That is the basic concept behind deep learning.

There can be N number of hidden layers.

Below is an example of image recognition using deep learning

Single layer Perceptron

A single layer perceptron is a linear or binary classifier. It is used mainly in supervised learning and it helps to classify the given input data into separate classes. So this diagram basically represents a perceptron.

A perceptron has multiple inputs, labelled X1 and X2.
Each input is given a specific weight. W1 represents the weight of the input X1.
The perceptron computes some functions on these weighted inputs and gives you the output. The weighted inputs first go through a step known as summation.
After summation, the result is passed on to the transfer function, which is nothing but an activation function.
From the activation function we get outputs Y1, Y2 and so on.

Each input is multiplied by its weight, for example X1 multiplied by W1.

Then all of those weighted inputs are summed.

Then the appropriate activation function or transfer function is applied.

The neuron becomes active only after some threshold is reached. That threshold is known as the activation potential.
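
The multiply, sum and activate steps can be sketched as below. The step activation, the weights and the threshold are assumptions picked so that the perceptron behaves like a logical AND gate; they are not taken from the original diagram.

```python
def perceptron_output(inputs, weights, threshold):
    """Weighted sum of the inputs followed by a step activation."""
    # Multiply each input by its weight and sum the results.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    # The neuron fires (outputs 1) only once the threshold is reached.
    return 1 if weighted_sum >= threshold else 0

# Hypothetical weights W1, W2 and threshold that implement an AND gate.
weights = [1.0, 1.0]
threshold = 1.5

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", perceptron_output([x1, x2], weights, threshold))
```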

Weights and bias: why do we assign weights to each of these inputs? The weight of an input denotes the importance of that input.

Problem statement

Assign weights and threshold calculation

Limitations: There are no hidden layers; it is only a single layer.

Complex problems can't be solved using this, so we go with a multi layer perceptron with backpropagation.

Multi Layer Perceptron:

Below is how the multi layer perceptron works. There will be one or more hidden layers.

Backpropagation:

In the beginning you assign some weights to each of the inputs.

These inputs go through the activation function and through all the hidden layers, and give us an output.

At first, the output you get is not very precise.

So you propagate backward and update your weights in such a way that the error is as small as possible.
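
To see the forward and backward passes in miniature, the sketch below trains the smallest possible network: one input, one hidden neuron with a sigmoid activation, and one linear output. The starting weights, learning rate, squared-error loss and the single training example are all assumptions for illustration; real networks have many weights per layer and use a library to compute gradients.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(x, target, w1, w2, learning_rate):
    """One forward pass and one backward pass; returns updated weights and loss."""
    # Forward pass: input -> hidden neuron -> output.
    hidden = sigmoid(w1 * x)
    output = w2 * hidden
    error = output - target
    loss = 0.5 * error ** 2
    # Backward pass: apply the chain rule from the output back to each weight.
    grad_w2 = error * hidden
    grad_hidden = error * w2
    grad_w1 = grad_hidden * hidden * (1.0 - hidden) * x
    # Update the weights in the direction that reduces the error.
    new_w1 = w1 - learning_rate * grad_w1
    new_w2 = w2 - learning_rate * grad_w2
    return new_w1, new_w2, loss

w1, w2 = 0.5, 0.5          # hypothetical starting weights
losses = []
for _ in range(200):
    w1, w2, loss = train_step(x=1.0, target=0.5, w1=w1, w2=w2, learning_rate=0.5)
    losses.append(loss)

print("first loss:", losses[0])
print("last loss:", losses[-1])
```

The loss shrinks from one step to the next, which is the "error as small as possible" behaviour described above.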

Limitations of Feed Forward Network

It cannot be used in use cases where you have to predict an outcome based on previous outcomes. In a lot of use cases your previous output also determines the next output, and for such cases you cannot make use of a feed forward network. The modification needed is for the network to learn from its previous outputs and mistakes. The solution to this is recurrent neural networks.

Recurrent Neural Networks: are a type of artificial neural network designed to recognize patterns in sequences of data such as text, genomes, handwriting, the spoken word or numerical time series data emanating from sensors, stock markets and government agencies.

They are a very important part of deep learning and have applications in a lot of domains. RNNs are mainly used for time series and stock market data.

Because the model is trained on data that includes the previous outputs, its predictions on sequential data can be very accurate.

Convolutional Neural Networks

Fully connected networks are not practical for image data, which is where convolutional neural networks come in.

Consider the first input image. The image is 28*28*3, so each neuron in the first hidden layer alone needs 28 × 28 × 3 = 2352 weights. The same applies to the second image. This leads to something known as overfitting, because all of the hidden layers are massively connected: there is a connection between each and every node. We have way too much data and too many neurons. This is why we have convolutional neural networks.
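
A quick back-of-the-envelope comparison shows the difference in weight counts. The 100 hidden neurons, the 3×3 kernel and the 16 filters are made-up sizes for illustration, and biases are left out to keep the arithmetic simple.

```python
def dense_weights(height, width, channels, neurons):
    """Weights in a fully connected layer: every pixel connects to every neuron."""
    return height * width * channels * neurons

def conv_weights(kernel_size, in_channels, filters):
    """Weights in a convolutional layer: each filter is shared across the image."""
    return kernel_size * kernel_size * in_channels * filters

per_neuron = dense_weights(28, 28, 3, neurons=1)
dense_total = dense_weights(28, 28, 3, neurons=100)   # hypothetical layer size
conv_total = conv_weights(kernel_size=3, in_channels=3, filters=16)

print("weights per fully connected neuron:", per_neuron)
print("fully connected layer with 100 neurons:", dense_total)
print("convolutional layer with 16 filters of 3x3:", conv_total)
```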

Natural Language Processing

Need for Text mining and NLP:

Before we understand what text mining and Natural Language Processing are, we have to understand the need for them. The need comes from the amount of data we are generating. Around 2.5 quintillion bytes of data are created every day. With the evolution of social media, we generate tons and tons of data day by day. In the above picture, the data shown is generated every minute.

Out of all this data, only 21% is structured and well formatted; the rest is unstructured. Major sources of unstructured data include text messages on WhatsApp, Facebook likes, comments on Instagram, and the bulk emails that we send out every single day.

So what can be done with so much data? The data that we generate helps grow businesses. By analyzing and mining the data we can add more value to a business. This is what text mining is about.

Text mining is all about processing unstructured data to draw useful insights that help grow businesses.

Text mining is a vast field that uses NLP to perform text analysis. So NLP is a part of text mining.

What is NLP?

It is a component of text mining that helps machines read text. Machines do not understand natural languages like English or French; they interpret data in the form of zeros and ones.

Sentiment analysis

Terminologies of NLP:

Tokenization:

Stemming:

In order to overcome the limitations of stemming, we use lemmatization. Sometimes stemming does not produce a word that has any meaning, because it cuts the word inappropriately. This problem is solved by lemmatization.

Lemmatization

Stop Words:

Stop words are words such as how, to, begin, gone, various, and, the. They are not necessarily important for understanding the meaning of a sentence.

Ex: How to make strawberry recipe.

Here a search engine would also search on the words How, to and make, which are not needed. These are called stop words and have to be removed, so that the search engine can focus only on strawberry recipe.
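
Removing stop words from the example query can be sketched like this. The stop word set is just the handful of words mentioned above, and splitting on whitespace stands in for proper tokenization; NLP libraries ship much longer lists.

```python
# Small illustrative stop word list taken from the words mentioned above.
STOP_WORDS = {"how", "to", "make", "begin", "gone", "various", "and", "the"}

def remove_stop_words(text):
    """Lowercase the text, split it into tokens and drop the stop words."""
    tokens = text.lower().split()
    return [token for token in tokens if token not in STOP_WORDS]

query = "How to make strawberry recipe"
keywords = remove_stop_words(query)
print(keywords)
```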

Document Term Matrix:

It records whether, and how often, each document contains each word. It is a frequency matrix.
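
A minimal sketch of building such a matrix: one row per document, one column per word, and each cell holding how many times the word occurs in the document. The two example documents and the `document_term_matrix` helper are made up for illustration.

```python
from collections import Counter

def document_term_matrix(documents):
    """Return (vocabulary, matrix) where matrix[d][t] counts term t in document d."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix

documents = ["this is fun", "this is this"]  # hypothetical documents
vocabulary, matrix = document_term_matrix(documents)

print(vocabulary)
for row in matrix:
    print(row)
```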

A Blog by Malathi Boggavarapu