SOAP Interview Questions and Answers Part 2
Answer:
Machine learning is a branch of artificial intelligence that automates data analysis and enables computers to learn and act from experience without being explicitly programmed.
Answer:
Machine Learning can be mainly divided into three types:
- Supervised Learning: A type of machine learning in which the machine needs external supervision to learn from data. Supervised models are trained on a labeled dataset. Regression and classification are the two main problems solved with supervised learning.
- Unsupervised Learning: A type of machine learning in which the machine needs no external supervision to learn from the data, hence the name. Unsupervised models are trained on an unlabeled dataset and are used to solve association and clustering problems.
- Reinforcement Learning: An agent interacts with its environment by producing actions and learns from feedback. The feedback comes as rewards: the agent receives a positive reward for each good action and a negative reward for each bad one. No supervision is provided to the agent. Q-learning is one algorithm used in reinforcement learning.
Answer:
Q-learning is a popular algorithm used in reinforcement learning. It is based on the Bellman equation. The agent tries to learn a policy that selects, in each state, the action that maximizes the expected reward. It learns this optimal policy from past experience.
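The update rule behind Q-learning can be sketched on a toy problem. The example below is only an illustration under stated assumptions: the five-state corridor environment, the reward of 1.0 at the goal, and the values of ALPHA, GAMMA, and EPSILON are all hypothetical and chosen for readability, not taken from any particular library or paper.

```python
import random

# Hypothetical toy environment: a corridor of 5 states; reaching state 4 ends the episode.
N_STATES = 5
ACTIONS = (-1, +1)                      # index 0 = move left, index 1 = move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2   # illustrative values, not tuned


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q_row, rng):
    """Epsilon-greedy selection with random tie-breaking."""
    if rng.random() < EPSILON:
        return rng.randrange(len(q_row))
    best = max(q_row)
    return rng.choice([i for i, value in enumerate(q_row) if value == best])


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            a = choose_action(q[state], rng)
            next_state, reward, done = step(state, ACTIONS[a])
            # Bellman-based update: move Q(s, a) toward r + gamma * max Q(s', a').
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# Greedy action index for each non-terminal state (1 means "move right").
greedy_policy = [max(range(2), key=lambda i: q_table[s][i]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

In this toy setting the learned greedy policy moves right in every state, since that is the only way to reach the rewarded goal.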
Answer:
Deep learning is a subset of machine learning that is loosely inspired by the working of the human brain. It takes its inspiration from brain cells, called neurons, and uses multi-layer neural networks to solve complex real-world problems. It is also known as deep neural learning, and its models are called deep neural networks.
Some real-world applications of deep learning are:
- Adding color to black-and-white images
- Computer vision
- Text generation
- Deep-learning robots, etc.
Answer:
Machine learning is a subset, or subfield, of artificial intelligence, and it is one way of achieving AI. The two are distinct concepts, and the relation between them can be summarized as “AI uses different machine learning algorithms and concepts to solve complex problems”.
Answer:
A reinforcement learning problem can be formalized as a Markov decision process (MDP). An MDP is the mathematical framework used to describe the RL problem in terms of states, actions, transition probabilities, and rewards. The main aim is to maximize the cumulative reward by choosing the optimal policy.
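As an illustration of how an MDP leads to an optimal policy, here is a minimal value-iteration sketch. The two-state MDP (states "low" and "high", actions "wait" and "work"), its probabilities, its rewards, and the discount factor are all made up for this example.

```python
# Hypothetical MDP: transitions[state][action] = [(probability, next_state, reward), ...]
transitions = {
    "low": {
        "wait": [(1.0, "low", 0.0)],
        "work": [(0.8, "high", 1.0), (0.2, "low", 0.0)],
    },
    "high": {
        "wait": [(1.0, "high", 2.0)],
        "work": [(1.0, "low", 3.0)],
    },
}
GAMMA = 0.9  # illustrative discount factor


def action_value(outcomes, values, gamma):
    """Expected return of one action: sum of p * (reward + gamma * V(next_state))."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)


def value_iteration(transitions, gamma, tol=1e-9):
    """Repeat the Bellman optimality backup until the values stop changing."""
    values = {s: 0.0 for s in transitions}
    while True:
        new_values = {
            s: max(action_value(o, values, gamma) for o in actions.values())
            for s, actions in transitions.items()
        }
        if max(abs(new_values[s] - values[s]) for s in values) < tol:
            return new_values
        values = new_values


def greedy_policy(transitions, values, gamma):
    """Pick, in each state, the action with the highest expected return."""
    return {
        s: max(actions, key=lambda a: action_value(actions[a], values, gamma))
        for s, actions in transitions.items()
    }


state_values = value_iteration(transitions, GAMMA)
policy = greedy_policy(transitions, state_values, GAMMA)
print(state_values, policy)
```

Value iteration needs the transition model up front; Q-learning, by contrast, estimates the same quantities from sampled experience.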
Answer:
In machine learning, there are mainly two types of models, parametric and non-parametric. They are explained as follows:
Parametric Model: A parametric model uses a fixed number of parameters and makes strong assumptions about the data. Examples of parametric models are linear regression, logistic regression, Naïve Bayes, the perceptron, etc.
Non-Parametric Model: A non-parametric model uses a flexible number of parameters and makes few assumptions about the data. These models suit large datasets where there is no prior knowledge. Examples of non-parametric models are decision trees, K-nearest neighbours, SVMs with Gaussian kernels, etc.
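The difference can be seen in what each model has to keep after training. In the sketch below (the data and the function names are hypothetical), a fitted line is summarized by two numbers regardless of dataset size, while a nearest-neighbour predictor must keep every training point.

```python
def fit_line(xs, ys):
    """Parametric: ordinary least squares for y = slope * x + intercept (two parameters)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def nearest_neighbour_predict(xs, ys, query):
    """Non-parametric: answer with the target of the closest stored training point."""
    closest = min(range(len(xs)), key=lambda i: abs(xs[i] - query))
    return ys[closest]


# Made-up training data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

slope, intercept = fit_line(xs, ys)      # the whole model is these two numbers
line_prediction = slope * 2.9 + intercept
nn_prediction = nearest_neighbour_predict(xs, ys, 2.9)  # needs xs and ys at query time
print(slope, intercept, line_prediction, nn_prediction)
```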
Answer:
In machine learning, hyperparameters are the parameters that determine and control the training process. Examples are the learning rate, the number of hidden layers, the number of hidden units, the activation functions, etc. These parameters are external to the model: they are set before training rather than learned from the data. Selecting good hyperparameters yields a better model.
Overfitting occurs when a machine learning algorithm tries to fit every data point in the training set and, as a result, also captures the noise. An overfitted model shows low bias but high variance in its output. Overfitting is one of the main issues in machine learning.
Methods to avoid Overfitting in ML:
- Cross-validation
- Training with more data
- Regularization
- Ensembling
- Removing unnecessary features
- Early stopping of training
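Of the methods above, early stopping is the easiest to show in a few lines. The sketch below assumes you already record a validation loss after each epoch; the loss values and the `patience` parameter are hypothetical and only illustrate the rule "stop once the validation loss has not improved for a while".

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (best_epoch, best_loss), stopping after `patience` epochs without improvement."""
    best_loss, best_epoch, waited = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # validation loss is rising: further training would likely overfit
    return best_epoch, best_loss


# Made-up validation losses: they improve at first, then start to climb.
validation_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]
best_epoch, best_loss = early_stopping_epoch(validation_losses)
print(best_epoch, best_loss)
```

In practice the model weights from the best epoch are the ones that are kept.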
Answer:
Some popular ways to evaluate the performance of the ML model are:
- Confusion Matrix: An N×N table, where N is the number of classes, that compares predicted and actual labels to show the performance of a classification model.
- F1 score: The harmonic mean of precision and recall, and a widely used metric for evaluating classification models.
- Gain and lift charts: Gain and lift charts are used to check the rank ordering of the predicted probabilities.
- AUC-ROC curve: The AUC-ROC is another performance metric. The ROC curve plots sensitivity (the true positive rate) against 1 - specificity (the false positive rate), and the AUC is the area under that curve.
- Gini Coefficient: Used in classification problems and also known as the Gini index. It measures the inequality between the values of a variable and can be derived from the AUC as Gini = 2 × AUC - 1. A high Gini value indicates a good model.
- Root mean squared error: One of the most popular metrics for evaluating regression models. It assumes that errors are unbiased and normally distributed.
- Cross-Validation: Another popular technique for evaluating the performance of a machine learning model. The model is trained on subsets of the input data and evaluated on the complementary subset.
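Several of these metrics can be computed by hand, which helps to make the definitions concrete. The sketch below uses made-up labels and predictions; the function names are hypothetical, and the fold layout in `k_fold_indices` is one simple choice among many.

```python
import math


def confusion_counts(y_true, y_pred):
    """Binary confusion matrix entries as (tp, fp, fn, tn)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def rmse(actual, predicted):
    """Root mean squared error for a regression model."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


# Made-up classification labels and predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
f1 = f1_score(y_true, y_pred)

# Made-up regression targets and predictions.
error = rmse([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])

splits = list(k_fold_indices(6, 3))
print(counts, f1, error, splits)
```

The false positives and false negatives counted here are the Type I and Type II errors, respectively.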
Answer:
The three stages of building a machine learning model are:
- Model Building: Choose a suitable algorithm for the model and train it according to the requirement.
- Model Testing: Check the accuracy of the model through the test data.
- Applying the Model: Make the required changes after testing and use the final model for real-time projects.
It is important to remember that the model needs to be checked periodically to make sure it is still working correctly, and modified when necessary so that it stays up to date.
Answer:
The comparison between the two is as follows:
- K-NN is a supervised machine learning algorithm, while K-means is an unsupervised machine learning algorithm.
- K-NN is a classification or regression algorithm, whereas K-means is a clustering algorithm.
- K-NN is a lazy learner, while K-means is an eager learner. An eager learner has a model-fitting (training) step, whereas a lazy learner has no training phase and defers the work to prediction time.
- K-NN performs much better when all features are on the same scale. K-means is also distance-based, so it is similarly sensitive to feature scale.
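The contrast between the two algorithms can be sketched side by side. This is a minimal illustration, not a production implementation: the points and labels are made up, and the K-means initialization (taking the first k points as centroids) is a simplifying assumption rather than the usual random or k-means++ initialization.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Supervised and lazy: no training step, labels are needed at prediction time."""
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def k_means(points, k, iterations=10):
    """Unsupervised and eager: fits k centroids from unlabeled points."""
    centroids = [points[i] for i in range(k)]  # naive initialization (assumption)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda c: math.dist(point, centroids[c]))
            clusters[nearest].append(point)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[c]
            for c, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Made-up 2-D data forming two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels = ["a", "a", "a", "b", "b", "b"]

predicted_label = knn_predict(points, labels, (2.0, 2.0))  # uses the labels
centroids, clusters = k_means(points, k=2)                 # ignores the labels
print(predicted_label, centroids)
```

Both functions rely on Euclidean distance, which is why feature scaling matters for each of them.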
Answer:
While there is no fixed rule to choose an algorithm for a classification problem, you can follow these guidelines:
- If accuracy is a concern, test different algorithms and cross-validate them.
- If the training dataset is small, use models that have low variance and high bias.
- If the training dataset is large, use models that have high variance and low bias.
Answer:
Classification is used when your target variable is categorical, while regression is used when your target variable is continuous. Both classification and regression belong to the category of supervised machine learning algorithms.
Kernel SVM is short for kernel support vector machine. Kernel methods are a class of algorithms for pattern analysis, and the kernel SVM is the most common of them.
Answer:
The generalized linear model (GLM) is a generalization of the ordinary linear regression model. A GLM is more flexible in terms of residuals and can be used where linear regression does not seem appropriate, because it allows the distribution of the residuals to be other than normal. It generalizes linear regression by relating the linear model to the target variable through a link function. The model parameters are estimated by maximum likelihood.
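Logistic regression is a familiar example of a GLM: a binomial target connected to the linear model through the logit link. The sketch below fits one by plain gradient ascent on the log-likelihood. The data, the learning rate, and the iteration count are made up for illustration, and real libraries typically use faster solvers such as iteratively reweighted least squares.

```python
import math


def sigmoid(z):
    """Inverse of the logit link: maps the linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, iterations=5000):
    """Maximum likelihood estimate of (w, b) via gradient ascent on the mean log-likelihood."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        residuals = [y - sigmoid(w * x + b) for x, y in zip(xs, ys)]
        grad_w = sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = sum(residuals) / n
        w += learning_rate * grad_w
        b += learning_rate * grad_b
    return w, b


# Made-up binary outcomes that become more likely as x grows (not perfectly separable).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 1, 0, 1, 1]

w, b = fit_logistic(xs, ys)
prob_at_0 = sigmoid(b)
prob_at_5 = sigmoid(w * 5.0 + b)
print(w, b, prob_at_0, prob_at_5)
```

Swapping the link function and the assumed distribution (for example, a log link with a Poisson target) gives other members of the GLM family.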
Answer:
Variance Inflation Factor (VIF) is an estimate of the amount of multicollinearity in a set of regression variables.
It is given as:
VIF = Variance of a coefficient in the full model / Variance of that coefficient in a model with only that single independent variable.
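In practice the VIF of predictor j is usually computed as VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared obtained by regressing predictor j on the other predictors. The sketch below covers only the special case of two predictors, where R_j^2 equals the squared Pearson correlation between them; the data and the function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2); with exactly two predictors, R^2 is the squared correlation."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Made-up predictors that are correlated but not identical.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]

correlation = pearson_r(x1, x2)
vif = vif_two_predictors(x1, x2)
print(correlation, vif)
```

A VIF of 1 means no collinearity; the larger the value, the more the variance of that coefficient is inflated by correlation with the other predictors.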
Answer:
A Type I error is a false positive: claiming something has happened when it hasn’t.
A Type II error is a false negative: claiming nothing has happened when in fact something has.
Answer:
Sentiment analysis uses natural language processing to categorize opinions expressed, usually in a piece of text, in order to determine whether the writer’s attitude is positive, negative, or neutral. Machine learning algorithms can be used to ease the learning process of the system. These algorithms are constantly fed massive amounts of data so that they can adjust themselves and continually improve.
Answer:
Data mining can be described as the process of extracting knowledge or interesting, previously unknown patterns from structured data. Machine learning algorithms are used during this process. Machine learning, by contrast, is the study, design, and development of algorithms that give computers the ability to learn without being explicitly programmed.