Machine Learning Theory Test


1. What is the purpose of the 'train' phase in machine learning?

   a) To test the model
   b) To gather data
   c) To fit the model to the data
   d) To visualize the data

Answer: c) To fit the model to the data


2. Which of the following is not a type of machine learning?
   a) Supervised learning
   b) Unsupervised learning
   c) Reinforcement learning
   d) Dependent learning

Answer: d) Dependent learning


3. What is the purpose of a confusion matrix in machine learning?
   a) To confuse the model
   b) To visualize the performance of an algorithm
   c) To add noise to the data
   d) To reduce the dimensionality of the data

Answer: b) To visualize the performance of an algorithm

Reminder:

A confusion matrix, also known as an error matrix, is a table layout that visualizes the performance of an algorithm, typically a supervised learning one. For binary classification it is a 2x2 matrix with four outcomes: True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN).

  1. True Positives (TP): The cases in which we predicted YES and the actual output was also YES.
  2. True Negatives (TN): The cases in which we predicted NO and the actual output was NO.
  3. False Positives (FP): The cases in which we predicted YES and the actual output was NO.
  4. False Negatives (FN): The cases in which we predicted NO and the actual output was YES.

The confusion matrix gives more insight into a classification model's performance than the accuracy score alone, because it shows which kinds of errors the model makes when it predicts.
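
Example: the four outcomes listed above can be counted with a few lines of Python. This is a minimal sketch, assuming labels are booleans (True = YES, False = NO); the function name `confusion_counts` and the sample data are made up for illustration.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP, FN for binary labels (True = YES, False = NO)."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p and a:
            tp += 1          # predicted YES, actual YES
        elif not p and not a:
            tn += 1          # predicted NO, actual NO
        elif p and not a:
            fp += 1          # predicted YES, actual NO
        else:
            fn += 1          # predicted NO, actual YES
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Illustrative data only (not from a real model).
actual = [True, True, False, False, True, False]
predicted = [True, False, False, True, True, False]

counts = confusion_counts(actual, predicted)
# Accuracy uses only the diagonal of the matrix (TP and TN).
accuracy = (counts["TP"] + counts["TN"]) / len(actual)
print(counts, accuracy)
```

Accuracy collapses the four counts into one number, which is why two models with the same accuracy can still make very different kinds of mistakes.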


4. Which of the following is not a type of data in data science?
   a) Nominal data
   b) Ordinal data
   c) Interval data
   d) Reaction data

Answer: d) Reaction data


Reminder: Nominal data is categorical data that can be labeled and divided into groups but cannot be quantified or ordered. Ordinal data is data that can be categorized and ordered, but the differences between the categories are not quantifiable. Interval data is numerical and can be ordered with exact differences between the values being meaningful.

5. What is the purpose of the 'test' phase in machine learning?
   a) To gather data
   b) To fit the model to the data
   c) To evaluate the model's performance
   d) To visualize the data

Answer: c) To evaluate the model's performance


6. Which of the following is not a method of data preprocessing?
   a) Data cleaning
   b) Data transformation
   c) Data reduction
   d) Data destruction

Answer: d) Data destruction


9. What is the purpose of the 'validation' phase in machine learning?
   a) To gather data
   b) To fit the model to the data
   c) To evaluate the model on held-out data while tuning it and selecting a model
   d) To visualize the data

Answer: c) To evaluate the model on held-out data while tuning it and selecting a model

10. Which of the following is not a type of regression in machine learning?
   a) Linear regression
   b) Logistic regression
   c) Polynomial regression
   d) Categorical regression

Answer: d) Categorical regression

Pro tip: Logistic regression is generally used as a classifier, even though the model itself outputs a probability. Classification is achieved by adding a decision rule, e.g. outputs above 0.7 are labeled true.
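
Example: the decision rule from the tip above can be sketched as follows. This is a minimal illustration for a single feature; the weight `w`, bias `b`, and the helper names are hypothetical, and the 0.7 threshold is simply the value used in the tip.

```python
import math


def sigmoid(z):
    """Map any real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, w, b):
    """Logistic regression output for one feature: a probability, not a class."""
    return sigmoid(w * x + b)


def classify(z, threshold=0.5):
    """Turn the probability into a class label with a decision rule."""
    return sigmoid(z) >= threshold


# Hypothetical parameters, for illustration only.
w, b = 2.0, -1.0
p = predict_proba(1.0, w, b)            # sigmoid(1.0), roughly 0.73
label = classify(w * 1.0 + b, threshold=0.7)
print(p, label)
```

Raising the threshold makes the classifier more conservative about predicting the positive class; lowering it does the opposite.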


11. Which of the following is not a type of clustering in machine learning?
   a) K-means clustering
   b) Hierarchical clustering
   c) DBSCAN clustering
   d) Random clustering

Answer: d) Random clustering


12. Which of the following is not a type of decision tree in machine learning?
   a) ID3
   b) C4.5
   c) CART
   d) ABCD

Answer: d) ABCD


13. Which of the following is not a type of ensemble method in machine learning?
   a) Bagging
   b) Boosting
   c) Stacking
   d) Clustering

Answer: d) Clustering

Reminder:

Bagging works by creating multiple subsets of the original data, with replacement, and then training a separate model on each subset. The final prediction is determined by aggregating the predictions of each model, typically through a majority vote for classification problems or an average for regression problems. Each model is trained on a slightly different set of data and thus makes slightly different predictions. Combining these predictions, bagging can often achieve higher accuracy and stability than any individual model.

Boosting, on the other hand, also builds a strong classifier from multiple weak classifiers, but it trains them sequentially: each new classifier concentrates on the instances that the previous ones misclassified, by increasing their weights in later iterations. Bagging, which stands for Bootstrap Aggregating, instead runs its classifiers in parallel, each one learning from a random subset of the original data, and it treats all instances equally regardless of how they were classified in earlier iterations.

Stacking, or Stacked Generalization, involves training multiple different models and then combining their predictions with another machine learning model (a meta-learner) to make a final prediction.
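
Example: the bagging procedure described above (bootstrap samples, one model per sample, majority vote) can be sketched in plain Python. This is a toy illustration, not a production implementation: the weak learner is a made-up one-dimensional threshold "stump", it assumes the positive class has the larger feature values, and the data set and function names are invented for the example.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]


def train_stump(sample):
    """Toy weak learner: threshold halfway between the two class means."""
    pos = [x for x, y in sample if y == 1]
    neg = [x for x, y in sample if y == 0]
    if not pos or not neg:
        # Degenerate sample with a single class: always predict that class.
        only = 1 if pos else 0
        return lambda x: only
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x >= threshold else 0


def bagging_predict(models, x):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the run is repeatable
data = [(1.0, 0), (1.5, 0), (2.0, 0), (2.5, 0),
        (6.0, 1), (6.5, 1), (7.0, 1), (7.5, 1)]

# One model per bootstrap sample; an odd count avoids tied votes.
models = [train_stump(bootstrap_sample(data, rng)) for _ in range(25)]
print(bagging_predict(models, 0.0), bagging_predict(models, 9.0))
```

For regression, the majority vote would be replaced by an average of the individual predictions.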

King of ensemble methods:

XGBoost is an implementation of gradient boosting machines for supervised learning problems that is known for its speed and performance. It is widely regarded as one of the most efficient implementations for supervised methods, often giving excellent results out of the box with very little effort, which makes it the first model to try for many analytics problems.

Unlike classic gradient boosting, it has a regularization term in its objective function to control model complexity and reduce overfitting. The implementation is heavily parallelized for both CPU and GPU processing. XGBoost grows each tree up to the maximum depth and then prunes it backwards, removing sections without positive gain. It also has built-in cross-validation, which makes it possible to find the optimum number of boosting iterations in a single run. Lastly, XGBoost offers flexibility by allowing users to define custom optimization objectives and evaluation criteria.


14. Which of the following is not a type of neural network in machine learning?
   a) Convolutional Neural Network
   b) Recurrent Neural Network
   c) Feedforward Neural Network
   d) Backward Neural Network

Answer: d) Backward Neural Network


15. Which of the following is not a type of distance measure in machine learning?
   a) Euclidean distance
   b) Manhattan distance
   c) Minkowski distance
   d) Polar distance

Answer: d) Polar distance
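
Example: Manhattan and Euclidean distance are both special cases of Minkowski distance (p = 1 and p = 2 respectively). A minimal sketch, assuming points are equal-length sequences of numbers; the function name is chosen for illustration.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


a, b = (0.0, 0.0), (3.0, 4.0)
manhattan = minkowski(a, b, 1)   # |3| + |4|
euclidean = minkowski(a, b, 2)   # sqrt(3^2 + 4^2)
print(manhattan, euclidean)
```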


16. What is the purpose of a dendrogram in data visualization?
   a) To show the distribution of categorical data
   b) To show the hierarchical relationship between objects
   c) To show the distribution of numerical data
   d) To show the trend over time

Answer: b) To show the hierarchical relationship between objects


17. Which of the following is not a type of optimization algorithm in machine learning?
   a) Gradient Descent
   b) Stochastic Gradient Descent
   c) Adam
   d) Linear Descent

Answer: d) Linear Descent


18. Which of the following is not a type of loss function in machine learning?
   a) Mean Squared Error
   b) Cross-Entropy Loss
   c) Hinge Loss
   d) Gain Loss

Answer: d) Gain Loss


19. Which of the following is not a non-linear activation function in machine learning?
   a) Sigmoid function
   b) ReLU function
   c) Tanh function
   d) Linear function

Answer: d) Linear function


20. What is the purpose of a word cloud in data visualization?
   a) To show the distribution of categorical data
   b) To show the distribution of numerical data
   c) To visualize the frequency of words in text data
   d) To show the trend over time

Answer: c) To visualize the frequency of words in text data


21. Which of the following is not a type of feature selection method in machine learning?
   a) Filter methods
   b) Wrapper methods
   c) Embedded methods
   d) Imposed methods

Answer: d) Imposed methods


22. What is the purpose of a correlation matrix in data visualization?
   a) To show the distribution of categorical data
   b) To show the distribution of numerical data
   c) To visualize the correlation between multiple variables
   d) To show the trend over time

Answer: c) To visualize the correlation between multiple variables


23. Which of the following is not a type of dimensionality reduction technique in machine learning?
   a) Principal Component Analysis (PCA)
   b) Linear Discriminant Analysis (LDA)
   c) t-Distributed Stochastic Neighbor Embedding (t-SNE)
   d) Linear Component Analysis (LCA)

Answer: d) Linear Component Analysis (LCA)


24. Which of the following is not a type of data scaling method in machine learning?
   a) Min-Max scaling
   b) Standard scaling
   c) Robust scaling
   d) Random scaling

Answer: d) Random scaling


25. Which of the following is not a type of kernel in Support Vector Machine (SVM)?
   a) Linear kernel
   b) Polynomial kernel
   c) Radial basis function (RBF) kernel
   d) Circular kernel

Answer: d) Circular kernel


26. Which of the following is not a type of cross-validation method in machine learning?
   a) K-Fold cross-validation
   b) Stratified K-Fold cross-validation
   c) Leave-One-Out cross-validation
   d) Enter-One-Out cross-validation

Answer: d) Enter-One-Out cross-validation

Reminder:

a) K-Fold Cross-Validation is a statistical method used to estimate the generalization capacity of machine learning models by dividing the data into 'K' folds where each fold is used once as a testing set while the rest serves as the training set.

b) Stratified K-Fold Cross-Validation is a variation of K-Fold that returns stratified folds, made by preserving the percentage of samples for each class in each fold. It's useful for imbalanced datasets, where one class is much larger than the others.

c) Leave-One-Out Cross-Validation makes 'N' folds in a dataset of 'N' instances. In each fold, one data point is used for testing and the rest for training. It is essentially K-Fold cross-validation with K equal to the number of data points.
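
Example: the fold splitting described above can be sketched with index lists. This is a minimal illustration that assumes contiguous, unshuffled folds (real workflows usually shuffle first, and stratified splitting is not shown); the function name `k_fold_indices` is made up for the example.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds over range(n)."""
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n) if i < start or i >= stop]
        yield train, test
        start = stop


# 3-fold split of 6 samples: each sample is in the test set exactly once.
folds = list(k_fold_indices(6, 3))
for train, test in folds:
    print("train:", train, "test:", test)

# Leave-One-Out is the special case K = N.
loo = list(k_fold_indices(4, 4))
```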


27. Which of the following is not a type of regularization technique in machine learning?
   a) L1 regularization (Lasso)
   b) L2 regularization (Ridge)
   c) Elastic Net regularization
   d) L3 regularization

Answer: d) L3 regularization
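
Example: the three penalties in options a) to c) differ only in the term added to the loss. A minimal sketch under one common convention (libraries differ in scaling factors such as 1/2 on the L2 term); `lam` is the regularization strength and `l1_ratio` the mix between L1 and L2, both names chosen for illustration.

```python
def l1_penalty(weights, lam):
    """Lasso penalty: lam * sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """Ridge penalty: lam * sum of squared weights."""
    return lam * sum(w * w for w in weights)


def elastic_net_penalty(weights, lam, l1_ratio):
    """Elastic Net: a weighted mix of the L1 and L2 penalties."""
    return (l1_ratio * l1_penalty(weights, lam)
            + (1 - l1_ratio) * l2_penalty(weights, lam))


weights = [1.0, -2.0, 0.0]
print(l1_penalty(weights, 0.1),
      l2_penalty(weights, 0.1),
      elastic_net_penalty(weights, 0.1, 0.5))
```

With `l1_ratio` set to 1 the Elastic Net penalty reduces to Lasso, and with 0 it reduces to Ridge.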