Machine Learning Theory Test
1. What is the purpose of the 'train' phase in machine learning? a) To test the model b) To gather data c) To fit the model to the data d) To visualize the data
Answer: c) To fit the model to the data
2. Which of the following is not a type of machine learning? a) Supervised learning b) Unsupervised learning c) Reinforcement learning d) Dependent learning
Answer: d) Dependent learning
3. What is the purpose of a confusion matrix in machine learning? a) To confuse the model b) To visualize the performance of an algorithm c) To add noise to the data d) To reduce the dimensionality of the data
Answer: b) To visualize the performance of an algorithm
Reminder:
A confusion matrix, also known as an error matrix, is a table layout that visualizes the performance of an algorithm, typically a supervised learning one. For a binary classification task it is a 2x2 matrix with four outcomes: True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN). With more than two classes, the matrix has one row and one column per class.
- True Positives (TP): The cases in which we predicted YES and the actual output was also YES.
- True Negatives (TN): The cases in which we predicted NO and the actual output was NO.
- False Positives (FP): The cases in which we predicted YES and the actual output was NO.
- False Negatives (FN): The cases in which we predicted NO and the actual output was YES.
The confusion matrix gives more insight into a classification model's performance than the accuracy score alone, because it shows the specific ways in which the model is confused when it makes predictions.
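The four outcomes above can be counted directly from paired labels. The following is a minimal sketch in plain Python; the function name `confusion_counts` and the sample labels are hypothetical, made up only for illustration.

```python
def confusion_counts(actual, predicted, positive="YES"):
    """Count TP, TN, FP and FN for a binary classification task."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for truth, guess in zip(actual, predicted):
        if guess == positive and truth == positive:
            counts["TP"] += 1  # predicted YES, actual YES
        elif guess != positive and truth != positive:
            counts["TN"] += 1  # predicted NO, actual NO
        elif guess == positive:
            counts["FP"] += 1  # predicted YES, actual NO
        else:
            counts["FN"] += 1  # predicted NO, actual YES
    return counts


# Hypothetical labels, invented purely for illustration.
actual = ["YES", "YES", "NO", "NO", "YES", "NO"]
predicted = ["YES", "NO", "NO", "YES", "YES", "NO"]

counts = confusion_counts(actual, predicted)
accuracy = (counts["TP"] + counts["TN"]) / len(actual)
print(counts, accuracy)
```

Accuracy collapses the table into a single number, while the four counts show whether the errors are mostly false positives or false negatives.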
4. Which of the following is not a type of data in data science? a) Nominal data b) Ordinal data c) Interval data d) Reaction data
Answer: d) Reaction data
Reminder: Nominal data is categorical data that can be labeled and divided into groups but cannot be quantified or ordered. Ordinal data is data that can be categorized and ordered, but the differences between the categories are not quantifiable. Interval data is numerical and can be ordered with exact differences between the values being meaningful.
5. What is the purpose of the 'test' phase in machine learning? a) To gather data b) To fit the model to the data c) To evaluate the model's performance d) To visualize the data
Answer: c) To evaluate the model's performance
6. Which of the following is not a method of data preprocessing? a) Data cleaning b) Data transformation c) Data visualization d) Data destruction
Answer: d) Data destruction
9. What is the purpose of the 'validation' phase in machine learning? a) To gather data b) To fit the model to the data c) To evaluate the model's performance on unseen data d) To visualize the data
Answer: c) To evaluate the model's performance on unseen data
10. Which of the following is not a type of regression in machine learning? a) Linear regression b) Logistic regression c) Polynomial regression d) Categorical regression
Answer: d) Categorical regression
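Pro tip: Logistic regression is generally understood to be a classifier, although the model itself outputs a probability between 0 and 1. It becomes a classifier once a decision rule is added, e.g. outputs above 0.7 are labeled true (0.5 is a common default threshold).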
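As a rough illustration of the decision rule, the sketch below applies a threshold to the output of a one-feature logistic model. The weight, bias, and function names are hypothetical; a real model would learn the parameters from data.

```python
import math


def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, weight, bias):
    """Logistic regression output for one feature: a probability."""
    return sigmoid(weight * x + bias)


def classify(x, weight, bias, threshold=0.5):
    """Decision rule: label as true when the probability exceeds the threshold."""
    return predict_proba(x, weight, bias) > threshold


# Hypothetical parameters, chosen only for illustration.
weight, bias = 1.0, 0.0

print(predict_proba(0.5, weight, bias))            # a probability, not a label
print(classify(0.5, weight, bias))                 # default threshold of 0.5
print(classify(0.5, weight, bias, threshold=0.7))  # stricter threshold
```

The same probability can produce different labels depending on the threshold, which is why the threshold is part of the classifier and not of the regression itself.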
11. Which of the following is not a type of clustering in machine learning? a) K-means clustering b) Hierarchical clustering c) DBSCAN clustering d) Random clustering
Answer: d) Random clustering
12. Which of the following is not a type of decision tree in machine learning? a) ID3 b) C4.5 c) CART d) ABCD
Answer: d) ABCD
13. Which of the following is not a type of ensemble method in machine learning? a) Bagging b) Boosting c) Stacking d) Clustering
Answer: d) Clustering
Reminder:
Bagging works by drawing multiple subsets of the original data, sampled with replacement, and then training a separate model on each subset. The final prediction is determined by aggregating the predictions of each model, typically through a majority vote for classification problems or an average for regression problems. Each model is trained on a slightly different set of data and thus makes slightly different predictions. By combining these predictions, bagging can often achieve higher accuracy and stability than any individual model.
Bagging stands for Bootstrap Aggregating. Like boosting, it builds a strong classifier from multiple weak classifiers, but the two differ in how the weak classifiers are trained. Bagging trains them in parallel, each on a random subset of the original data, and treats all instances equally, regardless of how earlier models classified them. Boosting trains the weak classifiers sequentially and focuses on correcting misclassified instances by increasing their weights in later iterations.
Stacking, or Stacked Generalization, involves training multiple different models and then combining their predictions with another machine learning model (a meta-learner) to make a final prediction.
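The bagging procedure described in this reminder can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions: the weak learner is a hypothetical one-dimensional threshold rule (`train_stump`), and the data set is invented for the example.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]


def train_stump(sample):
    """Hypothetical weak learner: pick the threshold with the fewest errors.

    The stump predicts class 1 when x is greater than the threshold.
    """
    best_threshold, best_errors = None, None
    for threshold, _ in sample:
        errors = sum((x > threshold) != bool(label) for x, label in sample)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def bagging_fit(data, n_models, seed=0):
    """Train one stump per bootstrap sample; the models are independent."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(data, rng)) for _ in range(n_models)]


def bagging_predict(thresholds, x):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]


# Invented (x, label) pairs, for illustration only.
data = [(0.1, 0), (0.4, 0), (0.5, 0), (1.5, 1), (1.8, 1), (2.2, 1)]
models = bagging_fit(data, n_models=25)
print(bagging_predict(models, 0.0), bagging_predict(models, 3.0))
```

Because each stump depends only on its own bootstrap sample, the training loop could run in parallel; a boosting variant would instead have to train the models one after another.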
King of ensemble methods:
XGBoost is an implementation of gradient boosting machines for supervised learning problems, known for its speed and performance. It often gives excellent results out of the box with very little effort, which makes it the first model to try for many analytics problems.
Unlike classical gradient boosting, it includes a regularization term in its objective function to control model complexity and reduce overfitting. The implementation is heavily parallelized for both CPU and GPU processing. XGBoost grows trees up to the maximum depth and then prunes them backwards, removing splits without positive gain. It also has built-in cross-validation, which makes it possible to find a good number of boosting iterations in a single run. Lastly, XGBoost offers flexibility by allowing users to define custom optimization objectives and evaluation criteria.
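For reference, the regularized objective is usually written as follows. The notation is an assumption taken from the common presentation of XGBoost, not from this test: l is the training loss, f_k is the k-th tree, T is the number of leaves in a tree, w is the vector of leaf weights, and gamma and lambda are regularization parameters.

```latex
\mathrm{Obj} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega\left(f_k\right),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2} \lambda \lVert w \rVert^{2}
```

The gamma term charges a fixed cost per leaf, which is why a split whose gain does not exceed that cost is removed during pruning.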
14. Which of the following is not a type of neural network in machine learning? a) Convolutional Neural Network b) Recurrent Neural Network c) Feedforward Neural Network d) Backward Neural Network
Answer: d) Backward Neural Network
15. Which of the following is not a type of distance measure in machine learning? a) Euclidean distance b) Manhattan distance c) Minkowski distance d) Polar distance
Answer: d) Polar distance
16. What is the purpose of a dendrogram in data visualization? a) To show the distribution of categorical data b) To show the hierarchical relationship between objects c) To show the distribution of numerical data d) To show the trend over time
Answer: b) To show the hierarchical relationship between objects
17. Which of the following is not a type of optimization algorithm in machine learning? a) Gradient Descent b) Stochastic Gradient Descent c) Adam d) Linear Descent
Answer: d) Linear Descent
18. Which of the following is not a type of loss function in machine learning? a) Mean Squared Error b) Cross-Entropy Loss c) Hinge Loss d) Gain Loss
Answer: d) Gain Loss
19. Which of the following is not a nonlinear activation function in machine learning? a) Sigmoid function b) ReLU function c) Tanh function d) Linear function
Answer: d) Linear function
20. What is the purpose of a word cloud in data visualization? a) To show the distribution of categorical data b) To show the distribution of numerical data c) To visualize the frequency of words in text data d) To show the trend over time
Answer: c) To visualize the frequency of words in text data
21. Which of the following is not a type of feature selection method in machine learning? a) Filter methods b) Wrapper methods c) Embedded methods d) Imposed methods
Answer: d) Imposed methods
22. What is the purpose of a correlation matrix in data visualization? a) To show the distribution of categorical data b) To show the distribution of numerical data c) To visualize the correlation between multiple variables d) To show the trend over time
Answer: c) To visualize the correlation between multiple variables
23. Which of the following is not a type of dimensionality reduction technique in machine learning? a) Principal Component Analysis (PCA) b) Linear Discriminant Analysis (LDA) c) t-Distributed Stochastic Neighbor Embedding (t-SNE) d) Linear Component Analysis (LCA)
Answer: d) Linear Component Analysis (LCA)
24. Which of the following is not a type of data scaling method in machine learning? a) Min-Max scaling b) Standard scaling c) Robust scaling d) Random scaling
Answer: d) Random scaling
25. Which of the following is not a type of kernel in Support Vector Machine (SVM)? a) Linear kernel b) Polynomial kernel c) Radial basis function (RBF) kernel d) Circular kernel
Answer: d) Circular kernel
26. Which of the following is not a type of cross-validation method in machine learning? a) K-Fold cross-validation b) Stratified K-Fold cross-validation c) Leave-One-Out cross-validation d) Enter-One-Out cross-validation
Answer: d) Enter-One-Out cross-validation
Reminder:
a) K-Fold Cross-Validation is a statistical method used to estimate the generalization capacity of machine learning models by dividing the data into 'K' folds where each fold is used once as a testing set while the rest serves as the training set.
b) Stratified K-Fold Cross-Validation is a variation of K-Fold that returns stratified folds made by preserving the percentage of samples for each class in each fold. It is useful for imbalanced datasets, where one class is much more frequent than the others.
c) Leave-One-Out Cross-Validation makes 'N' folds in a dataset of 'N' instances. In each fold, one data point is used for testing and the rest for training. It is essentially K-Fold cross-validation with K equal to the number of data points.
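The fold construction described above can be sketched as an index generator. This is a minimal illustration, assuming no shuffling and no stratification; the function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# 3-fold split of 6 samples: each sample is tested exactly once.
folds = list(k_fold_indices(6, 3))
for train, test in folds:
    print("train:", train, "test:", test)

# Leave-One-Out is the special case where k equals the number of samples.
loo = list(k_fold_indices(4, 4))
```

A stratified variant would additionally group the indices by class label before assigning them to folds, so that each fold keeps roughly the same class proportions.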
27. Which of the following is not a type of regularization technique in machine learning? a) L1 regularization (Lasso) b) L2 regularization (Ridge) c) Elastic Net regularization d) L3 regularization
Answer: d) L3 regularization