Data science Screen Test
By mfh.officials@gmail.com / January 7, 2025

1 / 20 Which of the following is TRUE about ensemble methods?
They cannot be used with decision trees
They combine multiple weak models to create a stronger model
They are less prone to overfitting than single models
They are always more computationally expensive than single models

2 / 20 What is the purpose of the Adam optimizer in neural networks?
To perform backpropagation
To adjust the learning rate during training
To calculate the gradient of the loss function
To reduce the loss function

3 / 20 Which of the following techniques is used for dimensionality reduction?
Principal Component Analysis (PCA)
Decision Trees
K-means clustering
Naive Bayes

4 / 20 Which of the following is an example of unsupervised learning?
Principal Component Analysis (PCA)
Linear Regression
Logistic Regression
Decision Trees

5 / 20 In a decision tree, the split criterion is typically based on:
Variance
Sum of squared errors
Root mean squared error
Information gain or Gini impurity

6 / 20 Which of the following is the most appropriate way to deal with missing values in a dataset?
Replace missing values with the mean or median of the column
Drop all rows with missing values
Replace missing values with zeros
Use a model-based imputation method

7 / 20 Which of the following libraries is primarily used for deep learning?
TensorFlow
scikit-learn
Matplotlib
pandas

8 / 20 What is the "bias-variance tradeoff"?
Increasing model complexity reduces both bias and variance
Increasing model complexity reduces bias and increases variance
Increasing model complexity reduces variance and increases bias
Increasing model complexity does not affect bias or variance

9 / 20 Which of the following is a disadvantage of the k-nearest neighbors (KNN) algorithm?
Answer: B) It requires a large amount of training data
It cannot handle multi-class classification
It is difficult to interpret
It assumes linearity in the data
It requires a large amount of training data

10 / 20 What is the purpose of regularization in machine learning?
To speed up training
To increase model complexity
To improve model interpretability
To reduce overfitting by penalizing large coefficients

11 / 20 Which of the following metrics is used to evaluate clustering models?
Silhouette score
Precision
F1-score
ROC-AUC

12 / 20 In the context of model evaluation, what does the "ROC curve" stand for?
Root Output Curve
Recurrent Operations Curve
Residual Output Curve
Receiver Operating Characteristic Curve

13 / 20 Which evaluation metric is most appropriate for imbalanced classification problems?
R-squared
Precision and Recall
Accuracy
Mean Squared Error

14 / 20 Which of the following is a hyperparameter for the k-means clustering algorithm?
Activation function
Learning rate
Number of clusters (k)
Regularization strength

15 / 20 In a time series forecasting problem, which of the following is most commonly used to check for stationarity?
Augmented Dickey-Fuller (ADF) test
Shapiro-Wilk test
ACF/PACF plots
Durbin-Watson test

16 / 20 Which of the following is a key assumption of the linear regression model?
Independence of dependent variables
Multicollinearity
Homoscedasticity
Non-linearity between independent and dependent variables

17 / 20 Which of the following algorithms is most appropriate for predicting a continuous outcome variable?
Linear regression
K-means clustering
K-nearest neighbors (classification)
Decision tree classification

18 / 20 The "curse of dimensionality" refers to:
The increasing complexity of models with more features
The difficulty of visualizing data in high-dimensional spaces
The difficulty in finding a suitable machine learning model
The tendency of high-dimensional data to become sparse

19 / 20 In random forests, what is the primary advantage over a single decision tree?
It is more interpretable
It always uses shallow trees
It uses more training data
It reduces variance and improves accuracy by averaging predictions

20 / 20 Cross-validation is primarily used to:
Reduce overfitting and assess model performance
Split data into training and testing sets
Tune hyperparameters
Visualize data