Data Science Screening Test
By mfh.officials@gmail.com / January 7, 2025

1 / 20: Which of the following metrics is used to evaluate clustering models?
- F1-score
- Precision
- Silhouette score
- ROC-AUC

2 / 20: In random forests, what is the primary advantage over a single decision tree?
- It is more interpretable
- It uses more training data
- It reduces variance and improves accuracy by averaging predictions
- It always uses shallow trees

3 / 20: Which of the following algorithms is most appropriate for predicting a continuous outcome variable?
- K-nearest neighbors (classification)
- Decision tree classification
- Linear regression
- K-means clustering

4 / 20: What is the purpose of regularization in machine learning?
- To improve model interpretability
- To speed up training
- To reduce overfitting by penalizing large coefficients
- To increase model complexity

5 / 20: Which of the following libraries is primarily used for deep learning?
- scikit-learn
- TensorFlow
- Matplotlib
- pandas

6 / 20: Which of the following is a key assumption of the linear regression model?
- Non-linearity between independent and dependent variables
- Homoscedasticity
- Independence of dependent variables
- Multicollinearity

7 / 20: Which of the following is a disadvantage of the k-nearest neighbors (KNN) algorithm?
- It is difficult to interpret
- It assumes linearity in the data
- It cannot handle multi-class classification
- It requires a large amount of training data

8 / 20: Which of the following is the most appropriate way to deal with missing values in a dataset?
- Replace missing values with the mean or median of the column
- Replace missing values with zeros
- Drop all rows with missing values
- Use a model-based imputation method

9 / 20: Which of the following techniques is used for dimensionality reduction?
- Decision trees
- Principal Component Analysis (PCA)
- Naive Bayes
- K-means clustering

10 / 20: In the context of model evaluation, what does "ROC curve" stand for?
- Receiver Operating Characteristic curve
- Residual Output curve
- Root Output curve
- Recurrent Operations curve

11 / 20: Which evaluation metric is most appropriate for imbalanced classification problems?
- R-squared
- Accuracy
- Mean squared error
- Precision and recall

12 / 20: Which of the following is an example of unsupervised learning?
- Decision trees
- Principal Component Analysis (PCA)
- Logistic regression
- Linear regression

13 / 20: The "curse of dimensionality" refers to:
- The difficulty in finding a suitable machine learning model
- The increasing complexity of models with more features
- The tendency of high-dimensional data to become sparse
- The difficulty of visualizing data in high-dimensional spaces

14 / 20: Cross-validation is primarily used to:
- Split data into training and testing sets
- Reduce overfitting and assess model performance
- Visualize data
- Tune hyperparameters

15 / 20: Which of the following is TRUE about ensemble methods?
- They are always more computationally expensive than single models
- They cannot be used with decision trees
- They combine multiple weak models to create a stronger model
- They are less prone to overfitting than single models

16 / 20: In a time series forecasting problem, which of the following is most commonly used to check for stationarity?
- Shapiro-Wilk test
- Augmented Dickey-Fuller (ADF) test
- ACF/PACF plots
- Durbin-Watson test
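As a minimal sketch of how the Augmented Dickey-Fuller test from question 16 is typically run (assuming statsmodels is installed; the random-walk series here is synthetic and purely illustrative):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

# Synthetic, purely illustrative series: a random walk is
# non-stationary by construction.
rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=500))

# ADF null hypothesis: the series has a unit root (non-stationary).
stat, pvalue, *_ = adfuller(series)
print(f"level series:       ADF stat = {stat:.2f}, p-value = {pvalue:.3f}")

# First-differencing typically removes the unit root; a small
# p-value (e.g. < 0.05) rejects non-stationarity.
stat, pvalue, *_ = adfuller(np.diff(series))
print(f"differenced series: ADF stat = {stat:.2f}, p-value = {pvalue:.3f}")
```

The other distractors in that question test different things: Shapiro-Wilk checks normality and Durbin-Watson checks residual autocorrelation, not stationarity.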
17 / 20: In a decision tree, the split criterion is typically based on:
- Information gain or Gini impurity
- Sum of squared errors
- Root mean squared error
- Variance

18 / 20: What is the purpose of the Adam optimizer in neural networks?
- To reduce the loss function
- To calculate the gradient of the loss function
- To adjust the learning rate during training
- To perform backpropagation

19 / 20: Which of the following is a hyperparameter for the k-means clustering algorithm?
- Number of clusters (k)
- Learning rate
- Regularization strength
- Activation function

20 / 20: What is the "bias-variance tradeoff"?
- Increasing model complexity reduces both bias and variance
- Increasing model complexity does not affect bias or variance
- Increasing model complexity reduces bias and increases variance
- Increasing model complexity reduces variance and increases bias
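A minimal sketch tying together questions 1 and 19: comparing candidate values of the k-means hyperparameter k with the silhouette score. It assumes scikit-learn is installed and uses a synthetic dataset purely for illustration:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic, purely illustrative data: three well-separated blobs.
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# k (the number of clusters) is the main k-means hyperparameter;
# the silhouette score (range -1 to 1, higher is better) compares
# candidate values without needing ground-truth labels.
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    print(f"k = {k}: silhouette score = {silhouette_score(X, labels):.3f}")
```

On data like this the score typically peaks at k = 3, matching the number of generated blobs.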