Data science Screen Test
By mfh.officials@gmail.com / January 7, 2025

Data science Screening test

1 / 20
Which of the following is the most appropriate way to deal with missing values in a dataset?
- Replace missing values with zeros
- Drop all rows with missing values
- Replace missing values with the mean or median of the column
- Use a model-based imputation method

2 / 20
Which of the following is TRUE about ensemble methods?
- They combine multiple weak models to create a stronger model
- They cannot be used with decision trees
- They are less prone to overfitting than single models
- They are always more computationally expensive than single models

3 / 20
What is the "bias-variance tradeoff"?
- Increasing model complexity reduces bias and increases variance
- Increasing model complexity does not affect bias or variance
- Increasing model complexity reduces variance and increases bias
- Increasing model complexity reduces both bias and variance

4 / 20
What is the purpose of the Adam optimizer in neural networks?
- To adjust the learning rate during training
- To reduce the loss function
- To calculate the gradient of the loss function
- To perform backpropagation

5 / 20
Which of the following techniques is used for dimensionality reduction?
- Naive Bayes
- Decision Trees
- Principal Component Analysis (PCA)
- K-means clustering

6 / 20
The "curse of dimensionality" refers to:
- The difficulty of visualizing data in high-dimensional spaces
- The tendency of high-dimensional data to become sparse
- The increasing complexity of models with more features
- The difficulty in finding a suitable machine learning model

7 / 20
Which of the following is a hyperparameter for the k-means clustering algorithm?
- Learning rate
- Activation function
- Regularization strength
- Number of clusters (k)

8 / 20
In random forests, what is the primary advantage over a single decision tree?
- It uses more training data
- It reduces variance and improves accuracy by averaging predictions
- It always uses shallow trees
- It is more interpretable

9 / 20
Which of the following algorithms is most appropriate for predicting a continuous outcome variable?
- Decision tree classification
- K-means clustering
- K-nearest neighbors (classification)
- Linear regression

10 / 20
In the context of model evaluation, what does "ROC curve" stand for?
- Root Output Curve
- Recurrent Operations Curve
- Receiver Operating Characteristic Curve
- Residual Output Curve

11 / 20
Which of the following is a disadvantage of the k-nearest neighbors (KNN) algorithm?
- It requires a large amount of training data
- It assumes linearity in the data
- It cannot handle multi-class classification
- It is difficult to interpret

12 / 20
Which of the following is an example of unsupervised learning?
- Logistic Regression
- Linear Regression
- Principal Component Analysis (PCA)
- Decision Trees

13 / 20
Cross-validation is primarily used to:
- Tune hyperparameters
- Split data into training and testing sets
- Reduce overfitting and assess model performance
- Visualize data

14 / 20
What is the purpose of regularization in machine learning?
- To speed up training
- To reduce overfitting by penalizing large coefficients
- To improve model interpretability
- To increase model complexity

15 / 20
Which evaluation metric is most appropriate for imbalanced classification problems?
- Accuracy
- Mean Squared Error
- R-squared
- Precision and Recall

16 / 20
Which of the following metrics is used to evaluate clustering models?
- Precision
- F1-score
- Silhouette score
- ROC-AUC

17 / 20
Which of the following libraries is primarily used for deep learning?
- pandas
- Matplotlib
- scikit-learn
- TensorFlow

18 / 20
Which of the following is a key assumption of the linear regression model?
- Multicollinearity
- Independence of dependent variables
- Non-linearity between independent and dependent variables
- Homoscedasticity

19 / 20
In a decision tree, the split criterion is typically based on:
- Sum of squared errors
- Variance
- Root mean squared error
- Information gain or Gini impurity

20 / 20
In a time series forecasting problem, which of the following is most commonly used to check for stationarity?
- ACF/PACF plots
- Augmented Dickey-Fuller (ADF) test
- Durbin-Watson test
- Shapiro-Wilk test