Data Science Screening Test

By mfh.officials@gmail.com / January 7, 2025

1 / 20

Which of the following metrics is used to evaluate clustering models?

F1-score

Precision

Silhouette score

ROC-AUC

2 / 20

In random forests, what is the primary advantage over a single decision tree?

It is more interpretable

It uses more training data

It reduces variance and improves accuracy by averaging predictions

It always uses shallow trees

3 / 20

Which of the following algorithms is most appropriate for predicting a continuous outcome variable?

K-nearest neighbors (classification)

Decision tree classification

Linear regression

K-means clustering

4 / 20

What is the purpose of regularization in machine learning?

To improve model interpretability

To speed up training

To reduce overfitting by penalizing large coefficients

To increase model complexity

5 / 20

Which of the following libraries is primarily used for deep learning?

scikit-learn

TensorFlow

Matplotlib

pandas

6 / 20

Which of the following is a key assumption of the linear regression model?

Non-linearity between independent and dependent variables

Homoscedasticity

Independence of dependent variables

Multicollinearity

7 / 20

Which of the following is a disadvantage of the k-nearest neighbors (KNN) algorithm?

It is difficult to interpret

It assumes linearity in the data

It cannot handle multi-class classification

It requires a large amount of training data

8 / 20

Which of the following is the most appropriate way to deal with missing values in a dataset?

Replace missing values with the mean or median of the column

Replace missing values with zeros

Drop all rows with missing values

Use a model-based imputation method

9 / 20

Which of the following techniques is used for dimensionality reduction?

Decision Trees

Principal Component Analysis (PCA)

Naive Bayes

K-means clustering

10 / 20

In the context of model evaluation, what does "ROC curve" stand for?

Receiver Operating Characteristic Curve

Residual Output Curve

Root Output Curve

Recurrent Operations Curve

11 / 20

Which evaluation metric is most appropriate for imbalanced classification problems?

R-squared

Accuracy

Mean Squared Error

Precision and Recall

12 / 20

Which of the following is an example of unsupervised learning?

Decision Trees

Principal Component Analysis (PCA)

Logistic Regression

Linear Regression

13 / 20

The "curse of dimensionality" refers to:

The difficulty in finding a suitable machine learning model

The increasing complexity of models with more features

The tendency of high-dimensional data to become sparse

The difficulty of visualizing data in high-dimensional spaces

14 / 20

Cross-validation is primarily used to:

Split data into training and testing sets

Reduce overfitting and assess model performance

Visualize data

Tune hyperparameters

15 / 20

Which of the following is TRUE about ensemble methods?

They are always more computationally expensive than single models

They cannot be used with decision trees

They combine multiple weak models to create a stronger model

They are less prone to overfitting than single models

16 / 20

In a time series forecasting problem, which of the following is most commonly used to check for stationarity?

Shapiro-Wilk test

Augmented Dickey-Fuller (ADF) test

ACF/PACF plots

Durbin-Watson test

17 / 20

In a decision tree, the split criterion is typically based on:

Information gain or Gini impurity

Sum of squared errors

Root mean squared error

Variance

18 / 20

What is the purpose of the Adam optimizer in neural networks?

To reduce the loss function

To calculate the gradient of the loss function

To adjust the learning rate during training

To perform backpropagation

19 / 20

Which of the following is a hyperparameter for the k-means clustering algorithm?

Number of clusters (k)

Learning rate

Regularization strength

Activation function

20 / 20

What is the "bias-variance tradeoff"?

Increasing model complexity reduces both bias and variance

Increasing model complexity does not affect bias or variance

Increasing model complexity reduces bias and increases variance

Increasing model complexity reduces variance and increases bias
