Data Analyst Screening Test
By mfh.officials@gmail.com / January 17, 2025

1 / 20 What is overfitting in machine learning?
When the model performs exceptionally well on training data but poorly on test data.
When the model performs poorly on training data but well on test data.
When the model fails to train completely.
When the model generalizes well on unseen data.

2 / 20 Which correlation coefficient represents the strongest relationship?
-0.5
0.85
0
-0.95

3 / 20 Which metric is commonly used to evaluate regression models?
R-squared
Precision
Confusion Matrix
Accuracy

4 / 20 In Python's pandas library, which method is used to merge DataFrames on a common key?
df.append()
df.join()
df.merge()
df.concat()

5 / 20 What does the p-value in a hypothesis test represent?
The probability of observing data at least as extreme as the sample, given that the null hypothesis is true.
The probability of rejecting the null hypothesis when it's true.
The margin of error in the dataset.
The confidence level of the hypothesis.

6 / 20 Which measure indicates the spread of data in a dataset?
Variance
Mean
Mode
Median

7 / 20 Which method is best for handling outliers in a dataset?
Drop them
All of the above
Replace with the mean or median
Transform data using log or square root

8 / 20 Which machine learning library is used for building predictive models in Python?
Scikit-learn
NumPy
Matplotlib
Pandas

9 / 20 In a classification problem, which metric indicates the balance between precision and recall?
F1-Score
ROC Curve
Accuracy
Mean Squared Error

10 / 20 Which SQL command is used to remove rows from a table?
DROP
REMOVE
TRUNCATE
DELETE

11 / 20 Which visualization tool is primarily used for creating dashboards?
NumPy
Tableau
Scikit-learn
Matplotlib

12 / 20 What does one-hot encoding do in data preprocessing?
Creates a summary of the dataset.
Removes duplicate data.
Converts categorical variables into numerical form.
Normalizes numerical features.

13 / 20 What is the purpose of standardization in data preprocessing?
To remove missing values.
To make data have a mean of 0 and standard deviation of 1.
To scale data to a fixed range.
To identify outliers.

14 / 20 What does it mean when a dataset is said to have "missing data"?
Some values are undefined or absent.
Columns have the same values.
Values are in the wrong format.
Duplicate rows exist.

15 / 20 In Power BI, what does DAX stand for?
Data Aggregation Expression
Data Analytics Execution
Data Assessment Extension
Data Analysis Expressions

16 / 20 Which of the following is a feature of NoSQL databases like MongoDB?
Fixed data types
Relational structure
Scalability and flexibility
Schema-based

17 / 20 Which statistical test is commonly used to compare means of two groups?
Chi-square test
Regression analysis
ANOVA
t-test

18 / 20 What type of machine learning algorithm is K-Means?
Semi-supervised
Reinforcement
Supervised
Unsupervised

19 / 20 Which technique is used to reduce the dimensions of a dataset?
Principal Component Analysis (PCA)
Normalization
K-Means Clustering
Data sampling

20 / 20 When should you use a boxplot?
To display the frequency of categories.
To compare distributions between groups.
To identify relationships between variables.
To calculate probabilities.