Data Analyst Screening Test
By mfh.officials@gmail.com / January 17, 2025

1 / 20 Which visualization tool is primarily used for creating dashboards?
NumPy
Tableau
Scikit-learn
Matplotlib

2 / 20 Which of the following is a feature of NoSQL databases like MongoDB?
Schema-based
Scalability and flexibility
Fixed data types
Relational structure

3 / 20 In Python’s pandas library, which method is used to merge DataFrames on a common key?
df.merge()
df.append()
df.concat()
df.join()

4 / 20 Which metric is commonly used to evaluate regression models?
Accuracy
Confusion Matrix
R-squared
Precision

5 / 20 What type of machine learning algorithm is K-Means?
Semi-supervised
Reinforcement
Unsupervised
Supervised

6 / 20 When should you use a boxplot?
To display the frequency of categories.
To identify relationships between variables.
To compare distributions between groups.
To calculate probabilities.

7 / 20 Which machine learning library is used for building predictive models in Python?
NumPy
Scikit-learn
Matplotlib
Pandas

8 / 20 What does the p-value in a hypothesis test represent?
The probability of observing data at least as extreme as the observed data, given that the null hypothesis is true.
The confidence level of the hypothesis.
The margin of error in the dataset.
The probability of rejecting the null hypothesis when it is true.

9 / 20 Which statistical test is commonly used to compare the means of two groups?
ANOVA
Chi-square test
t-test
Regression analysis

10 / 20 What does one-hot encoding do in data preprocessing?
Normalizes numerical features.
Creates a summary of the dataset.
Converts categorical variables into numerical form.
Removes duplicate data.

11 / 20 What is overfitting in machine learning?
When the model performs poorly on training data but well on test data.
When the model fails to train completely.
When the model generalizes well to unseen data.
When the model performs exceptionally well on training data but poorly on test data.

12 / 20 In a classification problem, which metric balances precision and recall?
ROC Curve
Mean Squared Error
F1-Score
Accuracy

13 / 20 Which of the following methods can be used to handle outliers in a dataset?
Replace them with the mean or median
Drop them
Transform the data using a log or square root
All of the above

14 / 20 Which correlation coefficient represents the strongest relationship?
0.85
-0.95
-0.5
0

15 / 20 In Power BI, what does DAX stand for?
Data Analytics Execution
Data Assessment Extension
Data Analysis Expressions
Data Aggregation Expression

16 / 20 Which SQL command is used to remove specific rows from a table?
TRUNCATE
DELETE
REMOVE
DROP

17 / 20 What is the purpose of standardization in data preprocessing?
To identify outliers.
To remove missing values.
To scale data to a fixed range.
To give the data a mean of 0 and a standard deviation of 1.

18 / 20 What does it mean when a dataset is said to have “missing data”?
Duplicate rows exist.
Values are in the wrong format.
Columns have the same values.
Some values are undefined or absent.

19 / 20 Which measure indicates the spread of data in a dataset?
Mean
Mode
Median
Variance

20 / 20 Which technique is used to reduce the dimensions of a dataset?
Data sampling
Principal Component Analysis (PCA)
Normalization
K-Means Clustering
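Questions 3 and 10 refer to merging tables on a common key and to one-hot encoding. The sketch below illustrates both ideas in plain Python, assuming small lists of dicts stand in for DataFrames; `inner_merge`, `one_hot` and the sample rows are hypothetical names for illustration, not pandas APIs. In pandas itself the corresponding calls are `df.merge(other, on="id")`, which performs an inner join by default, and `pd.get_dummies`.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def inner_merge(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Keep only pairs of rows whose `key` values match (an inner join),
    which is what df.merge(other, on=key) does by default in pandas."""
    # Index the right-hand rows by key so each lookup is a dict access.
    index: Dict[Any, List[Row]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    merged: List[Row] = []
    for row in left:
        # Left rows with no matching key are dropped, as in an inner join.
        for match in index.get(row[key], []):
            merged.append({**row, **match})
    return merged


def one_hot(values: List[Any]) -> List[Dict[str, int]]:
    """Replace one categorical column with one 0/1 indicator column per category."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


# Hypothetical sample data for illustration only.
customers = [{"id": 1, "plan": "basic"}, {"id": 2, "plan": "pro"}, {"id": 3, "plan": "basic"}]
orders = [{"id": 1, "amount": 250}, {"id": 3, "amount": 90}]

print(inner_merge(customers, orders, "id"))
print(one_hot([c["plan"] for c in customers]))
```

Unlike this sketch, pandas also supports left, right and outer joins through the `how` parameter of `merge`.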
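Questions 12, 14, 17 and 19 rest on small formulas: the F1-score is the harmonic mean of precision and recall, the strength of a correlation is the absolute value of its coefficient (the sign only gives direction), standardization maps each value x to (x - mean) / standard deviation, and variance measures spread. A minimal sketch using only Python's standard library follows; the helper names `standardize`, `f1` and `strongest` and the sample numbers are hypothetical, chosen for illustration.

```python
import statistics
from typing import List


def standardize(xs: List[float]) -> List[float]:
    """Rescale values to mean 0 and standard deviation 1 (z-scores).
    Uses the population standard deviation, as scikit-learn's StandardScaler does."""
    mu = statistics.mean(xs)
    sigma = statistics.pstdev(xs)
    return [(x - mu) / sigma for x in xs]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1-score: harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def strongest(coefficients: List[float]) -> float:
    """Return the coefficient with the largest absolute value."""
    return max(coefficients, key=abs)


# Hypothetical sample values for illustration only.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(statistics.pvariance(data))  # 4.0
z = standardize(data)
print(statistics.mean(z), statistics.pstdev(z))
print(f1(tp=8, fp=2, fn=4))
print(strongest([0.85, -0.95, -0.5, 0.0]))  # -0.95
```

Population variance (`pvariance`) divides by n; the sample variance (`statistics.variance`) divides by n - 1 and is the usual choice when estimating spread from a sample.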