Top Gotcha Data Science Interview Questions

Top Question 1: How can you use binary classification algorithms like Logistic Regression for the multiclass problem, where instead of just 2 different classes, you have n different classes?

You can use One Vs. All classification. In One Vs. All classification, you transform the multiclass classification task into multiple binary classification problems. For example, if you have 3 classes, you turn the problem into 3 separate binary classification tasks.
For every individual class i, you build a classifier where data points belonging to class i are the positive samples, and all the rest of the data points are the negative samples.
To predict the output for a new test data point x, you apply all the trained classifiers to x, and predict the label whose classifier outputs the highest confidence score.
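The scheme above can be sketched with scikit-learn's Logistic Regression on the 3-class iris dataset (illustrative choices, not part of the original question):

```python
# One-vs-All sketch: one binary classifier per class, assuming scikit-learn
# is available and using the iris dataset (3 classes) as a toy example.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Train one binary classifier per class: class i vs. everything else.
classifiers = []
for i in np.unique(y):
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, (y == i).astype(int))  # class i -> positive, rest -> negative
    classifiers.append(clf)

def predict(X_new):
    # Each classifier scores every point; pick the class with the highest score.
    scores = np.column_stack([clf.decision_function(X_new) for clf in classifiers])
    return scores.argmax(axis=1)

preds = predict(X)
accuracy = (preds == y).mean()
```

Note that scikit-learn also provides this wiring ready-made (e.g. `OneVsRestClassifier`); the loop here just makes the mechanics explicit.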

Top Question 2: What is a potential limitation of using R-squared as your evaluation metric for regression algorithms?

The R-squared value never decreases as additional features are added to the model, even when those features carry no real predictive power. To avoid rewarding this kind of overfitting, use the adjusted R-squared metric instead, which penalizes the model for each added feature.
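A minimal sketch of the adjusted R-squared formula; the sample size, feature counts, and R-squared value below are illustrative:

```python
def adjusted_r2(r2, n, p):
    """r2: ordinary R-squared, n: number of samples, p: number of features."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# The same R-squared is penalized more heavily as the feature count grows.
print(adjusted_r2(0.90, n=100, p=5))   # ~0.8947
print(adjusted_r2(0.90, n=100, p=50))  # ~0.7980
```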

Top Question 3: What is the difference between Lasso Regression and Ridge Regression?

The main difference is in their penalty functions. The penalty function in Ridge regression is the sum of the squares of the coefficients, whereas in Lasso regression, the penalty function is the sum of the absolute values of the coefficients. Lasso regression can set some of the coefficients to zero, outputting a simpler model with fewer features.
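The sparsity difference can be demonstrated on synthetic data (the data, penalty strengths, and scikit-learn usage below are illustrative assumptions):

```python
# Sketch: Lasso zeroes out irrelevant coefficients, Ridge only shrinks them.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# Only the first 2 features matter; the other 8 are pure noise.
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

# Ridge shrinks coefficients toward zero but rarely to exactly zero;
# Lasso typically drives the noise-feature coefficients to exactly zero,
# effectively performing feature selection.
print((ridge.coef_ == 0).sum())
print((lasso.coef_ == 0).sum())
```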

Top Question 4: Name 3 methods for which data normalization or standardization is recommended.

Support Vector Machines, K-Means, Principal Component Analysis, Neural Networks, k-Nearest-Neighbors, and regularized regression methods all benefit from these preprocessing steps, since they are sensitive to the relative scale of the input features.
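As a small sketch, standardization with scikit-learn's `StandardScaler` (the age/salary numbers are made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Features on wildly different scales, e.g. age vs. salary (illustrative values).
X = np.array([[25.0, 40000.0],
              [35.0, 60000.0],
              [45.0, 80000.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# After standardization each column has mean 0 and unit variance, so no single
# feature dominates distance-based methods like k-NN or K-Means.
print(X_scaled.mean(axis=0))  # ~[0, 0]
print(X_scaled.std(axis=0))   # ~[1, 1]
```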

Top Question 5: What is the difference between Feature Extraction and Feature Selection?

Both Feature Extraction and Feature Selection attempt to prevent overfitting by simplifying the learned models. Feature extraction starts from an initial set of data and builds new features which are more informative, using dimensionality reduction techniques such as Principal Component Analysis (PCA) and Singular Value Decomposition (SVD). Feature Selection on the other hand doesn't create new features, but selects only a subset of the most important and relevant features.
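The contrast can be sketched on the iris dataset (using PCA for extraction and scikit-learn's `SelectKBest` for selection; these particular tools and the choice of 2 components are illustrative assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)  # 4 original features

# Feature extraction: PCA builds 2 NEW features,
# each a linear combination of all 4 originals.
X_pca = PCA(n_components=2).fit_transform(X)

# Feature selection: keep the 2 most relevant ORIGINAL features, unchanged.
selector = SelectKBest(f_classif, k=2).fit(X, y)
X_sel = selector.transform(X)

print(X_pca.shape)  # (150, 2)
print(X_sel.shape)  # (150, 2)
```

Both outputs have 2 columns, but the selected columns are copies of original features, while the PCA columns are new constructed ones.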

Top Question 6: What is the F1-Score, and why is it useful?

The F1-score is a metric that often comes up in classification tasks where the dataset is very imbalanced, with the majority class occurring much more often than the minority class. In such cases, accuracy is a misleading metric, since a model that always predicts the majority class can still score highly. The F1-score is the harmonic mean of precision and recall (precision is the percentage of predicted positives that are truly positive, while recall is the percentage of actual positives that your model correctly identifies).
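A small worked sketch, computing precision, recall, and F1 by hand on made-up labels:

```python
# Toy labels: 4 positives, 6 negatives (illustrative data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives: 2
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives: 1
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives: 2

precision = tp / (tp + fp)  # 2/3
recall = tp / (tp + fn)     # 2/4 = 0.5
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))  # 0.571
```

The same result comes from `sklearn.metrics.f1_score(y_true, y_pred)`.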

Top Question 7: Should the training and test datasets be preprocessed together?

The test dataset should never be impacted by the training dataset preparation: fit any preprocessing (scaling parameters, imputation values, encodings) on the training data only, then apply it unchanged to the test data. Otherwise, test-set information leaks into the model, and the test dataset no longer provides an unbiased evaluation of the final learned model.
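The correct workflow can be sketched with scikit-learn's `StandardScaler` (the toy data and split parameters below are illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.arange(20, dtype=float).reshape(-1, 1)  # toy feature matrix
X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

# Correct: fit the scaler on the training data only...
scaler = StandardScaler().fit(X_train)
# ...then apply the SAME fitted parameters to the test data.
X_test_scaled = scaler.transform(X_test)

# Calling scaler.fit(X) on the full dataset would leak test-set statistics
# (mean, variance) into preprocessing, biasing the final evaluation.
```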

Top Question 8: Explain the difference between a left outer join and an inner join between two datasets.

Having to join datasets comes up all the time in your job as a data scientist, so companies will check that you know your SQL joins.
In an Inner Join, you extract all records that match a specified condition in both tables. Any record in either table that does not match the condition is not reported.
In a Left Outer Join, you return all the rows from the left table, along with the matching rows from the right table; left-table rows with no match are still returned, with NULLs filled in for the right table's columns.
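The same joins can be sketched in pandas, which mirrors SQL's `INNER JOIN` and `LEFT OUTER JOIN` via the `how` parameter of `merge` (the employee/salary tables are made up for illustration):

```python
import pandas as pd

employees = pd.DataFrame({"emp_id": [1, 2, 3],
                          "name": ["Ann", "Bob", "Cy"]})
salaries = pd.DataFrame({"emp_id": [1, 2],
                         "salary": [70000, 80000]})

# Inner join: only emp_ids present in BOTH tables (1 and 2).
inner = employees.merge(salaries, on="emp_id", how="inner")

# Left outer join: every row of the left table; the unmatched
# employee (emp_id 3) appears with NaN in the salary column.
left = employees.merge(salaries, on="emp_id", how="left")

print(len(inner))  # 2
print(len(left))   # 3
```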