Be prepared with at least one algorithm you can describe in great detail. Be familiar with **how the algorithm works**,
the **use cases** it is best suited for, the **potential caveats** the algorithm
has, and the **data preprocessing steps** necessary for the algorithm to perform well.

- bias is a measure of a model's error
- variance is a measure of how an estimate changes given different training data
- there is no trade-off between bias and variance
- an ideal model has high bias and low variance

The correct answer is **C**. There is *always* a trade-off between bias and variance.

- an overfit model captures the noise in the training data
- overfit models do not generalize well to new unseen data
- overfit models are less likely to occur in models with greater flexibility
- overfitting may be assessed with cross-validation techniques

The correct answer is **C**. Models with greater flexibility are *more* likely to overfit, not less. Techniques such as regularization, feature selection, feature extraction, and cross-validation can help
to prevent overfitting.

- Random Forest
- Decision Trees
- Naive Bayes
- K-Means

The correct answer is **A, B, and C**. K-Means is the only unsupervised learning approach among the choices.

Common unsupervised algorithms used in commercial settings are **k-Means** for clustering,
the **Apriori algorithm** for association rule learning,
and **Principal Component Analysis** for dimensionality reduction.
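As a concrete illustration of the first of these, here is a minimal sketch of Lloyd's algorithm for k-Means on one-dimensional data, in pure Python. This is for intuition only; in practice you would use a library implementation such as `sklearn.cluster.KMeans`.

```python
# Minimal 1-D k-Means (Lloyd's algorithm) -- illustration only.

def kmeans_1d(points, centroids, n_iter=10):
    """Cluster 1-D points around the given initial centroids."""
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

data = [1.0, 1.2, 0.8, 9.8, 10.0, 10.2]   # two obvious groups
centroids, clusters = kmeans_1d(data, centroids=[0.0, 5.0])
print(centroids)   # centroids converge near 1.0 and 10.0
```

The alternating assignment/update steps are the core of k-Means; library versions add smarter initialization (e.g. k-means++) and convergence checks.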

Many algorithms require the data to be standardized or normalized. Further, many algorithms cannot handle missing values, so missing-data imputation is necessary.
In addition, often categorical values have to be encoded to numerical values using techniques such as one-hot encoding and label encoding.

For a categorical feature, you could impute a missing value with the mode
of the feature, whereas for numerical features, you could impute the missing values with the mean or the median of the feature.
You could also use the k-Nearest Neighbors algorithm to
find a data point's *k* closest neighbors and impute the value based on the values in the point's neighborhood.
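The mode/mean/median strategies above can be sketched in a few lines of stdlib Python. This is a simplified illustration; libraries such as scikit-learn provide `SimpleImputer` and `KNNImputer` for production use.

```python
# Sketch of simple missing-value imputation (stdlib only), where missing
# entries are represented as None.
from statistics import mean, median, mode

def impute(values, strategy):
    """Replace None entries with the mean, median, or mode of observed values."""
    observed = [v for v in values if v is not None]
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in values]

ages   = [22, 35, None, 41, 35]        # numerical feature
colors = ["red", None, "blue", "red"]  # categorical feature

print(impute(ages, "mean"))     # None -> 33.25
print(impute(ages, "median"))   # None -> 35.0
print(impute(colors, "mode"))   # None -> 'red'
```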

- lasso regularization may improve model interpretability
- smaller shrinkage tuning parameter values increase the impact of regularization
- ridge regularization shrinks all coefficient values
- regularization may help prevent overfitting

The correct answer is **B**. Smaller shrinkage tuning parameter values **decrease** the impact of regularization.
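The effect of the shrinkage parameter is easy to see in the simplest possible case: ridge regression with a single feature and no intercept, where the coefficient has the closed form `beta = sum(x*y) / (sum(x^2) + lam)`. The sketch below (stdlib only, with made-up data) shows that a larger lambda shrinks the coefficient toward zero, while lambda = 0 recovers ordinary least squares.

```python
# Closed-form ridge coefficient for one feature, no intercept:
#   beta_ridge = sum(x*y) / (sum(x^2) + lam)
# Larger lam -> more shrinkage; lam = 0 -> ordinary least squares.

def ridge_coef(x, y, lam):
    return sum(xi * yi for xi, yi in zip(x, y)) / (sum(xi * xi for xi in x) + lam)

x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 8.0]   # roughly y = 2x

for lam in (0.0, 1.0, 10.0):
    print(lam, ridge_coef(x, y, lam))   # coefficient shrinks as lam grows
```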

- adding correlated features
- regularization
- measuring correlation between predictors
- measuring multicollinearity with the variance inflation factor

The correct answer is **A**. Adding correlated features to the model will only worsen the multicollinearity problem.

You should divide your data into **train/validation/test splits**. You use the training set to learn the model, the validation set to tune the model, and the test set
to determine the out-of-sample performance of the model. A good starting point is to use 60% of the data for training,
20% for validation, and 20% for testing. You can also use cross-validation to build multiple train/validation splits.
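A 60/20/20 split can be sketched in a few lines of stdlib Python; in practice `sklearn.model_selection.train_test_split` is commonly used instead.

```python
# Sketch of a 60/20/20 train/validation/test split (stdlib only).
import random

def train_val_test_split(data, seed=42):
    data = data[:]                      # copy so the caller's list is untouched
    random.Random(seed).shuffle(data)   # shuffle to avoid ordering bias
    n = len(data)
    n_train = n * 60 // 100             # integer arithmetic avoids float issues
    n_val = n * 20 // 100
    return (data[:n_train],
            data[n_train:n_train + n_val],
            data[n_train + n_val:])

train, val, test = train_val_test_split(list(range(100)))
print(len(train), len(val), len(test))   # 60 20 20
```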

- estimate test error in the training data
- detect model overfitting
- utilization of more data for training while maintaining enough data for validation
- computational efficiency of using more folds

The correct answer is **D**. Increasing the number of cross-validation folds increases the computational cost of performing validation.
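The mechanics of k-fold cross-validation can be sketched as index generation (stdlib only, assuming for simplicity that the number of rows is divisible by k): each fold serves once as the validation set while the remaining folds form the training set, so more folds means more model fits and higher computational cost.

```python
# Sketch: generate k-fold cross-validation train/validation index splits.

def kfold_indices(n, k):
    indices = list(range(n))
    fold_size = n // k                  # assumes n divisible by k for simplicity
    for i in range(k):
        val = indices[i * fold_size:(i + 1) * fold_size]
        train = indices[:i * fold_size] + indices[(i + 1) * fold_size:]
        yield train, val

for train_idx, val_idx in kfold_indices(n=10, k=5):
    print(val_idx)   # each row appears in exactly one validation fold
```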

- Using accuracy as the performance metric
- Up-sampling the minority class
- Down-sampling the majority class
- Using metrics such as area under the receiver operating characteristic curve

The correct answer is **A**. Accuracy is misleading on imbalanced data because a model that always predicts the majority class can still score highly.
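Random up-sampling of the minority class can be sketched in pure Python, as below; libraries such as imbalanced-learn provide `RandomOverSampler` and more sophisticated approaches like SMOTE.

```python
# Sketch: random over-sampling of minority classes until classes are balanced.
import random
from collections import Counter

def oversample(samples, labels, seed=0):
    counts = Counter(labels)
    majority = max(counts.values())
    rng = random.Random(seed)
    out_s, out_l = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [s for s, l in zip(samples, labels) if l == cls]
        for _ in range(majority - n):   # duplicate minority samples at random
            out_s.append(rng.choice(pool))
            out_l.append(cls)
    return out_s, out_l

X = [[0], [1], [2], [3], [4], [5]]
y = [0, 0, 0, 0, 0, 1]             # imbalanced: five 0s, one 1
Xb, yb = oversample(X, y)
print(Counter(yb))                 # Counter({0: 5, 1: 5})
```

Note that up-sampling must be applied only to the training split, never before splitting, or duplicated rows can leak into the validation and test sets.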

- sensitivity measures the proportion of positive cases that were correctly identified
- sensitivity is the same measure as recall
- sensitivity is the true positive rate
- sensitivity is the same measure as precision

The correct answer is **D**. Sensitivity is the same measure as recall and the true positive rate, but not precision: sensitivity measures the proportion of actual positives that are correctly identified, whereas precision measures the proportion of positive predictions that are correct.
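The distinction is easy to see when both metrics are computed from the same predictions (stdlib sketch with made-up labels):

```python
# Sensitivity (recall / true positive rate) versus precision,
# computed from a confusion matrix.

def confusion_metrics(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)   # share of actual positives found
    precision = tp / (tp + fp)     # share of predicted positives correct
    return sensitivity, precision

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]
print(confusion_metrics(y_true, y_pred))   # (0.75, 0.6) -- not the same measure
```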

- area under the receiver operating characteristic curve
- accuracy
- specificity
- harmonic mean of recall and precision

The correct answer is **D**. The harmonic mean of recall and precision is known as the F1 score.

- word order and grammar are always accounted for
- text is represented by frequency of each word
- bag of words matrices are often sparse
- TF-IDF is an approach to improve the BOW technique

The correct answer is **A**. Bag-of-words discards word order and grammar, representing each document only by its word frequencies.
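Both bag-of-words counts and TF-IDF weighting can be sketched with the stdlib on a tiny made-up corpus; scikit-learn's `CountVectorizer` and `TfidfVectorizer` are the usual tools in practice. This sketch uses the plain `tf * log(N/df)` formulation (library versions often apply smoothing).

```python
# Sketch: bag-of-words counts and TF-IDF weighting (stdlib only).
import math
from collections import Counter

docs = ["the cat sat", "the dog sat", "the cat ran"]
bows = [Counter(d.split()) for d in docs]        # word order is discarded

def tf_idf(term, bow, bows):
    tf = bow[term] / sum(bow.values())           # frequency within the document
    df = sum(1 for b in bows if term in b)       # documents containing the term
    idf = math.log(len(bows) / df)
    return tf * idf

print(bows[0])                                   # Counter({'the': 1, 'cat': 1, 'sat': 1})
print(tf_idf("the", bows[0], bows))              # 0.0 -- 'the' appears everywhere
print(tf_idf("cat", bows[0], bows))              # > 0 -- 'cat' is more distinctive
```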

- Random forests use bootstrapping approaches
- Random forests employ bagging approaches
- Error estimates may be made with the out of bag error method
- Random forest trees are built sequentially

The correct answer is **D**. Random forests build their trees independently on bootstrapped samples (bagging); it is boosting methods such as gradient boosting that build trees sequentially.
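The bootstrapping and out-of-bag ideas can be sketched with the stdlib: each tree trains on a bootstrap sample drawn with replacement, and the rows never drawn form that tree's out-of-bag (OOB) set, which can be used to estimate test error without a separate validation set.

```python
# Sketch: a bootstrap sample and its out-of-bag rows (stdlib only).
import random

def bootstrap_sample(n, seed):
    rng = random.Random(seed)
    in_bag = [rng.randrange(n) for _ in range(n)]       # drawn with replacement
    oob = [i for i in range(n) if i not in set(in_bag)] # never drawn
    return in_bag, oob

in_bag, oob = bootstrap_sample(n=10, seed=1)
print(sorted(set(in_bag)), oob)   # OOB rows are disjoint from the in-bag rows
```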

- linearity
- homoscedasticity
- independence of observations
- normality