You can use **One Vs. All** classification. In One Vs. All classification, you transform the multiclass classification task to multiple binary classification problems.
So for example, if you have 3 classes, you would turn the problem into 3 separate binary classification tasks.
For every individual class *i*, you build a classifier where data points belonging to class *i* are the positive samples, and all the rest of the data points are the negative samples.
To predict the output for a new test data point *x*, you apply all the trained classifiers to *x* and predict the label whose classifier outputs the highest confidence score.
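As a sketch, here is a minimal One-vs-All implementation from scratch, using binary logistic regression trained by gradient descent as the per-class classifier (the function names and hyperparameters are illustrative, not from any particular library):

```python
import numpy as np

def train_one_vs_all(X, y, n_classes, lr=0.1, epochs=2000):
    """Train one binary logistic-regression classifier per class."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # add bias column
    weights = []
    for c in range(n_classes):
        t = (y == c).astype(float)             # class c = positives, rest = negatives
        w = np.zeros(Xb.shape[1])
        for _ in range(epochs):
            p = 1.0 / (1.0 + np.exp(-Xb @ w))  # sigmoid confidence score
            w -= lr * Xb.T @ (p - t) / len(t)  # gradient step
        weights.append(w)
    return np.array(weights)

def predict_one_vs_all(weights, X):
    """Score a point with every classifier; predict the most confident label."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return (Xb @ weights.T).argmax(axis=1)
```

With scikit-learn, the same strategy is available out of the box via `sklearn.multiclass.OneVsRestClassifier`.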

The R-squared value will always increase (or at least never decrease) as additional features are added to the model, even when those features carry no real signal. To avoid this bias toward overfitting, use the **adjusted R-squared** metric instead, which penalizes the number of predictors.
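The adjustment is a simple formula: adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of samples and p the number of predictors. A quick sketch:

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared: n = number of samples, p = number of predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)
```

Unlike plain R-squared, this value drops when an added feature does not improve the fit enough to justify the extra parameter.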

The main difference is in their **penalty functions**. The penalty function in Ridge regression is the sum of the squares of the coefficients, whereas in Lasso regression,
the penalty function is the sum of the absolute values of the coefficients.
Lasso regression can set some of the coefficients to zero, outputting a simpler model with fewer features.
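The difference is easiest to see in the one-coefficient update each penalty induces (a sketch; `alpha` is the regularization strength, and the closed forms below are the standard proximal updates for the L1 penalty α|x| and the quadratic penalty αx²/2):

```python
import numpy as np

def lasso_update(z, alpha):
    # L1 soft-thresholding: any coefficient within [-alpha, alpha] becomes exactly 0
    return np.sign(z) * np.maximum(np.abs(z) - alpha, 0.0)

def ridge_update(z, alpha):
    # L2 shrinkage: coefficients shrink toward 0 but never reach it
    return z / (1.0 + alpha)
```

This is why Lasso performs implicit feature selection while Ridge merely shrinks all coefficients.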

Support Vector Machine, K-Means, Principal Component Analysis, Neural Networks, k-Nearest-Neighbors, and Regularization methods all require these data preprocessing steps.
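Assuming the preprocessing in question is feature scaling (the usual requirement for distance- and gradient-based methods like these), a minimal min-max normalization sketch:

```python
import numpy as np

def min_max_scale(X):
    # rescale every feature column into the [0, 1] range
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    return (X - col_min) / (col_max - col_min)
```

Without scaling, a feature measured in large units would dominate the distance computations in K-Means or k-Nearest-Neighbors and the penalty terms in regularized models.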

Both Feature Extraction and Feature Selection attempt to prevent overfitting by simplifying the learned models.
Feature extraction starts from an initial set of data and builds new features which are more informative, using dimensionality reduction techniques such as Principal Component Analysis (PCA)
and Singular Value Decomposition (SVD). Feature Selection on the other hand doesn't create new features, but selects only a subset of the most important and relevant features.
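The contrast can be sketched in a few lines (the helper names are hypothetical; extraction via PCA computed with SVD, selection via a simple variance ranking):

```python
import numpy as np

def pca_extract(X, k):
    # feature EXTRACTION: build k NEW features as linear combinations of all columns
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def variance_select(X, k):
    # feature SELECTION: keep the k ORIGINAL columns with the highest variance
    idx = np.argsort(X.var(axis=0))[::-1][:k]
    return X[:, idx]
```

Both return a matrix with k columns, but only the selection version's columns are taken verbatim from the original feature set.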

The F1-score is a metric that often comes up in classification tasks where the
dataset is very **imbalanced**, with the majority class occurring much more often than the minority class. In such a case,
accuracy alone is misleading: a model that always predicts the majority class can score high accuracy while never detecting the minority class.
The F1-score is the harmonic mean of precision and recall (precision is the percentage of results that are relevant, while recall is the percentage of total
relevant results correctly classified by your model).
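From the confusion-matrix counts (true positives, false positives, false negatives), the computation looks like this (a sketch; the helper name is illustrative):

```python
def f1_from_counts(tp, fp, fn):
    precision = tp / (tp + fp)  # fraction of positive predictions that are correct
    recall = tp / (tp + fn)     # fraction of actual positives that are found
    return 2 * precision * recall / (precision + recall)
```

Because the harmonic mean is dominated by the smaller of the two values, a model cannot achieve a high F1-score by excelling at only one of precision or recall.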

The test dataset should **never** be impacted by the training dataset preparation, as the test dataset should provide an unbiased evaluation of the final learned model.
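For example, when standardizing features, the mean and standard deviation must be computed from the training split only and then reused unchanged on the test split (a sketch with made-up data):

```python
import numpy as np

X_train = np.array([[1.0], [2.0], [3.0]])
X_test = np.array([[4.0]])

# fit the preprocessing statistics on the TRAINING data only
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)

# apply the same statistics to both splits -- never refit on the test data
X_train_scaled = (X_train - mu) / sigma
X_test_scaled = (X_test - mu) / sigma
```

Recomputing the statistics on the test set (or on the combined data) would leak information about the test distribution into the evaluation.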

Having to join datasets comes up all the time in your job as a data scientist, so companies will check that you know your SQL joins.
In an **Inner Join**, you extract all records that match a specified condition in both tables. Any record in either table that does not match the condition is not reported.
In a **Left Outer Join**, you return all the rows in the left table and the matching rows only from the right table.
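A self-contained demonstration with SQLite (the table names and data are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employees (id INTEGER, name TEXT, dept_id INTEGER)")
cur.execute("CREATE TABLE departments (id INTEGER, dept TEXT)")
cur.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                [(1, "Ann", 10), (2, "Bob", 20), (3, "Cat", None)])
cur.executemany("INSERT INTO departments VALUES (?, ?)",
                [(10, "Sales"), (20, "Eng"), (30, "HR")])

# Inner join: only the rows that match in BOTH tables
inner = cur.execute(
    "SELECT e.name, d.dept FROM employees e "
    "INNER JOIN departments d ON e.dept_id = d.id").fetchall()

# Left outer join: every row from the left table,
# with NULL where the right table has no match
left = cur.execute(
    "SELECT e.name, d.dept FROM employees e "
    "LEFT JOIN departments d ON e.dept_id = d.id").fetchall()
```

Note that "Cat" (no department) disappears from the inner join but survives the left join with a NULL department, while "HR" (no employees) appears in neither result.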