Regression models are used to predict a numerical value (dependent variable) given a set of input variables (independent variables). The most famous model of the family is the linear regression [2].

Linear regression fits a line (or hyperplane) that best describes the linear relationship between some inputs (X) and the target numeric value (y).

However, if the data contains outlier values, the line can become biased, resulting in worse predictive performance. **Robust regression** refers to a family of algorithms that are robust in the presence of outliers [2].

**Principal Components Analysis** (PCA) is a well-known **unsupervised** **dimensionality** **reduction** technique that constructs **relevant** features/variables through linear (linear PCA) or non-linear (kernel PCA) **combinations** of the original variables (features). In this post, we will only focus on the famous and widely used **linear PCA** method.

The construction of relevant features is achieved by **linearly transforming correlated variables** into a smaller number of **uncorrelated** variables. This is done by **projecting** (dot product) the original data into the **reduced PCA space** using the eigenvectors of the covariance/correlation matrix aka the principal components (PCs).

The **resulting** **projected** **data** are essentially **linear** **combinations** of the **original** data **capturing** **most** **of the variance in the data** (Jolliffe 2002). …

Traditionally most machine learning (ML) models use some observations (samples/examples), but there is no **time** **dimension** in the data.

**Time-series forecasting** models are the models that are capable of **predicting** **future values** based on **previously** **observed** **values**. Time-series forecasting is widely used for **non-stationary data**. **Non-stationary data **are called the data whose statistical properties, e.g., the mean and standard deviation, are not constant over time but instead, these metrics vary over time.

These non-stationary input data (used as input to these models) are usually called **time-series. **Some time-series examples include the temperature values over time, stock price over time, price of house overtime, etc. …

Traditionally most machine learning (ML) models use as input features some observations (samples/examples), but there is no **time** **dimension** in the data.

**Time-series forecasting** models are the models that are capable of **predicting** **future values** based on **previously** **observed** **values**. Time-series forecasting is widely used for **non-stationary data**. **Non-stationary data **are called the data whose statistical properties, e.g., the mean and standard deviation, are not constant over time but instead, these metrics vary over time.

These non-stationary input data (used as input to these models) are usually called **time-series. **Some examples of time-series include the temperature values over time, stock price over time, price of a house over time, etc. …

Traditionally most machine learning (ML) models use as input features some observations (samples/examples), but there is no **time** **dimension** in the data.

**Time-series forecasting** models are the models that are capable of **predicting** **future values** based on **previously** **observed** **values**. Time-series forecasting is widely used for **non-stationary data**. **Non-stationary data **are called the data whose statistical properties, e.g., the mean and standard deviation, are not constant over time but instead, these metrics vary over time.

These non-stationary input data (used as input to these models) are usually called **time-series. **Some examples of time-series include the temperature values over time, stock price over time, price of a house over-time, etc. …

Regression models are used to predict a numerical value (dependent variable) given a set of input variables (independent variables). The most famous model of the family is the linear regression [2].

Linear regression fits a line (or hyperplane) that best describes the linear relationship between some inputs (X) and the target numeric value (y).

However, if the data contains outlier values, the line can become biased, resulting in worse predictive performance. **Robust regression** refers to a family of algorithms that are robust in the presence of outliers [2].

**Introduction****The Naive Bayes algorithm****Dealing with text data****Working Example in Python (step-by-step guide)****Bonus: Having fun with the model****Conclusions**

**Naive Bayes** classifiers are a collection of classification algorithms based on **Bayes’ Theorem**. It is not a single algorithm but a family of algorithms where all of them share a common principle, i.e. every pair of features being classified is independent of each other.

**Naive Bayes** classifiers have been heavily used for **text classification** and **text** **analysis** machine learning **problems**.

**Text Analysis** is a major application field for machine learning algorithms. However the raw data, a sequence of symbols (i.e. strings) cannot be fed directly to the algorithms themselves as most of them expect numerical feature vectors with a fixed size rather than the raw text documents with variable length. …

K-means is one of the most widely used unsupervised clustering methods.

The **K-means **algorithm clusters the data at hand by trying to separate samples into **K** groups of equal variance, minimizing a criterion known as the ** inertia** or

The k-means algorithm divides a set of **N **samples (stored in a data matrix **X**) into **K** disjoint clusters **C**, each described by the mean *μj** *of the samples in the cluster. …

**Time-series forecasting** models are the models that are capable to **predict** **future values** based on **previously** **observed** **values**. Time-series forecasting is widely used for **non-stationary data**. **Non-stationary data **are called the data whose statistical properties e.g. the mean and standard deviation are not constant over time but instead, these metrics vary over time.

These non-stationary input data (used as input to these models) are usually called **time-series. **Some examples of time-series include the temperature values over time, stock price over time, price of a house over time etc. …

*Note from Towards Data Science’s editors:** While we allow independent authors to publish articles in accordance with our **rules and guidelines**, we do not endorse each author’s contribution. You should not rely on an author’s works without seeking professional advice. See our **Reader Terms** for details.*

Traditionally most machine learning (ML) models use as input features some observations (samples / examples) but there is no **time** **dimension** in the data.

**Time-series forecasting** models are the models that are capable to **predict** **future values** based on **previously** **observed** **values**. Time-series forecasting is widely used for **non-stationary data**. **Non-stationary data **are called the data whose statistical properties e.g. …

About