Traditionally, most machine learning (ML) models use some observations (samples/examples) as input features, but there is no **time** **dimension** in the data.

**Time-series forecasting** models are models that are capable of **predicting** **future values** based on **previously** **observed** **values**. Time-series forecasting is widely used for **non-stationary data**. **Non-stationary data** are data whose statistical properties, e.g. the mean and standard deviation, are not constant but instead vary over time.

These non-stationary data, used as input to these models, are usually called **time-series**. Some examples of time-series include the temperature values over…
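To make the definition of non-stationarity concrete, here is a minimal sketch on hypothetical, synthetic data (not from the original article): a random walk with an upward drift, whose mean changes from one time window to the next, compared with white noise, whose mean and standard deviation stay roughly constant. The helper `window_stats` is an illustrative name, not a library function.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical non-stationary series: a random walk with drift 0.5 per step.
series = np.cumsum(0.5 + rng.normal(0.0, 1.0, size=400))

# Hypothetical stationary series for comparison: white noise.
noise = rng.normal(0.0, 1.0, size=400)


def window_stats(x, window=100):
    """Mean and standard deviation of consecutive, non-overlapping windows."""
    chunks = x[: len(x) // window * window].reshape(-1, window)
    return chunks.mean(axis=1), chunks.std(axis=1)


walk_means, walk_stds = window_stats(series)
noise_means, noise_stds = window_stats(noise)

# The random walk's window means keep rising; the white-noise means stay near 0.
print("random walk window means:", np.round(walk_means, 1))
print("white noise window means:", np.round(noise_means, 2))
```

Comparing summary statistics over time windows is only an informal check; formal stationarity tests exist but are outside the scope of this sketch.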

**Principal Components Analysis** (PCA) is a well-known **unsupervised** **dimensionality** **reduction** technique that constructs **relevant** features/variables through linear (linear PCA) or non-linear (kernel PCA) **combinations** of the original variables (features). In this post, we will only focus on the famous and widely used **linear PCA** method.

The construction of relevant features is achieved by **linearly transforming correlated variables** into a smaller number of **uncorrelated** variables. This is done by **projecting** (dot product) the original data onto the **reduced PCA space** using the eigenvectors of the covariance/correlation matrix, a.k.a. the principal components (PCs).
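The projection described above can be sketched in plain NumPy, assuming a hypothetical data matrix with three features, two of which are strongly correlated. The data are centered, the eigenvectors of the covariance matrix are sorted by eigenvalue, and the data are projected onto the first `k` of them with a dot product. This is an illustrative sketch, not the original post's code.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: 200 samples, feature 1 is (almost) 2x feature 0, feature 2 is independent.
z = rng.normal(size=(200, 1))
X = np.hstack([z, 2 * z + 0.1 * rng.normal(size=(200, 1)), rng.normal(size=(200, 1))])

# 1) Center each feature (covariance-based PCA assumes zero-mean features).
Xc = X - X.mean(axis=0)

# 2) Eigen-decomposition of the covariance matrix; eigh returns eigenvalues in ascending order.
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 3) Project (dot product) onto the first k principal components.
k = 2
X_pca = Xc @ eigvecs[:, :k]

print("explained variance ratio:", np.round(eigvals[:k] / eigvals.sum(), 3))
# The projected features are uncorrelated: off-diagonal covariance is ~0.
print(np.round(np.cov(X_pca, rowvar=False), 6))
```

Using the correlation matrix instead of the covariance matrix amounts to standardizing each feature (dividing by its standard deviation) before step 2.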

The **resulting** **projected** **data** are essentially **linear** **combinations** of…

Everyone has heard about the famous and widely-used **Support Vector Machines** (SVMs). The original SVM algorithm was invented by Vladimir N. Vapnik and Alexey Ya. Chervonenkis in 1963.

**SVMs** are **supervised** machine learning models that are usually employed for **classification** (**SVC**: Support Vector Classification) or **regression** (**SVR**: Support Vector Regression) problems. Depending on the characteristics of the target variable (the one we wish to predict), our problem is a classification task if we have a **discrete target variable** (e.g. class labels), or a regression task if we have a **continuous target variable** (e.g. house prices).
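The distinction between the two tasks can be illustrated with scikit-learn's `SVC` and `SVR` estimators. This is a minimal sketch on hypothetical one-dimensional data: the same inputs are paired once with discrete labels (classification) and once with a continuous target (regression). The kernel choices are illustrative, not recommendations.

```python
import numpy as np
from sklearn.svm import SVC, SVR

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))

# Discrete target (class labels 0/1): a classification task, so we use SVC.
y_class = (X[:, 0] > 0).astype(int)
clf = SVC(kernel="rbf").fit(X, y_class)

# Continuous target (hypothetical y = 2x + noise): a regression task, so we use SVR.
y_cont = 2.0 * X[:, 0] + rng.normal(0, 0.1, size=100)
reg = SVR(kernel="linear").fit(X, y_cont)

X_new = np.array([[-2.0], [2.0]])
print(clf.predict(X_new))  # predicted class labels
print(reg.predict(X_new))  # real-valued predictions, close to -4 and 4
```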

SVMs are…

K-means is one of the most widely used unsupervised clustering methods.

The **K-means** algorithm clusters the data at hand by trying to separate samples into **K** groups of equal variance, minimizing a criterion known as the **inertia** or within-cluster sum-of-squares.

The k-means algorithm divides a set of **N** samples (stored in a data matrix **X**) into **K** disjoint clusters **C**, each described by the mean $\mu_j$ of…
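The procedure above can be sketched with the classic Lloyd's algorithm in NumPy: alternately assign each sample to its nearest centroid and recompute each centroid $\mu_j$ as the mean of its cluster, then report the inertia (the sum of squared distances from each sample to its cluster mean). The function name `kmeans`, the random initialization and the blob data are assumptions made for illustration; library implementations such as scikit-learn's `KMeans` add smarter initialization and multiple restarts.

```python
import numpy as np


def kmeans(X, K, n_iter=100, seed=0):
    """Plain Lloyd's algorithm (a sketch, not a production implementation)."""
    rng = np.random.default_rng(seed)
    # Initialise the K centroids with K distinct, randomly chosen samples.
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each sample joins the cluster with the nearest centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: mu_j becomes the mean of the samples currently in C_j
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(K)
        ])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    # Final assignment with the final centroids, then the inertia:
    # sum of squared distances from every sample to its cluster mean.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    inertia = d2[np.arange(len(X)), labels].sum()
    return labels, centroids, inertia


# Usage on hypothetical data: three well-separated 2-D blobs.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(0.0, 0.3, size=(50, 2)) for c in centers])
labels, mu, inertia = kmeans(X, K=3)
print("centroids:\n", np.round(mu, 2))
print("inertia:", round(float(inertia), 2))
```

Because Lloyd's algorithm only finds a local minimum of the inertia, a single random initialization can occasionally merge two blobs; running several initializations and keeping the lowest inertia is the usual remedy.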

**Time-series forecasting** models are models that are capable of **predicting** **future values** based on **previously** **observed** **values**. Time-series forecasting is widely used for **non-stationary data**. **Non-stationary data** are data whose statistical properties, e.g. the mean and standard deviation, are not constant but instead vary over time.

These non-stationary data, used as input to these models, are usually called **time-series**. Some examples of time-series include the temperature values over time, the stock price over time, the price of a house over time, etc. …
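The idea of predicting future values from previously observed values can be illustrated with about the simplest forecasting model there is: an autoregressive model of order $p$, $y_t = c + a_1 y_{t-1} + \dots + a_p y_{t-p}$, fitted by ordinary least squares. This is a swapped-in illustration, not necessarily the model the original article uses; `fit_ar` and `forecast` are hypothetical helper names, and the noise-free AR(1) series is toy data.

```python
import numpy as np


def fit_ar(series, p):
    """Fit y_t = c + a_1*y_{t-1} + ... + a_p*y_{t-p} by ordinary least squares."""
    # Each row holds the p previous values (most recent first).
    rows = [series[t - p:t][::-1] for t in range(p, len(series))]
    A = np.column_stack([np.ones(len(rows)), np.array(rows)])  # constant + lags
    y = series[p:]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef  # [c, a_1, ..., a_p]


def forecast(series, coef, steps):
    """Recursive multi-step forecast: each prediction is fed back as an input."""
    p = len(coef) - 1
    history = list(series[-p:])
    preds = []
    for _ in range(steps):
        lags = np.array(history[-p:][::-1])
        y_next = coef[0] + coef[1:] @ lags
        preds.append(y_next)
        history.append(y_next)
    return np.array(preds)


# Toy, noise-free AR(1) series: y_t = 1 + 0.8 * y_{t-1}, starting from y_0 = 0.
series = np.zeros(20)
for t in range(1, 20):
    series[t] = 1.0 + 0.8 * series[t - 1]

coef = fit_ar(series, p=1)
preds = forecast(series, coef, steps=3)
print("estimated [c, a_1]:", np.round(coef, 3))
print("next 3 values:", np.round(preds, 3))
```

For a non-stationary series such as one with a trend, a common step is to difference the series before fitting an autoregressive model; this is the "integrated" part of ARIMA.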

Feature selection is the procedure of selecting a subset (some out of all available) of the input variables that are most relevant to the target variable (that we wish to predict).

**Target variable** here refers to the **variable** that we wish to **predict**.

In this article, we assume that all input variables are numerical and that the target is also numerical (regression predictive modeling). Under this assumption, we can easily estimate the **relationship** between each **input** variable and the **target** variable, for example by calculating their correlation.
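As a minimal sketch of correlation-based scoring, the snippet below computes the Pearson correlation between each numerical input column and a numerical target, then keeps the `k` features with the largest absolute correlation. The data and the helper name `pearson_scores` are hypothetical; the formula is cross-checked against NumPy's `np.corrcoef`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical regression data: y depends strongly on x0, moderately on x1, not at all on x2.
X = rng.normal(size=(n, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=n)


def pearson_scores(X, y):
    """Pearson correlation between each input column and the target."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum())
    )


scores = pearson_scores(X, y)

# Keep the k features with the largest absolute correlation to the target.
k = 2
selected = np.argsort(np.abs(scores))[::-1][:k]
print("correlations:", np.round(scores, 3))
print("selected feature indices:", selected)
```

Pearson correlation only captures linear relationships, so a feature with a strong non-linear effect on the target can receive a low score.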

The 2 most famous…

In a previous post, I explained how Facebook’s Prophet model works. If you missed it, have a look here.

Recently, the Facebook Data Science team released a new version of the model, **NeuralProphet**.

**Disclaimer** (before we move on): There have been attempts to predict stock prices using time-series analysis algorithms, but they still cannot be used to place bets in the real market. This is just a tutorial article that in no way intends to “direct” people into buying stocks.

Let’s get started.

Regression models are used to predict a numerical value (dependent variable) given a set of input variables (independent variables). The most famous model of this family is linear regression [2].

Linear regression fits a line (or hyperplane) that best describes the linear relationship between some inputs (X) and the target numeric value (y).

However, if the data contain outliers, the fitted line can be biased toward them, resulting in worse predictive performance. **Robust regression** refers to a family of algorithms that are robust to the presence of outliers [2].
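The effect of outliers can be sketched by fitting both ordinary least squares (scikit-learn's `LinearRegression`) and one robust alternative, scikit-learn's `HuberRegressor`, to hypothetical data generated from $y = 2x + 1$ with a few large outliers injected at high $x$. Huber regression is only one member of the robust-regression family and is chosen here for illustration; the data, outlier size and `max_iter` value are assumptions.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

rng = np.random.default_rng(0)

# Hypothetical data: y = 2x + 1 plus small Gaussian noise.
x = rng.uniform(0, 10, size=100)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=100)

# Inject 5 large outliers at the 5 largest x values.
idx = np.argsort(x)[-5:]
y[idx] += 40.0

X = x.reshape(-1, 1)
ols = LinearRegression().fit(X, y)
# Huber loss: quadratic for small residuals, linear for large ones,
# which limits how strongly each outlier can pull the fit.
huber = HuberRegressor(max_iter=1000).fit(X, y)

print("true slope:  2.0")
print("OLS slope:  ", round(ols.coef_[0], 3))
print("Huber slope:", round(huber.coef_[0], 3))
```

The squared loss of ordinary least squares lets the few outliers tilt the line noticeably, while the Huber fit stays close to the slope used to generate the inliers.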

Traditionally, most machine learning (ML) models use some observations (samples/examples) as input features, but there is no **time** **dimension** in the data.

**Time-series forecasting** models are models that are capable of **predicting** **future values** based on **previously** **observed** **values**. Time-series forecasting is widely used for **non-stationary data**. **Non-stationary data** are data whose statistical properties, e.g. the mean and standard deviation, are not constant but instead vary over time.

These non-stationary data, used as input to these models, are usually called **time-series**. Some time-series examples include the temperature values over time, stock price over time, price…