In this post, I show you how to predict stock prices using an LSTM forecasting model.

Figure created by the author.

1. Introduction

1.1. Time-series & forecasting models

Traditionally, most machine learning (ML) models take a set of observations (samples/examples) as input, but the data have no time dimension.

Time-series forecasting models predict future values based on previously observed values. Time-series forecasting is widely used for non-stationary data, that is, data whose statistical properties (e.g. the mean and standard deviation) are not constant but vary over time.

Such non-stationary input data are usually called time-series. Some examples of time-series include the temperature values over…


In this post, I explain what PCA is, when and why to use it, and how to implement it in Python using scikit-learn. I also explain how to get the feature importance after a PCA analysis.

Handmade sketch made by the author.

1. Introduction & Background

Principal Components Analysis (PCA) is a well-known unsupervised dimensionality reduction technique that constructs relevant features/variables through linear (linear PCA) or non-linear (kernel PCA) combinations of the original variables (features). In this post, we will only focus on the famous and widely used linear PCA method.

The construction of relevant features is achieved by linearly transforming correlated variables into a smaller number of uncorrelated variables. This is done by projecting (dot product) the original data onto the reduced PCA space using the eigenvectors of the covariance/correlation matrix, also known as the principal components (PCs).
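As a rough illustration of this projection, here is a minimal NumPy sketch. The synthetic data, the variable names and the choice of 2 components are my own assumptions for the example, and the covariance matrix is used rather than the correlation matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): 200 samples, 3 features,
# where the first two features are strongly correlated.
latent = rng.normal(size=(200, 1))
X = np.hstack([
    latent + 0.1 * rng.normal(size=(200, 1)),
    2.0 * latent + 0.1 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])

# Center the data, then compute the covariance matrix of the features.
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)

# Eigen-decomposition of the symmetric covariance matrix.
eigvals, eigvecs = np.linalg.eigh(cov)

# Sort the eigenvectors by decreasing eigenvalue (explained variance).
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Keep the first 2 eigenvectors and project the data (dot product).
n_components = 2
components = eigvecs[:, :n_components]
X_projected = X_centered @ components

# Fraction of the total variance captured by the kept components.
explained_variance_ratio = eigvals[:n_components] / eigvals.sum()
```

The columns of `X_projected` are uncorrelated by construction, which can be checked by looking at the off-diagonal entries of `np.cov(X_projected, rowvar=False)`.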

The resulting projected data are essentially linear combinations of…


In this article, I explain the core ideas behind SVMs, and why and how to use them. Additionally, I show how to plot the support vectors and the decision boundaries in 2D and 3D.

Handmade sketch made by the author. An SVM illustration.

Introduction

Everyone has heard about the famous and widely-used Support Vector Machines (SVMs). The original SVM algorithm was invented by Vladimir N. Vapnik and Alexey Ya. Chervonenkis in 1963.

SVMs are supervised machine learning models that are usually employed for classification (SVC, Support Vector Classification) or regression (SVR, Support Vector Regression) problems. The type of problem depends on the target variable (the one we wish to predict): it is a classification task if the target is discrete (e.g. class labels), and a regression task if the target is continuous (e.g. house prices).

SVMs are…


Mathematical formulation, finding the optimal number of clusters, and a working example in Python

Image created by the author

Introduction

K-means is one of the most widely used unsupervised clustering methods.

The K-means algorithm clusters the data at hand by trying to separate samples into K groups of equal variance, minimizing a criterion known as the inertia or within-cluster sum-of-squares. This algorithm requires the number of clusters to be specified. It scales well to a large number of samples and has been used across a wide range of application areas in many different fields.
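To make the inertia criterion concrete, here is a minimal NumPy sketch of plain Lloyd iterations. The helper names (`assign`, `inertia`, `kmeans`), the random initialization and the two synthetic blobs are assumptions made for this illustration; it is not an optimized implementation.

```python
import numpy as np

def assign(X, centers):
    """Index of the nearest center for every sample."""
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def inertia(X, centers, labels):
    """Within-cluster sum of squared distances to the assigned center."""
    return float(((X - centers[labels]) ** 2).sum())

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd iterations starting from k randomly chosen samples."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centers)
        # Update step: each center moves to the mean of its samples
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign(X, centers)

# Two well-separated synthetic blobs (assumed data for the example).
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2)),
])

centers, labels = kmeans(X, k=2)
total_inertia = inertia(X, centers, labels)

# Reference value: the inertia of a single cluster centered at the global mean.
one_cluster_inertia = inertia(X, X.mean(axis=0, keepdims=True),
                              np.zeros(len(X), dtype=int))
```

Comparing `total_inertia` with `one_cluster_inertia` shows how much the criterion drops once the data are split into K = 2 groups.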

The k-means algorithm divides a set of N samples (stored in a data matrix X) into K disjoint clusters C, each described by the mean μj of…


In this post I show you how to predict the TESLA stock price using a forecasting ARIMA model

ARIMA model performance on the test set

1. Introduction

1.1. Time-series & forecasting models

Time-series forecasting models predict future values based on previously observed values. Time-series forecasting is widely used for non-stationary data, that is, data whose statistical properties (e.g. the mean and standard deviation) are not constant but vary over time.

Such non-stationary input data are usually called time-series. Some examples of time-series include the temperature values over time, stock price over time, price of a house over time etc. …
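To see what "statistical properties that vary over time" looks like in practice, here is a small NumPy sketch. The two synthetic series and the helper `split_stats` are assumptions made for this illustration; they are not data or code from the full post.

```python
import numpy as np

def split_stats(series, n_chunks=2):
    """Mean and standard deviation of consecutive, roughly equal chunks."""
    chunks = np.array_split(np.asarray(series, dtype=float), n_chunks)
    return [(float(c.mean()), float(c.std())) for c in chunks]

rng = np.random.default_rng(0)
t = np.arange(500)

# Stationary series: white noise with constant mean and variance.
stationary = rng.normal(loc=0.0, scale=1.0, size=t.size)

# Non-stationary series: the same kind of noise on top of a linear trend,
# so the mean drifts upwards over time.
trending = 0.05 * t + rng.normal(loc=0.0, scale=1.0, size=t.size)

stationary_stats = split_stats(stationary)
trending_stats = split_stats(trending)

for name, stats in [("stationary", stationary_stats), ("trending", trending_stats)]:
    for i, (mean, std) in enumerate(stats):
        print(f"{name:>10} | chunk {i}: mean={mean:.2f}, std={std:.2f}")
```

For the stationary series, the two halves have nearly the same mean, whereas for the trending series the mean of the second half is clearly larger than the mean of the first half.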


In this article I explain what feature selection is and how to perform it before training a regression model in Python.

1. Introduction

What is feature selection?

Feature selection is the procedure of selecting the subset of the available input variables that are most relevant to the target variable, i.e. the variable that we wish to predict.

For this article, we will assume that we only have numerical input variables and a numerical target, i.e. a regression predictive modeling problem. Under this assumption, we can easily estimate the relationship between each input variable and the target variable. This relationship can be quantified with a metric such as the correlation coefficient.
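As a minimal sketch of this idea, the snippet below scores each input variable by its absolute Pearson correlation with the target and keeps the top k. The synthetic data, the helper `pearson_scores` and the value k = 2 are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 300

# Synthetic data (assumed): 4 numerical inputs, of which only
# columns 0 and 2 actually drive the numerical target.
X = rng.normal(size=(n_samples, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.5 * rng.normal(size=n_samples)

def pearson_scores(X, y):
    """Absolute Pearson correlation between each column of X and y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    numerator = Xc.T @ yc
    denominator = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.abs(numerator / denominator)

scores = pearson_scores(X, y)

# Keep the k input variables with the highest scores.
k = 2
selected = np.sort(np.argsort(scores)[::-1][:k])

print("scores:", np.round(scores, 3))
print("selected columns:", selected)
```

Note that the Pearson correlation only captures linear relationships, so an input that affects the target in a strongly non-linear way could receive a low score.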

2. The main numerical feature selection methods

The 2 most famous…


Data Science, Data Visualization

In this post, I provide a tutorial on how to predict stock prices using a new forecasting model publicly available from the Facebook Data Science team: NeuralProphet!

Figure produced by the author using the model.

1. Introduction

In a previous post, I explained and showed how Facebook’s Prophet model works. If you missed it, have a look here.

Recently, the Facebook Data Science team released a new version of the model: NeuralProphet.

Disclaimer (before we move on): There have been attempts to predict stock prices using time series analysis algorithms, though they still cannot be used to place bets in the real market. This is just a tutorial article that does not intend in any way to “direct” people into buying stocks.

Let’s get started.

1.1. Time-series & forecasting models

Traditionally most machine learning (ML) models use as input features some…


Machine Learning, Programming

In this article, I explain what robust regression is, using a working example in Python.

1. Introduction

Regression models are used to predict a numerical value (dependent variable) given a set of input variables (independent variables). The most famous model of the family is linear regression [2].

Linear regression fits a line (or hyperplane) that best describes the linear relationship between some inputs (X) and the target numeric value (y).

However, if the data contains outlier values, the line can become biased, resulting in worse predictive performance. Robust regression refers to a family of algorithms that are robust in the presence of outliers [2].
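The effect of outliers can be sketched with NumPy alone. The example below compares an ordinary least-squares line with a Theil-Sen fit (median of pairwise slopes), which is one robust estimator picked here for illustration. The synthetic data and the helper names are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed): a line with slope 2 and intercept 1 plus noise.
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# Corrupt the last 5 targets with large outliers.
y_outliers = y.copy()
y_outliers[-5:] += 40.0

def ols_fit(x, y):
    """Ordinary least-squares line; returns (slope, intercept)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return float(slope), float(intercept)

def theil_sen_fit(x, y):
    """Theil-Sen line: median of all pairwise slopes; returns (slope, intercept)."""
    i, j = np.triu_indices(len(x), k=1)
    dx = x[j] - x[i]
    keep = dx != 0  # skip pairs with identical x values
    slopes = (y[j] - y[i])[keep] / dx[keep]
    slope = float(np.median(slopes))
    intercept = float(np.median(y - slope * x))
    return slope, intercept

ols_slope, ols_intercept = ols_fit(x, y_outliers)
ts_slope, ts_intercept = theil_sen_fit(x, y_outliers)

print(f"true slope: 2.00 | OLS: {ols_slope:.2f} | Theil-Sen: {ts_slope:.2f}")
```

Because the median ignores the minority of pairwise slopes that involve an outlier, the Theil-Sen slope stays close to the true value while the least-squares slope is pulled towards the corrupted points.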


Data Visualization, Deep Learning

In this post, I show you how to predict stock prices using an LSTM forecasting model.

Figure created by the author.

1. Introduction

1.1. Time-series & forecasting models

Traditionally, most machine learning (ML) models take a set of observations (samples/examples) as input, but the data have no time dimension.

Time-series forecasting models predict future values based on previously observed values. Time-series forecasting is widely used for non-stationary data, that is, data whose statistical properties (e.g. the mean and standard deviation) are not constant but vary over time.

These non-stationary input data (used as input to these models) are usually called time-series. Some time-series examples include the temperature values over time, stock price over time, price…

Serafeim Loukas

Postdoctoral researcher at University of Geneva & University Hospital of Bern. I hold a PhD, a MSc, and a M.Eng.
