Time Series Forecasting using Facebook Prophet

Time series analysis is an approach to analysing historical, time-stamped data in order to extract meaningful characteristics and generate insights that businesses can act on. Generally, time series data is a sequence of observations stored in time order. Analysing it helps to understand the time-based patterns in a set of data points that are critical for a business. Time series forecasting techniques can answer business questions such as what level of inventory to maintain, how much website traffic to expect in your e-store, or how many products will be sold in the next month. All of these are important time series problems to solve. For instance, large organisations like Facebook and Google must engage in capacity planning to allocate scarce resources and in goal setting to keep up with the rapid growth of their user base. The basic objective of time series analysis is usually to determine a model that describes the pattern of the time series and can be used for forecasting.

Classical time series forecasting techniques are built on statistical models that require a lot of effort to tune in order to reach high accuracy. When a forecasting model doesn’t perform as expected, the analyst has to tune the parameters of the method for the specific problem. Tuning these methods requires a thorough understanding of how the underlying time series models work. That level of forecasting is difficult for organisations without data science teams to handle. And it might not seem profitable for an organisation to keep a group of experts on board if there is no need to build a complex forecasting platform or other services.

Why Facebook Prophet?

Facebook developed “Prophet”, an open source forecasting tool available in both Python and R. It provides intuitive parameters which are easy to tune. Even someone who lacks deep expertise in time series forecasting models can use it to generate meaningful predictions for a variety of business problems.

Facebook Prophet official logo

Excerpt from the Facebook Prophet website:

“Producing high quality forecasts is not an easy problem for either machines or for most analysts. We have observed two main themes in the practice of creating a variety of business forecasts:

· Completely automatic forecasting techniques can be brittle and they are often too inflexible to incorporate useful assumptions or heuristics.

· Analysts who can produce high quality forecasts are quite rare because forecasting is a specialised data science skill requiring substantial experience.”

Highlights of Facebook Prophet

· Very fast, since the model is fitted in Stan. Because the core is shared, the code translates easily between R and Python.

· An additive regression model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects:

o A piece-wise linear or logistic growth curve trend. Prophet automatically detects changes in trends by selecting change points from the data.

o A yearly seasonal component modeled using Fourier series

o A weekly seasonal component using dummy variables

o A user-provided list of important holidays.

o Ability to add additional regressors to the model.

· Robust to missing data and shifts in the trend, and typically handles outliers well.

· Easy procedure to tweak and adjust forecast while adding domain knowledge or business insights.
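The yearly seasonal component in the list above is described as a Fourier series. The following is a minimal sketch of that idea, not Prophet's internal code: the helper name fourier_features, the period of 365.25 days and the order of 3 are all assumptions chosen for illustration.

```python
import numpy as np


def fourier_features(t_days, period=365.25, order=3):
    """Build sine/cosine columns for one seasonal component.

    t_days: 1-D sequence of time stamps measured in days.
    period: length of one seasonal cycle in days (assumed 365.25 for a year).
    order:  number of harmonics to include (illustrative default).
    """
    t = np.asarray(t_days, dtype=float)
    columns = []
    for n in range(1, order + 1):
        angle = 2.0 * np.pi * n * t / period
        columns.append(np.sin(angle))
        columns.append(np.cos(angle))
    # One row per time stamp, two columns (sin, cos) per harmonic
    return np.column_stack(columns)


features = fourier_features(np.arange(730), order=3)
print(features.shape)
```

A regression on these columns gives a smooth periodic curve. A higher order lets the curve follow faster seasonal changes, at the risk of overfitting.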

The Prophet Forecasting Model

Prophet builds a model by finding the best smooth line, which can be represented as a sum of the following components:

y(t) = g(t) + s(t) + h(t) + ϵₜ

· g(t) – Overall growth trend.

· s(t) – Periodic changes (e.g., weekly and yearly seasonality).

· h(t) – Holiday effects, which occur on irregular schedules.

· ϵₜ – Error term (any idiosyncratic changes which are not accommodated by the model).
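To make the decomposition concrete, here is a small sketch that builds a synthetic series as the sum of the four components. Every number in it (slopes, the changepoint position, amplitudes, the "holiday" days, the noise level) is made up for illustration and has nothing to do with the dataset used later.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(730)  # two years of daily time steps

# g(t): piecewise linear trend whose slope changes at one changepoint
changepoint = 365
g = 0.01 * t + 0.02 * np.clip(t - changepoint, 0, None)

# s(t): weekly plus yearly periodic components
s = 0.5 * np.sin(2 * np.pi * t / 7) + 1.0 * np.sin(2 * np.pi * t / 365.25)

# h(t): a fixed bump on a few hand-picked "holiday" days
h = np.zeros(t.shape)
h[[100, 465]] = 2.0

# Error term: small Gaussian noise
eps = rng.normal(scale=0.1, size=t.size)

# The additive model: y(t) = g(t) + s(t) + h(t) + error
y = g + s + h + eps
print(y[:5])
```

Fitting Prophet is roughly the reverse exercise: given only y, it estimates the components that add up to it.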

In this blog post, we will look at some of the useful functions in the fbprophet library by training a basic Prophet model on an example data set. The tutorial covers the following topics.

1. Installing & importing the dependencies

2. Reading and preprocessing data

3. Model Fitting

4. Obtaining the forecasts

5. Evaluating the model

1. Installing the packages

Since Python is the programming language used here, the ways to install the Prophet package in a Python environment are described below.

Like most Python libraries, fbprophet can be installed using pip. The major dependency of Prophet is pystan, so install pystan with pip before using pip to install fbprophet. Note that from version 1.0 onwards the package is published under the name prophet; this tutorial uses the older fbprophet name throughout.

pip install pystan
pip install fbprophet

You can also install prophet in your conda environment.

conda install -c conda-forge fbprophet

After installation, let’s get started!

With the Python environment set up and the dependencies installed, let’s import the required Python libraries, including fbprophet, which we will need for the forecasting.

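import pyodbc  # only needed when reading data over ODBC; not used in the steps shown here
import pandas as pd  # dataframes and date handling
import numpy as np  # log transformation
import matplotlib.pyplot as plt  # plotting
from fbprophet import Prophet  # the forecasting model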

2. Loading and preprocessing data

The data set is then loaded as a pandas dataframe. Here the dataset contains daily page views for the Wikipedia page of Peyton Manning. The dataset has been modified for the purposes of this article. You can access the data set and the source code here.

df = pd.read_csv("peyton_manning.csv")

When commands are executed to show the first 10 and last 10 rows of the dataframe, the output appears as follows. As you can see, it consists of two columns, “Date” and “Views”, where the number of views on each date has been recorded. This dataset has records from 2007 to 2016. The number of rows and columns in the dataset can also be obtained with a Python command.

First 10 and last 10 tuples in the dataset
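The inspection commands themselves are not reproduced in the text. A plausible sketch, assuming the standard pandas head, tail and shape were used, looks like this. The frame df_demo is a made-up stand-in with the same two column names, not the real data.

```python
import pandas as pd

# Made-up stand-in for the real dataset, using the same two column names
df_demo = pd.DataFrame({
    "Date": pd.date_range("2007-12-10", periods=30, freq="D").strftime("%Y-%m-%d"),
    "Views": list(range(1000, 1030)),
})

first_rows = df_demo.head(10)   # first 10 rows
last_rows = df_demo.tail(10)    # last 10 rows
n_rows, n_cols = df_demo.shape  # (number of rows, number of columns)

print(first_rows)
print(last_rows)
print(n_rows, n_cols)
```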

First, the date column should be converted into datetime format before fitting our data set to the model.

df['Date'] = pd.to_datetime(df['Date'])
df.dtypes

Output:

Date     datetime64[ns]
Views             int64
dtype: object

We can visualise the variation in the data using the dataframe’s plot function, which draws with Matplotlib.

df.plot(x = 'Date')

Taking the date column as the x axis gives the plot above, which does not look stationary. The distribution is right-skewed and the data looks noisy. Prophet models the trend explicitly, so strictly stationary input is not a requirement, but stabilising the variation in the data usually helps the fit. This can be achieved in mainly two ways.

· Taking the difference

o df.diff()

o y′(t) = y(t) − y(t−1)

o df['diff'] = df['a'] - df['a'].shift(1)

· Log transformation: to stabilise inconsistent values

o using numpy.log()
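The two options can be compared side by side on a tiny example. This is only a sketch: the frame demo, its column name a and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up series whose level and step size both grow over time
demo = pd.DataFrame({"a": [100.0, 110.0, 125.0, 150.0, 190.0]})

# First difference: each value minus the previous one.
# The first row has no predecessor, so it becomes NaN.
demo["diff"] = demo["a"] - demo["a"].shift(1)

# Series.diff() computes the same thing in one call
same_as_builtin = demo["diff"].equals(demo["a"].diff())

# Log transformation: compresses large values so the spread is more even
demo["log"] = np.log(demo["a"])

print(demo)
print(same_as_builtin)
```

Differencing removes the level of the series and shortens it by one usable row, while the log transformation keeps every row and can be undone later with numpy.exp().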

In this tutorial, the log transformation has been applied to all the values in the Views column.

df['Views'] = np.log(df['Views'])

When the plot is drawn again, the data appears much closer to stationary.

Before fitting our model using the Peyton Manning dataset, the “Date” and “Views” columns should be renamed to “ds” and “y” respectively. This is the naming standard that Prophet requires.

df.columns = ['ds','y']
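Assigning df.columns by position works only if the columns are already in the order date, then value. A slightly more defensive sketch renames by name instead. It assumes the original column names are “Date” and “Views” as described above; the frame raw and its values are made up.

```python
import pandas as pd

# Made-up frame with the columns deliberately in the "wrong" order
raw = pd.DataFrame({
    "Views": [9.59, 8.52],
    "Date": pd.to_datetime(["2007-12-10", "2007-12-11"]),
})

# Map the old names to the names Prophet expects, then fix the column order
prophet_df = raw.rename(columns={"Date": "ds", "Views": "y"})[["ds", "y"]]
print(prophet_df.columns.tolist())
```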

When this is done, we are good to go ahead and train our Prophet model.

3. Model Fitting

We fit the model by instantiating a new Prophet object. Any settings for the forecasting procedure are passed into the constructor. Then you call its fit method and pass in the preprocessed dataset with the historical data.

model = Prophet()
model.fit(df)
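As an illustration of passing settings into the constructor, the sketch below collects a few keyword arguments in a dictionary. The values are hypothetical and not tuned for this dataset, and the import is guarded so that the snippet still runs when fbprophet is not installed.

```python
# Hypothetical settings, chosen only to illustrate the constructor arguments
prophet_settings = {
    "growth": "linear",               # or "logistic" for a saturating trend
    "changepoint_prior_scale": 0.05,  # larger values allow a more flexible trend
    "seasonality_mode": "additive",   # or "multiplicative"
    "yearly_seasonality": True,
    "weekly_seasonality": True,
    "daily_seasonality": False,
    "interval_width": 0.95,           # width of the uncertainty interval
}

try:
    from fbprophet import Prophet
except ImportError:
    Prophet = None  # library not available; the settings can still be inspected

if Prophet is not None:
    # tuned_model is a separate object from the default model used in this tutorial
    tuned_model = Prophet(**prophet_settings)
```

The rest of this tutorial keeps the default Prophet() model, so none of these settings are needed to follow along.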

Predictions are then made on a dataframe with a column ds containing the dates for which a prediction is to be made. You can get a suitable dataframe that extends into the future by a specified number of days using the helper method Prophet.make_future_dataframe. By default it also includes the dates from the history, so we will see the model fit as well. The number of future dates to be predicted is specified by the parameter “periods”.

future_dates = model.make_future_dataframe(periods=365)

The Peyton Manning dataset contains records from 2007 to 2016. If you examine the last rows of the future_dates dataframe, it now contains dates from 2017, which are to be included in the forecast of the model.
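To see what such a future dataframe contains, here is a rough pandas-only imitation of the helper for daily data. It is a sketch of the behaviour described above rather than the library's own code, and the ten history dates are made up.

```python
import pandas as pd

# Made-up history: ten consecutive days
history = pd.DataFrame({"ds": pd.date_range("2016-01-11", "2016-01-20", freq="D")})

periods = 5
last_date = history["ds"].max()

# Generate periods + 1 dates starting at the last observed date,
# then drop that first one so only genuinely new dates remain
new_dates = pd.date_range(start=last_date, periods=periods + 1, freq="D")[1:]

# Keep the history and append the future dates
future_demo = pd.DataFrame({"ds": history["ds"].tolist() + new_dates.tolist()})
print(future_demo.tail())
```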

4. Obtaining the forecasts

The predict method assigns each row in future_dates a predicted value, which it names yhat. If you pass in historical dates, it provides an in-sample fit. The prediction object here is a new dataframe that includes a column yhat with the forecast, as well as columns for components and uncertainty intervals.

prediction = model.predict(future_dates)
model.plot(prediction)

When you plot the prediction, it is illustrated as follows.

In the above figure, the black dots are the actual data points. The dark blue line is the forecast yhat produced by the Prophet model, which extends over the 2016-2017 period (indicated by the red arrow). The light blue region is the uncertainty interval bounded by yhat_lower and yhat_upper.
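Because the Views column was log-transformed before fitting, yhat and its bounds are on the log scale. A small sketch of converting them back to page views follows. The rows in prediction_demo are made-up numbers that only mimic the column names of the real prediction dataframe.

```python
import numpy as np
import pandas as pd

# Made-up rows shaped like a slice of the prediction dataframe
prediction_demo = pd.DataFrame({
    "ds": pd.to_datetime(["2017-01-18", "2017-01-19"]),
    "yhat": [8.2, 8.3],
    "yhat_lower": [7.5, 7.6],
    "yhat_upper": [8.9, 9.0],
})

value_columns = ["yhat", "yhat_lower", "yhat_upper"]

# Undo the earlier np.log so the numbers are page views again
views_forecast = np.exp(prediction_demo[value_columns])
views_forecast.insert(0, "ds", prediction_demo["ds"])
print(views_forecast)
```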

You can also see the forecast components using the Prophet.plot_components method. By default you’ll see the trend, yearly seasonality, and weekly seasonality of the time series. If you include holidays, you’ll see those here, too.

model.plot_components(prediction)

5. Evaluating the model

Once the forecast is obtained from the model, its accuracy has to be measured using a relevant performance metric. Prophet includes a built-in function for cross-validation, which measures forecast error using historical data. The forecast horizon (horizon), the initial training period (initial) and the spacing between cutoff dates (period) should be specified.

Here cross-validation is done to assess prediction performance on a horizon of 365 days, starting with 730 days of training data in the first cutoff and then making predictions every 180 days. On this 8 year time series, this corresponds to 11 total forecasts.

from fbprophet.diagnostics import cross_validation
df_cv = cross_validation(model, initial='730 days', period='180 days', horizon='365 days')
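The count of 11 forecasts can be checked by hand. The sketch below walks backwards from the last possible cutoff in steps of the period. It is a simplified reading of the procedure described above, and the first and last dates are assumptions (the article only says the data runs from 2007 to 2016); cutoff_dates is a hypothetical helper name.

```python
from datetime import date, timedelta


def cutoff_dates(first_day, last_day, initial, period, horizon):
    """Return cutoff dates, oldest first, for rolling-origin cross-validation."""
    cutoffs = []
    # Latest cutoff that still leaves a full horizon of data after it
    cutoff = last_day - horizon
    # Every cutoff must leave at least `initial` days of training data before it
    while cutoff >= first_day + initial:
        cutoffs.append(cutoff)
        cutoff -= period
    return sorted(cutoffs)


# Assumed date range, for illustration only
cutoffs = cutoff_dates(
    first_day=date(2007, 12, 10),
    last_day=date(2016, 1, 20),
    initial=timedelta(days=730),
    period=timedelta(days=180),
    horizon=timedelta(days=365),
)
print(len(cutoffs), cutoffs[0], cutoffs[-1])
```

Under these assumed dates the loop yields 11 cutoffs, which agrees with the number of forecasts quoted above.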

The performance metrics can then be calculated from df_cv. The statistics computed are mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), mean absolute percent error (MAPE), median absolute percent error (MDAPE) and coverage of the yhat_lower and yhat_upper estimates.

from fbprophet.diagnostics import performance_metrics
df_p = performance_metrics(df_cv)
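For reference, the statistics listed above can be written out by hand. This sketch computes each one over four made-up predictions. It ignores the rolling window over the horizon that the library applies, so it shows the definitions only.

```python
import numpy as np

# Made-up actuals, forecasts and interval bounds for four predictions
y = np.array([10.0, 12.0, 8.0, 20.0])
yhat = np.array([11.0, 12.0, 6.0, 16.0])
yhat_lower = np.array([9.0, 11.0, 5.0, 14.0])
yhat_upper = np.array([13.0, 13.0, 7.0, 18.0])

error = y - yhat
mse = np.mean(error ** 2)    # mean squared error
rmse = np.sqrt(mse)          # root mean squared error
mae = np.mean(np.abs(error)) # mean absolute error
ape = np.abs(error / y)      # absolute percent error per prediction
mape = np.mean(ape)          # mean absolute percent error
mdape = np.median(ape)       # median absolute percent error
# Share of actual values that fall inside the predicted interval
coverage = np.mean((y >= yhat_lower) & (y <= yhat_upper))

print(mse, rmse, mae, mape, mdape, coverage)
```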


Cross validation performance metrics can be visualized with plot_cross_validation_metric, here shown for MAPE. Dots show the absolute percent error for each prediction in df_cv. The blue line shows the MAPE, where the mean is taken over a rolling window of the dots. We see for this forecast that errors around 5% are typical for predictions one month into the future, and that errors increase up to around 11% for predictions that are a year out.

from fbprophet.plot import plot_cross_validation_metric
fig = plot_cross_validation_metric(df_cv, metric='mape')

The same plot can be drawn for the other metrics, such as rmse, mae and mse, which has already been done in the complete code. You can access the source code for this tutorial here.

Summary

There are many time series models, such as ARIMA, exponential smoothing and seasonal naive (snaive), which can be used for forecasting from historical data. From the practical example, it seems that Prophet provides completely automated forecasts, just as its official documentation states. It’s fast and productive, which is very useful if your organisation doesn’t have a solid data science team handling predictive analytics. It saves time when answering forecasting questions from internal stakeholders or clients, without spending too much effort on building a model based on classic time series modelling techniques.