FORECASTING STORE BUSINESS DATA USING GCN NETWORK

Pranav T P
Oct 18, 2021 · 5 min read

Store Item Demand Forecasting combining Graph and Recurrent Structures

Graph Convolutional Network

Time series forecasting can be approached in different ways. The most classical approaches are based on statistical and autoregressive methods. Trickier are algorithms based on boosting and ensembling, where we have to engineer a good number of useful handmade features over rolling windows. On the other side, we find neural network models, which allow more freedom in their development, providing customizable sequential modeling and much more.

Recurrent and convolutional structures have achieved great success in time series forecasting. Interesting approaches in the field come from the adoption of Transformer and attention architectures, originally developed for NLP. Less common is the use of graph structures, where we have a network composed of different nodes related to each other by some kind of linkage. What we try to do here is use a graph representation of our time series to produce future forecasts.

In this post, we carry out a sales forecasting task using graph convolutional neural networks, exploiting the nested structure of our data, which is composed of different sales series for various items in different stores.

THE DATA

The dataset is collected from my GitHub. The Store Item Demand Forecasting Challenge provides four whole years of daily sales data for different items sold in various stores. We have 10 stores and 50 products, for a total of 500 series; each product is sold in every store. Our goal is to provide accurate daily forecasts for all the items.

sales for item 10 in each store

The data at our disposal are minimal: only the sales amount and numerical encodings of items and stores. This is still enough for us to identify a basic hierarchical structure. All we need to do is group the series at the item level; in this way, we end up with 50 groups (items), each composed of 10 series (the item's sales in each store). An example of a group is depicted in the figure above.
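As a rough sketch of this grouping (assuming the raw CSV has the challenge's date, store, item, and sales columns; the file name here is illustrative):

import pandas as pd

df = pd.read_csv('train.csv', parse_dates=['date'])

# one group per item: a (dates x 10 stores) matrix of sales
groups = {
    item: g.pivot(index='date', columns='store', values='sales')
    for item, g in df.groupby('item')
}
print(len(groups))       # 50 items
print(groups[10].shape)  # (n_days, 10): the series plotted above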

In classical graph networks, all the relevant information is stored in an object called the adjacency matrix, a numerical representation of all the linkages present in the data. In our context, the adjacency matrix can be derived from the correlation matrix computed on the sales sequences of a given item across all stores.
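For a single item and one window of sales, for example, the correlation across the 10 store columns yields a 10 x 10 adjacency matrix (a sketch building on the groups dict above; the window length is arbitrary here):

import numpy as np

window = groups[10].iloc[:30].values     # one window: (30 days, 10 stores)
adj = np.corrcoef(window, rowvar=False)  # (10, 10) store-to-store correlations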

This sequence partitioning is fundamental in our approach because we process the data in windows, as for a recurrent architecture, which will also be part of our model.
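A minimal sliding-window split might look like the following (names are illustrative; the target is the next-day sales of a single store, since we train one model per store):

import numpy as np

def make_windows(series, store_idx, sequence_length=30):
    # series: (n_days, 10) sales matrix for one item;
    # the target is the next-day sales of the store at column store_idx
    X_seq, X_adj, y = [], [], []
    for i in range(len(series) - sequence_length):
        window = series[i:i + sequence_length]
        X_seq.append(window)                              # LSTM input
        X_adj.append(np.corrcoef(window, rowvar=False))   # graph input
        y.append(series[i + sequence_length, store_idx])  # target
    return np.array(X_seq), np.array(X_adj), np.array(y)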

THE MODEL

Our model receives, as input, sequences of sales from all stores and the adjacency matrices obtained from the same sequences. The sequences are passed through LSTM layers, while the correlation matrices are processed by graph convolution layers. These are implemented in Spektral, a cool library for graph deep learning built on TensorFlow. It offers various kinds of graph layers; we use the most basic one, GraphConv. It performs a series of convolution operations between learnable weights, external node features (provided together with the adjacency matrix), and our correlation matrices. Unfortunately, at the moment Spektral doesn't support Windows, so I had to manually extract the class of interest and create my own Python module.

Our network is defined below:

import tensorflow as tf
from tensorflow.keras.layers import (Input, LSTM, Dense, Dropout, Flatten,
                                     Concatenate, BatchNormalization)
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from spektral.layers import GraphConv

# sequence_length and X_train_feat come from the data-preparation step
def get_model():
    opt = Adam(lr=0.001)

    inp_seq = Input((sequence_length, 10))           # sales windows, all 10 stores
    inp_lap = Input((10, 10))                        # normalized adjacency matrix
    inp_feat = Input((10, X_train_feat.shape[-1]))   # node features per store

    # graph branch: two graph convolutions over the store nodes
    x = GraphConv(32, activation='relu')([inp_feat, inp_lap])
    x = GraphConv(16, activation='relu')([x, inp_lap])
    x = Flatten()(x)

    # recurrent branch: stacked LSTMs over the sales sequences
    xx = LSTM(128, activation='relu', return_sequences=True)(inp_seq)
    xx = LSTM(32, activation='relu')(xx)

    # merge both branches and regress the next sales value
    x = Concatenate()([x, xx])
    x = BatchNormalization()(x)
    x = Dropout(0.5)(x)
    x = Dense(128, activation='relu')(x)
    x = Dense(32, activation='relu')(x)
    x = Dropout(0.3)(x)
    out = Dense(1)(x)

    model = Model([inp_seq, inp_lap, inp_feat], out)
    model.compile(optimizer=opt, loss='mse',
                  metrics=[tf.keras.metrics.RootMeanSquaredError()])
    return model

As introduced before, the data are processed just as when developing a recurrent network: the sequences are collections of sales, over a fixed temporal window, in all stores for the item under consideration.

The further step, in our case, is to compute on those same windows the correlation matrix of sales between stores, which serves as our adjacency matrix. Alongside it, we provide some handmade features (mean, standard deviation, skewness, kurtosis, regression coefficient), calculated per store for each window, which act as the node features of the network.
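A sketch of these per-store features for one window (scipy provides skewness and kurtosis; taking the regression coefficient as the slope of sales against time is an assumption on my part):

import numpy as np
from scipy import stats

def node_features(window):
    # window: (sequence_length, 10) -> (10, 5) node features
    t = np.arange(len(window))
    feats = []
    for s in range(window.shape[1]):              # one row per store
        sales = window[:, s]
        slope = stats.linregress(t, sales).slope  # trend coefficient
        feats.append([sales.mean(), sales.std(),
                      stats.skew(sales), stats.kurtosis(sales), slope])
    return np.array(feats)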

Given a sample covariance or correlation matrix, we can estimate an adjacency matrix by applying a Laplacian normalization, which enables the use of an efficient layer-wise propagation rule based on a first-order approximation of spectral convolutions.
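In practice this is the renormalization trick of Kipf and Welling, D^-1/2 (A + I) D^-1/2. A minimal numpy version (clipping negative correlations to zero first is one simple choice, not necessarily what the original preprocessing did):

import numpy as np

def normalize_adjacency(adj):
    # Kipf & Welling renormalization: D^-1/2 (A + I) D^-1/2
    a = np.clip(adj, 0, None) + np.eye(adj.shape[0])  # clip, add self-loops
    d = np.power(a.sum(axis=1), -0.5)                 # inverse sqrt degrees
    return a * d[:, None] * d[None, :]                # symmetric normalization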

Training uses the first two years of data, while the remaining two years are used for validation and testing, respectively. I trained a model for each store, so we end up with a total of 10 different neural networks.
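Assuming windows built as above over a daily index covering 2013-2016, the chronological split is a couple of boolean masks (boundaries and names are illustrative):

# date of the target associated with each window
dates = groups[10].index[sequence_length:]

train = dates < '2015-01-01'
valid = (dates >= '2015-01-01') & (dates < '2016-01-01')
test = dates >= '2016-01-01'

X_seq_tr, X_adj_tr, y_tr = X_seq[train], X_adj[train], y[train]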

At the end of the training procedure, the predictions for each store are retrieved from the corresponding model. The errors are calculated as RMSE on the test data and reported below.
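Recovering the test error for one store takes a couple of lines (a sketch; the model and the test arrays follow the naming used above):

preds = model.predict([X_seq_te, X_adj_te, X_feat_te]).ravel()
rmse = np.sqrt(np.mean((y_te - preds) ** 2))
print(f'store {store_idx} test RMSE: {rmse:.3f}')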

In the same way, it's easy to extract the predictions for items in the desired stores by directly manipulating our nested data structure.

Predictions on test data

SUMMARY

In this post, I’ve adopted graph neural networks in an uncommon scenario like time series forecasting. In our deep learning model, graph dependency combines itself with the recurrent part trying to provide more accurate forecasts. This approach seems to suits well to our problem because we could underline a basic hierarchical structure in our data, which we numerical encoded with correlation matrixes.



I'm Pranav T P, pursuing my Master's (MTech) in Cloud Computing at PES University, Bangalore.