Short-term precipitation forecasting using Convolutional LSTM neural networks

Petros Demetrakopoulos
Dec 19, 2022


An image from the composite signal of the 2 weather radars operating in The Netherlands by KNMI, Image generated by author from radar data of KNMI

Introduction and intuition

In the last 4 months, I moved to Eindhoven, a small city in The Netherlands, to pursue a Master’s degree in Data Science and Artificial Intelligence. But do you know what feels very different in The Netherlands for a group of Mediterranean friends and fellow students? The weather, and especially the frequency of rain and the low temperatures (below 0 °C for many days or even weeks in a row). So when winter kicked in, all of my friends and fellow students from Mediterranean countries were waiting for snow every day, sending screenshots from their weather apps to the student group chats every morning, even though the apps forecasted snow with a probability of only 40% or 50%. Some friends kept arguing that a snowstorm was close even when the sun had been shining brightly the whole day, simply because the weather app was forecasting snow with a low probability for the next hour.

This is when I seriously started thinking about responding to them as a Data Scientist would: with raw data and the conclusions derived from analyzing it. So I started looking into the open datasets and public APIs provided by the Royal Netherlands Meteorological Institute (also known as KNMI) [1].

Data gathering

While browsing through the datasets provided by KNMI to see how I could use them, I came across a dataset [2] containing the composite reflectivity (measured at various angles) captured by the 2 weather radars in The Netherlands, the first located in Herwijnen and the second in Den Helder. Without getting too technical about how weather radars work [3], let’s say that a weather radar emits a signal which is reflected by precipitation (rain, snow, hail etc.) when it hits it. The intensity of the reflected signal captured back by the radar is called reflectivity (measured in dBZ), and we can roughly say that higher reflectivity corresponds to more intense precipitation at that point. When this reflectivity data is converted into an image by mapping signal intensity to a color scale (by default, the color scale provided by KNMI is ‘viridis’, with purple/dark blue for the lower values and yellow for the higher values), it produces an image like the one presented at the beginning of the article. The data is provided in a raw format via an API every 5 minutes. Unfortunately, the API has a quota that allows fetching only 100 images per hour, which made it really difficult to gather a significant amount of data.

The problem

This is when I remembered reading somewhere about combining convolutional and LSTM layers to predict the next frame in a video. I started wondering whether I could reduce the problem of predicting the next capture of the weather radar to the problem of predicting the next frame in a video, which, do not forget, is simply a sequence of images, much like the weather radar data. So I gathered some sequences of images and started experimenting with various architectures of Convolutional LSTM neural networks. Each training data point consists of 36 consecutive raw radar files (which correspond to 3 hours of measurements at an interval of 5 minutes). Each data point was then split into 2 parts. The first 18 frames were used as the “features” (x) and the last 18 frames were what the neural network tries to predict (y) given the first 18 frames. Or, in terms of weather forecasting: given the precipitation data of the past 1.5 hours, what will the precipitation look like during the next 1.5 hours (frames arrive at an interval of 5 minutes, so 18 frames correspond to 1.5 hours)?
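To make the framing concrete, here is a minimal sketch of how one 36-frame sequence can be cut into an input half and a target half. The array is synthetic and tiny, and the names `split_sequence`, `N_INPUT` and `FRAME_INTERVAL_MIN` are illustrative assumptions, not taken from the project code.

```python
import numpy as np

FRAME_INTERVAL_MIN = 5      # one radar composite every 5 minutes
FRAMES_PER_SEQUENCE = 36    # 36 frames cover 3 hours
N_INPUT = 18                # the first 18 frames are the past 1.5 hours


def split_sequence(sequence):
    """Split one (36, rows, cols) sequence into input (x) and target (y) halves."""
    if sequence.shape[0] != FRAMES_PER_SEQUENCE:
        raise ValueError(
            f"expected {FRAMES_PER_SEQUENCE} frames, got {sequence.shape[0]}"
        )
    return sequence[:N_INPUT], sequence[N_INPUT:]


# Synthetic stand-in for one radar sequence (a tiny 4x5 grid keeps the demo small).
sequence = np.arange(36 * 4 * 5, dtype=np.float32).reshape(36, 4, 5)
x, y = split_sequence(sequence)

print(x.shape, y.shape)                                # (18, 4, 5) (18, 4, 5)
print(N_INPUT * FRAME_INTERVAL_MIN, "minutes per half")  # 90 minutes per half
```

The same slicing works on a whole batch of sequences by slicing the second axis instead of the first.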

Why Convolutional LSTM

If you are somewhat familiar with neural networks and deep learning, you probably know that Convolutional Neural Networks (CNNs) perform very well on tasks that involve analyzing or discovering specific features and shapes in images. On the other hand, Long Short-Term Memory (LSTM) neural networks perform very well on tasks that involve the dimension of time (like time series prediction) and sequences of data (such as sequences of images, sequences of signals within specific time frames etc.). This is mainly because they can learn long-term dependencies in data. It seems that the architecture combining convolutional and LSTM layers to predict the next image in a sequence of images was first proposed in 2015 by Shi et al. [5] (and one of the applications they benchmarked it on was precipitation forecasting). Unfortunately, I came across this paper only after I had already invested a lot of time in finding the best architecture, number of layers, hyperparameters for each layer etc. Still, it helped me a lot in understanding the theoretical background.

Data Preprocessing

After downloading almost 160 sequences of 36 consecutive radar scans each in raw HDF5 form (.h5 files), I preprocessed them with the h5py library [6], which reads HDF5 files and makes it easy to handle raw spatiotemporal data like the ones received from KNMI. The data points were picked from random days and times between 01-01-2019 and the time of writing. As the original dimensions of the images were too large to train the model even on a paid plan of a GPU-as-a-service provider (due to memory limitations), I down-scaled the images received from the radar from their original dimensions (700x765) to (315x344). This was the highest resolution at which the model could be trained in reasonable time without running into memory issues. Then, I split each sequence into 2 equal parts. The first 18 frames were used as the “features” (x) and the last 18 frames were the frames the neural network tries to predict (y), given the first 18 frames. Finally, I split the dataset into 2 separate datasets for training (80%) and validation (20%).
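The steps described here can be sketched with NumPy alone. This is an illustration under stated assumptions, not the project's own preprocessing code: the frames are assumed to be already loaded into an array of 8-bit values, the nearest-neighbour resizing and the scaling to [0, 1] are my choices (the resizing method and any normalization are not specified in the text), and `downscale` and `build_dataset` are hypothetical helper names.

```python
import numpy as np


def downscale(frames, out_shape=(315, 344)):
    """Nearest-neighbour downscaling of the last two axes (assumed method)."""
    rows = np.linspace(0, frames.shape[-2] - 1, out_shape[0]).round().astype(int)
    cols = np.linspace(0, frames.shape[-1] - 1, out_shape[1]).round().astype(int)
    return frames[..., rows[:, None], cols]


def build_dataset(sequences, out_shape=(315, 344), n_input=18,
                  train_fraction=0.8, seed=0):
    """sequences: (n_sequences, 36, rows, cols), values assumed to lie in 0..255."""
    frames = downscale(sequences, out_shape).astype(np.float32) / 255.0
    # Shuffle whole sequences, then cut 80% / 20% into training and validation.
    order = np.random.default_rng(seed).permutation(len(frames))
    n_train = int(train_fraction * len(frames))
    train, val = frames[order[:n_train]], frames[order[n_train:]]
    # First n_input frames are the features (x), the rest are the targets (y).
    return ((train[:, :n_input], train[:, n_input:]),
            (val[:, :n_input], val[:, n_input:]))


# One full-size frame goes from 700x765 to 315x344.
print(downscale(np.zeros((700, 765))).shape)   # (315, 344)

# Small synthetic dataset so the demo stays light on memory.
rng = np.random.default_rng(0)
sequences = rng.integers(0, 256, size=(10, 36, 20, 22), dtype=np.uint8)
(x_train, y_train), (x_val, y_val) = build_dataset(sequences, out_shape=(9, 10))
print(x_train.shape, x_val.shape)              # (8, 18, 9, 10) (2, 18, 9, 10)
```

Splitting by whole sequences rather than by frames keeps every frame of a validation sequence out of the training set.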

The code performing all the tasks mentioned above is shown in the snippet below:

Data preprocessing

The Model

I implemented the model using the TensorFlow and Keras frameworks.

The model is basically an autoencoder. Autoencoders are neural networks that try to reduce the dimensionality of the data they are trained on. In this way, they approximate the distribution the data come from. We can then sample from this approximation in order to generate “new” data.

In terms of architecture, the neural network looks like this:

Architecture of the neural network model. Image by author

The model consists of 9 layers in total (Input, Output and 7 hidden layers). The hidden layers alternate between ConvLSTM2D layers and BatchNormalization layers. ConvLSTM2D layers [7] act like simple LSTM layers, except that their input and recurrent transformations are convolutional. In practice, this means that a ConvLSTM2D layer performs convolutional operations over time while retaining the dimensions of the input. You could loosely think of it as a simple convolution layer whose output is then flattened and passed as input to a simple LSTM layer. Explaining in more depth how LSTM layers work is out of the scope of this article, however this Medium article is a very nice start. What you should remember about ConvLSTM2D layers is that they receive as input a tensor of the form (samples, time, channels, rows, cols) (note that the channels dimension comes last if we follow the “channels last” convention) and, when configured to return full sequences, they output tensors of the form (samples, timesteps, filters, new_rows, new_cols). So they perform operations on series of frames over time.
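As a rough illustration of such a stack, here is a minimal Keras sketch that alternates ConvLSTM2D and BatchNormalization layers and checks the shapes. It is not the exact model of the project: the number of filters, the kernel sizes and the Conv3D output layer with a sigmoid activation are assumptions (the output layer follows the Keras next-frame example [4]), and `build_model` is a hypothetical name. It uses the “channels last” layout, so tensors have the form (samples, time, rows, cols, channels).

```python
import tensorflow as tf
from tensorflow.keras import layers, models


def build_model(rows, cols, channels=1, filters=16):
    """Alternating ConvLSTM2D / BatchNormalization stack (channels-last layout)."""
    inp = layers.Input(shape=(None, rows, cols, channels))  # variable number of frames
    x = inp
    kernel_sizes = [(5, 5), (3, 3), (3, 3), (1, 1)]         # assumed values
    for i, kernel in enumerate(kernel_sizes):
        x = layers.ConvLSTM2D(
            filters,
            kernel,
            padding="same",            # keep rows and cols unchanged
            return_sequences=True,     # emit one output frame per input frame
            activation=tf.nn.leaky_relu,
        )(x)
        if i < len(kernel_sizes) - 1:  # no normalization after the last ConvLSTM2D
            x = layers.BatchNormalization()(x)
    # Assumed output layer: one channel per frame, values squashed into [0, 1].
    out = layers.Conv3D(1, (3, 3, 3), padding="same", activation="sigmoid")(x)
    return models.Model(inp, out)


# Small spatial size and few filters so the shape check runs quickly.
model = build_model(rows=32, cols=32, filters=4)
batch = tf.zeros((2, 18, 32, 32, 1))   # (samples, time, rows, cols, channels)
output = model(batch)
print(output.shape)                    # (2, 18, 32, 32, 1)
```

With `padding="same"` and `return_sequences=True` the stack maps 18 input frames to 18 output frames of the same spatial size, which is the shape the 18-in, 18-out setup needs.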

BatchNormalization layers between the ConvLSTM2D layers have just a technical role. They ensure that the mean of the output is close to 0 and the standard deviation is close to 1.
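In its standard formulation, batch normalization standardizes activations using the statistics of the current batch, which pushes their mean towards 0 and their standard deviation towards 1. The sketch below shows this in plain NumPy for a simple (batch, features) array. It is a simplification of what the Keras layer does (no moving averages, no learned scale and offset), and `batch_norm` is an illustrative name.

```python
import numpy as np


def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-3):
    """Standardize each feature over the batch axis (training-mode behaviour)."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


rng = np.random.default_rng(0)
activations = rng.normal(loc=5.0, scale=3.0, size=(1024, 8))  # mean 5, std 3
normalized = batch_norm(activations)

# The mean ends up very close to 0 and the standard deviation very close to 1.
print(abs(normalized.mean()) < 1e-6, abs(normalized.std() - 1.0) < 1e-2)
```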

For all the layers (except the output) I used the LeakyReLU activation function, as recent research suggests that it generally performs better on problems with sparse gradients, and it is one of the ways to avoid the dying ReLU problem [8].
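The difference between ReLU and LeakyReLU is easiest to see in their gradients. In this small NumPy sketch, the slope `alpha=0.2` for negative inputs is an illustrative value, not necessarily the one used in the model.

```python
import numpy as np


def relu_grad(x):
    """Gradient of ReLU: exactly zero for every non-positive input."""
    return np.where(x > 0, 1.0, 0.0)


def leaky_relu(x, alpha=0.2):
    """LeakyReLU: identity for positive inputs, small slope alpha otherwise."""
    return np.where(x > 0, x, alpha * x)


def leaky_relu_grad(x, alpha=0.2):
    """Gradient of LeakyReLU: never zero, so the unit can keep learning."""
    return np.where(x > 0, 1.0, alpha)


x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu_grad(x))         # zero everywhere except the positive input
print(leaky_relu_grad(x))   # alpha instead of zero for non-positive inputs
print(leaky_relu(x))        # negative inputs are scaled down, not cut off
```

A unit whose inputs are always negative receives no gradient with ReLU and stops updating, while with LeakyReLU it still receives a small gradient.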

The model is fitted with the binary cross-entropy loss function and the Adadelta gradient descent optimizer. In my experiments, probably due to the high dimensionality of the data, Adadelta gave much better results than the classic Adam optimizer. The model was trained for 25 epochs (after that, it started overfitting).
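To show what this loss measures on image data, here is a minimal NumPy version of binary cross-entropy applied pixel by pixel. It assumes pixel values scaled to [0, 1], so each pixel acts as a soft target. The function name and the toy arrays are illustrative.

```python
import numpy as np


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean pixel-wise binary cross-entropy; predictions are clipped to avoid log(0)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    losses = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(np.mean(losses))


target = np.array([[0.0, 0.2], [0.8, 1.0]])     # toy 2x2 "frame"
good = np.array([[0.05, 0.25], [0.75, 0.95]])   # close to the target
bad = 1.0 - good                                # inverted intensities

# A prediction close to the target gets a lower loss than an inverted one.
print(binary_cross_entropy(target, good) < binary_cross_entropy(target, bad))  # True
```

In Keras, the equivalent setup is usually expressed by passing `loss="binary_crossentropy"` and an Adadelta optimizer instance to `model.compile`.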

The code of the model is shown in the snippet below:

The prediction model

Results

After training the model, I tested it with some example data points from the validation dataset (which does not contain data points that contributed to the training). The input of the model is 18 consecutive frames (corresponding to almost 1.5 hours of signals captured by the radar) and it returns the 18 next predicted frames (corresponding to the next 1.5 hours).
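The shape handling around a prediction call can be sketched as follows. The helper assumes a Keras-style model with a `predict` method, a channels-last layout with a single channel, and frames scaled to [0, 1]. `predict_next_frames` is a hypothetical name, and `EchoModel` is only a stand-in that lets the sketch run without a trained network.

```python
import numpy as np


def predict_next_frames(model, past_frames):
    """past_frames: (18, rows, cols) array; returns the predicted (18, rows, cols)."""
    batch = past_frames[np.newaxis, ..., np.newaxis]   # -> (1, 18, rows, cols, 1)
    prediction = model.predict(batch)                  # same shape as the batch
    return np.squeeze(prediction, axis=(0, -1))        # drop batch and channel axes


class EchoModel:
    """Hypothetical stand-in with a Keras-like predict(); it returns its input."""

    def predict(self, batch):
        return batch


past = np.random.default_rng(0).random((18, 32, 32)).astype("float32")
future = predict_next_frames(EchoModel(), past)
print(future.shape)   # (18, 32, 32)
```

With a trained model in place of `EchoModel`, the returned array holds the 18 predicted frames, one per 5-minute step.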

Code for prediction of new frames

Then I compared the predicted frames with the ground truth, i.e. the actual 18 frames that follow the 18 frames given as input. The results are visualized below.

Ground Truth and Predicted Frames. Image by author

As you can see, the predicted frames are pretty close to the ground truth frames. However, this visualization does not clearly show the dimension of time and the direction in which the precipitation systems are moving, so the 2 GIF animations below try to better explain the output of the model.

Ground truth. The actual sequence of the 18 frames coming after the 18 frames given as input to the model. Image by author
Predicted frames. The sequence of 18 frames that the model predicted. Image by author

As you can see, the two animations are also pretty close, and the model has correctly captured both the direction in which the precipitation system is moving and its intensity (yellow is more intense, purple/dark blue less intense).

Conclusion

ConvLSTMs appear to be a very powerful tool in the hands of Machine Learning engineers and Data Scientists. Combining 2 core concepts of deep learning into something new opens up immense possibilities for the future. Regarding the model explained in the article, the predictions may seem a bit noisy, but this is most probably due to the limited amount of training data. Apart from that, a bit of fine-tuning is probably still needed to further improve its precision.

The code of the full project, as well as a Jupyter notebook with the trained model, can be found in the GitHub repository linked below.

References

[1] KNMI Data platform, KNMI, https://dataplatform.knmi.nl/

[2] Precipitation — radar 5 minute reflectivity composites over the Netherlands, KNMI, https://dataplatform.knmi.nl/dataset/radar-reflectivity-composites-2-0

[3] Understanding Weather Radar, wunderground.com, https://www.wunderground.com/prepare/understanding-radar

[4] Amogh Joshi, Next-Frame Video Prediction with Convolutional LSTMs (2021), https://keras.io/examples/vision/conv_lstm/

[5] Xingjian Shi, Zhourong Chen, Hao Wang, Dit-Yan Yeung, Wai-kin Wong, Wang-chun Woo, Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting (2015), arXiv

[6] HDF5 for Python, https://www.h5py.org/

[7] ConvLSTM2D layer, Keras API Reference, https://keras.io/api/layers/recurrent_layers/conv_lstm2d/

[8] Saqib Azhar, What is the dying ReLU problem?, Educative.io, https://www.educative.io/answers/what-is-the-dying-relu-problem

