Applying AI capabilities to address Operations challenges in ECMWF Products Team
The ECMWF Products Team acts as a data vendor to several clients, providing them with massive amounts of meteorological data. These services produce log files that contain useful patterns and information that can help improve the reliability of ECMWF’s services.
We aim to produce Machine Learning and Deep Learning based systems capable of monitoring these services for sudden disruptions or failures. We also propose methods to forecast the monitored variables, so that future spikes and surges can be predicted.
Through this, we hope to provide valuable insights into how ECMWF can improve its data services in the near future.
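As a rough illustration of what monitoring for sudden disruptions could look like, the sketch below flags points in a series that deviate strongly from a rolling window of recent values. It is a simple statistical baseline, not the Machine Learning system the project proposes. The function name, the window size, the threshold and the idea of using per-interval log counts as input are all assumptions made for this example.

```python
from statistics import mean, stdev


def flag_anomalies(series, window=5, threshold=3.0):
    """Return indices whose value deviates strongly from the preceding window.

    `series` is assumed to be a list of numbers derived from log files,
    for example requests served per minute (a hypothetical choice).
    """
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A perfectly flat history: any change at all counts as a deviation.
            is_anomaly = series[i] != mu
        else:
            is_anomaly = abs(series[i] - mu) / sigma > threshold
        if is_anomaly:
            flagged.append(i)
    return flagged


# Hypothetical counts with one obvious spike at index 6.
counts = [10, 11, 9, 10, 11, 10, 50, 10, 11]
spikes = flag_anomalies(counts)
```

A learned model would replace the fixed threshold, but a baseline like this is useful for judging whether the more complex system adds value.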
HPC Performance Profiling Tool
The continuous integration cycle of the IFS model is able to provide a regular stream of performance data, such as component runtimes, I/O and parallelization overheads.
In this project, we are aiming to develop a tool for interactive visualization of HPC performance data to better track and analyze IFS performance based on performance monitoring metrics built into the IFS.
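One way to picture the kind of analysis such a tool could support is a comparison of component runtimes between two runs. The sketch below is only an assumption about what that might involve: the component names, the dictionary layout and the 10% tolerance are hypothetical and do not come from the IFS.

```python
def runtime_regressions(baseline, current, tolerance=0.10):
    """Return components whose runtime grew by more than `tolerance`.

    Both arguments map a component name to a runtime in seconds.
    The result maps each slower component to its relative increase.
    """
    slower = {}
    for name, base in baseline.items():
        if name in current and base > 0:
            change = (current[name] - base) / base
            if change > tolerance:
                slower[name] = change
    return slower


# Hypothetical runtimes from two consecutive runs.
baseline = {"radiation": 100.0, "io": 40.0, "dynamics": 200.0}
current = {"radiation": 125.0, "io": 41.0, "dynamics": 190.0}
report = runtime_regressions(baseline, current)
```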
Conversational Virtual Assistant for users of ECMWF online products and services
The goal of our challenge is to create a chatbot with which external users can have conversations to get their questions answered without resorting to other existing support channels.
To achieve this, we will build a modern processing pipeline that retrieves content from ECMWF's helpdesk and support-related pages, applies natural language understanding algorithms to build a semantic knowledge graph, and uses this knowledge graph to train the Dialogflow-based chatbot.
Users of our chatbot will hopefully find answers faster than before, and ECMWF's support team will gain more time to focus on critical support cases.
Compressing atmospheric data into its real information
There is a lot of artificial precision in the current CAMS data encoding setup, so data takes a long time to archive and download.
We plan to use the CAMS global real-time forecast dataset to test different configurations and estimate data encoding errors.
The work on this project could help us reduce both the volume of data we store in our archive and the amount we disseminate to users, while preserving useful information.
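To make the idea of estimating data encoding errors concrete, the sketch below quantises a list of values onto a fixed number of bits and reports the largest absolute error. It assumes plain linear quantisation between the minimum and maximum of the field, which is a simplification and not the exact encoding used for CAMS data. The function names and sample values are made up for illustration.

```python
def quantise(values, nbits):
    """Round each value to the nearest of 2**nbits evenly spaced levels."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)
    levels = (1 << nbits) - 1
    step = (hi - lo) / levels
    return [lo + round((v - lo) / step) * step for v in values]


def max_abs_error(values, nbits):
    """Largest absolute difference between original and quantised values."""
    encoded = quantise(values, nbits)
    return max(abs(a - b) for a, b in zip(values, encoded))


# Hypothetical field values; fewer bits give a larger encoding error.
field = [273.15, 280.4, 291.73, 301.2]
error_8_bits = max_abs_error(field, 8)
error_16_bits = max_abs_error(field, 16)
```

Comparing such errors across bit counts gives a first idea of how much precision can be dropped before useful information is lost.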
The current Global ECMWF Fire Forecasting (GEFF) system is based on empirical models implemented in FORTRAN.
The project intends to explore whether fire danger forecasting using Deep Learning can achieve skill comparable to the operational GEFF system, and whether artificial intelligence can reveal important relationships between fire danger and event occurrence through the inclusion of additional variables and the optimisation of model architecture and hyperparameters.
Finally, conditional on the suitability of the available data, a preliminary fire spread prediction tool will be developed to support first responders and monitoring activities.
Validating surface air quality observations from individual sensors and removing errors and outliers, so that these observations can be compared to ECMWF's CAMS air quality forecasts.
Cluster analysis of these observations can identify the more reliable ones. Attaching data about factors that affect air quality can further increase confidence in their accuracy.
CAMS lacks credible surface air quality observations in many parts of the world, often in the most polluted areas, such as India or Africa. Some observations are available for these areas from data harvesting efforts such as openAQ, but no quality control is applied to the data, and it is often not well known whether the observations are made in a rural, urban or heavily polluted local environment.
This information on the environment is important because very locally influenced measurements are mostly not representative of the horizontal scale (40 km) of the CAMS forecasts and should therefore not be used for the evaluation of the CAMS model.
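As a small illustration of outlier screening for a single sensor, the sketch below marks readings that lie far from the median, using the median absolute deviation (MAD). This is a common robust statistic chosen here as an assumption; the project itself describes cluster analysis, which this sketch does not implement. The function name, the threshold of 3.5 and the sample readings are hypothetical.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return one boolean per value: True where the value looks like an outlier.

    Uses the modified z-score 0.6745 * |x - median| / MAD.
    """
    med = median(values)
    deviations = [abs(v - med) for v in values]
    mad = median(deviations)
    if mad == 0:
        # No spread around the median, so nothing can be scored.
        return [False] * len(values)
    return [0.6745 * d / mad > threshold for d in deviations]


# Hypothetical PM2.5 readings with one implausible value.
readings = [12.0, 14.0, 13.0, 15.0, 400.0, 13.5]
flags = mad_outliers(readings)
```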
Exploring machine/deep learning techniques to detect and track tropical cyclones
Cyclones are complex events characterized by strong winds surrounding a low-pressure area.
Intensity classification of cyclones is traditionally performed using the Dvorak technique, which focuses on statistical relationships between different environmental parameters and the intensity.
This project aims to create a deep-learning algorithm that recognizes and classifies tropical cyclones by intensity. For this task we will use a) satellite imaging data and b) BestTrack database information on tropical cyclones.
The model will be developed for static (per satellite image) detection and classification and later extended to perform dynamic (continuous real-time) detection and classification while maintaining robustness.
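Training a classifier of this kind needs an intensity label for each image. The sketch below shows one possible way to derive such labels from a maximum sustained wind speed. It assumes the Saffir-Simpson wind thresholds in knots and assumes that a wind speed is available per record; the project description does not say which labelling scheme is used, so this is an illustration only.

```python
# Lower wind-speed bounds in knots, highest category first (Saffir-Simpson scale).
INTENSITY_THRESHOLDS_KT = [
    (137, "Category 5"),
    (113, "Category 4"),
    (96, "Category 3"),
    (83, "Category 2"),
    (64, "Category 1"),
    (34, "Tropical storm"),
]


def intensity_label(max_wind_kt):
    """Map a maximum sustained wind speed in knots to an intensity class."""
    if max_wind_kt < 0:
        raise ValueError("wind speed must be non-negative")
    for lower_bound, label in INTENSITY_THRESHOLDS_KT:
        if max_wind_kt >= lower_bound:
            return label
    return "Tropical depression"


# Hypothetical wind speeds for a few track points.
labels = [intensity_label(speed) for speed in (30, 64, 112, 140)]
```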
UNSEEN-Open
An open, reproducible and transferable workflow to assess and anticipate climate extremes beyond the observed record.
Bias correction of CAMS model forecasts for air quality variables by using in-situ observations. The bias correction algorithm will mainly be based on machine-learning / deep-learning techniques.
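As a minimal stand-in for the learning step, the sketch below fits a straight line from forecast values to matching in-situ observations by ordinary least squares and uses it to correct new forecasts. A linear fit is only the simplest possible baseline and is an assumption of this example; the function names and the sample numbers are made up.

```python
def fit_linear(forecasts, observations):
    """Least-squares fit of observation = slope * forecast + intercept."""
    n = len(forecasts)
    mean_f = sum(forecasts) / n
    mean_o = sum(observations) / n
    sxx = sum((f - mean_f) ** 2 for f in forecasts)
    if sxx == 0:
        raise ValueError("forecasts must not all be identical")
    sxy = sum((f - mean_f) * (o - mean_o) for f, o in zip(forecasts, observations))
    slope = sxy / sxx
    intercept = mean_o - slope * mean_f
    return slope, intercept


def correct(forecast, slope, intercept):
    """Apply the fitted correction to a single forecast value."""
    return slope * forecast + intercept


# Hypothetical paired forecast and station values for one pollutant.
slope, intercept = fit_linear([10.0, 20.0, 30.0, 40.0], [10.0, 18.0, 26.0, 34.0])
corrected = correct(25.0, slope, intercept)
```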
MaLePoM (Machine Learning for Pollution Monitoring)
The project aims to build a Machine Learning model that uses suitable proxy data to estimate emissions due to anthropogenic activities. Initially, we will model the concentrations of NOx in Europe. The proxy data should therefore capture these activities by exploiting databases such as land cover maps, dynamic traffic data, lighting data and others.
Subsequently, different approaches will be tested in order to capture both spatial and temporal variability at high resolution and eventually allow accurate emissions estimates at global scales.
Elefridge.jl: Compressing atmospheric data into its real information content
Weather and climate forecasting centres worldwide produce very large amounts of data that has to be stored and shared with users. Data compression is essential to reduce file sizes sent over the internet and the demand on data archive capacity.
A previously completed ESoWC 2020 challenge developed the concept of information-preserving compression by analysing the real information content in data from the Copernicus Atmosphere Monitoring Service (CAMS). Separating the false and hardly compressible information from the real information was shown to allow for high compression factors without significant information loss.
Here, we focus on further details in the implementation of information-preserving compression for CAMS. Readily available in the current GRIB2 compression are different precision and accuracy options that can be translated to preserved information for a given data set. To implement this successfully and in an automated fashion, further improvements are necessary and the best lossless compressor available in GRIB2 that satisfies both speed and size requirements has to be found.
This project aims to successfully implement information-preserving compression for CAMS to put this advanced compression technique into practice.
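The principle of discarding bits that carry no real information can be pictured with a small bit-rounding sketch. It keeps only a chosen number of mantissa bits of a 32-bit float and zeroes the rest, which makes the data far easier for a lossless compressor to shrink. The function name and the `keepbits` parameter are assumptions of this example, the rounding is round-half-up on the magnitude, and nothing here reflects the GRIB2 options or the project's own code.

```python
import struct


def round_mantissa(value, keepbits):
    """Round a number to float32 with only `keepbits` mantissa bits kept."""
    if not 0 <= keepbits <= 23:
        raise ValueError("keepbits must be between 0 and 23")
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    drop = 23 - keepbits
    if drop > 0:
        half = 1 << (drop - 1)
        mask = ~((1 << drop) - 1) & 0xFFFFFFFF
        # Adding half of the dropped range rounds to nearest, then the mask zeroes the tail.
        bits = (bits + half) & mask
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Keeping 4 mantissa bits bounds the relative error by 2**-5.
rounded = round_mantissa(3.14159, 4)
```

How many bits to keep for each variable is exactly the question that the information content analysis is meant to answer.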
Project Meeresvogel seeks to make it easier to incorporate weather visualisations into multimedia presentations. We will design and develop a Python module which enables users to create interactive Google Earth presentations which are enhanced with weather data and visualisations from MetView.
Using this module, we aim to create three examples to demonstrate how this could be useful to diverse audiences wanting to explore various aspects of the 2020/21 Vendée Globe Race, a sporting event in which 33 skippers set out to race their 60-foot yachts solo and non-stop around the world.
We will explore ways in which weather visualisations can provide insights for the public following the race, the race teams wanting to analyse performance data, and scientists analysing the oceanmet observations which were collected by a number of the boats during the race.
BlenderNC Enhancements
Improving Forecast and Reanalysis Data visualisation support in Blender for ECMWF products.
CliMetLab - Machine Learning on weather and climate data
CliMetLab is a Python package aiming at simplifying access to climate and meteorological datasets, allowing users to focus on science instead of technical issues such as data access and data formats.
This project aims to handle data loading and to help interpret the output of machine learning models through plots, graphs, etc. This will remove the overhead of manual data retrieval and of writing specific data loaders for each dataset.
The plugin architecture in CliMetLab aims at easy addition of data sources, datasets, plotting styles and data formats.
Specific goals of the project:
1) Extend CliMetLab so that it offers the user high-level Matplotlib-based plotting functions to produce graphs and plots that are relevant to weather and climate applications.
2) Python package Intake is a lightweight set of tools for loading and sharing data in data science projects. Extend CliMetLab so that it seamlessly interfaces with Intake and allows users to access all intake-compatible datasets.
3) Xarray uses the data format Zarr to allow parallel read and parallel write. Convert large, already available datasets to the xarray-readable Zarr format, define an appropriate configuration (chunking/compression/other) according to domain use cases, develop tools to benchmark the result on a cloud platform, and compare it to other formats (N5, GRIB, netCDF, geoTIFF, etc.).
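Choosing a chunking configuration, as mentioned in the third goal, often starts with simple arithmetic on chunk counts and chunk sizes. The sketch below computes both for a given array shape. The function name, the 4-byte item size and the example grid dimensions are assumptions, and no Zarr or xarray calls are involved.

```python
from math import ceil, prod


def chunk_stats(shape, chunks, itemsize=4):
    """Return (number of chunks, uncompressed bytes per full chunk).

    `shape` and `chunks` are tuples with one entry per dimension.
    """
    if len(shape) != len(chunks):
        raise ValueError("shape and chunks must have the same length")
    n_chunks = prod(ceil(s / c) for s, c in zip(shape, chunks))
    chunk_bytes = prod(chunks) * itemsize
    return n_chunks, chunk_bytes


# Hypothetical (time, latitude, longitude) array of one year of daily fields.
shape = (365, 721, 1440)
per_time_step = chunk_stats(shape, (1, 721, 1440))    # suits map-style access
per_region = chunk_stats(shape, (365, 103, 144))      # suits time-series access
```

The two layouts favour different access patterns, which is why the configuration has to follow the domain use case.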
ML4Land: Using Earth's observation data, Climate reanalysis & Machine Learning to detect Earth’s heating patterns
Skin temperature has been pivotal in identifying the heating and land-use patterns of Earth. The project aims to learn a mapping from model simulations (using ERA5) to satellite observations of skin temperature. Various works have shown how Machine Learning based models can efficiently recognize and learn useful patterns from complex datasets. We thus aim to use Machine Learning algorithms to learn the mapping between ERA5 variables and satellite observations of maximal skin temperature. These solutions will provide predictions at higher resolutions and offer valuable insights into the relationships between skin temperature and various ERA5 variables.
At its core, ECMWF is a data organization that produces and distributes essential weather data to its member states and outside businesses. It also provides various other services such as global forecasting, supercomputing facilities, environmental services and meteorological services.
Many users around the world use these services. In this project, I aim to improve the user experience with ECMWF by providing individual users with their own dashboard showcasing valuable data, favourite charts, and a high-level overview of their relationship with ECMWF and its services.
Nowadays it is possible to obtain atmospheric composition datasets for the same locations from different sources. However, most datasets are not easily comparable due to their file formats and structure.
To address this, the Atmospheric Datasets Comparison (ADC) Toolbox aims to provide a set of tools for unit conversion, side-by-side visual comparison, regridding, temporal and geographic data aggregation, and statistics visualization, to show how similar the datasets are.
The toolbox will consist of Jupyter Notebooks providing the following tools:
- Transformation: The datasets to be compared will be transformed into a common format.
- Merge: The files will be regridded and, if needed, their units will be converted, so that they can be combined.
- Comparison: Statistical methods will be used to show information about the datasets.
- Visualization: The merge output will be seen side-by-side in tables and maps.
- File format change: The files in a common format will be given in any desired file format.
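The regridding part of the merge step can be pictured with a very small example. The sketch below coarsens a regular 2D grid by averaging square blocks of cells, which is one simple way to bring a fine grid onto a coarser one. It is an assumption about how regridding might be done, and the function name and sample grid are made up.

```python
def block_mean(grid, factor):
    """Coarsen a 2D grid (list of rows) by averaging factor x factor blocks."""
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    coarse = []
    for r in range(0, rows, factor):
        coarse_row = []
        for c in range(0, cols, factor):
            cells = [
                grid[i][j]
                for i in range(r, r + factor)
                for j in range(c, c + factor)
            ]
            coarse_row.append(sum(cells) / len(cells))
        coarse.append(coarse_row)
    return coarse


# Hypothetical 4 x 4 field coarsened to 2 x 2.
fine = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
coarse = block_mean(fine, 2)
```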
Users often wrongly convert latitude and longitude coordinates which leads to the selection of a wrong area. This is highly inefficient for ECMWF as a data provider, as the same data request has to be processed multiple times.
The challenge was aimed at developing a widget that selects and displays areas on a map. Such a widget will be useful for many web applications across ECMWF. The widget is based on Leaflet (a JavaScript library for interactive maps) and provides different tools, e.g. drawing and searching. The widget also offers a grid point system resembling ECMWF’s model grid points.
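The kind of coordinate mistake described above can be illustrated with a short sketch that normalises longitudes and assembles an area in north/west/south/east order. This is not code from the widget, which is written in JavaScript; the function names are hypothetical, and the sketch assumes the caller states explicitly which longitude is the western edge.

```python
def normalise_longitude(lon):
    """Map any longitude in degrees to the range [-180, 180)."""
    return (lon + 180.0) % 360.0 - 180.0


def to_area(lat_a, lon_west, lat_b, lon_east):
    """Return an area as (north, west, south, east) in degrees."""
    for lat in (lat_a, lat_b):
        if not -90.0 <= lat <= 90.0:
            raise ValueError("latitude must be between -90 and 90")
    north, south = max(lat_a, lat_b), min(lat_a, lat_b)
    return (north, normalise_longitude(lon_west), south, normalise_longitude(lon_east))


# A western edge given as 350 degrees east is the same meridian as 10 degrees west.
area = to_area(35.0, 350.0, 60.0, 20.0)
```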
ECMWF’s visualisations are developed with weather forecasters in mind. This challenge focused on developing new visualisations that help to communicate weather information to non-experts. The goal was to develop an innovative visualisation to present ensemble weather forecasts.
A new design of a meteogram (a graphical presentation of multiple weather variables for a particular location) has been developed with value-suppressing uncertainty (VSU) icons. They allocate a larger range of icons when the uncertainty is low and a smaller range when the uncertainty is high (see figure below). This helps to make ensemble forecasts more accessible and to increase the overall trust in 15-day forecasts.
Migration of calibration software to Python and development of its GUI
ecPoint-Cal is software that compares numerical model outputs against point observations to identify biases/errors at local scale. The software consisted of two separate processes written in two different programming languages, which could not be easily integrated. The challenge proposed migrating the existing code into Python and developing a user-friendly GUI for the software.
The new software, ecPoint-PyCal, provides a dynamic environment in Python and could ultimately be used to help steer model developments and to post-process ECMWF model parameters to produce probabilistic products for geographical locations.
Vast quantities of ECMWF data are stored in Network Common Data Format (NetCDF) and often there is a need to quickly create or adapt existing NetCDF datasets, for instance when prototyping a new data processing application. Tasks such as modifying the name or value of a NetCDF attribute or deleting unnecessary variables or attributes, typically require specialised NetCDF tools and libraries.
This challenge aimed to develop a tool to represent the hierarchical structure of a NetCDF dataset as a virtual file system. The new software is written in Python and allows users to easily mount, view, and edit the contents of a NetCDF dataset using file-system operations and general purpose Unix tools. The software is potentially useful for anyone working with weather and climate data in NetCDF format wishing to quickly explore and edit a dataset.
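The mapping from a hierarchical dataset to file-system paths can be sketched in a few lines. The example below uses a nested dictionary as a stand-in for a NetCDF dataset, so it does not read real files and does not mount anything. The function name and the sample dataset are hypothetical.

```python
def list_paths(node, prefix=""):
    """List file-system style paths for a nested dictionary.

    Dictionaries become directories (trailing slash), other values become files.
    """
    paths = []
    for name, child in sorted(node.items()):
        path = f"{prefix}/{name}"
        if isinstance(child, dict):
            paths.append(path + "/")
            paths.extend(list_paths(child, path))
        else:
            paths.append(path)
    return paths


# Toy stand-in: one variable with two attributes, plus a global attribute.
dataset = {
    "temperature": {"units": "K", "long_name": "2 m temperature"},
    "title": "demo",
}
paths = list_paths(dataset)
```

Once attributes appear as files, renaming or deleting one becomes an ordinary file operation, which is the convenience the tool is after.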
Globally, new sources of raw data are being made available via the web all the time. However, ECMWF often isn’t aware of these. Manually identifying and gathering information on these new sources is both time consuming and error prone.
This challenge was aimed at developing a tool to search the web systematically, identifying data sources for observed environmental data. The software automates the discovery, analysis and assessment of the candidate web pages in order to find new datasets. The resulting data can be used to improve global predictive weather forecasting models.
This project is about translating CrowdWater (CW) data into something that can be used in flood forecasting models. CrowdWater is an interesting initiative in which people send geo-referenced pictures of streams or rivers, along with the corresponding variations of water level.
The difficulty of this project lies in the variability of the CW data and in the difference in scale between the CW data and the GloFAS (or EFAS) data. GloFAS has a very coarse representation of rivers, whereas the CW data contain more information about smaller rivers.
The project aims to use CW data to improve model forecasts.
Estimating and correcting the biases of climate models, as well as assessing associated uncertainties, are crucial for many climate impact-studies and other applications. In our project we aim to develop a software package – building upon the ISIMIP3b-code – that allows users to apply bias correction, using different methods, in a variety of situations.
We aim to:
1) develop an easy to use, flexible software package that users can employ in different computing environments,
2) extend the ISIMIP approach with several small improvements,
3) implement a systematic evaluation framework for bias correction to support the estimation of uncertainty.
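To give a feel for what one bias correction method does, the sketch below applies plain empirical quantile mapping: a model value is located within the sorted historical model values and mapped to the corresponding position in the sorted observations. This is a generic textbook variant chosen as an assumption and is not the ISIMIP3b method. The function name, the equal-length requirement and the sample data are made up for the example.

```python
from bisect import bisect_right


def quantile_map(value, model_hist, obs_hist):
    """Map a model value onto the observed distribution.

    Assumes both historical samples have the same length (at least 2).
    Values outside the historical model range are shifted by the end-point offset.
    """
    m = sorted(model_hist)
    o = sorted(obs_hist)
    if len(m) != len(o) or len(m) < 2:
        raise ValueError("samples must have equal length of at least 2")
    if value <= m[0]:
        return value + (o[0] - m[0])
    if value >= m[-1]:
        return value + (o[-1] - m[-1])
    # Here m[i] <= value < m[i + 1], so the interval has non-zero width.
    i = bisect_right(m, value) - 1
    fraction = (value - m[i]) / (m[i + 1] - m[i])
    return o[i] + fraction * (o[i + 1] - o[i])


# Hypothetical samples in which the model runs at half the observed values.
corrected = quantile_map(2.5, [1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```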
The project's goal is to provide a web-based graphical user interface (GUI) that makes it easier to manage the cache content and configuration settings of the CliMetLab Python package.
Currently, CliMetLab’s settings and cache are configured via the terminal, which is cumbersome to use and requires experience with shell commands.
This project will enable a wide audience to fully utilise CliMetLab's features by providing a GUI built using modern web frameworks such as ReactJS and Flask.
The Wildfire Explorer will allow users to create plots of wildfire emission and activity data on-demand.
This application will consist of a GUI where the user can select the geographical domain of interest, the date period of a specific event, a reference period for comparison (optional), the variable considered and the plot type.
The processing of the data will be automated and optimised using a PostGIS database. The GUI will be built from a Jupyter notebook with interactive widgets and the data will be processed from the CAMS Global Fire Assimilation System (GFAS).
ECMWF as an organization provides a variety of applications but there is no central dashboard to get a global overview of the information from the user’s favorite app.
The aim of this project is to move the existing user dashboard prototype closer to operations by building on its existing functionality.
In its current state, the project offers a central dashboard to which different widgets can be added.
The plan is to integrate the individual apps through a simple, discoverable widget API, comparable to GetCapabilities for OGC Web Services, so that widgets from the individual applications can be added as well.
Magics is ECMWF's meteorological plotting software that supports plotting contours, wind fields, observations, satellite images, symbols, text, axes and graphs.
The project aims to utilize the power, flexibility and extensibility of the Python library Matplotlib (https://github.com/matplotlib/matplotlib) to improve the drivers for ECMWF's magics-python library (https://github.com/ecmwf/magics-python).
These improvements would allow the users to create more customizable and interactive plots. This project also aims to continue the development of ECMWF's magpye (https://github.com/ecmwf/magpye), which provides a more pythonic and user-friendly API to magics.
The final aim is to create resources such as tutorials and documentation for magics and magpye.