Oren Gilon
Authored Publications
In Defense of Metrics: Metrics Sufficiently Encode Typical Human Preferences Regarding Hydrological Model Performance
Martin Gauch
Frederik Kratzert
Hoshin Gupta
Juliane Mai
Bryan A. Tolson
Sepp Hochreiter
Daniel Klotz
Water Resources Research, 59, e2022WR033918 (2023)
Building accurate rainfall–runoff models is an integral part of hydrological science and practice. The variety of modeling goals and applications has led to a large suite of evaluation metrics for these models. Yet, hydrologists still put considerable trust in visual judgment, although it is unclear whether such judgment agrees or disagrees with existing quantitative metrics. In this study, we tasked 622 experts to compare and judge more than 14,000 pairs of hydrographs from 13 different models. Our results show that expert opinion broadly agrees with quantitative metrics and results in a clear preference for a machine learning model over traditional hydrological models. The expert opinions are, however, subject to significant amounts of inconsistency. Nevertheless, where experts agree, we can predict their opinion purely from quantitative metrics, which indicates that the metrics sufficiently encode human preferences in a small set of numbers. While there remains room for improvement of quantitative metrics, we suggest that the hydrologic community should reinforce their benchmarking efforts and put more trust in these metrics.
Caravan - A global community dataset for large-sample hydrology
Frederik Kratzert
Nans Addor
Tyler Erickson
Martin Gauch
Lukas Gudmundsson
Daniel Klotz
Sella Nevo
Guy Shalev
Scientific Data, 10 (2023), pp. 61
High-quality datasets are essential to support hydrological science and modeling. Several CAMELS (Catchment Attributes and Meteorology for Large-sample Studies) datasets exist for specific countries or regions; however, these datasets lack standardization, which makes global studies difficult. This paper introduces a dataset called Caravan (a series of CAMELS) that standardizes and aggregates seven existing large-sample hydrology datasets. Caravan includes meteorological forcing data, streamflow data, and static catchment attributes (e.g., geophysical, sociological, climatological) for 6830 catchments. Most importantly, Caravan is both a dataset and open-source software that allows members of the hydrology community to extend the dataset to new locations by extracting forcing data and catchment attributes in the cloud. Our vision is for Caravan to democratize the creation and use of globally standardized large-sample hydrology datasets. Caravan is a truly global open-source community resource.
AI Increases Global Access to Reliable Flood Forecasts
Asher Metzger
Dana Weitzner
Frederik Kratzert
Guy Shalev
Martin Gauch
Sella Nevo
Shlomo Shenzis
Tadele Yednkachw Tekalign
Vusumuzi Dube
arXiv (2023)
Floods are one of the most common natural disasters, with a disproportionate impact in developing countries that often lack dense streamflow gauge networks. Accurate and timely warnings are critical for mitigating flood risks, but hydrological simulation models typically must be calibrated to long data records in each watershed. Here we show that AI-based forecasting achieves reliability in predicting extreme riverine events in ungauged watersheds at up to a 5-day lead time that is similar to or better than the reliability of nowcasts (0-day lead time) from a current state-of-the-art global modeling system (the Copernicus Emergency Management Service Global Flood Awareness System). Additionally, we achieve accuracies over 5-year return period events that are similar to or better than current accuracies over 1-year return period events. This means that AI can provide flood warnings earlier and over larger and more impactful events in ungauged basins. The model developed in this paper was incorporated into an operational early warning system that produces publicly available (free and open) forecasts in real time in over 80 countries. This work highlights a need for increasing the availability of hydrological data to continue to improve global access to reliable flood warnings.
Deep learning rainfall–runoff predictions of extreme events
Jonathan Frame
Frederik Kratzert
Daniel Klotz
Martin Gauch
Guy Shalev
Logan M. Qualls
Hoshin Gupta
Hydrology and Earth System Sciences (2022)
The most accurate rainfall–runoff predictions are currently based on deep learning. There is a concern among hydrologists that the predictive accuracy of data-driven models based on deep learning may not be reliable in extrapolation or for predicting extreme events. This study tests that hypothesis using long short-term memory (LSTM) networks and an LSTM variant that is architecturally constrained to conserve mass. The LSTM network (and the mass-conserving LSTM variant) remained relatively accurate in predicting extreme (high-return-period) events compared with both a conceptual model (the Sacramento Model) and a process-based model (the US National Water Model), even when extreme events were not included in the training period. Adding mass balance constraints to the data-driven model (LSTM) reduced model skill during extreme events.
Flood forecasting with machine learning models in an operational framework
Asher Metzger
Chen Barshai
Dana Weitzner
Frederik Kratzert
Gregory Begelman
Guy Shalev
Hila Noga
Moriah Royz
Niv Giladi
Ronnie Maor
Sella Nevo
Yotam Gigi
Zvika Ben-Haim
Hydrology and Earth System Sciences (2022)
Google’s operational flood forecasting system was developed to provide accurate real-time flood warnings to agencies and the public, with a focus on riverine floods in large, gauged rivers. It became operational in 2018 and has since expanded geographically. This forecasting system consists of four subsystems: data validation, stage forecasting, inundation modeling, and alert distribution. Machine learning is used for two of the subsystems. Stage forecasting is modeled with Long Short-Term Memory (LSTM) networks and Linear models. Flood inundation is computed with the Thresholding and Manifold models, where the former computes inundation extent and the latter computes both inundation extent and depth. The Manifold model, presented here for the first time, provides a machine-learning alternative to hydraulic modeling of flood inundation. When evaluated on historical data, all models achieve sufficiently high performance metrics for operational use. The LSTM showed higher skill than the Linear model, while the Thresholding and Manifold models achieved similar performance metrics for modeling inundation extent. During the 2021 monsoon season, the flood warning system was operational in India and Bangladesh, covering flood-prone regions around rivers with a total area of 287,000 km², home to more than 350M people. More than 100M flood alerts were sent to affected populations, to relevant authorities, and to emergency organizations. Current and future work on the system includes extending coverage to additional flood-prone locations, as well as improving modeling capabilities and accuracy.
Global Flood Forecasting at a Fine Catchment Resolution using Machine Learning
Asher Metzger
Dana Weitzner
Frederik Kratzert
Guy Shalev
Sella Nevo
Shlomo Shenzis
Tadele Yednkachw Tekalign
(2022)
Machine learning has been shown to be a promising tool for hydrological modeling. We have used this technology to develop an operational real-time global streamflow prediction model. The model architecture is based primarily on an LSTM (Long Short-Term Memory) network, a form of RNN (Recurrent Neural Network) that includes a state vector similar to those of dynamical systems models.
Our model has been shown to outperform physical and conceptual hydrologic models across time and spatial scales. The main advantage of this ML approach is that models can be trained (calibrated) over many diverse catchments simultaneously rather than being calibrated separately per catchment. This advantage is especially important when modeling on a global scale, where the model is trained on a very large number of catchments that have diverse climatology and geographical settings. Consequently, the model learns different rainfall-runoff dynamics of rivers across these settings and is able to predict accordingly. Once the model is trained (a very short process in comparison to calibrating traditional global models), it can be applied almost anywhere basin attributes are available, in particular at ungauged locations.
We use globally available, near-real-time datasets for training and inference, which allows running the model operationally.
Global datasets used:
- HydroSHEDS database for global catchment delineation and static attributes.
- Meteorological forcing data from:
  - ECMWF weather data, including the ERA5-Land reanalysis and the IFS HRES real-time forecasts and re-forecasts.
  - NOAA’s IMERG (early) global precipitation estimates.
  - CPC Global Unified Gauge-Based Analysis of Daily Precipitation.
- Global streamflow datasets such as GRDC and Caravan for streamflow discharge labels.
Customization Scenarios for De-identification of Clinical Notes
Danny Vainstein
Gavin Edward Bee
Jack Po
Jutta Williams
Kat Chou
Ronit Yael Slyper
Rony Amira
Shlomo Hoory
Tzvika Hartman
BMC Medical Informatics and Decision Making (2020)
Background: Automated machine-learning systems are able to de-identify electronic medical records, including free-text clinical notes. Use of such systems would greatly boost the amount of data available to researchers, yet their deployment has been limited due to uncertainty about their performance when applied to new datasets.
Objective: We present practical options for clinical note de-identification, assessing performance of machine learning systems ranging from off-the-shelf to fully customized.
Methods: We implement a state-of-the-art machine learning de-identification system, training and testing on pairs of datasets that match the deployment scenarios. We use clinical notes from two i2b2 competition corpora, the Physionet Gold Standard corpus, and parts of the MIMIC-III dataset.
Results: Fully customized systems remove 97-99% of personally identifying information. Performance of off-the-shelf systems varies by dataset, with performance mostly above 90%. Providing a small labeled dataset or large unlabeled dataset allows for fine-tuning that improves performance over off-the-shelf systems.
Conclusion: Health organizations should be aware of the levels of customization available when selecting a de-identification deployment solution, in order to choose the one that best matches their resources and target performance level.
ML-based Flood Forecasting: Advances in Scale, Accuracy and Reach
Sella Nevo
Guy Shalev
NeurIPS HADR Workshop (2020)
Floods are among the most common and deadly natural disasters in the world, and flood warning systems have been shown to be effective in reducing harm. Yet the majority of the world's vulnerable population does not have access to reliable and actionable warning systems, due to core challenges in scalability, computational costs, and data availability. In this paper we present two components of flood forecasting systems which were developed over the past year, providing access to these critical systems to 75 million people who didn't have this access before.
Preview abstract
We study a variant of domain adaptation for named-entity recognition where multiple, heterogeneously tagged training sets are available. Furthermore, the test tag-set is not identical to any individual training tag-set. Yet, the relations between all tags are provided in a tag hierarchy, covering the test tags as a combination of training tags. This setting occurs when various datasets are created using different annotation schemes. It also occurs when a tag-set is extended with a new tag by annotating only the new tag in a new dataset. We propose to use the given tag hierarchy to jointly learn a neural network that shares its tagging layer among all tag-sets. We compare this model to combining independent models and to a model based on the multitasking approach. Our experiments show the benefit of the tag-hierarchy model, especially when facing non-trivial consolidation of tag-sets.