The last decade has seen a meteoric rise in applications of artificial intelligence (AI), and more specifically machine learning (ML), in our day-to-day lives, from our phones’ speech-to-text to the software we use to manage critical infrastructure.

This proliferation of ML is in large part due to organizations’ storage and curation of massive amounts of data, ranging from the instrumentation of marketing campaigns to sensors on a generator measuring vibrations. These large datasets are the necessary precursors to the application of ML.

My team at Upstream Tech relies heavily on ML in our work, particularly with HydroForecast. We’re able to do this because of the aforementioned “large datasets.” In our case, these are archives of meteorological forecasts, satellite imagery of basins, and in situ government and customer gauge records. These data combine with hydrological theory and ML to produce the most accurate operational forecasts of their kind. You may be wondering – “most accurate?” – a bold claim!

Recently, the HydroForecast team competed in a year-long streamflow forecasting competition hosted by The Centre for Energy Advancement through Technological Innovation’s Hydropower Operations and Planning Interest Group. One goal of this competition was to determine whether AI models could beat existing approaches to forecasting streamflow.

To cut to the chase: the competition has wrapped, and the conclusion is yes, AI is a powerful tool when it comes to forecasting the amount of water that will flow through a river or stream. HydroForecast won 23 of 25 categories across all of the forecasting regions, a decisive result validated by RTI International.

After some reflection on the competition, I realized that there is a more interesting takeaway: other participants also used machine learning and, if you’ll excuse the double negative, they didn’t just not win – in most cases they performed worse than the traditional approaches to forecasting streamflow. Woah!

Let’s dig deeper into this competition and forecasting streamflow more broadly to better understand which problems AI is a good fit for, when it’s not a great choice, and, when it is, how to apply it most effectively.

Visiting the Big Thompson at Estes Park USGS gauge location, one of USBR’s locations in the competition

AI or Nay-I

Powerful, “new” technologies are sometimes billed as silver bullets and panaceas; “they’ll solve your hardest problems and increase the efficiency of your strongest teams by 51%!” 

When first assessing a problem, I adhere to an adage my engineering degree drilled into me: “keep it simple, stupid,” a design principle originally noted by the U.S. Navy in the 1960s that, in my interpretation, asks “is there a creative, simple solution to this problem, even if it sacrifices some performance or accuracy?”

When a problem seems to call for “nascent” technologies like AI, there is often a simpler, more discernible solution available. Solutions devised from this viewpoint can often be more easily explained and maintained than their more complex counterparts.

In the case of streamflow forecasting, organizations can get, and have gotten, real mileage out of simple regressions or long-term averages. If the tolerance for error in the problem being solved with the forecasts is high enough, these simpler approaches are easy to interpret and maintain.
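To make “long-term averages” concrete, here is a minimal sketch of such a baseline: forecast each day of the year as the historical mean flow for that day. The function name and data are hypothetical illustrations, not any particular operational system.

```python
# A minimal "long-term average" (climatology) baseline: forecast
# streamflow for a day of year as the historical mean for that day.
# The names and data below are hypothetical.
from collections import defaultdict

def climatology_forecast(history):
    """history: iterable of (day_of_year, flow) observations."""
    totals = defaultdict(lambda: [0.0, 0])
    for doy, flow in history:
        totals[doy][0] += flow
        totals[doy][1] += 1
    return {doy: total / count for doy, (total, count) in totals.items()}

# Two past observations for day 100 average to a 13.0 forecast.
history = [(100, 12.0), (100, 14.0), (101, 13.0)]
forecast = climatology_forecast(history)  # forecast[100] == 13.0
```

The appeal is obvious: there is nothing to retrain, the output is trivially explainable, and anyone on the team can maintain it.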

However, if reducing error means safer, more efficient operations and more prescient planning, as it often does in the case of streamflow forecasting, sometimes simple solutions aren’t enough. It’s only in this case, once we’ve exhausted the candidacy of simple, creative solutions, that we’ll reach into our toolbox for machine learning.


The challenge of forecasting streamflow is a perfect fit for machine learning. There is a wide range of possible inputs, a massive record of these inputs (big data at its finest!), and a set of complex, interconnected relationships between these inputs and the predicted output. And best of all, there’s a wealth of established science and lessons from past solutions to draw from and build upon.

At first glance this is a perfect fit for ML, and an easy win. However, it is important to keep in mind that not all solutions that employ machine learning are created equal. I’ll show that for a complex problem such as this, it’s possible to create a machine learning model that will have worse performance than simple solutions and conceptual models.

When AI falls short

There are existing, complex solutions to forecasting streamflow. Conceptual physical modeling – creating calibrated equations that attempt to capture the relationships between variables (e.g. precipitation) and streamflow – has been applied to forecast flows at a variety of horizons for decades.

Conceptual models range in performance from woefully inadequate to pretty good. Their complexity is inherent to the problem itself, because streamflow depends on so many different factors: the past winter’s snow, recent precipitation and temperature, upcoming precipitation and temperature, elevation and soil types across the basin, groundwater interaction, the distribution of land classifications, and many (many!) other factors.
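To give a feel for what “calibrated equations” means, here is a deliberately tiny sketch in the spirit of conceptual models: a degree-day snow store draining through a single linear-reservoir “bucket.” Real operational models chain together many more calibrated stores and parameters; every name and constant here is illustrative only.

```python
# A toy conceptual model: a degree-day snow store feeding one
# linear-reservoir "bucket". All parameters are illustrative.

def step(state, precip_mm, temp_c, melt_factor=3.0, k=0.1):
    """Advance one day. state = (snowpack_mm, storage_mm).
    Returns (new_state, streamflow_mm)."""
    snow, storage = state
    if temp_c <= 0.0:
        snow += precip_mm                   # cold: precip accumulates as snow
        rain = 0.0
    else:
        rain = precip_mm
    melt = min(snow, max(0.0, melt_factor * temp_c))  # degree-day snowmelt
    snow -= melt
    storage += rain + melt                  # water enters the bucket
    flow = k * storage                      # outflow proportional to storage
    storage -= flow
    return (snow, storage), flow

# A cold, snowy day stores water; a warm day releases it as flow.
state, flow = step((0.0, 0.0), precip_mm=10.0, temp_c=-5.0)   # flow == 0.0
state, flow = step(state, precip_mm=0.0, temp_c=4.0)          # flow == 1.0
```

Even this toy version shows why calibration is hard: the melt factor and reservoir coefficient must be tuned per basin, and each added process (soil, groundwater, land cover) brings more parameters to fit.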

When the prior art, established science, and interdisciplinary nature of a problem are not sufficiently integrated into how machine learning is applied, the solution will either immediately underperform or – eventually – fail catastrophically. What this means in practice is that a model is provided with a limited or partial set of inputs that influence the forecast, is trained on a historical period that causes “overfitting,” or is otherwise structured in a way that violates the nature of the problem (for streamflow forecasting, the laws of physics).

In the case of forecasting streamflow, a hypothetical solution might take precipitation and temperature as inputs, some in situ flow measurements as observations, and an off-the-shelf machine learning model to tie it all together. Training and evaluation looks pretty good! The solution-builder presents the result and moves forward to place it into a decision-making workflow. 
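A sketch of what that hypothetical pipeline might look like, with synthetic data and made-up coefficients: a plain linear regression from precipitation and temperature to flow really does look convincing in training.

```python
# A sketch of the naive pipeline: regress flow on precipitation and
# temperature alone. The data are synthetic and purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
precip = rng.uniform(0.0, 20.0, 200)      # mm/day, the training range
temp = rng.uniform(-5.0, 25.0, 200)       # degrees C
flow = 0.5 * precip + 0.2 * temp + rng.normal(0.0, 0.5, 200)

# Ordinary least squares fit of the off-the-shelf model.
X = np.column_stack([precip, temp, np.ones_like(precip)])
coef, *_ = np.linalg.lstsq(X, flow, rcond=None)

# In-sample skill looks excellent, which is exactly the trap:
pred = X @ coef
r2 = 1 - np.sum((flow - pred) ** 2) / np.sum((flow - flow.mean()) ** 2)
```

The high in-sample fit says nothing about conditions outside the 0–20 mm/day training range, which is where the trouble begins.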

But hold up – we’re savvy about hydrology. We know that more than precipitation and temperature drives streamflow. Can you think of when and why this approach will fail catastrophically?

The first failure would be acute: the first big storm (more water) or intense drought (less water) that falls outside the range of the data used to train the model would produce nonsensical or incorrect predictions. And depending on how that model is used, that could have critical consequences, like forcing an operational team to scramble due to a missed forecast.
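A toy illustration of this acute failure, with hypothetical numbers: a memorization-style model trained only on moderate rainfall has no way to respond to a storm far outside that range.

```python
# A 1-nearest-neighbour "model" of flow as a function of precipitation.
# Everything it knows is the training set; it cannot extrapolate.

def nearest_neighbour_flow(train, precip):
    """train: list of (precip_mm, flow); return flow of the closest example."""
    return min(train, key=lambda pair: abs(pair[0] - precip))[1]

# Trained only on moderate rainfall, 0-20 mm/day (true flow = 0.5 * precip)...
train = [(p, 0.5 * p) for p in range(0, 21)]

# ...a 100 mm storm gets the largest answer ever seen in training:
pred = nearest_neighbour_flow(train, 100.0)   # 10.0, far below the true ~50
```

Different model families fail differently here (trees saturate, linear models extrapolate blindly), but none can conjure physics it was never shown.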

The second failure would be gradual, but no less severe. Long-term nonstationarity (i.e. the gradual drift of the relationship between input and output) – in our case climate and landscape change – would cause the model to slowly deviate. This discrepancy might be almost imperceptible at first, but would become more pronounced over time and no less consequential than the acute failure, especially as extreme weather events are projected to increase.
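The gradual failure can be sketched just as simply, assuming a hypothetical linear drift in the precipitation-to-flow relationship: a model calibrated once and then frozen accumulates error year over year.

```python
# A sketch of nonstationarity: the true relationship drifts with the
# climate, while a model fit in "year 0" stays frozen. All numbers
# are hypothetical.

def true_flow(precip_mm, year):
    return (0.5 + 0.01 * year) * precip_mm   # relationship slowly drifts

frozen_coef = 0.5                             # calibrated once, in year 0

errors = [
    abs(true_flow(10.0, year) - frozen_coef * 10.0)
    for year in (0, 10, 20, 30)
]
# errors grow roughly as [0.0, 1.0, 2.0, 3.0]
```

The error starts at zero and compounds silently, which is why this mode of failure tends to go unnoticed until it matters.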

In each of the competition’s geographic regions, the best performance came from HydroForecast’s theory-guided machine learning approach. On average, machine learning models which were not theory-guided (“Statistical” above) underperformed conceptual models.

A better AI: theory-guided machine learning

Thankfully, there is a better way to employ ML to predict natural systems: an approach that respects prior solutions and structurally integrates their wisdom. And I posit that this approach is key to the application of ML and AI to any complex problem. 

Our friend Curt Jawdy of Tennessee Valley Authority echoed this in his excellent reflections on the competition, asking “how will AI and conceptual models hybridize to provide a best-of-both approach?”

At Upstream Tech we call this a theory-guided machine learning approach, and it’s the beating heart of HydroForecast. We:

  1. Leverage expertise in meteorology and hydrology from our team and partners to inform how we select inputs
  2. Meet with our customers to understand and incorporate their wisdom of the river(s) they work on
  3. Build upon physical modeling approaches to inform how we train our models, evaluate our results, and make iterative improvements

How do we know this approach to applying machine learning makes HydroForecast stronger? 

The Forecast Rodeo participants included veteran forecasting teams at utilities like Tennessee Valley Authority and Hydro-Québec, governmental forecasts from agencies like NOAA’s National Weather Service River Forecast Centers, private vendors including Upstream Tech and Sapere, and – at some locations – public participants. We were not the only ones who submitted AI forecasts! However, one approach to applying machine learning (ours) performed vastly better than the others.

It’s likely that the other ML participants did not sufficiently incorporate hydrological theory. And we can speculate that the models’ performance would worsen over time as more extreme events occur and climate patterns and landscapes shift. In contrast, HydroForecast’s theory-guided machine learning design makes it the best forecasting model currently available, and it will continue to improve and perform in the years and decades to come.

It bears repeating: solutions that employ machine learning are not created equal. Often, there are simpler tools to use when devising solutions. And when machine learning is a good fit for the job, it is best applied with respect for the science and with the (human!) experts at the table.