The images below represent one of a number of different ways that the forecasts
produced by Deep Thunder can be evaluated and verified.
To evaluate the model forecasts, one must consider appropriate metrics. Traditionally,
these are defined from a meteorological perspective. That is, how well do the
forecasts correspond to reality and how do they compare to other forecasts?
Therefore, the first type of evaluation will be along those lines.
Later, additional metrics will be defined and presented here, which are based upon how
the forecasts are used and the weather sensitivity of the particular business problems for
which the model is being applied.
The first type of evaluation compares both the Deep Thunder results and
the NCEP North American Model (NAM) continental-scale forecasts to near-surface observations at weather stations
operated by the National Weather Service,
known as METARs
(METeorological Aerodrome Reports). The observations are available at essentially
random locations roughly every hour, but with variation in the time of measurement by up to 20
minutes. The data are made available courtesy of the
National Weather Service via their
NOAAport
data transmission system. The NOAAport system used for this
project was developed by Planetary
Data, Inc.
This process turns out to be less
straightforward than one might imagine due to a variety of inconsistent samplings in space, time and
observables (e.g., differences in precision and error). From the Deep Thunder forecasts, there are
standard weather variables at the surface
at specific grid points at 16, 4 and 1 km resolution available every 10 minutes of forecast time.
From the NAM forecasts, although computed frequently at 12 km resolution, surface data are
only available at 40 km resolution every three hours.
The first step, then, is to bilinearly interpolate the results of both models to the locations of
the observing stations. Since the measurements are actually taken above the surface (2m for
temperature, humidity, etc. and 10m for wind), and the model topography is only an
approximation of the actual station elevation, a simple correction is applied.
Adjustments to pressure and temperature are based upon the lapse rate difference between
actual and model elevations. Later, corrections to temperature and winds
will be made by invoking similarity theory. Then the
Deep Thunder results are averaged for every hour and interpolated to the time of the measurement. The
NAM results are processed in a similar fashion but for every three hours. Then a variety of statistics
are computed, for each model forecast in total, by time and by location. Only a handful of those statistics are
shown herein. There are also occasional problems with the quality and availability of the observations, as well as noise
due to the measurement process, which impact the results. Although simple quality control is used to eliminate
measurements that are clearly out of range, it is often insufficient.
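As a rough sketch of the processing just described, the following hypothetical Python fragment illustrates bilinear interpolation of a model surface field to a station location, a lapse-rate elevation correction, and simple range-based quality control. The function names, the standard-atmosphere lapse rate, and the quality-control bounds are assumptions for illustration, not the actual Deep Thunder implementation.

```python
import numpy as np

# Assumed standard-atmosphere lapse rate; the actual value used by
# Deep Thunder is not specified in the text.
STD_LAPSE_RATE = 0.0065  # K per metre


def bilinear(field, x, y):
    """Bilinearly interpolate a 2-D model field at fractional grid indices (x, y)."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - x0, y - y0
    return ((1 - dx) * (1 - dy) * field[y0, x0]
            + dx * (1 - dy) * field[y0, x0 + 1]
            + (1 - dx) * dy * field[y0 + 1, x0]
            + dx * dy * field[y0 + 1, x0 + 1])


def correct_temperature(t_model, z_model, z_station, lapse=STD_LAPSE_RATE):
    """Adjust a model temperature from the model terrain height to the
    actual station elevation using a constant lapse rate."""
    return t_model + lapse * (z_model - z_station)


def qc_in_range(obs, lo, hi):
    """Simple quality control: keep only observations within plausible bounds."""
    return obs[(obs >= lo) & (obs <= hi)]
```

A station falling between four grid points would first be located in fractional grid coordinates, interpolated with `bilinear`, and then adjusted with `correct_temperature` before any statistics are computed.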
In addition, this approach to verification is better suited to synoptic-scale or even global-scale models and to
traditional forecast analysis. For example, small errors in the phase, timing or
location of weather "events" in the model forecast (i.e., high-amplitude "features") can often be
manifested as significant error in the model results when using these techniques, even when the model
provides good skill at forecasting events realistically.
This situation is exacerbated by the limited sampling of data from metar sites in space (55 within the 4km nest and
only 9 within the 1km nest) and time (roughly once per hour).
To address some of these limitations, a small "mesonet" (network of weather stations) is being installed at a number
of locations within the 4km and 1km nests to enable nearly continuous observations of local weather.
Some of these issues are discussed further in papers that outline on-going efforts to evaluate and verify the
forecasts. The first paper focuses on specific
events and long-term performance. Another paper describes the performance of snowstorm forecasts during the 2002-2003 winter.
Verification of the Most Recent Temperature and Dew Point Forecast
The first example shows temperature and dew point results only for those model results and
observations that are within the 4km and 1km nests of the Deep Thunder forecasts.
Each curve represents one of the
variables from either Deep Thunder or NAM, each in a different color.
Each curve is plotted as a function of forecast
time along the x-axis with two statistics being shown simultaneously.
The 24-hour model forecast is compared against the most recently available observations.
Typically, the current contents will reflect an evaluation of a model completed about a day ago.
The y-axis is bias
while the z-axis is root mean square error (both in kelvin or degrees C.). Hence, a negative
(or positive) bias for temperature implies that the model is too cool (or too warm). A negative
(or positive) bias for dew point implies that the model is too dry (or too moist). Root mean
square error is a common metric for forecast accuracy. Thus, better results are toward the bottom of the
z-axis (i.e., 0), implying closer correspondence to the observations. The combination of these metrics enables
one to see correlation between bias and accuracy as a function of forecast time.
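For concreteness, the two statistics plotted here can be sketched as follows. This is a minimal, hypothetical illustration of the sign convention described above, not the actual verification code.

```python
import numpy as np


def bias(forecast, observed):
    """Mean forecast-minus-observation error.
    Negative for temperature means the model is too cool."""
    return np.mean(forecast - observed)


def rmse(forecast, observed):
    """Root mean square error; 0 means perfect correspondence."""
    return np.sqrt(np.mean((forecast - observed) ** 2))


# Illustrative data: a forecast that runs consistently 1 K too cold.
t_fcst = np.array([270.0, 272.0, 275.0])
t_obs = np.array([271.0, 273.0, 276.0])
# bias(t_fcst, t_obs) -> -1.0 (model too cool)
# rmse(t_fcst, t_obs) ->  1.0
```

In the visualizations, these quantities are computed separately for each forecast hour, which is what allows bias and accuracy to be examined as a function of forecast time.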
Verification of the Most Recent Wind Speed Forecast
The second example shows wind speed results only for those model results and
observations that are within the 4km and 1km nests of the Deep Thunder forecasts.
One curve represents the Deep Thunder results while the other is for NAM, each in a different color.
Each curve is plotted as a function of forecast
time along the x-axis with two statistics being shown simultaneously.
The 24-hour model forecast is compared against the most recently available observations.
Typically, the current contents will reflect an evaluation of a model completed about a day ago.
The y-axis is bias
while the z-axis is root mean square error (in knots). Hence, a negative
(or positive) bias implies that the model is too slow (or too fast). Root mean
square error is a common metric for forecast accuracy. Thus, better results are toward the bottom of the
z-axis (i.e., 0), implying closer correspondence to the observations. The combination of these metrics enables
one to see correlation between bias and accuracy as a function of forecast time.
Instructions
For each of the three-dimensional plots above (and below), you can interact
with them by clicking and dragging your mouse inside the image
in a limited fashion.
If you are having problems viewing or interacting with this
animation, make sure your browser has Javascript enabled.
Typical Deep Thunder operations produce forecasts only twice
per day (e.g., 0Z and 12Z), while NAM results are received and
processed four times per day (0Z, 6Z, 12Z and 18Z). However, the former may be
generated more often or at other times. Hence, there may
be times when only NAM results are presented in these visualizations.
Verification of the Past Week of Temperature and Dew Point Forecasts
The first example shows temperature and dew point results only for those model results and
observations that are within the 4km and 1km nests of the Deep Thunder forecasts.
Each curve represents one of the
variables from either Deep Thunder or NAM, each in a different color.
Each curve is plotted as a function of forecast
time along the x-axis with two statistics being shown simultaneously.
All of the model results generated in the last week are compared against the appropriate observations.
The y-axis is bias
while the z-axis is root mean square error (both in kelvin or degrees C.). Hence, a negative
(or positive) bias for temperature implies that the model is too cool (or too warm). A negative
(or positive) bias for dew point implies that the model is too dry (or too moist). Root mean
square error is a common metric for forecast accuracy. Thus, better results are toward the bottom of the
z-axis (i.e., 0), implying closer correspondence to the observations. The combination of these metrics enables
one to see correlation between bias and accuracy as a function of forecast time.
Verification of the Past Week of Wind Speed Forecasts
The second example shows wind speed results only for those model results and
observations that are within the 4km and 1km nests of the Deep Thunder forecasts.
One curve represents the Deep Thunder results while the other is for NAM, each in a different color.
Each curve is plotted as a function of forecast
time along the x-axis with two statistics being shown simultaneously.
All of the model results generated in the last week are compared against the appropriate observations.
The y-axis is bias
while the z-axis is root mean square error (in knots). Hence, a negative
(or positive) bias implies that the model is too slow (or too fast). Root mean
square error is a common metric for forecast accuracy. Thus, better results are toward the bottom of the
z-axis (i.e., 0), implying closer correspondence to the observations. The combination of these metrics enables
one to see correlation between bias and accuracy as a function of forecast time.
Additional Instructions
If the forecast information presented on this page does not seem
to be current and you have visited this site recently, the results of the
previous visit may have been saved in your web browser's cache. If so,
you should change your cache settings (e.g., File->Preferences->Advanced->Cache
in Netscape and set the document comparison to "Every time"). When you
restart your browser, the problem should be solved. For your current session,
you should manually clear the cache and reload the page.
Currently, only visualizations showing statistics accumulated over time are shown.
They will be augmented with similar results accumulated geographically.
Later, verification results for precipitation will be shown.