NWS Boosts Big Data Systems to Improve Forecasts

The U.S. National Weather Service’s Climate Prediction Center says this year's Atlantic hurricane season will be “near-normal,” with four to eight hurricanes, of which one to four might turn into Category 3 or higher storms.

Granted, it's hard to anticipate the whims of Mother Nature, but the agency has been monitoring the skies for some time, processing vast information feeds with massive computing resources. And earlier this year, the agency enhanced its systems to crunch even larger volumes of data and improve its forecasts.

What is now known as the National Weather Service began forecasting almost 150 years ago. The NWS, which is part of the National Oceanic and Atmospheric Administration (NOAA), employs 5,000 people and operates 122 weather forecast offices, nine national centers, and other support offices.

Each year, it issues about 1.5 million forecasts and 50,000 storm warnings based on about 76 billion observations. While the NWS says it's difficult to measure prediction accuracy across the board—considering that the NWS issues forecasts for temperature, precipitation, severe weather, wind speed and direction, wave height, and more—the organization says that in 2015, its seven-day temperature forecasts had the same level of accuracy that five-day forecasts had 15 years before.

Always looking to improve, NOAA announced in January that the NWS had boosted the capacity of its Weather and Climate Operational Supercomputing System.

Consisting of two Cray supercomputers and two IBM machines (based on Intel Sandy Bridge and Ivy Bridge chips), the enhanced system runs with a total of 5.78 petaflops of operational computing capacity, or 2.89 petaflops for each Cray machine: Luna, in Reston, Va., and Surge, in Orlando, Fla. The upgrade tripled the computing power available for forecast generation, enabling more accurate forecasting for smaller regions and giving developers and data scientists the headroom to create better applications and data models.

According to Rebecca Cosgrove, who oversees the supercomputer system, the $47.6 million upgrade was driven both by an increase in the amount of data the NWS processes and by the need to do more with it.

“There are ever-increasing amounts of data in the environmental sciences as we bring on more satellites, newer satellite data,” Cosgrove said, citing a new global ocean-observing satellite, the Jason-3. “Then the other thing is, we need the compute power to model the weather at the scales that the weather happens. That helps us model more and more types of weather phenomena on smaller scales.”

Cosgrove is chief of the Implementation and Data Services Branch of Central Operations at the National Centers for Environmental Prediction (NCEP), a unit of the NWS. She says that successive upgrades have steadily improved the NWS's ability to predict weather, both with greater precision and further into the future.

NCEP data scientists currently use 23 models to predict the weather, ranging from specialized hurricane models to the U.S. Global Forecast System (GFS), the service's primary model and its only global one.

Upgrades to the GFS lead to improvements in all the other forecast models. A major upgrade to the GFS in 2015 improved its resolution from 27 kilometers to 13, and other, more specialized models can get down to 3 km and, in one case, 1.33 km. A further improvement last month shifted the GFS from 3D to 4D modeling, adding the dimension of time to capture how weather patterns evolve, leading to more accurate and timely predictions.
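
To see why each resolution upgrade demands so much more computing power, a rough back-of-envelope calculation helps. The sketch below is illustrative only: the grid geometry and cost scaling are simplified assumptions, and only the 27 km and 13 km figures come from the article.

```python
# Rough back-of-envelope: how grid spacing drives compute cost.
# Only the 27 km and 13 km resolutions are from the article; the
# geometry and scaling here are simplified assumptions.

EARTH_CIRCUMFERENCE_KM = 40_000  # approximate

def horizontal_points(resolution_km: float) -> float:
    """Approximate horizontal grid points on a global lat/lon grid."""
    lon_points = EARTH_CIRCUMFERENCE_KM / resolution_km
    lat_points = lon_points / 2  # pole to pole spans half the circumference
    return lon_points * lat_points

old = horizontal_points(27)  # pre-2015 GFS resolution
new = horizontal_points(13)  # post-upgrade resolution
print(f"horizontal points: {old:.2e} -> {new:.2e} ({new / old:.1f}x)")

# Finer grids also force shorter time steps, so total cost grows
# roughly with the cube of the refinement factor:
print(f"approximate cost ratio including time steps: {(27 / 13) ** 3:.1f}x")
```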

The NWS systems can hold 8 petabytes of data per machine. Every day, the NWS takes in 500 gigabytes of environmental data. Its various models produce 5 terabytes of data per day, which is uploaded to a server and released to the public.
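
Taken at face value, those figures leave a wide margin. Here is a quick sketch of the arithmetic, assuming decimal units and steady daily rates (the article does not state actual retention policies):

```python
# Illustrative arithmetic on the storage figures quoted above;
# decimal units and steady daily rates are assumptions.
capacity_tb = 8_000       # 8 petabytes per machine
daily_output_tb = 5       # model output produced per day
daily_ingest_tb = 0.5     # environmental data taken in per day

days_to_fill = capacity_tb / (daily_output_tb + daily_ingest_tb)
print(f"~{days_to_fill:,.0f} days (~{days_to_fill / 365:.1f} years) "
      "to fill 8 PB at those rates")
```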

Luna and Surge each perform nearly 3 quadrillion calculations per second. While the two separately located supercomputers are intended to provide redundancy in case of catastrophe, Cosgrove says it's not a case of one machine sitting idle, waiting for a crisis.

“We do use all 5.78 petaflops,” Cosgrove says.

At any time, one location is being used as the production system, running all of the operational weather models, and disseminating the data.

The other system is used for development, and she says the developers make full use of its capacity. “They would probably use twice as much if I would give it to them.”

She regularly switches production between Luna and Surge, which makes quarterly upgrades and other maintenance easy to schedule and keeps any emergency switchover well-practiced. The two systems are identically configured, with the same data, so neither is preferred as the production system.

The Data Flow

The supercomputers model environmental data, including measurements from satellites, weather balloons, and oceanographic observations. The data comes from many sources around the globe, generally in the handful of formats approved by the World Meteorological Organization. After ingestion, the data is converted to a smaller pool of standard formats and stored in one of 13 databases from which scientists can populate their weather models. Further, Cosgrove notes, output from running a model is put into the common directory structure so that output from one model can be used as input for another.
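
In spirit, that flow resembles the sketch below: decode incoming files, normalize them, and write them into a common directory tree that any model or post-processor can read. Every function, path, and decoder here is hypothetical; only the format names are real standards.

```python
# Hypothetical sketch of the ingest-normalize-store flow described
# above; none of these names or paths are actual NWS code.
from pathlib import Path

DECODERS = {
    ".grib2": lambda raw: {"format": "GRIB2", "payload": raw},
    ".bufr":  lambda raw: {"format": "BUFR",  "payload": raw},
    ".nc":    lambda raw: {"format": "NetCDF", "payload": raw},
}

COMMON_DIR = Path("/data/common")  # shared tree the models read from

def ingest(path: Path) -> Path:
    """Decode an incoming file and store it under the common directory,
    where any model (or another model's post-processor) can find it."""
    decoder = DECODERS.get(path.suffix)
    if decoder is None:
        raise ValueError(f"unsupported format: {path.suffix}")
    record = decoder(path.read_bytes())
    out = COMMON_DIR / record["format"].lower() / path.name
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_bytes(record["payload"])
    return out

# One model's output, written back through ingest(), becomes another
# model's input, e.g. ingest(Path("gfs_forecast_006.grib2")).
```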

“It’s the idea of taking all this data and putting it in common areas, in similar formats, so that other applications can use them,” she says.

Models, developed either in-house or by the global meteorological community, are often written in Fortran, “because that language is designed to do mathematical calculations, and that's the basis of the weather models,” she says. “We also have codes written in other languages, such as C, and a few different scripting languages, such as Perl and Python.”

Perl and Python are used for the scripts that run the models. And because the models comprise code sourced from the global meteorological community, their many contributors sometimes write in different languages.
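
A minimal sketch of that driver-script role, with Python staging inputs and launching a compiled Fortran model, appears below. The executable name, flags, and paths are invented for illustration.

```python
# Hypothetical driver script: stage inputs, then run a compiled
# Fortran model executable. Names and paths are invented.
import subprocess
from pathlib import Path

def run_model(cycle: str, workdir: Path) -> None:
    workdir.mkdir(parents=True, exist_ok=True)
    # Stage the initial conditions the model expects.
    init = Path("/data/common/grib2") / f"analysis_{cycle}.grib2"
    (workdir / "input.grib2").write_bytes(init.read_bytes())
    # Launch the Fortran executable; raise if it exits nonzero.
    subprocess.run(
        ["./gfs_model.x", "--cycle", cycle, "--input", "input.grib2"],
        cwd=workdir,
        check=True,
    )

run_model("2016062500", Path("/scratch/gfs/2016062500"))
```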

The data is crunched on the Cray supercomputers or IBM components of the system, using the native software of each. “On the Crays, we’ll use their version of how you do parallel processing, and on the IBM side we’ll use their version,” she says. “We use a lot of the Intel Compiler Suite, things like that, but most of the applications I’m running are written by either our developers or the atmospheric science and oceanographic communities.”
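
The article doesn't show that parallel code, and NCEP's production versions are Fortran built with each vendor's tooling. As a stand-in, here is a minimal sketch of the underlying pattern, splitting a global grid across processors and combining a result, using the mpi4py Python bindings.

```python
# Minimal domain-decomposition sketch with mpi4py (a stand-in; the
# real NCEP codes are Fortran with vendor MPI implementations).
# Run with, e.g.: mpiexec -n 4 python this_script.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Each rank owns a band of latitudes from a hypothetical global grid.
lat_rows, lon_cols = 1536 // size, 3072
local_band = np.random.rand(lat_rows, lon_cols)  # stand-in for real fields

# Combine per-rank partial sums into a global mean, visible on every rank.
global_sum = comm.allreduce(local_band.sum(), op=MPI.SUM)
global_mean = global_sum / (lat_rows * size * lon_cols)

if rank == 0:
    print(f"global mean across {size} ranks: {global_mean:.4f}")
```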

From Data to Diagrams

The actual outputs of NCEP's supercomputers are not the technicolor weather maps on your nightly news. The data is raw numbers in the core formats for meteorological data (GRIB, BUFR and NetCDF).
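
As an example of consuming one of those formats, a user might open a NetCDF file with the netCDF4 Python library, as sketched below. The file and variable names are hypothetical; real products define their own.

```python
# Reading hypothetical gridded output from a NetCDF file with the
# netCDF4 library; variable names vary by product.
from netCDF4 import Dataset

with Dataset("gfs_forecast.nc") as ds:
    temps = ds.variables["air_temperature"][:]  # raw gridded values
    lats = ds.variables["latitude"][:]
    lons = ds.variables["longitude"][:]
    print(f"grid {temps.shape}, range {temps.min():.1f}-{temps.max():.1f} K")
```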

“We do put out some amount of images … of the model output, but we don’t get a lot into developing custom output products,” she says, noting that there is a very diverse user community, from operational forecasters to researchers, all using the data in different ways. “We want to put out what the models put out, and then let our users tailor that data as they see fit.”
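
What that downstream tailoring might look like, in a minimal sketch: turning a gridded field into a styled contour map with matplotlib. The data below is synthetic, standing in for real model output.

```python
# Synthetic stand-in for model output, styled by the downstream user.
import numpy as np
import matplotlib.pyplot as plt

lons = np.linspace(-180, 180, 144)
lats = np.linspace(-90, 90, 73)
# Simple zonally symmetric temperature field, warm at the equator.
temps = 288 + 15 * np.cos(np.radians(lats))[:, None] * np.ones((73, 144))

plt.contourf(lons, lats, temps, levels=20, cmap="coolwarm")
plt.colorbar(label="temperature (K)")
plt.title("Synthetic gridded field, styled by the user")
plt.savefig("tailored_map.png")
```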

Forecasters in the National Weather Service use a homegrown system called AWIPS (Advanced Weather Interactive Processing System) to ingest, integrate, and display data, and to produce forecasts and visualizations. NCEP delivers its model data to AWIPS primarily via satellite broadcast; the data is then displayed for forecasters, who can manipulate it further, apply local expertise, and produce an official forecast.

“We’re kind of the background guidance for them, and then they add their value to it on top of it, and then distribute that to the public as our official forecast,” she says. “Every forecast office across the country has that same toolkit in AWIPS to process all of the data they get—not just my models but the satellite data, the radar data, all of the information about the atmosphere and the environment.”

Cosgrove's staff includes 10 to 15 people in two teams: one integrates the models, and the other handles the challenges of ingesting, managing, and providing the data to users. An IBM contract supplies about 14 people to maintain the computers day to day, and roughly as many in-house staff monitor and support the supercomputers and the models.

NCEP has a center devoted to creating and refining the computational models, staffed with nearly 200 people. The worldwide meteorological community also contributes to the creation of models.

Models run on set schedules, the most frequent running hourly. In the case of a tropical system, the relevant models run four times a day. There are also models for the dispersion of gases or liquids that would be run in the event of a volcanic eruption or chemical spill. “Those we do like on demand right away when it happens, as soon as we're asked,” she says.
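
That cadence can be summarized as configuration data; the model names and frequencies below are illustrative, not NCEP's actual schedule.

```python
# Illustrative run cadences, not the real NCEP schedule.
SCHEDULE = {
    "rapid-refresh":   "hourly",
    "global-forecast": "every 6 hours (four cycles per day)",
    "hurricane":       "four times a day while a tropical system is active",
    "dispersion":      "on demand (volcanic eruption, chemical spill)",
}

for model, cadence in SCHEDULE.items():
    print(f"{model:>16}: {cadence}")
```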

And, of course, as the Atlantic hurricane season, which officially ends November 30, wears on, the National Weather Service reacts to specific events, launching additional weather balloons and dispatching Hurricane Hunters, the specially equipped planes that fly into storms. All of this provides more data for the hurricane models, which are generally updated just before the season starts. This year's model was being tested in late May, ahead of the official start of hurricane season in June.

“We’re always improving the physics and improving features of the model,” she says. And as the first few storm systems emerge, there's a lot of interest from the modeling team. “When a new upgrade of the Global Model goes in, people usually take note of the first few big storms to watch the performance.”

It's still early in the Atlantic hurricane season, but as unpredictable as nature can be, the NWS has put more computational muscle than ever into staying a step or two ahead.
