Combine many netCDF files into a single file with Python

Neeti Nayak
4 min readJun 1, 2020

--

NetCDF data come in many shapes. I was recently working with precipitation data downloaded from the NOAA Physical Science Laboratory Server. Unfortunately, the data were all in different years, and I couldn’t get a combined file to download for all the data.

Data/image provided by the NOAA ESRL Physical Sciences Laboratory, Boulder, Colorado, USA, from their website at http://psl.noaa.gov/

It seemed to be a simple problem, and I know there are many netCDF libraries that can handle such file merges without ripping the file open for its dimensions and variables.

After perusing many fora, I was able to piece together a solution to do the easiest job of taking different netCDF files that need to be combined over a dimension, in my case, it was time.

So what’re we trying to do again? We’re trying to go from disjointed netCDF files to a single netCDF.

  1. The simplest solution is to install the xarray library for Python. I use Jupyter Notebooks with Anaconda, so all I did was go to Anacoda UI, go to Environments, search in ‘All’, and then search for xarray. Once you get the right option, select it and click ‘Apply’ on the button right corner and let it finish installing (Click ok for any subsequent dialogues). There are other install options available on the xarray webpage.
steps to find xarray in Anaconda

2. Create a folder with all the data that needs to be combined. This will also be the folder where the Python script will reside. I had a folder called ‘precip’ with the hourly precipitation data mentioned above. As in the screenshot, the data was names in this format — ‘precip.hour.yyyy.nc’, yyyy being the year of observation recording.

3. Open Jupyter notebook, navigate to the created folder and create a new Python 3 notebook.

Steps to create a new Python3 notebook in the right folder

4. And now for the code. First, I imported all the libraries I care about. (I don’t need all these, but always import the first two out of force of habit, which I keep here for the sake of, well, authenticity.)

import netCDF4
import numpy
import xarray

5. This is the main line of code. I tell the script to open any netCDF file in my folder that begins with ‘precip.hour’, the * representing that whatever comes there is a don’t-care string. The way we want to combine the data is using the time coordinate dimension, which is the dimension that defines the relations of the data and order with respect to one another. And so we say combine = ‘by_coords’ and the concat_dim = ‘time’. (xarray provides more option to concatenate data with more complex needs like merging along two dimensions. For more, see here)

ds = xarray.open_mfdataset('precip.hour.*.nc',combine = 'by_coords', concat_dim="time")

6. I want to export this data into a combined netCDF, so that’s next.

ds.to_netcdf('precip_combined.nc')

7. And now I can check this data, its dimensions using ncdump, or plot it as a Spacetime Cube.

Here’s the ncdump

netcdf precip_combined.nc {
dimensions:
lon = 33;
lat = 21;
time = 309024;
variables:
float precip(time=309024, lat=21, lon=33);
:_FillValue = NaNf; // float
:least_significant_digit = 2S; // short
:long_name = "Hourly Accumulated Precipitation";
:valid_range = 0.0f, 10.0f; // float
:units = "in";
:precision = 2S; // short
:var_desc = "Precipitation";
:dataset = "CPC 2x2.5 Hourly US Precipitation";
:level_desc = "Surface";
:statistic = "Hourly Accumulation";
:parent_stat = "Observation";
:add_offset = 0.0f; // float
:scale_factor = 1.0f; // float
:missing_value = -9.96921E36f; // float
float lon(lon=33);
:_FillValue = NaNf; // float
:units = "degrees_east";
:long_name = "Longitude";
:actual_range = 220.0f, 300.0f; // float
:standard_name = "longitude";
:axis = "X";
:_CoordinateAxisType = "Lon";
float lat(lat=21);
:_FillValue = NaNf; // float
:units = "degrees_north";
:long_name = "Latitude";
:actual_range = 20.0f, 60.0f; // float
:standard_name = "latitude";
:axis = "Y";
:_CoordinateAxisType = "Lat";
double time(time=309024);
:_FillValue = NaN; // double
:long_name = "Time";
:delta_t = "0000-00-00 01:00:00";
:avg_period = "0000-00-00 01:00:00";
:verification = "accumulation from ob time to one hour later";
:standard_name = "time";
:axis = "T";
:units = "seconds since 1948-07-01 00:00:00";
:calendar = "proleptic_gregorian";
:_CoordinateAxisType = "Time";
// global attributes:
:title = "CPC 2x2.5 Hourly US Precipitation";
:Conventions = "CF-1.2";
:history = "created 05/20/2004 by CAS from data obtained from NCEP";
:description = "Gridded hourly Precipitation";
:platform = "Observations";
:documentation = "https://www.esrl.noaa.gov/psd/data/gridded/data.cpc_hour.html";
:Source = "http://www.cpc.ncep.noaa.gov/products/precip/realtime/tmin_station/retro.html";
:References = "https://www.esrl.noaa.gov/psd/data/gridded/data.cpc_hour.html";
:dataset_title = "CPC Hourly Precipitation for the United States";
:_CoordSysBuilder = "ucar.nc2.internal.dataset.conv.CF1Convention";
}

--

--

Neeti Nayak
Neeti Nayak

Written by Neeti Nayak

writes about the politics of tech, design and engineering. Wants chai all the time.

Responses (3)