Combine many netCDF files into a single file with Python
NetCDF data come in many shapes. I was recently working with precipitation data downloaded from the NOAA Physical Science Laboratory Server. Unfortunately, the data were all in different years, and I couldn’t get a combined file to download for all the data.
It seemed to be a simple problem, and I know there are many netCDF libraries that can handle such file merges without ripping the file open for its dimensions and variables.
After perusing many fora, I was able to piece together a solution to do the easiest job of taking different netCDF files that need to be combined over a dimension, in my case, it was time.
So what’re we trying to do again? We’re trying to go from disjointed netCDF files to a single netCDF.
- The simplest solution is to install the xarray library for Python. I use Jupyter Notebooks with Anaconda, so all I did was go to Anacoda UI, go to Environments, search in ‘All’, and then search for xarray. Once you get the right option, select it and click ‘Apply’ on the button right corner and let it finish installing (Click ok for any subsequent dialogues). There are other install options available on the xarray webpage.
2. Create a folder with all the data that needs to be combined. This will also be the folder where the Python script will reside. I had a folder called ‘precip’ with the hourly precipitation data mentioned above. As in the screenshot, the data was names in this format — ‘precip.hour.yyyy.nc’, yyyy being the year of observation recording.
3. Open Jupyter notebook, navigate to the created folder and create a new Python 3 notebook.
4. And now for the code. First, I imported all the libraries I care about. (I don’t need all these, but always import the first two out of force of habit, which I keep here for the sake of, well, authenticity.)
import netCDF4
import numpy
import xarray
5. This is the main line of code. I tell the script to open any netCDF file in my folder that begins with ‘precip.hour’, the * representing that whatever comes there is a don’t-care string. The way we want to combine the data is using the time coordinate dimension, which is the dimension that defines the relations of the data and order with respect to one another. And so we say combine = ‘by_coords’ and the concat_dim = ‘time’. (xarray provides more option to concatenate data with more complex needs like merging along two dimensions. For more, see here)
ds = xarray.open_mfdataset('precip.hour.*.nc',combine = 'by_coords', concat_dim="time")
6. I want to export this data into a combined netCDF, so that’s next.
ds.to_netcdf('precip_combined.nc')
7. And now I can check this data, its dimensions using ncdump, or plot it as a Spacetime Cube.
Here’s the ncdump
netcdf precip_combined.nc {
dimensions:
lon = 33;
lat = 21;
time = 309024;
variables:
float precip(time=309024, lat=21, lon=33);
:_FillValue = NaNf; // float
:least_significant_digit = 2S; // short
:long_name = "Hourly Accumulated Precipitation";
:valid_range = 0.0f, 10.0f; // float
:units = "in";
:precision = 2S; // short
:var_desc = "Precipitation";
:dataset = "CPC 2x2.5 Hourly US Precipitation";
:level_desc = "Surface";
:statistic = "Hourly Accumulation";
:parent_stat = "Observation";
:add_offset = 0.0f; // float
:scale_factor = 1.0f; // float
:missing_value = -9.96921E36f; // floatfloat lon(lon=33);
:_FillValue = NaNf; // float
:units = "degrees_east";
:long_name = "Longitude";
:actual_range = 220.0f, 300.0f; // float
:standard_name = "longitude";
:axis = "X";
:_CoordinateAxisType = "Lon";float lat(lat=21);
:_FillValue = NaNf; // float
:units = "degrees_north";
:long_name = "Latitude";
:actual_range = 20.0f, 60.0f; // float
:standard_name = "latitude";
:axis = "Y";
:_CoordinateAxisType = "Lat";double time(time=309024);
:_FillValue = NaN; // double
:long_name = "Time";
:delta_t = "0000-00-00 01:00:00";
:avg_period = "0000-00-00 01:00:00";
:verification = "accumulation from ob time to one hour later";
:standard_name = "time";
:axis = "T";
:units = "seconds since 1948-07-01 00:00:00";
:calendar = "proleptic_gregorian";
:_CoordinateAxisType = "Time";// global attributes:
:title = "CPC 2x2.5 Hourly US Precipitation";
:Conventions = "CF-1.2";
:history = "created 05/20/2004 by CAS from data obtained from NCEP";
:description = "Gridded hourly Precipitation";
:platform = "Observations";
:documentation = "https://www.esrl.noaa.gov/psd/data/gridded/data.cpc_hour.html";
:Source = "http://www.cpc.ncep.noaa.gov/products/precip/realtime/tmin_station/retro.html";
:References = "https://www.esrl.noaa.gov/psd/data/gridded/data.cpc_hour.html";
:dataset_title = "CPC Hourly Precipitation for the United States";
:_CoordSysBuilder = "ucar.nc2.internal.dataset.conv.CF1Convention";
}