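We will use xarray (a Python package) to read data stored in the Network Common Data Format (netCDF).
The example file is netCDF output from the CMIP6 model CESM2 (the CESM2-WACCM configuration, according to the filename) for the future scenario SSP2-4.5.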
# import necessary libraries (install them first if needed, e.g. with conda from the terminal)
import xarray as xr
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# open netCDF file using xarray
data = xr.open_dataset('Downloads/tas_day_CESM2-WACCM_ssp245_r1i1p1f1_gn_20150101-20241231.nc')
To view the contents of the data, which is stored as an xarray.Dataset object, you can use the .head() method or the print() function.
# viewing the data
data.head()
print(data)
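Readers coming from pandas may expect .head() to return the first few rows. On an xarray.Dataset it instead keeps the first values along each dimension (five by default). The sketch below shows this on a small synthetic dataset; the variable name `tas` and the array sizes are placeholders chosen for illustration, not taken from the file.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the real file (names and sizes are assumptions)
demo = xr.Dataset({"tas": (("time", "lat"), np.zeros((10, 8)))})

# With no arguments, head() keeps the first 5 values along every dimension
first = demo.head()
print(dict(first.sizes))

# With a keyword, only the named dimension is trimmed
first_two = demo.head(time=2)
print(dict(first_two.sizes))
```

On a large file this is a quick way to inspect structure without printing the full arrays.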
NetCDF: Understanding Dimensions and Variables
Dimensions define the sizes of the data arrays along specific axes in the file, providing shape and size information for variables.
- Typically used to define coordinate variables. Ex: a dimension ‘time’ with size 365 means there are 365 data values along the time axis in the corresponding variable
- Usefulness of dimensions:
- Data Extraction: knowing the structure of the data allows you to extract specific subsets of the data (such as data within a certain time frame, region, or depth level, which are determined by the dimensions)
- Data Analysis and Visualization: dimensions provide context for data analysis and for the creation of meaningful plots, maps, and other visuals
- Metadata Interpretation: the metadata associated with each dimension (units, description, etc.) helps you interpret and use the data correctly
- Data Integration: understanding dimensions is critical when integrating/merging data from different files or sources
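The uses listed above can be sketched with a small synthetic dataset standing in for a real file. The variable name `tas`, the grid values, and the dates are made up for illustration; only the dimension names time, lat, and lon follow the text.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a netCDF file: 365 daily values on a 3 x 4 grid.
# All names and values here are illustrative assumptions.
time = pd.date_range("2015-01-01", periods=365, freq="D")
lat = np.array([-45.0, 0.0, 45.0])
lon = np.array([0.0, 90.0, 180.0, 270.0])
values = np.zeros((time.size, lat.size, lon.size))

demo = xr.Dataset(
    {"tas": (("time", "lat", "lon"), values)},
    coords={"time": time, "lat": lat, "lon": lon},
)

# Dimension names and their sizes
sizes = dict(demo.sizes)
print(sizes)

# Data extraction: select a time window and a latitude band by coordinate value
january = demo.sel(time=slice("2015-01-01", "2015-01-31"))
tropics = demo.sel(lat=slice(-10, 10))
print(january.sizes["time"], tropics.sizes["lat"])
```

Selecting by coordinate value with .sel() only works once you know which dimensions exist and what their coordinates look like, which is why inspecting the dimensions comes first.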
Variables hold the actual data values as multidimensional arrays associated with dimensions.
- They are defined on one or more dimensions, which determine the size of the variable along each axis.
- Ex: a variable ‘temperature’ with dimensions time, lat, and lon
temperature(time, lat, lon)
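As a minimal sketch of the temperature(time, lat, lon) example, the snippet below builds a hypothetical variable in memory and shows how its dimensions determine its shape. The sizes, the constant value, and the units attribute are placeholders, not values from the file.

```python
import numpy as np
import xarray as xr

# A hypothetical 'temperature' variable with dimensions (time, lat, lon).
# Sizes and values are placeholders chosen for illustration.
temperature = xr.DataArray(
    np.full((365, 2, 3), 288.0),
    dims=("time", "lat", "lon"),
    name="temperature",
    attrs={"units": "K"},
)

print(temperature.dims)   # names of the dimensions, in order
print(temperature.shape)  # size along each dimension, in the same order

# Picking a single time step leaves a two-dimensional (lat, lon) slice
first_day = temperature.isel(time=0)
print(first_day.shape)
```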
Viewing what variables are stored in the data file
type(data)

# call for the variables stored in the file
print(data.variables.keys())
We can also explore variables individually.
This shows the units of each variable, as well as other useful information such as the standard name.
# explore each of the variables individually
lon = data.variables['lon']
print(lon)  # there are 288 longitude values

lat = data.variables['lat']
print(lat)  # there are 192 latitude values

time = data.variables['time']
print(time)  # we can see the first date and the last date that are included
# notice that the 'time' values have dtype object here
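Printing a variable shows its metadata, and the same information can be read programmatically from the variable's .attrs dictionary. The sketch below uses a synthetic longitude coordinate. The attribute values and the 1.25-degree spacing are assumptions chosen to give 288 points, and may differ from what the real file contains.

```python
import numpy as np
import xarray as xr

# Stand-in for the file's 'lon' coordinate; spacing and attributes are assumed
lon_values = np.arange(0.0, 360.0, 1.25)  # 288 evenly spaced longitudes
lon_var = xr.Variable(
    "lon",
    lon_values,
    attrs={"units": "degrees_east", "standard_name": "longitude"},
)
demo = xr.Dataset(coords={"lon": lon_var})

lon = demo["lon"]
print(lon.size)                        # number of longitude values
print(lon.attrs.get("units"))          # units attribute, if present
print(lon.attrs.get("standard_name"))  # standard name attribute, if present
```

Using .attrs.get() rather than indexing avoids a KeyError when a file omits one of these attributes.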