Decoding Climate Data Filenames

If you’ve ever spent any time looking at the filenames given to climate model output, you’ll quickly realize that they’re pretty hard to understand! This is one of the most common sources of confusion for beginners (and honestly, sometimes for experienced climate model users). BUT, there is usually a method to the madness, and once you’ve gotten used to that, it becomes much easier to figure things out!

(**disclaimer: although we’ll present a few examples here, there can be DIFFERENT methods to the different madnesses of individual modeling centers. So you may need to do a bit of your own research for more specific applications – but the general principles laid out here usually still apply.**)

CMIP Filename Conventions

We’ll start with the naming conventions used for files submitted to the Coupled Model Intercomparison Project (CMIP) archive (see also the CMIP Walkthrough Tutorial if you want more hands-on experience). These follow a specific set of rules agreed on by the participating modeling centers. The rules are designed to make sure that all the models use a standard set of variable names, use the same units, refer to those variables the same way, and so on. This convention is colloquially called CMOR, after the Climate Model Output Rewriter: the software library that modeling centers use to convert their simulation output to those conventions. More information on CMOR can be found on the CMOR website.

You can also find a more detailed description (with lots of jargon!) of the precise definitions of various parts of filenames in the CMIP6 Data Reference Syntax (DRS) page.

To make it easier to follow, let’s start with the file we downloaded in the CMIP Walkthrough Tutorial. This file’s name is tas_Amon_CanESM5_historical_r10i1p1f1_gn_185001-201412.nc. We’ll break this down into pieces:
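Because the pieces are separated by underscores and always appear in the same order, they are easy to pull apart programmatically. The sketch below assumes the CMIP6 DRS filename template `<variable_id>_<table_id>_<source_id>_<experiment_id>_<member_id>_<grid_label>[_<time_range>].nc`, where the time range is left off for time-invariant (fixed) fields. The function name `parse_cmip6_filename` is our own illustration, not part of any library. CMIP6 model names use hyphens rather than underscores (e.g. `MPI-ESM1-2-HR`), so splitting on underscores works for standard CMIP6 filenames.

```python
from pathlib import Path

# Field order assumed from the CMIP6 DRS filename template;
# the final time range is optional (absent for fixed fields).
FIELDS = ["variable_id", "table_id", "source_id", "experiment_id",
          "member_id", "grid_label", "time_range"]


def parse_cmip6_filename(filename: str) -> dict:
    """Split a CMIP6-style filename into its named components (hypothetical helper)."""
    stem = Path(filename).name  # drop any leading directories
    if stem.endswith(".nc"):
        stem = stem[: -len(".nc")]
    parts = stem.split("_")
    if len(parts) not in (6, 7):
        raise ValueError(f"unexpected number of fields in {filename!r}: {len(parts)}")
    # zip stops at the shorter list, so a missing time range is simply omitted
    return dict(zip(FIELDS, parts))


info = parse_cmip6_filename("tas_Amon_CanESM5_historical_r10i1p1f1_gn_185001-201412.nc")
for key, value in info.items():
    print(f"{key:>13}: {value}")
```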

Variable name, model component, and temporal resolution

This is a near-surface air temperature dataset, so its CMOR variable name is “tas”. It belongs to the atmosphere component of the Earth system (or “realm”, as CMIP somewhat poetically calls it), and we’re looking at monthly temporal resolution: that’s where “Amon” comes from (“A” for atmosphere, “mon” for monthly). Strictly speaking, “Amon” is the name of the CMOR table that defines this variable.
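The table name can be unpacked in a similar way. The lookup below is a hypothetical, deliberately tiny sketch covering only a few realm prefixes and two frequencies. Many real CMOR tables (for example “day”, “Emon”, or “fx”) do not follow this simple prefix-plus-frequency pattern, so treat the official CMOR tables as the authority.

```python
# Hypothetical, deliberately incomplete lookup of a few table-name prefixes.
REALM_PREFIXES = {
    "A": "atmosphere",
    "O": "ocean",
    "L": "land",
    "SI": "sea ice",
}

FREQUENCIES = {"mon": "monthly", "day": "daily"}


def describe_table(table_id: str) -> str:
    """Guess realm and frequency from a table name like 'Amon' (sketch only)."""
    # Try longer prefixes first so "SImon" is matched by "SI"
    for prefix in sorted(REALM_PREFIXES, key=len, reverse=True):
        rest = table_id[len(prefix):]
        if table_id.startswith(prefix) and rest in FREQUENCIES:
            return f"{REALM_PREFIXES[prefix]}, {FREQUENCIES[rest]}"
    raise KeyError(f"table {table_id!r} is not in this small lookup")


print(describe_table("Amon"))   # atmosphere, monthly
print(describe_table("SImon"))  # sea ice, monthly
```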

You can get a fuller list of the different variable names and the model “realms” they belong to at the CMIP website.

Model, experiment, and ensemble member names

This particular file was generated using the Canadian Earth System Model version 5 (CanESM5), developed by the Canadian Centre for Climate Modelling and Analysis (CCCma). It is a historical simulation, driven by the standard CMIP datasets of observed historical greenhouse gases, aerosol emissions, and land-use changes. Finally, it is the 10th ensemble member (“r10”) that CanESM5 contributed to this experiment for CMIP6 (why did we use this one? Spoiler alert: it was the one that showed up first in the ESGF MetaGrid search!).
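Since the model (`source_id`) and experiment (`experiment_id`) always sit in the same positions, a shell-style wildcard can pick out every ensemble member of one model and experiment from a directory listing. Here is a minimal sketch using Python's standard `fnmatch` module; the file list is made up for illustration.

```python
from fnmatch import fnmatchcase

# Made-up directory listing, for illustration only
filenames = [
    "tas_Amon_CanESM5_historical_r1i1p1f1_gn_185001-201412.nc",
    "tas_Amon_CanESM5_historical_r10i1p1f1_gn_185001-201412.nc",
    "tas_Amon_CanESM5_ssp585_r1i1p1f1_gn_201501-210012.nc",
    "pr_Amon_CanESM5_historical_r1i1p1f1_gn_185001-201412.nc",
]

# Fix variable, table, model, and experiment; let member, grid, and time range vary
pattern = "tas_Amon_CanESM5_historical_*_*_*.nc"
matches = [name for name in filenames if fnmatchcase(name, pattern)]
print(matches)
```

`fnmatchcase` is used instead of `fnmatch` so that matching is case-sensitive on every operating system.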

You’ll probably notice as you work with climate data that ensemble members aren’t labeled with plain numbers, but with strings like “r1i1p1f1”, “r2i1p1f1”, etc. The number after the “r” is the realization, i.e. which ensemble member this is (“r10” in our file). The numbers after the i, p, and f are usually things you don’t have to worry about, but technically you should make sure they match across all the ensemble members you’re using! That’s because they have real meanings:
- “i” = the initialization method used for the ensemble member
  Some ensembles use only “butterfly effect” style small perturbations, and others have larger differences in initial conditions. This matters when interpreting ensemble spread, and it can be important for studies of climate predictability.
- “p” = the physics of the ensemble member
  Sometimes modeling centers make small changes to the model physics during the course of running their simulations (for instance: changing the value of certain ocean mixing parameters, or the length of the model timestep). This doesn’t generally have a big effect on global climate, but is something you want to keep track of just in case it changes the particular thing you’re trying to look at, so they give it a separate number (“p1” vs “p2”).
- “f” = the forcing used by that ensemble member
  Again, modeling centers try to use the same sets of forcings (external factors that affect climate, like volcanic eruptions or greenhouse gas emissions). But occasionally those forcing datasets are updated, or errors are found in them; in that case, you again want to keep track of which version each member uses, in case it affects the answer.
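The matching check described above can be automated. The sketch below, with hypothetical helper names `parse_member_id` and `same_ipf`, pulls the four indices out of a label like “r10i1p1f1” with a regular expression and tests whether a set of members shares the same i, p, and f values. It assumes a bare label: in some CMIP6 experiments (such as decadal predictions) the member name carries an extra start-year prefix like “s1960-”, which this regex would reject.

```python
import re

# r<realization>i<initialization>p<physics>f<forcing>, each letter followed by an integer
MEMBER_RE = re.compile(r"^r(\d+)i(\d+)p(\d+)f(\d+)$")


def parse_member_id(member_id: str) -> dict:
    """Turn a label like 'r10i1p1f1' into its four integer indices."""
    match = MEMBER_RE.match(member_id)
    if match is None:
        raise ValueError(f"not an r/i/p/f label: {member_id!r}")
    r, i, p, f = (int(x) for x in match.groups())
    return {"realization": r, "initialization": i, "physics": p, "forcing": f}


def same_ipf(member_ids) -> bool:
    """True if all members share the same i, p, and f indices (r may differ)."""
    ipf = {
        tuple(parse_member_id(m)[k] for k in ("initialization", "physics", "forcing"))
        for m in member_ids
    }
    return len(ipf) <= 1


print(parse_member_id("r10i1p1f1"))
print(same_ipf(["r1i1p1f1", "r2i1p1f1", "r10i1p1f1"]))  # True
print(same_ipf(["r1i1p1f1", "r1i1p2f1"]))               # False
```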

Grid name and time period

Sometimes, modeling centers interpolate the data they submit to the CMIP archive from the grid the model actually ran on (the “native grid”) to a different one (usually a regular grid with even angular spacing). This usually isn’t a big deal either, but it’s good to keep track of in case the interpolation has a small effect on the thing you’re interested in. Here, the data are provided on the model’s native grid: “gn” stands for “grid native”. Regridded data are instead labeled “gr” (or “gr1”, “gr2”, and so on when additional target grids are used).
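A small helper can flag whether a file was regridded. This is a hypothetical sketch covering only the most common CMIP6 grid labels (“gn”, “gr”, and “gr1” to “gr9”). Other labels exist (for example for global or zonal means), so anything unrecognized is referred back to the CMIP6 controlled vocabulary.

```python
import re


def describe_grid_label(grid_label: str) -> str:
    """Rough meaning of a CMIP6 grid label (common cases only; hypothetical helper)."""
    if grid_label == "gn":
        return "native model grid"
    if grid_label == "gr":
        return "regridded to the modeling center's preferred target grid"
    if re.fullmatch(r"gr[1-9]", grid_label):
        return "regridded to an alternative (non-native) grid"
    return "other label: check the CMIP6 controlled vocabulary"


for label in ("gn", "gr", "gr1", "gm"):
    print(label, "->", describe_grid_label(label))
```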

The time period covered by the data is also included: “185001-201412” means January 1850 through December 2014 (monthly files use a YYYYMM-YYYYMM format). This is pretty self-explanatory, but it’s always worth checking when you first load the file! Since you know when the first and last points of your time series should be, it’s a good way to catch errors in your time handling.
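The time range also tells you how many time steps to expect. Here is a minimal sketch, assuming the monthly YYYYMM-YYYYMM format used in our example (daily files, for instance, use YYYYMMDD instead); the helper names are hypothetical.

```python
def parse_yyyymm(stamp: str) -> tuple[int, int]:
    """Convert '185001' into (1850, 1)."""
    if len(stamp) != 6 or not stamp.isdigit():
        raise ValueError(f"expected YYYYMM, got {stamp!r}")
    return int(stamp[:4]), int(stamp[4:])


def expected_monthly_steps(time_range: str) -> int:
    """Number of monthly time steps implied by a 'YYYYMM-YYYYMM' range (inclusive)."""
    start, end = time_range.split("-")
    (y0, m0), (y1, m1) = parse_yyyymm(start), parse_yyyymm(end)
    return (y1 - y0) * 12 + (m1 - m0) + 1


n = expected_monthly_steps("185001-201412")
print(n)  # 1980
```

After opening the file (for example with xarray’s `open_dataset`), you could compare this number with the length of the time coordinate. A mismatch hints at a missing or duplicated month, or at a problem in how the time axis was decoded.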

Now that you’ve gotten some practice reading climate data filenames, we recommend going back to the ESGF MetaGrid and playing with different choices for the different fields discussed in the CMIP6 Walkthrough – those should make a lot more sense now!