Downloading Data from the CMIP6 Website: A Walkthrough

In this tutorial, we will go through the process of finding and downloading information from the Coupled Model Intercomparison Project phase 6, or CMIP6. Here, we will rely on the CMIP6 data portal hosted by the Earth System Grid Federation, also known as ESGF: more information on how this hosting service is configured can be found on the CMIP Data Access site.

Note: if you go through the steps in this tutorial, the data downloaded will be what you need to complete the time series plotting exercise on our companion GitHub page. We recommend going there next and working through the code!

Step 1: Decide on what data should be downloaded

The first thing to do is always to decide on your goals. Here we will focus on the surface air temperature data from the Canadian Earth System Model version 5 (CanESM5), over the historical period (1850-2014). CanESM5 is a relatively recent model, and its output was submitted to the CMIP6 project.

(note: this is the same dataset mentioned in the Data Naming Conventions example.)

Step 2: Locate the appropriate data files

This is the most complicated part of the process, since it involves navigating the various models and experiments on the ESGF. The location of the data portal is here: https://aims2.llnl.gov/search

You will see that this page is a search interface, meant to help users locate the portions of the massive CMIP archive that are useful for their particular application. 

Step 3: Select the project – CMIP6

The first thing to do is to select the project you are interested in. In this case, we know that CanESM5 is a CMIP6 model, so we should choose this option. 

In some cases, we might also be interested in slightly older data (for instance, to see how projections have changed when using newer models) – for this reason, the older phases of CMIP are also included (the CMIP5 and CMIP3 options in the drop-down menu). But here CMIP6 should do just fine!

Step 4: Refine the search further

We next want to narrow the search enough to find the appropriate files, since the step above returns over 13 million results! There are several sets of drop-down menus on the left-hand side of the page, each of which specifies a different aspect of the desired dataset.

There are multiple ways of refining your search, and we encourage you to play around with the menus until you find one you prefer! Here is the one we tend to use:

Specify the experiment: historical

Restrict the search to simulations of the observational period – these are called “historical” in CMIP6 terms. This can be found by expanding the “Identifiers” menu and specifying “historical” in the “Experiment ID” field.

Note: if you start typing “historical” into the search box, you’ll see that there are several other experiments that start with “hist”. For instance, “hist-nat”, “hist-aer”, etc. These are not what you want right now! They refer to simulations where only some of the actual factors that impact the climate are included (e.g. leaving out changes in greenhouse gas emissions, things like that). The “historical” experiment is the one that includes everything – both human and natural influences on the climate.

When these two fields are selected, this is the screen that results:

The list is shorter – but still much too long for what we’re trying to do.

Specify the model: CanESM5

Since we know that we’re only after the CanESM5 output right now, we can restrict the output to only data generated with that model. This is listed under the “Identifiers” dropdown, in the “Source ID” field (since models are, after all, the sources of the data we’re trying to work with).

After typing “CanESM5” into the Source ID field, the search is now further restricted:

But 50,000 results is still way too many!

Specify the model component and variable of interest

We can really narrow things down a lot further by specifying that we’re interested in atmospheric output from CanESM5, and specifically the surface air temperature (“tas”) field. Both of these can be found under the “Classifications” dropdown!

To specify the atmosphere, go to the “Realm” search field and select “atmos”. Then to select the tas variable, go to “Variable ID” and enter “tas”. Note: if you’re looking for a given variable and you don’t know what its name is, you can also search using the “CF Standard Name” field. This is basically the plain English name, and if you start typing in “temperature”, for instance, the tas variable will also come up!

After restricting the search using the “atmos” and “tas” criteria, we’re almost to a manageable number of results:

Specify the time frequency and ensemble member

The next level of refinement is to specify the output frequency: most models save variables at several different averaging periods. The most typical are daily and monthly, but for some (mostly atmospheric) applications people also want 6-hourly output.

You can tell the search engine you want monthly output in the “Classifications” dropdown under “Frequency” (it’s called “mon”).

While we’re at it, let’s go ahead and also specify that we’re looking for a particular ensemble member. Ensembles are discussed in more detail in the Large Ensembles explainer page, but essentially these are different simulations run with the same exact climate model and scenario, with slight differences in the starting points (or “initial conditions”).

The ensemble members all have different names, so we can tell them apart: these are listed in the “Labels” dropdown, under “Variant Label”. You can scroll through the Variant Label menu to get a sense for the available numbers of simulations: you’ll notice that they all follow a similar syntax, with names like “r1i1p1f1”. The different integers all have various meanings (for more detail see the CMIP6 Global Attributes doc linked below) that aren’t super important for these purposes: the point is that each different name corresponds to a different ensemble member.

Here we’ve selected the r10i1p1f1 member, for which the search results are now quite manageable! 6 files now show up in the search window:
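
If you would rather script the search than click through the menus, the same choices can be written down as query parameters. The sketch below only builds a URL and does not contact any server. The endpoint is an assumption on our part: ESGF index nodes have historically offered a REST search service at a path like this, but nodes change over time, so check which one is current before relying on it. The function name build_search_url is ours, not part of any ESGF tool.

```python
from urllib.parse import urlencode

# Assumed endpoint: ESGF index nodes have historically exposed a REST search
# service at a path like this one. Verify the current node before using it.
SEARCH_ENDPOINT = "https://esgf-node.llnl.gov/esg-search/search"

# The selections made in the walkthrough above, one entry per menu field.
facets = {
    "project": "CMIP6",
    "experiment_id": "historical",
    "source_id": "CanESM5",
    "realm": "atmos",
    "variable_id": "tas",
    "frequency": "mon",
    "variant_label": "r10i1p1f1",
}


def build_search_url(endpoint, facets, limit=10):
    """Return a search URL that encodes the facets as query parameters."""
    params = dict(facets)
    params["format"] = "application/solr+json"  # request JSON output
    params["limit"] = limit                     # cap the number of results
    return endpoint + "?" + urlencode(params)


url = build_search_url(SEARCH_ENDPOINT, facets)
print(url)
```

Writing the search down this way also gives you a record of exactly which experiment, model, variable, frequency, and ensemble member you asked for.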

Download a file!

Now we’re basically ready to do the download! The down arrows next to each of the search results will allow you to expand the information on each:

Here we expanded two at once, so you can see that the filename associated with each is identical.

Why are there six different identical search results??
It might seem a bit like the website is broken at first… but there are multiple different links to the same file since each of them represents access through a different node of the ESGF. The data is distributed across multiple servers at facilities around the world, to make it easier for users located in different places to download to their local machines. So for example, in the screenshot above if you look through the “Metadata” field for the two first results, you’ll find that the “data_node” field is different:
data_node: crd-esgf-drc.ec.gc.ca
data_node: esgf-data1.llnl.gov

These represent data servers in Canada and California, respectively – so if you’re having trouble downloading one file, you can always try another!
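
The "try another node" advice can be automated. Here is a minimal sketch, assuming you have collected the download URLs for the same file from several nodes. The function download_with_fallback and the URL paths are hypothetical (only the two host names come from the metadata above), and the demonstration uses a stand-in fetcher so that nothing is actually downloaded.

```python
import urllib.request


def download_with_fallback(urls, destination, fetch=urllib.request.urlretrieve):
    """Try each mirror URL in turn and return the one that succeeded.

    `fetch` is any callable taking (url, destination). It is a parameter so
    the retry logic can be exercised without touching the network.
    """
    errors = []
    for url in urls:
        try:
            fetch(url, destination)
            return url
        except OSError as exc:  # urllib's URLError and HTTPError are OSErrors
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all mirrors failed:\n" + "\n".join(errors))


# Stand-in fetcher that pretends the first node is unreachable.
def fake_fetch(url, destination):
    if "crd-esgf-drc.ec.gc.ca" in url:
        raise OSError("simulated timeout")


mirrors = [
    "https://crd-esgf-drc.ec.gc.ca/placeholder/path/file.nc",  # hypothetical path
    "https://esgf-data1.llnl.gov/placeholder/path/file.nc",    # hypothetical path
]
print(download_with_fallback(mirrors, "file.nc", fetch=fake_fetch))
```

For a real download, drop the fetch argument and pass the URLs copied from the search results.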

Use the down arrow link under “Download/Copy URL” to download the desired file to your local machine – and make sure you keep track of the location of the file on your computer once you’ve done that! (You’ll need it in order to do anything interesting with the data.)
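
One way to keep track of what you downloaded is to read the search choices back out of the filename, since CMIP6 filenames string the main identifiers together with underscores. The sketch below assumes the usual ordering (variable, table, model, experiment, variant label, grid label, and an optional time range). The example filename is our best guess at what the CanESM5 download looks like, so substitute whatever name your browser actually saved.

```python
from pathlib import Path

# Order of the underscore-separated fields in a CMIP6 filename (assumed here).
FIELDS = [
    "variable_id", "table_id", "source_id", "experiment_id",
    "variant_label", "grid_label", "time_range",
]


def parse_cmip6_filename(path):
    """Split a CMIP6-style filename into its named components."""
    parts = Path(path).stem.split("_")
    # Time-invariant fields have no time range, so six parts is also valid.
    if len(parts) not in (6, 7):
        raise ValueError(f"unexpected CMIP6 filename: {path}")
    return dict(zip(FIELDS, parts))


# Assumed example name; yours may differ.
info = parse_cmip6_filename(
    "tas_Amon_CanESM5_historical_r10i1p1f1_gn_185001-201412.nc"
)
for key, value in info.items():
    print(f"{key}: {value}")
```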

Step 5: Do Your Analyses!

Once you’ve successfully downloaded your file, you can actually do interesting things with it! As a starting point, you can follow along with the code tutorials here:

GitHub repo (CMIP6 Trends)

Other Search Parameters Decoded

To inform your searches further, a “decoder” for the website is as follows. See also the CMIP Data Access site for additional information on these topics!

Menu options:

(under “General”)

  • Activity ID
    This is the overall name of the “activity” (i.e. broader project). If you’re arriving ‘fresh’ to a blank search page, the only option available will be “CMIP”. However, after searching “CMIP6”, you’ll then find that multiple entries are available – these correspond to the ‘sub-MIPs’ discussed in the “CMIP and Other MIPs” section of this site.
  • Data Node
    The server which holds the data files you’re trying to access. In theory, all (or the majority of) the data files are mirrored across multiple nodes – in other words, multiple copies of the files are stored on data servers around the world to facilitate access by people living closer to one server than another.

(under “Identifiers”)

  • Source ID
    The name of the climate model which generated a given dataset. List of climate models contributing to CMIP6
  • Institution ID
    The modeling center responsible for developing the specified climate model. List of climate modeling center names contributing to CMIP6
  • Source Type
    An acronym describing each model’s type: “AOGCM” stands for “atmosphere-ocean general circulation model”, “BGC” for “biogeochemically-enabled model”, and various other labels refer to more simplified models. We recommend not worrying too much about this one for novice users!
  • Experiment ID
    A descriptor referring to the type of simulation run with each model. Some of these are discussed in the “CMIP and Sub-MIPs” section of this site: for most climate impacts purposes, you’ll be mainly interested in the “historical” and “sspxx” or “rcpxx” simulations.
  • Sub Experiment ID
    A descriptor referring to more specific names for some types of experiments (i.e. initialization years for forecasting experiments, etc.). We recommend not worrying too much about this one for novice users!

(under “Resolutions”)

  • Nominal Resolution
    This is pretty much what it sounds like: the “nominal” resolution, or grid spacing, for a given model. It’s called “nominal” since some of the grids vary their spacing over the globe for various computational reasons; you can think of this as being the average size of a grid box. Some sizes are given in degrees and some in km; a handy conversion is
    1 degree ≈ 111 km at the equator (so roughly 100 km)
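
The degree-to-km rule of thumb above holds for north-south distances everywhere, but east-west distances shrink toward the poles. Here is a small sketch of the conversion, assuming a perfectly spherical Earth with a radius of 6371 km; the function name degrees_to_km is ours.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def degrees_to_km(degrees, latitude=0.0):
    """Return (north_south_km, east_west_km) spanned by `degrees` at `latitude`."""
    km_per_degree = math.pi * EARTH_RADIUS_KM / 180.0  # about 111 km
    north_south = degrees * km_per_degree
    # Meridians converge toward the poles, so scale by cos(latitude).
    east_west = north_south * math.cos(math.radians(latitude))
    return north_south, east_west


ns, ew = degrees_to_km(1.0, latitude=60.0)
print(f"1 degree at 60N: {ns:.0f} km north-south, {ew:.0f} km east-west")
```

At 60 degrees latitude a one-degree grid box is only about half as wide east-west as it is at the equator, which is one reason a single "nominal" number is quoted.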

(under “Labels”)

  • Variant Label
    This is equivalent to the ensemble member name for most purposes. These follow the naming convention rXiXpXfX, where each X is an integer (e.g. r1i1p1f1, with no separators between the parts). The numbers following the r, i, p, and f are the realization, initialization, physics, and forcing indices, respectively; for more detail you can see the CMIP Data Reference Syntax description.
    Generally, you don’t need to worry too much about it, other than making sure that the names match if you’re trying to concatenate files that each hold a different chunk of time!
  • Grid Label
    This is a weird one – it’s a descriptor that’s designed to indicate whether data has been transformed from its original grid or not, along with other technical aspects of the model’s grid. We recommend not worrying too much about this one for novice users!
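
If you do want to pull a variant label apart, for instance to check that two files come from the same ensemble member before concatenating them, a regular expression is enough. This is a minimal sketch; the function name parse_variant_label is ours, and the dictionary keys follow the realization, initialization, physics, and forcing indices described in the CMIP6 documentation.

```python
import re

# One integer after each of the letters r, i, p, and f, with no separators.
VARIANT_PATTERN = re.compile(r"r(\d+)i(\d+)p(\d+)f(\d+)")


def parse_variant_label(label):
    """Split a label such as 'r10i1p1f1' into its four integer indices."""
    match = VARIANT_PATTERN.fullmatch(label)
    if match is None:
        raise ValueError(f"not a valid variant label: {label!r}")
    r, i, p, f = (int(group) for group in match.groups())
    return {"realization": r, "initialization": i, "physics": p, "forcing": f}


print(parse_variant_label("r10i1p1f1"))
```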

(under “Classifications”)

  • Table ID
    This is another confusing search term (in our opinion). It refers to “MIP tables”, which are sets of variables output for a given model component (for instance, the atmosphere) at a particular time averaging frequency (for instance, monthly). It appears to be an effort to provide a quicker route to search for data, rather than specifying both the model component and frequency individually. A full list of the MIP tables can be found here, but generally we recommend not worrying too much about this one for novice users!
  • Frequency
    The time averaging frequency at which the desired variable has been saved by the model. Common options: daily (“day”), monthly (“mon”), 3-hourly (“3hr”), 6-hourly (“6hr”)
  • Realm
    The model component in which the desired variable was calculated: atmosphere, ocean, ice, land, etc.
  • Variable ID
    The name of the desired variable; this is generally a more or less reasonable shorthand for the plain English name (e.g. “tas” for surface air temperature). A full list of CMIP variables can be found here.
  • CF Standard Name
    The plain English name of the variable. This is given in a standardized way for all variables, such that all models use the same names; the set of standards is called the Climate and Forecast (CF) conventions. A full set of standard CF names can be found here.
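
To make the Table ID idea a bit more concrete, here is a small sketch of how a table name stands in for a realm and frequency pair. The dictionary covers only a handful of monthly tables and lists the main realm for each (real tables can include a few variables from other realms), so treat it as an illustration rather than a complete lookup. The helper tables_for is ours.

```python
# Illustrative subset of MIP tables: table ID -> (main realm, frequency).
MIP_TABLES = {
    "Amon": ("atmos", "mon"),
    "Omon": ("ocean", "mon"),
    "Lmon": ("land", "mon"),
    "SImon": ("seaIce", "mon"),
}


def tables_for(realm, frequency):
    """Return the table IDs in MIP_TABLES matching a realm and frequency."""
    return [
        table for table, pair in MIP_TABLES.items()
        if pair == (realm, frequency)
    ]


# Monthly atmospheric output, the combination used in the walkthrough.
print(tables_for("atmos", "mon"))
```

In other words, selecting “atmos” and “mon” separately in the search menus lands you in the same place as selecting the “Amon” table.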

Resources for Further Reading

CMIP Model and Experiment Documentation

CMIP6 Data Reference Syntax Document

CMIP6 Climate Model Names

CMIP6 Modeling Centers

CMIP Variables

References

Swart, N.C., Cole, J.N., Kharin, V.V., Lazare, M., Scinocca, J.F., Gillett, N.P., Anstey, J., Arora, V., Christian, J.R., Hanna, S., Jiao, Y., et al., 2019. The Canadian Earth System Model version 5 (CanESM5.0.3). Geoscientific Model Development, 12(11), pp.4823-4873.