Welcome to GetCDCstation’s documentation!

GetCDCstation plugin

authors:

Deborah Niermann [1]
Deutscher Wetterdienst (DWD)

Etor E. Lucio-Eceiza [2], Mahesh Ramadoss
Institut für Meteorologie, Freie Universität Berlin, Deutsches Klimarechenzentrum (DKRZ)

Version from 2024-01-09

This plugin retrieves DWD station observations from the DWD Climate Data Center (CDC). The user can choose several locations, variables and time frequencies, convert the data to CMOR-conformant NetCDF, and ingest them into the databrowser.

  • Input one or several stations: Stations are selected by their DWD station ID. For one or a few stations, use the input parameter stationid; for many stations, use the input parameter stationlist.

  • Link to other plugins: The station data are ingested directly into the databrowser under your own projectdata. From there, other plugins such as REALISTIC can find them.

  • Station coordinates: The NetCDF file contains the longitude and latitude associated with the latest time step of the station time series from Opendata. DWD stations can be relocated over time, so the coordinates in the NetCDF file are not necessarily valid for all time steps in the file.

  • Metadata: The plugin does not check the metadata for inhomogeneities or wrong values. Be aware of station relocations and of changes in measurement systems and measurement times. Check the downloaded metadata, which are stored in cachedir, for these; to keep them, cacheclear must be set to FALSE.

Warning

When using a list of variables do not launch the plugin in interactive mode (command line) as you will likely get the following error message:

OpenBLAS blas_thread_init: pthread_create failed for thread 43 of 128: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 1024 max

The login nodes limit the number of user processes to 1024 (see ulimit -a). Run your job with the --batchmode flag instead.
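The process limit mentioned in this warning can also be inspected from Python before deciding how to launch the plugin. The following is a minimal sketch that uses only the standard-library resource module (Unix only); the helper name process_limit is hypothetical and not part of the plugin.

```python
import resource


def process_limit():
    """Return the (soft, hard) per-user process limits, as shown by `ulimit -u`.

    A value of None means that the limit is unlimited.
    Hypothetical helper for illustration; not part of the plugin.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)

    def to_optional(value):
        # RLIM_INFINITY marks "no limit"; report it as None.
        return None if value == resource.RLIM_INFINITY else value

    return to_optional(soft), to_optional(hard)


if __name__ == "__main__":
    soft, hard = process_limit()
    print(f"max user processes: soft={soft}, hard={hard}")
```

If the soft limit is as low as the 1024 quoted in the error message, running through the batch system is the safer choice.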

Input parameters:

  • stationid: Either one ID or several separated by commas. To look up the ID of a specific station, use the “*Beschreibung_Stationen.txt” lists saved within each folder at the CDC. The station lists differ for each variable and time frequency.

  • stationlist: To cmorize many stations at once, list the IDs separated by newlines (e.g. one per line). The required format is based on the “*Beschreibung_Stationen.txt” lists at Opendata. An example is provided with this plugin under test/testlist.txt.

    • To find suitable stations and their IDs, use the CDC portal.

    • The coordinates are those of the latest station location. Relocations over time are not captured by this plugin; we assume that the station location is the last station location.

    • If you use a list, run the job through the workload manager (--batchmode in the CLI or batchmode=True in Python, see above).

  • outputdir: Folder where the cmorized station time series are stored. /path/to/freva/freva-work/<username>/evaluation_system/output/getcdcstation/<out_id>

  • variable: Select a variable by using the “CMOR” name (hurs, pr, ps, sfcWind, sund, tas, tasmin, tasmax, wsgsmax)

  • time_frequency: Permitted time frequencies are 10min, 1hr, day, mon (mandatory field).

  • first_date: Start date of the period of interest, in YYYYMMDD format.

  • last_date: End date of the period of interest, in YYYYMMDD format.

  • cacheclear: Option to remove the temporary folder once the analysis tool has finished.

  • cachedir: Temporary folder where the intermediate calculations and downloaded Station-Metadata are stored. /work/bmx825/miklip-work/<username>/evaluation_system/cache

  • link2database: Option to crawl the output netcdf files to add them to the XCES data pool, as in freva --crawl_my_data (default='False')
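The constraints listed above (permitted variables, permitted time frequencies, YYYYMMDD dates) can be checked before a job is submitted. The following is a minimal sketch of such a check; build_plugin_args and parse_date are hypothetical helpers, not part of the plugin, and they do not know which variables the CDC offers for which time frequency.

```python
from datetime import datetime

# Values copied from the parameter descriptions above.
VARIABLES = {"hurs", "pr", "ps", "sfcWind", "sund",
             "tas", "tasmin", "tasmax", "wsgsmax"}
TIME_FREQUENCIES = {"10min", "1hr", "day", "mon"}


def parse_date(text):
    """Parse a strict YYYYMMDD string; raise ValueError otherwise."""
    if len(text) != 8 or not text.isdigit():
        raise ValueError(f"date must have the form YYYYMMDD: {text!r}")
    return datetime.strptime(text, "%Y%m%d")


def build_plugin_args(stationid, variable, time_frequency, first_date, last_date):
    """Validate the inputs and return them as key=value strings.

    Hypothetical helper for illustration only.
    """
    if variable not in VARIABLES:
        raise ValueError(f"unknown variable: {variable!r}")
    if time_frequency not in TIME_FREQUENCIES:
        raise ValueError(f"unknown time_frequency: {time_frequency!r}")
    if parse_date(first_date) > parse_date(last_date):
        raise ValueError("first_date must not be later than last_date")
    # Several station IDs are passed as one comma-separated value.
    ids = ",".join(str(i) for i in stationid)
    return [
        f"stationid={ids}",
        f"variable={variable}",
        f"time_frequency={time_frequency}",
        f"first_date={first_date}",
        f"last_date={last_date}",
    ]


print(" ".join(build_plugin_args([1303], "tas", "day", "20000101", "20001231")))
```

The returned strings have the same key=value form as the arguments of the freva-plugin command line, so they can be appended to it directly.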

Table of available variables per time frequency:

The CDC does not offer the same variables for every time frequency. In addition, the getCDC plugin can currently download only a subset (albeit most) of the variables offered by the CDC:

[Table: availability of each variable (rows: hurs, pr, ps, sfcWind, sund, tas, tasmax, tasmin, wsgsmax) per time frequency (columns: 10min, 1hr, day, mon)]

Method:

  • The downloading and cmorization are done via bash and R scripts.

  • Optionally, a plot can be created by calling the movieplotter plugin via freva.

Output:

  • One NetCDF file for each selected station. Its attributes include lat, lon and height, plus the CDC URL the data came from.

  • If the option link2database='True' is set, the system creates a symbolic link from the OUTPUTDIR to the user’s own PROJECTDATA directory in /work/bm1159/xces-work/<username>/CMOR4LINK: /work/bm1159/xces-work/<username>/CMOR4LINK/getcdcstation.<out_id>.project.product/

Warning

The lat, lon and station_height recorded in the NetCDF file correspond to the last recorded position of the site. Be aware that a site may have changed position several times during its lifetime.

The following Metadaten_Geographie_01303.txt file, downloaded from the CDC, stores the position of the Essen-Bredeney site (station ID 1303) over its entire lifetime, in 10 position records:

Stations_id;Stationshoehe;Geogr.Breite;Geogr.Laenge;von_datum;bis_datum;Stationsname
  1303;  105.00; 51.4450;  7.0120;18970801;19030430;Essen-Bredeney
  1303;  106.00; 51.4364;  7.0150;19030501;19131231;Essen-Bredeney
  1303;  107.00; 51.4310;  6.9848;19140101;19270731;Essen-Bredeney
  1303;  120.00; 51.4064;  6.9373;19270801;19450508;Essen-Bredeney
  1303;  120.00; 51.4167;  6.9333;19450509;19470930;Essen-Bredeney
  1303;  120.00; 51.4064;  6.9373;19471001;19650711;Essen-Bredeney
  1303;  153.50; 51.4059;  6.9671;19650712;19850526;Essen-Bredeney
  1303;  152.00; 51.4059;  6.9676;19850527;20000725;Essen-Bredeney
  1303;  150.00; 51.4041;  6.9677;20000726;20051207;Essen-Bredeney
  1303;  150.00; 51.4041;  6.9677;20051208;        ;Essen-Bredeney  <-- selected position
        **************************
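To find the position that was valid on a given date, rather than the last one, the downloaded metadata file can be read directly. The following is a minimal sketch that embeds the last four rows of the file shown above; the function name position_on is hypothetical, and a real file may need additional handling (encoding, trailing lines) that is not covered here.

```python
import csv
import io

# Excerpt of the Metadaten_Geographie_01303.txt rows shown above.
METADATA = """\
Stations_id;Stationshoehe;Geogr.Breite;Geogr.Laenge;von_datum;bis_datum;Stationsname
1303;  153.50; 51.4059;  6.9671;19650712;19850526;Essen-Bredeney
1303;  152.00; 51.4059;  6.9676;19850527;20000725;Essen-Bredeney
1303;  150.00; 51.4041;  6.9677;20000726;20051207;Essen-Bredeney
1303;  150.00; 51.4041;  6.9677;20051208;        ;Essen-Bredeney
"""


def position_on(metadata_text, date):
    """Return (lat, lon, height) valid on `date` (a YYYYMMDD string), or None."""
    reader = csv.DictReader(io.StringIO(metadata_text), delimiter=";")
    for row in reader:
        start = row["von_datum"].strip()
        # An empty bis_datum marks the current, open-ended record.
        end = row["bis_datum"].strip() or "99999999"
        # YYYYMMDD strings of equal length compare correctly as text.
        if start <= date <= end:
            return (
                float(row["Geogr.Breite"]),   # latitude
                float(row["Geogr.Laenge"]),   # longitude
                float(row["Stationshoehe"]),  # station height
            )
    return None


print(position_on(METADATA, "19700101"))  # (51.4059, 6.9671, 153.5)
print(position_on(METADATA, "20240101"))  # (51.4041, 6.9677, 150.0)
```

The second call returns the position that the plugin writes to the NetCDF file; the first shows that data from 1970 were recorded at a different location and height.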

Warning

During NetCDF file creation we assume that the station has been recording in UTC for the whole time period. This may not be the case, and the current plugin version does not take it into account; handling it is planned for a future version.

Naturally, this does not affect sites at the day and mon time frequencies.
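The size of the possible error can be illustrated with the standard library. The following is a minimal sketch under loudly stated assumptions: the UTC+1 offset, the timestamp value and its YYYYMMDDHH layout are made up for illustration and do not describe any particular station.

```python
from datetime import datetime, timedelta, timezone

# Assumption for illustration only: a station that logged in UTC+1.
ASSUMED_LOCAL = timezone(timedelta(hours=1))

# Made-up sub-daily timestamp, assumed to be laid out as YYYYMMDDHH.
stamp = "2000010100"

naive = datetime.strptime(stamp, "%Y%m%d%H")
as_utc = naive.replace(tzinfo=timezone.utc)     # interpretation used by the plugin
as_local = naive.replace(tzinfo=ASSUMED_LOCAL)  # hypothetical true time zone

# Difference between the two interpretations of the same timestamp.
shift = as_utc - as_local.astimezone(timezone.utc)
print(shift)                                         # 1:00:00
print(as_local.astimezone(timezone.utc).isoformat())  # 1999-12-31T23:00:00+00:00
```

Under this assumption every 10min or 1hr value would be shifted by one hour, and values near midnight would be assigned to the wrong day.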

Programs:

  • software/: source files for R libraries and R scripts

  • src/: python wrapper file to communicate with freva, and bash script

  • test/: bash script with some preset parameters to easily run and check the functionality of the tool.

How to install the environment to run the plugin:

The plugin relies on certain R libraries, which can be installed within a conda environment. This first requires making conda available, for example by loading a freva instance:

$ git clone git@gitlab.dkrz.de:bm1159/plugins4freva/getcdcstation.git # or
$ git clone https://gitlab.dkrz.de/bm1159/plugins4freva/getcdcstation.git
$ cd getcdcstation
$ module load clint xces # or any other freva instance
$ make all

This installs a conda environment in ./getcdcstation/plugin_env/. Freva will recognise this environment when running the plugin.

How to run the plugin in development mode:

To run a local instance of the plugin (e.g., for developing or debugging):

$ module load clint xces # or any other freva instance
$ export EVALUATION_SYSTEM_PLUGINS=/path/to/plugin/src/,getCDCstation_wrapper
$ freva plugin getCDCstation stationid=$STATIONID stationlist=$INPUTLIST outputdir=$OUT_DIR variable=$VAR time_frequency=$TIMEFREQ first_date=$STARTDATE last_date=$ENDDATE cacheclear=$CACHECLEAR cachedir=$CACHE_DIR link2database=${LINK2DB}
$ freva-plugin getcdcstation stationid=$stationid variable=$variable time_frequency=$time_frequency first_date=$first_date last_date=$last_date cacheclear=true link2database=true

Contact & References
