Welcome to GetCDCstation’s documentation!
GetCDCstation plugin
authors:
This plugin retrieves DWD station observations from the DWD Climate Data Center (CDC). The user can choose several locations, variables and time frequencies; the plugin converts the data to CMOR-conformant netCDF and ingests it into the databrowser.
Input one or several stations
: The user selects stations via the DWD station ID. If only one or a few stations are selected, use the input parameter stationid; if many stations are selected, use the input parameter stationlist.
Link to other plugins
: The station data are ingested directly into the databrowser under your own projectdata. From there they can be found by other plugins such as REALISTIC.
Station coordinates
: The produced netCDF file contains the longitude and latitude coordinates associated with the latest time step of the station time series from Opendata. Be aware that DWD stations can be relocated over time, so the coordinates in the produced netCDF file are not necessarily valid for all time steps in the file.
Metadata
: The plugin does not check the metadata for possible inhomogeneities or wrong values. Be aware of station relocations, changing measurement systems and changing measurement times. Please check the downloaded metadata, which are stored in cachedir, for such changes. For that, cacheclear must be set to FALSE.
Warning
When using a list of variables do not launch the plugin in interactive mode (command line) as you will likely get the following error message:
OpenBLAS blas_thread_init: pthread_create failed for thread 43 of 128: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 1024 current, 1024 max
The login nodes limit the number of user processes to 1024 (see ulimit -a). Execute your job with the --batchmode flag instead.
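The limit quoted in the error message can also be inspected from Python. The following is a minimal sketch using only the standard library `resource` module (Unix only); the helper name `process_limit` is hypothetical and not part of the plugin.

```python
import resource


def process_limit():
    """Return the (soft, hard) per-user process limit, i.e. RLIMIT_NPROC.

    A value equal to resource.RLIM_INFINITY means "unlimited".
    """
    return resource.getrlimit(resource.RLIMIT_NPROC)


soft, hard = process_limit()
# Same two numbers that OpenBLAS reports as "current" and "max".
print(f"RLIMIT_NPROC {soft} current, {hard} max")
```

If the reported soft limit is small compared to the number of threads requested, running in batch mode as described above is the safer choice.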
Input parameters:
stationid
: Either one element or several separated by commas. To look up the station ID of a specific station, use the “*Beschreibung_Stationen.txt” lists that are stored in each folder at the CDC. The station lists differ for each variable and time frequency.
stationlist
: If many stations should be cmorized at once, list several IDs separated by newlines (i.e. one per line). The required format is based on the “*Beschreibung_Stationen.txt” lists at Opendata. An example is provided in this plugin under test/testlist.txt. To find suitable stations and their IDs, use the CDC portal.
Be aware that the coordinates are those of the latest station location; relocations over time are not captured by this plugin. We assume that the station location is the last station location.
If you use a list, run the job through the workload manager (--batchmode in the CLI or batchmode=True in Python, see above).
outputdir
: Folder where the cmorized station time series are stored.
/path/to/freva/freva-work/<username>/evaluation_system/output/getcdcstation/<out_id>
variable
: Select a variable by its CMOR name (hurs, pr, ps, sfcWind, sund, tas, tasmin, tasmax, wsgsmax).
time_frequency
: Permitted time frequencies are 10min, 1hr, day, mon (mandatory field).
first_date
: Start date of the time period of interest, in the format YYYYMMDD.
last_date
: End date of the time period of interest, in the format YYYYMMDD.
cacheclear
: Option to remove the temporary folder when the analysis tool is done.
cachedir
: Temporary folder where the intermediate calculations and downloaded station metadata are stored.
/work/bmx825/miklip-work/<username>/evaluation_system/cache
link2database
: Option to crawl the output netCDF files to add them to the XCES data pool, as in freva --crawl_my_data (default='False').
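The formats described above can be checked before a job is submitted. The sketch below is only an illustration and not part of the plugin: the helper names `parse_yyyymmdd` and `check_parameters` are hypothetical, the permitted values are copied from the parameter descriptions, and only a single variable is handled. It does not check whether a given variable is actually offered for the chosen time frequency.

```python
from datetime import datetime

# Values copied from the parameter descriptions above.
CMOR_VARIABLES = {"hurs", "pr", "ps", "sfcWind", "sund",
                  "tas", "tasmin", "tasmax", "wsgsmax"}
TIME_FREQUENCIES = {"10min", "1hr", "day", "mon"}


def parse_yyyymmdd(value):
    """Parse a strict 8-digit YYYYMMDD string into a datetime."""
    if len(value) != 8 or not (value.isascii() and value.isdigit()):
        raise ValueError(f"expected YYYYMMDD, got {value!r}")
    # strptime additionally rejects impossible dates such as 20210230.
    return datetime.strptime(value, "%Y%m%d")


def check_parameters(variable, time_frequency, first_date, last_date):
    """Raise ValueError if a value does not match the documented format."""
    if variable not in CMOR_VARIABLES:
        raise ValueError(f"unknown variable: {variable!r}")
    if time_frequency not in TIME_FREQUENCIES:
        raise ValueError(f"unknown time frequency: {time_frequency!r}")
    start = parse_yyyymmdd(first_date)
    end = parse_yyyymmdd(last_date)
    if start > end:
        raise ValueError("first_date must not be later than last_date")
    return start, end


# Example values chosen for illustration only.
start, end = check_parameters("tas", "day", "20000101", "20001231")
print(start.date(), end.date())
```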
Table of available variables per time frequency:
The CDC does not offer the same variables for every time frequency. In addition, the getCDC plugin currently supports only a subset (albeit most) of the variables offered by the CDC:
Variable |
||||
---|---|---|---|---|
|
✓ |
✓ |
✓ |
|
|
✓ |
✓ |
✓ |
✓ |
|
✓ |
✓ |
✓ |
|
|
✓ |
✓ |
✓ |
|
|
✓ |
✓ |
||
|
✓ |
✓ |
✓ |
✓ |
|
✓ |
✓ |
||
|
✓ |
✓ |
||
|
✓ |
✓ |
✓ |
Method:
The downloading and cmorisation are done via bash and R scripts.
Optionally, the plot can be created by calling the movieplotter plugin via freva.
Output:
A netCDF file for each selected station. It has lat, lon and height as attributes, plus the CDC URL the data came from.
If the option link2database='True' is set, the system creates a symbolic link from the OUTPUTDIR to each user’s own PROJECTDATA directory in /work/bm1159/xces-work/<username>/CMOR4LINK via:
/work/bm1159/xces-work/<username>/CMOR4LINK/getcdcstation.<out_id>.project.product/
Warning
The lat, lon and station_height recorded in the netCDF file correspond to the last recorded position of the site. Be aware that a site may have changed position several times during its lifetime.
The following Metadaten_Geographie_01303.txt file, downloaded from the CDC, stores the position of the Essen-Bredeney site (station ID 1303) for its entire lifetime, with 10 position records:
Stations_id;Stationshoehe;Geogr.Breite;Geogr.Laenge;von_datum;bis_datum;Stationsname
1303; 105.00; 51.4450; 7.0120;18970801;19030430;Essen-Bredeney
1303; 106.00; 51.4364; 7.0150;19030501;19131231;Essen-Bredeney
1303; 107.00; 51.4310; 6.9848;19140101;19270731;Essen-Bredeney
1303; 120.00; 51.4064; 6.9373;19270801;19450508;Essen-Bredeney
1303; 120.00; 51.4167; 6.9333;19450509;19470930;Essen-Bredeney
1303; 120.00; 51.4064; 6.9373;19471001;19650711;Essen-Bredeney
1303; 153.50; 51.4059; 6.9671;19650712;19850526;Essen-Bredeney
1303; 152.00; 51.4059; 6.9676;19850527;20000725;Essen-Bredeney
1303; 150.00; 51.4041; 6.9677;20000726;20051207;Essen-Bredeney
1303; 150.00; 51.4041; 6.9677;20051208; ;Essen-Bredeney <-- selected position
**************************
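To find out which position was valid on a given date, the metadata file can be scanned for the record whose von_datum/bis_datum range contains that date. The following is a minimal sketch, not part of the plugin: the function name `position_at` is hypothetical, and `SAMPLE` contains only three of the ten records shown above, so dates between the first and the last two records return None here.

```python
from datetime import datetime

# Header plus three of the Essen-Bredeney records listed above.
SAMPLE = """Stations_id;Stationshoehe;Geogr.Breite;Geogr.Laenge;von_datum;bis_datum;Stationsname
1303; 105.00; 51.4450; 7.0120;18970801;19030430;Essen-Bredeney
1303; 150.00; 51.4041; 6.9677;20000726;20051207;Essen-Bredeney
1303; 150.00; 51.4041; 6.9677;20051208; ;Essen-Bredeney
"""


def position_at(metadata_text, date):
    """Return (lat, lon, height) valid on `date` (YYYYMMDD), or None."""
    wanted = datetime.strptime(date, "%Y%m%d")
    lines = [line for line in metadata_text.splitlines() if line.strip()]
    header = [name.strip() for name in lines[0].split(";")]
    for line in lines[1:]:
        fields = [field.strip() for field in line.split(";")]
        if len(fields) != len(header):
            continue  # skip lines that do not match the header layout
        row = dict(zip(header, fields))
        start = datetime.strptime(row["von_datum"], "%Y%m%d")
        # An empty bis_datum marks the current, still open-ended record.
        if row["bis_datum"]:
            end = datetime.strptime(row["bis_datum"], "%Y%m%d")
        else:
            end = datetime.max
        if start <= wanted <= end:
            return (float(row["Geogr.Breite"]),
                    float(row["Geogr.Laenge"]),
                    float(row["Stationshoehe"]))
    return None


print(position_at(SAMPLE, "19000101"))
print(position_at(SAMPLE, "20200101"))
```

Comparing the result for the first and last date of the requested period is a quick way to see whether the station moved during that time.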
Warning
During the netCDF file creation we assume that the station has been recording in UTC for the whole time period. That might not be the case, and it is not taken into account in the current plugin version. We will deal with it in the near future.
Naturally, this does not affect sites in the day and mon time frequencies.
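If the actual recording time of a station is known, sub-daily timestamps can be shifted to UTC by the user. The sketch below is not part of the plugin. It assumes, purely for illustration, a fixed offset of UTC+1 without daylight saving time and a hypothetical timestamp format YYYYMMDDHHMM; both must be verified for each station and period before use.

```python
from datetime import datetime, timedelta, timezone

# Assumption for illustration only: fixed UTC+1 offset, no daylight saving.
LOCAL_OFFSET = timezone(timedelta(hours=1))


def to_utc(timestamp, fmt="%Y%m%d%H%M"):
    """Interpret `timestamp` as local time (LOCAL_OFFSET) and return UTC."""
    local = datetime.strptime(timestamp, fmt).replace(tzinfo=LOCAL_OFFSET)
    return local.astimezone(timezone.utc)


# A reading stamped shortly after local midnight belongs to the previous UTC day.
print(to_utc("200001010030").isoformat())
```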
Programs:
sofware/
: source files for R libraries and R scripts
src/
: Python wrapper file to communicate with freva, and bash script
test/
: bash script with some preset parameters to easily run and check the functionality of the tool.
How to install the environment to run the plugin:
The plugin relies on certain R libraries. These can easily be installed within a conda environment. For that, conda must first be made available, for example by loading a freva instance:
$ git clone git@gitlab.dkrz.de:bm1159/plugins4freva/getcdcstation.git # or
$ git clone https://gitlab.dkrz.de/bm1159/plugins4freva/getcdcstation.git
$ cd getcdcstation
$ module load clint xces # or any other freva instance
$ make all
This installs a conda environment in ./getcdcstation/plugin_env/. Freva will recognise this environment when running the plugin.
How to run the plugin in development mode:
To run a local instance of the plugin (e.g., for developing or debugging):
$ module load clint xces # or any other freva instance
$ export EVALUATION_SYSTEM_PLUGINS=/path/to/plugin/src/,getCDCstation_wrapper
$ freva plugin getCDCstation stationid=$STATIONID stationlist=$INPUTLIST outputdir=$OUT_DIR variable=$VAR time_frequency=$TIMEFREQ first_date=$STARTDATE last_date=$ENDDATE cacheclear=$CACHECLEAR cachedir=$CACHE_DIR link2database=${LINK2DB}
$ freva-plugin getcdcstation stationid=$stationid variable=$variable time_frequency=$time_frequency first_date=$first_date last_date=$last_date cacheclear=true link2database=true
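When the call is generated from a script, the key=value arguments can be assembled programmatically. The following is a minimal sketch that only builds and prints the command and does not run it. The helper name `build_command` is hypothetical, the parameter values are example values, and the spelling and position of the batch flag are assumptions that should be checked against the help output of the installed freva version.

```python
import shlex


def build_command(plugin="getcdcstation", batchmode=False, **parameters):
    """Assemble a freva-plugin call as an argument list without running it."""
    command = ["freva-plugin", plugin]
    # Each plugin parameter is passed as a single key=value argument.
    command += [f"{key}={value}" for key, value in parameters.items()]
    if batchmode:
        # Assumed flag spelling; verify with the installed freva version.
        command.append("--batchmode")
    return command


# Example values for illustration only.
command = build_command(
    stationid="1303",
    variable="tas",
    time_frequency="day",
    first_date="20000101",
    last_date="20001231",
    cacheclear="true",
    link2database="true",
)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run` on a system where freva is loaded; keeping the arguments as a list avoids shell quoting problems with paths that contain spaces.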