The goal of this document is to get you up and running with rsdmx as quickly as possible.
rsdmx provides a set of classes and methods to read data and metadata documents exchanged through the Statistical Data and Metadata Exchange (SDMX) framework.
The SDMX framework provides two sets of standard specifications to facilitate the exchange of statistical data: standard formats and web-service specifications.
SDMX makes it possible to disseminate both data (a dataset) and metadata (the description of the dataset).
For this, the SDMX standard provides various types of documents, also known as messages:
Data documents. The most typical ones are the Generic and Compact ones; the latter aims to provide a more compact XML document. Other data document types derive from these two.
Metadata documents. The main one is the Data Structure Definition (DSD). As its name indicates, it describes the structure and organization of a dataset, and generally includes all the master/reference data used to characterize a dataset. The two main types of metadata are (1) the concepts, which correspond to the dimensions and/or attributes of the dataset, and (2) the codelists, which inventory the possible values used to represent dimensions and attributes.
For more information about the SDMX standards, you can visit the SDMX website or this introduction by EUROSTAT.
rsdmx offers a low-level set of tools to read data and metadata in the SDMX-ML format. It aims to keep things simple for the user: a single function, readSDMX, reads both data and metadata documents, whether the data source is local or remote.
What rsdmx does support:
an SDMX format abstraction library, focused on the main SDMX standard XML format (SDMX-ML) and supporting its three standard versions (1.0, 2.0, 2.1)
an interface to SDMX web services for a list of well-known data providers, such as OECD, EUROSTAT, ECB, UN FAO, UN ILO, etc. (a list that should grow in the near future!)
Let’s see how to use rsdmx!
rsdmx can be installed from CRAN or from its development repository hosted on GitHub. For the latter, you will need the remotes package and run:
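The two installation routes might look like the sketch below. The GitHub repository path (opensdmx/rsdmx) is an assumption that is not stated in the text above; check the project page for the current location before running it.

```r
# Stable release from CRAN
install.packages("rsdmx")

# Development version from GitHub (needs the remotes package).
# ASSUMPTION: the repository path "opensdmx/rsdmx" is not given in the text above.
install.packages("remotes")
remotes::install_github("opensdmx/rsdmx")
```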
This section introduces how to read SDMX dataset documents, either from remote data sources or from local SDMX files.
The following code snippet shows how to read a dataset from a remote data source, taking as an example the OECD StatExtracts portal: https://stats.oecd.org/restsdmx/sdmx.ashx/GetData/MIG/TOT../OECD?startTime=2000&endTime=2011
library(rsdmx)
myUrl <- "https://stats.oecd.org/restsdmx/sdmx.ashx/GetData/MIG/TOT../OECD?startTime=2000&endTime=2011"
#read the remote SDMX-ML document
dataset <- readSDMX(myUrl)
#convert it to a data.frame
stats <- as.data.frame(dataset)
You can try it out with other data sources, such as the EUROSTAT portal: https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/NAMA_10_GDP/A.CP_MEUR.B1GQ.BE+LU
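As a sketch, the EUROSTAT URL above can be read in the same way as the OECD one; only the URL changes. The variable names are illustrative, and the call needs network access to the EUROSTAT service.

```r
library(rsdmx)

# EUROSTAT request URL taken from the text above
estatUrl <- "https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/NAMA_10_GDP/A.CP_MEUR.B1GQ.BE+LU"

# Read the remote SDMX-ML document, then flatten it to a data.frame
estat <- readSDMX(estatUrl)
gdp <- as.data.frame(estat)
head(gdp)
```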
The online rsdmx documentation also provides a list of data providers, either from international or national institutions, and more request examples.
The service providers mentioned above are known to rsdmx, which lets users call readSDMX with helper parameters instead of a full URL. The list of service providers can be retrieved with:
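A minimal sketch of listing the registered providers, using the getSDMXServiceProviders function exported by rsdmx. Converting the result to a data.frame is a convenient way to look up the identifier to pass as providerId.

```r
library(rsdmx)

# Retrieve the service providers registered in rsdmx
providers <- getSDMXServiceProviders()

# Tabulate them to browse the available provider identifiers
providers_df <- as.data.frame(providers)
head(providers_df)
```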
Note that it is also possible to add an SDMX service provider at runtime. To have a new SDMX service provider registered by default, please contact me!
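Registering a provider at runtime could look like the sketch below, assuming the provider exposes an SDMX 2.1 REST endpoint. The agency id, name, and URLs are hypothetical placeholders, and the constructor arguments shown should be checked against the package reference for your installed version.

```r
library(rsdmx)

# HYPOTHETICAL endpoints: replace with the provider's real registry/repository URLs
myBuilder <- SDMXREST21RequestBuilder(
  regUrl = "http://www.myorg.org/sdmx/registry",
  repoUrl = "http://www.myorg.org/sdmx/repository",
  compliant = TRUE
)

# HYPOTHETICAL provider identity
myProvider <- SDMXServiceProvider(
  agencyId = "MYORG",
  name = "My Organization",
  builder = myBuilder
)

# Register the provider for the current R session, then look it up by id
addSDMXServiceProvider(myProvider)
findSDMXServiceProvider("MYORG")
```

Once registered, the provider can be addressed through providerId = "MYORG" in readSDMX, like the embedded ones.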
Here is how to query an OECD data source this way:
sdmx <- readSDMX(providerId = "OECD", resource = "data", flowRef = "MIG",
                 key = list("TOT", NULL, NULL), start = 2010, end = 2011)
df <- as.data.frame(sdmx)
head(df)
It is also possible to query a dataset together with its “definition”, handled in a separate SDMX-ML document named DataStructureDefinition (DSD). This is particularly useful when you want to enrich your dataset with labels, since the DSD contains all the reference data.
To do so, you only need to append dsd = TRUE (the default value is FALSE) to the previous request, and specify labels = TRUE when calling as.data.frame, as follows:
sdmx <- readSDMX(providerId = "OECD", resource = "data", flowRef = "MIG",
                 key = list("TOT", NULL, NULL), start = 2010, end = 2011,
                 dsd = TRUE)
df <- as.data.frame(sdmx, labels = TRUE)
head(df)
For embedded service providers that require a user authentication/subscription key or token, it is possible to specify it in readSDMX with the providerKey argument. If it is provided and the embedded provider requires a specific key parameter, the key is appended to the SDMX web request. This is the case, for example, for the new UNESCO SDMX API.
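A sketch of passing a subscription key through providerKey. The provider identifier ("UIS"), the dataflow, the key values, and the environment variable name are all assumptions used for illustration; they are not given in the text above and should be checked against the provider list and the UNESCO API documentation.

```r
library(rsdmx)

# Read the subscription key from an environment variable instead of hard-coding it.
# ASSUMPTION: "UIS_API_KEY" is a hypothetical variable name.
myKey <- Sys.getenv("UIS_API_KEY")

# Only send the request when a key is actually available
if (nzchar(myKey)) {
  # ASSUMPTION: provider id, flowRef and key are illustrative values
  sdmx <- readSDMX(providerId = "UIS", providerKey = myKey,
                   resource = "data", flowRef = "EDULIT_DS",
                   key = list("OFST", 1, 40510), start = 2000, end = 2015)
  df <- as.data.frame(sdmx)
  head(df)
}
```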
Note that if you are reading SDMX-ML documents with the native approach (with URLs) instead of the embedded providers, it is also possible to associate a DSD with a dataset by using the function setDSD. Here is how it works:
#data without DSD
sdmx.data <- readSDMX(providerId = "OECD", resource = "data", flowRef = "MIG",
                      key = list("TOT", NULL, NULL), start = 2010, end = 2011)
#DSD
sdmx.dsd <- readSDMX(providerId = "OECD", resource = "datastructure", resourceId = "MIG")
#associate data and dsd
sdmx.data <- setDSD(sdmx.data, sdmx.dsd)
This example shows how to use rsdmx with local SDMX files, previously downloaded from EUROSTAT.
#bulk download from Eurostat
tf <- tempfile(tmpdir = tdir <- tempdir()) #temp file and folder
download.file("https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=data%2Frd_e_gerdsc.sdmx.zip", tf)
sdmx_files <- unzip(tf, exdir = tdir)
#read local SDMX (set isURL = FALSE)
sdmx <- readSDMX(sdmx_files[2], isURL = FALSE)
stats <- as.data.frame(sdmx)
By default, readSDMX assumes that the data source is remote. To read a local file, add isURL = FALSE.
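When a script has to accept both kinds of source, the isURL switch can be derived from the input itself. The helper below is a hypothetical convenience wrapper, not part of rsdmx; it assumes that any path pointing to an existing file should be read locally and that everything else is a URL.

```r
library(rsdmx)

# Hypothetical helper (not part of rsdmx): read a local file when `src`
# is an existing path, otherwise treat `src` as a remote URL
read_sdmx_source <- function(src) {
  readSDMX(src, isURL = !file.exists(src))
}

# Usage: read_sdmx_source("path/to/file.xml") or read_sdmx_source(myUrl)
```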
This section introduces how to read SDMX metadata, here a complete data structure definition (DSD).
This example illustrates how to read a complete DSD using an OECD StatExtracts portal data source.
dsdUrl <- "https://stats.oecd.org/restsdmx/sdmx.ashx/GetDataStructure/TABLE1"
dsd <- readSDMX(dsdUrl)
rsdmx is implemented in an object-oriented way with S4 classes and methods. The properties of S4 objects are named slots and can be accessed with the slot method. The following code snippet extracts the list of codelists contained in the DSD document, and reads one codelist as a data.frame.
#get codelists from DSD
cls <- slot(dsd, "codelists")
#get list of codelists
codelists <- sapply(slot(cls, "codelists"), function(x) slot(x, "id"))
#get a codelist
codelist <- as.data.frame(slot(dsd, "codelists"), codelistId = "CL_TABLE1_FLOWS")
In a similar way, the concepts of the dataset can be extracted from the DSD and read as a data.frame.
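A minimal sketch of the concepts extraction, mirroring the codelist example: the "concepts" slot of the DSD is passed to as.data.frame. The DSD URL is the one used above, and the call needs network access.

```r
library(rsdmx)

# DSD URL taken from the example above
dsdUrl <- "https://stats.oecd.org/restsdmx/sdmx.ashx/GetDataStructure/TABLE1"
dsd <- readSDMX(dsdUrl)

# Extract the concepts from the DSD and read them as a data.frame
concepts <- as.data.frame(slot(dsd, "concepts"))
head(concepts)
```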
It is possible to save SDMX R objects as RData files (.RData, .rda, .rds) and reload them later into an R session. This is useful for users who want to keep their SDMX objects in R data files, and also for fast loading of large SDMX objects (e.g. DSD objects) in statistical analyses and R-based web applications.
To save an SDMX R object to an RData file:
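A minimal sketch, assuming the saveSDMX helper exported by rsdmx. The object saved here is the DSD from the earlier example, and the file name is a placeholder.

```r
library(rsdmx)

# Any SDMX R object will do; here, the DSD read in the example above
dsd <- readSDMX("https://stats.oecd.org/restsdmx/sdmx.ashx/GetDataStructure/TABLE1")

# Save it to an RData file ("tmp.RData" is a placeholder file name)
saveSDMX(dsd, "tmp.RData")
```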
To reload an SDMX R object from an RData file:
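A minimal sketch, assuming the isRData argument of readSDMX. The file name is a placeholder for a file written earlier, and isURL = FALSE is passed explicitly as a precaution since the source is local.

```r
library(rsdmx)

# Placeholder file name: an RData file previously written from an SDMX R object
rdataFile <- "tmp.RData"

# Reload the SDMX R object into the session when the file is present
if (file.exists(rdataFile)) {
  sdmx <- readSDMX(rdataFile, isURL = FALSE, isRData = TRUE)
}
```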