
Including value ranges in CSVW sample distribution

The CSVW profile currently proposed for providing data dictionaries in HealthDCAT-AP offers no way to indicate the expected or allowed values of the individual variables. Yet knowing the values and their units is essential to move from merely discoverable data to assessing whether the data is actually of sufficient quality and usable.

Example 1: In dataset X, gender is recorded as 1 and 2, so the datatype would be 'integer'. Without knowing that 1 = male and 2 = female, a data user cannot tell what each value stands for. Moreover, 9 is often used for missing values, but it could just as well mean gender "other".
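To illustrate what such a declaration could look like, here is a minimal sketch in Python of a column description for Example 1. The keys "name", "datatype" and "null" follow the W3C CSVW metadata vocabulary ("null" lists the cell strings treated as missing); "allowedValues" is a hypothetical property, not part of CSVW or HealthDCAT-AP, standing in for the property proposed below. The helper `interpret` is likewise hypothetical.

```python
# Column description for Example 1. "name", "datatype" and "null" are CSVW
# properties; "allowedValues" is HYPOTHETICAL and only illustrates the proposal.
GENDER_COLUMN = {
    "name": "gender",
    "datatype": "integer",
    "null": ["9"],  # CSVW: cell strings that are read as missing values
    "allowedValues": {"1": "male", "2": "female"},  # hypothetical code list
}


def interpret(column: dict, cell: str):
    """Return the label for a raw cell, or None if it is declared missing.

    Raises ValueError for codes that the column description does not declare.
    """
    nulls = column.get("null", [""])  # CSVW default for "null" is ""
    if isinstance(nulls, str):
        nulls = [nulls]
    if cell in nulls:
        return None
    allowed = column.get("allowedValues", {})
    if cell not in allowed:
        raise ValueError(f"{column['name']}: undeclared value {cell!r}")
    return allowed[cell]


print(interpret(GENDER_COLUMN, "2"))  # female
print(interpret(GENDER_COLUMN, "9"))  # None
```

With the code list and the missing-value code stated explicitly, a data user no longer has to guess whether 9 means "missing" or "other".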

Example 2: In dataset Y, patient diagnoses are coded with ICD-10, while other variables use different coding systems. All of these coding systems can be listed in healthdcatap:hasCodingSystem, but a data user then cannot tell which variable uses which coding system.
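A minimal sketch of what a column-level declaration would enable for Example 2, assuming a hypothetical per-column "codingSystem" property (neither CSVW nor HealthDCAT-AP defines it at column level). The column names, the second coding system (ATC) and the ICD-10 pattern are illustrative assumptions; the pattern only checks the rough shape of a code, not whether the code exists.

```python
import re

# HYPOTHETICAL column-level "codingSystem" property; today the coding systems
# can only be listed for the whole dataset via healthdcatap:hasCodingSystem.
COLUMNS = [
    {"name": "diagnosis", "codingSystem": "ICD-10"},
    {"name": "medication", "codingSystem": "ATC"},
    {"name": "age", "datatype": "integer"},
]

# Simplified shape check for ICD-10 codes: a letter, two digits, and an
# optional dot followed by a subcategory. Illustrative only.
CODE_PATTERNS = {"ICD-10": re.compile(r"[A-Z][0-9]{2}(\.[0-9A-Z]{1,4})?")}


def columns_by_coding_system(columns):
    """Map each declared coding system to the names of the columns using it."""
    index = {}
    for col in columns:
        system = col.get("codingSystem")
        if system is not None:
            index.setdefault(system, []).append(col["name"])
    return index


def looks_valid(system, value):
    """True if the value matches the known pattern (or no pattern is known)."""
    pattern = CODE_PATTERNS.get(system)
    return pattern is None or pattern.fullmatch(value) is not None


print(columns_by_coding_system(COLUMNS))
```

Values for coding systems without a known pattern are accepted rather than rejected, so the sketch never flags data it cannot actually check.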

It would therefore be very useful, in addition to the 5 proposed properties, to add a property to the CSVW Column with which a data holder can specify the allowed values or value ranges of a given variable.
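For numeric value ranges, one option could be to reuse what CSVW already defines: a datatype description with "base", "minimum" and "maximum" (both inclusive). Whether the HealthDCAT-AP profile would adopt these is an assumption; the column name, the bounds and the helper `in_range` below are hypothetical and only sketch how a data user could check cells against such a declaration.

```python
from decimal import Decimal

# "datatype" with "base", "minimum" and "maximum" are CSVW datatype description
# properties; reusing them in the HealthDCAT-AP profile is an ASSUMPTION.
AGE_COLUMN = {
    "name": "age_at_diagnosis",  # hypothetical column
    "datatype": {"base": "integer", "minimum": 0, "maximum": 120},
}


def in_range(column: dict, cell: str) -> bool:
    """Check a raw cell against the inclusive minimum/maximum, if declared."""
    dt = column["datatype"]
    value = Decimal(cell)
    lo = dt.get("minimum")
    hi = dt.get("maximum")
    if lo is not None and value < Decimal(str(lo)):
        return False
    if hi is not None and value > Decimal(str(hi)):
        return False
    return True


print(in_range(AGE_COLUMN, "45"))   # True
print(in_range(AGE_COLUMN, "130"))  # False
```

Enumerated code lists such as the one in Example 1 are not covered by minimum/maximum, which is why a separate property for allowed values would still be needed.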

Edited by Hannah Neikes