Including value ranges in CSVW sample distribution
The currently proposed use of CSVW for data dictionaries in HealthDCAT-AP offers no way to indicate the expected or allowed values of each variable. However, knowing the values and units is essential for moving from merely discovering a dataset to assessing whether its data is of sufficient quality and actually usable.
Example 1: In dataset X, gender is recorded as 1 and 2, so the datatype would be 'integer'. Without knowing that 1 = male and 2 = female, a data user cannot tell what each value means. Likewise, 9 is often used for missing values, but it could equally mean gender "other".
Example 2: In dataset Y, patient diagnoses are coded with ICD-10 codes, while other variables use different coding systems. All of these coding systems can of course be listed in healthdcatap:hasCodingSystem, but a data user still cannot tell which variable uses which coding system.
It would therefore be very useful, in addition to the five proposed properties, to add a property to the CSVW Column with which a data holder can specify which values or value ranges are allowed for a given variable.
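To make the request more concrete, here is a minimal Python sketch of what per-column value information could look like and how a data user might check values against it. The keys "allowedValues", "valueRange" and "codingSystem" are hypothetical names chosen for illustration only; they are not defined by CSVW or HealthDCAT-AP. The "age" column and its range are likewise assumed examples and do not come from an actual dataset.

```python
# Hypothetical column descriptions for a data dictionary.
# "allowedValues", "valueRange" and "codingSystem" are illustrative
# names only; they are NOT existing CSVW or HealthDCAT-AP properties.
columns = [
    {
        "name": "gender",
        "datatype": "integer",
        # Example 1: the code list makes the meaning of each value explicit.
        "allowedValues": {"1": "male", "2": "female", "9": "missing"},
    },
    {
        "name": "age",  # assumed example column
        "datatype": "integer",
        "valueRange": {"minimum": 0, "maximum": 120},
    },
    {
        "name": "diagnosis",
        "datatype": "string",
        # Example 2: the coding system is stated per variable.
        "codingSystem": "ICD-10",
    },
]


def is_allowed(column, value):
    """Return True if value satisfies the constraints declared for column."""
    allowed = column.get("allowedValues")
    if allowed is not None and str(value) not in allowed:
        return False
    value_range = column.get("valueRange")
    if value_range is not None:
        if not (value_range["minimum"] <= value <= value_range["maximum"]):
            return False
    return True


by_name = {column["name"]: column for column in columns}

print(is_allowed(by_name["gender"], 2))   # True: 2 is in the code list
print(is_allowed(by_name["gender"], 3))   # False: 3 is not in the code list
print(is_allowed(by_name["age"], 130))    # False: outside the declared range
```

With information like this attached to each column, a data user could see that 9 means "missing" rather than "other" in this dataset, and that only the diagnosis variable uses ICD-10.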