A data dictionary is the metadata for a dataset, describing each of its data elements.
It gives an overview of the types of data to expect, and sensitive information can be left out of it.
A Data Dictionary Includes
data types
value ranges or allowed values
any relationships to other data elements
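To make the three components above concrete, here is a minimal sketch of a data dictionary as a Python data structure, plus a check of one record against it. The `FieldSpec` class, the `validate` function and the clinical field names (`patient_id`, `age_years`, `sex`) are hypothetical and chosen only for illustration; they are not part of any Galaxy tool or standard.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FieldSpec:
    """One entry of a data dictionary (hypothetical structure)."""
    name: str
    dtype: type                                   # expected data type
    description: str
    value_range: Optional[tuple] = None           # inclusive (min, max)
    allowed_values: Optional[set] = None          # closed set of permitted values
    related_to: list = field(default_factory=list)  # related data elements


# Hypothetical clinical dictionary; names and limits are illustrative only.
DATA_DICTIONARY = [
    FieldSpec("patient_id", str, "Pseudonymised patient identifier",
              related_to=["visit.patient_id"]),
    FieldSpec("age_years", int, "Age at enrolment in years", value_range=(0, 120)),
    FieldSpec("sex", str, "Sex recorded at enrolment",
              allowed_values={"female", "male", "unknown"}),
]


def validate(record: dict, dictionary: list) -> list:
    """Return the problems found when checking one record against the dictionary."""
    problems = []
    for spec in dictionary:
        if spec.name not in record:
            problems.append(f"{spec.name}: missing")
            continue
        value = record[spec.name]
        # Type check first; range and vocabulary checks assume the right type.
        if not isinstance(value, spec.dtype):
            problems.append(
                f"{spec.name}: expected {spec.dtype.__name__}, got {type(value).__name__}")
            continue
        if spec.value_range is not None:
            low, high = spec.value_range
            if not low <= value <= high:
                problems.append(f"{spec.name}: {value} outside {low}-{high}")
        if spec.allowed_values is not None and value not in spec.allowed_values:
            problems.append(f"{spec.name}: {value!r} not in allowed values")
    return problems
```

For example, `validate({"patient_id": "P001", "age_years": 134, "sex": "F"}, DATA_DICTIONARY)` reports that the age is out of range and that `"F"` is not an allowed value. Note that the dictionary itself holds no patient data, which is why it can be shared when the dataset cannot.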
Example of a Data Dictionary
- The data glossary shows the defined controlled vocabulary
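One way a glossary can be put to work is to map the many spellings found in raw data onto its preferred terms. The sketch below assumes a simple structure in which each preferred term has a definition and a set of synonyms; the `GLOSSARY` entries and the `to_preferred_term` function are hypothetical examples, not the glossary shown on the slide.

```python
# Hypothetical glossary: each preferred term has a definition and known synonyms.
GLOSSARY = {
    "female": {"definition": "Sex recorded as female", "synonyms": {"f", "woman"}},
    "male": {"definition": "Sex recorded as male", "synonyms": {"m", "man"}},
    "unknown": {"definition": "Sex not recorded or not disclosed",
                "synonyms": {"u", "not recorded", "n/a"}},
}


def to_preferred_term(raw: str, glossary: dict) -> str:
    """Map a raw value onto its preferred glossary term, or raise if it is not in the vocabulary."""
    value = raw.strip().lower()
    for term, entry in glossary.items():
        if value == term or value in entry["synonyms"]:
            return term
    raise ValueError(f"{raw!r} is not in the controlled vocabulary")
```

Rejecting unknown values, rather than passing them through, keeps the vocabulary closed: a new spelling has to be added to the glossary deliberately before it is accepted.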
Exercise: Create a Data Dictionary
How would you describe your dataset?
Take notes on the metadata for your dataset (max. 3 min).
Without opening each file on your computer, which metadata variables would you need to understand your data 5 years from now?
Your time starts now!
What is a Data Catalogue?
A collection of metadata for the individual datasets within a consortium or project
Provides context and provenance for each dataset within the group of datasets
Gives an at-a-glance search index for all work done within your group
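The three roles of a data catalogue listed above (collected metadata, provenance, search) can be sketched in a few lines of Python. Everything here is a hypothetical illustration: the `CatalogueEntry` fields, the `search` and `lineage` functions, and the example dataset names, owners and paths are assumptions, not a prescribed catalogue format.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogueEntry:
    """Metadata for one dataset in a group's catalogue (fields are illustrative)."""
    title: str
    owner: str
    location: str                 # where the data lives (hypothetical paths below)
    description: str
    derived_from: list = field(default_factory=list)  # provenance: source dataset titles
    keywords: set = field(default_factory=set)


def search(catalogue: list, term: str) -> list:
    """Return titles of entries whose title, description or keywords mention the term."""
    term = term.lower()
    return [
        entry.title for entry in catalogue
        if term in entry.title.lower()
        or term in entry.description.lower()
        or term in {k.lower() for k in entry.keywords}
    ]


def lineage(catalogue: list, title: str) -> list:
    """Follow derived_from links back to the original datasets (assumes no cycles)."""
    by_title = {entry.title: entry for entry in catalogue}
    sources = []
    for parent in by_title[title].derived_from:
        sources.append(parent)
        if parent in by_title:
            sources.extend(lineage(catalogue, parent))
    return sources


# Hypothetical catalogue for a small project.
CATALOGUE = [
    CatalogueEntry("raw_visits_2024", "A. Researcher", "/data/project/raw/visits_2024.csv",
                   "Raw visit records exported from the clinic", keywords={"clinical", "raw"}),
    CatalogueEntry("visits_clean", "A. Researcher", "/data/project/clean/visits.parquet",
                   "Visits with invalid ages removed", derived_from=["raw_visits_2024"],
                   keywords={"clinical"}),
    CatalogueEntry("age_summary", "B. Analyst", "/data/project/results/age_summary.csv",
                   "Age distribution per site", derived_from=["visits_clean"],
                   keywords={"summary"}),
]
```

Here `search(CATALOGUE, "clinical")` finds both visit datasets, and `lineage(CATALOGUE, "age_summary")` traces the summary back through `visits_clean` to `raw_visits_2024`, together with the owner and location of each step.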
Example
Homework: Create a Data Catalogue
Does your group or project have a variety of datasets?
Take notes on the metadata for the datasets in your group or project.
Try to create a data catalogue and let us know how it goes.
If a colleague leaves the group, would you know what data they produced, how it was generated, and where it lives?
Citation
Sandra Ng, Making clinical datasets FAIR (Galaxy Training Materials). https://training.galaxyproject.org/training-material/topics/fair/tutorials/fair-clinical/tutorial.html Online; accessed Thu Mar 05 2026
Hiltemann, Saskia, Rasche, Helena et al., 2023. Galaxy Training: A Powerful Framework for Teaching! PLOS Computational Biology. 10.1371/journal.pcbi.1010752
Batut et al., 2018. Community-Driven Data Analysis Training for Biology. Cell Systems. 10.1016/j.cels.2018.05.012