Data Catalogues and Dictionaries

Cat Gonzalez

2026-04-02

Purpose of Data Dictionaries and Catalogues

  • For sensitive data
    • gives an overview of the type of data collected
    • does not expose the data itself

What is a Data Dictionary?

  • A data dictionary is the metadata of a dataset.
    • It gives an overview of the types of data to expect, with the option to omit sensitive information.
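As a minimal sketch of this idea, the snippet below summarises a small table by column type and observed values while withholding the values of columns marked as sensitive. The function name `summarise`, the column names, and the sample records are all hypothetical and chosen only for illustration.

```python
def summarise(records, sensitive=()):
    """Describe each column by type; list observed values only for non-sensitive columns."""
    summary = {}
    for row in records:
        for column, value in row.items():
            entry = summary.setdefault(column, {"types": set(), "examples": set()})
            entry["types"].add(type(value).__name__)
            # Values of sensitive columns are never collected
            if column not in sensitive:
                entry["examples"].add(value)
    return {
        column: {
            "types": sorted(entry["types"]),
            "examples": (
                "omitted (sensitive)"
                if column in sensitive
                else sorted(entry["examples"], key=str)
            ),
        }
        for column, entry in summary.items()
    }


# Hypothetical sample data, not taken from any real dataset
records = [
    {"patient_name": "A. Example", "age": 42, "smoker": "no"},
    {"patient_name": "B. Example", "age": 57, "smoker": "yes"},
]
overview = summarise(records, sensitive={"patient_name"})
```

The overview can be shared with collaborators: it shows that `patient_name` exists and holds text, but none of the names themselves.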

Dictionary Includes

  • data types
  • value range or allowed values
  • any relationship to other data elements
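One possible way to write these three ingredients down is as a plain Python mapping, which can then also be used to check records. This is only a sketch: the field names, the `relates_to` key, the age range, and the `validate` helper are assumptions made for this example, not a standard format.

```python
# Hypothetical data dictionary: one entry per data element
DATA_DICTIONARY = {
    "patient_id": {
        "type": str,
        "description": "Pseudonymised identifier",
        "relates_to": "visits.patient_id",  # relationship to another data element
    },
    "age": {
        "type": int,
        "range": (0, 120),  # allowed value range, inclusive
        "description": "Age in years at enrolment",
    },
    "smoker": {
        "type": str,
        "allowed": {"yes", "no", "unknown"},  # controlled vocabulary
        "description": "Self-reported smoking status",
    },
}


def validate(record, dictionary=DATA_DICTIONARY):
    """Return a list of problems found when checking a record against the dictionary."""
    problems = []
    for name, spec in dictionary.items():
        if name not in record:
            problems.append(f"{name}: missing")
            continue
        value = record[name]
        if not isinstance(value, spec["type"]):
            problems.append(f"{name}: expected {spec['type'].__name__}")
            continue
        if "range" in spec:
            low, high = spec["range"]
            if not low <= value <= high:
                problems.append(f"{name}: outside {low}-{high}")
        if "allowed" in spec and value not in spec["allowed"]:
            problems.append(f"{name}: not an allowed value")
    return problems


good = validate({"patient_id": "P001", "age": 42, "smoker": "no"})
bad = validate({"patient_id": "P002", "age": 150, "smoker": "maybe"})
```

Here `good` is an empty list, while `bad` reports the out-of-range age and the smoking status that is not in the controlled vocabulary.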

Example of a Dictionary

[Image: "not applicable" meme. A data glossary shows the defined controlled vocabulary.]

Exercise: Create a Data Dictionary

How would you describe your dataset?

  • Take notes on the metadata for your dataset (max. 3 min)
    • Without opening each file on your computer, which metadata variables would you need to understand your data 5 years from now?
  • Your time starts now!

What is a Data Catalogue?

  • Collection of metadata for the individual datasets within a consortium or project
  • Provide context and provenance for each dataset within the group of datasets
  • Search index at a glance for all work done within your group
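A catalogue along these lines could be sketched as a list of small records plus a keyword search. The `CatalogueEntry` fields, the `search` helper, and the example entries (including the paths and group names) are hypothetical and only meant to show the shape of such a catalogue.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogueEntry:
    title: str
    description: str   # context: what the dataset contains
    owner: str         # who to ask about the dataset
    location: str      # where the data lives
    provenance: str    # how the data was produced
    keywords: list = field(default_factory=list)


def search(catalogue, term):
    """Return entries whose title, description, or keywords mention the term."""
    term = term.lower()
    return [
        entry
        for entry in catalogue
        if term in entry.title.lower()
        or term in entry.description.lower()
        or any(term in keyword.lower() for keyword in entry.keywords)
    ]


# Hypothetical entries for illustration only
catalogue = [
    CatalogueEntry(
        title="Cohort questionnaire 2024",
        description="Lifestyle questionnaire responses",
        owner="Group A",
        location="/shared/projects/cohort/questionnaire",
        provenance="Exported from the survey tool, cleaned with script v2",
        keywords=["survey", "lifestyle"],
    ),
    CatalogueEntry(
        title="RNA-seq pilot",
        description="Pilot sequencing run for 12 samples",
        owner="Group B",
        location="/shared/projects/rnaseq/pilot",
        provenance="Sequenced in-house, raw FASTQ files",
        keywords=["sequencing", "transcriptomics"],
    ),
]

hits = [entry.title for entry in search(catalogue, "sequencing")]
```

Each hit carries its owner, location, and provenance, so a search answers not only whether a dataset exists but also where it lives and how it was made.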

Example

[Image: "not applicable" meme]

Homework: Create a Data Catalogue

Does your group or project have a variety of datasets?

  • take notes on metadata for the datasets in your groups or within your projects
  • Try to create a data catalogue and let us know how it goes.
  • If a colleague leaves the group, would you know what their data is, how it was produced, and where it lives?

Citation

  • Sandra Ng. Making clinical datasets FAIR (Galaxy Training Materials). https://training.galaxyproject.org/training-material/topics/fair/tutorials/fair-clinical/tutorial.html (online; accessed Thu Mar 05 2026)
  • Hiltemann, Saskia, Rasche, Helena et al., 2023. Galaxy Training: A Powerful Framework for Teaching! PLOS Computational Biology. doi:10.1371/journal.pcbi.1010752
  • Batut et al., 2018. Community-Driven Data Analysis Training for Biology. Cell Systems. doi:10.1016/j.cels.2018.05.012