Pathogen Portal

About Research Data Management (RDM)

From Data to Discovery: The Complex Path of Scientific Research

Research projects are complex and generate many types of information. The process begins with collecting raw data from experiments or observations and ends with final reports that explain the findings and their significance. Along the way, raw data is transformed into processed data that is easier to interpret, statistical methods are used to analyze it, and every aspect of the research, from the tools used to ethical approvals and regulatory compliance, is documented.

Research results may include academic papers, presentations, patents, and collaborative agreements, all of which require careful planning and detailed records. This range of information reflects the complexity of research projects, and documenting it well helps ensure that the outcomes are verifiable, reusable, and comprehensible to others, thereby contributing to the global body of knowledge.

The Importance of Research Data Management

By following good practices in Research Data Management (RDM), you can organize your research findings better, which makes it simpler for you and others to use and reuse your data later on. It is generally best to keep research results as open as possible and easy to find, access, use, and reuse by following a set of guidelines called the FAIR principles.

The Hidden Barrier to AI Medical Breakthroughs: Why Data Management is Key

In a time when AI is expected to greatly improve medicine and accelerate scientific breakthroughs, effective data management is essential. For AI to realize its potential and support safe, data-driven decisions, data must be managed according to the FAIR principles. Whether the subject is human, animal, or environmental health, AI in healthcare and scientific research depends on access to large, well-organized, and structured datasets, because AI algorithms require high-quality data to develop accurate and dependable models. Without proper data management, the data may be unreliable or unusable, which could make AI tools less effective and potentially endanger patient safety and the integrity of scientific research.

FAIR RDM

FAIR data management refers to a set of principles that guide researchers and other experts in managing data so that it is easy for others to access, understand, and reuse. The term "FAIR" stands for Findable, Accessible, Interoperable, and Reusable. The principles were first published in the 2016 article "The FAIR Guiding Principles for scientific data management and stewardship" by Wilkinson and others (1), and have since been widely adopted across many scientific fields. To learn more about how to apply these principles, see the Danish e-Infrastructure Consortium’s page "FAIR for Beginners" (2) and the GO FAIR initiative's website, or read more about the cost of not having FAIR data (3).

Understanding the FAIR Principles

Findable: First, make sure that both people and computers can easily locate the data. This involves giving each set of data a unique and lasting name (a persistent identifier) and providing detailed information (rich metadata) describing the data.
Accessible: Once the data has been found, it should be clear how to obtain it. This means the data can be retrieved by its persistent identifier using a standard protocol that is open and free for everyone. The metadata should also remain available even if the data itself no longer is.
Interoperable: Data should work well with other data and be easy to use with different programs or systems for analysis and storage. This requires using a widely accepted and understandable format for organizing the data.
Reusable: The main aim is for the data to be easily used again with little effort. This means having clear rules (data usage licenses) on how the data can be used and providing a detailed history of the data (data provenance) to ensure it can be accurately used or combined in new ways.
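As a rough illustration of the four principles, the sketch below models a dataset description in Python and reports which FAIR-related fields are still empty. The `DatasetRecord` class, its field names, and the `fair_gaps` helper are hypothetical and exist only for this example; they do not come from any standard or from the Pathogen Portal. The DOI shown in the test data is a placeholder, not a real dataset.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; field names are illustrative only."""
    identifier: str = ""    # persistent identifier such as a DOI (Findable)
    description: str = ""   # rich metadata describing the data (Findable)
    access_url: str = ""    # where the data can be retrieved (Accessible)
    data_format: str = ""   # widely accepted format (Interoperable)
    license: str = ""       # data usage license (Reusable)
    provenance: List[str] = field(default_factory=list)  # data history (Reusable)


def fair_gaps(record: DatasetRecord) -> List[str]:
    """Return the names of fields that are still empty, in declaration order."""
    return [name for name, value in vars(record).items() if not value]
```

A record that lists only an identifier, a description, and a format would be flagged as missing `access_url`, `license`, and `provenance`. A check like this only shows that a field has been filled in, not that its content is meaningful, so it complements rather than replaces a proper review of the metadata.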

Key Points on Handling Pathogen Data

Managing the genomic data of harmful microorganisms can be different from dealing with human or environmental data. Human data is kept private, and environmental data is used for conservation. However, data on harmful microbes is vital for scientific research and medical practice.

This data needs to be easily accessible and compatible with other data for quick public health responses and controlling disease outbreaks. It also has to be carefully recorded and reusable to meet the strict standards of scientific research.

Importance of Pathogen Research Across Various Fields

Studying microbial pathogens is crucial in many fields, including Clinical Research, Basic Science, Surveillance, and more. Each field aims to improve our understanding and management of diseases.

Clinical Research: Tests new treatments on patients.
Basic Science: Studies the basic properties of germs.
Surveillance: Collects ongoing health data for public health improvements.
Veterinary Research: Focuses on diseases that can transfer from animals to humans.
Plant Health and Biosafety: Ensures the health of plants and ecological balance.

These research areas are part of the One Health concept, linking the health of people, animals, and the environment. Each area has specific data management needs due to different rules and resources.

FAIR Principles for Managing Genomics Data of Disease-Causing Microorganisms

Understanding and managing the genomics data of disease-causing microorganisms is crucial for studying diseases, tracking outbreaks, and developing treatments. However, because this data can be linked to individuals, it must be handled carefully to protect privacy while still supporting scientific progress. FAIR data management practices offer guidelines to ensure data is useful, accessible, and secure.

Findability in both human and pathogen genomics involves detailed metadata and standardized data indexing. However, pathogen genomics data often must be integrated with epidemiological and clinical data to be fully useful, which requires more complex metadata schemas than those generally used in human genomics.
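To make the idea of integrating genomic and epidemiological metadata concrete, here is a minimal Python sketch that joins sequence records to their sampling context through a shared sample identifier. The `link_context` function and the keys `sample_id` and `context` are assumptions made for this example only; real metadata schemas define their own field names and are considerably richer.

```python
from typing import Dict, List, Optional


def link_context(
    sequences: List[Dict[str, str]],
    contexts: List[Dict[str, str]],
) -> List[Dict[str, object]]:
    """Attach epidemiological context to each sequence record by sample_id."""
    # Index the context records once so each lookup is a dictionary access.
    by_sample = {ctx["sample_id"]: ctx for ctx in contexts}
    linked = []
    for seq in sequences:
        # None marks a sequence whose context is missing, so gaps stay visible.
        context: Optional[Dict[str, str]] = by_sample.get(seq["sample_id"])
        linked.append({**seq, "context": context})
    return linked
```

Keeping unmatched sequences in the output with an explicit `None`, rather than dropping them, is a deliberate choice in this sketch: it makes incomplete metadata easy to spot before the data is shared.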

Accessibility shows starker differences. Human genomics data is in itself always personal data and is heavily regulated to protect individual privacy, often requiring controlled-access environments. In contrast, pathogen data, which may contain human data depending on the sampling method and wet-lab procedure, is primarily governed by considerations of biosecurity and public health urgency, which can sometimes call for more open access to support rapid, global response efforts during outbreaks.

Interoperability in human genomics data benefits from well-established international standards, such as those from the Global Alliance for Genomics and Health (GA4GH). Pathogen genomics, while also using these standards, must additionally align with public health databases and bioinformatics tools that are specifically designed for infectious disease surveillance and control, demanding a broader, more versatile approach to data integration.

Reusability of data is critical in both fields but comes with different expectations and requirements. Human genomics data reuse must be tightly coupled with consent and ethical considerations, which often limits the scope of future research. Pathogen genomics data, on the other hand, is primarily reused for broader public health research and policy-making, requiring it to be highly adaptable and easily integrated with diverse data types.

To manage the genomics data of pathogenic microorganisms according to the FAIR principles, researchers and other experts can use a suite of resources provided by ELIXIR and other actors in Norway, which facilitates the application of these principles. You can read more about them here.