Overview

The TigerData team at Princeton University tasked us with addressing the challenges and inefficiencies in research data management across the university's research ecosystem.

The TigerData team has already established the need for a centralized, organized data management system; our research findings further underscore the importance of this endeavor and offer additional insights into the overarching issues in this problem space.

Four Key Insights

Improving the scalability and usability of the data storage platform will allow TigerData to handle the demands of researchers working with large datasets.

Promoting standardized data organization practices across the university will streamline collaboration and information sharing.

Enhancing the user experience by developing a user-friendly interface will make this powerful resource more accessible to researchers with varying levels of technical expertise.

Incentivizing adoption of TigerData over mandating its use will encourage broader participation.

Diving deeper into the problem space...

"Even though I see the value in using GitHub, I gravitate towards using WhatsApp and Google Suite."
An Interview Participant
People are creatures of habit. Even when clearly presented with a better option, researchers are likely to revert to their original methods. Our survey of 68 researchers from diverse backgrounds reveals that researchers tend to use whichever tools are most convenient for managing their data.
Currently, institutional clusters (which can host large amounts of data) can only be accessed through the command line.
However, our research shows that command-line tools often have a steep learning curve, requiring users to memorize commands and understand complex syntax. For researchers with limited technical expertise, this can be daunting and time-consuming.
"I would prefer if there was a GUI for everything data-related. It would make my life so much easier."
Another Interview Participant
According to a recent cost-benefit analysis conducted by the European Union, decentralized research processes and a lack of adherence to FAIR (Findable, Accessible, Interoperable, Reusable) data principles across government institutions and universities cost the European economy about...
€10.2 billion
every year
The analysis computed this cost by aggregating across seven cost parameters. As a team, we were specifically interested in time and storage inefficiencies. Concerning time, approximately 31.52% of the time spent finding data could be saved through appropriate application of FAIR principles and quality metadata. Additionally, we found that large datasets are, on average, spread across roughly five repositories apart from backups, which is highly inefficient.
"People don’t follow good Data Management practices and they panic at the time of publishing. They need to practice these protocols from the very beginning."
A Librarian at CMU
Researchers must comply with a range of guidelines to publish papers under grants and funding from different organizations, and understanding the nuances of what is required can be a challenge.
Increasingly, federal and state mandates urge institutions to adopt open-science and open-access policies (e.g., the OSTP's official memorandum and the NIH's publishing criteria).