The Impact of Research Data Infrastructures: The Case of the AlphaFold Database

Authors

  • Angelo Kenneth Romasanta, Esade Business School, Ramon Llull University, Barcelona, Spain
  • Jonathan Wareham, Esade Business School, Ramon Llull University, Barcelona, Spain
  • Laia Pujol Priego, Esade Business School, Ramon Llull University, Barcelona, Spain

DOI:

https://doi.org/10.23726/cij.2025.1597

Keywords:

bibliometrics, research infrastructure, data, impact assessment

Abstract

While the scientific output of research infrastructures is well documented, the broader effects of their secondary outputs, such as computational resources and datasets, remain poorly understood. To better understand the benefits of these public resources, we examine the AlphaFold Database (AFDB), a collaboration between DeepMind and the European Molecular Biology Laboratory (EMBL) that democratizes access to protein structure data. Using a quantitative case study based on bibliometric analysis, we compare publications indexed in the Web of Science Core Collection that cite the original AlphaFold paper (Jumper et al., 2021) with those that cite the AlphaFold Database paper (Varadi et al., 2022), covering publications up to August 2024. We examine how the database relates to research themes, collaboration patterns, and scientific impact. Our exploratory analysis identifies several effects: studies that draw on the AlphaFold Database investigate application-focused themes and require collaboration among fewer institutions. These findings highlight the wide-ranging impacts of research infrastructures and the need for comprehensive impact assessments to inform future research policy and funding decisions.

References

Autio, E., Hameri, A. P., & Vuola, O. (2004). A framework of industrial knowledge spillovers in big-science centers. Research Policy, 33(1), 107-126.

Beagrie, N., & Houghton, J. (2021). The value and impact of EMBL-EBI managed data resources. European Bioinformatics Institute (EMBL-EBI). https://www.embl.org/documents/document/embl-ebi-impact-report-2021

Beck, S., Bergenholtz, C., Bogers, M., Brasseur, T. M., Conradsen, M. L., Di Marco, D., ... & Xu, S. M. (2022). The Open Innovation in Science research field: a collaborative conceptualisation approach. Industry and Innovation, 29(2), 136-185.

Beck, S., Bercovitz, J., Bergenholtz, C., Brasseur, T., D’Este, P., Dorn, A., ... & Zyontz, S. (2021). Experimenting with Open Innovation in Science (OIS) practices: A novel approach to co-developing research proposals. CERN IdeaSquare Journal of Experimental Innovation, 5(2), 28-49.

Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan), 993-1022.

D’Ippolito, B., & Rüling, C. C. (2019). Research collaboration in Large Scale Research Infrastructures: Collaboration types and policy implications. Research Policy, 48(5), 1282-1296.

Fabre, R., Egret, D., Schöpfel, J., & Azeroual, O. (2021). Evaluating the scientific impact of research infrastructures: The role of current research information systems. Quantitative Science Studies, 2(1), 42-64.

Florio, M., & Sirtori, E. (2016). Social benefits and costs of large scale research infrastructures. Technological Forecasting and Social Change, 112, 65-78.

Gold, E. R. (2021). The fall of the innovation empire and its possible rise through open science. Research Policy, 50(5), 104226.

Heidler, R., & Hallonsten, O. (2015). Qualifying the performance evaluation of Big Science beyond productivity, impact and costs. Scientometrics, 104, 295-312.

Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., ... & Hassabis, D. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583-589.

Mayernik, M. S., Hart, D. L., Maull, K. E., & Weber, N. M. (2017). Assessing and tracing the outcomes and impact of research infrastructures. Journal of the Association for Information Science and Technology, 68(6), 1341-1359.

Pujol Priego, L., Wareham, J., & Romasanta, A. K. S. (2022). The puzzle of sharing scientific data. Industry and Innovation, 29(2), 219-250.

Pujol Priego, L., & Wareham, J. (2023). From bits to atoms: Open source hardware at CERN. MIS Quarterly, 47(2), 639-668.

Pujol Priego, L., & Wareham, J. (2024). Data Commoning in the Life Sciences. MIS Quarterly, 48(2), 491-520.

Reed, M. S., Ferré, M., Martin-Ortega, J., Blanche, R., Lawford-Rolfe, R., Dallimer, M., & Holden, J. (2021). Evaluating impact from research: A methodological framework. Research Policy, 50(4), 104147.

Romasanta, A., Ahmadova, G., & Wareham, J. (2022). From potential to realized impacts: the bridging role of digital infrastructures in FAIR data. European Conference on Information Systems.

Scarrà, D., & Piccaluga, A. (2022). The impact of technology transfer and knowledge spillover from Big Science: a literature review. Technovation, 116, 102165.

Van Eck, N., & Waltman, L. (2010). Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics, 84(2), 523-538.

Varadi, M., Anyango, S., Deshpande, M., Nair, S., Natassia, C., Yordanova, G., ... & Velankar, S. (2022). AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Research, 50(D1), D439-D444.

Vicente-Saez, R., & Martinez-Fuentes, C. (2018). Open Science now: A systematic literature review for an integrated definition. Journal of Business Research, 88, 428-436.

Wareham, J., Priego, L. P., Romasanta, A. K., Mathiassen, T. W., Nordberg, M., & Tello, P. G. (2022). Systematizing serendipity for big science infrastructures: The ATTRACT project. Technovation, 116, 102374.

Published

2025-05-26

How to Cite

Romasanta, A. K., Wareham, J., & Pujol Priego, L. (2025). The Impact of Research Data Infrastructures: The Case of the AlphaFold Database. CERN IdeaSquare Journal of Experimental Innovation, 9(1), 42–48. https://doi.org/10.23726/cij.2025.1597

Section

ATTRACT Socio Economic Studies