Open Science in the AI Era: Building the Infrastructure We Need
J. Francisco Avilés
AI Research Lab
The AI revolution is here, but our scientific infrastructure is stuck in the past. We are training models with billions of parameters, yet we still publish our findings as static PDFs, which are difficult for machines to parse and make the results even harder to reproduce.
The Reproducibility Crisis Meets AI
Machine learning could worsen the reproducibility crisis in science. If a model’s training data, hyperparameters, and evaluation metrics are locked away in proprietary formats or unstructured text, reviewers cannot rerun or verify the reported results, and peer review becomes nearly impossible. We cannot allow algorithms to become “black boxes” in the scientific process.
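To make the contrast with unstructured text concrete, here is a minimal sketch of a machine-readable record for one training run. The class name `RunRecord`, its fields, the URL, and every value are hypothetical placeholders chosen for illustration; they are not a standard schema or real results.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class RunRecord:
    """Hypothetical record of one training run; field names are illustrative."""
    dataset_uri: str        # where the training data lives
    dataset_sha256: str     # checksum so reviewers can confirm they have the same data
    hyperparameters: dict   # settings needed to rerun training
    metrics: dict           # reported evaluation metrics
    random_seed: int = 0    # seed used for the run


def to_json(record: RunRecord) -> str:
    """Serialize the record to plain JSON with a stable key order."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


def from_json(text: str) -> RunRecord:
    """Rebuild a record from its JSON form."""
    return RunRecord(**json.loads(text))


# All values below are placeholders, not real data or results.
record = RunRecord(
    dataset_uri="https://example.org/data/train.csv",
    dataset_sha256="0" * 64,
    hyperparameters={"learning_rate": 0.001, "batch_size": 32},
    metrics={"accuracy": 0.0},
    random_seed=42,
)

serialized = to_json(record)
restored = from_json(serialized)
```

Because the record round-trips through plain JSON, a reviewer or a script can read the same fields the authors wrote without guessing at prose descriptions.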
Rethinking the Scientific Document
This is why we need to rethink how we publish science. A modern scientific document should not just be a visual representation of text and figures; it should be a computable artifact. Formats like SciMD (Scientific Markdown) are designed to be both human-readable and machine-actionable. By embedding metadata, data links, and executable code blocks directly into the document, we can tie a publication directly to the data and code behind it.
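As a rough illustration of what "machine-actionable" could mean in practice, the sketch below reads metadata and code blocks out of a small document and runs the code. The layout it parses (a `---` delimited header of `key: value` lines followed by fenced code blocks) is an assumption made for this example, not SciMD's actual syntax, and `parse_document` is a hypothetical helper.

```python
import re

FENCE = "`" * 3  # fence marker, built at runtime so it does not clash with this listing

# Hypothetical document layout (NOT actual SciMD syntax): a "---" delimited
# header of "key: value" lines, followed by prose and fenced code blocks.
SAMPLE = "\n".join([
    "---",
    "title: Example study",
    "data: https://example.org/dataset.csv",
    "---",
    "We compute a simple sum.",
    FENCE + "python",
    "result = 1 + 2",
    FENCE,
])


def parse_document(text):
    """Return (metadata dict, list of (language, code) tuples) for a document."""
    metadata = {}
    body = text
    header = re.match(r"---\n(.*?)\n---\n", text, re.DOTALL)
    if header:
        for line in header.group(1).splitlines():
            # Split on the first colon only, so URLs in values stay intact.
            key, _, value = line.partition(":")
            metadata[key.strip()] = value.strip()
        body = text[header.end():]
    block_pattern = re.compile(
        re.escape(FENCE) + r"(\w+)\n(.*?)\n" + re.escape(FENCE), re.DOTALL
    )
    return metadata, block_pattern.findall(body)


metadata, blocks = parse_document(SAMPLE)

# Run the embedded Python blocks in an isolated namespace.
# exec runs arbitrary code, so only do this with documents you trust.
namespace = {}
for language, code in blocks:
    if language == "python":
        exec(code, namespace)
```

After this runs, `metadata` holds the header fields, `blocks` holds each code block with its language tag, and `namespace` holds whatever the embedded code defined. A real pipeline would need sandboxing and a pinned environment before executing code from a publication.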
Open Data, Open Models
Open science is more than just open access publishing. It means open datasets, open-source models, and transparent methodologies. As we integrate AI more deeply into the scientific workflow, we must ensure that the infrastructure we build promotes transparency and collaboration. The future of science depends on it.