AI for Science

Paradigms, tacit knowledge, and scientific agency in the age of large models

Neil D. Lawrence

Timothy Lillicrap

Bellairs Workshop on AI for Science

Rabbits and Headlights

ICML Deep Learning Workshop Panel 2015

A Provocation

Science is kind of like a rabbit in the headlights of the Deep Learning Machine waiting to be flattened.

Questions

  • Where do scientific paradigms live?
  • What role does human understanding and agency play?
  • What’s an emerging playbook for AI-for-science?

The Structure of Scientific Revolutions

Some prompts

  • Paradigms: shift in science or in underpinning information infrastructure?
  • Understanding: what do we still demand (and from whom)?

Conjectures and Refutations

Qualitative vs Quantitative


  • Questions we care about are often qualitative (meaning, values, lived outcomes).
  • Questions we can test are often quantitative (metrics, averages, effect sizes).
  • AI may help bridge this gap by turning language and practice into analysable data.

Coding and Creatives


  • Go back to the ICML 2015 Deep Learning Workshop.
  • Try to imagine a technology with mass impact on coding and creative work.

Crossing fields: recovering the “totality” of science

  • Holding the whole of science together across fields and institutions is a struggle.
  • LLM interfaces lower the cost of moving across disciplines.
  • But they increase the risk of “plausible” error.
  • The key skill: scepticism.

AI-for-science framework: three lenses

  • Task capabilities: what the model can do (predict, reason, generate, control) (Bommasani et al., 2022; Krenn et al., 2022).
  • Workflow needs: where value appears (hypothesis, design, analysis, writing, coding) (Arranz et al., 2023; Berman et al., 2024; European Research Council, 2023).
  • Context constraints: data, compute, latency, validation culture, uncertainty tolerance (Duarte and others, 2018).
  • See Cranmer et al. (2026) (in review).

The science gap: pattern matching vs mechanism

  • ML is strong at pattern extraction and function approximation.
  • Science needs mechanism: why, when, and under what interventions.
  • So many deployments are still guess-and-verify pipelines.
  • Key question: when is prediction enough, and when do we require explanation? (Bender et al., 2021; Pearl and Mackenzie, 2018).

Technical priorities for scientific AI

  • Causality: distinguish correlation from interventionally robust structure (Pearl and Mackenzie, 2018; Schölkopf, 2022).
  • Abstraction: discover useful intermediate scales and effective theories (Anderson, 1972; Jaynes, 1957).
  • Simulation: hybrid mechanistic + learned systems with explicit validity regimes (Cranmer et al., 2020; Oreskes et al., 1994).

To Tim


  • Perhaps: “what is science?” as prediction, explanation, and control.
  • Perhaps: where control/RL intuitions shift how we evaluate models.
  • Perhaps: how model classes connect (equations, effective statistical models, ML systems).

What is science for?

  • Prediction: what will happen?
  • Explanation: why does it happen?
  • Control/Design: how do we intervene to get desired outcomes?
  • Different fields weight these differently; AI changes the balance.

Model types in science

  • Mechanistic models: equations from domain theory.
  • Effective/statistical models: coarse-grained abstractions (e.g., stat mech style summaries).
  • Learned models: ML/DL surrogates and pattern extractors.
  • Practice is hybrid: compose all three with clear verification boundaries.

First Wave of ML: Prediction

  • A model family \(f_\theta\) that maps inputs to outputs.
  • A learning objective that scores how well \(f_\theta\) matches data (and priors/constraints).
  • Optimisation + scale: we fit \(\theta\) with large compute and large datasets.
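A minimal sketch of these three ingredients on an invented linear-regression problem; the model \(f_\theta(x) = \theta_0 + \theta_1 x\), the synthetic data, the learning rate, and the iteration count are all illustrative choices, not anything from the talk:

```python
import random

# Synthetic "data": y = 2x + 0.5 plus noise (all numbers invented).
random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(200)]
ys = [2.0 * x + 0.5 + random.gauss(0, 0.1) for x in xs]

# Model family f_theta(x) = theta0 + theta1 * x.
theta0, theta1 = 0.0, 0.0
lr = 0.1  # learning rate

def mse(t0, t1):
    """Learning objective: mean squared error of f_theta against the data."""
    return sum((t0 + t1 * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Optimisation: fit theta by gradient descent on the objective.
initial_loss = mse(theta0, theta1)
for _ in range(500):
    g0 = sum(2 * (theta0 + theta1 * x - y) for x, y in zip(xs, ys)) / len(xs)
    g1 = sum(2 * (theta0 + theta1 * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    theta0 -= lr * g0
    theta1 -= lr * g1
final_loss = mse(theta0, theta1)
```

At scale the same recipe holds: swap the linear model for a deep network, the loop for a hardware-accelerated optimiser, and the 200 points for web-scale data.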

Prediction Examples

  • AlphaFold
  • GraphCast/Aardvark

Generalist Models

Generalist Examples

  • ImageNet
  • BERT
  • Chat interfaces
  • Polymathic

Physics foundation models

  • If the training data are generated by rigorous equations (PDE/ODE solvers), we have a clearer sense of “ground truth.”
  • These models may not map onto human intuitions — but they can still be scientifically useful.
  • That makes physics a promising sandbox for a science of AI: what is learned, what generalises, and how do we verify?

Break (15 mins)

  • Return for delegation, tacit knowledge, and verification boundaries.

The Unreasonable Effectiveness of Orchestration

  • LLMs emulate human behaviour.
  • Unsurprising, then, that they work better “in collaboration”.
  • Also with rapid access to the information infrastructure.

Agent Examples

  • Denario
  • Claude Code
  • Codex

What are we Delegating?

  • Science as technology
  • Science as understanding

Judgement Examples

  • Covid-19 Epidemiological Modelling
  • Does your model account for Facemasks?
  • Does your model account for Hotel Closures?

We might not have data, but we can do some arithmetic
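A hypothetical back-of-envelope calculation in that spirit, for the face-mask question above; every number below is an illustrative assumption, not an estimate from any study:

```python
# Hypothetical arithmetic: how much could widespread mask use change an
# epidemic's growth? All numbers are invented for illustration.
r0 = 3.0              # assumed reproduction number without interventions
mask_uptake = 0.8     # assumed fraction of contacts where masks are worn
mask_efficacy = 0.3   # assumed per-contact transmission reduction

# Each masked contact transmits at (1 - efficacy) of the baseline rate.
r_masked = r0 * (1 - mask_uptake * mask_efficacy)
print(f"R with masks: {r_masked:.2f}")  # → R with masks: 2.28
```

The point is not the number but the judgement: deciding which mechanisms belong in the model is exactly what cannot yet be delegated.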

Intellectual Debt

Technical Debt

  • Compare with technical debt.
  • Highlighted by Sculley et al. (2015).

Separation of Concerns

  • Decompose your complex problem/task into parts.
  • Each part is manageable (e.g. by a small team).
  • Recompose to solve the total problem.
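A minimal sketch of the decompose/recompose pattern, using an invented word-counting task; each function owns one concern, and the pipeline recomposes them:

```python
def clean(text: str) -> str:
    """One concern: normalise the input."""
    return text.strip().lower()

def tokenise(text: str) -> list[str]:
    """Another concern: split into words."""
    return text.split()

def count_words(tokens: list[str]) -> dict[str, int]:
    """A third concern: aggregate into counts."""
    counts: dict[str, int] = {}
    for token in tokens:
        counts[token] = counts.get(token, 0) + 1
    return counts

def pipeline(text: str) -> dict[str, int]:
    """Recompose the parts to solve the total problem."""
    return count_words(tokenise(clean(text)))
```

Each part can be tested and owned by a small team, yet the composed behaviour of a large system built this way can still be hard to reason about as a whole.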

Addresses Complex Challenge

  • Highly successful approach to complex tasks.
  • Tuned to the human bandwidth limitation.
  • But the whole system is still hard to understand.

Intellectual Debt

  • Technical debt is the inability to maintain your complex software system.
  • Intellectual debt is the inability to explain your software system.

Agentic Debt

  • Agentic AI could pay down technical and intellectual debt.
  • But it can create agentic debt: delegation without authority/authorship.

Agentic Debt

  • Delegation of workflows without crisp boundaries.
  • Agentic debt is about unsafe or illegible delegation.

Judgment Layer

  • Agentic orchestration compresses reading/writing/coding into a single interface.
  • Tool use turns text into action: search, code execution, lab automation, simulation pipelines.
  • This shifts the bottleneck to verification and scientific judgement.

What do we mean by “understanding?”

  • Operational understanding: can I use it safely and know when it fails?
  • Mechanistic understanding: do I have an interpretable causal/mechanistic story?
  • Paradigm understanding: can the community reproduce, contest, and extend it?
  • Social understanding: are the ideas understood in the wider public and other fields?

Institutional tacit knowledge

  • The “judgement layer” in organisations is often tacit: norms, exceptions, escalation paths, and context that rarely makes it into documentation.
  • It lives in handoffs and approvals: what gets challenged, what gets waived, and what triggers a halt.
  • Ceding this tacit knowledge without making it explicit is how we accumulate agentic debt.

The Information Infrastructure

Communication Bandwidth

  • Human communication: walking pace (2000 bits/minute)
  • Machine communication: light speed (billions of bits/second)
  • Our sharing walks, machine sharing …
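The gap on this slide can be made concrete with rough arithmetic (taking 1 Gbit/s as an illustrative machine link rate):

```python
# Rough comparison of communication rates from the slide:
# human communication ~2000 bits per minute vs machine links at
# billions of bits per second (1 Gbit/s chosen as an illustrative figure).
human_bits_per_second = 2000 / 60       # ~33 bits/s
machine_bits_per_second = 1e9           # assumed 1 Gbit/s link

ratio = machine_bits_per_second / human_bits_per_second
print(f"machine/human bandwidth ratio: {ratio:.0e}")  # → 3e+07
```

On these assumptions machines share information tens of millions of times faster than we do.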

New Flow of Information


Evolved Relationship


HAM


In Mathematical Context

Tao (IMO 2024): machine assistance in maths

  • Databases / tables (OEIS): store patterns and prior results.
  • Solvers (CAS, SAT/SMT): mechanised search and case analysis.
  • Modern triad: proof assistants, machine learning, large language models.

Repositories of knowledge: verifiable vs tacit

  • Digitally verifiable: proof assistants (e.g. Lean) and machine-checkable artefacts.
  • Operationally reliable: code, simulators, pipelines — repeatable, but not always interpretable.
  • Tacit + contextual: protocols, judgement, and field knowledge (bio/geo/social science).
  • LLMs can compress and transmit tacit knowledge — but they push the bottleneck to verification boundaries.
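As a toy illustration of the “digitally verifiable” category, a machine-checked statement in Lean 4 (the example is mine, not from the talk):

```lean
-- A trivial machine-checked statement: addition on naturals commutes.
-- The kernel verifies the proof term; no human trust in the argument is needed.
theorem add_comm' (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

Artefacts like this sit at one extreme of the spectrum: the verification boundary is crisp, unlike code pipelines (repeatable but opaque) or tacit protocols (neither).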

Authorship and accountability

  • We can hold humans to account for judgement
  • We can’t hold models (directly) to account
    • they don’t bear responsibility or liability
  • Need a named author to sign off

Institutional readiness: five gaps

  • Governance gap: policy exists, practice often lags (European Commission: Directorate-General for Research and Innovation and Montgomery, 2025; Resnik and Hosseini, 2025).
  • Research culture gap: weak incentives/careers for interdisciplinary team science (National Academy of Sciences et al., 2005).
  • Infrastructure gap: compute and models concentrated in few actors (Bommasani et al., 2022; Lawrence and Montgomery, 2024).

Institutional readiness: five gaps

  • Narrative gap: corporate visibility can eclipse public infrastructure contributions (Cave et al., 2018; Moult et al., 1995).
  • Coordination gap: funding/governance/data/skills often evolve separately (European Commission: Directorate-General for Research and Innovation and Montgomery, 2025).

Policy playbook: what to build now

  • Funding for domain-grounded, interdisciplinary AI-for-science work.
  • Governance for reproducibility, attribution, and safe use (Ball, 2023; Wachter et al., 2024).
  • Infrastructure: open tooling + shared compute pathways.
  • Data: trusted access and stewardship.
  • Talent/skills: train scientists to explain, verify, and challenge (Cranmer et al., 2026; European Commission: Directorate-General for Research and Innovation and Montgomery, 2025).

Developing Science

  • What do we want trainees to be able to explain, verify, and challenge?
  • Where is the verification boundary in AI-for-science systems?
  • Which artefacts are the new paradigm stores?

Over to Tim: agents, discovery, specialist tools

Workshop Questions

  1. Is there an emerging playbook for AI-for-Science?
  2. Are we converging on a standard recipe (accurate but slow simulators, amortised surrogates, differentiable pipelines), and how do we know when these surrogates truly generalise?
  3. Can AI grapple with open-ended discovery?

Workshop Questions

  1. Beyond supervised prediction, to what extent can current or near-future systems generate meaningful hypotheses, concepts, and research directions?
  2. What tools do we have, and what tools are missing?
  3. Which existing AI/ML capabilities are already reshaping scientific practice, and what critical tools (for uncertainty, causality, interpretability, or interfaces) are still absent?

Thanks!

  • company: Trent AI
  • book: The Atomic Human
  • twitter: @lawrennd
  • The Atomic Human pages: Kuhn, Thomas: The Structure of Scientific Revolutions 295–299; Popper, Karl: Conjectures and Refutations 327–328; intellectual debt 84–85, 349, 365, 376; separation of concerns 84–85, 103, 109, 199, 284, 371; topography, information 34–9, 43–8, 57, 62, 104, 115–16, 127, 140, 192, 196, 199, 291, 334, 354–5; anthropomorphization (‘anthrox’) 30–31, 90–91, 93–4, 100, 132, 148, 153, 163, 216–17, 239, 276, 326, 342; human-analogue machine (HAMs) 343–347, 359–359, 365–368.
  • newspaper: Guardian Profile Page
  • blog: http://inverseprobability.com

References

Anderson, P.W., 1972. More is different. Science 177, 393–396. https://doi.org/10.1126/science.177.4047.393
Arranz, D., Bianchini, S., Di Girolamo, V., others, 2023. Trends in the use of AI in science: A bibliometric analysis. Publications Office of the European Union. https://doi.org/10.2777/418191
Ball, P., 2023. Is AI leading to a reproducibility crisis in science? Nature 624, 22–25. https://doi.org/10.1038/d41586-023-03817-6
Bender, E.M., Gebru, T., McMillan-Major, A., Mitchell, M., 2021. On the dangers of stochastic parrots: Can language models be too big?, in: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. pp. 610–623. https://doi.org/10.1145/3442188.3445922
Berman, G., Chubb, J., Williams, K., 2024. The use of artificial intelligence in science, technology, engineering, and medicine. The Royal Society.
Bommasani, R., Hudson, D.A., Adeli, E., Altman, R., Arora, S., Arx, S. von, Bernstein, M.S., Bohg, J., Bosselut, A., Brunskill, E., Brynjolfsson, E., Buch, S., Card, D., Castellon, R., Chatterji, N., Chen, A., Creel, K., Davis, J.Q., Demszky, D., Donahue, C., Doumbouya, M., Durmus, E., Ermon, S., Etchemendy, J., Ethayarajh, K., Fei-Fei, L., Finn, C., Gale, T., Gillespie, L., Goel, K., Goodman, N., Grossman, S., Guha, N., Hashimoto, T., Henderson, P., Hewitt, J., Ho, D.E., Hong, J., Hsu, K., Huang, J., Icard, T., Jain, S., Jurafsky, D., Kalluri, P., Karamcheti, S., Keeling, G., Khani, F., Khattab, O., Koh, P.W., Krass, M., Krishna, R., Kuditipudi, R., Kumar, A., Ladhak, F., Lee, M., Lee, T., Leskovec, J., Levent, I., Li, X.L., Li, X., Ma, T., Malik, A., Manning, C.D., Mirchandani, S., Mitchell, E., Munyikwa, Z., Nair, S., Narayan, A., Narayanan, D., Newman, B., Nie, A., Niebles, J.C., Nilforoshan, H., Nyarko, J., Ogut, G., Orr, L., Papadimitriou, I., Park, J.S., Piech, C., Portelance, E., Potts, C., Raghunathan, A., Reich, R., Ren, H., Rong, F., Roohani, Y., Ruiz, C., Ryan, J., Ré, C., Sadigh, D., Sagawa, S., Santhanam, K., Shih, A., Srinivasan, K., Tamkin, A., Taori, R., Thomas, A.W., Tramèr, F., Wang, R.E., Wang, W., Wu, B., Wu, J., Wu, Y., Xie, S.M., Yasunaga, M., You, J., Zaharia, M., Zhang, M., Zhang, T., Zhang, X., Zhang, Y., Zheng, L., Zhou, K., Liang, P., 2022. On the opportunities and risks of foundation models. https://doi.org/10.48550/arXiv.2108.07258
Cave, S., Craig, C., Dihal, K., Dillon, S., Montgomery, J., Singler, B., Taylor, L., 2018. Portrayals and perceptions of AI and why they matter. The Royal Society. https://doi.org/10.17863/CAM.34502
Cranmer, K., Brehmer, J., Louppe, G., 2020. The frontier of simulation-based inference. Proceedings of the National Academy of Sciences 117, 30055–30062. https://doi.org/10.1073/pnas.1912789117
Cranmer, K., Lawrence, N.D., Montgomery, J., Thérien, D., 2026. AI for science: Reframing AI’s role in discovery.
Duarte, J., others, 2018. Fast inference of deep neural networks in FPGAs for particle physics. JINST 13, P07027. https://doi.org/10.1088/1748-0221/13/07/P07027
European Commission: Directorate-General for Research and Innovation, Montgomery, J., 2025. Framework conditions and funding for AI in science – mutual learning exercise on national policies for AI in science – first thematic report. Publications Office of the European Union. https://doi.org/10.2777/7211107
European Research Council, 2023. Use and impact of artificial intelligence in the scientific process. European Research Council. https://doi.org/10.2828/10694
Jaynes, E.T., 1957. Information theory and statistical mechanics. Physical Review 106, 620–630. https://doi.org/10.1103/PhysRev.106.620
Krenn, M., Pollice, R., Guo, S.Y., Aldeghi, M., Cervera-Lierta, A., Friederich, P., Passos Gomes, G. dos, Häse, F., Jinich, A., Nigam, A.K., Yao, Z., Aspuru-Guzik, A., 2022. On scientific understanding with artificial intelligence. Nature Reviews Physics 4, 761–769. https://doi.org/10.1038/s42254-022-00518-3
Kuhn, T.S., 1962. The structure of scientific revolutions. University of Chicago Press, Chicago, IL.
Lawrence, N.D., Montgomery, J., 2024. Accelerating AI for science: Open data science for science. Royal Society Open Science 11, 231130. https://doi.org/10.1098/rsos.231130
Manning, C.D., 2015. Last words: Computational linguistics and deep learning. Computational Linguistics 41, 701–707. https://doi.org/10.1162/COLI_a_00239
Moult, J., Pedersen, J.T., Judson, R., Fidelis, K., 1995. A large-scale experiment to assess protein structure prediction methods. Proteins: Structure, Function, and Genetics 23, ii–iv. https://doi.org/10.1002/prot.340230303
National Academy of Sciences, National Academy of Engineering, Institute of Medicine, 2005. Facilitating interdisciplinary research. The National Academies Press, Washington, DC. https://doi.org/10.17226/11153
Oreskes, N., Shrader-Frechette, K., Belitz, K., 1994. Verification, validation, and confirmation of numerical models in the earth sciences. Science 263, 641–646. https://doi.org/10.1126/science.263.5147.641
Pearl, J., Mackenzie, D., 2018. The book of why: The new science of cause and effect. Penguin Books.
Resnik, D.B., Hosseini, M., 2025. The ethics of using artificial intelligence in scientific research: New guidance needed for a new tool. AI Ethics 5, 1499–1521. https://doi.org/10.1007/s43681-024-00493-8
Schölkopf, B., 2022. Causality for machine learning, in: Geffner, H., Dechter, R., Halpern, J.Y. (Eds.), Probabilistic and Causal Inference: The Works of Judea Pearl. ACM, pp. 765–804.
Sculley, D., Holt, G., Golovin, D., Davydov, E., Phillips, T., Ebner, D., Chaudhary, V., Young, M., Crespo, J.-F., Dennison, D., 2015. Hidden technical debt in machine learning systems, in: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (Eds.), Advances in Neural Information Processing Systems 28. Curran Associates, Inc., pp. 2503–2511.
Wachter, S., Mittelstadt, B., Russell, C., 2024. Do large language models have a legal duty to tell the truth? Royal Society Open Science 11, 240197. https://doi.org/10.1098/rsos.240197