AI for Science

Paradigms, tacit knowledge, and scientific agency in the age of large models

Neil D. Lawrence

Timothy Lillicrap

Bellairs Workshop on AI for Science

Rabbits and Headlights

ICML Deep Learning Workshop Panel 2015

A Provocation

Science is kind of like a rabbit in the headlights of the Deep Learning Machine waiting to be flattened.

Questions

  • Where do scientific paradigms live?
  • What role does human understanding and agency play?
  • What’s an emerging playbook for AI-for-science?

The Structure of Scientific Revolutions

Some prompts

  • Paradigms: shift in science or in underpinning information infrastructure?
  • Understanding: what do we still demand (and from whom)?

Conjectures and Refutations

Qualitative vs Quantitative


  • Questions we care about are often qualitative (meaning, values, lived outcomes).
  • Questions we can test are often quantitative (metrics, averages, effect sizes).
  • AI may help bridge this gap by turning language and practice into analysable data.

Coding and Creatives


  • Go back to the ICML 2015 Deep Learning Workshop.
  • Try to imagine a technology with mass impact on coding and creative work.

Crossing fields: recovering the “totality” of science

  • Holding the whole of science together across fields and institutions is a struggle.
  • LLM interfaces lower the cost of moving across disciplines.
  • But they increase the risk of “plausible” error.
  • The key skill: scepticism.

AI-for-science framework: three lenses

  • Task capabilities: what the model can do (predict, reason, generate, control) (Bommasani et al., 2022; Krenn et al., 2022).
  • Workflow needs: where value appears (hypothesis, design, analysis, writing, coding) (Arranz et al., 2023; Berman et al., 2024; European Research Council, 2023).
  • Context constraints: data, compute, latency, validation culture, uncertainty tolerance (Duarte and others, 2018).
  • See Cranmer et al. (2026) (in review).

The science gap: pattern matching vs mechanism

  • ML is strong at pattern extraction and function approximation.
  • Science needs mechanism: why, when, and under what interventions.
  • So many deployments are still guess-and-verify pipelines.
  • Key question: when is prediction enough, and when do we require explanation? (Bender et al., 2021; Pearl and Mackenzie, 2018).

Technical priorities for scientific AI

  • Causality: distinguish correlation from interventionally robust structure (Pearl and Mackenzie, 2018; Schölkopf, 2022).
  • Abstraction: discover useful intermediate scales and effective theories (Anderson, 1972; Jaynes, 1957).
  • Simulation: hybrid mechanistic + learned systems with explicit validity regimes (Cranmer et al., 2020; Oreskes et al., 1994).

To Tim


  • Perhaps: “what is science?” as prediction, explanation, and control.
  • Perhaps: where control/RL intuitions shift how we evaluate models.
  • Perhaps: how model classes connect (equations, effective statistical models, ML systems).

What is science for?

  • Prediction: what will happen?
  • Explanation: why does it happen?
  • Control/Design: how do we intervene to get desired outcomes?
  • Different fields weight these differently; AI changes the balance.

Model types in science

  • Mechanistic models: equations from domain theory.
  • Effective/statistical models: coarse-grained abstractions (e.g., stat mech style summaries).
  • Learned models: ML/DL surrogates and pattern extractors.
  • Practice is hybrid: compose all three with clear verification boundaries.

First Wave of ML: Prediction

  • A model family \(f_\theta\) that maps inputs to outputs.
  • A learning objective that scores how well \(f_\theta\) matches data (and priors/constraints).
  • Optimisation + scale: we fit \(\theta\) with large compute and large datasets.
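A minimal sketch of these three ingredients on an invented linear-regression problem; the model \(f_\theta(x) = \theta_0 + \theta_1 x\), the synthetic data, the learning rate, and the iteration count are all illustrative choices, not anything from the talk:

```python
import random

# Synthetic "data": y = 2x + 0.5 plus noise (all numbers invented).
random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(200)]
ys = [2.0 * x + 0.5 + random.gauss(0, 0.1) for x in xs]

# Model family f_theta(x) = theta0 + theta1 * x.
theta0, theta1 = 0.0, 0.0
lr = 0.1  # learning rate

def mse(t0, t1):
    """Learning objective: mean squared error of f_theta against the data."""
    return sum((t0 + t1 * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Optimisation: fit theta by gradient descent on the objective.
initial_loss = mse(theta0, theta1)
for _ in range(500):
    g0 = sum(2 * (theta0 + theta1 * x - y) for x, y in zip(xs, ys)) / len(xs)
    g1 = sum(2 * (theta0 + theta1 * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    theta0 -= lr * g0
    theta1 -= lr * g1
final_loss = mse(theta0, theta1)
```

At scale the same recipe holds: swap the linear model for a deep network, the loop for a hardware-accelerated optimiser, and the 200 points for web-scale data.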

Prediction Examples

  • AlphaFold
  • GraphCast/Aardvark

Generalist Models

Generalist Examples

  • ImageNet
  • BERT
  • Chat interfaces
  • Polymathic

Physics foundation models

  • If the training data are generated by rigorous equations (PDE/ODE solvers), we have a clearer sense of “ground truth.”
  • These models may not map onto human intuitions — but they can still be scientifically useful.
  • That makes physics a promising sandbox for a science of AI: what is learned, what generalises, and how do we verify?

Break (15 mins)

  • Return for delegation, tacit knowledge, and verification boundaries.

The Unreasonable Effectiveness of Orchestration

  • LLMs emulate human behaviour.
  • Unsurprising, then, that they work better “in collaboration”.
  • Also with rapid access to the information infrastructure.

Agent Examples

  • Denario
  • Claude Code
  • Codex

What are we Delegating?

  • Science as technology
  • Science as understanding

Judgement Examples

  • Covid-19 Epidemiological Modelling
  • Does your model account for Facemasks?
  • Does your model account for Hotel Closures?

We might not have data, but we can do some arithmetic
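A hypothetical back-of-envelope calculation in that spirit, for the face-mask question above; every number below is an illustrative assumption, not an estimate from any study:

```python
# Hypothetical arithmetic: how much could widespread mask use change an
# epidemic's growth? All numbers are invented for illustration.
r0 = 3.0              # assumed reproduction number without interventions
mask_uptake = 0.8     # assumed fraction of contacts where masks are worn
mask_efficacy = 0.3   # assumed per-contact transmission reduction

# Each masked contact transmits at (1 - efficacy) of the baseline rate.
r_masked = r0 * (1 - mask_uptake * mask_efficacy)
print(f"R with masks: {r_masked:.2f}")  # → R with masks: 2.28
```

The point is not the number but the judgement: deciding which mechanisms belong in the model is exactly what cannot yet be delegated.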

Intellectual Debt

Technical Debt

  • Compare with technical debt.
  • Highlighted by Sculley et al. (2015).

Separation of Concerns

  • Decompose your complex problem/task into parts.
  • Each part is manageable (e.g. by a small team).
  • Recompose to solve the total problem.
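A minimal sketch of the decompose/recompose pattern, using an invented word-counting task; each function owns one concern, and the pipeline recomposes them:

```python
def clean(text: str) -> str:
    """One concern: normalise the input."""
    return text.strip().lower()

def tokenise(text: str) -> list[str]:
    """Another concern: split into words."""
    return text.split()

def count_words(tokens: list[str]) -> dict[str, int]:
    """A third concern: aggregate into counts."""
    counts: dict[str, int] = {}
    for token in tokens:
        counts[token] = counts.get(token, 0) + 1
    return counts

def pipeline(text: str) -> dict[str, int]:
    """Recompose the parts to solve the total problem."""
    return count_words(tokenise(clean(text)))
```

Each part can be tested and owned by a small team, yet the composed behaviour of a large system built this way can still be hard to reason about as a whole.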

Addresses Complex Challenge

  • Highly successful approach to complex tasks.
  • Tuned to the human bandwidth limitation.
  • But the whole system is still hard to understand.

Intellectual Debt

  • Technical debt is the inability to maintain your complex software system.
  • Intellectual debt is the inability to explain your software system.

Agentic Debt

  • Agentic AI could pay down technical and intellectual debt.
  • But it can create agentic debt: delegation without authority/authorship.

Agentic Debt

  • Delegation of workflows without crisp boundaries.
  • Agentic debt is about unsafe or illegible delegation.

Judgment Layer

  • Agentic orchestration compresses reading/writing/coding into a single interface.
  • Tool use turns text into action: search, code execution, lab automation, simulation pipelines.
  • This shifts the bottleneck to verification and scientific judgement.

What do we mean by “understanding?”

  • Operational understanding: can I use it safely and know when it fails?
  • Mechanistic understanding: do I have an interpretable causal/mechanistic story?
  • Paradigm understanding: can the community reproduce, contest, and extend it?
  • Social understanding: are the ideas understood in the wider public and other fields?

Institutional tacit knowledge

  • The “judgement layer” in organisations is often tacit: norms, exceptions, escalation paths, and context that rarely makes it into documentation.
  • It lives in handoffs and approvals: what gets challenged, what gets waived, and what triggers a halt.
  • Ceding this tacit knowledge without making it explicit is how we accumulate agentic debt.

The Information Infrastructure

Communication Bandwidth

  • Human communication: walking pace (2000 bits/minute)
  • Machine communication: light speed (billions of bits/second)
  • Our sharing walks, machine sharing …
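The gap on this slide can be made concrete with rough arithmetic (taking 1 Gbit/s as an illustrative machine link rate):

```python
# Rough comparison of communication rates from the slide:
# human communication ~2000 bits per minute vs machine links at
# billions of bits per second (1 Gbit/s chosen as an illustrative figure).
human_bits_per_second = 2000 / 60       # ~33 bits/s
machine_bits_per_second = 1e9           # assumed 1 Gbit/s link

ratio = machine_bits_per_second / human_bits_per_second
print(f"machine/human bandwidth ratio: {ratio:.0e}")  # → 3e+07
```

On these assumptions machines share information tens of millions of times faster than we do.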

New Flow of Information


Evolved Relationship


HAM


In Mathematical Context

Tao (IMO 2024): machine assistance in maths

  • Databases / tables (OEIS): store patterns and prior results.
  • Solvers (CAS, SAT/SMT): mechanised search and case analysis.
  • Modern triad: proof assistants, machine learning, large language models.

Repositories of knowledge: verifiable vs tacit

  • Digitally verifiable: proof assistants (e.g. Lean) and machine-checkable artefacts.
  • Operationally reliable: code, simulators, pipelines — repeatable, but not always interpretable.
  • Tacit + contextual: protocols, judgement, and field knowledge (bio/geo/social science).
  • LLMs can compress and transmit tacit knowledge — but they push the bottleneck to verification boundaries.
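As a toy illustration of the “digitally verifiable” category, a machine-checked statement in Lean 4 (the example is mine, not from the talk):

```lean
-- A trivial machine-checked statement: addition on naturals commutes.
-- The kernel verifies the proof term; no human trust in the argument is needed.
theorem add_comm' (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

Artefacts like this sit at one extreme of the spectrum: the verification boundary is crisp, unlike code pipelines (repeatable but opaque) or tacit protocols (neither).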

Authorship and accountability

  • We can hold humans to account for judgement
  • We can’t hold models (directly) to account
    • they don’t bear responsibility or liability
  • Need a named author to sign off

Institutional readiness: five gaps

  • Governance gap: policy exists, practice often lags (European Commission: Directorate-General for Research and Innovation and Montgomery, 2025; Resnik and Hosseini, 2025).
  • Research culture gap: weak incentives/careers for interdisciplinary team science (National Academy of Sciences et al., 2005).
  • Infrastructure gap: compute and models concentrated in few actors (Bommasani et al., 2022; Lawrence and Montgomery, 2024).

Institutional readiness: five gaps

  • Narrative gap: corporate visibility can eclipse public infrastructure contributions (Cave et al., 2018; Moult et al., 1995).
  • Coordination gap: funding/governance/data/skills often evolve separately (European Commission: Directorate-General for Research and Innovation and Montgomery, 2025).

Policy playbook: what to build now

  • Funding for domain-grounded, interdisciplinary AI-for-science work.
  • Governance for reproducibility, attribution, and safe use (Ball, 2023; Wachter et al., 2024).
  • Infrastructure: open tooling + shared compute pathways.
  • Data: trusted access and stewardship.
  • Talent/skills: train scientists to explain, verify, and challenge (Cranmer et al., 2026; European Commission: Directorate-General for Research and Innovation and Montgomery, 2025).

Developing Science

  • What do we want trainees to be able to explain, verify, and challenge?
  • Where is the verification boundary in AI-for-science systems?
  • Which artefacts are the new paradigm stores?

Over to Tim: agents, discovery, specialist tools

Workshop Questions

  1. Is there an emerging playbook for AI-for-Science?
  2. Are we converging on a standard recipe (accurate but slow simulators, amortised surrogates, differentiable pipelines), and how do we know when these surrogates truly generalise?
  3. Can AI grapple with open-ended discovery?

Workshop Questions

  1. Beyond supervised prediction, to what extent can current or near-future systems generate meaningful hypotheses, concepts, and research directions?
  2. What tools do we have, and what tools are missing?
  3. Which existing AI/ML capabilities are already reshaping scientific practice, and what critical tools (for uncertainty, causality, interpretability, or interfaces) are still absent?

Thanks!

  • company: Trent AI
  • book: The Atomic Human
  • twitter: @lawrennd
  • The Atomic Human pages: Kuhn, Thomas: The Structure of Scientific Revolutions 295–299; Popper, Karl: Conjectures and Refutations 327–328; intellectual debt 84–85, 349, 365, 376; separation of concerns 84–85, 103, 109, 199, 284, 371; topography, information 34–9, 43–8, 57, 62, 104, 115–16, 127, 140, 192, 196, 199, 291, 334, 354–5; anthropomorphization (‘anthrox’) 30–31, 90–91, 93–4, 100, 132, 148, 153, 163, 216–17, 239, 276, 326, 342; human-analogue machine (HAMs) 343–347, 359–359, 365–368.
  • newspaper: Guardian Profile Page
  • blog: http://inverseprobability.com

References

Anderson, P.W., 1972. More is different. Science 177, 393–396. https://doi.org/10.1126/science.177.4047.393
Arranz, D., Bianchini, S., Di Girolamo, V., others, 2023. Trends in the use of AI in science: A bibliometric analysis. Publications Office of the European Union. https://doi.org/10.2777/418191
Ball, P., 2023. Is AI leading to a reproducibility crisis in science? Nature 624, 22–25. https://doi.org/10.1038/d41586-023-03817-6
Bender, E.M., Gebru, T., McMillan-Major, A., Mitchell, M., 2021. On the dangers of stochastic parrots: Can language models be too big?, in: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. pp. 610–623. https://doi.org/10.1145/3442188.3445922
Berman, G., Chubb, J., Williams, K., 2024. The use of artificial intelligence in science, technology, engineering, and medicine. The Royal Society.
Bommasani, R., Hudson, D.A., Adeli, E., Altman, R., Arora, S., Arx, S. von, Bernstein, M.S., Bohg, J., Bosselut, A., Brunskill, E., Brynjolfsson, E., Buch, S., Card, D., Castellon, R., Chatterji, N., Chen, A., Creel, K., Davis, J.Q., Demszky, D., Donahue, C., Doumbouya, M., Durmus, E., Ermon, S., Etchemendy, J., Ethayarajh, K., Fei-Fei, L., Finn, C., Gale, T., Gillespie, L., Goel, K., Goodman, N., Grossman, S., Guha, N., Hashimoto, T., Henderson, P., Hewitt, J., Ho, D.E., Hong, J., Hsu, K., Huang, J., Icard, T., Jain, S., Jurafsky, D., Kalluri, P., Karamcheti, S., Keeling, G., Khani, F., Khattab, O., Koh, P.W., Krass, M., Krishna, R., Kuditipudi, R., Kumar, A., Ladhak, F., Lee, M., Lee, T., Leskovec, J., Levent, I., Li, X.L., Li, X., Ma, T., Malik, A., Manning, C.D., Mirchandani, S., Mitchell, E., Munyikwa, Z., Nair, S., Narayan, A., Narayanan, D., Newman, B., Nie, A., Niebles, J.C., Nilforoshan, H., Nyarko, J., Ogut, G., Orr, L., Papadimitriou, I., Park, J.S., Piech, C., Portelance, E., Potts, C., Raghunathan, A., Reich, R., Ren, H., Rong, F., Roohani, Y., Ruiz, C., Ryan, J., Ré, C., Sadigh, D., Sagawa, S., Santhanam, K., Shih, A., Srinivasan, K., Tamkin, A., Taori, R., Thomas, A.W., Tramèr, F., Wang, R.E., Wang, W., Wu, B., Wu, J., Wu, Y., Xie, S.M., Yasunaga, M., You, J., Zaharia, M., Zhang, M., Zhang, T., Zhang, X., Zhang, Y., Zheng, L., Zhou, K., Liang, P., 2022. On the opportunities and risks of foundation models. https://doi.org/10.48550/arXiv.2108.07258
Cave, S., Craig, C., Dihal, K., Dillon, S., Montgomery, J., Singler, B., Taylor, L., 2018. Portrayals and perceptions of AI and why they matter. The Royal Society. https://doi.org/10.17863/CAM.34502
Cranmer, K., Brehmer, J., Louppe, G., 2020. The frontier of simulation-based inference. Proceedings of the National Academy of Sciences 117, 30055–30062. https://doi.org/10.1073/pnas.1912789117
Cranmer, K., Lawrence, N.D., Montgomery, J., Thérien, D., 2026. AI for science: Reframing AI’s role in discovery.
Duarte, J., others, 2018. Fast inference of deep neural networks in FPGAs for particle physics. JINST 13, P07027. https://doi.org/10.1088/1748-0221/13/07/P07027
European Commission: Directorate-General for Research and Innovation, Montgomery, J., 2025. Framework conditions and funding for AI in science – mutual learning exercise on national policies for AI in science – first thematic report. Publications Office of the European Union. https://doi.org/10.2777/7211107
European Research Council, 2023. Use and impact of artificial intelligence in the scientific process. European Research Council. https://doi.org/10.2828/10694
Jaynes, E.T., 1957. Information theory and statistical mechanics. Physical Review 106, 620–630. https://doi.org/10.1103/PhysRev.106.620
Krenn, M., Pollice, R., Guo, S.Y., Aldeghi, M., Cervera-Lierta, A., Friederich, P., Passos Gomes, G. dos, Häse, F., Jinich, A., Nigam, A.K., Yao, Z., Aspuru-Guzik, A., 2022. On scientific understanding with artificial intelligence. Nature Reviews Physics 4, 761–769. https://doi.org/10.1038/s42254-022-00518-3
Kuhn, T.S., 1962. The structure of scientific revolutions. University of Chicago Press, Chicago, IL.
Lawrence, N.D., Montgomery, J., 2024. Accelerating AI for science: Open data science for science. Royal Society Open Science 11, 231130. https://doi.org/10.1098/rsos.231130
Manning, C.D., 2015. Last words: Computational linguistics and deep learning. Computational Linguistics 41, 701–707. https://doi.org/10.1162/COLI_a_00239
Moult, J., Pedersen, J.T., Judson, R., Fidelis, K., 1995. A large-scale experiment to assess protein structure prediction methods. Proteins: Structure, Function, and Genetics 23, ii–iv. https://doi.org/10.1002/prot.340230303
National Academy of Sciences, National Academy of Engineering, Institute of Medicine, 2005. Facilitating interdisciplinary research. The National Academies Press, Washington, DC. https://doi.org/10.17226/11153
Oreskes, N., Shrader-Frechette, K., Belitz, K., 1994. Verification, validation, and confirmation of numerical models in the earth sciences. Science 263, 641–646. https://doi.org/10.1126/science.263.5147.641
Pearl, J., Mackenzie, D., 2018. The book of why: The new science of cause and effect. Penguin Books.
Resnik, D.B., Hosseini, M., 2025. The ethics of using artificial intelligence in scientific research: New guidance needed for a new tool. AI Ethics 5, 1499–1521. https://doi.org/10.1007/s43681-024-00493-8
Schölkopf, B., 2022. Causality for machine learning, in: Geffner, H., Dechter, R., Halpern, J.Y. (Eds.), Probabilistic and Causal Inference: The Works of Judea Pearl. ACM, pp. 765–804.
Sculley, D., Holt, G., Golovin, D., Davydov, E., Phillips, T., Ebner, D., Chaudhary, V., Young, M., Crespo, J.-F., Dennison, D., 2015. Hidden technical debt in machine learning systems, in: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (Eds.), Advances in Neural Information Processing Systems 28. Curran Associates, Inc., pp. 2503–2511.
Wachter, S., Mittelstadt, B., Russell, C., 2024. Do large language models have a legal duty to tell the truth? Royal Society Open Science 11, 240197. https://doi.org/10.1098/rsos.240197