AI in Chemistry

Research Project

Chemistry is entering a new stage, where discovery is driven not only by experiments, but also by algorithms able to extract patterns from spectra, microscopy images, reaction data, and molecular structures. Our AI in Chemistry project explores this transition in a practical way: we develop methods that help chemists discover reactions faster, read complex analytical data more deeply, understand catalysts at unprecedented resolution, and connect molecular structure with function, toxicity, and material behavior. The goal is not to replace the chemist, but to build powerful digital tools that extend chemical intuition and make advanced analysis available to more researchers and students.

Why this project matters

For students, this project shows that chemistry is becoming a field where experiments, coding, data science, and mechanism can work together in one research program. For researchers, it demonstrates that AI is most powerful when it is tightly connected to real chemical questions: discovering overlooked reactions, explaining catalyst behavior, decoding spectra, reading complex images, and making chemistry more informative, reproducible, and sustainable.

The broader vision is that digital chemistry is not a side topic. It is becoming a practical research methodology for the next generation of chemical science.

From data overload to reaction discovery

One of the central ideas of the project is simple but powerful: chemistry already contains vast amounts of hidden knowledge in experimental data that were recorded but never fully interpreted. We develop machine-learning tools that help transform this “sleeping data” into new chemical insight.

Our recent study introduced a digital co-expert for reaction discovery. Instead of screening thousands of possibilities manually, the workflow generates candidate reactions, filters them computationally, clusters them with unsupervised machine learning, and then passes a focused set to expert evaluation. In this way, the most time-consuming stage of discovery was reduced by about 180-fold, from more than 1200 days of expert screening to about 7 days, leading to experimentally confirmed new cycloaddition reactions (10.1002/anie.202523905).
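The generate, filter, cluster, and shortlist stages described above can be illustrated with a minimal sketch. It is not the published implementation: the candidate records, the "barrier" score used as a filter, the two-dimensional "features", and the tiny k-means routine are all hypothetical stand-ins chosen so the example runs with the Python standard library alone.

```python
import math

# Hypothetical toy candidates. "features" stands in for any numeric descriptor
# vector and "barrier" for any computed score used as a filter (both assumed).
CANDIDATES = [
    {"name": "A", "features": (0.0, 0.1), "barrier": 20.0},
    {"name": "B", "features": (0.1, 0.0), "barrier": 22.0},
    {"name": "C", "features": (0.05, 0.05), "barrier": 21.0},
    {"name": "D", "features": (5.0, 5.2), "barrier": 25.0},
    {"name": "E", "features": (5.2, 5.0), "barrier": 24.0},
    {"name": "F", "features": (5.1, 5.1), "barrier": 23.0},
    {"name": "G", "features": (9.0, 9.0), "barrier": 60.0},
]


def filter_candidates(candidates, max_barrier):
    """Computational filter: keep candidates whose score is below the cutoff."""
    return [c for c in candidates if c["barrier"] <= max_barrier]


def seed_centroids(points, k):
    """Deterministic farthest-point seeding, so the sketch is reproducible."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, iterations=20):
    """Tiny k-means: assign each point to its nearest centroid, then update."""
    centroids = seed_centroids(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


def shortlist(candidates, max_barrier, k):
    """Filter, cluster, and return one representative per cluster for expert review."""
    kept = filter_candidates(candidates, max_barrier)
    k = min(k, len(kept))
    labels, centroids = kmeans([c["features"] for c in kept], k)
    picks = []
    for j in range(k):
        members = [c for c, lab in zip(kept, labels) if lab == j]
        if members:
            # The representative is the member closest to the cluster centroid.
            picks.append(min(members, key=lambda c: math.dist(c["features"], centroids[j])))
    return picks


if __name__ == "__main__":
    for pick in shortlist(CANDIDATES, max_barrier=30.0, k=2):
        print(pick["name"], pick["features"])
```

The design point is that the expert sees one representative per cluster rather than every surviving candidate, which is where the reduction in manual screening effort would come from in a workflow of this kind.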

A complementary direction appeared in Nature Communications, where we showed that high-resolution mass spectrometry archives can be turned into a search space for discovery. Instead of running new experiments first, our machine-learning-powered engine searches tera-scale HRMS data to detect unknown reaction products and overlooked pathways. This work introduced the concept of “experimentation in the past”: using already existing experimental data as a discovery platform for new chemistry (10.1038/s41467-025-56905-8).
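The basic idea of querying archived spectra for a hypothesized ion can be sketched as follows. This is a deliberately simplified illustration, not the published search engine: it matches a single target m/z within a ppm tolerance, whereas a real engine would also need isotopic-pattern checks and learned scoring. The archive layout, run identifiers, peak values, and the `find_hits` helper are hypothetical.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def find_hits(archive, target_mz, tol_ppm=5.0):
    """Scan every archived spectrum for peaks within tol_ppm of target_mz."""
    hits = []
    for spectrum_id, peaks in archive.items():
        for mz, intensity in peaks:
            if abs(ppm_error(mz, target_mz)) <= tol_ppm:
                hits.append((spectrum_id, mz, intensity))
    return hits


# Hypothetical archive: spectrum id -> list of (m/z, intensity) peaks.
ARCHIVE = {
    "run_001": [(150.0500, 1.0e4), (300.1003, 5.0e3)],
    "run_002": [(300.1100, 2.0e3)],
    "run_003": [(300.0990, 8.0e2)],
}

if __name__ == "__main__":
    # 300.1000 is an arbitrary illustrative target, not a real ion.
    for spectrum_id, mz, intensity in find_hits(ARCHIVE, target_mz=300.1000):
        print(spectrum_id, mz, intensity)
```

Here the peak in run_002 lies about 33 ppm from the target and is rejected, while the peaks in run_001 and run_003 fall inside the 5 ppm window.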

Digital co-expert workflow for reaction discovery
Workflow of AI-assisted reaction discovery and prioritization.

To estimate how much measured and analyzed data goes unused in chemistry, we carried out a dedicated study, which revealed that about 90% of the data in experimental chemistry is lost, in the sense of being recorded but never published in the peer-reviewed literature (10.3390/chemistry7050160).

Understanding catalysis with AI: from particles to atoms, from 3D to 4D

Catalysis is one of the most challenging areas for data-driven chemistry because real catalytic systems are dynamic, heterogeneous, and difficult to observe in full. A major part of our project is devoted to building AI-assisted approaches that make catalyst behavior more explicit and more measurable.

Our recent project established the concept of Totally Defined Nanocatalysis. By combining nanomanipulation, electron microscopy, and neural-network analysis, we characterized individual Pd/C catalyst particles rather than treating the catalyst as a vague average. This revealed extraordinarily high performance hidden at the single-particle level, with turnover numbers reaching the order of 10⁹ (10.1021/jacs.2c01283).

The next step developed this idea into 4D catalysis: not only locating catalytic centers in space, but also tracking how they change over time. This work showed that in Pd/C cross-coupling systems, monoatomic palladium centers, although representing only a small fraction of the total metal, can account for the overwhelming majority of catalytic activity. The study linked catalysis to dynamic transformations between nanoparticles, clusters, and single atoms, supported by AI-based image analysis (10.1021/jacs.3c00645).

These ideas were further systematized in a viewpoint on 4D catalysis, which explains why following the same catalyst region before and after reaction is essential, and how machine learning enables researchers to detect subtle structural changes that manual analysis would miss (10.1021/acscatal.3c03889). Together, these studies move catalysis away from static snapshots and toward dynamic, data-rich mechanistic understanding.

4D catalysis concept across spatial scales
Multiscale view of catalyst evolution in space and time.

AI for analytical chemistry and spectroscopy

Modern chemistry produces more analytical data than any researcher can interpret manually. We therefore develop tools that make complex spectra computationally readable.

In 2022, we presented MEDUSA, a framework for fully automated unconstrained analysis of high-resolution mass spectrometry data. The method combines gradient-boosted decision trees and neural networks to reduce spectral complexity and infer molecular formulas from fine isotopic structure. The broader message is that AI need not be limited to classification: it can also be used to solve inverse analytical problems that were long considered too difficult for routine practice (10.1021/jacs.2c03631).
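To make the notion of an inverse analytical problem concrete, the sketch below goes from a measured monoisotopic mass back to candidate elemental compositions by brute-force enumeration. This is a textbook simplification, not the MEDUSA method: it uses only the exact mass of a neutral molecule, is restricted to C, H, N, and O, and ignores isotopic fine structure and chemical plausibility rules. The function names and the element count limits are assumptions made for illustration.

```python
# Monoisotopic masses (Da) of the most abundant isotopes.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def formula_string(c, h, n, o):
    """Format element counts in Hill order, omitting zero counts and a count of 1."""
    parts = []
    for symbol, count in (("C", c), ("H", h), ("N", n), ("O", o)):
        if count:
            parts.append(symbol if count == 1 else f"{symbol}{count}")
    return "".join(parts)


def candidate_formulas(mass, tol_ppm=2.0, max_counts=(20, 40, 6, 6)):
    """Enumerate CHNO compositions whose monoisotopic mass matches within tol_ppm."""
    max_c, max_h, max_n, max_o = max_counts
    tol = mass * tol_ppm * 1e-6
    found = []
    for c in range(1, max_c + 1):
        for n in range(max_n + 1):
            for o in range(max_o + 1):
                heavy = c * MONO["C"] + n * MONO["N"] + o * MONO["O"]
                # Solve for the hydrogen count that best fills the remaining mass.
                h = round((mass - heavy) / MONO["H"])
                if 0 <= h <= max_h:
                    total = heavy + h * MONO["H"]
                    if abs(total - mass) <= tol:
                        found.append(formula_string(c, h, n, o))
    return found


if __name__ == "__main__":
    # 194.08038 Da is the monoisotopic mass of caffeine, C8H10N4O2.
    print(candidate_formulas(194.08038))
```

Even at a tight tolerance, a mass-only search can return more than one composition, which is one reason additional evidence such as isotopic fine structure is valuable for narrowing the list.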

Another branch of this effort was developed for NMR spectroscopy, where machine learning was applied to 195Pt NMR prediction. The workflow links semiempirical modeling with ML to estimate chemical shifts for water-soluble platinum complexes. This is especially important for catalysis and medicinal chemistry, where rapid interpretation of metal-centered spectra can accelerate mechanism studies and compound design (10.1002/cphc.202200940).

From images to molecular identity

A distinctive feature of the project is the use of AI not only on tables and spectra, but also on chemical images.

In 2024, we showed that deep learning can recognize the molecular identity of closely related phosphonium salts from the visual appearance of the material itself. This is a striking step toward connecting molecular structure with micro- and nanomorphology, and it suggests that microscopy can become an information-rich analytical source rather than only a descriptive technique (10.1002/smll.202403423).

Earlier, we developed AI pipelines for real-time electron microscopy video analysis of ionic liquid/water systems. These studies showed how neural networks can quantify dynamic microphase behavior in soft matter and analyze video streams that would be impractical to process manually (10.1002/smll.202007726; 10.1016/j.molliq.2023.121407).

Related studies used explainable AI and deep learning to interpret nanoparticle ordering and identify hidden defects in carbon materials, an approach relevant to catalysis, electronics, and materials diagnostics (10.1039/D0SC05696K; 10.1039/d4nr00952e).

Recognition of molecular structure from microscopy images
Visual-to-structure pipeline linking microscopy images with molecular recognition.

Biology, safety, and sustainable chemistry

The AI in Chemistry project also expands into biological imaging, toxicology, and greener chemical design.

Our 2025 article described deep generative models for creating synthetic annotated biofilm images, making it possible to generate training data for segmentation and detection models even when manually labeled data are scarce. This is especially valuable for automated microscopy and biofilm analysis (10.1038/s41522-025-00647-4).

As part of a general digital biology framework, we combined automated SEM with deep learning for macroscale biofilm studies, showing how AI can quantify growth, cell coverage, and biocide effects much faster than manual analysis (10.1039/D3DD00048F).

For safer chemistry, we created data resources and online tools such as ILToxDB and Build-a-Bio-Strip, which help organize cytotoxicity data and evaluate the toxicity contribution of reaction components. These efforts connect AI, databases, and practical decision-making in sustainable synthesis (10.1021/acs.estlett.5c00860; 10.1021/acs.jcim.4c01381; 10.1038/s41597-024-04190-3).

AI-generated annotated biofilm images and downstream analysis
Synthetic annotated biofilm images for training segmentation and detection models.


Selected publications

Development and methodology

  1. Reaction Discovery Involving Digital co-Expert with a Practical Application in Atom-Economic Cycloaddition. Angewandte Chemie International Edition (2026). DOI: 10.1002/anie.202523905
  2. Discovering organic reactions with a machine-learning-powered deciphering of tera-scale mass spectrometry data. Nature Communications (2025). DOI: 10.1038/s41467-025-56905-8
  3. Recognition of Molecular Structure of Phosphonium Salts from the Visual Appearance of Material with Deep Learning Can Reveal Subtle Homologs. Small (2024). DOI: 10.1002/smll.202403423
  4. Time-Resolved Formation and Operation Maps of Pd Catalysts Suggest a Key Role of Single Atom Centers in Cross-Coupling. Journal of the American Chemical Society (2023). DOI: 10.1021/jacs.3c00645
  5. Toward Totally Defined Nanocatalysis: Deep Learning Reveals the Extraordinary Activity of Single Pd/C Particles. Journal of the American Chemical Society (2022). DOI: 10.1021/jacs.2c01283
  6. Fully Automated Unconstrained Analysis of High-Resolution Mass Spectrometry Data with Machine Learning. Journal of the American Chemical Society (2022). DOI: 10.1021/jacs.2c03631
  7. Neural Network Analysis of Electron Microscopy Video Data Reveals the Temperature-Driven Microphase Dynamics in the Ions/Water System. Small (2021). DOI: 10.1002/smll.202007726

Applications

  1. Digitization of molecular complexity with machine learning. Chemical Science (2025). DOI: 10.1039/D4SC07320G
  2. Deep generative modeling of annotated bacterial biofilm images. npj Biofilms and Microbiomes (2025). DOI: 10.1038/s41522-025-00647-4
  3. ILToxDB: A Database on Cytotoxicity of Ionic Liquids. Environmental Science & Technology Letters (2025). DOI: 10.1021/acs.estlett.5c00860
  4. Totally defined catalysis as a new paradigm for describing catalytic systems by combining advanced analytics and machine learning techniques. Advances in Organometallic Chemistry (2025). DOI: 10.1016/bs.adomc.2025.09.005
  5. Lost Data in Electron Microscopy. Chemistry (2025). DOI: 10.3390/chemistry7050160
  6. 4D Catalysis Concept Enabled by Multilevel Data Collection and Machine Learning Analysis. ACS Catalysis (2024). DOI: 10.1021/acscatal.3c03889
  7. Determining the orderliness of carbon materials with nanoparticle imaging and explainable machine learning. Nanoscale (2024). DOI: 10.1039/d4nr00952e
  8. A comprehensive dataset on cytotoxicity of ionic liquids. Scientific Data (2024). DOI: 10.1038/s41597-024-04190-3
  9. Build-a-Bio-Strip: An Online Platform for Rapid Toxicity Assessment in Chemical Synthesis. Journal of Chemical Information and Modeling (2024). DOI: 10.1021/acs.jcim.4c01381
  10. Boosting the generality of catalytic systems by the synergetic ligand effect in Pd-catalyzed C-N cross-coupling. Journal of Catalysis (2024). DOI: 10.1016/j.jcat.2023.115240
  11. Cross-Disciplinary Glucose Biosensors: An ORMOSIL/Enzyme Material for Enhanced Detection. ACS Applied Polymer Materials (2024). DOI: 10.1021/acsapm.4c01394
  12. Top 20 influential AI-based technologies in chemistry. Artificial Intelligence Chemistry (2024). DOI: 10.1016/j.aichem.2024.100075
  13. Predicting 195Pt NMR Chemical Shifts in Water-Soluble Inorganic/Organometallic Complexes with a Fast and Simple Protocol Combining Semiempirical Modeling and Machine Learning. ChemPhysChem (2023). DOI: 10.1002/cphc.202200940
  14. Digital biology approach for macroscale studies of biofilm growth and biocide effects with electron microscopy. Digital Discovery (2023). DOI: 10.1039/D3DD00048F
  15. Analyzing ionic liquid systems using real-time electron microscopy and a computational framework combining deep learning and classic computer vision techniques. Journal of Molecular Liquids (2023). DOI: 10.1016/j.molliq.2023.121407
  16. Automated Recognition of Nanoparticles in Electron Microscopy Images of Nanoscale Palladium Catalysts. Nanomaterials (2022). DOI: 10.3390/nano12213914
  17. Integration of thermal imaging and neural networks for mechanical strength analysis and fracture prediction in 3D-printed plastic parts. Scientific Reports (2022). DOI: 10.1038/s41598-022-12503-y
  18. Deep neural network analysis of nanoparticle ordering to identify defects in layered carbon materials. Chemical Science (2021). DOI: 10.1039/D0SC05696K
  19. Electron microscopy dataset for the recognition of nanoscale ordering effects and location of nanoparticles. Scientific Data (2020). DOI: 10.1038/s41597-020-0439-1

Full list of publications: link




AI Book
Artificial Intelligence in Catalysis: Experimental and Computational Methodologies
Editor(s): Valentine P. Ananikov, Mikhail V. Polynski
Print ISBN: 9783527353859 | Online ISBN: 9783527847068 | DOI: 10.1002/9783527847068
2025 WILEY-VCH GmbH, Weinheim, Germany.
