Chinese Journal of Catalysis ›› 2023, Vol. 50: 284-296. DOI: 10.1016/S1872-2067(23)64467-5

• Article •

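Fine-structure sensitive deep learning framework for predicting catalytic properties with high precision

Yuzhuo Chen (陈宇卓) a, Hao Wang (王浩) a, Bing Lu (陆冰) a, Ni Yi (易倪) b, Liang Cao (曹亮) b, Yong Wang (王勇) a, Shanjun Mao (毛善俊) a,*

  1. a Advanced Materials and Catalysis Group, Institute of Catalysis, Department of Chemistry, Zhejiang University, Hangzhou 310028, Zhejiang
    b Institute of Catalysis, Department of Chemistry, Zhejiang University, Hangzhou 310028, Zhejiang
  • Received: 2023-03-20 Accepted: 2023-06-01 Online: 2023-07-18 Published: 2023-07-25
  • Corresponding author: *E-mail: maoshanjun@zju.edu.cn (Shanjun Mao).
  • Supported by:
    National Key R&D Program of China (2021YFB3801600); "Pioneer" and "Leading Goose" R&D Program of Zhejiang Province (2022C01218); "Pioneer" and "Leading Goose" R&D Program of Zhejiang Province (2022C01151); "Pioneer" and "Leading Goose" R&D Program of Zhejiang Province (2023C01108)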

Fine-structure sensitive deep learning framework for predicting catalytic properties with high precision

Yuzhuo Chen a, Hao Wang a, Bing Lu a, Ni Yi b, Liang Cao b, Yong Wang a, Shanjun Mao a,*

  1. a Advanced Materials and Catalysis Group, Center of Chemistry for Frontier Technologies, State Key Laboratory of Clean Energy Utilization, Institute of Catalysis, Department of Chemistry, Zhejiang University, Hangzhou 310028, Zhejiang, China
    b Institute of Catalysis, Department of Chemistry, Zhejiang University, Hangzhou 310028, Zhejiang, China
  • Received: 2023-03-20 Accepted: 2023-06-01 Online: 2023-07-18 Published: 2023-07-25
  • Contact: *E-mail: maoshanjun@zju.edu.cn (S. Mao).
  • Supported by:
    National Key R&D Program of China (2021YFB3801600); "Pioneer" and "Leading Goose" R&D Program of Zhejiang Province (2022C01218); "Pioneer" and "Leading Goose" R&D Program of Zhejiang Province (2022C01151); "Pioneer" and "Leading Goose" R&D Program of Zhejiang Province (2023C01108)

Abstract:

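The fine structure of a catalyst surface strongly affects structure-sensitive reactions, and high-throughput (HT) screening and machine learning (ML) can explore these factors efficiently. To combine ML with chemistry, a chemical structure must first be converted into a feature encoding that can serve as input to an ML model; the two conversion methods in common use are descriptors and graphs. However, descriptor construction often ignores atomic connectivity, which makes it difficult for ML models to capture the geometric information most relevant to catalytic performance. Graph-based ML models inevitably lose the geometric arrangement of the adsorption site while updating the nodes, and message-passing neural networks are complex, which makes them insensitive to electronic or geometric structure and hard to interpret. Consequently, an interpretable ML framework that accounts for both the electronic and the geometric fine structure in heterogeneous catalysis is still lacking.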

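By contrast, converting a chemical structure into grid data fully preserves the fine geometric information. Accordingly, by converting the catalyst surface structure and the adsorption-site information into two-dimensional grids and one-dimensional descriptors, respectively, this work builds a data-augmented (DA) convolutional neural network (CNN) ML framework called the "global + local" convolutional neural network (GLCNN). It combines "global + local" features and captures the original fine structure without complicated encoding, while DA enlarges the dataset and mitigates overfitting. GLCNN predicts and distinguishes the adsorption energies of OH on carbon-based transition-metal single-atom catalysts well, with a mean absolute error below 0.1 eV, which is among the better results attainable by ML models trained on large datasets. Comparison of GLCNN with descriptor-based and graph-based models shows that the comparison models cannot predict with full accuracy the OH adsorption energies of catalysts that contain group IB and IIB transition metals or cis/trans configurations. GLCNN predicts markedly better than the comparison models, which indicates that combining grids with descriptors better represents the electronic and fine geometric structure of the catalytically active center. In addition, the mean standard error computed over the DA-processed samples shows that the different unit cells obtained through DA hardly affect the predictions. This means that the translation of the unit cell by DA does not change its properties, and it indicates that GLCNN can learn the boundary-condition information of a periodic surface. Unlike conventional CNN and descriptor-based models with one-sided feature extraction, the fine-structure-sensitive ML framework presented here can extract the key factors that affect catalytic performance, such as symmetry and coordination elements, from geometric and chemical/electronic features through interpretability analysis free of human bias. Feature-importance analysis of the one-dimensional descriptors shows that the electronic structure and symmetry features of the adsorption site are crucial and that the metal influences catalytic performance more strongly than its coordination environment. Visualization of the intermediate outputs of the convolutional part of the CNN reveals that a large portion of the carbon support far from the metal has almost no direct influence on catalytic performance and that the convolutional layers preferentially and repeatedly attend to the metal atom. This again emphasizes that the metal matters more than its coordination environment and shows that the convolution kernels can automatically extract geometric information about the chemical structure that is consistent with catalytic common sense. Dimensionality-reduction visualization of the fully connected (FC) layers shows that, as the number of layers increases, the FC layers gradually find the direction of feature extraction on the basis of basic catalytic knowledge and extract more abstract high-dimensional features that favor adsorption-energy prediction, similar to the convolutional part. In summary, the GLCNN framework offers a feasible route to high-precision HT screening of heterogeneous catalysts across a broad physical and chemical space.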
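The data-augmentation idea described above (translating the unit cell of a periodic surface without changing its properties) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `translate_periodic` and `augment`, the 3x3 toy surface, and the encoding of each grid cell by atomic number are all assumptions made for illustration only.

```python
from typing import List

Grid = List[List[float]]


def translate_periodic(grid: Grid, dy: int, dx: int) -> Grid:
    """Shift a 2D grid by (dy, dx) cells, wrapping around the edges.

    The wrap-around mimics periodic boundary conditions: atoms that leave
    one side of the unit cell re-enter on the opposite side.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    return [
        [grid[(r - dy) % n_rows][(c - dx) % n_cols] for c in range(n_cols)]
        for r in range(n_rows)
    ]


def augment(grid: Grid) -> List[Grid]:
    """Return all periodic translations of the grid, including the identity."""
    n_rows, n_cols = len(grid), len(grid[0])
    return [
        translate_periodic(grid, dy, dx)
        for dy in range(n_rows)
        for dx in range(n_cols)
    ]


# Hypothetical toy surface: each cell holds an atomic number
# (26 = Fe as the single metal atom, 7 = N, 6 = C).
surface = [
    [6.0, 7.0, 6.0],
    [7.0, 26.0, 7.0],
    [6.0, 7.0, 6.0],
]
copies = augment(surface)
```

Every translated copy describes the same physical surface, so all copies would share the adsorption-energy label of the original sample. Under that assumption, a model whose predictions barely vary across the copies behaves as the abstract describes for GLCNN.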

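Key words: Heterogeneous catalysis, Machine learning, Fine-structure sensitive, "Global + local" feature, Interpretability, Data augmentation, "Global + local" convolutional neural network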

Abstract:

The fine structure of a surface considerably affects its catalytic performance in structure-sensitive reactions. High-throughput (HT) screening and machine learning (ML) are considered efficient tools for uncovering the hidden rules behind these effects. However, no protocol for constructing an interpretable ML framework sensitive to fine structures has been reported thus far. Herein, we developed a data-augmented convolutional neural network (CNN)-based ML framework, called the "global + local" convolutional neural network (GLCNN), which combines "global + local" features. By transforming the catalytic surfaces and adsorption sites into two-dimensional grids and one-dimensional descriptors, respectively, this framework captures the original fine structures without complicated encoding methods. The GLCNN framework accurately predicted and distinguished the adsorption energies of OH on a set of analogous carbon-based transition-metal single-atom catalysts, with a mean absolute error of less than 0.1 eV. To date, this is the best result among popular models trained on large datasets. Unlike conventional CNN and descriptor-based models with one-sided feature extraction, this fine-structure-sensitive ML framework can extract the key factors that affect catalytic performance, such as symmetry and coordination elements, from both geometric and chemical/electronic features through unbiased interpretable analysis. This framework provides a feasible solution for the high-precision HT screening of heterogeneous catalysts with a broad physical and chemical space.
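As a rough illustration of the "global + local" input described above, the sketch below rasterizes a surface into a two-dimensional grid and pairs it with a one-dimensional descriptor vector. The abstract does not specify the grid resolution, the cell encoding, or the descriptor contents, so everything here is an assumption: the `rasterize` helper, the use of atomic numbers as cell values, the toy metal-N4 motif, and the two descriptor entries are hypothetical.

```python
from typing import List, Tuple

# (fractional x, fractional y, atomic number); an assumed input format
Atom = Tuple[float, float, int]


def rasterize(atoms: List[Atom], n_rows: int, n_cols: int) -> List[List[int]]:
    """Place each atom in the grid cell that contains its fractional position.

    Empty cells stay 0. Fractional coordinates are wrapped into [0, 1) so
    that atoms on the cell boundary respect the periodicity of the surface.
    """
    grid = [[0] * n_cols for _ in range(n_rows)]
    for fx, fy, z in atoms:
        col = int((fx % 1.0) * n_cols) % n_cols
        row = int((fy % 1.0) * n_rows) % n_rows
        grid[row][col] = z
    return grid


# Hypothetical single-atom site: one Fe (Z = 26) coordinated by four N (Z = 7).
atoms = [
    (0.5, 0.5, 26),
    (0.25, 0.5, 7),
    (0.75, 0.5, 7),
    (0.5, 0.25, 7),
    (0.5, 0.75, 7),
]
global_grid = rasterize(atoms, 4, 4)

# Hypothetical local descriptor: [metal atomic number, coordination number].
local_descriptor = [26.0, 4.0]

# One training sample pairs the "global" grid with the "local" descriptor.
sample = (global_grid, local_descriptor)
```

In a two-branch model of this kind, the grid would typically feed the convolutional layers and the descriptor would join the flattened convolutional features before the fully connected layers. How GLCNN merges the two branches is not stated in the abstract.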

Key words: Heterogeneous catalyst, Machine learning, Fine-structure sensitive, "Global + local" feature, Interpretability, Data augmentation, "Global + local" convolutional neural network