Chinese Journal of Catalysis, 2025, Vol. 73: 159-173. DOI: 10.1016/S1872-2067(25)64725-5

From lab to fab: A large language model for chemical engineering

Jibin Zhou (a,1), Feiyang Xu (b,1), Zhijun Chang (c), Duiping Liu (a), Lulu Li (a), Jian Cui (b), Yi Li (b), Xin Li (b,d,e,*), Li Qian (c), Zhixiong Zhang (c), Guoping Hu (b,e), Mao Ye (a,*), Zhongmin Liu (a)

  (a) National Engineering Research Center of Lower-Carbon Catalysis Technology, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
  (b) Artificial Intelligence Research Institute, iFLYTEK Co., Ltd., Hefei 230000, Anhui, China
  (c) National Science Library, Chinese Academy of Sciences, Beijing 100190, China
  (d) University of Science and Technology of China, Hefei 230000, Anhui, China
  (e) State Key Laboratory of Cognitive Intelligence, Hefei 230000, Anhui, China
  • Received: 2025-03-27 Accepted: 2025-05-13 Online: 2025-06-18 Published: 2025-06-12
  • Contact: (*) E-mail: maoye@dicp.ac.cn (M. Ye), leexin@ustc.edu.cn (X. Li).
  • About author: (1) Contributed equally to this work.
  • Supported by:
    Liaoning Binhai Laboratory (LBLF-2023-01); Strategic Priority Research Program of Chinese Academy of Sciences (XDA0490000); Key Research and Development Program of Liaoning (2023JH26/10200012)

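Abstract (Chinese version, translated):

The development of chemical engineering technology is a complex, multistage process that spans laboratory research, process scale-up, and industrial deployment. It requires close collaboration across chemistry, materials science, and engineering, and it entails long development cycles and high economic costs. Although generative artificial intelligence, represented by large language models (LLMs), has made notable progress in basic research, applying it in depth to complex engineering problems remains challenging. Existing general-purpose LLMs have a limited grasp of chemical engineering expertise and cannot readily support the full chain of technology transfer from laboratory innovation to industrial implementation. In addition, the lack of a systematic evaluation benchmark makes it difficult to assess objectively how LLMs actually perform in professional chemical engineering scenarios.

To address these challenges, this work uses the Spark large model (星火大模型) as the base model to develop ChemELLM, a domain-specific LLM for chemical engineering with 70 billion parameters. To evaluate the capabilities of LLMs in chemical engineering comprehensively and systematically, we also constructed ChemEBench, the first multidimensional evaluation benchmark for chemical engineering. The benchmark uses a progressive three-level framework that moves from foundational knowledge understanding, through advanced domain analysis, to professional problem-solving. It covers 15 core domains, including catalyst design, fluid simulation, equipment selection, and safety assessment, and defines 101 fine-grained evaluation tasks, so that it assesses capabilities ranging from basic theoretical understanding to complex engineering construction. Benchmark results show that ChemELLM performs strongly on these key metrics, with overall performance ahead of mainstream LLMs such as O1-Preview, GPT-4o, and DeepSeek-R1. Furthermore, to support high-quality training and fine-tuning, we built the ChemEData dataset. Its pre-training corpus contains 19 billion tokens drawn from 1.06 million high-quality professional articles, 5.79 million high-value patents, and 1,200 professional books; its fine-tuning set contains 1 billion tokens comprising 2.75 million carefully designed question-answer pairs.

In summary, this study focuses on developing an LLM for chemical engineering and improving its understanding and reasoning in this domain. It is expected to bridge laboratory research and industrial application, accelerate the deployment and industrialization of new chemical technologies, and establish a new paradigm of AI-driven innovation in chemical engineering. ChemELLM has been deployed and is publicly accessible at https://chemindustry.iflytek.com/chat.

Key words (Chinese version, translated): Large language model, Chemical engineering, Process development, Multidimensional benchmark evaluation system, Domain applicability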

Abstract:

The development of chemical technologies is a multistage process that spans laboratory research, scale-up, and industrial deployment. It requires interdisciplinary collaboration and often incurs substantial time and economic costs. To address these challenges, we report ChemELLM, a domain-specific large language model (LLM) with 70 billion parameters for chemical engineering. ChemELLM demonstrates state-of-the-art performance across critical tasks ranging from foundational understanding to professional problem-solving. It outperforms mainstream LLMs (e.g., O1-Preview, GPT-4o, and DeepSeek-R1) on ChemEBench, the first multidimensional benchmark for chemical engineering, which comprises 101 essential tasks across 15 dimensions. To support robust model development, we curated ChemEData, a purpose-built dataset containing 19 billion tokens for pre-training and 1 billion tokens for fine-tuning. This work establishes a new paradigm for artificial intelligence-driven innovation, bridging the gap between laboratory-scale innovation and industrial-scale implementation and thus accelerating technological advancement in chemical engineering. ChemELLM is publicly available at https://chemindustry.iflytek.com/chat.

Key words: Large language model, Chemical engineering, Process development, Multidimensional benchmark, Domain adaptation
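
The abstract describes ChemEBench as a benchmark of many tasks grouped into dimensions and ordered from foundational understanding to professional problem-solving, but it does not say how task scores are combined. The Python sketch below shows one plausible way to roll per-task scores up into per-level, per-dimension, and overall figures. The aggregation rule (an unweighted macro-average), the task identifiers, and every score value are assumptions made for illustration; none of them comes from the paper.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-task results: (level, dimension, task_id, score in [0, 1]).
# Level and dimension names echo the abstract; task ids and scores are invented.
results = [
    ("foundational understanding", "catalyst design", "task-001", 0.80),
    ("foundational understanding", "safety assessment", "task-002", 0.60),
    ("professional problem-solving", "equipment selection", "task-003", 0.50),
    ("professional problem-solving", "equipment selection", "task-004", 0.70),
]


def macro_average(rows, key_index):
    """Group rows by the field at key_index and return {group: mean task score}."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[3])
    return {name: mean(scores) for name, scores in groups.items()}


by_level = macro_average(results, 0)
by_dimension = macro_average(results, 1)

# Assumed design choice: the overall score is the unweighted mean over
# dimensions, so a dimension with many tasks does not outweigh a small one.
overall = mean(list(by_dimension.values()))

for name, score in by_level.items():
    print(f"{name}: {score:.3f}")
print(f"overall (macro over dimensions): {overall:.3f}")
```

Averaging over dimensions rather than over raw tasks is only one option; a task-weighted mean would instead favor dimensions that contain more tasks.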