JCIM | G2GT: Retrosynthesis Prediction with Graph-to-Graph Attention Neural Network and Self-Training

Published: 2023-03-23
Zaiyun Lin, Shiqiu Yin, Lei Shi, Wenbiao Zhou, and Yingsheng John Zhang
Abstract:
Retrosynthesis prediction, the task of identifying reactant molecules that can be used to synthesize product molecules, is a fundamental challenge in organic chemistry and related fields. To address this challenge, we propose a novel graph-to-graph transformation model, G2GT. The model is built on the standard transformer structure and utilizes graph encoders and decoders. Additionally, we demonstrate the effectiveness of self-training, a data augmentation technique that utilizes unlabeled molecular data, in improving the performance of the model. To further enhance diversity, we propose a weak ensemble method, inspired by reaction-type labels and ensemble learning. This method incorporates beam search, nucleus sampling, and top-k sampling to improve inference diversity. A simple ranking algorithm is employed to retrieve the final top-10 results. We achieved new state-of-the-art results on both the USPTO-50K data set, with a top-1 accuracy of 54%, and the larger, more challenging USPTO-Full data set, with a top-1 accuracy of 49.3% and competitive top-10 results. Our model can also be generalized to all other graph-to-graph transformation tasks. Data and code are available at https://github.com/Anonnoname/G2GT_2
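The two sampling strategies named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the G2GT code: the function names, the toy probability vector, and in particular `rank_candidates` are assumptions made for illustration. The abstract does not say how the "simple ranking algorithm" works, so the vote-count merge below is only one plausible stand-in.

```python
import random


def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize their probabilities."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


def nucleus_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = [], 0.0
    for i in order:
        keep.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return {i: probs[i] / mass for i in keep}


def sample(filtered, rng):
    """Draw one token index from a filtered, renormalized distribution."""
    tokens = list(filtered)
    weights = [filtered[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


def rank_candidates(candidate_lists, top_n=10):
    """Hypothetical ranking (an assumption, not the paper's algorithm):
    order candidates by how many decoders proposed them, breaking ties
    by the best position any decoder gave them."""
    votes, best_rank = {}, {}
    for candidates in candidate_lists:
        # dict.fromkeys removes duplicates while preserving order
        for rank, cand in enumerate(dict.fromkeys(candidates)):
            votes[cand] = votes.get(cand, 0) + 1
            best_rank[cand] = min(best_rank.get(cand, rank), rank)
    ordered = sorted(votes, key=lambda c: (-votes[c], best_rank[c]))
    return ordered[:top_n]


# Toy next-token distribution over four tokens (illustrative values only).
probs = [0.5, 0.3, 0.15, 0.05]
rng = random.Random(0)
token_from_top_k = sample(top_k_filter(probs, 2), rng)
token_from_nucleus = sample(nucleus_filter(probs, 0.9), rng)

# Candidate lists as they might come from beam search and the two samplers.
merged = rank_candidates([["A", "B", "C"], ["B", "A"], ["B", "D"]], top_n=3)
```

Top-k sampling fixes the number of tokens considered at each step, while nucleus sampling lets that number vary with how peaked the distribution is. Merging outputs from several decoding strategies is one way to obtain a more diverse top-10 list than beam search alone would give.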