MV-MoCL: A Multi-View Molecular Contrastive Learning Framework by Integrating SMILES, 2D Graph, and 3D Geometry

Hang Zhao
Huang Xu

Keywords

molecular property prediction, multi-view learning, graph neural networks, deep learning, contrastive learning, bioinformatics

Abstract

Machine learning-based molecular property prediction (MPP) has attracted significant attention in computer-aided drug discovery, where the goal is to estimate molecular properties accurately from structural data and thereby accelerate drug development. Recent multi-view approaches to MPP integrate information from different molecular views to learn high-quality molecular representations. The three-dimensional geometry of a molecule carries a richer set of spatial features and plays a critical role in prediction accuracy, yet existing models often overlook it. To address this, we propose MV-MoCL, a contrastive learning model with one encoder per molecular view: a SMILES Transformer for the SMILES sequence, a Graph Isomorphism Network (GIN) for the 2D molecular graph, and SchNet for the 3D geometric conformation. By aligning the representations of these views with a contrastive loss during pre-training, the model captures rich, multi-faceted molecular features, which improves performance on downstream molecular property prediction tasks and mitigates the scarcity of labeled data. We evaluated MV-MoCL on several benchmark datasets from MoleculeNet; the experiments show that it matches or surpasses existing models across multiple benchmarks.
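
The abstract does not state the exact form of the contrastive loss, so the following is only a minimal sketch of how three views could be aligned, assuming an InfoNCE-style objective averaged over all pairs of views. The function names (`info_nce`, `multi_view_loss`), the view keys, and the temperature value are hypothetical, and plain lists of numbers stand in for the outputs of the SMILES Transformer, GIN, and SchNet encoders.

```python
import math
from itertools import combinations


def _normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss (assumed form): row i of `anchors` should match row i of
    `positives`; every other row in the batch acts as a negative."""
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)


def multi_view_loss(views, temperature=0.1):
    """Average the symmetric InfoNCE loss over every pair of views.
    `views` maps a hypothetical view name to a batch of embeddings."""
    pairs = list(combinations(sorted(views), 2))
    loss = 0.0
    for u, v in pairs:
        loss += 0.5 * (info_nce(views[u], views[v], temperature)
                       + info_nce(views[v], views[u], temperature))
    return loss / len(pairs)


# Toy batch of two molecules; each view gives one 2-d embedding per molecule.
aligned = {
    "smiles": [[1.0, 0.0], [0.0, 1.0]],
    "graph2d": [[0.9, 0.1], [0.1, 0.9]],
    "geom3d": [[1.0, 0.1], [0.0, 1.0]],
}
# Reversing one view breaks the molecule-to-molecule correspondence.
shuffled = dict(aligned, graph2d=aligned["graph2d"][::-1])

print(multi_view_loss(aligned) < multi_view_loss(shuffled))
```

In this toy batch the loss is lower when the three embeddings of the same molecule agree than when one view is paired with the wrong molecule, which is the behavior a view-alignment objective relies on during pre-training.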
