Imputation of Compound Property Assay Data Using a Gene Expression Programming-Based Method

Zhou, Hongliang; Lin, Yanmei; Chen, Nan; Peng, Yuzhong

doi:10.1007/978-981-97-0903-8_13

Part of the book series: Communications in Computer and Information Science (CCIS, volume 2014)

Included in the following conference series:

International Conference on Applied Intelligence

Abstract

Compound property assays are an important part of drug development, but the resulting data are often incomplete for a variety of reasons. To handle incomplete data and improve the success rate of drug development, researchers often need to impute the missing values effectively. This paper therefore proposes a gene expression programming-based method, called GEP-CPI, for imputing missing compound property assay data. In GEP-CPI, each candidate imputation model is expressed as the parse tree of a chromosome, and the optimal imputation model is mined by iteratively evolving the chromosome population. Experimental results on three compound property assay-related datasets demonstrate that the proposed method generally outperforms state-of-the-art methods in imputing missing compound property assay data.
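
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea (a chromosome decoded into a parse tree and a population evolved toward lower imputation error), not the GEP-CPI method itself. Everything specific here is an assumption made for illustration: a single gene in Karva notation, the function set `+`, `-`, `*`, terminals that index the observed feature columns, RMSE on observed targets as fitness, and mutation-only evolution with elitist selection. The names `decode`, `evaluate`, `evolve`, and `impute` are hypothetical.

```python
import random

# Assumed function set (symbol -> arity); terminals are integer column indices.
FUNCS = {"+": 2, "-": 2, "*": 2}

def decode(gene):
    """Decode a Karva-notation gene (breadth-first order) into nested [symbol, children] nodes."""
    nodes = [[sym, []] for sym in gene]
    queue, nxt = [nodes[0]], 1
    while queue:
        node = queue.pop(0)
        for _ in range(FUNCS.get(node[0], 0)):
            child = nodes[nxt]
            nxt += 1
            node[1].append(child)
            queue.append(child)
    return nodes[0]

def evaluate(node, row):
    """Evaluate a parse tree on one row of observed feature values."""
    sym, kids = node
    if sym in FUNCS:
        a, b = (evaluate(k, row) for k in kids)
        return a + b if sym == "+" else a - b if sym == "-" else a * b
    return row[sym]  # terminal: look up the feature column

def random_gene(head, n_terms, rng):
    """Head may hold functions or terminals; the tail (head + 1 symbols) holds terminals only."""
    terms = list(range(n_terms))
    head_part = [rng.choice(list(FUNCS) + terms) for _ in range(head)]
    return head_part + [rng.choice(terms) for _ in range(head + 1)]

def mutate(gene, head, n_terms, rng, rate=0.1):
    """Point mutation that keeps the head/tail constraint, so every gene stays decodable."""
    terms = list(range(n_terms))
    out = []
    for i, sym in enumerate(gene):
        if rng.random() < rate:
            sym = rng.choice(list(FUNCS) + terms if i < head else terms)
        out.append(sym)
    return out

def rmse(gene, rows, targets):
    """Fitness: root-mean-square error of the decoded model on observed targets."""
    tree = decode(gene)
    err = sum((evaluate(tree, r) - t) ** 2 for r, t in zip(rows, targets))
    return (err / len(rows)) ** 0.5

def evolve(rows, targets, head=5, pop_size=60, generations=80, seed=0):
    """Keep the best quarter each generation and refill with mutated copies."""
    rng = random.Random(seed)
    n_terms = len(rows[0])
    pop = [random_gene(head, n_terms, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda g: rmse(g, rows, targets))
        elite = pop[: max(1, pop_size // 4)]
        pop = elite + [mutate(rng.choice(elite), head, n_terms, rng)
                       for _ in range(pop_size - len(elite))]
    return min(pop, key=lambda g: rmse(g, rows, targets))

def impute(rows, targets, **kwargs):
    """Fit on rows whose target is observed, then fill in the None targets."""
    observed = [(r, t) for r, t in zip(rows, targets) if t is not None]
    best = evolve([r for r, _ in observed], [t for _, t in observed], **kwargs)
    tree = decode(best)
    return [t if t is not None else evaluate(tree, r) for r, t in zip(rows, targets)]
```

In this sketch, `impute(rows, targets)` takes a list of fully observed feature rows and a target column in which missing assay values are marked as `None`; observed targets are returned unchanged. A fuller gene expression programming setup would normally add transposition and recombination operators and numeric constants, which are omitted here to keep the example short.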

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (#62262044), the Natural Science Foundation of Guangxi Province (#2023GXNSFAA026027), and the Project of Guangxi Chinese medicine multidisciplinary crossover innovation team (#GZKJ2311).

Author information

Authors and Affiliations

Key Lab of Scientific Computing and Intelligent Information Processing, Nanning Normal University, Nanning, 5300001, Guangxi, China
Hongliang Zhou, Yanmei Lin, Nan Chen & Yuzhong Peng
Guangxi Academy of Sciences, Nanning, 530007, Guangxi, China
Yuzhong Peng

Corresponding authors

Correspondence to Yanmei Lin or Yuzhong Peng.

Editor information

Editors and Affiliations

Eastern Institute of Technology, Zhejiang, China
De-Shuang Huang
University of Wollongong, North Wollongong, NSW, Australia
Prashan Premaratne
Guangxi Academy of Sciences, Guangxi, China
Changan Yuan

Cite this paper

Zhou, H., Lin, Y., Chen, N., Peng, Y. (2024). Imputation of Compound Property Assay Data Using a Gene Expression Programming-Based Method. In: Huang, DS., Premaratne, P., Yuan, C. (eds) Applied Intelligence. ICAI 2023. Communications in Computer and Information Science, vol 2014. Springer, Singapore. https://doi.org/10.1007/978-981-97-0903-8_13

DOI: https://doi.org/10.1007/978-981-97-0903-8_13
Published: 01 March 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-0902-1
Online ISBN: 978-981-97-0903-8
eBook Packages: Computer Science, Computer Science (R0)
