Abstract
The analysis of histological images is based on the visual assessment of tissues by specialists using optical microscopy. This task can be time-consuming and challenging, mainly due to the complexity of the structures and diseases under investigation. These facts have motivated the development of computational methods to support specialists in research and decision-making. Despite the different computational strategies available in the literature, solutions based on genetic algorithms have not been fully explored as a way to find the best combination of features, selection algorithms and classifiers. In this paper, we describe a genetic-algorithm-based approach that evaluates a significant number of features, selection methods and classifiers in order to provide an acceptable association for the diagnosis and pattern recognition of non-Hodgkin lymphomas and colorectal cancer. The chromosomal structure was represented with four genes. The evaluation and selection of individuals, as well as the crossover and mutation processes, were defined to distinguish the groups under investigation with the highest AUC value and the smallest number of features. The tests were performed considering 1512 features from histological images, with different population sizes and numbers of iterations. An initial population of 50 individuals and 50 iterations provided the best result (AUC value of 0.984) for the colorectal histological images. For non-Hodgkin lymphoma images, the best result (AUC value of 0.947) was obtained with a population of 500 individuals and 50 iterations. The proposed methodology, together with detailed information regarding the methods, features and best associations, is a relevant contribution for the community interested in the study of pattern recognition of colorectal cancer and lymphomas.
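The search loop summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract states only that a chromosome has four genes and that fitness favors a high AUC with few features, so the gene meanings, the option names, the `toy_auc` stand-in, the penalty weight, tournament selection and elitism are all assumptions made for illustration.

```python
import random

# Hypothetical search space: the abstract states only that a chromosome has
# four genes; these gene meanings and option names are illustrative assumptions.
FEATURE_SETS = ["fractal", "haralick", "curvelet"]
SELECTORS = ["relief", "info_gain", "none"]
CLASSIFIERS = ["svm", "random_forest", "knn"]
MAX_FEATURES = 1512  # total number of features reported in the abstract


def random_individual(rng):
    """Draw one four-gene chromosome uniformly from the search space."""
    return (
        rng.choice(FEATURE_SETS),
        rng.choice(SELECTORS),
        rng.randint(1, MAX_FEATURES),
        rng.choice(CLASSIFIERS),
    )


def toy_auc(ind):
    """Stand-in for a cross-validated AUC; NOT a real classifier evaluation."""
    feature_set, selector, n_features, classifier = ind
    score = 0.70
    score += 0.10 * (feature_set == "curvelet")
    score += 0.05 * (selector == "relief")
    score += 0.05 * (classifier == "svm")
    # Pretend that the AUC saturates once about 50 features are used.
    score += 0.08 * min(n_features, 50) / 50
    return score


def fitness(ind, penalty=0.05):
    """Reward a high AUC and, as a secondary goal, a small feature count."""
    return toy_auc(ind) - penalty * ind[2] / MAX_FEATURES


def crossover(a, b, rng):
    """One-point crossover between two four-gene parents."""
    point = rng.randint(1, 3)
    return a[:point] + b[point:]


def mutate(ind, rng, rate=0.1):
    """Resample each gene independently with probability `rate`."""
    fresh = random_individual(rng)
    return tuple(f if rng.random() < rate else g for g, f in zip(ind, fresh))


def tournament(population, rng, k=3):
    """Return the fittest of k individuals sampled without replacement."""
    return max(rng.sample(population, k), key=fitness)


def evolve(pop_size=50, generations=50, seed=0):
    """Run the genetic algorithm and return the best chromosome found."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        children = [max(population, key=fitness)]  # elitism: keep the best
        while len(children) < pop_size:
            parent_a = tournament(population, rng)
            parent_b = tournament(population, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = children
    return max(population, key=fitness)


best = evolve()
print(best, round(fitness(best), 3))
```

The defaults of 50 individuals and 50 iterations mirror the configuration the abstract reports as best for the colorectal images. In a real setting, `toy_auc` would be replaced by an evaluation that applies the selected feature-selection method and classifier to the image features and returns a measured AUC.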
Availability of data and material
Not applicable.
Funding
This study was financed in part by the National Council for Scientific and Technological Development, CNPq (Grants Nos. #427114/2016-0, #304848/2018-2, #430965/2018-4 and #313365/2018-0), and by the State of Minas Gerais Research Foundation, FAPEMIG (Grant No. #APQ-00578-18).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Code availability
Not applicable.
About this article
Cite this article
Taino, D.F., Ribeiro, M.G., Roberto, G.F. et al. Analysis of cancer in histological images: employing an approach based on genetic algorithm. Pattern Anal Applic 24, 483–496 (2021). https://doi.org/10.1007/s10044-020-00931-3
DOI: https://doi.org/10.1007/s10044-020-00931-3