Abstract
Training effective models for the segmentation or classification of microscopy images is a hard task, complicated by the scarcity of adequately labeled data sets. In this context, self-supervised learning strategies can be deployed to learn suitable image representations from the large quantities of unlabeled data available, e.g. the 500k electron microscopy images that compose the CEM500k data set.
In this work, we investigate a self-supervised strategy for representation learning based on a colorization pretext task for microscopy images. We integrate the colorization task into the BYOL (Bootstrap Your Own Latent) self-supervised contrastive pre-training strategy, and train the resulting architecture on the CEM500k data set of electron microscopy images. As the backbone of the BYOL framework, we investigate the use of a ResNet50 and of a Stand-Alone Self-Attention network, and subsequently test them as feature extractors for downstream classification and segmentation tasks.
The Self-Attention encoders pre-trained with the colorization-based BYOL method learn effective features for the segmentation of microscopy images, achieving better results than encoders, both ResNet- and Self-Attention-based, trained with the original BYOL. This demonstrates the effectiveness of colorization as a pretext task for downstream segmentation of microscopy images. We release the code at https://github.com/nis-research/selfsup-byol-colorization.
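To make the pre-training setup concrete, the following is a minimal PyTorch sketch of how a colorization-style pretext can be folded into BYOL. It is not the authors' released implementation (see the repository linked above for that): the `ColorizationBYOL` class, the `mlp` heads, the `ema_update` routine, and the specific choice of feeding a grayscale view to the online branch and a color view to the target branch are assumptions made purely for illustration.

```python
# Illustrative sketch only, not the authors' code: one plausible way to combine
# a colorization-style view asymmetry with BYOL. The grayscale/color view split,
# head sizes, and EMA rate are assumptions.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision


def mlp(in_dim, hidden_dim=4096, out_dim=256):
    # BYOL-style projector/predictor head: Linear -> BN -> ReLU -> Linear.
    return nn.Sequential(
        nn.Linear(in_dim, hidden_dim),
        nn.BatchNorm1d(hidden_dim),
        nn.ReLU(inplace=True),
        nn.Linear(hidden_dim, out_dim),
    )


class ColorizationBYOL(nn.Module):
    """BYOL with a colorization-flavored pretext: the online branch sees a
    grayscale view and must predict the target projection of the color view."""

    def __init__(self, feat_dim=2048, proj_dim=256):
        super().__init__()
        backbone = torchvision.models.resnet50(weights=None)
        backbone.fc = nn.Identity()  # keep the 2048-d pooled features
        self.online_encoder = backbone
        self.online_projector = mlp(feat_dim, out_dim=proj_dim)
        self.predictor = mlp(proj_dim, out_dim=proj_dim)
        # Target network: EMA copy of the online encoder and projector.
        self.target_encoder = copy.deepcopy(self.online_encoder)
        self.target_projector = copy.deepcopy(self.online_projector)
        for p in list(self.target_encoder.parameters()) + list(self.target_projector.parameters()):
            p.requires_grad = False

    @torch.no_grad()
    def ema_update(self, tau=0.996):
        # Slowly move target weights toward the online weights
        # (a full implementation would also track BatchNorm buffers).
        for po, pt in zip(self.online_encoder.parameters(), self.target_encoder.parameters()):
            pt.mul_(tau).add_((1 - tau) * po)
        for po, pt in zip(self.online_projector.parameters(), self.target_projector.parameters()):
            pt.mul_(tau).add_((1 - tau) * po)

    def forward(self, gray_view, color_view):
        # Online branch on the grayscale view (replicated to 3 channels);
        # the full BYOL objective symmetrizes this loss over swapped views.
        online = self.predictor(self.online_projector(self.online_encoder(gray_view)))
        with torch.no_grad():
            target = self.target_projector(self.target_encoder(color_view))
        # BYOL loss: negative cosine similarity, i.e. 2 - 2*cos(online, target).
        return 2 - 2 * F.cosine_similarity(online, target, dim=-1).mean()
```

After pre-training with such an objective, the target network and the projection/prediction heads are discarded, and the online encoder alone is reused as a frozen feature extractor for the downstream classification and segmentation tasks.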
Notes
1. Github repository: https://github.com/nis-research/selfsup-byol-colorization.
References
Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. CoRR abs/2005.12872 (2020). http://arxiv.org/abs/2005.12872
Casser, V., Kang, K., Pfister, H., Haehn, D.: Fast mitochondria detection for connectomics. Nat. Methods 16(12), 1247–1253 (2019)
Chen, L., Papandreou, G., Schroff, F., Adam, H.: Rethinking atrous convolution for semantic image segmentation. CoRR abs/1706.05587 (2017). http://arxiv.org/abs/1706.05587
Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: Proceedings of the 37th International Conference on Machine Learning (ICML) (2020)
Chen, X., He, K.: Exploring simple Siamese representation learning. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 15745–15753 (2021). https://doi.org/10.1109/CVPR46437.2021.01549
Conrad, R., Narayan, K.: CEM500k, a large-scale heterogeneous unlabeled cellular electron microscopy image dataset for deep learning. eLife 10, e65894 (2021). https://doi.org/10.7554/eLife.65894
Cordts, M., et al.: The cityscapes dataset for semantic urban scene understanding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
Grill, J.B., et al.: Bootstrap your own latent: a new approach to self-supervised learning. In: Advances in Neural Information Processing Systems (NeurIPS) (2020)
Hausler, S., Garg, S., Xu, M., Milford, M., Fischer, T.: Patch-NetVLAD: multi-scale fusion of locally-global descriptors for place recognition. In: CVPR, pp. 14141–14152 (2021)
He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9726–9735 (2020). https://doi.org/10.1109/CVPR42600.2020.00975
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Ilg, E., Mayer, N., Saikia, T., Keuper, M., Dosovitskiy, A., Brox, T.: FlowNet 2.0: evolution of optical flow estimation with deep networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017
Jing, L., Tian, Y.: Self-supervised visual feature learning with deep neural networks: A survey. CoRR abs/1902.06162 (2019). http://arxiv.org/abs/1902.06162
Kasthuri, N., et al.: Saturated reconstruction of a volume of neocortex. Cell 162(3), 648–661 (2015). https://doi.org/10.1016/j.cell.2015.06.054
Kim, D., Cho, D., Yoo, D., Kweon, I.S.: Learning image representations by completing damaged jigsaw puzzles. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV) (2018)
Larsson, G., Maire, M., Shakhnarovich, G.: Learning representations for automatic colorization. CoRR abs/1603.06668 (2016). http://arxiv.org/abs/1603.06668
Leyva-Vallina, M., Strisciuglio, N., Petkov, N.: Generalized contrastive optimization of Siamese networks for place recognition. CoRR abs/2103.06638 (2021). https://arxiv.org/abs/2103.06638
Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. CoRR abs/1411.4038 (2014). http://arxiv.org/abs/1411.4038
Lucchi, A., Smith, K., Achanta, R., Knott, G., Fua, P.: Supervoxel-based segmentation of mitochondria in EM image stacks with learned shape features. IEEE Trans. Med. Imag. 31(2), 474–486 (2012). https://doi.org/10.1109/TMI.2011.2171705
Mayer, N., et al.: A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In: IEEE CVPR, pp. 4040–4048. arXiv:1512.02134 (2016)
Neuhold, G., Ollmann, T., Rota Bulo, S., Kontschieder, P.: The Mapillary Vistas dataset for semantic understanding of street scenes. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), October 2017
Noroozi, M., Favaro, P.: Unsupervised learning of visual representations by solving jigsaw puzzles (2017)
Pathak, D., Krähenbühl, P., Donahue, J., Darrell, T., Efros, A.A.: Context encoders: Feature learning by inpainting. CoRR abs/1604.07379 (2016). http://arxiv.org/abs/1604.07379
Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779–788 (2016). https://doi.org/10.1109/CVPR.2016.91
Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
Singh, S., et al.: Self-supervised feature learning for semantic segmentation of overhead imagery. In: BMVC (2018)
Tulyakov, S., Liu, M.Y., Yang, X., Kautz, J.: MoCoGAN: decomposing motion and content for video generation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1526–1535 (2018)
Zhang, R., Isola, P., Efros, A.A.: Colorful image colorization. CoRR abs/1603.08511 (2016). http://arxiv.org/abs/1603.08511
Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. CoRR abs/1612.01105 (2016). http://arxiv.org/abs/1612.01105
Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2223–2232 (2017)