Abstract
Transformers are more expressive than convolutional networks thanks to their global receptive field and lack of inherent inductive biases. However, they require large amounts of training data to exploit this expressivity, which can hinder their application in scenarios where training data are scarce. Several prior works have explored adding convolutions to the transformer architecture to mitigate this issue, yet they underutilize the hierarchical features these convolutions extract. We propose ConvSegFormer, a hybrid deep learning model obtained by modifying the SegFormer architecture. We add a parallel convolutional encoder that extracts hierarchical features to guide the SegFormer model. Furthermore, we replace SegFormer's simple decoding scheme with a progressive upsampling method that uses features from both the SegFormer and convolutional encoders. Finally, we modify the efficient self-attention module in the SegFormer branch to integrate transformer and convolution features. We demonstrate the efficacy of the proposed method in detecting discontinuities in Gamma Portable Radar Interferometer (GPRI) images. The code is available at https://github.com/VimsLab/ConvSegFormer.
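To make the abstract's last architectural point concrete, below is a minimal NumPy sketch of SegFormer-style efficient (spatial-reduction) self-attention with convolutional features injected into the keys and values. This is an illustrative assumption about how the fusion could look, not the authors' exact formulation: the learned Q/K/V projection matrices, multi-head splitting, and the strided-convolution reduction used in SegFormer are omitted (average pooling stands in for the spatial reduction), and `efficient_self_attention` is a hypothetical helper name.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def efficient_self_attention(x, conv_feats, R=4):
    """Spatial-reduction self-attention over an (H, W, C) feature map.

    Queries come from the transformer branch `x`; the keys/values are
    built from `x + conv_feats`, so the hierarchical convolutional
    features steer the attention (our assumed fusion point). The
    key/value map is downsampled by an R x R pooling, mirroring
    SegFormer's reduction of attention cost from O((HW)^2) to
    O(HW * HW / R^2).
    """
    H, W, C = x.shape
    fused = x + conv_feats                      # inject conv-branch features
    q = x.reshape(H * W, C)                     # one query per spatial location
    # Reduce spatial resolution of keys/values by R x R average pooling.
    kv = fused.reshape(H // R, R, W // R, R, C).mean(axis=(1, 3))
    kv = kv.reshape(-1, C)                      # (H*W / R^2, C)
    attn = softmax(q @ kv.T / np.sqrt(C), axis=-1)
    return (attn @ kv).reshape(H, W, C)
```

Because only the keys/values are pooled, the output keeps the full H x W resolution, which is what lets the decoder progressively upsample features from both encoders.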
Acknowledgement
This work was supported by the Office of Naval Research, Arctic and Global Prediction Program as part of the Sea Ice Dynamics Experiment (SIDEx) under award number N00014-19-1-2606. The authors would like to thank Dr. Jennifer Hutchings, Dr. Chris Polashenski, and other members of the SIDEx team for their contributions to this work in the form of invaluable input and feedback.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Dulam, R.V.S., Fedders, E.R., Mahoney, A.R., Kambhamettu, C. (2023). ConvSegFormer - A Convolution Aided SegFormer Architecture for Detection of Discontinuities in Wrapped Interferometric Phase Imagery of Sea Ice. In: Gade, R., Felsberg, M., Kämäräinen, JK. (eds) Image Analysis. SCIA 2023. Lecture Notes in Computer Science, vol 13886. Springer, Cham. https://doi.org/10.1007/978-3-031-31438-4_14
DOI: https://doi.org/10.1007/978-3-031-31438-4_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-31437-7
Online ISBN: 978-3-031-31438-4