Abstract
Many medical ultrasound video recognition tasks involve identifying key anatomical features regardless of when they appear in the video, suggesting that modeling such tasks may not benefit from temporal features. Correspondingly, model architectures that exclude temporal features may have better sample efficiency. We propose a novel multi-head attention architecture that incorporates these hypotheses as inductive priors to achieve better sample efficiency on common ultrasound tasks. We compare the performance of our architecture to an efficient 3D CNN video recognition model in two settings: one where we expect temporal features to be unnecessary and one where we expect them to be required. In the former setting, our model outperforms the 3D CNN, especially when we artificially limit the training data. In the latter, the outcome reverses. These results suggest that expressive time-independent models may be more effective than state-of-the-art video recognition models for some common ultrasound tasks in the low-data regime. Code is available at https://github.com/MedAI-Clemson/pda_detection.
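The repository linked above contains the authors' full implementation; as a rough illustration of the idea only, the sketch below shows one way such a time-independent video classifier can be built in PyTorch. Per-frame embeddings from a shared 2D CNN are pooled by multi-head attention with a learned query and no positional encoding, so the prediction cannot depend on frame order. All architectural details here (ResNet-18 backbone, embedding size, head count, class names) are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch (not the authors' released code) of a time-independent
# video classifier: frames are encoded independently and pooled with
# multi-head attention, making the output invariant to frame order.
import torch
import torch.nn as nn
import torchvision.models as tvm


class FrameAttentionClassifier(nn.Module):
    def __init__(self, num_classes: int, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        # Shared 2D CNN encoder applied to each frame independently
        # (ResNet-18 is an assumed stand-in for the paper's backbone).
        backbone = tvm.resnet18(weights=None)
        backbone.fc = nn.Identity()  # expose 512-d frame embeddings
        self.encoder = backbone
        # A learned query attends over the bag of frame embeddings;
        # with no positional encoding, temporal order carries no signal.
        self.query = nn.Parameter(torch.randn(1, 1, d_model))
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        # video: (batch, frames, channels, height, width)
        b, t, c, h, w = video.shape
        feats = self.encoder(video.reshape(b * t, c, h, w)).reshape(b, t, -1)
        query = self.query.expand(b, -1, -1)        # (b, 1, d_model)
        pooled, _ = self.attn(query, feats, feats)  # attend over frames
        return self.head(pooled.squeeze(1))         # (b, num_classes)


if __name__ == "__main__":
    model = FrameAttentionClassifier(num_classes=2)
    clip = torch.randn(4, 16, 3, 112, 112)  # 4 clips of 16 frames each
    print(model(clip).shape)  # torch.Size([4, 2])
```

Because the pooled representation is a permutation-invariant function of the frame set, shuffling a clip's frames leaves the output unchanged, which is precisely the inductive prior the abstract describes.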
Acknowledgement
We thank Clemson University for their generous allotment of compute time on the Palmetto Cluster.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Smith, D.H., Lineberger, J.P., Baker, G.H. (2023). On the Relevance of Temporal Features for Medical Ultrasound Video Recognition. In: Greenspan, H., et al. (eds.) Medical Image Computing and Computer Assisted Intervention – MICCAI 2023. Lecture Notes in Computer Science, vol. 14221. Springer, Cham. https://doi.org/10.1007/978-3-031-43895-0_70
DOI: https://doi.org/10.1007/978-3-031-43895-0_70
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43894-3
Online ISBN: 978-3-031-43895-0
eBook Packages: Computer Science, Computer Science (R0)