What Is Holding Back Convnets for Detection?

Pepik, Bojan; Benenson, Rodrigo; Ritschel, Tobias; Schiele, Bernt

doi:10.1007/978-3-319-24947-6_43

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9358)

Included in the following conference series:

German Conference on Pattern Recognition

Abstract

Convolutional neural networks have recently shown excellent results in general object detection and many other tasks. Although very effective, they involve many user-defined design choices. In this paper we aim to better understand these choices by inspecting two key questions: “what did the network learn?” and “what can the network learn?”. We exploit new annotations (Pascal3D+) to enable a new empirical analysis of the R-CNN detector. Contrary to common belief, our results indicate that existing state-of-the-art convnets are not invariant to various appearance factors. In fact, all considered networks have similar weak points, which cannot be mitigated by simply increasing the training data (architectural changes are needed). We show that overall performance can improve when using image renderings as data augmentation. We report the best known results on Pascal3D+ detection and viewpoint estimation tasks.

Author information

Authors and Affiliations

Max-Planck Institute for Informatics, Saarbrücken, Germany
Bojan Pepik, Rodrigo Benenson, Tobias Ritschel & Bernt Schiele

Corresponding author

Correspondence to Bojan Pepik.

Editor information

Editors and Affiliations

Institute of Computer Science III, University of Bonn, Bonn, Germany
Juergen Gall
MPI for Intelligent Systems, University of Tübingen, Tübingen, Germany
Peter Gehler
Computer Vision Group, Visual Computing Institute, RWTH Aachen, Aachen, Germany
Bastian Leibe

About this paper

Cite this paper

Pepik, B., Benenson, R., Ritschel, T., Schiele, B. (2015). What Is Holding Back Convnets for Detection? In: Gall, J., Gehler, P., Leibe, B. (eds) Pattern Recognition. DAGM 2015. Lecture Notes in Computer Science, vol 9358. Springer, Cham. https://doi.org/10.1007/978-3-319-24947-6_43

DOI: https://doi.org/10.1007/978-3-319-24947-6_43
Published: 03 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24946-9
Online ISBN: 978-3-319-24947-6
eBook Packages: Computer Science, Computer Science (R0)
