
Distance Based Image Classification: A solution to generative classification’s conundrum?


Abstract

Most classifiers rely on discriminative boundaries that separate instances of each class from everything else. We argue that discriminative boundaries are counter-intuitive, as they define semantics by what-they-are-not, and should be replaced by generative classifiers, which define semantics by what-they-are. Unfortunately, generative classifiers are significantly less accurate. This may be caused by the tendency of generative models to focus on easy-to-model semantic generative factors while ignoring non-semantic factors that are important but difficult to model. We propose a new generative model in which semantic factors are accommodated by the hierarchical generative process of shell theory (Wen-Yan et al., IEEE Trans Pattern Anal Mach Intell, 2021) and non-semantic factors by an instance-specific noise term. We use the model to develop a classification scheme that suppresses the impact of noise while preserving semantic cues. The result is a surprisingly accurate generative classifier that takes the form of a modified nearest-neighbor algorithm; we term it distance classification. Unlike discriminative classifiers, a distance classifier: defines semantics by what-they-are; is amenable to incremental updates; and scales well with the number of classes.




Notes

  1. Image sample spaces may be so large that most potential images never exist. If so, it is possible that even a dataset containing all current images will fail to densely populate an image sample space.

  2. A generator-mean can be estimated by averaging a large number of its generated instances. As noise has a mean of zero, this estimate is unbiased.

  3. Unlike in Eq. (22), in this context, \({\mathbf {E}}\) need not be the generator-of-everything. For example, if the dataset consists of different cat species, \({\mathbf {E}}\) would be the feline generator.

  4. Our evaluation focuses only on the top one-class learning algorithms for these datasets. Comparisons with other one-class learning algorithms can be found in Lin et al. (2022b).

  5. Our evaluation focuses only on the top one-class learning algorithms for these datasets. Comparisons with other one-class learning algorithms can be found in Lin et al. (2022b).

  6. Code is available at: https://www.kind-of-works.com/

References

  • Abdi, H., & Williams, L. J. (2010). Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics, 2(4), 433–459.

  • Aggarwal, C.C., Hinneburg, A., & Keim, D.A. (2001). On the surprising behavior of distance metrics in high dimensional space. In International conference on database theory (pp. 420–434). Springer.

  • Beyer, K., Goldstein, J., Ramakrishnan, R., & Shaft, U. (1999). When is “nearest neighbor” meaningful? In International conference on database theory (pp. 217–235). Springer.

  • Beyer, L., Hénaff, O. J., Kolesnikov, A., Zhai, X., & Oord, A. V. D. (2020). Are we done with ImageNet? arXiv preprint arXiv:2006.07159

  • Bossard, L., Guillaumin, M., & Gool, L. V. (2014). Food-101–mining discriminative components with random forests. In Eur. conf. comput. vis. (pp. 446–461). Springer.

  • Candès, E. J., Li, X., Ma, Y., & Wright, J. (2011). Robust principal component analysis? Journal of the ACM (JACM), 58(3), 1–37.

  • Castro, F. M., Marín-Jiménez, M. J., Guil, N., Schmid, C., & Alahari, K. (2018). End-to-end incremental learning. In Eur. conf. comput. vis. (pp. 233–248).

  • Chen, Y., Zhou, X. S., & Huang, T. S. (2001). One-class svm for learning in image retrieval. In IEEE int. conf. image process. (pp. 34–37). Citeseer.

  • Chen, Z., & Liu, B. (2018). Lifelong machine learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 12(3), 1–207.

  • Coates, A., Ng, A., & Lee, H. (2011). An analysis of single-layer networks in unsupervised feature learning. In Proceedings of the 14th international conference on artificial intelligence and statistics (pp 215–223).

  • Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). Imagenet: A large-scale hierarchical image database. In IEEE conf. comput. vis. pattern recog. (pp. 248–255). IEEE.

  • Elson, J., Douceur, J. R., Howell, J., & Saul, J. (2007). Asirra: A captcha that exploits interest-aligned manual image categorization. In Proceedings of 14th ACM conference on computer and communications security (CCS). Association for Computing Machinery, Inc.

  • Hayes, T. L., Kafle, K., Shrestha, R., Acharya, M., & Kanan, C. (2019). Remind your neural network to prevent catastrophic forgetting. In Eur. conf. comput. vis.

  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In IEEE conf. comput. vis. pattern recog. (pp. 770–778).

  • Hearst, M. A. (1998). Support vector machines. IEEE Intelligent Systems, 13(4), 18–28.

  • Hyvönen, V., Pitkänen, T., Tasoulis, S., Jääsaari, E., Tuomainen, R., Wang, L., Corander, J., & Roos, T. (2016). Fast nearest neighbor search through sparse random projections and voting. In Big data (big data), 2016 IEEE international conference on (pp. 881–888). IEEE.

  • Jääsaari, E., Hyvönen, V., & Roos, T. (2019). Efficient autotuning of hyperparameters in approximate nearest neighbor search. In Pacific-Asia conference on knowledge discovery and data mining: Springer (in press).

  • Kanungo, T., Mount, D. M., Netanyahu, N. S., Piatko, C. D., Silverman, R., & Wu, A. Y. (2002). An efficient k-means clustering algorithm: Analysis and implementation. IEEE Trans. Pattern Anal. Mach. Intell., 7, 881–892.

  • Kuo, Y. H., Lin, H. T., Cheng, W. H., Yang, Y. H., & Hsu, W. H. (2011). Unsupervised auxiliary visual words discovery for large-scale image object retrieval. In IEEE conf. comput. vis. pattern recog. (pp. 905–912).

  • LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.

  • Ledent, A., Alves, R., Lei, Y., & Kloft, M. (2021). Fine-grained generalization analysis of inductive matrix completion. Advances in Neural Information Processing Systems, 34, 25540–25552.

  • Lee, K., Lee, K., Lee, H., & Shin, J. (2018). A simple unified framework for detecting out-of-distribution samples and adversarial attacks. In Adv. Neural Inform. Process. Syst. (pp. 7167–7177).

  • Lin, W.-Y., Liu, Z., & Liu, S. (2022a). Locally varying distance transform for unsupervised visual anomaly detection. In Proceedings of the European Conference on Computer Vision (to appear).

  • Lin, W. Y., Liu, S., Lai, J. H., & Matsushita, Y. (2018). Dimensionality’s blessing: Clustering images by underlying distribution. In IEEE conf. comput. vis. pattern recog. (pp. 5784–5793).

  • Lin, W.-Y., Liu, S., Ren, C., Cheung, N.-M., Li, H., & Matsushita, Y. (2022b). Shell theory: A statistical model of reality. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(10), 6438–6453. https://doi.org/10.1109/TPAMI.2021.3084598.

  • Lowe, D. G. (1999). Object recognition from local scale-invariant features. In Int. conf. comput. vis. (Vol. 2, pp. 1150–1157). IEEE.

  • Lu, Z., Sreekumar, G., Goodman, E., Banzhaf, W., Deb, K., & Boddeti, V. N. (2021). Neural architecture transfer. IEEE Transactions on Pattern Analysis and Machine Intelligence.

  • Markopoulos, P. P., Kundu, S., Chamadia, S., & Pados, D. A. (2017). Efficient L1-norm principal-component analysis via bit flipping. IEEE Transactions on Signal Processing, 65(16), 4252–4264.

  • Ng, A., & Jordan, M. (2002). On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. In Adv. neural inform. process. syst. (pp. 841–848).

  • Nilsback, M. E., & Zisserman, A. (2006). A visual vocabulary for flower classification. In IEEE conf. comput. vis. pattern recog, (Vol. 2, pp. 1447–1454). IEEE.

  • Peterson, L. E. (2009). K-nearest neighbor. Scholarpedia, 4(2), 1883.

  • Platt, J. (1999). Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Advances in large margin classifiers, 10(3), 61–74.

  • Rajasegaran, J., Khan, S., Hayat, M., Khan, F. S., & Shah, M. (2020). iTAML: An incremental task-agnostic meta-learning approach. In IEEE conf. comput. vis. pattern recog. (pp. 13588–13597).

  • Rao, D., Visin, F., Rusu, A., Pascanu, R., Teh, Y. W., & Hadsell, R. (2019). Continual unsupervised representation learning. In Adv. neural inform. process. syst. (pp. 7647–7657).

  • Rebuffi, S. A., Kolesnikov, A., Sperl, G., & Lampert, C. H. (2017). iCaRL: Incremental classifier and representation learning. In IEEE conf. comput. vis. pattern recog. (pp. 2001–2010).

  • Rennie, J. D., Shih, L., Teevan, J., & Karger, D. R. (2003). Tackling the poor assumptions of naive Bayes text classifiers. In Proceedings of the 20th international conference on machine learning (ICML-03) (pp. 616–623).

  • Wikipedia. K-nearest neighbors algorithm. Retrieved May 15, 2022, from https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm

  • Schütze, H., Manning, C. D., & Raghavan, P. (2008). Introduction to information retrieval (Vol. 39). Cambridge University Press.

  • van de Ven, G. M., Li, Z., & Tolias, A. S. (2021). Class-incremental learning with generative classifiers. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR) workshops (pp. 3611–3620).

  • Wu, Y., Chen, Y., Wang, L., Ye, Y., Liu, Z., Guo, Y., & Fu, Y. (2019). Large scale incremental learning. In IEEE conf. comput. vis. pattern recog. (pp. 374–382).

  • Wu, L., Ganesh, A., Shi, B., Matsushita, Y., Wang, Y., & Ma, Y. (2010). Robust photometric stereo via low-rank matrix completion and recovery. In Asian conference on computer vision (pp. 703–717). Springer.

  • Xiao, J., Hays, J., Ehinger, K. A., Oliva, A., & Torralba, A. (2010). SUN database: Large-scale scene recognition from abbey to zoo. In IEEE conf. comput. vis. pattern recog. (pp. 3485–3492). IEEE.

  • Xiao, H., Rasul, K., & Vollgraf, R. (2017). Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747

  • Yoon, J., Yang, E., Lee, J., & Hwang, S. J. (2017). Lifelong learning with dynamically expandable networks. In Int. conf. learn. represent.

  • Zhang, H. (2005). Exploring conditions for the optimality of naive bayes. International Journal of Pattern Recognition and Artificial Intelligence, 19(02), 183–198.

  • Zhou, B., Lapedriza, A., Xiao, J., Torralba, A., & Oliva, A. (2014). Learning deep features for scene recognition using places database. In Adv. Neural Inform. Process. Syst. (pp. 487–495).


Acknowledgements

We would like to thank Ng Hongwei of Blackmagic Design for many hours of fruitful discussions; and the Lee Kong Chian foundation for supporting our work.

Author information

Corresponding author

Correspondence to Wen-Yan Lin.

Additional information

Communicated by Zhouchen Lin.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

The main body focuses on developing the best possible distance classifier. In the appendix, we relate our proposed solution to some popular nearest-neighbor alternatives.

1.1 Nearest-Neighbor Classification

Finally, no discussion of distance based classification would be complete without mentioning the nearest-neighbor classifier. This section introduces a reference based noise cancelled nearest-neighbor classifier. Figure 8 shows that, similar to our other distance based classifier, noise cancellation greatly improves the nearest-neighbor classifier’s validation capability. The derivation is provided below.

Fig. 8: Nearest-neighbor classification on STL-10 (Coates et al., 2011) with raw and reference based noise cancelled distances. Noise cancelled distances provide significantly better validation, as shown by the AUROC score

Table 7 Comparing distance classification with traditional nearest-neighbors

Let \({\mathbf {E}}\) be the ideal, common generator of all instances in a dataset and \({\mathbf {m}}\) be its mean. As \({\mathbf {m}}\) is constant relative to the process of generating individual instances, from Eq. (3) the distance of all ideal instances to \({\mathbf {m}}\) will be a constant, which we denote as \(c_{\mathbf {m}}\); i.e., if \({\mathbf {x}}_t, \, {\mathbf {x}}_{t'}\) are two ideal instances,

$$\begin{aligned} a.s. \quad \Vert {\mathbf {x}}_t - {\mathbf {m}}\Vert = \Vert {\mathbf {x}}_{t'} - {\mathbf {m}}\Vert = c_{\mathbf {m}}. \end{aligned}$$
(44)

From Eq. (16), the distance of the noisy features \({\mathbf {x}}(t), \, {\mathbf {x}}(t')\) from \({\mathbf {m}}\) is:

$$\begin{aligned} a.s. \quad&\Vert {\mathbf {x}}(t) - {\mathbf {m}}\Vert ^2 \nonumber \\&\quad \approx \Vert {\mathbf {x}}_t - {\mathbf {m}}\Vert ^2+ \Vert {\mathbf {n}}(t)\Vert ^2 = c_{\mathbf {m}}^2+\Vert {\mathbf {n}}(t)\Vert ^2 , \nonumber \\ a.s. \quad&\Vert {\mathbf {x}}(t') - {\mathbf {m}}\Vert ^2 \nonumber \\&\quad \approx \Vert {\mathbf {x}}_{t'} - {\mathbf {m}}\Vert ^2 + \Vert {\mathbf {n}}(t')\Vert ^2 = c_{\mathbf {m}}^2+\Vert {\mathbf {n}}(t')\Vert ^2. \end{aligned}$$
(45)

From Eq. (17), the distance of \({\mathbf {x}}(t)\) from \({\mathbf {x}}(t')\) is

$$\begin{aligned} \Vert {\mathbf {x}}(t) - {\mathbf {x}}(t')\Vert ^2 \approx \Vert {\mathbf {x}}_t - {\mathbf {x}}_{t'} \Vert ^2 + \Vert {\mathbf {n}}(t)\Vert ^2 + \Vert {\mathbf {n}}(t')\Vert ^2. \end{aligned}$$
(46)
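The approximations in Eqs. (45) and (46) hold because, in high dimensions, inner products between independent vectors (the discarded cross terms) are negligible relative to the retained squared norms. The following minimal NumPy sketch is ours and purely illustrative, not taken from the paper: the Gaussian generator, the noise scale and all variable names are assumptions; only the two decompositions being checked come from the text.

```python
# Minimal sketch (illustrative assumption, not from the paper): numerically
# checking the almost-sure decompositions of Eqs. (45)-(46) in high dimensions.
# Assumes ideal instances are Gaussian around a generator-mean m and that the
# instance-specific noise is independent, zero-mean Gaussian.
import numpy as np

rng = np.random.default_rng(0)
d = 20_000                                       # feature dimension; approximations sharpen as d grows
m = rng.normal(size=d)                           # generator-mean m
x_t, x_tp = m + 0.5 * rng.normal(size=(2, d))    # ideal instances x_t, x_{t'}
n_t, n_tp = 0.3 * rng.normal(size=(2, d))        # instance-specific noise n(t), n(t')
xo_t, xo_tp = x_t + n_t, x_tp + n_tp             # observed features x(t), x(t')

sq = lambda v: float(v @ v)                      # squared Euclidean norm

# Eq. (45): ||x(t) - m||^2 ~ ||x_t - m||^2 + ||n(t)||^2
print(sq(xo_t - m), sq(x_t - m) + sq(n_t))
# Eq. (46): ||x(t) - x(t')||^2 ~ ||x_t - x_{t'}||^2 + ||n(t)||^2 + ||n(t')||^2
print(sq(xo_t - xo_tp), sq(x_t - x_tp) + sq(n_t) + sq(n_tp))
```

With this dimension the two numbers in each print should agree to within roughly one percent, since the neglected cross terms grow like \(\sqrt{d}\) while the retained squared norms grow like \(d\).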

Thus, combining Eqs. (45) and (46), a noise cancelled nearest-neighbor distance can be defined as:

$$\begin{aligned}&f^2({\mathbf {x}}(t), {\mathbf {x}}(t'), {\mathbf {m}}) \nonumber \\&\quad = \Vert {\mathbf {x}}(t) - {\mathbf {x}}(t') \Vert ^2 - \Vert {\mathbf {x}}(t) - {\mathbf {m}}\Vert ^2 - \Vert {\mathbf {x}}(t') - {\mathbf {m}}\Vert ^2 \nonumber \\&\quad \approx \Vert {\mathbf {x}}_t - {\mathbf {x}}_{t'}\Vert ^2 - 2c_{\mathbf {m}}^2, \end{aligned}$$
(47)

where \(f^2({\mathbf {x}}(t), {\mathbf {x}}(t'), {\mathbf {m}})\) approximates the ideal squared distance up to a constant offset of \( -2c_{\mathbf {m}}^2\).
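As a concrete reading of Eq. (47), the sketch below is ours, not the authors' released code (see footnote 6 for that): it implements one-nearest-neighbor classification under the reference based noise cancelled distance. Estimating \({\mathbf {m}}\) as the mean of the training features follows footnote 2; the function and variable names are illustrative assumptions.

```python
# Minimal sketch (our reading of Eq. (47), not the authors' released code):
# 1-nearest-neighbor classification with the reference based noise cancelled
# distance. Assumes features and labels are NumPy arrays, one row per instance,
# and estimates the reference m as the mean of the training features.
import numpy as np

def noise_cancelled_nn(train_x, train_y, test_x):
    m = train_x.mean(axis=0)                              # reference point m
    d_train_m = ((train_x - m) ** 2).sum(axis=1)          # ||x(t') - m||^2 per training point
    d_test_m = ((test_x - m) ** 2).sum(axis=1)            # ||x(t) - m||^2 per test point
    # pairwise squared Euclidean distances ||x(t) - x(t')||^2
    pair = ((test_x ** 2).sum(axis=1)[:, None]
            + (train_x ** 2).sum(axis=1)[None, :]
            - 2.0 * test_x @ train_x.T)
    # Eq. (47): subtract both distances to the reference, cancelling the noise norms
    f2 = pair - d_test_m[:, None] - d_train_m[None, :]
    return train_y[np.argmin(f2, axis=1)]                 # label of the nearest neighbor
```

Note that, algebraically, \(f^2({\mathbf {x}}(t), {\mathbf {x}}(t'), {\mathbf {m}}) = -2\,({\mathbf {x}}(t)-{\mathbf {m}})\cdot ({\mathbf {x}}(t')-{\mathbf {m}})\), so minimizing \(f^2\) is equivalent to choosing the neighbor with the largest inner product after centering on \({\mathbf {m}}\); this is consistent with the normalization-like behavior discussed next.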

In terms of practical effectiveness, reference based noise cancelled nearest-neighbor classification is similar to standard normalization. However, the ability to achieve a normalization-like effect with a different algorithm helps validate our theory. It also provides a chance to reconsider this classic algorithm of undergraduate textbooks from a new perspective.

1.2 Distance vs Nearest-Neighbor Classification

Finally, it would be instructive to compare our distance classifier with a traditional nearest-neighbor classification algorithm. For this task, we consider both Euclidean and cosine distance classifiers, with and without normalization/centering. They are compared against our distance classifier in Table 7, which reports both classification accuracy and validation AUPRC.

As predicted, all the nearest-neighbor classification accuracies are respectable, with only minor variations across algorithms. However, there are large differences in AUPRC. For Euclidean nearest-neighbor, normalization significantly improves AUPRC, which shell theory and Sect. 4.3 explain in terms of a noise cancellation effect. A similar improvement occurs when applying the cosine distance to centered data-points. However, we do not offer an analytical explanation, because cosine distances are not translation invariant and thus cannot be trivially analyzed using shell theory.

Although normalization significantly improves the validation AUPRC of traditional nearest-neighbor algorithms, the scores remain significantly below that of our distance classifier. If normalization cannot be employed (such as in the context of strictly incremental learning), our distance classifier will have significantly higher validation AUPRC than either of the traditional nearest-neighbor techniques.
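For completeness, the following minimal sketch (ours, not the authors' evaluation code) spells out the four traditional nearest-neighbor variants compared in Table 7; the helper names and the use of the training-set mean for centering are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the authors' evaluation code) of the
# traditional nearest-neighbor variants compared in Table 7.
# Assumes train_x, test_x hold one feature vector per row and train_y is a
# NumPy array of labels.
import numpy as np

def l2_normalise(x):
    # project feature vectors onto the unit sphere
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def nearest_neighbour(train_x, train_y, test_x, metric="euclidean"):
    if metric == "cosine":
        # cosine-distance ranking equals Euclidean ranking on unit-normalised features
        train_x, test_x = l2_normalise(train_x), l2_normalise(test_x)
    d = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=-1)
    return train_y[np.argmin(d, axis=1)]

# The four Table 7 variants:
# nearest_neighbour(train_x, train_y, test_x)                                # Euclidean
# nearest_neighbour(l2_normalise(train_x), train_y, l2_normalise(test_x))    # normalised Euclidean
# nearest_neighbour(train_x, train_y, test_x, metric="cosine")               # cosine
# mu = train_x.mean(0)
# nearest_neighbour(train_x - mu, train_y, test_x - mu, metric="cosine")     # cosine on centered features
```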

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Lin, WY., Liu, S., Dai, B.T. et al. Distance Based Image Classification: A solution to generative classification’s conundrum?. Int J Comput Vis 131, 177–198 (2023). https://doi.org/10.1007/s11263-022-01675-9

