
Distance Based Image Classification: A solution to generative classification’s conundrum?


Abstract

Most classifiers rely on discriminative boundaries that separate instances of each class from everything else. We argue that discriminative boundaries are counter-intuitive, as they define semantics by what-they-are-not, and should be replaced by generative classifiers, which define semantics by what-they-are. Unfortunately, generative classifiers are significantly less accurate. This may be caused by the tendency of generative models to focus on easy-to-model semantic generative factors while ignoring non-semantic factors that are important but difficult to model. We propose a new generative model in which semantic factors are accommodated by the hierarchical generative process of shell theory (Wen-Yan et al., IEEE Trans Pattern Anal Mach Intell, 2021) and non-semantic factors by an instance-specific noise term. We use the model to develop a classification scheme that suppresses the impact of noise while preserving semantic cues. The result is a surprisingly accurate generative classifier that takes the form of a modified nearest-neighbor algorithm; we term it distance classification. Unlike discriminative classifiers, a distance classifier: defines semantics by what-they-are; is amenable to incremental updates; and scales well with the number of classes.




Notes

  1. Image sample spaces may be so large that most potential images never exist. If so, it is possible that even a dataset containing all current images will fail to densely populate an image sample space.

  2. A generator-mean can be estimated by averaging a large number of its generated instances. As noise has a mean of zero, this estimate is unbiased.

  3. Unlike in Eq. (22), in this context, \({\mathbf {E}}\) need not be the generator-of-everything. For example, if the dataset consists of different cat species, \({\mathbf {E}}\) would be the feline generator.

  4. Our evaluation focuses only on the top one-class learning algorithms for these datasets. Comparisons with other one-class learning algorithms can be found in Lin et al. (2022b).

  5. Our evaluation focuses only on the top one-class learning algorithms for these datasets. Comparisons with other one-class learning algorithms can be found in Lin et al. (2022b).

  6. Code is available at: https://www.kind-of-works.com/

References

  • Abdi, H., & Williams, L. J. (2010). Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics, 2(4), 433–459.

  • Aggarwal, C.C., Hinneburg, A., & Keim, D.A. (2001). On the surprising behavior of distance metrics in high dimensional space. In International conference on database theory (pp. 420–434). Springer.

  • Beyer, K., Goldstein, J., Ramakrishnan, R., & Shaft, U. (1999). When is “nearest neighbor” meaningful? In International conference on database theory (pp. 217–235). Springer.

  • Beyer, L., Hénaff, O. J., Kolesnikov, A., Zhai, X., & Oord, A. V. D. (2020). Are we done with ImageNet? arXiv preprint arXiv:2006.07159

  • Bossard, L., Guillaumin, M., & Gool, L. V. (2014). Food-101–mining discriminative components with random forests. In Eur. conf. comput. vis. (pp. 446–461). Springer.

  • Candès, E. J., Li, X., Ma, Y., & Wright, J. (2011). Robust principal component analysis? Journal of the ACM (JACM), 58(3), 1–37.

  • Castro, F. M., Marín-Jiménez, M. J., Guil, N., Schmid, C., & Alahari, K. (2018). End-to-end incremental learning. In Eur. conf. comput. vis. (pp. 233–248).

  • Chen, Y., Zhou, X. S., & Huang, T. S. (2001). One-class svm for learning in image retrieval. In IEEE int. conf. image process. (pp. 34–37). Citeseer.

  • Chen, Z., & Liu, B. (2018). Lifelong machine learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 12(3), 1–207.

  • Coates, A., Ng, A., & Lee, H. (2011). An analysis of single-layer networks in unsupervised feature learning. In Proceedings of the 14th international conference on artificial intelligence and statistics (pp 215–223).

  • Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). Imagenet: A large-scale hierarchical image database. In IEEE conf. comput. vis. pattern recog. (pp. 248–255). IEEE.

  • Elson, J., Douceur, J. R., Howell, J., & Saul, J. (2007). Asirra: A captcha that exploits interest-aligned manual image categorization. In Proceedings of 14th ACM conference on computer and communications security (CCS). Association for Computing Machinery, Inc.

  • Hayes, T. L., Kafle, K., Shrestha, R., Acharya, M., & Kanan, C. (2019). Remind your neural network to prevent catastrophic forgetting. In Eur. conf. comput. vis.

  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In IEEE conf. comput. vis. pattern recog. (pp. 770–778).

  • Hearst, M. A. (1998). Support vector machines. IEEE Intelligent Systems, 13(4), 18–28.

  • Hyvönen, V., Pitkänen, T., Tasoulis, S., Jääsaari, E., Tuomainen, R., Wang, L., Corander, J., & Roos, T. (2016). Fast nearest neighbor search through sparse random projections and voting. In Big data (big data), 2016 IEEE international conference on (pp. 881–888). IEEE.

  • Jääsaari, E., Hyvönen, V., & Roos, T. (2019). Efficient autotuning of hyperparameters in approximate nearest neighbor search. In Pacific-Asia conference on knowledge discovery and data mining: Springer (in press).

  • Kanungo, T., Mount, D. M., Netanyahu, N. S., Piatko, C. D., Silverman, R., & Wu, A. Y. (2002). An efficient k-means clustering algorithm: Analysis and implementation. IEEE Trans. Pattern Anal. Mach. Intell., 7, 881–892.

  • Kuo, Y. H., Lin, H. T., Cheng, W. H., Yang, Y. H., & Hsu, W. H. (2011). Unsupervised auxiliary visual words discovery for large-scale image object retrieval. In IEEE conf. comput. vis. pattern recog. (pp. 905–912).

  • LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.

  • Ledent, A., Alves, R., Lei, Y., & Kloft, M. (2021). Fine-grained generalization analysis of inductive matrix completion. Advances in Neural Information Processing Systems, 34, 25540–25552.

  • Lee, K., Lee, K., Lee, H., & Shin, J. (2018). A simple unified framework for detecting out-of-distribution samples and adversarial attacks. In Adv. Neural Inform. Process. Syst. (pp. 7167–7177).

  • Lin, W.-Y., Liu, Z., & Liu, S. (2022a). Locally varying distance transform for unsupervised visual anomaly detection. In Proceedings of the European Conference on Computer Vision (to appear).

  • Lin, W. Y., Liu, S., Lai, J. H., & Matsushita, Y. (2018). Dimensionality’s blessing: Clustering images by underlying distribution. In IEEE conf. comput. vis. pattern recog. (pp. 5784–5793).

  • Lin, W.-Y., Liu, S., Ren, C., Cheung, N.-M., Li, H., & Matsushita, Y. (2022b). Shell theory: A statistical model of reality. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(10), 6438–6453. https://doi.org/10.1109/TPAMI.2021.3084598.

  • Lowe, D. G. (1999). Object recognition from local scale-invariant features. In Int. conf. comput. vis. (Vol. 2, pp. 1150–1157). IEEE.

  • Lu, Z., Sreekumar, G., Goodman, E., Banzhaf, W., Deb, K., & Boddeti, V. N. (2021). Neural architecture transfer. IEEE Transactions on Pattern Analysis and Machine Intelligence.

  • Markopoulos, P. P., Kundu, S., Chamadia, S., & Pados, D. A. (2017). Efficient L1-norm principal-component analysis via bit flipping. IEEE Transactions on Signal Processing, 65(16), 4252–4264.

  • Ng, A., & Jordan, M. (2002). On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. In Adv. neural inform. process. syst. (pp. 841–848).

  • Nilsback, M. E., & Zisserman, A. (2006). A visual vocabulary for flower classification. In IEEE conf. comput. vis. pattern recog, (Vol. 2, pp. 1447–1454). IEEE.

  • Peterson, L. E. (2009). K-nearest neighbor. Scholarpedia, 4(2), 1883.

  • Platt, J. (1999). Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Advances in large margin classifiers, 10(3), 61–74.

  • Rajasegaran, J., Khan, S., Hayat, M., Khan, F. S., & Shah, M. (2020). iTAML: An incremental task-agnostic meta-learning approach. In IEEE conf. comput. vis. pattern recog. (pp. 13588–13597).

  • Rao, D., Visin, F., Rusu, A., Pascanu, R., Teh, Y. W., & Hadsell, R. (2019). Continual unsupervised representation learning. In Adv. neural inform. process. syst. (pp. 7647–7657).

  • Rebuffi, S. A., Kolesnikov, A., Sperl, G., & Lampert, C. H. (2017). iCaRL: Incremental classifier and representation learning. In IEEE conf. comput. vis. pattern recog. (pp. 2001–2010).

  • Rennie, J. D., Shih, L., Teevan, J., & Karger, D. R. (2003). Tackling the poor assumptions of naive Bayes text classifiers. In Proceedings of the 20th international conference on machine learning (ICML-03) (pp. 616–623).

  • Wikipedia. K-nearest neighbors algorithm. Retrieved May 15, 2022, from https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm

  • Schütze, H., Manning, C. D., & Raghavan, P. (2008). Introduction to information retrieval (Vol. 39). Cambridge University Press.

  • van de Ven, G. M., Li, Z., & Tolias, A. S. (2021). Class-incremental learning with generative classifiers. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR) workshops (pp. 3611–3620).

  • Wu, Y., Chen, Y., Wang, L., Ye, Y., Liu, Z., Guo, Y., & Fu, Y. (2019). Large scale incremental learning. In IEEE conf. comput. vis. pattern recog. (pp. 374–382).

  • Wu, L., Ganesh, A., Shi, B., Matsushita, Y., Wang, Y., & Ma, Y. (2010). Robust photometric stereo via low-rank matrix completion and recovery. In Asian conference on computer vision (pp. 703–717). Springer.

  • Xiao, J., Hays, J., Ehinger, K. A., Oliva, A., & Torralba, A. (2010). SUN database: Large-scale scene recognition from abbey to zoo. In IEEE conf. comput. vis. pattern recog. (pp. 3485–3492). IEEE.

  • Xiao, H., Rasul, K., & Vollgraf, R. (2017). Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747

  • Yoon, J., Yang, E., Lee, J., & Hwang, S. J. (2017). Lifelong learning with dynamically expandable networks. In Int. conf. learn. represent.

  • Zhang, H. (2005). Exploring conditions for the optimality of naive bayes. International Journal of Pattern Recognition and Artificial Intelligence, 19(02), 183–198.

  • Zhou, B., Lapedriza, A., Xiao, J., Torralba, A., & Oliva, A. (2014). Learning deep features for scene recognition using places database. In Adv. Neural Inform. Process. Syst. (pp. 487–495).


Acknowledgements

We would like to thank Ng Hongwei of Blackmagic Design for many hours of fruitful discussions; and the Lee Kong Chian foundation for supporting our work.

Author information

Corresponding author

Correspondence to Wen-Yan Lin.

Additional information

Communicated by Zhouchen Lin.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

The main body focuses on developing the best possible distance classifier. In the appendix, we relate our proposed solution to some popular nearest-neighbor alternatives.

1.1 Nearest-Neighbor Classification

Finally, no discussion of distance based classification would be complete without mentioning the nearest-neighbor classifier. This section introduces a reference based noise cancelled nearest-neighbor classifier. Figure 8 shows that, similar to our other distance based classifier, noise cancellation greatly improves the nearest-neighbor classifier’s validation capability. The derivation is provided below.

Fig. 8: Nearest-neighbor classification on STL-10 (Coates et al., 2011) with raw and reference based noise cancelled distances. Noise cancelled distances provide significantly better validation, as shown by the AUROC score

Table 7 Comparing distance classification with traditional nearest-neighbors

Let \({\mathbf {E}}\) be the ideal, common generator of all instances in a dataset and \({\mathbf {m}}\) be its mean. As \({\mathbf {m}}\) is constant relative to the process of generating individual instances, from Eq. (3) the distance of all ideal instances to \({\mathbf {m}}\) will be a constant, which we denote as \(c_{\mathbf {m}}\); i.e., if \({\mathbf {x}}_t, \, {\mathbf {x}}_{t'}\) are two ideal instances,

$$\begin{aligned} a.s. \quad \Vert {\mathbf {x}}_t - {\mathbf {m}}\Vert = \Vert {\mathbf {x}}_{t'} - {\mathbf {m}}\Vert = c_{\mathbf {m}}. \end{aligned}$$
(44)

From Eq. (16), the distance of the noisy features \({\mathbf {x}}(t), \, {\mathbf {x}}(t')\) from \({\mathbf {m}}\) is:

$$\begin{aligned} a.s. \quad&\Vert {\mathbf {x}}(t) - {\mathbf {m}}\Vert ^2 \nonumber \\&\quad \approx \Vert {\mathbf {x}}_t - {\mathbf {m}}\Vert ^2+ \Vert {\mathbf {n}}(t)\Vert ^2 = c_{\mathbf {m}}^2+\Vert {\mathbf {n}}(t)\Vert ^2 , \nonumber \\ a.s. \quad&\Vert {\mathbf {x}}(t') - {\mathbf {m}}\Vert ^2 \nonumber \\&\quad \approx \Vert {\mathbf {x}}_{t'} - {\mathbf {m}}\Vert ^2 + \Vert {\mathbf {n}}(t')\Vert ^2 = c_{\mathbf {m}}^2+\Vert {\mathbf {n}}(t')\Vert ^2. \end{aligned}$$
(45)

From Eq. (17), the distance of \({\mathbf {x}}(t)\) from \({\mathbf {x}}(t')\) is

$$\begin{aligned} \Vert {\mathbf {x}}(t) - {\mathbf {x}}(t')\Vert ^2 \approx \Vert {\mathbf {x}}_t - {\mathbf {x}}_{t'} \Vert ^2 + \Vert {\mathbf {n}}(t)\Vert ^2 + \Vert {\mathbf {n}}(t')\Vert ^2. \end{aligned}$$
(46)
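The approximations in Eqs. (45) and (46) hold because, in high dimensions, inner products between independent vectors (the discarded cross terms) are negligible relative to the retained squared norms. The following minimal NumPy sketch is ours and purely illustrative, not taken from the paper: the Gaussian generator, the noise scale and all variable names are assumptions; only the two decompositions being checked come from the text.

```python
# Minimal sketch (illustrative assumption, not from the paper): numerically
# checking the almost-sure decompositions of Eqs. (45)-(46) in high dimensions.
# Assumes ideal instances are Gaussian around a generator-mean m and that the
# instance-specific noise is independent, zero-mean Gaussian.
import numpy as np

rng = np.random.default_rng(0)
d = 20_000                                       # feature dimension; approximations sharpen as d grows
m = rng.normal(size=d)                           # generator-mean m
x_t, x_tp = m + 0.5 * rng.normal(size=(2, d))    # ideal instances x_t, x_{t'}
n_t, n_tp = 0.3 * rng.normal(size=(2, d))        # instance-specific noise n(t), n(t')
xo_t, xo_tp = x_t + n_t, x_tp + n_tp             # observed features x(t), x(t')

sq = lambda v: float(v @ v)                      # squared Euclidean norm

# Eq. (45): ||x(t) - m||^2 ~ ||x_t - m||^2 + ||n(t)||^2
print(sq(xo_t - m), sq(x_t - m) + sq(n_t))
# Eq. (46): ||x(t) - x(t')||^2 ~ ||x_t - x_{t'}||^2 + ||n(t)||^2 + ||n(t')||^2
print(sq(xo_t - xo_tp), sq(x_t - x_tp) + sq(n_t) + sq(n_tp))
```

With this dimension the two numbers in each print should agree to within roughly one percent, since the neglected cross terms grow like \(\sqrt{d}\) while the retained squared norms grow like \(d\).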

Thus, combining Eqs. (45) and (46), a noise cancelled nearest-neighbor distance can be defined as:

$$\begin{aligned}&f^2({\mathbf {x}}(t), {\mathbf {x}}(t'), {\mathbf {m}}) \nonumber \\&\quad = \Vert {\mathbf {x}}(t) - {\mathbf {x}}(t') \Vert ^2 - \Vert {\mathbf {x}}(t) - {\mathbf {m}}\Vert ^2 - \Vert {\mathbf {x}}(t') - {\mathbf {m}}\Vert ^2 \nonumber \\&\quad \approx \Vert {\mathbf {x}}_t - {\mathbf {x}}_{t'}\Vert ^2 - 2c_{\mathbf {m}}^2, \end{aligned}$$
(47)

where \(f^2({\mathbf {x}}(t), {\mathbf {x}}(t'), {\mathbf {m}})\) approximates the ideal squared distance up to a constant offset of \( -2c_{\mathbf {m}}^2\).
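As a concrete reading of Eq. (47), the sketch below is ours, not the authors' released code (see footnote 6 for that): it implements one-nearest-neighbor classification under the reference based noise cancelled distance. Estimating \({\mathbf {m}}\) as the mean of the training features follows footnote 2; the function and variable names are illustrative assumptions.

```python
# Minimal sketch (our reading of Eq. (47), not the authors' released code):
# 1-nearest-neighbor classification with the reference based noise cancelled
# distance. Assumes features and labels are NumPy arrays, one row per instance,
# and estimates the reference m as the mean of the training features.
import numpy as np

def noise_cancelled_nn(train_x, train_y, test_x):
    m = train_x.mean(axis=0)                              # reference point m
    d_train_m = ((train_x - m) ** 2).sum(axis=1)          # ||x(t') - m||^2 per training point
    d_test_m = ((test_x - m) ** 2).sum(axis=1)            # ||x(t) - m||^2 per test point
    # pairwise squared Euclidean distances ||x(t) - x(t')||^2
    pair = ((test_x ** 2).sum(axis=1)[:, None]
            + (train_x ** 2).sum(axis=1)[None, :]
            - 2.0 * test_x @ train_x.T)
    # Eq. (47): subtract both distances to the reference, cancelling the noise norms
    f2 = pair - d_test_m[:, None] - d_train_m[None, :]
    return train_y[np.argmin(f2, axis=1)]                 # label of the nearest neighbor
```

Note that, algebraically, \(f^2({\mathbf {x}}(t), {\mathbf {x}}(t'), {\mathbf {m}}) = -2\,({\mathbf {x}}(t)-{\mathbf {m}})\cdot ({\mathbf {x}}(t')-{\mathbf {m}})\), so minimizing \(f^2\) is equivalent to choosing the neighbor with the largest inner product after centering on \({\mathbf {m}}\); this is consistent with the normalization-like behavior discussed next.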

In terms of practical effectiveness, reference based noise cancelled nearest-neighbor classification is similar to standard normalization. However, the ability to achieve a normalization-like effect with a different algorithm helps validate our theory. It also provides a chance to reconsider this classic algorithm of undergraduate textbooks from a new perspective.

1.2 Distance vs Nearest-Neighbor Classification

Finally, it would be instructive to compare our distance classifier with a traditional nearest-neighbor classification algorithm. For this task, we consider both Euclidean and cosine distance classifiers, with and without normalization/centering. They are compared against our distance classifier in Table 7, which reports both classification accuracy and validation AUPRC.

As predicted, all the nearest-neighbor classification accuracies are respectable, with only minor variations across algorithms. However, there are large differences in AUPRC. For Euclidean nearest-neighbor, normalization significantly improves AUPRC, which shell theory and Sect. 4.3 explain in terms of a noise cancellation effect. A similar improvement occurs when applying the cosine distance to centered data-points. However, we do not offer an analytical explanation, because cosine distances are not translation invariant and thus cannot be trivially analyzed using shell theory.

Although normalization significantly improves the validation AUPRC of traditional nearest-neighbor algorithms, the scores remain significantly below that of our distance classifier. If normalization cannot be employed (such as in the context of strictly incremental learning), our distance classifier will have significantly higher validation AUPRC than either of the traditional nearest-neighbor techniques.
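For completeness, the following minimal sketch (ours, not the authors' evaluation code) spells out the four traditional nearest-neighbor variants compared in Table 7; the helper names and the use of the training-set mean for centering are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the authors' evaluation code) of the
# traditional nearest-neighbor variants compared in Table 7.
# Assumes train_x, test_x hold one feature vector per row and train_y is a
# NumPy array of labels.
import numpy as np

def l2_normalise(x):
    # project feature vectors onto the unit sphere
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def nearest_neighbour(train_x, train_y, test_x, metric="euclidean"):
    if metric == "cosine":
        # cosine-distance ranking equals Euclidean ranking on unit-normalised features
        train_x, test_x = l2_normalise(train_x), l2_normalise(test_x)
    d = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=-1)
    return train_y[np.argmin(d, axis=1)]

# The four Table 7 variants:
# nearest_neighbour(train_x, train_y, test_x)                                # Euclidean
# nearest_neighbour(l2_normalise(train_x), train_y, l2_normalise(test_x))    # normalised Euclidean
# nearest_neighbour(train_x, train_y, test_x, metric="cosine")               # cosine
# mu = train_x.mean(0)
# nearest_neighbour(train_x - mu, train_y, test_x - mu, metric="cosine")     # cosine on centered features
```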

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Lin, WY., Liu, S., Dai, B.T. et al. Distance Based Image Classification: A solution to generative classification’s conundrum?. Int J Comput Vis 131, 177–198 (2023). https://doi.org/10.1007/s11263-022-01675-9

