On the evaluation of generative models in music

  • Original Article
  • Published in: Neural Computing and Applications

Abstract

The modeling of artificial, human-level creativity is becoming increasingly achievable. In recent years, neural networks have been successfully applied to tasks such as image and music generation, demonstrating their potential for realizing computational creativity. The fuzzy definition of creativity, combined with the varying goals of the generative systems under evaluation, makes subjective evaluation appear to be the only viable methodology. We review the evaluation of generative music systems and discuss the inherent challenges it poses. Although subjective evaluation should remain the ultimate choice for assessing creative results, researchers unfamiliar with rigorous subjective experiment design, or lacking the resources to run a large-scale experiment, face challenges regarding the reliability, validity, and replicability of their results. In numerous studies, this leads to the reporting of insignificant and possibly irrelevant results and to a lack of comparability with similar and previous generative systems. We therefore propose a set of simple, musically informed metrics that enable an objective and reproducible way of evaluating and comparing the output of generative music systems. We demonstrate the usefulness of the proposed metrics with several experiments on real-world data.
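
The complete feature set and the distance-based comparison scheme are specified in the full text and implemented in the authors' mgeval toolbox (note 3 below). As a minimal illustrative sketch of the general idea, not the paper's exact implementation, the following Python snippet computes one representative musically informed feature, the pitch-class histogram, and reduces the fuller inter-set distance analysis to a single average cross-set distance; the function names and toy data are assumptions for this example.

```python
import numpy as np

def pitch_class_histogram(pitches):
    """Normalized 12-bin histogram of pitch classes (MIDI pitch mod 12)."""
    counts = np.bincount(np.asarray(pitches) % 12, minlength=12).astype(float)
    total = counts.sum()
    return counts / total if total > 0 else counts

def mean_cross_set_distance(set_a, set_b):
    """Average Euclidean distance between all cross-set histogram pairs."""
    hists_a = [pitch_class_histogram(piece) for piece in set_a]
    hists_b = [pitch_class_histogram(piece) for piece in set_b]
    return float(np.mean([np.linalg.norm(ha - hb)
                          for ha in hists_a for hb in hists_b]))

# Toy usage: a diatonic "generated" set versus a chromatic "reference" set.
generated = [[60, 62, 64, 65, 67, 69, 71, 72], [60, 64, 67, 72]]
reference = [[60, 61, 62, 63, 64, 65], [66, 67, 68, 69, 70, 71]]
print(mean_cross_set_distance(generated, reference))
```

In the full method, the intra-set and inter-set distances are treated as distributions and compared via kernel density estimates rather than collapsed into one mean, as described in the article; the sketch above only conveys the flavor of an objective, reproducible comparison.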

Notes

  1. The deviation here refers to an element-wise standard deviation, which retains the dimension of each feature (see the numpy sketch after these notes).

  2. https://www.hooktheory.com/theorytab.

  3. https://github.com/RichardYang40148/mgeval.
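
To make note 1 concrete: stacking per-piece feature vectors into a matrix and taking the standard deviation along the piece axis yields one deviation value per feature dimension, rather than a single scalar. A minimal numpy illustration, in which the array shapes are assumptions for the example:

```python
import numpy as np

# Five pieces, each described by a 12-dimensional feature vector
# (e.g., a pitch-class histogram); the shapes are illustrative only.
features = np.random.rand(5, 12)

per_dimension_std = np.std(features, axis=0)  # shape (12,): one value per feature dimension
collapsed_std = np.std(features)              # shape ():   a single scalar

print(per_dimension_std.shape, collapsed_std.shape)
```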

Author information

Corresponding author

Correspondence to Li-Chia Yang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Yang, L.-C., Lerch, A. On the evaluation of generative models in music. Neural Comput & Applic 32, 4773–4784 (2020). https://doi.org/10.1007/s00521-018-3849-7
