1. Introduction

Kunqu Opera is one of the oldest forms of opera performed in China, dating back approximately 700 years to the end of the Yuan Dynasty (1271 to 1368 AD). It dominated Chinese theatre from the sixteenth to the eighteenth centuries (see Figure 1) and is distinguished by the virtuosity of its rhythmic patterns. It has been a dominant influence on more recent forms of local opera in China, such as Beijing opera, and is considered the mother of Chinese operas. In 2001, the United Nations Educational, Scientific and Cultural Organization (UNESCO) proclaimed Kunqu Opera a Masterpiece of the Oral and Intangible Heritage of Humanity [1].

Figure 1

Photo of the Kunqu ‘Hall of Longevity’. Playwright: Sheng Hong (1645 to 1704); actor and actress from Shanghai Kunqu Opera Troupe: Zheng-Ren Cai and Jing-Xian Zhang; performance venue: Hangzhou Theatre, China; photographer: Gen-Fang Chen; date taken: October 11, 2011.

Kunqu Opera composers wrote their musical scores using a set of seven to ten Chinese characters, that is, in Gong-Che Notation (GCN) [2]. Several Kunqu Opera scores survive in the ancient literature [3]. In particular, three famous script sets were popular during the Qing Dynasty (1636 to 1911 AD): Jiugong Dacheng Nanbei Ci Gongpu (A comprehensive anthology of texts and notation of the Southern and Northern opera tunes in nine modes, 1746 AD, with 4,466 musical works) [4], Nashuying Qupu (Nashu Studio Theatrical Music, 1792 AD, with more than 360 drama scripts) [5], and Nanci Dinglv (The Law of the Southern Word, 1720 AD, with 1,342 musical works) [6].

The risk of losing this rich cultural heritage must be urgently addressed: Kunqu Opera faces competition from mass culture and, because of the high level of technical knowledge it demands, has suffered from a lack of audience interest since the eighteenth century. Of the 400 arias regularly sung in opera performances in the mid-twentieth century, only a few dozen, such as those from Mudan Ting (The Peony Pavilion), are still performed. Many Kunqu Opera works are available only as original manuscripts or photocopies, and their digitization into a machine-readable format is a pressing need.

Optical music recognition (OMR) refers to the automatic processing and analysis of images of musical notation. The process typically employs a scanner or other digital device to transform a paper-based musical score into a digital image, which is then processed, recognized, and automatically translated into a standard music file format, such as the musical instrument digital interface (MIDI) format. OMR draws on a broad combination of fields and methods, including musicology, artificial intelligence, image engineering, pattern recognition, and MIDI technology. OMR technology provides a way to convert paper-based scores into machine-readable form and has numerous applications, including computer-assisted music teaching, digital music libraries, musical statistics, automatic classification of digital music images, and synchronized music and audio communication.

The first published OMR work was conducted by Pruslin [7] at the Massachusetts Institute of Technology. Pruslin's system recognized a subset of Western Music Notation (WMN), primarily musical notes. Many studies were performed before the 1990s [8, 9], and research then began to focus on handwritten formats [9-11] and ancient (or folk) music, including medieval music [12], white mensural notation [13, 14], early music prints [15], Orthodox Hellenic Byzantine Music notation [16], and Greek traditional music [17]. During the same period, several commercial OMR software packages, such as capella-scan, Optical Music easy Reader, PhotoScore, SharpEye, SmartScore, and Vivaldi Scan, were developed. These reached the market for WMN, and their recognition rate exceeded 90% for commonly printed musical scores [18].

Techniques and methods used in OMR research include projection [19], mathematical morphology [20], neural networks [21], fuzzy theory [22], genetic algorithms [23], high-level domain knowledge [24], graph grammar [25], and probability theory [26]. Most of these methods are intended for WMN scores and can convert paper-based scores to digital form (such as a MIDI sequence). Replacing manual score entry in this way improves the rate of musical score digitization.

In this study, an OMR system for Kunqu Opera musical scores is presented. The remainder of this paper is organized as follows: Section 2 describes Kunqu Opera musical scores, Section 3 discusses the structure of the OMR system and some preliminary operations, Section 4 provides the experimental results for selected scores using the OMR system, and Section 5 offers ideas for future research.

2. Gong-Che Notation in Kunqu Opera musical scores

GCN is a type of musical notation that was developed and cultivated in East Asia, flourishing mainly in ancient China. GCN was invented in the Tang Dynasty (618 to 907 AD) [27] and became a popular form of music notation in China during the Song (960 to 1279 AD), Ming (1368 to 1644 AD), and Qing (1636 to 1911 AD) Dynasties. With over 1,000 years of use, it is a widely accepted musical notation in East Asia, occupying a position comparable to that of WMN in Europe. Many traditional Chinese musical manuscripts, including the Kunqu Opera scripts, have been written in GCN [28]. GCN represents musical notes using Chinese characters, with the basic GCN pitch symbols represented by ten Chinese characters, as shown in Figure 2. The meaning of these characters is presented in Table 1.

Figure 2

Ten GCN symbols of a pitch.

Table 1 Description of a GCN score's general symbols (pitch)

A Kunqu Opera GCN score contains content such as the title, key signature, qupai (the general term for the tune names used in traditional Chinese lyric and music writing; every qupai singing tone has its own musical form, tone, and tonality), lyrics, and notes. An example of a GCN score is provided in Figure 3. Words constitute a primary component of a GCN score. GCN is scripted in the format of traditionally written Chinese: from top to bottom and from right to left. Rhythm marks are indicated to the right of the note characters.

Figure 3

Musical score sheet of Nashu Studio Theatrical Music (p. 1983).

The title of a Kunqu Opera script is located in the first column, and the qupai is located in the second column, with the key signature in the top row. The lyrics with notes appear below the qupai, and the notes with the rhythm marks are to the upper right of the lyrics. A frame with four borders usually encircles all the columns. Figure 4 depicts this layout for a GCN score.

Figure 4

Sample annotation for a sub-section of Figure 3. Xianlü is a Chinese key signature in a musical work (nine key signatures are commonly used in Kunqu Opera); Basheng Ganzhou is a qupai, a fixed type of rhythm or melody in Chinese poetry and opera.

Commonly, a Chinese lyric word includes several GCN notes, as shown in Figure 5. The number of pitches in a Chinese lyric word ranges from 1 to 8, and the number of rhythm marks per pitch ranges from 0 to 2. Thus, words, pitches, and rhythms form a complex three-dimensional (3D) spatial relationship in a GCN score, with the first dimension containing lyrics, the second dimension containing pitches, and the third dimension containing rhythm marks. The font size of a Chinese lyric word is larger than that of a pitch, which is in turn larger than that of a rhythm mark.

Figure 5

Sample annotation for a sub-section of Figure 4.
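To make this word-pitch-rhythm hierarchy concrete, the following minimal sketch models it as a nested data structure. It is purely illustrative (the class and field names are our own, not part of the KOMR system):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Pitch:
    """A single GCN pitch character and its 0 to 2 rhythm marks."""
    symbol: str                                   # one of the ten GCN pitch characters
    rhythm_marks: List[str] = field(default_factory=list)

@dataclass
class LyricWord:
    """A Chinese lyric word and the 1 to 8 pitches sung on it."""
    character: str
    pitches: List[Pitch] = field(default_factory=list)

# A column of a GCN score is then a list of LyricWord objects read top to
# bottom; the columns themselves are read right to left.
```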

GCN does not mark a note's duration but instead gives the rhythm marks at regular intervals, with each rhythm mark being indicated in the corresponding note's upper right corner. The basic GCN note rhythm mark comprises eight symbols, as shown in Table 2, and two main types of rhythms are distinguishable: ‘ban’, the stronger beat, and ‘yan’, the weaker beat.

Table 2 Description of GCN score's general symbols (rhythm)

The difficulties encountered in the processing of GCN musical scores are due to the following: (1) the complex 3D relationship between symbols in a score's image, (2) symbols being written in different sizes, shapes, and intensities, (3) the variation in relative size between different components of a musical symbol, (4) different symbols appearing connected to each other and the same musical symbol appearing in separated components, and (5) difficulties in converting the musical information in GCN scores to other styles of musical notation, such as WMN.

3. Description of the OMR system for Kunqu Opera musical scores

The OMR system for Kunqu Opera (KOMR), shown in Figure 6, is an off-line optical recognition system comprising six independent stages: (1) image pre-processing, (2) document image segmentation, (3) feature extraction, (4) musical symbol recognition, (5) musical semantics, and (6) MIDI representation.

Figure 6

The KOMR system architecture.

Because of the maturity of image digitizing technology [29], paper-based Kunqu Opera musical scores can easily be converted into digital images. Therefore, this paper does not discuss the image acquisition stage. The system processes a gray-level bitmap from a scanner or reads directly from image files, with the input bitmap generally being 300 dpi with 256 gray levels.

3.1 Image pre-processing

Pre-processing might involve any of the standard image-processing operations, including noise removal, blurring, de-skewing, contrast adjustment, sharpening, binarization, and morphology. Several operations may be necessary to prepare a raw input image for recognition, such as the selection of an area of interest, the elimination of nonmusical elements, image binarization, and the correction of image rotation.

Many documents, particularly historical documents such as Kunqu Opera GCN scores, rely on careful pre-processing to ensure good overall system performance and, in some cases, to significantly improve recognition performance. In this work, we briefly touch upon the basic pre-processing operations applied to Kunqu Opera GCN scores: binarization, noise removal, skew correction, and the selection of an area of interest. Image binarization uses the algorithm of Otsu [30], noise removal is conducted using basic morphological operations [31], and image rotation is corrected using the least squares criterion [32]. The area of interest in a Kunqu score is surrounded by a frame of wide lines (see Figure 3). The frame is a connected component containing the longest line in the score, so we use the Hough transform [33] to locate the longest line and then delete the connected component containing it, leaving the area of interest. Because these are common image-processing operations, we do not describe them in detail here.
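A minimal sketch of this pre-processing chain, using OpenCV, is given below. The thresholds, kernel size, and Hough parameters are illustrative assumptions, not the settings used in KOMR, and skew correction is omitted for brevity:

```python
import cv2
import numpy as np

def preprocess(path):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)      # 300-dpi, 256 gray levels

    # Otsu binarization; invert so that ink (foreground) becomes white (255).
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    # Remove small noise with a basic morphological opening.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 2))
    binary = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)

    # Locate the longest line (part of the surrounding frame) via the Hough transform.
    lines = cv2.HoughLinesP(binary, 1, np.pi / 180, threshold=200,
                            minLineLength=binary.shape[1] // 2, maxLineGap=5)
    if lines is not None:
        longest = max(lines[:, 0, :],
                      key=lambda l: np.hypot(l[2] - l[0], l[3] - l[1]))
        # Delete the connected component containing the longest line,
        # leaving only the area of interest inside the frame.
        _, labels = cv2.connectedComponents(binary)
        frame_label = labels[longest[1], longest[0]]   # label at one line endpoint
        binary[labels == frame_label] = 0
    return binary
```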

3.2 Document segmentation and analysis of GCN scores

Document segmentation is a key phase of the KOMR system. Based on the classification of the symbols in a GCN score document, this stage first segments the document into two sections, one containing the note symbols and the other the non-note symbols. Non-note elements, such as the title, key signature, qupai, lyrics, noise, and the border lines of the textual framework, are then identified and removed.

Because music is a time-based art, the arrangement of the notes is one of its most important factors. Therefore, obtaining the arrangement of the notes in a GCN score is a prerequisite for document segmentation. Concordant with the writing style of GCN scores, the arrangement of the notes can be organized based on high-level domain knowledge of Kunqu Opera scores.

Several document image segmentation methods have been proposed, with the best known being X-Y projection [34], the run length smoothing algorithm (RLSA) [35], component grouping [36], scale-space analysis [37], Voronoi tessellation [38], and the Hough transform [39]. Several of these are suitable for segmenting handwritten documents such as GCN scores; in particular, X-Y projection, RLSA, scale-space analysis, and the Hough transform are applicable [40].

A preliminary result of GCN score segmentation was presented in [41]. A self-adaptive RLSA was used to segment the image according to an X-axis function PF(x), giving the number of foreground pixels in each column of the image. The algorithm counts the flex points of this X-projection, that is, the points satisfying (PF(x - 1) < PF(x) and PF(x) > PF(x + 1)) or (PF(x - 1) > PF(x) and PF(x) < PF(x + 1)). Next, it iteratively smooths the function, continuing until the number of flex points in two successive smoothed functions is equal. Finally, the image is segmented into several sub-images at the X-axis values of the flex points. To extract notes from the image, all connected components are identified using a conventional connected component labeling algorithm, and the minimum bounding box of each connected component is computed. The algorithm then assigns each connected component to its corresponding sub-image. According to the experimental results in [41], the rate of correct line segmentation is 98.9% and the loss rate of notes is 2.7%; however, the total error rate over all notes and lyrics is almost 22%.
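The sketch below illustrates the flex-point analysis just described. It is a simplified reading of [41]: the moving-average smoothing window and the choice of minima as cut points are our own assumptions:

```python
import numpy as np

def flex_points(pf):
    """Indices x where PF changes direction (local maxima or minima)."""
    return [x for x in range(1, len(pf) - 1)
            if (pf[x-1] < pf[x] > pf[x+1]) or (pf[x-1] > pf[x] < pf[x+1])]

def segment_columns(binary):
    """Split a binary GCN score image into column sub-images at projection flex points."""
    pf = (binary > 0).sum(axis=0).astype(float)    # PF(x): foreground pixels per column

    # Iteratively smooth until two successive functions have equally many flex points.
    while True:
        smoothed = np.convolve(pf, np.ones(5) / 5, mode='same')
        if len(flex_points(smoothed)) == len(flex_points(pf)):
            break
        pf = smoothed

    # Cut between writing columns at the local minima of the smoothed projection.
    minima = [x for x in flex_points(pf) if pf[x-1] > pf[x] < pf[x+1]]
    bounds = [0] + minima + [binary.shape[1]]
    return [binary[:, a:b] for a, b in zip(bounds[:-1], bounds[1:])]
```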

3.3 Symbol feature extraction

Selecting suitable features for pattern classes is a critical process in an OMR system. Feature representation is also a crucial step, because good feature data can effectively enhance the symbol recognition rate. The goal of feature extraction is to characterize a symbol to be recognized by measurements whose values are highly similar for symbols in the same category but different for symbols in different categories. The feature data must also be invariant under relevant transformations, such as translation, rotation, and scaling.

Popular feature extraction methods for OCR and Chinese character recognition include peripheral features [42], cellular features [43], and others [44]. Because symbols are written in different sizes in a GCN score, four types of structural features, based on [42, 43], are used in this exploratory work to form a simple and intuitive approach; they are well suited to extracting the features of GCN symbols and allow the recognition rates of KOMR to be compared. An n × m feature matrix is computed for each symbol so that all symbols yield feature data of the same size; the matrix is obtained by partitioning the symbol with a grid [45]. In Figure 7, a sample symbol of size H × W is shown in sub-graph (a), and an n × m grid is shown in sub-graph (b). In this example, h0 = 0, hn = H, hj = (H/n) × j, w0 = 0, wm = W, and wi = (W/m) × i, and the four features of each symbol are given by the following equations:

$$T_1 = \left(t_{i,j}\right),\qquad t_{i,j} = \sum_{x=w_{i-1}}^{w_i}\,\sum_{y=h_{j-1}}^{h_j} f(x, y),\qquad 1 \le i \le m,\; 1 \le j \le n$$
(1)
$$T_2 = \left(t_{i,j}\right),\qquad t_{i,j} = \sum_{x=w_{i-1}}^{w_i} f(x, h_j),\qquad 1 \le i \le m,\; 1 \le j \le n$$
(2)
$$T_3 = \left(t_{i,j}\right),\qquad t_{i,j} = \sum_{y=h_{j-1}}^{h_j} f(w_i, y),\qquad 1 \le i \le m,\; 1 \le j \le n$$
(3)
$$T_4 = \left(t_{i,j}\right),\qquad t_{i,j} = \begin{cases} 0, & \text{for } 1 < i < m \text{ and } 1 < j < n\\ x, & \text{if } \sum_{k=0}^{x} f(k, h_j) = 0 \text{ and } f(x+1, h_j) = 1, \text{ for } j = 1\\ W - x, & \text{if } \sum_{k=x}^{w_m} f(k, h_j) = 0 \text{ and } f(x-1, h_j) = 1, \text{ for } j = m\\ x, & \text{if } \sum_{k=0}^{x} f(w_i, k) = 0 \text{ and } f(w_i, x+1) = 1, \text{ for } i = 1\\ H - x, & \text{if } \sum_{k=x}^{h_n} f(w_i, k) = 0 \text{ and } f(w_i, x-1) = 1, \text{ for } i = n \end{cases}$$
(4)

where f(x, y) is the value of the pixel at coordinates (x, y) in the image: f(x, y) = 1 indicates a foreground pixel and f(x, y) = 0 a background pixel. The T1 feature is the number of foreground pixels in each cell of the grid (a cellular feature); T2 is the number of foreground pixels falling on the upper edge of each grid cell; T3 is the number of foreground pixels falling on the right edge of each grid cell; and T4 measures, for cells on the grid boundary, the number of background pixels from the image edge to the first foreground pixel. By convention, if f(1, 1) = 1, then t1,1 = 0; if f(n, 1) = 1, then tn,1 = 0; if f(1, m) = 1, then t1,m = 0; if f(n, m) = 1, then tn,m = 0; and if the image contains no foreground pixel, then ti,j = 0 for 1 ≤ i ≤ m and 1 ≤ j ≤ n.

Figure 7

Matrix featuring data of a musical symbol using an n×m grid partition (a to d).
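As an illustration, the first three features can be computed as follows for a binary symbol image stored as a numpy array (foreground = 1). This is a sketch under our own conventions; the boundary handling and the omission of the peripheral feature T4 are simplifications:

```python
import numpy as np

def grid_features(sym, n=16, m=16):
    """Compute the T1 (cell), T2 (upper-edge), and T3 (right-edge) features
    of a binary symbol image sym of size H x W (foreground pixels = 1)."""
    H, W = sym.shape
    h = [round(H * j / n) for j in range(n + 1)]    # h_0 = 0, ..., h_n = H
    w = [round(W * i / m) for i in range(m + 1)]    # w_0 = 0, ..., w_m = W

    T1 = np.zeros((m, n)); T2 = np.zeros((m, n)); T3 = np.zeros((m, n))
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cell = sym[h[j-1]:h[j], w[i-1]:w[i]]
            T1[i-1, j-1] = cell.sum()                               # Equation 1
            T2[i-1, j-1] = sym[min(h[j], H-1), w[i-1]:w[i]].sum()   # Equation 2: line y = h_j
            T3[i-1, j-1] = sym[h[j-1]:h[j], min(w[i], W-1)].sum()   # Equation 3: line x = w_i
    return T1, T2, T3
```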

3.4 Musical symbol recognition

Computer recognition of handwritten characters has been intensely researched for many years. Optical character recognition (OCR) is an active field, particularly for handwritten documents in scripts such as Roman [45], Arabic [46], Chinese [47], and Indian [48].

Several Chinese character recognition methods have been proposed, with the best known being the transformation invariant matching algorithm [49], adaptive confidence transform based classifier combination [50], probabilistic neural networks [51], radical decomposition [52], statistical character structure modeling [53], Markov random fields [54], and affine sparse matrix factorization [55].

The basic symbols for musical pitch (see Figure 2) in a Kunqu Opera GCN score are Chinese characters, but the other musical pitch symbols and all rhythm symbols are specialized GCN symbols. Thus, methods for recognizing Kunqu Opera GCN scores draw on the techniques of both OMR and OCR. In this paper, the following three approaches to musical symbol recognition are compared.

3.4.1 K-nearest neighbor

The K-nearest neighbor (KNN) classifier is one of the simplest machine learning algorithms. It classifies samples based on the closest training examples in the feature space and is a type of instance-based, or lazy, learning in which the function is only approximated locally and all computation is deferred until classification [56]. The neighbors are selected from a set of samples for which the correct classification is known; this set can be considered the training set for the algorithm, though no explicit training step is required.

In the experiment described in this paper, half of the test musical symbols were chosen as training samples, and for computational convenience, the value of K was set to 1. The feature data of each class were obtained by averaging the feature data of all its samples. Although the distance function can also be learned during training, the distance functions corresponding to the four features above are defined by the following equations:

$$f(S, T)_1 = \frac{m \times n - \sum_{i=1}^{m}\sum_{j=1}^{n} \rho}{m \times n},\qquad \rho = \begin{cases} 1, & \text{if } s_{i,j} = t_{i,j}\\ 0, & \text{otherwise} \end{cases}$$
(5)
$$f(S, T)_2 = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n} \left|s_{i,j} - t_{i,j}\right|}{m \times n}$$
(6)
$$f(S, T)_3 = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n} \left|s_{i,j} - t_{i,j}\right|}{m \times n}$$
(7)
$$f(S, T)_4 = \frac{m \times n - \sum_{i=1}^{m}\sum_{j=1}^{n} \rho}{m \times n},\qquad \rho = \begin{cases} 1, & \text{if } \alpha < \dfrac{t_{i,j}}{s_{i,j}} < \beta\\ 0, & \text{otherwise} \end{cases}$$
(8)

where f(S, T)_{1-4} are the distance functions for features 1 to 4, s_{i,j} is the feature matrix of the prototype class S, and t_{i,j} is the feature matrix of the test sample T. In Equation 5, if s_{i,j} = t_{i,j}, 1 ≤ i ≤ m, 1 ≤ j ≤ n, then ρ = 1; otherwise, ρ = 0, so f(S, T)_1 counts the proportion of unequal elements in the feature matrices S and T. In Equations 6 and 7, f(S, T) is the normalized sum of the differences between all corresponding elements of the feature matrices S and T. Equation 8 counts the non-approximate elements of S and T, with the parameters α and β chosen empirically; in this work, α = 0.9 and β = 1.1.
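A minimal 1-NN sketch over class-average feature matrices follows; the distance functions implement Equations 5, 6/7, and 8 (the zero-division guard in the ratio test is our own assumption):

```python
import numpy as np

def distance_1(s, t):
    """Equation 5: proportion of unequal elements between feature matrices."""
    return np.mean(s != t)

def distance_2(s, t):
    """Equations 6 and 7: normalized sum of absolute element differences."""
    return np.mean(np.abs(s - t))

def distance_4(s, t, alpha=0.9, beta=1.1):
    """Equation 8: proportion of non-approximate elements (ratio outside (alpha, beta))."""
    with np.errstate(divide='ignore', invalid='ignore'):
        ratio = np.where(s != 0, t / s, np.inf)    # guard against division by zero
    return 1.0 - np.mean((alpha < ratio) & (ratio < beta))

def classify_1nn(test_feat, prototypes, dist=distance_1):
    """prototypes: dict mapping class label -> average feature matrix of its samples."""
    return min(prototypes, key=lambda label: dist(prototypes[label], test_feat))
```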

3.4.2 Bayesian decision theory

Bayesian decision theory is used in many statistics-based methods. Classifiers based on it are simple probabilistic classifiers that apply Bayes' rule with conditional independence assumptions, providing a simple approach to discriminative classification learning [57].

In this work, the conditional probabilities P(T|S_k)_r, 1 ≤ r ≤ 4, for each of the four features of the Bayesian decision theory classifier are calculated as follows:

$$P(T \mid S_k)_1 = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n} \rho}{m \times n},\qquad \rho = \begin{cases} 1, & \text{if } s_{i,j}^{k} = t_{i,j}\\ 0, & \text{otherwise} \end{cases}$$
(9)
$$P(T \mid S_k)_2 = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n} \left|s_{i,j}^{k} - t_{i,j}\right|}{m \times n}$$
(10)
$$P(T \mid S_k)_3 = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n} \left|s_{i,j}^{k} - t_{i,j}\right|}{m \times n}$$
(11)
$$P(T \mid S_k)_4 = \frac{m \times n - \sum_{i=1}^{m}\sum_{j=1}^{n} \rho}{m \times n},\qquad \rho = \begin{cases} 1, & \text{if } \alpha < \dfrac{t_{i,j}}{s_{i,j}^{k}} < \beta\\ 0, & \text{otherwise} \end{cases}$$
(12)

where P(T|S_k)_{1-4} is the conditional probability for the corresponding feature 1 to 4, S = {S_1, …, S_k, …, S_c} is the set of prototype classes, s_{i,j}^{k} is the feature matrix of the k-th class S_k in the set of prototype classes, and t_{i,j} is the feature matrix of the test sample T. In Equation 9, if s_{i,j}^{k} = t_{i,j}, then ρ = 1; otherwise, ρ = 0. In Equation 12, α and β are again empirically chosen values; in this work, we set α = 0.9 and β = 1.1.

In the experiment, the prior probabilities P(S_k), 1 ≤ k ≤ c, of the different symbols S_k are not equal; all prior probabilities were estimated from statistics of the training dataset (for example, one beat symbol has the prior probability 0.354511). The Bayesian rule $P(S_k \mid T) = P(T \mid S_k)_r P(S_k) / \sum_{i=1}^{c} P(T \mid S_i)_r P(S_i)$, 1 ≤ r ≤ 4, is used, giving four expressions for estimating P(S_k|T), one per feature. If P(S_k|T) = max_j P(S_j|T), then T is classified to S_k.
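The decision rule can be sketched as follows (a minimal illustration using the feature-1 likelihood of Equation 9; the dictionary-based interface is our own):

```python
import numpy as np

def likelihood_1(s_k, t):
    """Equation 9: proportion of equal elements between class and test feature matrices."""
    return np.mean(s_k == t)

def bayes_classify(test_feat, prototypes, priors, likelihood=likelihood_1):
    """Assign test_feat to the class S_k maximizing P(S_k|T).

    prototypes: dict label -> class feature matrix s^k
    priors:     dict label -> P(S_k), estimated from the training set
    """
    scores = {k: likelihood(prototypes[k], test_feat) * priors[k]
              for k in prototypes}
    total = sum(scores.values())                  # denominator of the Bayesian rule
    posterior = {k: v / total for k, v in scores.items()}
    return max(posterior, key=posterior.get)
```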

3.4.3 Genetic algorithm

In the field of artificial intelligence, a genetic algorithm (GA) is a search heuristic that mimics the process of natural evolution. This heuristic is routinely used to generate useful solutions to optimization and search problems. GAs generate solutions to optimization problems using techniques inspired by natural evolution, such as inheritance, mutation, selection, and crossover. In GAs, the search space parameters are encoded as strings, and a collection of strings constitutes a population. The processes of selection, crossover, and mutation continue for a fixed number of generations or until some condition is satisfied. GAs have been applied in such fields as image processing, neural networks, and machine learning [58].

In this work, the key GA parameter values are as follows:

  • Selection-reproduction rate: ps = 0.5

  • Crossover rate: pc = 0.6

  • Mutation rate: pm = 0.05

  • Population class: C = 12

  • Number of individuals in the population: Np = 200

  • Maximum iteration number: G = 300

An individual's fitness value is determined from the following fitness function:

$$F(I_u) = \sum_{v=1}^{R} f\!\left(e(h_{u,v}, s_{u,v})\right) = \sum_{v=1}^{R} f(\rho) = \sum_{v=1}^{R} \frac{\rho}{m \times n},\qquad \rho = \begin{cases} 1, & \text{if } h_{u,v} = s_{u,v}\\ 0, & \text{otherwise} \end{cases}$$
(13)

where I_u is the u-th individual, F(I_u) is the fitness value of I_u, R is the number of gene bits in I_u, h_{u,v} is the v-th gene of I_u, s_{u,v} is the element of the prototype-class feature matrix corresponding to h_{u,v}, the function e() computes the comparison between h_{u,v} and s_{u,v}, and m and n are the width and length, respectively, of a symbol's grid.
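A schematic GA loop with the parameter values listed above is sketched below. The binary gene encoding, single-point crossover, and elitist selection are illustrative choices, and the fitness function follows the element-wise comparison of Equation 13:

```python
import random

PS, PC, PM = 0.5, 0.6, 0.05        # selection-reproduction, crossover, mutation rates
NP, G = 200, 300                   # population size, maximum iteration number

def fitness(individual, prototype):
    """Equation 13 (up to normalization): fraction of gene bits matching the prototype."""
    return sum(h == s for h, s in zip(individual, prototype)) / len(individual)

def evolve(prototype, n_bits):
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(NP)]
    for _ in range(G):
        pop.sort(key=lambda ind: fitness(ind, prototype), reverse=True)
        survivors = pop[: int(PS * NP)]                 # selection-reproduction
        children = []
        while len(survivors) + len(children) < NP:
            a, b = random.sample(survivors, 2)
            if random.random() < PC:                    # single-point crossover
                cut = random.randrange(1, n_bits)
                a = a[:cut] + b[cut:]
            children.append([1 - g if random.random() < PM else g for g in a])
        pop = survivors + children
    return max(pop, key=lambda ind: fitness(ind, prototype))
```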

3.5 Semantic analysis and MIDI representation

After all the stages of the OMR system are complete (see Figure 6), the recognized symbols can be employed to write the score in different data formats, such as MIDI, Nyquist, MusicXML, WEDELMUSIC, MPEG-SMR, notation interchange file format (NIFF), and standard music description language (SMDL). Although different representation formats are now available, no standard exists for computer symbolic music representation. However, MIDI is the most popular digital music format in modern China, analogous to the status of the GCN format in ancient China.

This work selected MIDI for music representation because the musical melody in a Kunqu Opera GCN score is monophonic. Thus, the symbols recognized from a GCN score can be represented by an array melody[L][2], where L is the number of notes in the score and each note's entry stores its pitch in the first column and its duration in the second.

Finally, the note information in melody[L][2] can be transformed into a MIDI file using an associated programming language, such as Visual C++, and the MIDI file of the Kunqu Opera GCN score can then be disseminated globally via the Internet.
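As an illustration, the same conversion can be written in a few lines of Python using the third-party midiutil package (an alternative to the Visual C++ route mentioned above; the encoding assumed here is MIDI note numbers for pitch and beats for duration):

```python
from midiutil import MIDIFile   # pip install MIDIUtil

def melody_to_midi(melody, path, tempo=90):
    """melody: list of (midi_pitch, duration_in_beats) pairs decoded from a GCN score."""
    mf = MIDIFile(1)                           # one track for the monophonic melody
    mf.addTempo(track=0, time=0, tempo=tempo)
    t = 0
    for pitch, duration in melody:
        mf.addNote(track=0, channel=0, pitch=pitch,
                   time=t, duration=duration, volume=100)
        t += duration                          # notes follow one another sequentially
    with open(path, "wb") as f:
        mf.writeFile(f)

# Example: melody_to_midi([(72, 1.0), (74, 0.5), (76, 0.5)], "kunqu.mid")
```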

4. Experimental results

A Chinese Kunqu Opera GCN score may contain information in the form of Chinese characters and musical notes, such as lyrics, title, tune, pitch, rhythm, voices, and a musical preface. In this experiment, a dataset was randomly selected from two Kunqu score books [4, 5]. Because OCR for Chinese characters is already actively researched [47] and OMR for musical information is a developing field, we consider only the musical note information in this experiment: pitches and rhythm marks (beats), neglecting other information such as lyrics (Chinese characters).

First, all scores in the set were segmented according to [41], and then the musical note information, including pitches and beats (rhythm marks), was decoded. Finally, the musical information was translated to a MIDI file.

4.1 Training and test datasets

To estimate the efficiency of KOMR, experiments were conducted on thirty-nine images randomly selected from Nashu Studio Theatrical Music [5] and Jiugong Dacheng Nanbei Ci Gongpu [4]. The primary information for these images, including the number of pages, pitches, and beats, is shown in Table 3. The image resolutions ranged from approximately 680 × 880 to 720 × 900 pixels. Figure 3 displays a typical musical score sheet, from page 1983 of Nashu Studio Theatrical Music.

Table 3 Information on the 39 selected images

To improve the recognition rate of all classifiers, each symbol was partitioned with a grid of resolution 16 × 16 (the n × m grid above). Each symbol was thus resized to 16 × 16 pixels and converted to a 256-dimensional vector, the feature datum of the symbol.

To create the training datasets, we randomly selected images from the available data. Specifically, the images on pp. 1975-1984 and pp. 4326-4335 (see Table 3) constituted the training sets, and the remaining images formed the test sets.

4.2 Results

To evaluate accuracy, the classifiers were compared on the test GCN scores. The three classifiers' pitch recognition rates for the test images are shown in Figure 8, with (a) the KNN classifier (in this work, K = 1), (b) the Bayesian classifier, and (c) the GA classifier. The beat recognition rates for (d) the KNN classifier and (e) the Bayesian classifier are also shown. The recognition rate obtained by KNN is higher than that of the other two classifiers.

Figure 8

Recognition rate of 39 images using four features. X-axis: serial number of pages; Y-axis: (a to c) recognition rate of pitch using three classifiers and (d, e) recognition rate of beat using two classifiers. (f) Average recognition rate of pitch and beat.

From the results for the musical notes, shown in Table 4, it can be seen that the KNN classifier outperformed the other two classifiers; notably, its performance was clearly superior to that of the GA classifier. The recognition rate for beats is greater than that for pitches, and the results in Figure 8 indicate that the recognition rate of the beats in Nashu Studio Theatrical Music is greater than that in Jiugong Dacheng Nanbei Ci Gongpu. This is because the beats in the first book were written with seven different symbols, whereas only four different symbols were used in the second book; the beat features in the first book are therefore more distinct from one another, while those in the second book offer less discrimination. Meanwhile, Figure 8 and Table 4 suggest that the choice among the four features has little effect on the recognition rate.

Table 4 Experimental statistics of the recognition rate for all pitches and all beats using features 1 to 4

Following the recognition stage, the musical information of a GCN score was saved to an array melody[L][2], which was transformed into MIDI format and stored as a MIDI file (Additional file 1). Figure 9 shows the sheet score of the MIDI file in WMN based on the recognition of musical information from Figure 3.

Figure 9

Sheet score of the MIDI file in WMN for Figure 3.

5. Conclusions

As one of UNESCO's Masterpieces of the Oral and Intangible Heritage of Humanity, Kunqu Opera had a deep effect on ancient Chinese entertainment and has been studied in many ways. GCN is the most widely used recording method for ancient Chinese musical information, employed in a significant body of musical works, including all Kunqu Opera scores.

This paper presented the six-stage KOMR system, whose key phases are image segmentation and symbol recognition. The paper focused on the recognition phase, studying the effectiveness of three classifiers in obtaining musical information from GCN scores. The results indicate that a KNN classifier is the most suitable for this task.

However, because the average recognition rate of KOMR is below 90%, more powerful methods applicable to all phases of KOMR are needed; these problems merit further research.

This work is preliminary and tentative, but it will help to preserve and popularize Chinese cultural heritage. In particular, this study will assist the conversion of handwritten Kunqu Opera musical scores to a machine-readable format, thus making it possible to disseminate and perform the original Kunqu Opera musical scores.