Introduction

According to a 2019 WHO report, 34 million children younger than 14 years of age have disabling hearing loss [1]. Childhood hearing loss is a public health concern because of its deleterious influence on speech and language development, educational performance and social-emotional development, as well as the heavy financial burden it places on health care systems and society [2,3,4]. To maximize speech and language competence in hearing-impaired children, the Joint Committee on Infant Hearing (JCIH) issued guidelines for Early Hearing Detection and Intervention (EHDI) programs for infants in 2000, which were updated in 2007 and 2019. These guidelines emphasize the importance of early auditory evaluation and intervention [5,6,7].

Apart from the various tests that assess hearing thresholds, auditory outcome measures also play an important role in the auditory evaluation of children [8]. Auditory outcome measures collect information on a child’s ability to detect, discriminate, identify and comprehend sounds, information that is almost impossible to obtain from audiometric tests [9]. A number of auditory outcome measurement tools are available. One of them, the Infant–Toddler Meaningful Auditory Integration Scale (ITMAIS), evaluates the early prelingual auditory development (EPLAD) of infants and toddlers in terms of detection, discrimination and identification of sounds, based on parental reports of children’s auditory behaviors in daily routines [9,10,11,12]. Because it saves time and does not depend on test conditions or the child’s compliance, the ITMAIS has been translated into many languages and is widely used for EPLAD evaluation [13,14,15,16]. Its usefulness is further supported by the high Cronbach’s alpha, split-half reliability and item-total correlations reported for the different language versions, which indicate sound psychometric properties [13,14,15,16,17].

It is noteworthy that the satisfactory psychometric properties of the ITMAIS have so far been established using classical test theory (CTT) [18]. CTT assumes that the observed score is a linear combination of an underlying true score and random error [19]. The true score is essentially the expected value of the quantity being measured (e.g., EPLAD) over infinite administrations of the same assessment (e.g., ITMAIS), and would equal the observed score only if the assessment were free of random error [20]. Random error, the difference between the observed score and the true score, is assumed to be normally distributed and uncorrelated with the true score. CTT mainly addresses two kinds of psychometric properties: reliability and validity [21]. Reliability concerns the consistency between the true score and the observed score: the higher the reliability, the better the observed score represents the true score. Validity is the capacity of a scale to assess what it is intended to assess [19, 22]. Because it is easy to apply and effective in evaluating test–retest reliability and the external structure of a scale, CTT has been widely used for decades to evaluate the psychometric characteristics of scales.
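The CTT decomposition can be illustrated with a small simulation. The sketch below is not part of the original analysis; the function names, the score mean of 50 and the standard deviations are hypothetical choices made only for illustration. It generates true scores and independent normal errors, and estimates reliability as the ratio of true-score variance to observed-score variance, which is the usual CTT definition.

```python
import random
import statistics


def simulate_ctt(n_children, true_sd, error_sd, seed=0):
    """Simulate observed = true + error with independent, normally distributed errors."""
    rng = random.Random(seed)
    # Hypothetical true scores centred on 50 (arbitrary illustrative value).
    true_scores = [rng.gauss(50.0, true_sd) for _ in range(n_children)]
    observed = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    return true_scores, observed


def ctt_reliability(true_scores, observed):
    """Reliability as the share of observed-score variance due to true scores."""
    return statistics.pvariance(true_scores) / statistics.pvariance(observed)


true_scores, observed = simulate_ctt(20000, true_sd=10.0, error_sd=5.0)
# With these settings the theoretical reliability is 10^2 / (10^2 + 5^2) = 0.8;
# the simulated estimate should be close to that value.
print(round(ctt_reliability(true_scores, observed), 2))
```

In real data the true score is never observed, which is why CTT relies on indirect estimates such as Cronbach’s alpha or test–retest correlations.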

In contrast to CTT, item response theory (IRT) uses non-linear mathematical models and estimates both item parameters and the latent traits of individual subjects on a common scale [19, 23]. The models used in IRT analyses differ in functional form and in the number of item parameters estimated. Specifically, item parameters in the IRT framework are derived from the mathematical model rather than from response proportions or item-total correlations. Furthermore, the estimated parameters are stable and independent of the particular sample, provided the samples are drawn from the same population. However, the fundamental assumptions (i.e., unidimensionality, local independence and monotonicity) and model fit must be evaluated before IRT modeling and parameter estimation. Despite its strict assumptions and demanding mathematical requirements, IRT is increasingly being applied to patient-reported outcome measures [18, 19, 24]. In light of the advantages and disadvantages of the two theories, a combination of CTT and IRT has been suggested and implemented in the modification and validation of outcome measures, including auditory-specific patient-reported outcome measures [20, 22, 25, 26].

Therefore, the present study aimed to combine IRT and CTT into a comprehensive and complementary approach to the psychometric analysis of the ITMAIS. The characteristics of each ITMAIS item were analyzed on a common scale using IRT, and the scale was then modified by trimming poorly performing items without affecting the scale parameters. The psychometric properties of the modified ITMAIS (ITMAIS-m) were re-evaluated within the CTT framework.

Materials and methods

Study design

The present study comprised two stages. In Stage 1, a retrospective study was conducted to analyze and modify the ITMAIS within the IRT framework. In Stage 2, the reliability and validity of the ITMAIS-m were verified in a separate sample using CTT. As part of the validity evaluation, the relationships of the ITMAIS-m with individual pure tone average threshold (PTA) and hearing grade were examined. The study was conducted in accordance with the principles of the Declaration of Helsinki, and the study protocol was approved by the Biomedical Ethics Committee of West China Hospital of Sichuan University.

Participants

In Stage 1, a total of 1983 Chinese children with different hearing grades and different types of hearing loss were recruited from the Hearing Center database of the West China Hospital of Sichuan University, Sichuan, China, from Nov. 2006 to Jun. 2017. A total of 3404 ITMAIS assessments had been undertaken before or after auditory intervention. After exclusion of cases with missing clinical data or item information, 1730 children (median age and interquartile range (IQR) 29.0 (17.6, 41.9) months) with 3092 ITMAIS assessments were included in the final statistical analysis; 642 of these children were assessed more than once.

In Stage 2, Chinese children with normal hearing or permanent hearing loss were recruited from the Hearing Center database from Jul. 2018 to Jun. 2019. Individuals with possible fluctuating hearing loss, confirmed auditory neuropathy spectrum disorder or disorders of other systems were excluded to avoid heterogeneous effects on the ITMAIS-m assessment and, in turn, on the validity analysis. Participants in Stage 1 were not eligible for Stage 2. A total of 450 children (median age and IQR 5.7 (3.6, 9.3) months) each provided one ITMAIS-m assessment for analysis (at most one unanswered item was allowed per assessment). Of these participants, 93 were simultaneously assessed with the LittlEARS Auditory Questionnaire (LEAQ). Children in Stage 2 were subdivided into five age groups: 0–3.0 months, 3.1–6.0 months, 6.1–9.0 months, 9.1–16.0 months and 16.1–24.0 months.

Assessment tools

The ITMAIS used was the Chinese version translated by Zheng et al. [13] (see Additional file 1). The first item, which relates to reliance on auditory instruments, is not suitable for assessing children without auditory intervention. Consequently, item 1 was excluded and the assessment in the present study involved 9 items. Through a structured interview with parents or caregivers that typically took 10 min, a trained audiologist scored the frequency of meaningful auditory incidents observed in the child’s daily routines. Each item was scored from 0 to 4, where 0 indicated an incident that was never observed and 1, 2, 3 and 4 indicated incidents that were rarely, occasionally, frequently and always observed, respectively. The total score was expressed as a percentage by dividing the actual score by the maximum score. The ITMAIS-m was assessed in the same manner.
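The percentage scoring described above can be sketched in a few lines of Python. This is only an illustration: the function name and the example responses are hypothetical, and the sketch assumes that every administered item was answered (the handling of an unanswered item is not specified in the text).

```python
def itmais_percentage(item_scores, max_item_score=4):
    """Total score as a percentage of the maximum attainable score.

    item_scores: one integer per administered item, each from 0 (never)
    to max_item_score (always). Assumes no unanswered items.
    """
    if not item_scores:
        raise ValueError("at least one item score is required")
    for score in item_scores:
        if not isinstance(score, int) or not 0 <= score <= max_item_score:
            raise ValueError(f"item scores must be integers from 0 to {max_item_score}")
    return 100.0 * sum(item_scores) / (max_item_score * len(item_scores))


# Hypothetical responses to the 9 administered items (items 2-10).
example_scores = [4, 3, 3, 2, 4, 1, 2, 3, 2]
print(round(itmais_percentage(example_scores), 1))  # 24 of 36 points -> 66.7
```

Because the denominator is derived from the number of items supplied, the same function applies to the 9-item ITMAIS and to the shorter ITMAIS-m.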

The LEAQ is another structured interview questionnaire that assesses early auditory development in children under 2 years of age [27]. In the present study, an audiologist supported parents or caregivers in completing the LEAQ to avoid any misunderstanding of the questions. The total score was the number of items answered ‘yes’.

Audiological tests

Children underwent the auditory test battery following the ITMAIS-m or LEAQ assessment. Hearing grades and types were diagnosed from air- and bone-conduction tone burst auditory brainstem responses, combined with otoacoustic emissions, acoustic immittance and behavioral audiometry. PTA was calculated from thresholds at 500, 1000, 2000 and 4000 Hz. Based on PTA, hearing loss was graded as mild, moderate, severe or profound according to the WHO criteria [28].
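As a minimal sketch of the PTA calculation, the Python snippet below assumes that PTA is the arithmetic mean of the four thresholds and that the better ear is the ear with the lower PTA. The function names and threshold values are hypothetical, and the WHO grade cut-offs are deliberately not reproduced here because they are not stated in the text.

```python
PTA_FREQUENCIES_HZ = (500, 1000, 2000, 4000)


def pure_tone_average(thresholds_db):
    """Mean threshold (dB HL) across the four PTA frequencies.

    thresholds_db: mapping from frequency in Hz to threshold in dB HL.
    The arithmetic mean is an assumption of this sketch.
    """
    return sum(thresholds_db[f] for f in PTA_FREQUENCIES_HZ) / len(PTA_FREQUENCIES_HZ)


def better_ear_pta(left_thresholds_db, right_thresholds_db):
    """PTA of the better-hearing ear, i.e. the lower of the two PTAs."""
    return min(pure_tone_average(left_thresholds_db),
               pure_tone_average(right_thresholds_db))


# Hypothetical thresholds for one child.
left = {500: 30, 1000: 40, 2000: 50, 4000: 60}
right = {500: 20, 1000: 25, 2000: 30, 4000: 45}
print(pure_tone_average(left))      # 45.0
print(better_ear_pta(left, right))  # 30.0
```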

Statistical analysis

Stage 1: Item analysis and modification of ITMAIS

Item analysis and modification of the ITMAIS in Stage 1 were performed with the lavaan, mokken, mirt and lordif packages in R 3.5.3, following the psychometric evaluation plan recommended by Reeve et al. [29, 30].

Item responses and traditional statistic description

Frequencies of missing data, the mean score and the distribution of answer options were calculated for each item. Individuals with any unanswered item were identified and excluded from the analysis in Stage 1. Inter-item correlations between 0.2 and 0.8 were considered acceptable [31].

Assumptions checking before IRT modeling

The assumption of unidimensionality requires that the ITMAIS measures a single dominant latent trait, namely EPLAD. In the present study, this assumption was evaluated by combining exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). The Stage 1 sample was randomly split into two parts of 1546 ITMAIS assessments each, which were used for EFA and CFA, respectively. In the EFA, main factors were extracted by the principal factor solution under parallel analysis and judged by eigenvalues (ratio of the first to the second eigenvalue > 4), proportion of variance explained (> 25%) and factor loadings [29]. The CFA results were judged against the following criteria for good fit: comparative fit index (CFI) > 0.95, Tucker–Lewis index (TLI) > 0.95, root mean square error of approximation (RMSEA) < 0.06 and standardized root mean square residual (SRMR) < 0.08 [29, 32].
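The CFA cut-offs listed above can be applied mechanically once the fit indices have been obtained from the factor-analysis software. The following Python sketch only encodes the four thresholds; the function name and the example index values are hypothetical and do not correspond to any result of the study.

```python
# Good-fit criteria as stated in the text: CFI > 0.95, TLI > 0.95,
# RMSEA < 0.06 and SRMR < 0.08.
CFA_CRITERIA = {
    "CFI": lambda value: value > 0.95,
    "TLI": lambda value: value > 0.95,
    "RMSEA": lambda value: value < 0.06,
    "SRMR": lambda value: value < 0.08,
}


def check_cfa_fit(indices):
    """Return, for each fit index, whether it meets its good-fit criterion."""
    return {name: rule(indices[name]) for name, rule in CFA_CRITERIA.items()}


# Hypothetical fit indices for illustration only.
example = {"CFI": 0.97, "TLI": 0.96, "RMSEA": 0.08, "SRMR": 0.03}
print(check_cfa_fit(example))
# {'CFI': True, 'TLI': True, 'RMSEA': False, 'SRMR': True}
```

Reporting each index separately, rather than a single pass/fail verdict, makes it visible when a model satisfies most criteria but fails one of them.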

In the present study, local independence means that item responses should be unrelated to one another after conditioning on the level of EPLAD. This assumption was assessed with the residual correlations obtained from the 1-factor CFA. Residual correlations below 0.1 were taken to indicate local independence [33, 34].

The monotonicity assumption states that the probability of endorsing higher categories of an ITMAIS item increases as the level of EPLAD increases. It was assessed from plots of the item step response functions and item response functions produced with the mokken package [29].

IRT model fit and parameters evaluation

Among the various models in the IRT family, we chose the graded response model (GRM) because of its flexibility for items with polytomous, ordered responses [29, 35, 36]. After the three assumptions were confirmed, the fit between observed and expected item responses under the GRM was investigated. Items with a p value < 0.001 for the S-X² goodness-of-fit index were considered misfitting [29, 37].

Briefly, under the GRM, the probability that person j endorses category k or higher of item i in the ITMAIS is calculated as follows:

$$P \, (X_{i} \ge \, k|\theta_{j} ) = \exp \left[ {\alpha_{i} \left( {\theta_{j} - \beta_{ik} } \right)} \right]/\left\{ {1 + \exp \, \left[ {\alpha_{i} \left( {\theta_{j} - \beta_{ik} } \right)} \right]} \right\}$$

where αi is the discrimination parameter of item i, βik is the kth difficulty parameter of item i, and θj is the EPLAD level of person j. Each item has its own discrimination parameter, so items may differ in their ability to differentiate children with different levels of EPLAD. The following ranges have been proposed for interpreting the discrimination parameter α: 0.01–0.34 = very low; 0.35–0.64 = low; 0.65–1.34 = moderate; 1.35–1.69 = high; and > 1.70 = very high [38]. The difficulty parameter βik is the level of EPLAD at which the probability of responding in category k or higher of an item is 50%. The GRM allows the spacing between category difficulties to vary across items. The number of difficulty parameters per item equals the number of response categories minus 1. Since each ITMAIS item has 5 response categories, 4 difficulty parameters were estimated for each item.
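To make the GRM formula concrete, the Python sketch below evaluates the cumulative probability P(X ≥ k | θ) and derives the probability of each of the five response categories as the difference between adjacent cumulative probabilities. The function names and the parameter values are hypothetical and are not the estimates obtained in the study.

```python
import math


def grm_cumulative(theta, alpha, beta):
    """P(X >= k | theta) for the threshold with difficulty beta.

    Algebraically identical to exp[a(theta - b)] / {1 + exp[a(theta - b)]}.
    """
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))


def grm_category_probs(theta, alpha, betas):
    """Probabilities of categories 0..K-1 given K-1 increasing difficulties."""
    # P(X >= 0) is 1 by definition and P(X >= K) is 0 by definition.
    cumulative = [1.0] + [grm_cumulative(theta, alpha, b) for b in betas] + [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(betas) + 1)]


# Hypothetical item: discrimination 2.0 and four ordered difficulties.
alpha, betas = 2.0, [-1.0, -0.5, 0.0, 0.5]
probs = grm_category_probs(0.0, alpha, betas)
print([round(p, 3) for p in probs])
print(grm_cumulative(0.0, alpha, 0.0))  # 0.5: theta equals the difficulty
```

The last line illustrates the definition of the difficulty parameter: when θ equals βik, the probability of responding in category k or higher is exactly 50%.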

In the present study, both item information and test information were analyzed. These represent the amount of information that each item, and hence the total scale, provides at a given level of EPLAD. In the IRT framework, item and test information curves graphically show the measurement precision of an item or a scale for subjects with different levels of EPLAD. The more information obtained at a specific level of EPLAD, the higher the precision and reliability of the item or scale at that level [19]. Reliability in the IRT framework is therefore specified at the item level and depends on the individual’s latent trait level.
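Item and test information under the GRM can be computed directly from the model parameters. The sketch below uses the standard textbook expression for polytomous items, I(θ) = Σk [P′k(θ)]² / Pk(θ), which is not given in the text and is stated here as an assumption; test information is the sum of the item information values. Function names and parameter values are hypothetical.

```python
import math


def cumulative_prob(theta, alpha, beta):
    """P(X >= k | theta) under the graded response model."""
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))


def grm_item_information(theta, alpha, betas):
    """Item information: sum over categories of (dP_k/dtheta)^2 / P_k."""
    cum = [1.0] + [cumulative_prob(theta, alpha, b) for b in betas] + [0.0]
    info = 0.0
    for k in range(len(cum) - 1):
        p_k = cum[k] - cum[k + 1]
        # Derivative of each cumulative curve is alpha * P* * (1 - P*).
        dp_k = alpha * (cum[k] * (1.0 - cum[k]) - cum[k + 1] * (1.0 - cum[k + 1]))
        if p_k > 0.0:
            info += dp_k ** 2 / p_k
    return info


def scale_information(theta, items):
    """Test information: sum of item information over (alpha, betas) pairs."""
    return sum(grm_item_information(theta, a, b) for a, b in items)


# Two hypothetical 5-category items.
items = [(2.0, [-1.0, -0.5, 0.0, 0.5]), (3.0, [-0.8, -0.2, 0.3, 0.9])]
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(scale_information(theta, items), 3))
```

With parameters like these, information is concentrated where the difficulty parameters lie and falls away toward the extremes of θ, which is the pattern that information curves are used to reveal.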

Differential item functioning (DIF) evaluation

DIF analysis aimed to identify differences in item responses between children of different genders or with different evaluation times, given equivalent levels of EPLAD. Iterative hybrid ordinal logistic regression was performed to test each item for DIF. An item was considered to show DIF when the magnitude of McFadden pseudo R² exceeded 0.035 [39].

Stage 2: Reliability and validity verification of ITMAIS-m

Verification of the ITMAIS-m was performed with SPSS 21.0 and JASP 0.10.2.0 [40]. Frequencies of missing data, the mean score and the distribution of answer options were calculated for each item of the ITMAIS-m. The reliability of the ITMAIS-m was analyzed with Cronbach’s α, for which 0.7–0.8 indicates acceptable, 0.8–0.9 good and above 0.9 excellent internal consistency [41]. Item-total correlations of the ITMAIS-m were also calculated.
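For readers who wish to reproduce the internal consistency calculation outside SPSS or JASP, the Python sketch below implements the usual formula for Cronbach’s α, α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The function names and the small response matrix are hypothetical, and the assignment of boundary values (exactly 0.8 or 0.9) to a band is an assumption, since the text does not specify it.

```python
import statistics


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(responses[0])
    item_variances = [
        statistics.variance([row[i] for row in responses]) for i in range(n_items)
    ]
    total_variance = statistics.variance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1.0 - sum(item_variances) / total_variance)


def describe_alpha(alpha):
    """Map alpha onto the bands quoted in the text (boundary handling assumed)."""
    if alpha > 0.9:
        return "excellent"
    if alpha >= 0.8:
        return "good"
    if alpha >= 0.7:
        return "acceptable"
    return "below the bands listed in the text"


# Hypothetical responses of four children to two items.
responses = [[1, 2], [2, 1], [3, 3], [4, 4]]
alpha = cronbach_alpha(responses)
print(round(alpha, 3), describe_alpha(alpha))  # 0.889 good
```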

Previous studies have found that hearing grade (classified by PTA) and age at assessment affect ITMAIS scores: children with more severe hearing loss and younger children receive lower scores [13, 42]. Therefore, for convergent validity, Pearson or Spearman rank-order correlations, depending on the distribution of the data, were used to explore the relationships of the ITMAIS-m with PTA (better ear) and age at assessment. The correlation of the ITMAIS-m with another childhood auditory outcome measure (i.e., the LEAQ) was also tested. The strength of correlation was judged by the correlation coefficient r: < 0.3 small, 0.3–0.6 moderate and > 0.6 large [43].

For known-group validity, the power of the ITMAIS-m to discriminate among hearing grades (better ear) was analyzed by one-way analysis of variance, with the effect size among groups expressed as partial eta squared (ηp²). Bonferroni post hoc tests were then performed, and effect sizes between pairs of groups were quantified by Cohen’s d. According to the literature, an effect size ηp² < 0.01 is small, 0.01–0.06 moderate and > 0.14 large [44]. Cohen’s d is considered small (0.2–0.5), moderate (0.51–0.8) or large (> 0.8) [45].
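The two effect sizes can be illustrated as follows. This Python sketch makes two assumptions that are not stated in the text: Cohen’s d is computed with the pooled standard deviation, and ηp² is computed as the between-group sum of squares divided by the total sum of squares, which is what partial eta squared reduces to in a one-way design. Function names and data are hypothetical.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    return (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(pooled_variance)


def eta_squared_one_way(groups):
    """SS_between / SS_total; equals partial eta squared in one-way ANOVA."""
    all_values = [value for group in groups for value in group]
    grand_mean = statistics.mean(all_values)
    ss_between = sum(
        len(group) * (statistics.mean(group) - grand_mean) ** 2 for group in groups
    )
    ss_total = sum((value - grand_mean) ** 2 for value in all_values)
    return ss_between / ss_total


# Hypothetical percentage scores for two hearing-grade groups.
print(cohens_d([2, 4, 6], [1, 3, 5]))                            # 0.5
print(round(eta_squared_one_way([[1, 2, 3], [4, 5, 6]]), 3))     # 0.771
```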

Results

Characteristics of participants in Stages 1 and 2

Characteristics of the participants recruited in Stages 1 and 2 are summarized in Table 1. Children in Stage 1 were significantly older at assessment, with 1086 individuals assessed with the ITMAIS during follow-up between 1 month and 4 years after auditory intervention. Most children in this stage had profound hearing loss (66.2%), and most had sensorineural hearing loss (76.3%), whereas hearing grades in Stage 2 were evenly distributed. The proportions of conductive (1.3%) and mixed (0.2%) hearing loss in Stage 2 were small, since most cases with possible fluctuating hearing loss were excluded.

Table 1 Sample characteristics of Stage 1 and 2

Stage 1: Item analysis and modification of ITMAIS

Item responses and traditional statistic description

In Stage 1, the percentages of missing answers and response options for each item of ITMAIS are presented in an appendix (see Additional file 2). Percentages of missing answers of the nine items ranged from 0.1 to 2.4%. Inter-item correlations ranged from 0.62 to 0.84.

Assumptions checking

EFA demonstrated that the first factor had the largest eigenvalue, 7.01 (accounting for 75% of the variance), with the remaining factors having eigenvalues less than 1. One factor was therefore extracted, and item loadings on this factor ranged from 0.80 to 0.90.

CFA, performed on the other half of the Stage 1 data, showed satisfactory fit of the 1-factor model on all indices except RMSEA (CFI = 0.949, TLI = 0.947, SRMR = 0.030 and RMSEA = 0.134). The 2-factor and 3-factor models, however, did not improve model fit substantially (2-factor model: CFI = 0.971, TLI = 0.969, SRMR = 0.023 and RMSEA = 0.103; 3-factor model: CFI = 0.964, TLI = 0.962, SRMR = 0.026 and RMSEA = 0.118). Taken together, the EFA and CFA results indicated that the ITMAIS met the unidimensionality assumption.

None of the items violated the assumption of local independence, with all residual correlations between items smaller than 0.10. Likewise, the nine items met the assumption of monotonicity: the plots of the item step response functions and item response functions showed that the probabilities of endorsing higher categories of each item increased as the level of EPLAD increased (see Additional file 3).

IRT model fit and parameters evaluation

Five items (items 2, 4, 7, 8 and 9) of the ITMAIS showed unsatisfactory item fit under the GRM (p < 0.001). Because item 2 had a relatively low factor loading (0.80) in the unidimensionality analysis, it was removed; on re-evaluation, only item 9 showed misfit.

The unidimensionality assumption and the item and scale parameters were compared before and after removing item 2. The one-factor model fit of the 8-item ITMAIS (without item 2), with CFI = 0.946, TLI = 0.924, SRMR = 0.031 and RMSEA = 0.154, differed little from that of the original ITMAIS. Item 2 had a discrimination parameter of 2.380 and item information of 1.758, with difficulty parameters ranging from −1.583 to 0.590. After item 2 was removed, the discrimination parameters of the remaining 8 items increased by at most 0.232 (item 4), and the difficulty parameters changed by at most 0.026 (items 4 and 9). Item information of the remaining 8 items increased from a range of 4.487–8.938 to 4.798–9.259, with the largest increase, 0.615, in item 4. Test information of the total scale increased from 47.754 to 48.061.

Figure 1 shows the trace lines of each ITMAIS item. The trace lines show the probability that an individual with a given level of EPLAD selects each response category of an item. As shown in Fig. 1, the response curves of the items were steep and concentrated in the EPLAD range of −1 to 1. This was most evident for item 9, whose response curves were concentrated around an EPLAD level of 0. Compared with the orderly response curves of the other items, the trace lines of item 2 were relatively poor, with some response curves disordered and overlapping.

Fig. 1 Item trace lines of the 9 items (items 2–10) in ITMAIS. The x axis ‘θ’ represents the range of EPLAD. The y axis ‘P(θ)’ represents the probability that an individual with a specified level of EPLAD responds in each category of an item

Differential item functioning (DIF) evaluation

None of the ITMAIS items displayed DIF with respect to gender (male or female) or assessment time (before or after auditory intervention).

Ultimately, the ITMAIS was modified by removing item 2 in Stage 1. The ITMAIS-m demonstrated better item fit, and the item and scale parameters were robust to this modification. Item parameters of the ITMAIS-m are presented in Table 2. Item information of items 3–10 and the test information, before and after removing item 2, are plotted in Figs. 2 and 3.

Table 2 Estimates of discrimination and difficulty parameters of ITMAIS-m, under the GRM

Fig. 2 Item information of items 3–10 of ITMAIS, before and after removing item 2. The solid lines represent item information after removing item 2. The dashed lines represent item information without removing item 2. The x axis ‘theta’ represents the range of EPLAD

Fig. 3 Test information before and after removing item 2. The solid line represents test information of the ITMAIS-m after removing item 2. The dashed line represents test information of ITMAIS without removing item 2. The x axis ‘theta’ represents the range of EPLAD

Stage 2: Reliability and validity verification of ITMAIS-m

Frequencies of missing data, the mean score and the distribution of answer options for each item, as well as the item-total correlations of the ITMAIS-m, are shown in Additional file 4. The item-total correlations of the eight items in the ITMAIS-m ranged from 0.693 to 0.851. The ITMAIS-m exhibited excellent internal consistency, with Cronbach’s α = 0.919.

As shown in Table 3, the correlation of the ITMAIS-m with the LEAQ was 0.932, suggesting strong convergence. The correlations with PTA ranged from −0.670 to −0.909 and varied across age groups. The ITMAIS-m correlated significantly with age at assessment in children with normal hearing or with mild, moderate or severe hearing loss, although the correlation was only moderate in children with severe hearing loss (r = 0.380). There was no significant association between ITMAIS-m and age at assessment in children with profound hearing loss.

Table 3 Correlations of ITMAIS-m with LEAQ, PTA and age

Table 4 shows that ITMAIS-m scores differed significantly among children with different hearing grades (normal hearing to mild hearing loss, moderate hearing loss, and severe to profound hearing loss) within each age range. The effect sizes ηp² among groups ranged from 0.515 to 0.844. Post hoc comparisons showed that, except for the comparison between moderate and severe-profound hearing loss in children within 3 months of age (Cohen’s d = 0.41), effect sizes between hearing grades in the different age ranges were large, with Cohen’s d ranging from 0.93 to 5.83. Effect sizes were larger when discriminating severe-profound hearing loss from the other hearing grades than when discriminating between normal hearing-mild hearing loss and moderate hearing loss.

Table 4 Known-group validity of ITMAIS-m in discriminating hearing grades in varied age ranges

Discussion

The main aim of this research was to modify and verify the ITMAIS, an auditory outcome measurement scale evaluating EPLAD in infants and toddlers, within the framework of psychometric analysis. The research is novel in that it combines modern (IRT) and traditional (CTT) psychometric theories to comprehensively evaluate a scale concentrating on prelingual auditory function. The modified version, ITMAIS-m, was found to be a reliable and valid tool for evaluating EPLAD precisely and efficiently in clinical practice.

A total of 1730 participants with varied characteristics were included in the IRT analysis stage. They covered a wide age range (median (IQR) age 29.0 (17.6, 41.9) months), different hearing grades and types (normal hearing, or mild to profound hearing loss of sensorineural, conductive or mixed type) and different assessment times (before or after auditory intervention). A large sample with such varied characteristics implies that individuals differ in their level of the latent trait, and a latent trait distribution covering the whole range enables accurate and stable estimation of item and scale parameters with lower standard errors [20, 46]. Barker et al. [47] used a Rasch model, i.e., a one-parameter IRT model, to examine the psychometric properties of the ITMAIS. Their conclusions, however, may deserve further discussion because of the limitation imposed by the small, homogeneous and tailored sample of 23 children with cochlear implants and severe to profound sensorineural hearing loss.

In the present study, GRM model fitting demonstrated that five items were poorly fitted. In terms of content, item 2 (Does the child produce well-formed syllables and syllable-sequences that are recognized as speech?) mainly evaluates preverbal vocalization, which differs from the nature of EPLAD. In addition, the poor trace lines of item 2, the minor changes in dimensionality and item parameters after its deletion, and the improved GRM fit after its deletion indicate that it is appropriate to modify the ITMAIS by removing item 2.

Although the GRM fit of the 7-item ITMAIS (with both items 2 and 9 removed) is better, the sharp drop in test information (from 48.061 to 40.216) and the fact that item 9 provides the highest item information (9.259) suggest that removing item 9 is not advisable, as a large amount of information would be lost. Moreover, the content of item 9 (Does the child spontaneously know the difference between speech and non-speech stimuli with listening alone?) largely reflects sound discrimination and identification, which are closely related to the nature of EPLAD. Given that no optimal fit indices exist, strict IRT model fit is not considered vital, and some unsatisfactorily fitted items may be retained if they have a close clinical relationship with the construct [29].

To date, few studies have concentrated on IRT analysis of scales evaluating EPLAD, although EPLAD is fundamental to speech and language development [47, 48]. IRT is an accessible way to develop or modify a scale on the basis of item responses. Well-performing items, with adequate model fit, high discriminative power, an appropriate difficulty range and no signs of DIF, can be selected through this approach. In the present study, unidimensionality checking confirmed that the ITMAIS-m assesses a single latent trait, i.e., EPLAD. Each item had very high discriminative power (α > 1.70), and the difficulty parameters of the 8 items of the ITMAIS-m spanned EPLAD levels from −1.146 to +1.150, implying that the ITMAIS-m is robust in discriminating individuals with EPLAD below or above the mean level (θ = 0). Because the difficulty range of the ITMAIS-m items is not wide enough to cover the full range of EPLAD, widening the difficulty range by adding more items is a direction for further research.

As shown in Fig. 3, the maximum test information of the ITMAIS-m reaches 48.061. Using the formula reliability = 1 − 1/test information, the reliability of the ITMAIS-m reaches its highest level of 0.979 when evaluating children whose EPLAD is close to the mean level (θ = 0) [49]. This is consistent with the results of the Stage 2 analysis, in which the Cronbach’s α of the ITMAIS-m was 0.919. Considering the centralized tendency of the test information, the results indicate that the ITMAIS-m provides sufficient information when assessing children with EPLAD approximately between −1.3 and 1.5 SD. Within this range, the ITMAIS-m provides test information of more than 10, which corresponds to a reliability of 0.90 or higher.
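The conversion between test information and reliability used above is simple enough to check by hand or in code. The Python sketch below applies the formula reliability = 1 − 1/information to the two information values quoted in the text; the function names are hypothetical, and the standard error expression 1/√information is the standard IRT relation, added here as an assumption rather than taken from the text.

```python
import math


def reliability_from_information(information):
    """Reliability at a given trait level: 1 - 1 / test information."""
    if information <= 0:
        raise ValueError("test information must be positive")
    return 1.0 - 1.0 / information


def standard_error_from_information(information):
    """Standard error of the trait estimate: 1 / sqrt(test information)."""
    if information <= 0:
        raise ValueError("test information must be positive")
    return 1.0 / math.sqrt(information)


print(round(reliability_from_information(48.061), 3))  # 0.979
print(round(reliability_from_information(10.0), 3))    # 0.9
```

The second value shows why test information of 10 is used as the cut-off: it is the point at which the converted reliability reaches 0.90.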

In Stage 2, CTT-based analysis was used to verify the psychometric properties of the ITMAIS-m in a separate sample. Taking advantage of the strength of CTT in evaluating the external construct validity of a scale, the relationships of the ITMAIS-m with the LEAQ, age and clinical characteristics were evaluated. In addition to its high correlation with the LEAQ, the ITMAIS-m correlated significantly with PTA, and the older the children, the higher the correlation between ITMAIS-m and PTA. This pattern is in line with previous studies showing that the increase in ITMAIS scores slows as children grow older, which implies that age also affects ITMAIS scores and EPLAD [42]. As children grow older, however, the effect of age on the ITMAIS becomes minor and the relationship between ITMAIS-m and PTA becomes more robust. This also explains why the ITMAIS-m correlates with age in children of all hearing grades except profound hearing loss. In the known-group validity evaluation, the ITMAIS-m efficiently discriminated hearing grades within the different age groups, particularly severe-profound hearing loss from the other grades. Given its high correlation with PTA and its significant power to discriminate hearing grades, the value of the ITMAIS-m in predicting hearing grade could be further investigated, especially for children with severe and profound hearing loss, who urgently need auditory diagnosis and intervention.

The present study has a few limitations. The number of Stage 2 participants aged within 3 months or older than 16 months was relatively small, which may make parameter estimates unstable in the 0–3 month and 16–24 month subgroups. Because the main purpose was to analyze construct validity through the relationships between ITMAIS-m and hearing grades, the Stage 2 sample included only individuals without auditory intervention. In future, larger samples with different clinical characteristics, e.g., different forms and periods of auditory intervention, could be included to further verify the validity of the ITMAIS-m.

Conclusions

Using a comprehensive and complementary approach that combines IRT and CTT, the modified ITMAIS was shown to have robust psychometric properties. This result indicates the benefit of using IRT in combination with CTT when modifying auditory outcome measurement scales. Moreover, the ITMAIS-m obtained in the present study provides a useful clinical tool for evaluating EPLAD in young children more precisely and efficiently. Further research is currently underway to validate the clinical application of the ITMAIS-m in predicting young children’s hearing grades when audiometry is unavailable.