1 Introduction

Rating surveys have gained prominence as crucial tools for evaluating service quality, providing a quantitative assessment of respondents' satisfaction with the services received. This practice is not only widespread, but has also become an integral part of performance appraisal in various public sectors (see Poister and Gary 1994, among others).

The higher education sector is particularly noteworthy, where evaluation surveys serve as a powerful tool for shaping the narrative; see e.g., Kanwar (2022) and references therein. Surveys enable a multifaceted approach to understanding the dynamics within educational institutions. By actively involving students, faculty, staff, and other stakeholders, these surveys open up valuable channels of feedback on crucial aspects such as teaching quality, available resources, campus facilities, and the overall environment; see e.g., Bejan et al. (2015). This feedback, in turn, provides an overview of an institution's strengths and areas in need of improvement.

Structured and standardized evaluation surveys facilitate the systematic collection of data on various courses, programs, and departments. This methodology empowers institutions to discern trends and patterns over time, laying the foundation for informed decision-making. Additionally, the feedback from these surveys spotlights specific areas requiring attention and improvement, ranging from teaching techniques and course content to the quality of facilities, availability of support services, and the general climate of the surveyed institution.

Using evaluation survey data, institutions gain the opportunity to assess their own performance in relation to similar facilities or industry benchmarks. This comparative analysis yields valuable insights into the institution’s position and serves as a compass for identifying areas of excellence and those in need of further development. Moreover, this data-driven approach supports evidence-based decision-making, providing administrators with empirical evidence to guide choices regarding resource allocation, program development, and policy implementation.

By actively seeking feedback through evaluation surveys, institutions demonstrate a commitment to accountability and transparency. This practice underscores respect for the perspectives and experiences of stakeholders and signals that these contributions are valued and will be utilized to bring about positive changes within the institutions.

Understanding students’ opinions through evaluation surveys leads to improvements in various facets of education. This encompasses refining teaching methods, adapting curriculum design, and enhancing student support services, ultimately resulting in greater student engagement, higher levels of satisfaction, and increased retention rates.

As noted in QAA (2018), engaging students in quality assurance and improvement practices reflects the fundamental principles of higher education. Students become not simply recipients of university services; they also serve as instrumental participants in evaluating their effectiveness.

The regular administration of assessment surveys fosters a culture of continuous improvement within higher education institutions. This practice signifies a dedication to continuous self-evaluation and a willingness to adapt and evolve in response to feedback. It serves as an example of an institution’s commitment to the constant pursuit of excellence.

Furthermore, evaluation survey data are of great importance in accreditation processes and external assessments; see, e.g., Brennan and Shah (2000). They function as an additional source of evidence, demonstrating an institution’s compliance with established quality standards and its proficiency in achieving desired educational outcomes. In this manner, evaluation surveys play a pivotal role not only in understanding the current state of an institution but also in charting its future trajectory in higher education.

In the higher education sector, university curriculum counselling has emerged as a notable domain. In this context, the evaluation of students' feedback has become a cornerstone for enhancing service quality. This reflects a broader trend towards prioritizing user-centric approaches in service delivery, highlighting the significance placed on the perspectives and experiences of the recipients of these services. Related surveys aim to establish a virtuous cycle of student-centered education support by identifying areas with low satisfaction and improving them. Moreover, they play a crucial role in future government financial support projects and the establishment of plans for universities. These surveys measure the level of satisfaction or awareness of current students and provide a wide range of information about the students' educational experience; see Shin et al. (2020).

We focus on this kind of survey, analysing a set of data resulting from a study carried out at the University of Naples Federico II from 2002 to 2004 and replicated in 2007 and 2008. The purpose of the related surveys was to produce basic data for establishing development strategies, such as improving service quality and satisfaction, by identifying educational or infrastructural problems and points for replanning or further development.

The early years of the 21st century were transformative for the Italian higher education system, marked by substantial reforms designed to align it with European standards, elevate quality, and promote innovation and internationalization. This context underscores the enduring relevance of the insights gleaned from our data analysis in the contemporary academic landscape. The investigation aimed to improve the quality of the service according to students' needs and to strengthen university competitiveness. In Sect. 3 we provide a brief description of the data; further details can be found in Capecchi and Piccolo (2010), and references therein.

To assess the quality of the service, students assigned scores to items on ordinal rating scales. Frequently it is assumed that an ordinal response is the discretization of an underlying unobserved (latent) continuous variable, with every possible value (score) of the ordinal response corresponding to an interval of the latent variable. The response to an item is selected on a discrete ordered scale ranging from one qualitative end point to another (as in the mentioned survey). It does not carry metric information, although numerically labeled options may be considered. When a single-item question is considered, the main model for analysing the obtained scores is the cumulative model with the proportional odds assumption, scaled by the model's link function; see Agresti (2010). Under this model, the effect of the explanatory variables is the same across all cumulative odds defined over the response categories.

The cumulative model represents an unconditional framework, unlike other ordinal regression models such as the adjacent categories model and the whole class of hierarchically structured models (see Tutz 2022 for further details). The assumption that covariate effects are not category-specific makes it a simply structured model that allows one to interpret parameters in terms of cumulative odds (alternative structures are discussed in Boes and Winkelmann 2006, among others). However, if unobserved heterogeneity related to the latent variable is present, scale effects in the regression structure for ordinal responses are needed.

The unobserved heterogeneity can arise from several causes. It can be triggered by the sampling system; in this case, observed outcomes differ between studies even though they estimate a common underlying parameter. In other cases, observed outcomes are more variable than expected on the basis of sampling variability alone. This is typically interpreted as evidence of variability in the underlying outcomes. When heterogeneity is present, one can examine whether certain predictor variables (also known as moderators or effect modifiers) are able to explain at least part of the heterogeneity in the outcomes. This is the objective of the present research work, which aims to detect category-specific covariate effects (see Bürkner and Vuorre 2019). In particular, here we rely on location-scale models (McCullagh 1980) and we present simple ways to interpret effects for these types of models.

There is a wide literature on ordinal data analysis from both frequentist and Bayesian perspectives; see e.g., Johnson and Albert (1999) for an insightful comparison of the two approaches. While frequentist methods for the analysis of location-scale models are available (see e.g., McCullagh 1980; Tutz 2022), here, following Bürkner and Vuorre (2019) and Bürkner (2017), we adopt a Bayesian framework, gaining flexibility in specifying the model and enhancing richness and accuracy in providing parameter estimates. In a Bayesian setup it is possible to combine prior information and new data in a natural way. Once the posterior distribution of the parameter of interest is obtained, one can easily derive the distribution of functions of the examined parameters. Another advantage is the possibility of using the same framework and approach whether the sample size is small or large. For a discussion of the advantages of a Bayesian approach for ordinal data see, e.g., Liddell and Kruschke (2018) and references therein. For some milestones regarding the Bayesian approach in the context of ordinal data see Albert and Chib (1993), Albert and Chib (1997), Dellaportas and Smith (1993), Johnson (1996), Johnson and Albert (1999).

The focus of this contribution lies on the implementation of the proposed model, the use of marginal effects to address the interpretation of the results on the extreme categories of the rating scales, and the introduction of Bayesian probability-based measures for comparing clusters on ratings. The obtained results favour the proposed model over the location-only baseline and show significant subject heterogeneity.

The paper is organized as follows. The next section provides a brief description of the parametric ordinal models in their location-scale version, presenting also marginal effects and ordinal superiority measures for this class of models. Section 3 introduces university curriculum counselling data and reports the main results of the model’s implementation. The paper ends with some concluding remarks.

2 Model description

We consider the following setting. Let \({\varvec{Y}}=(Y_1, Y_2,\ldots , Y_n)'\) be a sample of size n, generated by an ordinal random variable \(Y \sim G(y)\) on the support \(\{1,\ldots ,k\}\), where k is a known integer. We indicate with \(Y^*_i\) the underlying (continuous) latent variable such that when \(\alpha _{j-1} < Y^*_i \le \alpha _{j}\), then \(Y_i=j\), \(j=1,2,\ldots ,k\). Here \(-\infty =\alpha _0< \alpha _1<\cdots < \alpha _k=+\infty \) are the thresholds of \( Y^*\). Let also \({\varvec{X}}\) be a \(n\times p\) real matrix that includes all covariates (\(p \ge 1\)) that are relevant for explaining \(Y^*\) and characterize the students’ profile. The i-th row of \({\varvec{X}}\) is the row vector \(\varvec{x}_i\) with the covariates values for the i-th student, \(i=1,\ldots , n\).

In our context, \(Y_i\) is the rating expressed by the i-th student on a specific question concerning the evaluation of the university curriculum counsellor service. For each student, we collect information \(\mathcal{I}_i=(y_i, {\varvec{x}}_i)\), \(i=1,2,\ldots ,n\), where \(y_i\) is the observed value of the rating. The latent regression model behind the response process is \(Y_i^*= {\varvec{x}}_i {\varvec{\beta }} + \sigma _i \epsilon _i\), \(i=1, 2, \ldots , n\), where \({\varvec{\beta }}=(\beta _1,\ldots ,\beta _p)'\) are the covariate coefficients. This structure yields the location-scale model, from which we can obtain the standard cumulative link model (see Agresti 2010) by setting \(\sigma _i=1\). In the latent regression, \(\sigma _i\) is the standard deviation of the noise variable \(\epsilon \sim F_{\epsilon }(.)\), and it may depend on covariates, yielding \(\sigma _i=\exp ({\varvec{z}}_i{\varvec{\gamma }})\). Here, \({\varvec{z}}_i\) is a row vector of the matrix \({\varvec{Z}}\), which includes all the \(q\ge 1\) relevant covariates, and \({\varvec{\gamma }}=(\gamma _1,\ldots ,\gamma _q)'\) are the related covariate coefficients. Note that the vectors of covariates \({\varvec{x}}_i\) and \({\varvec{z}}_i\) may have a non-empty intersection.

The location-scale model can be seen as a collection of binary response models postulating that the intercepts are ordered and that the effect of the explanatory variables captured in the predictor is the same in all of the models (the proportional odds assumption); see Tutz (2022).

The probability mass function of \(Y_i\), for \(j=1,2,\ldots ,k,\) is then

$$\begin{aligned} \hbox{Pr}\left( Y_i=j\mid {\varvec{\theta }}, {\varvec{x}}\right) &= \hbox{Pr}\left( \alpha _{j-1} < Y_i^*\le \alpha _{j}\right) \\ &= F_{\epsilon }\Big [({\alpha _{j}-{\varvec{x}}_i{\varvec{\beta }}})/{\sigma _i}\Big ]-F_{\epsilon }\Big [({\alpha _{j-1}-{\varvec{x}}_i{\varvec{\beta }}})/{\sigma _i}\Big ]\\ &= F_{\epsilon }\Big [({\alpha _{j}-{\varvec{x}}_i{\varvec{\beta }}})/{\exp ({\varvec{z}}_i{\varvec{\gamma }})}\Big ] -F_{\epsilon }\Big [({\alpha _{j-1}-{\varvec{x}}_i{\varvec{\beta }}})/{\exp ({\varvec{z}}_i{\varvec{\gamma }})}\Big ]. \end{aligned}$$
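For concreteness, these category probabilities can be computed directly from the thresholds and the two linear predictors. The sketch below is in Python with a logistic \(F_{\epsilon }\) (the paper's own analyses use R and brms; function and variable names here are illustrative, not from the authors' code):

```python
import numpy as np

def category_probs(alpha, x, beta, z, gamma):
    """Pr(Y_i = j), j = 1..k, under the location-scale cumulative logit model.

    alpha    : interior thresholds alpha_1 < ... < alpha_{k-1}
    x, beta  : location covariates and coefficients (x_i beta)
    z, gamma : scale covariates and coefficients (sigma_i = exp(z_i gamma))
    """
    eta = x @ beta                          # location linear predictor
    sigma = np.exp(z @ gamma)               # scale, always positive
    F = lambda t: 1.0 / (1.0 + np.exp(-t))  # logistic CDF (logit link)
    a = np.concatenate(([-np.inf], alpha, [np.inf]))  # alpha_0, ..., alpha_k
    cdf = F((a - eta) / sigma)              # CDF evaluated at each threshold
    return np.diff(cdf)                     # successive differences give Pr(Y_i = j)

# Example: k = 4 categories, two location covariates, one scale covariate
p = category_probs(alpha=np.array([-1.0, 0.0, 1.5]),
                   x=np.array([1.0, 0.5]), beta=np.array([0.8, -0.3]),
                   z=np.array([1.0]), gamma=np.array([0.2]))
```

Setting \({\varvec{\gamma }}={\varvec{0}}\) recovers the standard cumulative logit model with \(\sigma _i=1\).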

Among the alternatives for \(F_{\epsilon }(.)\), which can be any strictly increasing distribution function, we focus on the logit link function \(F(\eta )=\exp (\eta )/(1+\exp (\eta ))\) (i.e., the canonical link) for reasons of robustness; see Iannario et al. (2017). More specifically, the various link functions, which provide the relationship between the linear predictor and the probabilities of the response categories, differ in terms of sensitivity to anomalous data. Extreme design points and/or anomalous responses can negatively affect likelihood-based inference in ordinal response models. For this reason, recent studies emphasise the need for a robust method or, alternatively, for link functions, such as the logistic link, that ensure that maximum likelihood estimators have a bounded influence function, thereby mitigating problems associated with outlying data (see Scalera et al. 2021, for details).

The threshold values \({\varvec{\alpha }}=(\alpha _0,\alpha _1, \ldots ,\alpha _{k})\) are assumed to be controlled by the anchor labels of the response prompt (e.g., ‘very satisfied’, ‘mildly satisfied’, and so on), which in our context are the same for all faculties. Because of algebraic trade-offs, the outer thresholds of the ordered data models are fixed, and only the interior thresholds are estimated from the data.

Furthermore, as already mentioned in the introduction, the proportional odds assumption constrains the regression parameters to be invariant to the outcome category. As alternatives, non-parallel (non-proportional) and semi-parallel (partial proportional) odds models have been proposed (see Peterson and Harrell 1990, among others), generating an increasing level of complexity and difficulty of interpretation. The strength of the model selected for our study is the possibility of interpreting parameters intuitively, even when referring to a complex structure.

Since we do not have relevant prior information, following the approach proposed by Bürkner and Vuorre (2019) and Bürkner (2017), we use non-informative priors on all parameters of interest, letting the data guide the behaviour of the posterior distributions. More precisely, on the covariate coefficients we assign improper uniform priors, \(unif(-\infty ,+\infty )\), while on the intercepts we consider Student-t priors with 3 degrees of freedom. This ensures that the tails are very wide, while the distribution is still proper with finite mean and variance (location and scale parameters are set, without loss of generality, to 0 and 2.5 respectively). In order to obtain posterior samples we rely on Markov Chain Monte Carlo (MCMC) methods. In particular, we use the R package brms (Bürkner 2017). It implements Hamiltonian MCMC using Stan; see e.g., Betancourt and Girolami (2013), Kreuzer et al. (2023, 2022), Neal (2011). The ordering of the intercepts is ensured via the ordered type in Stan. More precisely, the joint prior distribution is truncated to support over points satisfying the ordering constraints.

We then consider the Watanabe–Akaike information criterion (WAIC), proposed by Watanabe (2010), for model selection among different possible models for the data under consideration, see also Gelman et al. (2014), Stander et al. (2019), Watanabe (2013).

WAIC is a method for estimating pointwise out-of-sample prediction accuracy from a fitted Bayesian model using the log-likelihood evaluated at the posterior simulations of the parameter values. WAIC is obtained by adding to the log pointwise posterior predictive density a correction for the effective number of parameters to adjust for overfitting. In its original formulation WAIC (see Eq. 6 in Watanabe 2010) was defined as

$$\begin{aligned} WAIC=TL+\mathbb{V}/n, \end{aligned}$$
(2.1)

where TL, the training loss, is given by

$$\begin{aligned} TL=-\frac{1}{n}\sum _{i=1}^{n}\log \mathbb{E}_{{\varvec{\theta }}}[\hbox{Pr}(Y_i\mid {\varvec{\theta }})], \end{aligned}$$

with \(\mathbb{E}_{{\varvec{\theta }}}[\hbox{Pr}(Y_i\mid {\varvec{\theta }})]\) being the expectation with respect to the posterior distribution of \({\varvec{\theta }}\) and \(\mathbb{V}\) being the functional variance

$$\begin{aligned} \mathbb{V}=\sum _{i=1}^n \hbox{var}_{{\varvec{\theta }}}\left[ \log \hbox{Pr}( Y_i\mid {\varvec{\theta }})\right] =\sum _{i=1}^{n} \left\{ \mathbb{E}_{{\varvec{\theta }}}\left[ \left( \log \hbox{Pr}(Y_i\mid {\varvec{\theta }})\right) ^2\right] -\left( \mathbb{E}_{{\varvec{\theta }}}\left[ \log \hbox{Pr}(Y_i \mid {\varvec{\theta }}) \right] \right) ^2 \right\} . \end{aligned}$$

Both the training loss and the functional variance can be approximated by simulation, replacing the expectations by averages over the posterior draws of \({\varvec{\theta }}\); see Gelman et al. (2014) for further details. For alternative Bayesian model assessment procedures we refer to Burnham and Anderson (2002) and Piironen and Vehtari (2017), and references therein. Their presentation and comparison to WAIC are beyond our scope.
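As a small illustration of Eq. (2.1), both quantities can be approximated from an \(S \times n\) matrix of pointwise log-likelihood evaluations over posterior draws. The Python sketch below uses simulated draws in place of real MCMC output; the names are illustrative (the paper's analysis relies on brms in R):

```python
import numpy as np

def waic(log_lik):
    """WAIC = TL + V/n (Eq. 2.1), from an (S x n) matrix of pointwise
    log-likelihoods log Pr(Y_i | theta^s) over S posterior draws."""
    S, n = log_lik.shape
    # TL = -(1/n) sum_i log E_theta[Pr(Y_i | theta)]; the inner expectation is
    # approximated by the average of exp(log_lik) over draws (log-sum-exp trick)
    m = log_lik.max(axis=0)
    log_mean_lik = m + np.log(np.mean(np.exp(log_lik - m), axis=0))
    TL = -np.mean(log_mean_lik)
    # Functional variance: V = sum_i var_theta[log Pr(Y_i | theta)]
    V = np.sum(np.var(log_lik, axis=0))
    return TL + V / n

# Toy posterior: 1000 draws, 50 observations (stand-ins for MCMC output)
rng = np.random.default_rng(0)
ll = rng.normal(loc=-1.0, scale=0.1, size=(1000, 50))
w = waic(ll)
```

In practice the same quantity is returned by standard Bayesian software from the fitted model's pointwise log-likelihood matrix.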

2.1 Effect measures for covariates in location-scale models

For the examined models, as a consequence of the nonlinearity of the link function, model parameters are not as simple to interpret as slopes and correlations in standard metric models (i.e., ordinary linear regression). This enhances the need for simpler ways to interpret the effects of the covariates, as advocated in Agresti and Tarantola (2018) for cumulative models. Extending that approach, we evaluate the effect of each explanatory variable, on the location and on the scale components, using so-called Marginal Effect (ME) measures; see e.g., Greene (2008). These measure how a change in a specific covariate \(x_{il}\) (\(z_{il}\)) affects the response variable when all other covariates are fixed at certain values \({\varvec{x}}_{i\setminus l}^{*}\) (\({\varvec{z}}_{i\setminus l}^{*}\)). For further details and interpretation of MEs in ordered response models, see e.g., Greene (2008) and Greene and Hensher (2010).

In this section we report the ME measure for a continuous variable \(x_{il}\) involved only in the location component of the model. The ME on \(\hbox{Pr}(Y_i=j)\) is given by the partial derivative of \(\hbox{Pr}(Y_i=j)\) with respect to \(x_{il}\):

$$\begin{aligned} ME_{\{Y_i=j, x_{il}\}}=\frac{\partial \hbox{Pr}(Y_i=j\mid {\varvec{x}}_i = (x_{il}, {\varvec{x}}_{i\setminus l}^{*}), {\varvec{z}}_i)}{\partial x_{il}}. \end{aligned}$$
(2.2)

In Eq. (2.2) the partial derivative of \(\text{Pr}({Y}_{i}=j)\) with respect to \(x_{il}\) indicates the rate of change in \(\hbox{Pr}(Y_i=j)\) with respect to \(x_{il}\) when other covariates are fixed at value \({\varvec{x}}_{i\setminus l}^{*}\) and \({{\varvec{z}}_i}\). It is obtained by

$$\begin{aligned} \begin{aligned}&\frac{\partial \hbox{Pr}\left[ Y_i=j\mid {\varvec{x}}_i = (x_{il},{\varvec{x}}_{i\setminus l}^{*}),{{\varvec{z}}_i} \right] }{\partial x_{il}}\\&\quad =\left\{ \begin{array}{ll} -\beta _l \times \frac{f_j}{\exp ({\varvec{z}}_i{\varvec{\gamma }})} &\quad j=1 \\ -\beta _l \times \frac{f_j}{\exp ({\varvec{z}}_i{\varvec{\gamma }})}+\beta _l \times \frac{f_{j-1}}{\exp ({\varvec{z}}_i{\varvec{\gamma }})} &\quad j=2,\ldots ,k-1 \\ \beta _l \times \frac{f_{j-1}}{\exp ({\varvec{z}}_i{\varvec{\gamma }})} &\quad j=k \end{array}\right. \end{aligned} \end{aligned}$$
(2.3)

where \(f_{j}=f\left[ \frac{(\alpha _j-{\varvec{x}}_i{\varvec{\beta }})}{\exp ({\varvec{z}}_i{\varvec{\gamma }})}\right] \) and \(f_{j-1}=f\left[ \frac{(\alpha _{j-1}-{\varvec{x}}_i{\varvec{\beta }})}{\exp ({\varvec{z}}_i{\varvec{\gamma }})}\right] \) are the values of the density function corresponding to the examined cumulative model.
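To make Eq. (2.3) concrete, the Python sketch below (logistic density; all names are illustrative) exploits the fact that the density vanishes at the infinite outer thresholds, so one expression covers all three cases. A useful sanity check is that the marginal effects must sum to zero across categories, since the probabilities sum to one:

```python
import numpy as np

def logistic_pdf(t):
    """Logistic density, stable for large |t| (returns 0 at +/- infinity)."""
    e = np.exp(-np.abs(t))
    return e / (1.0 + e) ** 2

def me_location(j, alpha, x, beta, z, gamma, l):
    """Marginal effect (Eq. 2.3) of continuous covariate x_l (location only)
    on Pr(Y = j); the j = 1 and j = k cases follow since f(+/-inf) = 0."""
    eta, sigma = x @ beta, np.exp(z @ gamma)
    a = np.concatenate(([-np.inf], alpha, [np.inf]))   # alpha_0, ..., alpha_k
    f_j = logistic_pdf((a[j] - eta) / sigma)           # density at alpha_j
    f_jm1 = logistic_pdf((a[j - 1] - eta) / sigma)     # density at alpha_{j-1}
    return -beta[l] * (f_j - f_jm1) / sigma

# Example: k = 4 categories; effect of the first location covariate
alpha = np.array([-1.0, 0.0, 1.5])
x, beta = np.array([1.0, 0.5]), np.array([0.8, -0.3])
z, gamma = np.array([1.0]), np.array([0.2])
me = np.array([me_location(j, alpha, x, beta, z, gamma, l=0)
               for j in range(1, 5)])
```

With a positive coefficient, the effect is negative for the lowest category and positive for the highest: increasing the covariate shifts probability mass towards higher ratings.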

If \(x_{il}\) is a categorical variable, we need to calculate the discrete change. For a dichotomous variable, it is given by

$$\begin{aligned} ME_{\{Y_i=j, x_{il}\}}= \hbox{Pr}\left( {Y}_{i}=j\mid {\varvec{x}}=(1,{\varvec{x}}_{i\setminus l}^{*}), {{\varvec{z}}_i}\right) -\hbox{Pr}\left({Y}_{i}=j\mid {\varvec{x}}=(0,{\varvec{x}}_{i\setminus l}^{*}), {{\varvec{z}}_i}\right) . \end{aligned}$$

If the number of possible categories is greater than two, the discrete change indicates how the probability of assuming a particular level adjusts when we move from the examined level to the reference one.

When the same covariate influences also the scale component (\(x_{il}=z_{im}\)), ME measures become

$$\begin{aligned} \begin{aligned}&\frac{\partial \hbox{Pr}\left[ Y_i=j\mid {\varvec{x}}_i = (x_{il},{\varvec{x}}_{i\setminus l}^{*}),{{\varvec{z}}_{i\setminus m}^{*}} \right] }{\partial x_{il}}\\&\quad =\left\{ \begin{array}{ll} -\frac{\left[ \beta _l+\gamma _m \times (\alpha _j-{\varvec{x}}_i{\varvec{\beta }})\right] }{\exp ({\varvec{z}}_i{\varvec{\gamma }})} \times f_j &\quad j=1 \\ -\frac{\left[ \beta _l+\gamma _m \times (\alpha _j-{\varvec{x}}_i{\varvec{\beta }})\right] }{\exp ({\varvec{z}}_i{\varvec{\gamma }})} \times f_j+\frac{\left[ \beta _l+\gamma _m \times (\alpha _{j-1}-{\varvec{x}}_i{\varvec{\beta }})\right] }{\exp ({\varvec{z}}_i{\varvec{\gamma }})} \times f_{j-1} &\quad j=2,\ldots ,k-1 \\ \frac{\left[ \beta _l+\gamma _m \times (\alpha _{j-1}-{\varvec{x}}_i{\varvec{\beta }})\right] }{\exp ({\varvec{z}}_i{\varvec{\gamma }})} \times f_{j-1} &\quad j=k. \end{array}\right. \end{aligned} \end{aligned}$$
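When the covariate enters both components, the marginal effect can equivalently be obtained by differentiating each term \(F_{\epsilon }[(\alpha _j-{\varvec{x}}_i{\varvec{\beta }})/\exp ({\varvec{z}}_i{\varvec{\gamma }})]\) directly, each threshold contributing its own boundary term. A Python sketch under illustrative names, with the same sum-to-zero sanity check across categories:

```python
import numpy as np

def logistic_pdf(t):
    e = np.exp(-np.abs(t))
    return e / (1.0 + e) ** 2          # logistic density, 0 at +/- infinity

def me_shared(j, alpha, x, beta, z, gamma, l, m):
    """ME on Pr(Y = j) of a covariate entering both components (x_l = z_m),
    from the derivative of F[(alpha_j - x beta)/exp(z gamma)]; the boundary
    terms at the infinite outer thresholds vanish."""
    eta, sigma = x @ beta, np.exp(z @ gamma)
    a = np.concatenate(([-np.inf], alpha, [np.inf]))
    def term(t):                        # contribution of one threshold t
        if np.isinf(t):
            return 0.0
        u = (t - eta) / sigma
        return (beta[l] + gamma[m] * (t - eta)) / sigma * logistic_pdf(u)
    return -term(a[j]) + term(a[j - 1])

alpha = np.array([-1.0, 0.0, 1.5])                 # k = 4 categories
beta, gamma = np.array([0.8, -0.3]), np.array([0.2])
x, z = np.array([1.0, 0.5]), np.array([1.0])       # x[0] = z[0]: shared covariate
me = np.array([me_shared(j, alpha, x, beta, z, gamma, l=0, m=0)
               for j in range(1, 5)])
```

Relative to the location-only case, each boundary term is rescaled by \(\beta _l+\gamma _m(\alpha _j-{\varvec{x}}_i{\varvec{\beta }})\), so the scale coefficient \(\gamma _m\) can amplify, dampen, or even reverse the effect near particular thresholds.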
(2.4)

Depending on how we fix the values of the covariates, we can obtain three different types of ME measures, extensively reported in Iannario and Tarantola (2021), Long (1997), Long and Freese (2014), Sun (2015). The Average Marginal Effect (AME) is obtained by calculating the mean of the marginal effects evaluated at the n sample values; Marginal Effects at Representative values (MER) are obtained by calculating the marginal effect at specific values of interest in the examined study; Marginal Effects at the Mean (MEM) are obtained by computing the marginal effects with all covariates set equal to their mean value. In our analysis, we consider MER (ME hereafter) measures. Specifically, the mean is used for continuous variables and the reference category is considered for factor ones. Marginal effects and the corresponding posterior distributions can be obtained via the MCMC sampler following Bürkner (2017). MCMC standard errors are also computed following Flegal and Jones (2011).

2.2 Bayesian ordinal superiority measures

Ordinal superiority measures for group comparison were introduced by Agresti and Kateri (2017), and further discussed in Iannario and Tarantola (2021). We will now present an extension of these measures in a Bayesian context. An application to our location-scale model will be presented in Sect. 3.

Given a categorical variable d playing a role in the scale parameter, we use these measures to compare the probability that an observation from one group \(g_0\) is scored above an independent observation from the alternative group \(g_1\). Let us indicate by \({\varvec{w}}_{\setminus d}\) the set of all covariates except d. The following two ordinal superiority measures \(\Delta \) and \(\gamma \) were proposed in Agresti and Kateri (2017):

$$\begin{aligned} \Delta ({\varvec{w}}_{\setminus d}^{*})&= \hbox{Pr}(Y_{g_0}>Y_{g_1})- \hbox{Pr}(Y_{g_1}>Y_{g_0});\\ \gamma ({\varvec{w}}_{\setminus d}^{*})&= \hbox{Pr}(Y_{g_0}> Y_{g_1})+\frac{1}{2}\hbox{Pr}(Y_{g_0}=Y_{g_1}). \end{aligned}$$

These two measures are functionally related and they differ by their reference value. A positive value of \(\Delta ({\varvec{w}}_{\setminus d}^{*})\) indicates that it is more likely to obtain a higher rating in \(g_0\) than in \(g_1\). The same consideration holds for \(\gamma \), if it is greater than 0.5.

Their Bayesian estimates can be obtained by generating a set of S samples from the posterior predictive probability distribution. At a generic draw s (\(s=1, \ldots, S\)) and for a specific value \({\varvec{w}}_{\setminus d}^{*}\), the ordinal superiority measure \(\Delta \) with respect to the variable d is given by

$$\begin{aligned} \Delta ^s({\varvec{w}}_{\setminus d}^{*})&= \hbox{Pr}^s(Y_{g_0}>Y_{g_1})- \hbox{Pr}^s(Y_{g_1}>Y_{g_0})\\ &= \sum _{l>k}\pi ^s_{0l}({\varvec{w}}_{\setminus d}^{*})\pi ^s_{1k}({\varvec{w}}_{\setminus d}^{*}) - \sum _{k>l}\pi ^s_{0l}({\varvec{w}}_{\setminus d}^{*})\pi ^s_{1k}({\varvec{w}}_{\setminus d}^{*}) \end{aligned}$$

where \(\pi ^s_{0j}({\varvec{w}}_{\setminus d}^{*})=\hbox{Pr}^s(Y = j\mid d = 0, {\varvec{w}}_{\setminus d}^{*})\) is the predictive probability distribution obtained from the examined model for \(g_0\); \(\pi ^s_{1j}({\varvec{w}}_{\setminus d}^{*})\) is obtained in a similar way for \(g_1\), that is \(\pi ^s_{1j}({\varvec{w}}_{\setminus d}^{*})=\hbox{Pr}^s(Y = j\mid d = 1, {\varvec{w}}_{\setminus d}^{*})\).

The Bayesian estimate of \(\Delta \) is given by

$$\begin{aligned} \widehat{\Delta }({\varvec{w}}_{\setminus d}^{*})=\frac{1}{S}\sum _s \Delta ^s({\varvec{w}}_{\setminus d}^{*}). \end{aligned}$$

A value of \(\widehat{\Delta }({\varvec{w}}_{\setminus d}^{*})\) greater than zero indicates that it is more likely to obtain a higher rating in \(g_0\) than in \(g_1\). Alternatively, one can calculate the \(\gamma \) measure, whose null value is 0.5. Its Bayesian estimate, under squared error loss, is given by

$$\begin{aligned} \widehat{\gamma }({\varvec{w}}_{\setminus d}^{*})=\frac{1}{S}\sum _s \gamma ^s({\varvec{w}}_{\setminus d}^{*})\quad \hbox{with}\quad \gamma ^s({\varvec{w}}_{\setminus d}^{*})=(\Delta ^s({\varvec{w}}_{\setminus d}^{*})+1)/2. \end{aligned}$$
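For a single posterior predictive draw, both measures reduce to sums over pairs of category probabilities; averaging over the S draws then gives the Bayesian estimates. A minimal Python sketch (illustrative names; the paper works with brms output in R):

```python
import numpy as np

def delta_gamma(pi0, pi1):
    """Ordinal superiority measures for one draw, from the predictive
    category probabilities pi0 (group g_0) and pi1 (group g_1)."""
    k = len(pi0)
    p_gt = sum(pi0[a] * pi1[b] for a in range(k) for b in range(k) if a > b)
    p_lt = sum(pi0[a] * pi1[b] for a in range(k) for b in range(k) if a < b)
    delta = p_gt - p_lt                  # Pr(Y_g0 > Y_g1) - Pr(Y_g1 > Y_g0)
    gamma = (delta + 1.0) / 2.0          # equals Pr(Y_g0 > Y_g1) + 0.5 Pr(equal)
    return delta, gamma

# Toy draw on a 4-point scale: g_0 leans high, g_1 leans low
d, g = delta_gamma(np.array([0.1, 0.2, 0.3, 0.4]),
                   np.array([0.4, 0.3, 0.2, 0.1]))
# The Bayesian estimates average these quantities over all S posterior draws.
```

Here the group concentrated on high ratings yields \(\Delta ^s>0\) and \(\gamma ^s>0.5\), consistent with the interpretation given above.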

MCMC standard errors can be easily obtained from the estimated values, providing an indication of the variability of the considered measures.

3 University curriculum counsellor data analysis

The model discussed above and its implementation are illustrated next on a real case study. The associated data refer to the University of Naples Federico II curriculum counsellor service and are the result of a well thought-out experimental design. They consist of sample information stratified with respect to Faculty, gender, and day of the week. The survey was replicated in different years even though the research design does not involve repeated observations of the same variables over time; an a posteriori check confirmed that the proportions of students selected for interviews in the different subgroups were substantially consistent with the expected proportions in the strata. The interviewed students were asked to reply to questions regarding the orientation services provided by the 13 faculties. The purpose of these questions was to identify improvement tasks, such as in the education service and curriculum composition, and to explore strategies for improving efficiency in increasing students' satisfaction. The students considered in the analysis were only users of the evaluated service. Respondents were asked to express opinions regarding satisfaction on a seven-point scale, ranging from 1 = 'Completely Unsatisfied' to 7 = 'Completely Satisfied'. Questionnaires were filled in anonymously and comprise two main sections.
The first section concerns demographics and student-related topics: Gender of the respondent, a dichotomous variable (1 \(=\) Female, 0 \(=\) Male); Age, on a continuous scale; Freq serv, a factor with levels 0 \(=\) Not-Frequent users, 1 \(=\) Frequent users; Diploma of admission, a factor with levels 1 \(=\) Classic studies, 2 \(=\) Scientific studies, 3 \(=\) Linguistic, 4 \(=\) Professional, 5 \(=\) Technical/Accountancy, 6 \(=\) Others; Full-time position, a binary variable with levels 1 \(=\) Full-time, 0 \(=\) Otherwise; Area of study, a factor variable related to the different Faculties: 0 \(=\) Scientific, 1 \(=\) Health Science and 2 \(=\) Humanistic. The second section contains Likert-type item questions (ratings) related to respondents' satisfaction with 5 different items: Information, Willingness, Opening Hours, Competence and Global satisfaction. The reported analysis refers to the 2002, 2003 and 2004 waves because of their homogeneous sampling. For the 2007 and 2008 waves, three more items were added with reference to Usefulness, Structure and Advertising of the Service; for in-depth analyses of the data concerning these later waves, see Corduas et al. (2010), Iannario (2008).

The analysed data consist of 2179, 2535, 3183 complete responses for the three waves, respectively. The main information is reported in Table 1 whereas Fig. 1 shows the distribution of the Area of study in the three waves.

Table 1 Summary statistics concerning the characteristics of the respondents of the surveys
Fig. 1
figure 1

Distribution of the Area of study (0 \(=\) Scientific, 1 \(=\) Health Science and 2 \(=\) Humanistic) in the three waves: left (2002), middle (2003), right (2004)

Among the ratings, we point out results on the evaluation of Office hours (Office), i.e., the duration of office hours (see Table 2). This item was selected because of its comparatively larger heterogeneity relative to the other items (see Capecchi and Piccolo 2010 for further details on this issue). Table 2 also reports summary statistics concerning the global satisfaction expressed in the three waves.

Table 2 Summary statistics concerning Office hours and Global satisfaction

From a preliminary analysis of the data, we identified the following relevant covariates: Gender of the respondent; Age; Freq serv, and Area. The dependence between the ratings concerning Office hours satisfaction and Area of study in the three waves is clearly evident in the mosaic plots displayed in the three panels of Fig. 2. Area represents a covariate which affects both location and scale components in the estimated models.

Fig. 2
figure 2

Mosaic plot for Office hours satisfaction and Area of study in the three waves: upper row (2002), middle (2003), lower (2004). Each mosaic plot reports information on the seven-point satisfaction scale versus the area of study (0 \(=\) Scientific, 1 \(=\) Health Science and 2 \(=\) Humanistic)

The Bayesian estimates of the location and scale parameters are reported in Table 3 (posterior mean, MCMC standard error and 95% credible intervals). These results are obtained via the R package brms (Bayesian Regression Models using "Stan"); see Bürkner (2017). Standard convergence diagnostics have been considered. Possible interactions among covariates were tested, but their posterior distributions concentrated around zero.

Table 3 Bayesian estimation for the location-scale model; i.e., posterior mean estimates, standard deviations and 95% Credible Intervals (CI) for its parameters

We ran four chains in parallel, each of 2000 iterations with a burn-in period of 1000 iterations. The Bayesian estimate of the standard deviation is obtained from the posterior samples of log-disc (log-discrimination), with disc corresponding to the inverse of the standard deviation. More precisely, for every iteration t, \(t=1, 2, \ldots , T\), we transformed \(\log (disc)^t\) into \(sd^t\) with \(sd^t = 1/\exp (\log (disc)^t)\); the Bayesian estimate of sd is then obtained as the average of all \(sd^t\).
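This draw-by-draw transformation can be sketched as follows (the posterior draws below are simulated placeholders, not the actual MCMC output):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical posterior draws of log(disc); in practice these are
# extracted from the MCMC output of the fitted brms model.
log_disc = rng.normal(0.3, 0.05, size=4000)

sd_draws = 1.0 / np.exp(log_disc)   # sd^t = 1 / disc^t for each draw t
sd_hat = sd_draws.mean()            # Bayesian point estimate of sd
```

Transforming every draw before averaging (rather than transforming the posterior mean of log-disc) correctly propagates the posterior uncertainty through the nonlinear map.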

To compare the performance of the models with and without the scale effect, we computed the WAIC index defined in (2.1); see Table 4 for the results. With the exception of the third wave (2004), where the WAIC values are very close, the indexes support the presence of heterogeneity in the data.
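For reference, the WAIC used in this comparison can be computed from a pointwise log-likelihood matrix; the sketch below implements the standard formula and is not tied to the brms output format:

```python
import numpy as np

def waic(log_lik):
    """WAIC from a (draws x observations) pointwise log-likelihood matrix:
    WAIC = -2 * (lppd - p_waic), where lppd is the log pointwise predictive
    density and p_waic the effective number of parameters (sum of the
    posterior variances of the pointwise log-likelihood)."""
    lppd = np.sum(np.log(np.mean(np.exp(log_lik), axis=0)))
    p_waic = np.sum(np.var(log_lik, axis=0, ddof=1))
    return -2.0 * (lppd - p_waic)
```

In practice a log-sum-exp formulation of the lppd term is preferable for numerical stability; the direct form is shown here for clarity.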

Table 4 Watanabe–Akaike information criterion (WAIC) for the location-scale and cumulative models in the three waves

In order to further evaluate the effect of heterogeneity, we rely on the conditional effects of specific covariates on the response variable. We first concentrate on the variable Freq serv, which affects only the location component. Figure 3 provides a visual representation of the estimated relationship between Freq serv and Office hours: each panel displays the estimated probabilities of the seven response categories for the two groups in the three waves. Students who used the service frequently were more satisfied with Office hours than the others, and the distance between the two groups increases across the three waves. Figure 4 reports the evaluation for the different Areas of study, a covariate affecting both the location and the scale component. First of all, a reduced evaluation of the service across the three waves can be observed by inspecting the extreme category (pink colour \(=\) 7); Health Science students show the largest reduction of the maximum evaluation over the three waves. Within each wave, students of the Scientific area tend to provide a higher evaluation than students of the Health Science and Humanistic areas. The latter groups, by inspection of the scale effects in Table 3, present in the first wave a lower level of heterogeneity than the cluster of students of the Scientific area. More precisely, for wave 2002 the standard deviation of the latent variable was lower both for Health Science (\(Sd=0.68\)) and for the Humanistic area (\(Sd=0.71\)) than for the Scientific one, for which the standard deviation was fixed to 1. For waves 2003 and 2004 the differences with respect to the Scientific area were not substantial. Overall, a reduction of the scale effect is observed over the three years. Figure 5 summarizes the main findings: the size of the arrows reflects the reduced heterogeneity across the three waves, whereas the comparison among Areas supports the overall best performance of the Scientific area.

Fig. 3

Marginal effect for Freq serv in the three waves: upper (2002), middle (2003), lower (2004). Points indicate the posterior mean estimates and error bars correspond to the 95% Credible Intervals

Fig. 4

Marginal effect for Area in the three waves: upper (2002), middle (2003), lower (2004). Points indicate the posterior mean estimates and error bars correspond to the 95% Credible Intervals

Fig. 5

Summary results concerning Office hours satisfaction and Area of study in the three waves

All these results are corroborated by the main findings of the Bayesian ordinal superiority measures in Table 5. Here, the different evaluation of Office hours depending on the Area of study is further stressed, signalling the need to revise the timing of the counselling service in areas other than the Scientific one in order to improve its evaluation. The only exception concerns Health Science students in the first wave (as also remarked in Fig. 5 and in Table 5). More precisely, for 2002 the ordinal superiority measures, \(\widehat{\Delta _{01}}=-0.1380\) and \(\widehat{\gamma _{01}}=0.4310\), indicate a higher probability of observing a high evaluation of the response variable in group 1 (Health Science) than in group 0 (Scientific area).
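Under independence of the two groups, such measures can be computed directly from the estimated category probability vectors. The sketch below is illustrative rather than the authors' code; it assumes the convention \(\gamma_{01} = P(Y_0 > Y_1) + \tfrac{1}{2}P(Y_0 = Y_1)\) and \(\Delta_{01} = 2\gamma_{01} - 1\), which is consistent with the pair of values reported above (\(2 \times 0.4310 - 1 = -0.1380\)):

```python
import numpy as np

def superiority(p0, p1):
    """Ordinal superiority measures from category probability vectors
    p0 (group 0) and p1 (group 1), assumed independent:
    gamma = P(Y0 > Y1) + 0.5 * P(Y0 = Y1),  delta = 2 * gamma - 1.
    Values of gamma below 0.5 (delta below 0) favour group 1."""
    joint = np.outer(np.asarray(p0), np.asarray(p1))  # joint[i, j] = P(Y0 = i, Y1 = j)
    p_gt = np.tril(joint, k=-1).sum()                 # P(Y0 > Y1): entries with i > j
    p_eq = np.trace(joint)                            # P(Y0 = Y1): diagonal entries
    gamma = p_gt + 0.5 * p_eq
    return gamma, 2.0 * gamma - 1.0
```

Applying the function draw by draw to the posterior samples of the category probabilities yields the posterior distribution of both measures, from which posterior means and credible intervals follow directly.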

Table 5 Bayesian estimation of ordinal superiority measures; i.e., posterior mean estimates, standard deviations and 95% Credible Intervals (CI) for its parameters

Turning to the other covariates in Table 3, a higher evaluation of the service is provided by older respondents (Age), whereas, concerning Gender, male students appear more critical of the service.

4 Discussion and conclusion

The overall level of satisfaction with the university curriculum counselling service, with students interpreted as 'consumers', can accurately diagnose the current status of university services and contribute to improving their quality. Our contribution fits this area, analysing data collected in 2002, 2003 and 2004 at the University of Naples Federico II. The analysis allowed us to detect some trends in the respondents' behaviour and to relate the observed patterns to the subjects' characteristics. The study was pursued by employing location-scale models in a Bayesian framework, taking into account the presence of heterogeneity and improving the accuracy of the parameter estimates.

Main results show a different evaluation of Office hours depending on the Area of study, signalling the need to revise the timing of the counselling service in areas other than the Scientific one. The only exception concerns Health Science students in the first wave. Furthermore, male and young students appear more critical of the service. The scale effect observed for Area in the first two waves is reduced in the third one (2004), where the WAIC index does not discriminate in favour of the location-scale framework. The Bayesian analysis improved the flexibility in specifying the model and in providing model summaries. Estimates of the proposed ordinal superiority measure, together with their credible intervals, can be easily obtained from the MCMC output. The model can also be easily extended to include prior information deriving from other studies, should it become available.

Further analyses concerning other aspects of the service may improve the institution's overall service strategy. Furthermore, an analysis scrutinizing the complex relationships between the latent variables at different levels (faculties), exploiting a multilevel framework, may allow us to study how group membership influences the results; this represents a future research direction following the framework proposed in Doroshenko and Liseo (2023) and Hedeker et al. (2008). Moreover, the adoption of a Bayesian nonparametric approach for rating data (see, e.g., Barone and Dalla Valle 2023; Turek et al. 2021) and its comparison with the current results on this case study is a possible follow-up work.