1 Introduction

For many years, competitions have driven significant advances in the field of timetabling, moving the research toward more and more practical problems. The International Conference on the Practice and Theory of Automated Timetabling (PATAT)Footnote 1 has supported many timetabling competitions over the years. A recent survey on educational timetabling benchmarks and competitions is available in Ceschia et al. (2023).

In 2002–2003, the First International Timetabling Competition considered simplified, randomly generated course timetabling problems. It addressed the post-enrollment problem, where course enrollments of students are pre-defined and courses must be assigned to timeslots and rooms without any course overlap for students. For more details, see the paper by Kostuch (2005), which presents an improved version of the winning competition entry, or the related article written by one of the organizers (Lewis 2008).

The second competition in 2007 (McCollum et al., 2010) had two course timetabling tracks. One of them slightly extended the post-enrollment problem from the first competition (Lewis et al., 2007), and the other introduced curriculum-based timetabling based on real-world problems from the University of Udine in Italy (Gaspero et al., 2007). A curriculum contains a set of courses, which must be assigned to time slots with no overlaps. A more practical approach was also taken by the third track, on examination timetabling (McCollum et al., 2007), where particular problems were taken from existing institutions.

The third competition organized in 2011 related to educational timetabling was focused on high-school timetabling problems (Post et al., 2016). Again, there is a clear transition toward real-world problems, and we can see a complex common problem faced by thousands of educational institutions worldwide. An archive of around 50 real-world high-school timetabling data sets was created (Post et al., 2014), where all the problems are defined using an XML format for benchmarks in high-school timetabling (Post et al., 2012).

This paper discusses the Fourth International Timetabling Competition (ITC 2019) (Müller et al., 2018). It introduced a variety of real-life university course timetabling problems coming from different parts of the world. A novel model of a complex course timetabling problem allows the specification of problems from various universities. In the competition, representative problems from ten universities worldwide were considered. However, they represent only a fraction of the institutions using UniTime,Footnote 2 non-commercial software from which the instances for the competition were taken. Thirty competition and six test instances are available on the competition websiteFootnote 3. The website allows for solution validation and provides a repository of existing solutions.

We will discuss the characteristics of the real-world course timetabling problems considered in the competition and demonstrate that the proposed model allows the encapsulation of very different features. We aim to provide a clear view of this competition and comprehensive information about its importance and results. To analyze particular instances, we have selected representative features to study their complexity with the help of existing upper and lower bounds (DSUM, 2023) of the solution quality. The final part of the paper concentrates on discussing and comparing the published solvers, including the description of the best computational results and the results achieved with the specified runtime. For a detailed specification of the competition problems, we refer to the paper (Müller et al., 2018), published at the PATAT 2018 conference, where the plenary talk about ITC 2019 was given.

The following section provides a summary of the competition organization. Section 3 contains an overview of the competition problem, with more details on the competition website. Section 4 introduces timetabling problems for each contributing university, and Sect. 5 discusses the different characteristics of these real-life problems. Section 6 provides a characterization of the benchmark instances, institution by institution, with the help of selected representative features and solution bounds. Section 7 describes and compares the existing solvers and their experimental results. The paper is concluded with a summary in Sect. 8.

2 Running the competition

The competition was announced at PATAT 2018, the 12th International Conference on the Practice and Theory of Automated Timetabling, where the competition framework and technical descriptions were published (Müller et al., 2018). The competition has its own website, https://www.itc2019.org, maintained at Masaryk University by the competition organizers, who are also the authors of this paper. The website describes the competition, the timetabling problem, the rules, the benchmark instances, and a few smaller test instances. It also includes a web service that validates the solutions for the competition instances and allows the valid solutions to be uploaded to the website. The solution validator is based on the UniTime solver. The best uploaded solution for each competitor and instance is used to rank the competitors.

Three groups of benchmark instances were published during the competition. While the order of the competitors was based on all the published benchmark instances, the instances released later in the competition had a much higher weight. Points were awarded to each solution based on its rank among the competitors and the type of the instance (early, middle, or late) using the Formula One ranking scheme. Two milestone submissions of the results for the early benchmark provided feedback to competitors during the long run of the competition. Since the competition problems were very complex, the competitors had one year to submit their solutions. The competition website is still running, and solutions continue to be updated. There are about 490 registered users from 66 countries, and these numbers keep increasing. In January 2020, at the end of the competition, we had 200 registered users, and interestingly, 60 new users appeared in the last six months. Twenty teams uploaded solutions to one or more competition benchmarks, and over 50 users had successfully validated at least one solution as of October 2023.
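The scoring idea can be illustrated with a minimal sketch. It uses the standard Formula One points for the first ten places (25, 18, 15, 12, 10, 8, 6, 4, 2, 1) and scales them by a weight for the instance group. The weights in `GROUP_WEIGHT`, the tie-breaking rule, and all function names are hypothetical placeholders and are not taken from the competition rules.

```python
# Formula One points for ranks 1..10; lower ranks score nothing.
F1_POINTS = [25, 18, 15, 12, 10, 8, 6, 4, 2, 1]

# Hypothetical group weights (assumption): later instances count more.
GROUP_WEIGHT = {"early": 1, "middle": 2, "late": 3}


def score_instance(costs, group):
    """Return points per competitor for one instance.

    costs maps competitor name -> objective value of the best uploaded
    solution (lower is better). Ties are broken by name, which is a
    simplification of this sketch.
    """
    ranked = sorted(costs, key=lambda name: (costs[name], name))
    weight = GROUP_WEIGHT[group]
    return {
        name: (F1_POINTS[rank] if rank < len(F1_POINTS) else 0) * weight
        for rank, name in enumerate(ranked)
    }


def total_scores(results):
    """Sum points over instances; results is a list of (group, costs)."""
    totals = {}
    for group, costs in results:
        for name, points in score_instance(costs, group).items():
            totals[name] = totals.get(name, 0) + points
    return totals
```

With such weights, a competitor who wins a late instance can overtake one who won an early instance, which mirrors the higher weight given to instances released later.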

3 Problem description

Our problems use a hierarchical course structure to model the presence of students in different parts of a course. A course may contain one or more course configurations, each with one or more classes that can be of different types and have an optional parent–child relationship. These classes are to be timetabled into rooms and (time) periods for a limited number of students. A class may occupy multiple periods, possibly spanning multiple days and weeks. This allows us to model classes with multiple meetings at the same time and room (e.g., Monday–Wednesday–Friday 9 am) and/or classes taught only during certain weeks of the semester (odd or even weeks, for instance). All benchmark instances have five-minute periods, going from midnight to midnight, seven days a week, and running for a given number of weeks (between 13 and 21). This representation permits a flexible time organization and supports various irregularities and other exceptions in class placements. Rooms have a defined capacity and possibly unavailable periods. Each room may have specific travel times to other rooms, which are used to ensure that students can attend classes in distant rooms. Each time and room for a class may have a penalty specifying how good or bad a given placement is.
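The time model can be illustrated with a small sketch. It assumes that a class time is described by a bit string of days, a bit string of weeks, a start period, and a length in five-minute periods. The `ClassTime` name, its fields, and the helper functions are illustrative choices and not a definition of the competition data format.

```python
from dataclasses import dataclass

SLOTS_PER_DAY = 24 * 60 // 5  # 288 five-minute periods from midnight to midnight


@dataclass(frozen=True)
class ClassTime:
    days: str    # one character per day, e.g. "1010100" = Mon, Wed, Fri
    weeks: str   # one character per week of the semester
    start: int   # first period of the meeting (0 = midnight)
    length: int  # number of five-minute periods

    def clock(self):
        """Start time as HH:MM."""
        minutes = self.start * 5
        return f"{minutes // 60:02d}:{minutes % 60:02d}"

    def meetings(self):
        """Total number of meetings during the semester."""
        return self.days.count("1") * self.weeks.count("1")


def shares_bit(a, b):
    """True if two bit strings have a '1' at the same position."""
    return any(x == "1" and y == "1" for x, y in zip(a, b))


def overlaps(t1, t2):
    """Two times overlap if they share a week, a day, and some period."""
    return (
        shares_bit(t1.weeks, t2.weeks)
        and shares_bit(t1.days, t2.days)
        and t1.start < t2.start + t2.length
        and t2.start < t1.start + t1.length
    )
```

For example, `ClassTime("1010100", "1" * 13, 108, 10)` describes a 50-minute class on Monday, Wednesday, and Friday at 9 am in each of 13 weeks, while two Monday classes placed in odd and even weeks, respectively, never overlap even when they use the same periods.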

Students are enrolled in courses and will be assigned to classes based on the defined course structure. A student must get one class of each type from a single course configuration, following the parent–child relationship when defined. For example, each student must get a lecture and a seminar, where only some lecture–seminar combinations are allowed. A sample course structure taken from the UniTime interface is shown in Fig. 1.

Fig. 1 Sample course structure with the lecture, recitation, and laboratory in parent–child relationship
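The enrollment rule described above can be sketched as a small validity check. The nested dictionary layout (configuration, then class type, then class with its optional parent) and the function name are assumptions made for this illustration, and class identifiers are assumed to be unique within a course.

```python
def valid_enrollment(course, chosen):
    """Check the classes of one student for one course.

    course: {config_id: {type_id: {class_id: parent_class_id or None}}}
    chosen: set of class ids assigned to the student.
    """
    for class_types in course.values():
        # All classes of this configuration with their parents.
        classes = {c: p for group in class_types.values() for c, p in group.items()}
        if not chosen <= set(classes):
            continue  # some chosen class is outside this configuration
        # Exactly one class of each type of the configuration.
        one_each = all(len(chosen & set(group)) == 1 for group in class_types.values())
        # A class with a parent requires the parent class as well.
        parents_ok = all(classes[c] is None or classes[c] in chosen for c in chosen)
        return one_each and parents_ok
    return False


# Hypothetical course: two lectures, seminars tied to a particular lecture.
course = {
    "cfg1": {
        "Lec": {"Lec1": None, "Lec2": None},
        "Sem": {"Sem1": "Lec1", "Sem2": "Lec1", "Sem3": "Lec2"},
    }
}
```

Here the combination of `Lec1` and `Sem2` is allowed, whereas `Lec1` with `Sem3` is rejected because `Sem3` is a child of `Lec2`.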

Finally, there are soft and hard distribution constraints of nineteen types defined on subsets of classes. Most constraints can be validated on pairs of classes, i.e., each pair that does not satisfy the constraint incurs a penalty. Examples are SameDays (all classes must be taught on the same days), NotOverlap (classes should not overlap), or SameAttendees (classes cannot overlap in time and must be placed so the attendees can travel between them). Some constraints may include parameters. For example, the WorkDay(S) constraint penalizes placements in which the classes span more than S time slots of a day, and MinGap(G) penalizes the placement of classes closer than G time slots to each other. Four types of constraints must be validated on the whole subset of classes. The MaxBreaks constraint limits the number of breaks between classes on each day. The MaxBlock constraint limits the time scheduled without a break. The MaxDays constraint requires classes not to spread over more than a certain number of days in the week. Finally, the MaxDayLoad constraint limits the number of class time slots on each day.
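The difference between pairwise constraints and constraints evaluated on the whole subset can be sketched as follows. The sketch simplifies class times to `(start, length)` tuples on the same day for MinGap and to seven-character day strings for MaxDays. All function names are hypothetical, and charging the penalty once per violating pair and once per day above the limit is an assumption of this illustration.

```python
from itertools import combinations


def pairwise_penalty(classes, satisfied, penalty):
    """Soft pairwise constraint: one charge for each violating pair."""
    return penalty * sum(not satisfied(a, b) for a, b in combinations(classes, 2))


def min_gap(gap):
    """MinGap(G) on (start, length) tuples assumed to be on the same day."""
    def satisfied(a, b):
        return a[0] + a[1] + gap <= b[0] or b[0] + b[1] + gap <= a[0]
    return satisfied


def days_used(day_patterns):
    """Number of week days on which at least one of the classes meets."""
    return sum(any(p[d] == "1" for p in day_patterns) for d in range(7))


def max_days_excess(day_patterns, limit):
    """MaxDays on the whole subset: days above the limit (0 = satisfied)."""
    return max(0, days_used(day_patterns) - limit)
```

For three same-day classes `[(96, 12), (110, 12), (130, 12)]` and MinGap(6), only the first two classes are too close, so `pairwise_penalty` charges a single pair. MaxDays cannot be decomposed in this way because the number of days depends on all classes of the subset together.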

There are four essential optimization criteria. The goal is to minimize penalties for time assignments of classes, penalties for room assignments of classes, penalties for unsatisfied soft distribution constraints, and the number of student conflicts. Minimizing the number of student conflicts is a fundamental part of the problem, which is crucial for university course timetabling. A student conflict exists if the student cannot attend a pair of their classes. Conflicts arise not only between classes that overlap in time but also between classes that students cannot attend due to travel distances between the assigned rooms.
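A minimal sketch of counting student conflicts, including the travel-time case, is given below. The `Placement` tuple, the `travel` dictionary keyed by unordered room pairs, and the treatment of missing room pairs as zero travel time are assumptions of this illustration.

```python
from collections import namedtuple
from itertools import combinations

Placement = namedtuple("Placement", "days weeks start length room")


def share(a, b):
    """True if two bit strings have a '1' at the same position."""
    return any(x == "1" and y == "1" for x, y in zip(a, b))


def in_conflict(p, q, travel):
    """True if one student cannot attend both placements.

    travel maps frozenset({room_a, room_b}) -> travel time in time slots;
    missing pairs are treated as zero travel time (assumption).
    """
    if not (share(p.weeks, q.weeks) and share(p.days, q.days)):
        return False  # never on the same day of the same week
    first, second = sorted((p, q), key=lambda x: x.start)
    gap = second.start - (first.start + first.length)
    if gap < 0:
        return True  # the meetings overlap in time
    return gap < travel.get(frozenset((p.room, q.room)), 0)


def student_conflicts(enrollments, placement, travel):
    """Count conflicting class pairs over all students."""
    return sum(
        in_conflict(placement[a], placement[b], travel)
        for classes in enrollments.values()
        for a, b in combinations(sorted(classes), 2)
    )
```

Two back-to-back classes in rooms A and B are thus in conflict when the travel time between A and B is longer than the break between them, even though the classes do not overlap.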

Detailed descriptions of particular features and the XML specification are available in Müller et al. (2018) and on the competition website.

4 Real-world problems from universities

All the competition instances in ITC 2019 represent real-world problems that have been collected from various universities around the world. These universities have been using UniTime for course timetabling in production for years. Out of the 100+ universities that are using UniTime in practice, we have tried to pick a decent sample of institutions that vary in size, country/continent, level of support they needed, and other aspects like how they were collecting student demand data. We have asked each institution for at least two data sets from at least two different semesters (e.g., one Fall and one Spring semester) that have already been timetabled using UniTime. The potential contributors were also provided with simple instructions on exporting UniTime’s course timetabling solver data in an anonymized XML format.Footnote 4

Three institutions that we have worked closely with in the past readily agreed to provide us with data sets (Masaryk, Purdue, and AGH). These are discussed in more detail below. The remaining institutions are summarized in the last subsection, as we have less information about them.

4.1 Masaryk University

Masaryk University (Czech Republic) uses UniTime for seven out of its ten faculties. We have included problems from three faculties with considerable differences (see Table 1).

Table 1 Competition instances from Masaryk University

The timetable for the Faculty of Informatics can be generated based on pre-enrollments of students into courses. Otherwise, it is a relatively standard mid-size problem with about 500 classes, each scheduled weekly or sometimes bi-weekly. Most courses only have one lecture and/or multiple seminars from which the students can choose. Classes are mostly scheduled once a week for a longer time, such as 2 h, representing a typical example of a European institution.

The Faculty of Sports Studies timetables are significantly influenced by traveling to various sports facilities across the city. This necessitates considering the travel time between every two rooms so that students can attend consecutive classes. The faculty also offers sports for students of other faculties of the university. These are not included in the problem as they are timetabled separately, but they may influence the availability of certain sports facilities.

In the Faculty of Education, we can see specific curricula patterns, typically composed of a pair of specializations, each representing one field of study such as Mathematics, Physics, English, or Music. Typical combinations, such as Mathematics–Physics or English–History, may involve many students. Some other, less popular combinations, like Music–Chemistry, or Art–Mathematics, have only a few students. This results in a very diverse set of student course demands. These pairs may result in many student conflicts because it is impossible to satisfy all the possible combinations.

There are two different types of problems for the Faculty of Education and the Faculty of Sports Studies, representing (1) the regular full-time form of study and (2) distance learning (Müller and Rudová 2016). Both faculties construct separate timetables for distance learning. Here, the lifelong and combined study forms are included so students can combine studies with work duties. This problem is uncommon and complex. The students only come to school once a week or once every two weeks (e.g., every Friday or Saturday) and have all their classes during that day. Each distance learning course may have only two or three meetings during the semester, which occur in different weeks. Each of these meetings is usually modeled as a single class and linked with the other meetings using distribution constraints (e.g., required DifferentWeeks). This results in very irregular timetables that differ from one Friday or Saturday to the next during the semester. Consequently, these problems are often very hard, which is also reflected by the high gap between the best-known solution and its lower bound (DSUM, 2023) (47.53% for muni-fsps-spr17c, 50.38% for muni-pdf-spr16c, and 64.34% for muni-pdfx-fal17).

4.2 Purdue University

Purdue University uses automated timetabling for all its departments together (Rudová et al., 2011), which means that we can see a large-scale problem representing all courses of a large public university with about 40,000 students. The timetable construction starts with timetabling for the large lecture rooms and large active learning classrooms (LLR), which are shared across the university and timetabled centrally. Active learning rooms have specific features to facilitate active learning, like variable seating arrangements, dual projections, or additional equipment. In the LLR problem, there are only a few courses for each student, but the room utilization is very high since large rooms for hundreds of students represent a scarce and expensive resource. Individual departmental problems are solved later, on top of this large lecture room problem. The problem of shared computer laboratories is solved last. This problem contains many symmetries, as many computer laboratories share the same characteristics, so it is easier to find a good solution.

At Purdue, we can also see a typical example of an American university where classes are taught several times a week at the same time and in the same room, for instance, on Monday, Wednesday, and Friday, starting on the half-hour (7:30 am, 8:30 am, ... 4:30 pm). Also, courses may be taught using different patterns, e.g., two times a week for ninety minutes or three times a week for one hour.

At Purdue, courses have a rich structure. For example, an introductory Biology course is offered to most first-year students at the university. Such a course may have a couple of large lectures needed to cover the student demand. Besides a lecture, each student may need to take a laboratory and a seminar, which are typically taught for much smaller groups of students. The parent–child relations may be used to link students of particular laboratory and seminar pairs together so that the same instructor can teach them. The course can also be offered in multiple configurations.

In contrast with many other problems discussed in this section, there are neither curricula nor pre-enrollments. The timetable is constructed based on the course enrollments of the last-like semester (e.g., timetable construction for Fall 2019 used real course enrollments for Fall 2018 to ascertain student demand and enrollment patterns).

Purdue University buildings are located on a large campus, so timetabling needs to consider non-negligible travel times, which are estimated based on the GPS coordinates of each room.

The competition instances contain just the large lecture and large active learning rooms problem (LLR, instance pu-llr-spr17), five departmental problems built on top of the LLR problem (instance pu-d5-spr17, with relevant LLR classes fixed in time and space), nine departmental problems including LLR (instance pu-d9-fal19), and the whole problem including all departments (pu-proj-fal19) that was used for projection simulation for Fall 2019.

4.3 AGH University of Science and Technology

AGH University of Science and Technology from Poland builds course timetables separately for each faculty. Still, the faculties share some resources, and some faculties provide many courses for students outside their faculty. For instance, in our benchmarks, the Faculty of Humanities offers almost three-quarters of its classes to students of other faculties. The data are structured so that the courses for students from these faculties can be managed and timetabled separately.

AGH uses relatively rigid curricula, only containing mandatory and elective courses. In the original (UniTime) problem, students of the same curriculum are kept together and attend the same classes. The solver is not allowed to create any student conflicts. Curriculum reservations are used to restrict students of certain curricula to particular classes. In recent years, AGH has been moving away from this model, making use of pre-enrollments and student scheduling capabilities of UniTime. They still use curricula for students with no pre-enrollment, and curriculum or student group reservations allow them to direct students to specific classes in necessary cases.

The competition data contain four Spring 2017 instances, each for an individual faculty (combining courses for both internal and external students). The Faculty of Mining Surveying and Environmental Engineering (agh-ggis) and the Faculty of Geology, Geophysics, and Environmental Protection (agh-ggos) also have a mix of regular and part-time students. The Faculty of Physics and Applied Informatics (agh-fis) and the Faculty of Humanities (agh-h) have many courses for students of other faculties (offering 46% and 73% of classes for outside students, respectively). The Fall 2017 instance (agh-fal17) contains six faculties (the four faculties from Spring plus the Faculty of Drilling, Oil, and Gas, and the Faculty of Energy and Fuels), all loaded together, offering 78 student programs and allowing for some cooperation between faculties.

4.4 Other Universities

Maryville University in the USA (prefix mary) contributes problems of another American university, but it is much smaller than Purdue and represents a more typical user of UniTime. All courses are timetabled simultaneously, with an organization of times and courses similar to that at Purdue.

The Universidad Yachay Tech, Ecuador, represents one of the smallest problems in the competition, but it has the highest room utilization in our set of instances. It is based on curricula, with most classes meeting weekly for 90, 120, or 180 min during the semester. There are only 28 rooms shared between 5 departments.

Four institutions from Asia are in the competition, three of which have no students in the data. These universities decided to model student course requirements using the SameAttendees or NotOverlap distribution constraints. These are the Turkish-German University (prefix tg) and the İstanbul Kültür University (prefix iku) from Turkey, and the Lahore University of Management Sciences (prefix lums) from Pakistan.

The Turkish-German University has only a few courses, but each course contains many classes that meet only once during the semester. This is a consequence of many teachers coming from Germany, so each class meeting is timetabled individually based on the instructor’s availability.

The Lahore University of Management Sciences allows most classes to meet once or twice a week, but most classes that meet two times a week must have one day between the meetings (Monday–Wednesday, Tuesday–Thursday, etc.). Most classes meet at the same time and room for the whole semester.

The İstanbul Kültür University provides two of the largest instances in the competition besides Masaryk, Purdue, and AGH instances. While there are no student demands and no special distribution constraints, their size makes them exceptional.

The Bethlehem University (prefix bet) from Palestine represents another mid-size university with course timetabling based on student pre-enrollments and classes meeting multiple times a week for the whole semester. While all the competition solvers provided similar results for the two instances from this university, they are among the instances with the highest gap between the best-known solution and the lower bound (DSUM, 2023).

The University of Nairobi (prefix nbi) provides the only instance from Africa in the competition. It is a curriculum-based problem, with most classes meeting once a week and scheduled for the whole semester.

5 Benchmark instances

Early, middle, and late instances were published during the competition, introducing thirty benchmark instances. Summarized information about particular benchmark instances is available in Tables 2, 3, 4, and 5. The first column of each table specifies the Instance of the data set. Further characteristics are discussed in the following sections.

5.1 Size of the problem

Problem size is a prominent distinguishing characteristic. Instances considering only one school or faculty may involve about 500 classes, 2000 students, and 50 rooms. Other problems that represent timetabling for a large part of a university may consist of about 2800 classes, 35,000 students, or 200 rooms, for example. The biggest problem in the competition with almost 9000 classes covers courses of a whole university. Such problems are rarely timetabled as a single data instance in practice, except for planning, projections, and enrollment simulations.

Table 2 Base information about benchmark instances

Table 2 provides base information about the number of Courses, Classes, and Rooms in each instance. We can see that some of the classes are fixed, having only one possible time and, if they need a room, only one possible room available to them, which could be due to a strong relationship to other courses of the school that have been timetabled earlier. Such already timetabled classes, e.g., those that share an instructor, are a part of the data set with fixed placements. For example, muni-pdf-spr16c has almost half of the classes fixed. It consists of combined and distance learning courses being timetabled on top of the regular form of study (Müller and Rudová 2016). We can see the number of classes without a room (no r.). The number of rooms ranges from 15 to 768, resulting in significant differences in problems.

5.2 Students and course demands

The real-life data may have the student course demands collected from various sources. Some problems include pre-enrollment of students to courses. These can be very diverse and may introduce a high violation of student requests as there can be a few students found for almost any conceivable combination of two courses. Another option is to consider last year’s student course enrollments (called last-like enrollments), which reflect the past year’s situation, rather than providing information independent of the past timetable and courses.

On the other hand, student course requests can be based on curriculum requirements, which are typically easier to satisfy (Müller and Rudová 2016). Here, large groups of students are taking the same or very similar set of courses as they are following the same program of study. Such schools’ curricula may be standard with some compulsory and optional courses. However, some instances, such as muni-pdf-spr16, may contain hundreds of curricula. Many students are expected to have two specializations, with some combinations being far less popular than others and having only a few students. Such curricula may be more challenging to satisfy.

Table 2 also provides information about students. For each instance, it shows the number of Students, the average number of courses a student is requesting (St.courses), and the average number of classes a student is to be enrolled into (St.classes). Instance iku-fal17 represents an example of a school where no student data are available, and student needs are specified using the SameAttendees or the NotOverlap constraints instead. The remaining instances typically contain a few thousand students each. The highest numbers of students are in the Purdue University problems. The pu-llr-spr17 instance has 27,018 students, but it also has a small average number of courses and classes per student because it only contains large classes that are timetabled centrally (Rudová et al., 2011). All Purdue problems are combined in the pu-proj-fal19 instance, which has the highest number of students, 38,437. The very high number of classes per student (17.35 for muni-pdf-spr16c, 29.92 for agh-ggis-spr17) results from a small number of weeks per class, as can be seen in Table 4. In these problems, many courses are split into many individual classes that meet irregularly during the semester.

5.3 Distribution constraints

Instances may differ by the importance and the amount of distribution constraints. Some schools may use many distribution constraints, while others do not rely on them so heavily. For example, some problems have the WorkDay, the MaxBlock, and the MaxDayLoad constraints to provide good schedules for instructors, while other problems may not contain these three particular constraints at all.

Table 3 Statistics about distribution constraints and domains

Table 3 provides information about the distribution constraints. The total numbers of hard and soft distribution constraints are given in the Hard distr. and Soft distr. columns. The Hard cl.pairs and Soft cl.pairs columns show the number of pairs of classes with a (hard or soft) distribution constraint that can be binarized. We can see that distribution constraints are uncommon in some problems (pu-llr-spr17 given its size), while others rely heavily on them (tg-fal17 with constraints used to handle students). The numbers of special constraints that cannot be binarized (MaxDays, MaxDayLoad, MaxBreaks, and MaxBlock) are summarized in the Spec. const. column, where the numbers of hard and soft constraints are separated by “/”. It is clear that these special constraints are used only at some institutions, and their numbers are not very high. However, they are still essential to express institutions’ specific needs and thus to make the generated timetables acceptable.
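The binarization of distribution constraints can be sketched as follows: a binarizable constraint over n classes yields n(n-1)/2 class pairs. Whether Table 3 counts each pair once or once per constraint is not specified here, so counting distinct pairs is an assumption of this sketch, as are the input format and the function name.

```python
from itertools import combinations

# Constraint types that must be evaluated on the whole subset of classes.
SPECIAL = {"MaxDays", "MaxDayLoad", "MaxBreaks", "MaxBlock"}


def class_pairs(constraints, required):
    """Distinct class pairs linked by binarizable constraints.

    constraints: iterable of (type_name, is_required, class_ids).
    required: True counts hard constraints, False counts soft ones.
    """
    pairs = set()
    for type_name, is_required, class_ids in constraints:
        base = type_name.split("(")[0]  # strip parameters, e.g. "MinGap(6)"
        if base in SPECIAL or is_required != required:
            continue
        pairs.update(combinations(sorted(set(class_ids)), 2))
    return len(pairs)


# Hypothetical constraints on four classes a, b, c, d.
constraints = [
    ("SameAttendees", True, ["a", "b", "c"]),
    ("NotOverlap", True, ["a", "b"]),
    ("MinGap(6)", False, ["a", "d"]),
    ("MaxDays(3)", False, ["a", "b", "c", "d"]),
]
```

In this example, the hard constraints link three distinct pairs of classes, the soft MinGap constraint links one pair, and the MaxDays constraint is skipped because it cannot be binarized.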

In practice, many distribution constraints are usually set on an instructor among the classes the instructor is expected to teach. Besides the SameAttendees constraint, which is used in the competition problem to ensure that the instructor can teach all their classes, there are sometimes additional requirements provided by the instructor, like SameRoom, SameDays, MaxDays, MaxDayLoad, or MaxBlock. Institutions with classes meeting multiple times a week following a standard set of time patterns (such as pu-llr-spr17) usually have far fewer distribution constraints, as the instructor load is already spread more evenly over the week by having each class meet multiple times. Also, having a large number of classes that do not meet for the whole semester leads to additional distribution constraints. For instance, a course that only meets three times a semester usually requires all three meetings to be in different weeks, preferably starting at the same time and placed in the same room (muni-pdf-spr16c).

5.4 Domains of variables

The second part of Table 3 specifies statistics related to the domains of decision variables. We can see the average number of available class times (Avg.times of a class). While some problems are very limited in the time placement (specifically the USA problems represented by instances pu-llr-spr17 with 9.28 and mary-fal18 with 11.37 available times), some others have many options (agh-h-spr17 with 236.35 or muni-fsps-spr17c with 124.74). The same information is provided for rooms in Avg.rooms of a class. There is, again, interesting diversity, e.g., muni-fsps-spr17 has 3.15 available rooms on average while having 44 rooms in total. It shows specific characteristics of rooms in the Faculty of Sports Studies (Müller and Rudová 2016), which represent various sports facilities. The diversity is even more evident when considering the average domain size representing the combined time and room placement. Some problems have more than a thousand values available (agh-fis-spr17), while others have only tens of values (muni-fsps-spr17). The average availability (Avg. availab.) is the percentage of class placements (time \(\times\) room) that are available. A lower percentage indicates that some rooms are unavailable for timetabling at otherwise suitable times.
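The domain statistics for a single class can be sketched as follows. The sketch assumes that a placement is a (time, room) pair, that a placement is unavailable when the room is blocked at that time, and that classes without a room keep all their times. The exact definitions behind Table 3 may differ, and the function name and input format are hypothetical.

```python
def domain_stats(times, rooms, unavailable):
    """Domain size and availability of one class.

    times: non-empty list of hashable time options; rooms: list of room ids;
    unavailable: set of (room, time) pairs blocked in the room.
    Returns (available placements, percentage of time x room pairs available).
    """
    total = len(times) * max(len(rooms), 1)
    if rooms:
        free = sum((r, t) not in unavailable for t in times for r in rooms)
    else:
        free = len(times)  # a class without a room only needs a time
    return free, 100.0 * free / total
```

For a class with four possible times and two possible rooms, where both rooms are blocked at the first time, six of the eight placements remain, giving an availability of 75%.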

Most lectures at Purdue (e.g., see pu-llr-spr17 containing lectures mostly) follow either the \(3 \times 50\) or the \(2 \times 75\) time pattern, meaning that a class meets three times a week on Monday, Wednesday, and Friday for 50 min, or two times a week on Tuesday and Thursday for 75 min, respectively. Moreover, there are only ten available times for the \(3 \times 50\) and seven for the \(2 \times 75\) time pattern, and these times do not overlap. This leads to small domain sizes and to schedules with no unusable gaps between classes, without any need for additional constraints. On the other hand, at Masaryk, a regular lecture is two hours long, and it can meet on any day of the week with six to twelve possible start times depending on the faculty (six in muni-fi-fal17, twelve in muni-pdf-spr16). Furthermore, the distance learning classes can meet only once a semester in any available week (muni-fsps-spr17c). This makes for far larger domains in the problem. It is another reason why these instances are more challenging to solve, as reflected in the high gap between the best-known solution and the lower bound (see the end of Sect. 4.1 for additional discussion).

5.5 Times

Table 4 provides information about date and time patterns and utilization. The number of Weeks mostly ranges from 13 to 21. However, the number of minutes per meeting (Minutes per mtg.) can differ significantly, ranging from 63.52 to 140.5. The Days per class value is higher for the USA schools, averaging 1.90 for pu-llr-spr17 (Rudová et al., 2011) and 1.56 for mary-fal18. The Weeks per class value is one of the important parameters. We can see benchmark instances where this number is rather close to the number of weeks, but it can also be very low for classes held only a few times per semester (muni-fsps-spr17c with 1.0, tg-fal17 with 1.16, agh-ggis-spr17 with 4.18, or muni-pdf-spr16c with 5.90).

Table 4 Statistics about date and time patterns, and utilization

5.6 Room utilization

For some benchmark instances, it is not the size of the problem as such but the high room utilization that makes it difficult to solve. In some of these instances, it is not the overall utilization, but clusters of rooms are in high demand. Large classrooms are a typical example of rooms with high utilization. In other instances, utilization may not be the critical component, and the optimization criteria are more emphasized.

The last three columns in Table 4 represent the utilization. The Minutes per class is the average number of minutes a class is taught during the semester (number of weeks \(\times \) number of days of a week \(\times \) minutes per meeting). The number of weeks is included because many classes may be taught for a few weeks only (see the extreme at muni-fsps-spr17c with 110.9, tg-fal17 with 159.8, or agh-ggis-spr17 with 450.2; the other extreme is mary-fal18 with 2809.7 or lums-fal17 with 2572.8). The Minutes per room is the average number of minutes a room is occupied during the semester (minutes of all classes in a room). The differences may be again relatively high. Instances muni-fsps-spr17 and muni-fsps-spr17c have 6008.9 and 2287.8 min, respectively (specific sports facilities are not so much used by the school), while bet-fal17, iku-spr18, mary-fal18, and pu-llr-spr17 occupy rooms heavily with about 20,000 min. The average goes up to 30,000 min in yach-fal17. Also, there may be very high differences in occupancy of different rooms, e.g., muni-fi-spr16 has very high utilization of standard classrooms. At the same time, specialized laboratories are almost empty because they are not primarily used for teaching. Another example is muni-pdf-spr16c with distance learning courses (Müller and Rudová 2016) where classes are scheduled on Saturdays and Fridays, and rooms are almost empty on other days.Footnote 5 The muni-pdfx-fal17 gives a much better picture as both regular and distance learning courses are included in the problem.
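As a small worked illustration of the Minutes per class definition, the sketch below multiplies the three factors for a single class. The function name and the 15-week semester are assumptions made for illustration; only the \(3 \times 50\) and \(2 \times 75\) patterns come from the text.

```python
def minutes_per_class(weeks: int, days_per_week: int, minutes_per_meeting: int) -> int:
    """Total minutes one class is taught during the semester."""
    return weeks * days_per_week * minutes_per_meeting


# Hypothetical 15-week semester (an assumption, not a value from Table 4).
mwf = minutes_per_class(weeks=15, days_per_week=3, minutes_per_meeting=50)
tth = minutes_per_class(weeks=15, days_per_week=2, minutes_per_meeting=75)
print(mwf, tth)  # 2250 2250
```

Both patterns give the same total, which is why they can be used interchangeably for a course with the same weekly teaching load.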

The Minutes per student is the average number of minutes a student is in a class during the semester (minutes of all classes a student attends). Some instances do not have actual students (iku-fal17, tg-fal17), while others may include only some student classes such as pu-llr-spr17 (Rudová et al., 2011). Also, some curricula may only include mandatory and elective courses (muni-fsps-spr17), leaving optional courses for the student to pick after the timetable is done. On the other hand, students may be allowed to pre-register for more courses than they will end up enrolling in (muni-fi-fal17), or a curriculum may contain additional courses to ensure no conflicts between possible options for the students (agh-fal17 or yach-fal17).

5.7 Optimization weights

Table 5 specifies the weights for particular optimization criteria, i.e., Time preferences, Room preferences, soft Distribution constraints, and the number of Student conflicts. The weights reflect different institutional needs of each problem and some of the differences in the data. For example, students in muni-fsps-spr17 are following their curricula with only a few choices in elective and optional courses. It is, hence, desired to avoid student conflicts between two mandatory courses or between a mandatory and an elective course. Therefore, overlapping any two classes with many students would result in a very high penalization.

Table 5 Weights of optimization criteria

In general, soft distribution preferences are weighted quite highly. However, this is also because there are only a few soft distribution constraints, so their average penalty is lower than the average penalties for times and rooms. The time preferences are usually more important than room preferences. Student conflict weights vary a lot as they also depend on whether the institution uses curricula, last year’s student course enrollments, or pre-enrollments for their course timetabling.

Since there are some differences between the competition model and the model we have in UniTime, from which the data were collected, the optimization weights were also adjusted to ensure the competition solutions correspond with the results in UniTime. That is, the weighted average penalties for time, room, distribution, and student conflicts, respectively, should reflect the settings from the original instance in UniTime and, through that, the real needs of the institutions.
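To make the role of the four weights concrete, the following sketch combines the penalties of a solution into a single weighted cost. It is only an illustration: the class and function names, the weights, and the penalty values are all hypothetical and are not taken from Table 5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    """Per-instance weights of the four optimization criteria."""
    time: int
    room: int
    distribution: int
    student: int


def total_cost(w: Weights, time_penalty: int, room_penalty: int,
               distribution_penalty: int, student_conflicts: int) -> int:
    """Weighted sum of the four criteria penalties."""
    return (w.time * time_penalty
            + w.room * room_penalty
            + w.distribution * distribution_penalty
            + w.student * student_conflicts)


# Hypothetical weights and penalties, chosen only to show the arithmetic.
weights = Weights(time=2, room=1, distribution=10, student=5)
print(total_cost(weights, time_penalty=30, room_penalty=12,
                 distribution_penalty=4, student_conflicts=20))  # 212
```

With these made-up numbers, the student conflicts contribute 100 of the 212 points, which illustrates how a heavily weighted criterion can dominate the cost.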

6 Representative features of benchmark instances

In the previous section, we have discussed instances’ characteristics feature by feature. This section will concentrate on the representative critical features discussed institution by institution. We want to explore what features make some benchmark instances harder to solve than others. To aid with this, we borrow the lower bounds provided by the winning team (DSUM, 2023) and compare them with the quality of the best existing solution (see the detailed information about results in Sect. 7). The gap between the value of the best solution and the available lower bound provides valuable information on how hard the problem is to solve. While there is only one solver capable of providing lower bounds, we believe that the provided bounds are significant, especially since the relative gap ranges from zero (proving that five instances have already been solved to optimality) to 94.46% for agh-fal17 which is the second-largest benchmark instance in the competition in both the number of classes and rooms.

6.1 Representative features

We have constructed graphs with the representative features and the gap for each benchmark instance. Also, it is essential to consider (heavy) weights of particular optimization criteria in Table 5.

The top graph in Fig. 2 shows the Gap (computed as \(1 - \frac{\textrm{lower bound}}{\textrm{best cost}}\) and provided as a percentage), the number of Unfixed classes as a difference between the number of classes and fixed classes, the average number of classes a student is to be enrolled into (St.classes), and the number of pairs of classes with a hard distribution constraint that can be binarized (Hard cl.pairs). The middle graph shows the number of Students, the number of pairs of classes with a soft distribution constraint that can be binarized (Soft cl.pairs), the total number of special constraints MaxDays, MaxDayLoad, MaxBreaks, and MaxBlock in Spec.const. sum, and utilization as the average number of minutes a room is occupied during the semester (Minutes per room). The bottom graph contains information about the average number of available times of a class (Avg.times of a class), the average number of Days per class, the average number of Weeks per class, and the number of Weeks in each benchmark instance. In order to show the values well, the features were split into three graphs, each using a different scale.
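The Gap formula above can be written as a short helper. This is a minimal sketch; the function name and the handling of a zero best cost are assumptions, and the input values are made up rather than taken from the results.

```python
def relative_gap(lower_bound: float, best_cost: float) -> float:
    """Gap = 1 - lower_bound / best_cost, returned as a percentage."""
    if best_cost == 0:
        # Assumption of this sketch: a zero-cost solution cannot be improved.
        return 0.0
    return 100.0 * (1.0 - lower_bound / best_cost)


# Hypothetical values: a bound equal to the best cost proves optimality.
print(relative_gap(lower_bound=100, best_cost=100))  # 0.0
print(relative_gap(lower_bound=50, best_cost=100))   # 50.0
```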

Fig. 2 Summarizing information about benchmark instances

Let us justify the selection of particular features to demonstrate in Sect. 6.2 how their values influence the problems’ complexity. The Unfixed classes represent the number of classes whose placement needs to be found, so it is more representative than the number of all classes. The numbers Students and St.classes represent the dimension of problems regarding the students and their duties. Hard cl.pairs and Soft cl.pairs provide valuable information about the distribution constraints, which can be binarized. Spec.const. sum takes a look at the remaining distribution constraints. These three features allow us to see problem characteristics regarding distribution constraint requirements. Hard cl.pairs together with the Minutes per room characterize constraint satisfaction component of the problem. Similarly, Avg.times of a class are related to the number of options available for each class’s time placement. Here, the time placement was selected (rather than the room placement or the corresponding domain size) since the time assignment is more critical than the room assignment. It is also clear from Table 5 where the time placement almost always has a higher weight. The difference between Weeks and Weeks per class shows the schedule regularity of the constructed timetable. If both values are close, the same timetable is constructed each week. Significant differences demonstrate that a different timetable is constructed each week, increasing the problem size and complexity. Days per class may demonstrate that a class is taught several times a week, which introduces specific patterns in the constructed timetables (e.g., the same room and time meetings each Monday–Wednesday–Friday). It may simplify the construction of the timetables without gaps and wasted spaces unavailable for other classes. 
Finally, high differences among weights of optimization criteria in Table 5 may strongly emphasize student sectioning when student conflicts are heavily weighted or course timetabling when the time weight is strongly supported.

6.2 Benchmark instances by institutions

We will discuss the above-proposed representative features in the context of particular benchmark instances and provide information collected for institutions using the same structure as Sect. 4.

6.2.1 Masaryk University

The first three instances in Fig. 2 demonstrate how changes in the same problem may lead from an instance that is almost optimally solved (muni-fi-spr16) to instances where the gap is still relatively high (muni-fi-spr17 and also muni-fi-fal17). In muni-fi-spr16, we can see less than half of the special constraints, a smaller utilization, and fewer unfixed classes, resulting in a ten times smaller gap.

The next three instances from Masaryk University represent the Faculty of Sports Studies. The instance muni-fsps-spr17 is now optimally solved. There are fewer unfixed classes and students, very few hard and soft class pairs, low utilization, and no special constraints. Even though we have some discrepancies between the number of weeks and weeks per class, it does not outweigh the other characteristics. On the other hand, muni-fsps-spr17c, representing the problem of the distance learning study, has a very high gap. Each class meets in only one week of the semester, resulting in a different timetable each week. Even though there are few students, they have many classes, and their conflicts are heavily weighted. In addition, there are many possible time placements for each class, resulting in many possible options for optimization. The instance muni-fspsx-fal17 combines features of the two earlier instances, covering both study forms. With this combination, the gap is significantly smaller than for muni-fsps-spr17c (21.42 % vs. 47.53 %), corresponding to a combination of an easier problem with a harder one. It indicates that the two problems only interact with each other a little.

For the Faculty of Education, there are three benchmark instances as well. The problem of the regular form of study muni-pdf-spr16 belongs to the ones with a smaller gap (20.05 %), corresponding to the values of the representative features, which do not show any critical issues. The gap is relatively high for the lifelong study instance muni-pdf-spr16c (50.38 %). A high discrepancy between the number of weeks and weeks per class corresponds to the need for lifelong study. The utilization is relatively high, and the average number of times of a class also allows many options for optimization. For the combined problem muni-pdfx-fal17, the gap belongs to the highest ones (64.34 %). The problem has 3,405 unfixed classes and many hard class pairs. The utilization is very high; there are many soft class pairs and special constraints, and the domains are also large (see the average possible class times). Finally, there is still a high discrepancy between the number of weeks and the average number of weeks per class.

6.2.2 Purdue University

The first instance, pu-llr-spr17, representing the large lecture room problem, is optimally solved. There are 683 unfixed classes. We have many students (27,018), but there are only 3.41 classes per student. Also, there are classes with regular patterns with 1.9 days per class.

Having pu-llr-spr17 solved to optimality was a bit of a surprise. The LLR problem at Purdue was the first problem UniTime was used for, and it was used in the original research more than two decades ago (Rudová and Murray, 2003). On the other hand, it is clear from practice that fewer available times make for easier timetabling and better timetables. More options do not make the problem easier to solve.

On the other hand, the instance pu-llr-spr17 is one of those problems where it is harder to fit all the classes in, given the very high utilization of the largest rooms, while the optimization criteria play a weaker role, as it has a low number of classes per student and few soft distribution pairs. As further discussed in Sect. 7, this instance is better suited for exact methods, such as MIP solvers, than the other instances.

On the contrary, the instance pu-d5-spr17 has a very high gap of 54.02 %. It looks for the placement of classes from five departments, so there are many students. The number of soft class pairs is much higher, there are more special constraints, and the regularity of the patterns is smaller.

The instance pu-d9-fal19 has a small gap of 16.77 %, while the number of unfixed classes and students is much higher. There are many soft class pairs and some special constraints. It is becoming more regular, with the average class times equal to 1.74.

Comparing the two instances, note that pu-d5-spr17 has five departmental problems to be timetabled on top of an LLR problem, whereas pu-d9-fal19 has eight departmental problems and LLR solved together. Also, pu-d5-spr17 has different weights than the other Purdue problems, see Table 5.

The last instance, pu-proj-fal19, with all Purdue University classes, has the largest number of classes, students, and a very high number of soft class pairs, resulting in a rather large gap.

6.2.3 AGH University of Science and Technology

The AGH University of Science and Technology from Poland has quite hard instances. They all have a high discrepancy between the number of weeks and weeks per class, resulting in different timetables for different weeks in the semester. The harder ones have additional special constraints. The instance agh-fal17 with the highest gap is a big problem with many classes, including many classes for each student. There are many hard class pairs, the largest number of soft class pairs, and special constraints.

6.2.4 Other universities

Other universities are represented mainly by two instances each. The benchmarks from Maryville University (mary) are easier to solve. They have a smaller number of classes, and there is a significant pattern of regularity (the average days per class are around 1.5). For the spring instance mary-spr17, there are many hard and soft class pairs but relatively few options for the time placement.

The instance yach-fal17 from Universidad Yachay Tech represents one of the smallest benchmarks. Still, the gap is very high. We can see the very high utilization and the high number of special constraints, especially concerning the size of the problem.

Both Turkish-German University (tg) instances are now optimally solved. We see mid-size problems with no explicit students and many hard constraint pairs representing them. There are (almost) no special constraints, and the utilization is low. Timetables are different each week since there are one or two weeks per class on average, but it is not an issue when combined with a small number of options for the time placement (many classes are restricted to a specific week).

One Lahore University of Management Sciences (lums) instance has a very high gap. However, it is essential to realize that the best solution has a cost of only 95, so the gap is not so significant in absolute value. Both lums instances represent smaller problems with no explicit students and a few hard class pairs. No special constraints are present, and timetables have regular patterns with around 1.8 days per class.

Two İstanbul Kültür University (iku) instances have many unfixed classes. Again, there are students represented using hard class pairs only, but now there are significantly fewer than in the two earlier problems. There are no special constraints, few soft class pairs, and timetables are the same each week. The gap is minimal, even though the instances belong to the largest ones.

The Bethlehem University (bet) instances have a very high gap. We can see a high number of soft class pairs and special constraints, especially considering that they are not large instances. In addition, the utilization is relatively high. Finally, we need to consider the higher weights of time preferences and distribution constraints, resulting in costs that are higher by an order of magnitude.

The last nbi-spr18 instance, from the University of Nairobi, has a medium size. There are 741 unfixed classes and 2293 students. We cannot see any specific demands in the considered features, and the problem is solved to optimality.

7 Results of competition

Solvers of various types have approached the competition instances. The first two winning results were computed by matheuristic solvers (Mikkelsen and Holm, 2022; Rappos et al., 2022), generally applying the fix-and-optimize approach (Lindahl et al., 2018). The hybrid constraint-based metaheuristic solver UniTime ITC 2019 can now compute the second-best results (Müller, 2022). Another metaheuristic solver (Sylejmani et al., 2022) applying simulated annealing (SA) (Kirkpatrick, 1984) provides the fourth current best results. The fifth best solver relies on reformulation to MaxSAT (Lemos et al., 2022). The last successful competition solver is described in the competition paper (Er-rhaimini, 2020) only. It is based on a hybrid search with features of beam search and hill climbing (Russell and Norvig, 2020).

Thanks to the open-source prize, the source codes of three solvers are now publicly available. See GitHub repositories for the SA solver,Footnote 6 the MaxSAT-based solver,Footnote 7 and UniTime ITC 2019 solver.Footnote 8

The authors of the winning solver also provide a reduction procedure for the ITC 2019 instances (Holm et al., 2022) since there are various redundancies in the data. These reduced instances are available from their website (DSUM, 2023), where lower bounds are also provided, demonstrating that five instances are now optimally solved.

The next part of this section describes the main ideas about solvers which are already published. In Sect. 7.2, we will compare solvers and their characteristics. Finally, Sect. 7.3 provides computational results.

7.1 Existing solvers

The winning matheuristic solver was submitted by Holm et al. (2020) and further refined by Mikkelsen et al. (2022). It relies on modeling using different graph structures in conflict graphs and a reduction algorithm, which removes redundancies in the input data by reducing fixed vertices and cliques in the graph. This graph-based formulation was published in Holm et al. (2022) where authors solved all instances with the exclusion of pu-proj-fal19. For this instance, the model did not fit into 256 GB of RAM. The work (Mikkelsen and Holm, 2022) applies the fix-and-optimize matheuristic (Lindahl et al., 2018) with adaptive updates of the neighborhood size to be fixed during the search. The parallel runs of a fix-and-optimize solver are processed under different initial solutions, with different neighborhoods, and sharing current best available solutions. Initial timetables can be generated without students, and the student enrollments are processed in a separate phase to handle problems with many students, such as the instance pu-proj-fal19. As an underlying solver, GurobiFootnote 9 was used.
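The fix-and-optimize idea can be sketched on a toy time-assignment problem. This is not the authors' implementation: the exact MIP solve of each subproblem is replaced here by exhaustive enumeration, the neighborhood size is fixed rather than adaptive, there is no parallelism, and all names and the random toy data are hypothetical.

```python
import itertools
import random


def cost(assignment, conflicts, time_penalty):
    """Time penalties plus weights of conflicting class pairs sharing a slot."""
    total = sum(time_penalty[c][t] for c, t in enumerate(assignment))
    total += sum(w for (a, b), w in conflicts.items()
                 if assignment[a] == assignment[b])
    return total


def fix_and_optimize(assignment, n_slots, conflicts, time_penalty,
                     free_size=3, iterations=200, seed=0):
    """Repeatedly free a few classes, fix the rest, and solve the rest exactly."""
    rng = random.Random(seed)
    best = list(assignment)
    best_cost = cost(best, conflicts, time_penalty)
    for _ in range(iterations):
        free = rng.sample(range(len(best)), free_size)
        # Stand-in for the MIP solve: try every slot combination of the
        # freed classes while all other classes keep their current slots.
        for combo in itertools.product(range(n_slots), repeat=free_size):
            candidate = list(best)
            for c, t in zip(free, combo):
                candidate[c] = t
            candidate_cost = cost(candidate, conflicts, time_penalty)
            if candidate_cost < best_cost:
                best, best_cost = candidate, candidate_cost
    return best, best_cost


# Hypothetical toy instance: 8 classes, 4 time slots, random penalties.
data_rng = random.Random(1)
n_classes, n_slots = 8, 4
time_penalty = [[data_rng.randint(0, 5) for _ in range(n_slots)]
                for _ in range(n_classes)]
conflicts = {(a, b): data_rng.randint(1, 10)
             for a in range(n_classes) for b in range(a + 1, n_classes)
             if data_rng.random() < 0.4}
start = [0] * n_classes
start_cost = cost(start, conflicts, time_penalty)
solution, final_cost = fix_and_optimize(start, n_slots, conflicts, time_penalty)
print(start_cost, final_cost)
```

Because each subproblem is solved to optimality with the remaining classes fixed, the cost never increases from one iteration to the next; the quality of the final solution then depends on which neighborhoods are chosen.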

The second matheuristic solver by Rappos et al. (2022) also applies reduction strategies in their mixed integer programming model (MIP). In the pre-processing phase, variables with a fixed value and always satisfied constraints are eliminated, and similar constraints are aggregated. The initial MIP solver run concentrates on finding a feasible solution. It starts with a subset of constraints first. Subsequently, other constraints are iteratively injected, and several strategies are used to fix part of the variables. The next stage relies on fixing several variables to values from previous iterations, aiming to improve the objective function while keeping feasibility. The authors have only published the competition results. They solved all instances except for agh-fal17 due to the competition deadline. For the computation, the authors alternated between two solvers, IBM CPLEXFootnote 10 and Gurobi, sometimes resulting in minor improvements.

The solver available from the competition organizer Tomáš Müller (2022) uses the UniTime solver adjusted to the ITC 2019 setting. The solver relies on the constraint-based model (Rudová et al., 2011) and consists of multiple phases to find a solution. Initially, a constructive student sectioning algorithm assigns students to classes to keep students with similar course demands together. It allows the solver to compute the number of students shared between pairs of classes. Consequently, the iterative forward search algorithm with conflict-based statistics constructs a feasible solution. Hill climbing is used to find a local optimum in the next phase. Further improvement in optimization results is processed by great deluge (Dueck, 1993), using bound restarts when a better solution cannot be found. Finally, student sectioning is improved by the great deluge with moves and swaps of single students between alternative classes of a single course.
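The great deluge phase can be illustrated by a generic sketch on a toy minimization problem. It does not reproduce the UniTime code: the multiplicative bound decay, the restart rule, all parameter values, and the toy cost and neighborhood functions are assumptions chosen for illustration.

```python
import random


def great_deluge(initial, cost, neighbor, iterations=5000, decay=0.999,
                 restart_factor=1.1, patience=500, seed=0):
    """Accept any neighbor whose cost is under a slowly decreasing bound."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    bound = current_cost * restart_factor
    idle = 0  # iterations since the last improvement of the best solution
    for _ in range(iterations):
        candidate = neighbor(current, rng)
        candidate_cost = cost(candidate)
        if candidate_cost <= bound:
            current, current_cost = candidate, candidate_cost
        if current_cost < best_cost:
            best, best_cost = current, current_cost
            idle = 0
        else:
            idle += 1
        bound *= decay  # lower the bound a little in every iteration
        if idle >= patience:
            # Restart: return to the best solution and raise the bound again.
            current, current_cost = best, best_cost
            bound = best_cost * restart_factor
            idle = 0
    return best, best_cost


def toy_cost(x):
    """Hypothetical objective: sum of squares of an integer vector."""
    return sum(v * v for v in x)


def toy_neighbor(x, rng):
    """Change one randomly chosen coordinate by +1 or -1."""
    y = list(x)
    i = rng.randrange(len(y))
    y[i] += rng.choice((-1, 1))
    return y


best, best_cost = great_deluge([7, -4, 9, 3], toy_cost, toy_neighbor)
print(best, best_cost)
```

Unlike hill climbing, the search may accept worsening moves as long as they stay under the bound, which lets it leave local optima while the bound is still loose.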

Metaheuristics are represented by the simulated annealing solver by Sylejmani et al. (2022). The pre-processing phase precomputes possible class combinations for each course, and the worst penalties are calculated for the objective normalization. An initial solution with times and rooms for all classes set to 0, and a simple assignment of students to classes is computed in the first phase and used as input for SA. In SA, the penalty for the violated hard constraints, soft constraints, and current student conflicts is maintained separately. Initially, when hard constraints are violated, subtle changes in soft and student penalties are discouraged by the objective function. SA is processed with stochastic tunneling to normalize the evaluation function and restarts when no solution is found within several iterations. The search also focuses on constraints that are hard to satisfy during several restarts. A random walk search phase is entered to minimize their penalty if such a constraint exists.

Lemos et al. published an improved version of their competition solver in Lemos et al. (2022). The pre-processing phase includes several procedures. First, the independent sub-instances are identified. Next, students with the same course enrollment plan are merged, taking class capacities into account. Also, class domains are reduced, and redundant constraints are removed. The main phase consists of separate course timetabling and student sectioning, completed using the MaxSAT solver TT-Open-WBO-Inc (Nadel, 2019). This decomposition reduces the size of the problem, which is critical for instances with a large number of students. Symmetry breaking is introduced for MaxBlock and MaxBreaks constraints, which would otherwise make it impossible to solve even small instances. The iterative calls of the MaxSAT solver allow it to handle the high memory demands of the exactly-one constraints, which are used in the proposed MaxSAT encodings (e.g., in the iku instances). The final phase applies hill climbing to improve student sectioning by swapping and moving student clusters.
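The merging of students with the same enrollment plan can be sketched as a simple grouping step. The sketch is a simplification and not the procedure from the paper: a single hypothetical `max_cluster_size` limit stands in for the handling of class capacities, and the student and course names are made up.

```python
from collections import defaultdict


def cluster_students(enrollments, max_cluster_size):
    """Group students with identical course sets into bounded clusters."""
    groups = defaultdict(list)
    for student, courses in enrollments.items():
        groups[frozenset(courses)].append(student)
    clusters = []
    for courses, students in groups.items():
        students.sort()
        # Split large groups so that no cluster exceeds the size limit.
        for i in range(0, len(students), max_cluster_size):
            clusters.append((courses, students[i:i + max_cluster_size]))
    return clusters


# Hypothetical enrollments: s1, s2, and s3 request the same two courses.
enrollments = {
    "s1": ["A", "B"],
    "s2": ["B", "A"],
    "s3": ["A", "B"],
    "s4": ["C"],
}
clusters = cluster_students(enrollments, max_cluster_size=2)
for courses, students in clusters:
    print(sorted(courses), students)
```

Each cluster can then be sectioned as a single unit, which reduces the number of decision variables for instances with many students.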

7.2 Common and different features of the solvers

When considering the characteristics of the existing solvers, they often rely on various forms of pre-processing to identify and remove redundancies in the data. As mentioned, the instances reduced by pre-processing are available at DSUM (2023). However, these may not be helpful for other solvers, such as the UniTime-based solver, which has no explicit pre-processing. There have been no measurable improvements for the UniTime-based solver when used on the reduced instances from DSUM (2023).

Given the complexity of the problems, all solvers process several search phases to decompose the problem. One of the essential components is the feasibility problem, which concentrates on the satisfaction of hard constraints in the problem. The separated or interleaved handling of course timetabling (assignment of times and rooms to classes) and student sectioning (assignment of students to classes) is often used, even though it may remove optimal solutions.

The difference between matheuristic solvers consists in their constraint handling. While Mikkelsen et al. (2022) rely on complex graph structures (Holm et al., 2022), Rappos et al. (2022) solve a relaxed version of the problem, adding new restrictions each time a found solution does not satisfy all hard constraints of the problem. It allows for a smaller model as the addition of constraints resulting in millions of equations is avoided. Different memory demands of both solvers also reflect it. While the winning solver needed 756 GB of RAM, Rappos et al. (2022) with the second best competition results used 32 GB of memory only.

As another consequence, Rappos et al. (2022) identified that their approach does not perform well for problems with a large number of students, and it is hard to obtain a feasible solution when a large number of distribution constraints are combined with a large number of classes.

Given the increasing demands of the matheuristic solvers with the increasing problem size, we can see that the UniTime-based solver, which relies on heuristic methods, performs better on the largest benchmarks (pu-proj-fal19 and agh-fal17).

The metaheuristic methods are naturally stronger in approaching the optimization than the satisfaction component. Based on the computational results (see next section), it is also reflected by the SA solver, whose performance is weaker when there is a high number of hard class pairs (or high utilization), showing a weaker performance in constraint satisfaction.

Lemos et al. (2022) stated that their solver performs worse for instances where student conflicts are strongly penalized (compare results and weights of student conflicts in Table 5, especially for mary, muni-fsps, muni-pdf). It is related to the fact that the significant improvement in the student sectioning is processed in the final phase.

7.3 Computational results

The final results of the five finalists are presented in Table 6, where the quality of the best-found solution is presented together with the achieved number of points using the Formula One ranking scheme.
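For readers unfamiliar with the Formula One ranking scheme, the sketch below assigns points to solvers on a single instance. The points table is the standard Formula One table; the tie handling (tied solvers share the average of their positions' points), the omission of any per-instance multipliers, and the solver names and costs are assumptions of this sketch.

```python
F1_POINTS = [25, 18, 15, 12, 10, 8, 6, 4, 2, 1]


def f1_points(costs):
    """Map solver -> points for one instance; None means no solution."""
    solved = sorted((c, s) for s, c in costs.items() if c is not None)
    points = {s: 0.0 for s in costs}
    i = 0
    while i < len(solved):
        # Find the block of solvers tied at the same cost.
        j = i
        while j < len(solved) and solved[j][0] == solved[i][0]:
            j += 1
        shared = [F1_POINTS[k] if k < len(F1_POINTS) else 0
                  for k in range(i, j)]
        for _, solver in solved[i:j]:
            points[solver] = sum(shared) / len(shared)
        i = j
    return points


# Hypothetical costs of four solvers on one instance (lower is better).
print(f1_points({"A": 100, "B": 120, "C": 120, "D": None}))
# {'A': 25.0, 'B': 16.5, 'C': 16.5, 'D': 0.0}
```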

Table 6 Results of the competition

Nowadays, most of the competition results are published in the papers. We provide more detailed results in Table 7. The column LB contains the lower bound provided by the winning solver (Mikkelsen and Holm, 2022). Note that the lower bound available from the paper was updated by the authors on their website (DSUM, 2023). We can see that five instances are now optimally solved (costs written in bold). The column Gap contains the gap concerning the best available result. For each publication, we provide its best available results (Now), the competition results (columns Comp.), and the results provided with the specified runtime (Time) to allow a comparison from the limited runtime point of view.

The computational setup, as well as the runtimes of solvers, differs a lot. Mikkelsen et al. (2022) used a farm of high-end server computers, each equipped with up to 756 GB of RAM, two CPUs (each having 16 cores), and a Gurobi MIP solver. For the last ten competition days and the runtime-limited setup, they run the parallelized setup on each instance once for the entire ten days, providing results after 24 h as well (24-hour results are available in the Time column). Their setup consisted of a solver running a full MIP model focusing on the bound using 16 threads and six fix-and-optimize processes, each using a different configuration and four threads. Two additional Gurobi MIP solvers were used to produce a pool of initial solutions, each using four threads. Rappos et al. (2022) solved instances on virtual machines with 32 GB RAM and 4 CPU cores. They provide the competition results only, and the average runtime among all instances was 83.3 h. Müller (2022) used a computer with a single CPU and 10 cores with 64 GB memory. However, each solver run was limited to a single CPU core, 4 GB of RAM, and 2 h of runtime. The average results from 10 independent runs are provided in column Time. Sylejmani et al. (2022) used 2 CPUs, each with 16 cores and 96 GB of RAM. The average results of 10 runs, each taking 24 h, are provided in column Time. Finally, Lemos et al. (2022) performed experiments on a computer with 128  GB of RAM using a single CPU core. They provide only the best result among 24 runs (8 different algorithmic configurations, three different encodings), each taking 100 min, which are available in column Time.

Table 7 The detailed computational results: Lower Bound from DSUM (2023); Mikkelsen and Holm (2022), Gap wrt. the best result, current results (columns Now), competition results (columns Comp.), results with specified runtime (columns Time), for Mikkelsen and Holm (2022) and Sylejmani et al. (2022) within 24 hours, for Müller (2022) within 2 hours, for Lemos et al. (2022) within 100 min; Rappos et al. (2022) provides competition results only

Mikkelsen and Holm (2022) improved their competition results for 21 instances. There are no competition results for Müller (2022) since he was not allowed to participate in the competition as its organizer. Sylejmani et al. (2022) were able to improve results for three instances only. Rappos et al. (2022) could not solve one instance (agh-fal17), and the results were not further improved. The most significant difference has been achieved by Lemos et al. (2022), who can now solve all instances compared to 18 instances for the competition. The progress is also clear based on the first (full) paper published at PATAT 2022 (Lemos et al., 2021) and the journal publication (Lemos et al., 2022), where they have concentrated on the iterative MaxSAT solver processing.

When comparing the current best results with the competition results, the competitors’ order differs by the better position of the MaxSAT-based solver. It now only slightly loses against the SA solver. The UniTime-based solver became the second-best solver. Regarding computational demands, the UniTime-based solver is superior, providing results within 2 h and using 4 GB of memory per solver run. All other solvers needed much longer runtimes and more memory.Footnote 11

We also provide additional graphs comparing runs with the specified runtime (see Fig. 3). The comparison must be taken carefully since we can see results from different setups. Still, it is a valuable comparison providing results for the limited computational resources. It is clear that the SA solver has significant limits in a shorter time frame; there are eight unsolved instances. We can also see that the results of the MaxSAT-based solver are sometimes relatively weak, especially for instances with a strong emphasis on student conflicts. For Rappos et al. (2022), the results for only five instances were computed within less than 24 h, even though some lower-quality results with shorter runtimes could have probably been provided for more instances. It is good to see that the winning matheuristic solver can provide reasonable results within 24 h. Its results are still better than those of the UniTime-based solver, now in 21 instances (for the best result, Mikkelsen et al. are better in 24 instances), keeping in mind that the matheuristic solver needs a longer runtime, more memory, and more CPU cores.

8 Conclusion

The International Timetabling Competition 2019 was aimed at solving common university course timetabling problems from practice. A wide set of features was considered. The key novelty lies in the combination of student sectioning, together with standard time and room assignment of events in courses. As a part of the competition, we were able to collect an interesting set of data which has enriched further research.

While the competition is long over, ITC 2019 competition problems are still very much alive. The number of people registered on the competition website increased from 200 to 490 since the competition ended in January 2020. Before the competition deadline, 6 teams submitted their solutions. Some results are now published by 20 teams on the competition website, and we believe that these numbers will increase much more given the continuing interest of the community. Also, 3 finalists, including the winning team, published better results than they had in the competition. 25 out of 30 benchmark instances wait for their optimal solutions.

Fig. 3 Comparison of the results with the specified runtime: Mik Time (Mikkelsen and Holm, 2022) within 24 h (one run), Mül Time (Müller, 2022) within 2 h (average run), Rap Comp. (Rappos et al., 2022) with an average competition time 83.3 h, Syl Time (Sylejmani et al., 2022) within 24 h (average run), and Lem Time (Lemos et al., 2022) within 100 min (the best run)

It is great to see the development of educational timetabling studied with the help of competitions supported by the PATAT conference. In 2002, the First International Timetabling Competition considered very basic timetabling problems generated by the computer. Over the years, new competitions came with more and more realistic timetabling problems, with 2011 introducing real-life high school timetabling problems. We are proud to allow for advancements in university course timetabling, providing a diverse set of real-world problems with complex characteristics.