1 Introduction

Today’s unpredictable business environment requires organizations to be flexible and adaptable, especially in the IT sector [34]. As a result, many agile methodologies, e.g., Scrum, emerged [9]. The potential benefits of agile methods regarding faster delivery and customer satisfaction led to their wide adoption [7]. While agile practices proved successful in contexts characterized by small, co-located teams [8], limited system criticality, new developments, and frequent releases [23], applying them in other contexts with little or no adaptation increases the risk of failure [17]. Still, the success on a small scale inspired many organizations to apply agile methodologies outside of their initially intended context [7, 8]. One example is Enterprise Resource Planning (ERP) rollout projects [16, 20, 35]. ERP systems are large-scale integrated systems covering most of a company’s business processes [14]. In contrast to classical development projects, off-the-shelf ERP solutions from providers are rolled out and adapted to customer needs [14]. Those projects are often large [16] and involve high risk and costs [14, 29].

One success factor in satisfying customers and delivering within plan and budget is accurate predictions like effort estimations [4, 15, 20]. However, estimating effort in an agile environment can be difficult [14]. Common techniques, like expert judgment and planning poker, are based on experts’ opinions [28] and, thus, error-prone due to, e.g., human bias [15, 30]. Also, constantly changing requirements complicate accurate estimation [27]. In scaled agile settings, estimating gets even more complex since, for example, coordination and dependencies of multiple teams [3, 33] and the distribution of teams (e.g., increased communication effort) [17, 32, 33] become relevant. While effort estimation is well-researched in classical software development and small-scale agile contexts, little research has investigated this topic in large-scale agile settings [33]. In particular, how effort estimation is conducted in this context, which challenges can arise, and how they could be addressed have barely been investigated. Furthermore, settings that do not develop a new product but roll out and customize an existing ERP solution have yet to be considered. Thus, we conducted a case study investigating how effort is estimated within a large agile ERP transformation program. We describe the estimation process, present the challenges faced, and make propositions to mitigate them. We defined the following research questions (RQs) to guide this study:

RQ1: How is effort estimation performed in a large-scale agile ERP program?

RQ2: What effort estimation challenges exist in a large-scale agile ERP program?

RQ3: How can the effort estimation challenges of such a program be addressed?

2 Background and Related Work

While ERP projects are classically organized with traditional approaches [14], the use of agile methods increases [16, 20, 35]. The differences between agile and traditional practices require new approaches to effort estimation [5]. Aligned to the incremental development in iterations, planning and effort estimation are done progressively and iteratively [5]. According to several studies investigating effort estimation in agile contexts [12, 28], expert judgment, analogies, and planning poker are widely used techniques. The most commonly used unit for estimating is story points [12, 27, 31]. Next to studies investigating how effort estimation is conducted in agile software development, authors have studied existing challenges [13, 21]. Those challenges include, for example, large project sizes and different understandings of requirements. In addition, Mallidi and Sharma [21] identified mitigation propositions to challenges in story point estimations, such as support by tools and Scrum Masters. Zia et al. [37] propose a framework to overcome challenges, and Tanveer et al. [30], who investigated multiple agile teams, focused on improving effort estimation, e.g., by tracking estimates.

Scaling agile practices adds complexity to effort estimation as, for instance, coordination and dependencies of multiple teams must be considered [3, 33]. Several researchers have studied planning and effort estimation in large-scale and global agile software development to gain more insights. Usman and Britto [32] compared effort estimation in co-located and distributed agile software development. In both cases, expert judgment and story points are used most frequently, and effort is estimated mainly on iteration and release planning levels. The authors identified the distribution of teams as a cost driver, e.g., due to the required communication effort. Evbota et al. [11] investigated planning in a large-scale agile organization and identified challenges related to doing long-term estimates, requirement size, unclear requirements, and team commitment. Moreover, inefficient estimation events and unclear requirements were issues. Usman et al. [33] investigated effort estimation in a large-scale agile project. The authors found that the estimation is carried out in a two-stage process using expert judgment, estimating requirements more granularly in the second stage. Furthermore, the authors identified that the distribution of development teams and the requirement size and scope influence the estimation accuracy. Bick et al. [3] studied coordination challenges and misaligned planning in a development unit combining agile and traditional approaches. The effort is estimated on two levels: rough estimates on the inter-team level and highly granular estimates on the team level, which are adjusted if required. Core issues were a lack of dependency awareness, non-transparent estimates, and misalignment between top-down and bottom-up estimations. The authors propose, e.g., to hold cross-team planning workshops or unit-wide retrospectives. Finally, Kula et al. [15] investigated factors affecting on-time delivery in a scaled agile setting and highlighted, e.g., task and technical dependencies.

As presented, effort estimation in large-scale agile contexts was the subject of several studies. While Usman and Britto [32] focus on the impact of having distributed teams, other authors investigated effort estimation as part of related topics like coordination [3], planning [11], or on-time delivery [15]. Only Usman et al. [33] place a primary focus on effort estimation in large agile environments. Still, the authors [33] primarily investigate factors influencing the estimation accuracy instead of considering challenges in the effort estimation process and how practitioners can address those. Overall, empirical research providing detailed insights into estimation in scaled agile settings remains scarce [33], particularly on the potential challenges and related mitigations. Also, effort estimation in the context of large agile ERP projects, which, in contrast to the settings investigated so far [3, 11, 15, 32, 33], do not develop or maintain a product but roll out and customize an existing solution, has yet to be investigated.

3 Methodology

We conducted a holistic single-case study [36] and analyzed the collected data to answer our RQs. We chose this research methodology as case studies are a means to explore a “contemporary phenomenon within its real-life context” [36], like effort estimation in a scaled agile ERP roll-out program. To ensure a rigorous research design, we followed the guidelines of Runeson and Höst [25]. The case selection was intentional and aimed to identify a case typical for a large-scale agile ERP roll-out. The case program applies agile methods at scale as multiple agile teams work together on customizing and developing new functionalities for the ERP solution [10]. We conducted 16 interviews with 20 experts with various roles from all companies collaborating in the program (see Table 1).

Table 1. Interview partners

We conducted the case study between November and December 2022, mainly via semi-structured interviews. At the beginning of each interview, we ensured a shared understanding of relevant concepts (e.g., large-scale agile software development). Two researchers participated in each interview to enhance observer triangulation [25]. All interviews followed the same outline. First, we asked questions regarding the interviewees’ experience and role within the program. Second, we asked questions exploring the program’s effort estimation process. Third, we asked about the challenges experienced related to effort estimation within the program. The questions of the last two sections were open, allowing the interviewees to go into detail. We conducted all interviews using videoconferencing tools, recorded, and transcribed them. Next to the interviews, we included documents like presentations to facilitate the triangulation of data sources.

The collected data was analyzed and coded following the guidelines of Miles et al. [22] and Saldaña [26], applying a two-cycle approach and a combination of deductive and inductive coding. We resolved conflicts through discussion and mutual consent to increase the results’ validity. In case of uncertainties regarding the data interpretation, we contacted the interviewees to resolve ambiguities.

4 Results

In the following, we present the results of our case study.

4.1 Context

The case study was conducted within a large transformation program at a German energy company (EnergyCo). After a merger, this program aims to standardize the ERP systems of the affected organizations by introducing a standard ERP cloud solution of a German software provider (SoftwareCo). Next to providing the ERP solution, SoftwareCo actively supports the program’s leadership and implementation. In addition, a consulting company (ConsultCo) supports the program leadership. In total, 350 people are involved. The program applies an approach designed by SoftwareCo, supporting customers during the Preparation, Realization, Deployment, and Roll-out of the ERP solution. We focused on the program’s Preparation and Realization Phase, in which agile practices are applied. The program’s overall transformation time scope is five years (E8), including roll-outs in several hundred sub-companies of EnergyCo, each planned for 15 months. The program performs multiple roll-outs in parallel (E8, E9).

The program consists of the Program Leadership, Scrum teams grouped into Workstreams, and Integration and Technology teams. The Program Leadership is responsible for the program organization and coordination, progress monitoring, and steering. Implementation and customization tasks are performed by 20 Scrum teams, grouped into ten different Workstreams, each responsible for a business area (e.g., Reporting). Each Workstream has a designated Product Manager, acting as the link to the Program Leadership, providing assistance, tracking progress, and managing dependencies to other Workstreams. Each Scrum team consists of seven to twelve members: a Product Owner (PO), defining and prioritizing the Backlog and accepting work done, a Solution Architect, two to five team members responsible for technical design and implementation, and two to five team members responsible for documentation, functional specifications, and testing. A Scrum Master is coaching and supporting the team. Multiple cross-functional Integration and Technology teams support the Scrum teams in topics like training, change, or identity management.

4.2 Effort Estimation Process

This section describes the case program’s effort estimation process (see Fig. 1) to answer our first RQ.

The Preparation Phase. This phase is performed at the end of a year to prepare for the Realization Phase in the upcoming year (E6, E9, E10, E16). Next to organizational preparations like staffing, the sub-companies whose transformation is planned for the next year gather Requirements, reflecting their custom needs for the ERP system based on the standard functionalities (E2, E7, E9, E10, E16, E18). The identified Requirements are then classified (E18) as part of the standard solution, as requiring customization, as to be developed (“gaps”), or as non-functional. Afterward, they are approved and prioritized. According to E13, this classification influences the effort required as “gaps [...] normally are related to real developments and not only to customizing or changing settings on the systems. So it’s more to do for these objects and more to test and also to document.” An initial backlog is created (E16), and its items are assigned to the Workstreams. This backlog forms the basis for the following year (E6, E9, E10, E16) and the initial effort estimations in the Roadmap Planning (E5, E9, E19).

Roadmap Planning. This event happens once for each rollout sub-company (E1, E4, E5, E10, E11, E19). The effort for the prioritized Requirements from the initial backlog is estimated in person days (E1, E4, E13, E20). The persons involved in the planning event discuss the Requirements and agree on a rough estimate (E1, E2, E4, E6–8, E13, E16) based on experience and gut feeling. To simplify the estimation process, only a few people participate (E14), usually including the Product Manager(s), PO(s), Solution Architect(s), and some IT members of the responsible Workstream(s). The resulting Product Backlog, with prioritized and estimated Requirements, is the basis for the Realization Phase.

Fig. 1. Estimation process at the case program

The Realization Phase. This phase lasts nine months, split into three so-called Waves (E8). A Wave consists of three regular Sprints, in which Requirements are implemented, and a fourth one to test them and plan the next Wave (E11, E13). The effort is estimated in the Wave Planning, Sprint Planning, and Product Backlog Refinement by the Scrum teams (E1–20). POs are not actively estimating in these events but prioritize Backlog items, moderate the estimation, and support from a business perspective (E5–7, E9, E11, E15, E18). Estimations are done in normalized story points. One story point equals one person day (eight hours), and a “full” team member implementing Requirements has a capacity of 13 story points per Sprint. All teams use expert judgment to estimate.
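The normalization described above can be illustrated with a short calculation. The following Python sketch is an illustration only and not part of the case program's tooling: the two constants are taken from the text, whereas the function names and the assumption that a part-time member's capacity scales linearly with availability are ours.

```python
# Constants taken from the text.
HOURS_PER_STORY_POINT = 8            # one story point = one person day = eight hours
FULL_MEMBER_POINTS_PER_SPRINT = 13   # capacity of a "full" team member per Sprint


def team_capacity(availabilities):
    """Return the Sprint capacity in story points.

    `availabilities` is a hypothetical list of fractions (1.0 = "full" member).
    Linear scaling for part-time members is an assumption of this sketch.
    """
    return sum(FULL_MEMBER_POINTS_PER_SPRINT * a for a in availabilities)


def points_to_hours(points):
    """Convert normalized story points to working hours."""
    return points * HOURS_PER_STORY_POINT


# Hypothetical team: two full members and one half-time member.
capacity = team_capacity([1.0, 1.0, 0.5])
print(capacity)                   # 32.5
print(points_to_hours(capacity))  # 260.0
```

Because story points are normalized to person days, such a conversion relates team estimates directly to budget figures.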

Wave Planning. The first Wave Planning takes place before the Realization Phase officially begins. The second and third Wave Planning occur in the last Sprint of the first and second Wave. The basis for the first Wave Planning is the roughly estimated and prioritized Requirements in the Product Backlog. Each Requirement is broken down into n Work Packages (E1–20), functional subsets a team can implement in one Wave (E4), and estimated by the team responsible for it. Each Requirement’s initial estimate is divided among the n Work Packages, based on their complexity, risk, and effort, by comparing them (E2, E5, E7). The teams adjust the Work Package estimates if the Requirement’s estimate is too high or too low (E7, E14). Then, the Work Packages are allocated to the three Waves and ranked based on their priority in the so-called Roadmap Plan. Each Work Package must have a brief description, an effort estimation, and a dependency overview. In the second and third Wave Planning, the Roadmap Plan is refined. During each Wave Planning, the Work Packages of the upcoming Wave are further broken down into Work Items, complete and working functional subsets that a team can implement in one Sprint (E2, E4, E11). This breakdown into n Work Items and their estimation is performed the same way as the breakdown of Requirements (E1, E5, E7). The Work Item estimates, adjusted if the Work Package’s effort was under- or overestimated, are the most accurate. The Work Items are then ranked based on priority and allocated to the three Sprints within the Wave, the so-called Wave Plan (E1, E2, E5, E7). In the following Sprint Plannings, the Wave Plan is refined. Depending on the Sprint allocation, each Work Item must have an initial effort estimation, dependency overview, and a brief or functional description. Each Wave Planning results in an updated Product Backlog, an updated Roadmap Plan, and a Wave Plan for the upcoming Wave. After each Wave Planning, each Scrum team’s PO presents their planned Wave to all other POs to discuss dependencies and potential changes.
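The top-down breakdown described above can be sketched as a proportional split. In the case program, teams divide the estimate by expert judgment and comparison; the numeric weights, the function name `split_estimate`, and the example values below are hypothetical and serve only to illustrate the arithmetic.

```python
def split_estimate(total_points, relative_weights):
    """Divide a Requirement's estimate among its Work Packages.

    `relative_weights` maps each Work Package to a hypothetical weight that
    stands in for the team's comparison of complexity, risk, and effort.
    """
    weight_sum = sum(relative_weights.values())
    if weight_sum <= 0:
        raise ValueError("relative weights must sum to a positive number")
    return {
        name: total_points * weight / weight_sum
        for name, weight in relative_weights.items()
    }


# Hypothetical Requirement estimated at 40 story points, split into three
# Work Packages, where WP1 is judged three times as demanding as the others.
packages = split_estimate(40, {"WP1": 3, "WP2": 1, "WP3": 1})
print(packages)  # {'WP1': 24.0, 'WP2': 8.0, 'WP3': 8.0}

# If the team finds a Work Package under- or overestimated, it adjusts the
# value; the sum of the adjusted values is the revised Requirement estimate.
packages["WP2"] = 12.0
print(sum(packages.values()))  # 44.0
```

The same split could be applied one level down, from a Work Package to its Work Items.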

Sprint Planning. During this event, at the beginning of each Sprint, the whole Scrum team discusses and (re-)plans the Sprint (E1–20). The Work Items defined during the Wave Planning serve as a starting point, are discussed in detail, and then refined (e.g., ranking, allocated Sprint, estimation, description). The result of each Sprint Planning is an updated Sprint Backlog and Wave Plan.

Product Backlog Refinement. At least one six-hour refinement meeting is held in each Sprint. The teams adjust the estimations, prioritization, and breakdown into Work Packages or Items. Further, the teams discuss and estimate new Requirements (E3, E4, E7, E9, E12, E15, E19). This event ensures that the teams react quickly to changes, that dependencies and wrong estimates are addressed, and that the Product Backlog is up to date (E3, E4, E7, E9, E12, E15, E19). Some teams hold multiple, shorter refinements (E2, E11) or use their daily meetings (E20).

Tool Support. The primary tool used in the context of effort estimation within the program is designed by SoftwareCo and used as “single source of truth” (E2, E3, E5, E7, E9, E13, E15), e.g., for estimate and backlog documentation. Based on this tool, ConsultCo built an Excel reporting dashboard to visualize metrics and program progress (E1, E8, E14, E17). Excel is also used to document Requirements, including their estimation (E5–7, E9, E11, E16, E19, E20).

Table 2. Effort estimation challenges

4.3 Effort Estimation Challenges

To answer our second RQ, we investigated the effort estimation challenges in the case program. We identified 14 challenges (C1–C14), each mentioned by at least three interviewees [6] (see Table 2). This threshold is intended to ensure that only critical challenges are reported.
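The inclusion threshold can be expressed as a simple count over coded interview statements. The sketch below is not the authors' analysis procedure; the data layout, the function name, and all example mentions are invented for illustration, and only the threshold of three interviewees is taken from the text.

```python
from collections import defaultdict

MIN_INTERVIEWEES = 3  # threshold stated in the text


def frequent_challenges(coded_mentions, min_interviewees=MIN_INTERVIEWEES):
    """Keep challenge codes mentioned by enough distinct interviewees.

    `coded_mentions` is a hypothetical list of (interviewee, challenge) pairs.
    Repeated mentions by the same interviewee are counted once.
    """
    interviewees_by_challenge = defaultdict(set)
    for interviewee, challenge in coded_mentions:
        interviewees_by_challenge[challenge].add(interviewee)
    return {
        challenge: len(people)
        for challenge, people in interviewees_by_challenge.items()
        if len(people) >= min_interviewees
    }


# Invented example data, not taken from the study.
mentions = [
    ("E1", "unclear requirements"),
    ("E2", "unclear requirements"),
    ("E3", "unclear requirements"),
    ("E1", "unclear requirements"),  # duplicate mention, counted once
    ("E4", "missing tool support"),
    ("E5", "missing tool support"),
]
print(frequent_challenges(mentions))  # {'unclear requirements': 3}
```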

We grouped the challenges into four categories: Program setting, Collaboration, Expertise, and Information deficit. Most challenges are related to the Program setting and Collaboration. The most frequently mentioned challenge is Unclear and incomplete requirement specifications (C13), which do not provide sufficient information about what has to be implemented and estimated and, thus, hinder accurate estimations of Work Packages and Items. E15 illustrates the challenge with an example: “We are just handed over some sentences, [...] as an example, we need to have that button in blue. But we do not know where does that button now need to be positioned? What functionality does that button have?”

4.4 Propositions to Mitigate Effort Estimation Challenges

To answer our third RQ, we reviewed academic literature and our interview data to identify mitigation propositions addressing the presented effort estimation challenges. In total, we found 19 propositions (M1–19), presented in Table 3.

Table 3. Mitigation propositions and addressed challenges

4.5 Evaluation of the Proposed Mitigations

Our evaluation aimed to assess to which degree practitioners agree with our propositions and collect qualitative insights into their opinions. Twelve experts who participated in our initial interviews evaluated our proposed mitigations in three semi-structured interviews and via a survey. We asked the participants to which degree they agreed with each proposition using a five-point Likert scale [19] and to provide qualitative feedback. We coded the qualitative data consistent with the coding in our case study. Figure 2 illustrates the evaluation results.
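A possible way to summarize such Likert responses per proposition is sketched below. The text does not state how the responses were aggregated, so the numeric coding (1 = strongly disagree to 5 = strongly agree), the choice of median and agreement share, and the example responses are assumptions made for illustration.

```python
from statistics import median


def summarize_likert(responses):
    """Summarize five-point Likert responses for one proposition.

    Assumes responses are coded 1 (strongly disagree) to 5 (strongly agree);
    values of 4 or 5 are treated as agreement. Both are sketch assumptions.
    """
    if not responses:
        raise ValueError("at least one response is required")
    agree = sum(1 for r in responses if r >= 4)
    return {"median": median(responses), "agree_share": agree / len(responses)}


# Invented responses for a single hypothetical proposition.
print(summarize_likert([5, 4, 4, 2, 5]))  # {'median': 4, 'agree_share': 0.8}
```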

Fig. 2. Evaluation results of the mitigation propositions

The experts agreed with most propositions (M1, M3–15, M19). M13 received the highest agreement, followed by M15. The evaluation results of M1 and M2 diverged. Despite most experts agreeing with Adding a buffer in case of uncertainty (M1), some disagreed, expressing concerns, e.g., regarding its lack of traceability. Also, the opinions regarding a Pre-implementation phase to check requirements in detail (M2) diverged. While most experts think such a phase is valuable, several disagree. One expert claims that “if you spend too much time over talking about design and not going forward, then you can also lose a lot of time.” M18, the use of T-shirt sizes as an estimation unit, received the lowest agreement. Some respondents see T-shirt sizes as useful, e.g., pointing out their potential to simplify the estimation process for team members and their usefulness in an early phase, i.e., for higher-level estimates. Other experts criticize, e.g., their imprecise nature and the additional work required to relate the estimates to budget resources.

5 Discussion

Next, we answer our RQs by discussing our study’s key findings and limitations.

5.1 Key Findings

To shed light on how scaled-agile ERP programs estimate effort (RQ1), we investigated the effort estimation process within our case program. The program estimates effort at three levels with increasing granularity and decreasing time scope. A selected group performs the initial, rough estimation of high-level requirements, building the basis for the roadmap and following estimations. The Scrum teams are responsible for the more accurate estimations of the more granular functional sub-sets of the requirements, with medium to short-term time scopes. Also, Usman et al. [33] and Bick et al. [3] report on estimation processes in scaled agile settings with two phases building on each other. Large requirements are estimated roughly by a selected group [3], and teams estimate highly granular requirements more accurately during Sprint Plannings [3, 33]. Our case program estimates low-level requirements iteratively and refines estimates regularly. Likewise, Bick et al. [3] found that the teams adjust estimates if required. Compared to classical ERP roll-outs [14], our case program estimates effort primarily in story points, typical for agile contexts [12, 21, 27]. Like in other small [27] and large agile settings [32, 33], expert judgment is used to estimate. Overall, as described by Bick et al. [3], our case program’s effort estimation approach combines long-term with agile aspects.

To answer RQ2, which effort estimation challenges exist in a large agile ERP program, we investigated the challenges our case program had to deal with concerning effort estimation. Our case program struggled with several challenges (C1, C2, C5, C6, C8, C9, C11–14), which were already reported as challenging or as factors that negatively influence estimation accuracy in other agile (e.g., [21, 27, 30, 31]) and large-scale agile settings [3, 11, 15, 32, 33]. In line with this finding, Sandeep et al. [27] confirm that effort estimation in agile settings is challenging, and Usman et al. [33] found that project scale complicates it further. The main challenge in our case program is unclear or incomplete requirement specifications (C13), hindering accurate estimation of the required effort, which multiple authors report [11, 27, 31, 33]. In addition, we identified challenges that, to the best of our knowledge, are barely or only partly reflected in existing studies of (scaled) agile settings (C3, C4, C7, C10). For example, we found a lack of knowledge and experience in estimating (C10) to be a challenge. Tanveer et al. [30], e.g., only report estimation experience being considered when estimating, Mallidi and Sharma [21] highlight its relevance, and Evbota et al. [11] mention that the practitioners in their scaled agile study lacked expertise in estimating.

To answer our third RQ, we reviewed related literature and our interview data to identify mitigation propositions to address the found effort estimation challenges. Many of our propositions perceived as valuable align with agile values and principles [2] as they foster communication and collaboration (e.g., M4), transparency (e.g., M14), and continuous improvement and learning (M6). In general, open and early communication [35] and continuous feedback [20] are success factors for agile ERP projects. Having exchange opportunities (e.g., M4 and M5) and sufficient tool support (M15) can increase time efficiency (C2), help identify dependencies (C8), counteract the restrictive program setting (C1), and, ultimately, increase estimation accuracy. These insights align with other researchers’ findings that tools can positively influence estimation accuracy [30] and that cross-level exchange is helpful regarding dependencies [3] in scaled agile settings. Our results show that, in particular, Scrum Masters and Agile Coaches can support the estimation process by assisting and motivating teams (M13). These findings reinforce Mallidi and Sharma [21], who highly recommend involving Scrum Masters, and Usman et al. [31], who claim that lacking the guidance of Scrum Masters can affect estimation accuracy negatively.

5.2 Limitations

We applied the assessment scheme of Runeson and Höst [25] to address the following potential validity threats. To mitigate threats to construct validity, we collected data from multiple sources (e.g., documents next to the interviews) to achieve data triangulation, gathered insights from interviewees with diverse roles and experiences, and clarified any ambiguity with the interviewees. Data triangulation also helped to address possible threats to internal validity. To mitigate threats to reliability, three researchers designed the interview questions to minimize reliance on any individual. Moreover, we made the data analysis and coding transparent by conducting them in line with established guidelines [22, 26] and describing them. Regarding external validity, our findings are specific to the case program. However, the comprehensive description of our case program and the estimation process allows for an understanding of how, e.g., the identified challenges relate to this context and may also be relevant in other programs.

6 Conclusion and Future Work

Our research was motivated by a need for more empirical studies on effort estimation in scaled agile contexts, highlighting existing challenges and approaches to mitigate those. In particular, settings that do not develop a new product but roll out and customize a standard ERP solution have yet to be considered. To make a first step towards filling this gap, we conducted a holistic single-case study at a large agile ERP transformation program to identify how effort is estimated in such programs. We presented identified challenges, proposed how to mitigate them, and evaluated those propositions. The case program combines long-term with agile aspects. The effort estimation takes place on different granularity levels, starting with high-level requirements on the roadmap level, broken down into functional subsets on a medium-time scope and Sprint level. A requirement’s initial estimate is the basis for the estimations on a more granular level. Adjustments are made in case of over- or underestimations. Overall, the estimation accuracy increases with each breakdown step. The main estimation unit is story points but normalized to person days. While the requirements are initially estimated by a selected group, Scrum teams do the subsequent estimations. We found multiple effort estimation challenges common for agile and large-scale agile settings. The most mentioned effort estimation challenge is unclear and incomplete requirement specifications (C13). Overall, the evaluation participants agreed with most of our mitigation propositions. In particular, support from Scrum Masters and Agile Coaches (M13) was seen as valuable in counteracting existing challenges. Future research could validate our findings and test their applicability in other large agile ERP projects, scaled agile projects developing new products, or other agile organizations. Additional evaluations with practitioners could confirm the identified challenges. The reported mitigation propositions could be applied in practice to confirm their effectiveness.