Keywords

1 Introduction

The importance of FLOSS is increasing day by day, not only for personal use but also for enterprises to incorporate it into parts of their products. With the increase in the number of FLOSS users, use cases have also been expanding. As a result, lots of bugs are being reported to FLOSS projects. In the field of software engineering, many methods have been proposed to help FLOSS developers predict faults in modules, localize and repair faults in source code, and so on. However, these methods are too diverse for FLOSS developers to follow and effectively adapt to their projects as needed. Although several studies [28] provided a systematic review to mitigate the difficulty in understanding existing methods, they are neither beyond grou** existing studies nor necessarily in line with FLOSS developers’ needs for support. Ideally speaking, we should try to thoroughly understand which bugs cause the most trouble for FLOSS developers and propose solutions to fix the bugs effectively and efficiently. In this study, we try to reveal the gap between the way that FLOSS developers and researchers perceive bug finding and fixing, through interviews with 322 notable GitHub developers. In the interviews, we do not directly ask the developers about what troubleshooting tools are highly in-demand, but instead we ask them about what kinds of bugs causes them severe troubles, in order to enable us to precisely understand the problems they face. After the interviews, we discuss what kinds of high impact bugs are able to be solved or not by existing studies, in order to provide researchers with insights to come up with new solutions. This work is separate from general bug categorizations, and to the best of our knowledge, our work is the first bug categorization focusing on only high impact bugs. In this paper, we address the following research questions;

  • RQ1: What kinds of high impact bugs are mainly considered high impact by FLOSS developers?

  • RQ2: What kinds of high impact bugs do FLOSS developers encounter most frequently?

  • RQ3: What kinds of high impact bugs should be studied newly by the SE community in order to support FLOSS developers?

Our contributions to the study are as follows;

  • We revealed symptoms and causes of bugs considered high impact by FLOSS developers, through interviews and manual inspections of bug reports.

  • For FLOSS developers, we identified which kinds of high impact bugs are supported by previous studies.

  • For researchers, we showed the area where FLOSS developers’ needs are still not fulfilled.

2 High Impact Bugs

Over the past two decades, the SE community have dedicated considerable efforts to help software developers to predict faults in modules, localize and repair faults in source code, and so on. Although the existing, traditional studies had not thoroughly considered the characteristics nor the impacts of bugs, in recent years they began to tackle with the impacts of bugs on users and the development process. In what follows, we introduce some existing studies on finding and fixing high impact bugs.

A Security bug [8] can cause serious problems which often impacts on uses of software products directly. Since Internet devices (e.g., smartphones) usage is increasing every year, security issues of software products are of interest to many people. In general, security bugs are fixed as soon as possible.

A Performance bug [22] is defined as “programming errors that causes significant performance degradation.” The performance degradation contains poor user experience, laggy application responsiveness, lower system throughput, and waste computational resources [18]. [22] showed that a performance bug needs more time to be fixed compared with a non-performance bug.

A Breakage bug [30] is a type of functional bug which is introduced into a product when the source code is modified to add new features or to fix existing bugs. Though it is well-known as regression, a breakage bug mainly focuses on regression in functionalities. A breakage bug causes problems which make available functions in one version unusable after releasing newer versions.

A Blocking bug [7] is a bug that prevents other bugs from being fixed. It is often caused due to a dependency relationship among software components. Since a blocking bug inhibit developers from fixing other dependent bugs, it can highly impact on developers’ task scheduling since a blocking bug takes more time to be fixed [7] (i.e. a developer would need more time to fix a blocking bug and other developers need to wait for being fixed to resolve the dependent bugs).

A Surprise bug [30] is a new concept of software bugs. It can disturb the workflow and/or task scheduling of developers since it appears at unexpected times (e.g., bugs detected in post-release) and locations (e.g., bugs found in files are rarely changed in pre-release). As a result of a case study of a proprietary, telephony system which has been developed for 30 years, [30] showed that the co-changed files and the amount of time between the latest pre-release date and the release date can be good indicators of predicting surprise bugs.

A Dormant bug [4] is also a new concept on software bugs and is defined as “a bug that was introduced in one version (e.g., Version 1.1) of a system, yet it is not reported until after the next immediate version (i.e., a bug is reported against Version 1.2 or later).” [4] showed that 33% of the reported bugs in Apache Software Foundation projects were dormant bugs and were fixed faster than non-dormant bugs.

3 Study Design

3.1 Overview

In this study, we e-mail and ask notable developers in GitHub [9] to answer a questionnaire about high impact bugs. After aggregating collected responses, we show the developers’ demographic information (Q1), and the distribution of the bugs that are considered high impact (Q2-1). As the questionnaire includes an open question (Q2-2) to tell us actual bug reports which caused troubles in the past, we manually inspect the bug reports and categorize them by symptoms.

3.2 Participant Selection

In order to select notable developers to invite to our interview in this study, we use contribution which represents the amount of the developer’s commit activity to GitHub repositories and can be calculated with GitHub API [10]. First, we make a list of all repositories in GitHub and calculate the total number of contributions for each repository. Note that we only calculate contributions for the most committed repositories if the repositories have the same name, since forked repositories partly (sometimes largely) include the same commits from original repositories and we need to avoid multiple counts for the same contributions by the same developer. Next, the total contributions of each developer is calculated based on the selected repositories above, and we choose candidates who mark over 100 contributions. Finally, we sent e-mails to 22,228 candidate developers to ask them to participate in our interview.

3.3 Questionnaire

We prepared Google Forms for our interview which consisted of three questions to know the developers demography (Q1), one closed question to reveal a distribution of high impact bugs considered important by developers (Q2-1), and one open question to collect and further analyze actual reports on high impact bugs (Q2-2). The questionnaire has six more questions, but in this paper, we only focus on the five questions above due to the space limitations.

[Q1. Profile]

  • Q1-1 Your main project

  • Q1-2 Your experience with FLOSS development

  • Q1-3 Your motivation to participate in FLOSS development

[Q2. High impact bugs]

  • Q2-1 What kind of bug would be much more important to be fixed?

    • a bug threatening systems’ security (Security bug)

    • a bug deteriorating system’s performance (Performance bug)

    • a bug blocking other bug fixes (Blocking Bug)

    • a bug found in unexpected timing and location (Surprise bug)

    • a bug introduced in older releases and found in a newer releases (Dormant bug)

    • a bug introduced in a newer release and breaking functions of older releases (Breakage bug)

    • others [free text]

  • Q2-2 Please tell us high impact bug(s) you encountered in the past.

3.4 Categorization of Bug Symptoms

Based on the responses of Q2-2, we collect actual bug reports from developers’ projects and confirm the symptoms of the bugs, in order to discuss what techniques have been already proposed or that will be required to find and fix those high impact bugs. The first and second authors independently and manually inspect symptoms of actual high impact bugs and classify them by the card sort technique [23]. After the independent classification, the two authors discuss each classification result together and merge the results by mutual consent to make one classification. Here, the inspectors include one Ph.D. student (first author), who worked at a software company for two years as a full-time developer, and one professor who has been studying FLOSS development over ten years.

4 Interview and Classification Results

4.1 Developer Demography (Q1)

As we described earlier, we invited 22,228 developers to join our interview. During the two weeks interview period, we got responses from 322 developers. Table 1 shows the product domains where the developers participated. We can see “web application” is the most popular domain (7%) but it does not stand out from the others. The product domains spread across a wide area. We can assume that the results of our interviews reflect a wide range of situations across FLOSS development. Table 2 shows the developers’ experience with FLOSS development. The majority of the developers have over five years experiences. It is no surprise because we only invite active developers who have made at least over 100 commits to GitHub repositories. Table 3 shows developers’ motivations to FLOSS development. 59% (126 + 64) of the developers contribute to FLOSS projects as part of work.

Table 1. Product domains where GitHub developers join (Q1-1)
Table 2. Experience with FLOSS development (Q1-2)
Table 3. Motivation to participate in FLOSS development (Q1-3)
Table 4. A distribution of high impact bugs in Q2-1

4.2 RQ1: What Kinds of High Impact Bugs Are Mainly Considered High Impact by FLOSS Developers?

In Q2-1, we asked the developers to select one of the six kinds of high impact bugs which are introduced in the previous section and have been well studied in the SE community. Table 4 shows the responses from the developers. We can see the FLOSS developers from GitHub attach greater importance on security bugs (53%) and breakage bugs (22%). It was unexpected for us that the other four bugs attract less attention from the FLOSS developers. It partly indicates the perception of gaps between researchers and FLOSS developers. Some researchers in the SE community might misunderstand FLOSS developers’ actual needs.

Researchers in the SE community have been studying to help developers find and fix bugs especially in terms of impacts on users’ satisfaction and during the development process (release) [13]. Although high impact bugs have been studied individually so far, it was not clear if FLOSS developers are mostly concerned with particular high impact bugs. From the result of Q2-1, we now answer RQ1 as follows.

figure a
Table 5. Symptoms described in actual bug reports

4.3 RQ2: What Kinds of High Impact Bugs Do FLOSS Developers Encounter Most Frequently?

In Q2-2, we asked the developers to describe the high impact bugs they have encountered in the past. Many of the developers described characteristics of high impact bugs in the free text format and also gave us direct links to actual bug reports which present symptoms discussed among developers and users.

Table 5 shows symptoms of bugs considered high impact by the respondents. In the table, we count multiple times if a developer described several high impact bugs. The percentage in the table is the ratio of developers’ answers in each category, but the total percentage does not become 100% due to the above reason. As we described earlier, we manually inspected and categorized the information on high impact bugs by symptom. In what follows, we summarize the classification result.

We had 249 valid answers from 192 developers about symptoms of high impact bugs which actually get FLOSS developers in trouble in the past. In Table 5, the most common symptom was “unexpected processing” responded by 17% of developers (42 cases). As regards “unexpected processing”, we could confirm less in common with bug reports. They ranged from different calculation results to unexpected rendering. The next most frequent was “sudden stop” responded by 16% of developers (39 cases). The corresponding bug reports shown by the developers suggested us that it often happened due to null pointer exception, run-time error exception, segmentation error, and overflow. Although the above two are related to “Behavior” of a program, the third and fourth most common symptoms were “Security” concerns such as “vulnerability” and “unauthorized access.” About “vulnerability,” the corresponding bug reports suggested the developers especially concerned with XSS and SQL injection attacks. The OpenSSL problem (i.e., Heartbleed) and the hidden way of leaking user data were included in bug reports about “unauthorized access.”

In RQ1, 53% of developers think that security is the biggest concern among high impact bugs in the previous studies. However, the result in RQ2 shows that the developers come across high impact bugs about Behaviour more often than the one about security. In fact, one developer said, “Since the mentioned project is (mostly) a client-side javascript library, security problems aren’t common.” Based on the results here, we answer RQ2 as follows.

figure b

4.4 RQ3: What Kinds of High Impact Bugs Should Be Studied Newly by the SE Community in Order to Support FLOSS Developers?

The percentages in Table 5 are indicated by boldface if the corresponding symptoms account for about 80% of all the symptoms (i.e., the developers frequently encounter the symptoms with boldface.). For the majority of the symptoms, we surveyed existing studies which have tried to find and/or resolve the symptoms and showed references as “Ref” in Table 5.

The percentages of the symptoms in “Behavior” category are relatively high and these have been well-studied by the SE community [1, 2, 5, 25,26,27, 30,31,32,33,34, 36]. For instance, “never start function” is well studied breakage bugs [30] so-called regressions which disable existing functions due to additional changes to software products. Although in the paper we did not introduce this as a high impact bug, “unexpected processing” is well studied as a functionality bug or feature bug [32]. “disable start” and “sudden stop” are also studied as build bugs [36] and crash bugs [2] respectively.

As we confirmed “vulnerability” and “unauthorized access” achieved relatively high attention from the developers, security bugs are also considered high impact by researchers and have been well studied [6, 12, 16, 17, 19, 29, 35]. “Lower performance” in “Effect” category is also well studied [20,21,22, 35] as performance bugs. However, to the best of our knowledge, there is no study on “data loss” in “Data” category which is of relatively high concern to FLOSS developers (5%). For instance, a bug on “data loss” in “Data” category is observed when deleting data related to the operation under a condition. Other data loss bugs occurred due to executing the wrong processing of multi-transaction or by using variables not multi-threaded (e.g., HashMap in Java). In fact, for instance, loss of users’ data such as their pictures was recently reported in the update of Windows 10 [3]. We regard it is one of the perception gaps between FLOSS developers and SE researchers and should be should be studied, allowing us to address the issue. Based on the results here, we answer RQ3 as follows.

figure c

5 Discussions

5.1 Is the Current Support for High Impact Bugs Enough? How Can We Help FLOSS Developers Fix Bugs?

Many studies focus on symptoms of bugs and observe their characteristics and impacts [4, 7, 22, 30]. We classified bugs based on the symptoms of high impact bugs included in the answers from the interviews. However, it is not enough to support fixing bugs because we can not fix them only by knowing the symptoms of high impact bugs. In this section, similar to the classification of the symptoms, we classify causes of high impact bugs obtained in the answers from the interviews. With the classification of the causes, we discuss how we can support fixing the high impact bugs in each category.

Table 6. Root causes described in actual bug reports

Table 6 shows the causes of bugs considered high impact by the respondents. Here, we had 182 valid answers including the root causes of high impact bugs (from 142 developers ). In Table 6, the most common root cause was “insufficient processing” reported by 35% developers (49 cases). Furthermore, we broke down the 49 cases of “insufficient processing” and found that they consisted of 13 cases of “insufficient checks for inputs by users”, nine cases of “insufficient checks against malicious inputs”, five cases of “insufficient null checks such as arguments and return values”, four cases of “insufficient authority checks “(e.g., database), and “others” (16 cases). The second and third most common root causes were “problem in 3rd party” and “change in 3rd party” respectively which relate to 3rd party libraries and systems.

Figure 1 depicts the relationship between symptoms and causes. The line width shows how symptoms and causes have strong relationships. For ease of reading graph, we only show the relationships that appeared more than two-times (which account for 52% of all the relationship). For further information, we provide the data and the figures showing all the relationships on our online appendix [24]. Looking into the causes of “unexpected processing”, the most frequent causes are “insufficient processing” (11 cases), and “problem in 3rd party” (5 cases), and “usage” (4 cases) are following. Furthermore, we investigated the detail of the bugs and found that nine of 11 cases were happened via regular usages and two cases were occurred by abnormal usage. These bugs should be found by unit test, or developers can utilize many studies (e.g, test case generation) to find them. “Sudden stop” is also caused by “insufficient processing” (11 cases). Eight of the 11 cases are caused by insufficient checks for inputs by users or by insufficient null checks, which are able to be applied by binary analysis [25].

The causes of both “disable start” and “never start function” are related on 3rd parties. The main causes of “disable start” are “changes in 3rd party”. We found two interesting answers indicating difficulty in finding the bugs are brought by “changes in 3rd party”. One developer said, “It was just to update dependencies but finding bug was hard”. Moreover, another developer said, “Because there are too many packages in the repository, nobody can keep an eye on all the packages”. Even though developers roughly know that the causes of the bugs are caused by changes in 3rd parties, it is difficult to specify which changes by third party products created the bug. Although Ma [15] et al. investigated common practice to fix cross-project correlated bugs in the GitHub scientific Python ecosystem, there are no approaches to specify the root causes of cross-project bugs.

“Vulnerability” and “unauthorized access” are mostly caused by “insufficient processing” (6 cases each). Among the subcategories of “insufficient processing”, the most common causes are derived from insufficient checks against malicious inputs (3 cases each). Fuzzing techniques [11, 14] are able to be used to find the insufficient checks.

Fig. 1.
figure 1

Relationships between symptoms (left axis) and causes (right axis)

The cause of “data loss” is “insufficient processing”, which can be broken down into three different cases: “insufficient checks for input by users”, “insufficient checks against malicious inputs”, and “others”. Because of no similarities, we could not find any approaches to address “data loss” bugs, therefore it requires further work in the future.

5.2 Threats to Validity

Internal validity: The categorization of Table 5 and 6 may not be perfect. We have a good deal of knowledge about software, but we also recognize the limitations of our knowledge about specific domains. We also might bias in creating the category. External validity: Although the developer demography consists of developers working in a wide range of product domains, a judgment if a bug is high impact or not would depend on a product domain. Construct validity: To avoid bias in the developers responses, we asked them about high impact bugs without providing rigorous definitions of high impact bugs. Attitudes towards high impact bugs might be different among the developers.

6 Conclusion and Future Work

In this study, we interviewed 322 notable GitHub developers to reveal the perception gaps between FLOSS developers and researchers on bug finding and fixing. We manually inspected and classified actual bug reports which were presented and considered high impact by the developers. As a result, we concluded that security and breakage bugs are highly important for FLOSS developers. We also identified “data loss” bugs should be studied newly by the SE community to support FLOSS developers. Based on the bug report data presented by the developers in this study, we plan to investigate actual impacts of bugs on the size of source code change, resolution time, and so forth in the future.