1 Introduction

Many public organizations routinely store large volumes of data. The storage and analysis of this data should benefit society, as it can enable organizations to improve their decisions. Members of the public often assume that the authorities are well equipped to handle data, but, as Thompson et al. [41] illustrate, this is not always the case. Thompson et al. explain that these issues often do not arise from existing business rules or the technology itself, but from a lack of sound data governance. The objective of this article is to derive principles of data governance that support the development of effective data governance strategies and approaches.

Many academic sources follow the information governance definition of Weill and Ross [46] and define data governance as specifying the framework for decision rights and accountabilities to encourage desirable behavior in the use of data [18, 28, 49]. Practitioners such as the Data Management Association (DAMA) tend to disagree with this generalization, believing that data governance is more than only the specification of a framework, but can also be practiced. According to Otto [25], important formal goals of data governance for public organizations are: 1. to enable better decision making, 2. to ensure compliance, 3. to increase business efficiency and effectiveness, 4. to support business integration.

Data governance provides both direct and indirect benefits [20]. Direct benefits of data governance for business processes can be linked to efficiency improvements [13, 15, 20, 35], an increase in revenue and market share [3, 4, 7], reduced risk [25, 28, 49] and a reduction in costs incurred [22, 26, 27, 29]. Reductions in risk can be found in reducing privacy violations [39, 41, 42], increasing data security [18, 29, 41], and reducing the risk of civil and regulatory liability [26, 39]. Indirect benefits of data governance can be found in improving the perception of how information initiatives perform [13, 20, 43], improving the acceptance of spending on information management projects [29, 39, 41], and improving trust in information products [27, 28, 49].

Although scant attention has been paid to this topic by the scientific community, there have been several calls within the scientific community for more systematic research into data governance and its impact on the information capabilities of organizations [25, 28, 49]. Little evidence has been produced so far indicating what actually has to be organized by data governance and what data governance processes may entail [25]. Most research into data governance till now has focused on structuring or organizing data governance. Evidence is scant as to which data governance processes should be implemented, what data governance should be coordinating or how data governance could be coordinated [49]. By means of a systematic review of literature, the principles of data governance we present here attempt to fill this gap. This article is in line with Wende’s [49] call for further analysis of the guidelines and policy aspect of data governance.

2 Research Methods

According to Webster and Watson [45], a methodological review of past literature is important for any academic research, and they criticize the Information Systems (IS) field for having very few theories and outlets for quality literature review. A lack of proper literature reviews can hinder, and has hindered, theoretical and conceptual progress in IS research [21, 45]. This article follows the method proposed by Webster & Watson and Levy & Ellis and attempts to methodologically analyze and synthesize the literature. In doing so, it aims to provide a firm foundation for data governance and to advance the knowledge base by providing a number of principles for data governance. These principles can be used by researchers to focus on important data governance issues, and by practitioners to develop an effective data governance strategy and approach. There is only limited research on data governance [25, 49], and an elaborate analysis of the interaction of roles and responsibilities, and of the principles of data governance, is missing. For our research, we therefore also incorporate data governance sources from practitioners (e.g., [9, 13, 20, 35, 37, 38, 43]).

In November 2015, the keywords “data governance” and “principles” returned 17 hits within the databases Scopus, Web of Science, IEEE Xplore, and JSTOR. Of these hits, 8 were journal articles, 6 were conference papers, 2 were books and 1 was an article in press. Of these articles, only 1 article, [41], was directly related to e-governance. The query [all abstract: “data governance” “principles”] searching between 2000 and 2015 returned 1710 hits in Google Scholar. We found that many of these articles covered data governance in general, but few included an explicit list of principles for data governance. We then filtered these results and performed a forward and backward search to select relevant articles based on the criterion that they included a theoretical discussion on what data governance is or does. Based on this forward and backward search, 35 journal articles, conference proceedings and books were selected and relevant principles from these sources were listed. Practical sources were only used when the authors provided factual evidence for their assertions.

As the review is concept-centric, the sources were grouped according to concept, as proposed by Webster and Watson [45]. Webster and Watson recommend the compilation of a concept matrix as each article is read (Table 1). The next step recommended by Webster and Watson is to develop a logical approach to grouping and presenting the key concepts that have been uncovered (Table 2) and to synthesize the literature by discussing each identified concept.
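The compilation of a concept matrix can be illustrated with a minimal sketch. It assumes that reading notes are kept as a mapping from each source to the concepts found in it, and it inverts that mapping into a concept-centric view. The source labels (“Source A” and so on), the constant READING_NOTES and the function names are hypothetical placeholders introduced for illustration only; they are not the sources or the tooling used in this review.

```python
from collections import defaultdict

# Hypothetical reading notes: each placeholder source label maps to the
# concepts noted while reading it. These are NOT the sources reviewed here.
READING_NOTES = {
    "Source A": ["Organization", "Compliance"],
    "Source B": ["Alignment"],
    "Source C": ["Organization", "Common Understanding"],
}


def build_concept_matrix(notes):
    """Invert source -> concepts into concept -> sorted list of sources."""
    matrix = defaultdict(list)
    for source, concepts in notes.items():
        for concept in concepts:
            matrix[concept].append(source)
    # Sort concepts and sources so the matrix is stable across runs.
    return {concept: sorted(sources) for concept, sources in sorted(matrix.items())}


def render_matrix(matrix, sources):
    """Render one row per concept, marking with 'x' the sources that cover it."""
    lines = []
    for concept, covered in matrix.items():
        marks = " ".join("x" if source in covered else "." for source in sources)
        lines.append(f"{concept:<22}{marks}")
    return "\n".join(lines)


if __name__ == "__main__":
    concept_matrix = build_concept_matrix(READING_NOTES)
    print(render_matrix(concept_matrix, sorted(READING_NOTES)))
```

Organizing the notes by concept rather than by author is what makes the review concept-centric: each row shows at a glance which sources support a given concept.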

Table 1. Long list of data governance key concepts

Following the recommendations of Bharosa and Janssen for principle generation, the long list of concepts seen in Table 1 was reduced to a short list as seen below in Table 2. The articles were categorized based on the types of variables examined, a scheme that helps to define the topic area. Principles constrain the design, which ultimately seeks to attain the required business goals. By focusing on the formal goals of data governance which contribute to e-governance (enable better decision making, ensure compliance, increase business efficiency and effectiveness, and support business integration), which we identified as independent variables, we were able to identify the dependent variables (long list of concepts, Table 1) contributing to these goals and grouped them according to intervening variables (short list of principles, Table 2), which appear in more complex causal relationships. Intervening variables come between the independent and dependent variables and show the link or mechanism between them. Four concepts related to the goals of data governance were identified in the literature (Table 2). At this stage in our research no unit of analysis is included in the matrix, as the unit of analysis currently used is the organization. Future research can focus on identifying which principles are applicable to the varying units of analysis (organizational, group, or individual).

Table 2. Concept matrix showing the concepts in relation to the authors

3 Foundation and Boundaries

Principles are particularly useful when it comes to solving ill-structured or “complex” problems, which cannot be formulated in explicit and quantitative terms, and which cannot be solved by known and feasible computational techniques [34]. Principles are a set of statements that describe the basic doctrines of data governance [9]. This paper follows the definition of Bharosa and Janssen who define principles as “normative, reusable and directive guidelines, formulated towards taking action by the information system architects” p. 472. In their Architecture Framework (TOGAF), the Open Group [40] lists five criteria that distinguish a good set of architecture principles: understandable, robust, complete, consistent and stable. Van Bommel et al. [4] believe that the underlying tenets should be quickly understood by individuals throughout the organization and according to Khatri and Brown [18], principles should be supported by a rationale and a set of implications. A robust principle should enable good quality decisions to be made, and enforceable policies and standards to be created.

There is much confusion about what ‘data’ really is. Data is a set of characters, which have no meaning unless seen in the context of usage. The context and the usage provide a meaning to the data that constitute information [1]. Most scientific sources use the terms “information” and “data” interchangeably. This generalization has led academic sources to follow the information governance definition of Weill and Ross [46] and define data governance as specifying the framework for decision rights and accountabilities to encourage desirable behavior in the use of data [18, 28, 49]. Practitioners tend to disagree with this generalization because, whilst the scope of data governance may include information as well as data, the two are different. The term “data” is often distinguished from “information” by referring to data as simple facts and to information as data put in a context or data that has been processed [16, 32]. Also, many practitioners prefer to define data governance as a business function. For example, Forrester Research defines data governance as being “a strategic business program that determines and prioritizes the financial benefit data brings to organizations as well as mitigates the business risk of poor data practices and quality” [51, p. 1]. DMBOK [17, p. 37] defines data governance as “The exercise of authority, control, and shared decision making (planning, monitoring and enforcement) over the management of data assets”. As such, in the eyes of the practitioner, data governance is more than only the specification of a framework, but can also be practiced. Data governance ensures that data and information are managed appropriately. Theoretically, data governance describes the processes and defines responsibilities. Data managers then work within this framework.

4 Principles of Data Governance

Four principles were identified from the basis of the literature review. These principles are presented individually in detail in the following sections.

4.1 Organization

Most researchers agree that data governance has an organizational dimension [18, 26, 49]. For example, Wende and Otto [49] believe that data governance specifies the framework for decision rights and accountabilities to encourage desirable behavior in the use of data. The first organizational dimension of Otto (2013) relates to an organization’s goals. Formal goals measure an organization’s performance and relate to maintaining or raising the value of a company’s data assets [26]. Functional goals refer to the tasks an organization has to fulfil and are represented by the decision rights defined, such as the definition of data quality metrics, the specification of metadata, or the design of a data architecture and a data lifecycle [44]. Otto’s second organizational dimension is the organizational form, such as the structure in which responsibilities are specified and assigned, and the process organization. Issues are addressed within corporate structures [49]. The data governance model comprises roles, decision areas, main activities, and responsibilities [49]. However, the organization of data governance should not be seen as a “one size fits all” approach [49]. Decision-making bodies need to be identified for each organization, and data governance must be institutionalized through a formal organizational structure that fits with a specific organization [22]. Decision rights indicate who arbitrates and who makes those decisions [9]. According to Dawes [8], “stewardship” focuses on assuring accuracy, validity, security, management, and preservation of information holdings. Otto’s [26] third organizational dimension consists of a transformation process on the one hand and organizational change measures on the other. Malik [22] indicates the need to establish clear communications and patterns that would aid in handling policies for quick resolution of issues, and Thompson et al. [41] show that coordination of decision making in data governance structures may be seen as a hierarchical arrangement in which superiors delegate and communicate their wishes to their subordinates, who in turn delegate their control.
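A data governance model built from roles, decision areas and responsibilities can be sketched as a simple assignment table. The sketch below is illustrative only and rests on several assumptions: the role names, the two responsibility levels (“accountable” and “responsible”), and the rule that each decision area should have exactly one accountable role are hypothetical choices made for this example and are not prescribed by the cited sources. The decision areas are taken from the examples of functional goals mentioned above.

```python
from dataclasses import dataclass

# Assumed responsibility levels (hypothetical, for illustration only).
ACCOUNTABLE = "accountable"
RESPONSIBLE = "responsible"


@dataclass(frozen=True)
class Assignment:
    """Links one role to one decision area with a responsibility level."""
    role: str
    decision_area: str
    responsibility: str


# Decision areas taken from the functional goals discussed in the text.
DECISION_AREAS = [
    "data quality metrics",
    "metadata specification",
    "data architecture design",
]

# Hypothetical configuration for a single organization.
ASSIGNMENTS = [
    Assignment("data owner", "data quality metrics", ACCOUNTABLE),
    Assignment("data steward", "data quality metrics", RESPONSIBLE),
    Assignment("data steward", "metadata specification", ACCOUNTABLE),
    Assignment("data architect", "data architecture design", RESPONSIBLE),
]


def areas_without_single_owner(decision_areas, assignments):
    """Return the decision areas that do not have exactly one accountable role."""
    problems = []
    for area in decision_areas:
        owners = {
            a.role
            for a in assignments
            if a.decision_area == area and a.responsibility == ACCOUNTABLE
        }
        if len(owners) != 1:
            problems.append(area)
    return problems


if __name__ == "__main__":
    # In this hypothetical configuration nobody is accountable for the
    # data architecture design, so that area is reported.
    print(areas_without_single_owner(DECISION_AREAS, ASSIGNMENTS))
```

Because the configuration is plain data, each organization can define its own roles and assignments, which is in keeping with the view that data governance is not a “one size fits all” approach.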

4.2 Alignment

Data governance should ensure that data meets the needs of the business [29]. A data governance program must be able to demonstrate business value, or it may not get the executive sponsorship and funding it needs to move forward [35]. Describing the business uses of data establishes the extent to which specific policies are appropriate for data management. According to Panian [29], if used correctly, data can be a reusable asset, as data is a virtual representation of an organization’s activities and transactions and of its outcomes and results. Data governance should ensure that data is “useful” [8]. According to Dawes, information should be helpful to its intended users, or should support the usefulness of other disseminated information. While government organizations may want to achieve the goals of data governance in theory, they often have difficulty justifying the effort unless it has a practical, concrete impact on the business [29]. Data governance also provides the framework for addressing complex issues such as improving data quality or developing a single view of the customer at an enterprise level [29]. Wende and Otto [49] believe that a data quality strategy is therefore required to ensure that data management activities are in line with the overall business strategy. The strategy should include the strategic objectives which are pursued by data quality management and how it is aligned with the company’s strategic business goals and overall functional scope. Data quality is considered by many researchers to be an important metric for the performance of data governance [18, 27, 49].

4.3 Compliance

Data governance includes a clearly defined authority to create and enforce data policies and procedures [50]. Panian [29] states that establishing and enforcing policies and processes around the management of data is the foundation of an effective data governance practice. By delineating the business uses of data, data principles establish the extent to which data is an enterprise-wide asset, and thus what specific policies are appropriate [18]. According to Malik [22], determination of policies for governance is typically done in a collaborative manner, with IT and business teams coming together to agree on a framework of policies which are applicable across the whole organization. Tallon [39] regards data governance practices as having a social and, in some cases, legal responsibility to safeguard personal data through processes such as “privacy by design”, whilst Trope and Power [30] suggest that risks and threats to data and privacy require diligent attention from organizations to prevent “bad things happening to good companies and good personnel” [30, p. 471]. Mechanisms need to be established to ensure organizations are held accountable for these obligations through a combination of incentives and penalties [1] as, according to Felici et al. [11], governance is the process by which accountability is implemented. In such a manner, accountability can unlock further potential by addressing relevant problems of data stewardship and data protection in emerging data ecosystems.

4.4 Common Understanding

According to Smith [36], governing data appropriately is only possible if it is properly understood what the data to be managed means, and why it is important to the organization. Data understanding is essential to any application development, data warehousing or services-oriented-architecture effort. Misunderstood data or incomplete data requirements can affect the successful outcome of any IT project [36]. Smith believes that the best way to avoid problems created by misunderstanding the data is to create an enterprise data model (EDM), and that creating and developing an EDM should be one of the basic activities of data governance. Attention to business areas and enterprise entities should be the responsibility of the appropriate data stewards, who will have the entity-level knowledge necessary for development of the entities under their stewardship [36]. To ensure that the data is interpretable, metadata should be standardized to provide the ability to effectively use and track information [18]. This is because the way an organization conducts business, and its data, changes as the environment for a business changes. As such, Khatri and Brown [18] believe that there is a need to manage changes in metadata as well. Data governance principles should therefore reflect and preserve the value to society from the sharing and analysis of anonymized datasets as a collective resource [1].

5 Discussion

Data governance is a topic that is attracting growing attention, both within the practitioners’ community and among Information Systems researchers, due to the growth in the amount of data. But data governance is a complex undertaking, and data governance projects in government organizations have often failed in the past. There is no single, “one size fits all” approach to the organization of data governance. Decision-making bodies need to be identified for each individual organization, and data governance should have a formal organizational structure that fits with a specific organization [22]. An organization outlines its individual data governance configuration by defining roles, decision areas and responsibilities, and specialized people need to be hired, trained, nurtured, and integrated into the organization. Researchers have proposed initial frameworks for data governance [18, 27] and have analyzed influencing factors [44] as well as the morphology of data governance [25]. A number of data governance principles have emerged out of this research. These principles are depicted in Fig. 1 below. From the long list of principles, four principles of data governance for public organizations were distilled. These principles are: 1. Organization, 2. Alignment, 3. Compliance Monitoring and Enforcement, 4. Common Understanding. Data governance should ensure that data is aligned with the needs of the business. This includes aligning the quality of the data with the quality required by the business. Data quality is often related to “fitness for use” and data governance demands binding guidelines and rules for data quality management [27].

Fig. 1. Long list of key concepts and principles of data governance

Governing data also includes ensuring compliance with the strategic, tactical and operational policies which the data management organization needs to follow. While the use of data has significant potential, many policy-related issues must be addressed before its full value can be realized. These include the need for widely agreed-on data stewardship principles and effective data management approaches [15]. Public organizations need to be able to create and share information in a way that is specifically customized for that organization to ensure a common understanding of the data.

6 Conclusions

Data governance is a complex undertaking and many data governance initiatives in public organizations have failed in the past. Principles of data governance include organization of data management, ensuring alignment with business needs, ensuring compliance, and ensuring a common understanding of data. However, the organization of data governance should not be seen as a “one size fits all” approach and data governance must be institutionalized through a formal organizational structure that fits with a specific organization. Data governance should also ensure that data is aligned with the needs of the business. This includes ensuring that data meets the necessary quality requirements. Ensuring alignment can take the form of defining, monitoring and enforcing data policies (internal and external) throughout the organization. Establishing and enforcing policies regarding the management of data is important for an effective data governance practice. But governing data appropriately is only possible if it is properly understood what the data to be managed means, and why it is important to the organization.