1 Introduction

In the information era, data has become one of the most valuable assets (Perrons & Jensen, 2015). The extensive construction of smart cities further promotes the use of new data, technologies and analytics to support efficient urban management. Data-driven urban management can optimize the allocation of urban resources and detect unrecognized needs, enhancing the welfare of citizens while meeting the needs of different agents in a timely manner. As the operation of mega-cities becomes increasingly complex, data provides new tools for improving urban social governance. Through data-oriented analysis, modeling and decision-making technologies, as well as data-based management, service and policy formulation models, government agencies have gradually improved their governance capabilities.

With the further development of information and communications technology, big data has brought new opportunities and attracted much attention from enterprises, raising concern for data governance (Zhang, 2020). Currently, the main problems in data management are rooted in confused or absent data governance. Data governance has been increasingly stressed because of its importance and advantages in managing data usage in organizations both internally and externally (Hagmann, 2013; Khatri & Brown, 2010; Weber et al., 2009). The concept has gained considerable attention in industries where data is the core business, such as banking, insurance, and telecom. Data governance is a critical issue for every organization that relies on data to drive business value (Al-Badi et al., 2018), and in the context of urbanization in the digital age, improving the ability to solve practical problems in city management and development through data governance is both promising and meaningful.

Pioneering research and practice have been devoted to data governance and its support for urban governance. In technical research, scholars have explored collaborative approaches to data governance based on blockchain technology, which are used to solve problems such as inconsistent data standards and data security (Liu et al., 2020; Lu et al., 2022; Song et al., 2018). Other studies explore the design of data governance frameworks, which are used to ensure the timeliness and reliability of data usage and to enhance the efficiency of data utilization (Abraham et al., 2019; Al-Badi et al., 2018; Q. Zhang et al., 2022). In practical applications, data governance techniques have been applied to multiple city-related fields, including epidemic prevention and control (Li et al., 2022) and public security (Wu et al., 2016), and a "data-centric" smart city technology system has been proposed within the smart city architecture (Wang et al., 2014). Its basic concept is to design the smart city as a large information system, in which the data governance function is realized through a series of data processing procedures from data collection to application.

Xiong et al. (2010) suggested that the volume and complexity of current data sets are growing dramatically, leading to many new challenges and problems. The study proposed a new paradigm of data vitalization, in which data are organized into a set of vitalization units that not only store data but also maintain its logical organization and physical storage structure, with the abilities to sense, learn, memorize, and communicate. This highly flexible paradigm can facilitate large-scale dataset analysis by adapting the system to complex analytical applications. Silva et al. (2018) pointed out that, due to the exponential growth of data volume brought by smart city development, data processing and analysis have become cumbersome and challenging. Their paper designs a three-tier smart city architecture that integrates data aggregation, data manipulation and service management tasks. The data management component is considered the brain of the architecture, as it performs filtering, analysis, processing and storage of valuable data. Wu et al. (2020) proposed that the decentralization, tamper resistance and anonymity of blockchain technology provide a new trusted computing paradigm for mobile edge computing systems, offering a way to secure data during collection, transmission, storage and computation. Moreover, Silke et al. (2019) connected data governance tasks in urban environment management with the entities involved in urban data management, provision, and utilization (e.g., data owners, data managers, IT system administrators, data strategy coordinators, and data strategy officers), and studied the requirements for communication and interaction among these entities. These entities and their interactions are embedded in the decision-making process for data quality management, data access management, general data lifecycle management, and metadata management of data sets in urban environments.

Generally speaking, the smart city data computing frameworks above share some common characteristics. Data acquisition and management technology usually provides the underlying support, and data computing (mining, processing and analysis) technology forms the core of the whole framework, on top of which diverse service applications are provided to city users (Wang et al., 2014). In these systems, data governance is integrated into the whole computing process as the main content of data processing.

2.2 Data governance framework for e-government and smart cities

Since the concept of "data governance" was introduced in 2014, it has developed almost simultaneously with the concept of "smart city" and big data technology, and has gradually become widely used in discussions of urban governance and smart cities (Paskaleva et al., 2017). In particular, since the government is the main subject of traditional urban governance and holds most of the basic data supporting city operation, data governance in government largely shares the same model and objectives as city data governance. Since 2016, numerous studies on government data governance have appeared in the field of e-government (Mechant & Walravens, 2018).

Ting & Williamson (2000) make clear that a key aspect of achieving collaborative governance of data generated at the city level is the sharing of data across organizations with their respective decision-making processes and activities. However, these authors do not indicate methods or specific solutions to achieve these goals. In response to the explosive growth of data, the increasing requirements of data governance, and the need to maintain data consistency and reliability, Al-Badi et al. (2018) proposed a new data governance framework consisting of identifying the organizational structure, selecting stakeholders, determining the big data scope, setting policies and standards, optimizing and computing, measuring and monitoring quality, data storage, and communication and data management. The seven core principles of information governance guidelines are also applicable to big data governance. Kim & Cho (2017) proposed that data governance is not a technical application but is about policies, organizations, standards and guidelines. Data leakage and monopoly are the core issues to be addressed by data governance, so an effective data governance framework needs to be developed. Such a framework can specify the definition of access, control, and accountability of data, and enables complex and ambiguous concepts to be organized and communicated systematically. Following a comprehensive and integrated approach to complex city management, Wang et al. (2021) proposed the GBCP smart city governance data model based on the electronic public service (eGBCP) model. The model fully considers the three roles of government (G), public service enterprises (B), and the public (C) around the provision of public goods (P), relies on information technology, and develops all-factor public affairs management. It has been applied to the Beijing city management comprehensive law enforcement big data platform.

In general, comprehensive data governance frameworks are not limited to the technical implementation of data governance but extend to the various subjects and their related activities inside and outside the government or the city. However, these frameworks lack empirical verification. In the field of government data governance, research on technical tools for implementation, on empirical analysis of data governance, and on legislation for government data governance still awaits further development (Yan & IEEE, 2018).

2.3 Broad data governance scope for digital economy and society

Chyi & Panfil (2020) argue that the governance of smart city data is concerned not only with the rules surrounding the collection, use, sharing, retention, and disposal of data itself, but also with how these rules are made, contested, and changed (i.e., who makes the rules and what processes are used in rulemaking). To this end, the authors adapt four of Ostrom's eight principles to the context of smart city data. Data governance for government can assist macro decision-making, risk early warning and supervision. It can maximize the value of data by improving the legal identification and technical protection of data governance, comprehensively promote the digital transformation of cities, and achieve sustainable development (Yan & IEEE, 2018). The EU Council approved the Data Governance Act on May 16, 2022. By building an ecosystem for data sharing, circulation and utilization, the Act establishes a data governance policy and a coordination mechanism at the national or regional level to improve the data governance system (Commission, 2022). The US federal government's big data governance policy takes "maximizing the value of big data technology and minimizing the risk of utilization" as its core concept and covers six major areas: data openness, information disclosure, personal privacy protection, e-government, information security and information resource management. A large number of laws, regulations and administrative orders have been promulgated around these six policy areas. The U.S. Department of Defense provides data governance with the principles, policies, processes, and frameworks needed to effectively manage data at all levels, from creation to processing, in order to provide society with more collaborative innovation capabilities (DOD, 2020).

The connotation and discourse of urban data governance are thus being enriched, moving from technical concerns about data to the exploration of public interest. The topic is gradually expanding from computer and information science and urban professional fields to many disciplines such as public policy, management, economics, and sociology, and the research shows a flourishing diversity of perspectives. A systematic and scientific common understanding of urban data governance still needs to be explored further in practice.

3 Challenges in city data governance practices

The issues of insufficient urban data utilization and sharing have been frequently pointed out in previous literature (Bello-Orgaz et al., 2016; Figueiredo, 2017). In terms of the current status of urban data governance and utilization, there is still much progress to be made towards the goal of fine-grained, data-driven urban governance. Several difficulties remain unsolved in current research and practice.

3.1 Challenge I: disconnect between technology and theory

On the one hand, current urban data and big data solutions have mainly emerged from the field of computer and information technology and rely heavily on data science and computer tools (Ganeshan, 2021). Big data approaches have gained popularity because of their ability to solve many intuitive problems and obtain immediate, practical effects. However, data governance in this context tends to be limited to narrowly defined technical issues, such as information systems and system cluster data, while ignoring other factors that could become bottlenecks for technical applications. On the other hand, previous work on the theoretical construction of data governance frameworks has often neglected technical application methods, and much of it lacks solid theoretical sources and scientific foundations, which leads to considerable divergence and inconsistency among existing frameworks (Rascão, 2021). Since a framework generally requires adjustments to the organization and management system, and its ability to solve practical problems cannot be evaluated directly, putting the framework into practice is costly and its benefits might not be obvious in the short term, which hinders its adoption.

3.2 Challenge II: decomposition and distribution behind the information island problem

A major goal of data governance is to solve the problem of "information islands" and realize the interconnection and synergy of information. Current technical means such as data sharing, data integration, and data interface specification can improve data connectivity to some extent, but they are still inadequate to meet actual application needs. This is generally attributed to defects in institutional mechanisms rather than to technical problems (Coleman et al., 2009).

The current organizational structure of urban governance is based on a reductionist decomposition of the system. In fact, both the structure of urban social governance and that of information system are developed by decomposing the whole into parts. During the decomposition, the complexity of the system increases dramatically, and interactions arise among different components, including contradictions in resource allocation and the consequent conflict of interests (especially when the data itself is a resource). The difficulty of data decomposition could reduce data traceability, and the lack of data allocation might block data sharing. Given the limited amount of resource, it has been a professional problem to achieve the overall optimization goal by reasonable resource allocation, and it is even more complicated in the complex system environment. However, previous studies on city data governance rarely discussed this issue either technically or theoretically.

3.3 Challenge III: lack of systematic and operational means for verification

After gathering the problems and the available means, we need to determine a specific method for each problem, and an artificial procedure is required to put theory into practice and achieve the goal. Existing data governance theories and policy studies either lack an elaborate methodology for constructing a data governance system in practice, or merely propose key concepts, elements or components of a data governance system or framework without plans or programs for its implementation, action or organization. These problems make it difficult to apply data governance theories in practice (Miao et al., 2022). The same is true on the technical side, where data utilization is caught in a mismatch between resources and demand: where there is data, there is no way to use it; where there is demand, it is unclear what data is needed. Without a demand orientation to break through siloed interests, data governance lacks positive feedback, making it difficult to form a spontaneous and benign driving force (Lee & Lee, 2013). The key question is how to effectively connect data and applications, and achieve mutual matching and adaptation of data and needs through an operable methodology.

4 City data governance as complex system engineering project

4.1 City data governance engineering

Data governance is an artificial, human-organized activity: based on certain scientific principles and technical means, activities are organized to achieve specific goals regarding data-driven smart city governance. This is in line with the concept of engineering. From the perspective of the city as a whole, considering the interconnectedness of data and the organizational and managerial nature of governance activities, city data governance should be viewed as a system engineering project. Given the problems in previous research and practice, the purpose of this paper is to design an artificial procedure that can support the data requirements of city management and transform urban data into desired forms (Y. Zhang, 2022).

Referring to the discussion on data engineering, we consider “city data governance engineering” as a broad data governance engineering activity with urban data as the object. It contains the construction of organizational structure, operation mechanism and related system with regard to data governance, as well as data standard unification, operation specification and data quality control, with emphasis on the process from the original urban data to the final forms for value creation and realization.

4.2 Characteristics of city data governance engineering

Existing discussions tend to underestimate the complex governance dilemma posed by the availability of "big" and "open" data in smart cities (Edelenbos et al., 2018). The characteristics of the city as a complex system, along with those of the big data generated during city operations, make urban data governance itself a complex system project. Characteristics of the project may include:

  • Complexity. All the urban elements can generate data in the process of city operation, and the expansion of city population and construction scale also triggers the explosive growth of data scale. Meanwhile, the close interrelationship between data results in the high complexity of data model (He & He, 2014).

  • Uncertainty. Although data aggregation and sharing are generally achieved in urban data governance practice, the accuracy, timeliness, and reliability of currently available urban data are hardly guaranteed, and the common data quality problems make it difficult to obtain effective and accurate solutions to most urban social problems. Thus, city data governance activities are faced with uncertainty most of the time.

  • Openness and dynamics. Considering the accumulation of data and the continuous evolution of the city, the governance of urban data should be a long-term and continuous activity that is open and compatible, and that can evolve iteratively based on both internal and external developments. During this process, the demands for data-aided functions change over time, so the data produced by this project should also change continuously to satisfy various needs.

These characteristics mean that the project cannot be completed in a single pass, but should be constantly improved through iteration in practice, so as to approach the goals step by step and obtain "finite solutions to complex problems".

5 City data governance system: elements and framework

In this section, we extract some key elements from urban data governance. Basically, it should include city, data and governance. Besides, data quality, value and security should also be considered. Based on these elements, we establish a four-level system framework covering cognition, methodology, technology and practice. First, the main objectives of the engineering are defined through cognition. Second, the pathway of data governance can be guided by urban planning and design methods. Finally, the requirements for the quality, value and security of data can be met through an integrated technology system.

A conceptual framework should be fully verified through smart city strategies (Sugandha et al., 2022). A feedback cycle is crucial for the stable and balanced operation of the system (Centeno et al., 2022). Currently, urban digital construction and development is still limited to large-scale engineering construction projects, which require the realization of specific functions within a certain period. However, because a feedback process is missing, the impacts that such interventions might have on urban development and evolution after project completion have not been fully considered. To deal with this problem, an important aspect of the framework is to establish a data governance evaluation method specific to urban conditions, which verifies the feasibility and benefits of the system and constantly modifies and improves it based on problem feedback to achieve a virtuous circle. Given the constant change and development of cities and the continuous generation and accumulation of data, urban operation data governance should be a long-term and persistent process adapted to the evolution of cities and data, and constantly verified by practical projects. By combining quantitative evaluation methods with feasibility and benefit analysis at the practical level, an operable feedback iteration mechanism can be established to continuously advance system design and implementation, improve the adaptability and applicability of the data governance project to urban development and change, and further drive the improvement of productivity.

5.1 Fundamental components: city, data and governance

Generally, the system of urban data governance can be composed of objectives, top-level design, objects and methods (Zhang et al., 2017). Data is the direct object of governance. It is the digital description and record of all objects and their relationships involved in the states and processes of urban operation, as further discussed in Sect. 6.1. Governance deals with organizations and their states, centered on the "people" in the city. Thus, complete data governance work needs to deal with both of these objects in appropriate ways. In the context of city operation, since data governance deals with complex system conditions, we should consider the nature, laws and methods of city operation and governance itself (Fig. 1).

Fig. 1 Foundational components of city data governance

In this case we suggest that city data governance should contain at least three layers:

  • ① Governance of data: Data is the object of governance. The purpose is to solve the problems of data quality, data security, data usage efficiency, etc., based on specific technical means to better achieve data applications;

  • ② Governance of data agents: The agents refer to the “space” where the data is located or the objects and processes around the data. Similar to the broad definition of data governance system, the agents are a comprehensive set of processes and mechanisms among all relevant subjects of the system. With government, market, and society as three major agents in cities, governance of data agents should include the construction of operational mechanisms and systems of responsibility and rights;

  • ③ Governance of urban systems based on data: Using data to support urban governance, taking urban operational data as the basic elements of governance, and establishing a system of data-aided governance to support fine-grained governance of smart cities based on practical application of massive data.

These three aspects cannot be isolated or separated from each other. Therefore, a system framework needs to be developed and implemented from a holistic and integrated perspective.

5.2 Study fields: data quality, value and security

The quality, value and security control for urban data is viewed as the main challenge of urban data governance (Choenni et al., 2021), which is also the key content of urban data governance engineering research (Fig. 2).

Fig. 2 Study fields and main content in city data governance engineering

5.2.1 Data governance on quality

Data quality is the core component of traditional data governance activities (Zhang, 2020) and is usually considered the main goal (Choenni et al., 2021). Clear definitions of data quality already exist. For example, the data quality evaluation index suggested by previous researchers contains several metrics: normality, completeness, accuracy, consistency, timeliness, and accessibility (Gibbs et al., 2002; Shanks & Darke, 1998). With data utilization as the purpose, data quality is also considered to be directly related to data availability (Ding et al., 2016), from which a series of technical measures and action mechanisms for data quality evaluation and improvement have evolved. In general, data quality provides an important methodological basis for big data techniques and industrial data governance activities.
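As a rough illustration of how some of these metrics could be quantified, the following minimal sketch computes completeness, timeliness and a simple rule check on a handful of records. The record layout, field names, thresholds and the `rule_conformance` function are assumptions made for illustration only; they are not taken from the cited evaluation indexes.

```python
from datetime import datetime, timedelta

# Hypothetical sensor records; all field names and values are illustrative.
RECORDS = [
    {"id": "s1", "district": "A", "pm25": 35.0, "updated": datetime(2022, 5, 1, 12, 0)},
    {"id": "s2", "district": "B", "pm25": None, "updated": datetime(2022, 5, 1, 11, 30)},
    {"id": "s3", "district": None, "pm25": 41.5, "updated": datetime(2022, 4, 28, 9, 0)},
    {"id": "s4", "district": "A", "pm25": -3.0, "updated": datetime(2022, 5, 1, 11, 55)},
]


def completeness(records, fields):
    """Share of (record, field) cells that are not missing."""
    total = len(records) * len(fields)
    filled = sum(1 for r in records for f in fields if r.get(f) is not None)
    return filled / total if total else 1.0


def timeliness(records, now, max_age):
    """Share of records updated within max_age of the reference time."""
    fresh = sum(1 for r in records if now - r["updated"] <= max_age)
    return fresh / len(records) if records else 1.0


def rule_conformance(records, field, rule):
    """Share of non-missing values of one field that satisfy a domain rule.

    This is an assumed, simplified stand-in for accuracy/consistency checks.
    """
    values = [r[field] for r in records if r.get(field) is not None]
    return sum(1 for v in values if rule(v)) / len(values) if values else 1.0


if __name__ == "__main__":
    now = datetime(2022, 5, 1, 12, 0)
    print("completeness:", completeness(RECORDS, ["district", "pm25"]))
    print("timeliness:", timeliness(RECORDS, now, timedelta(hours=1)))
    print("rule conformance:", rule_conformance(RECORDS, "pm25", lambda v: v >= 0))
```

Each score is a share between 0 and 1, so scores from different departments can be compared on a common scale; how they are weighted into an overall index is left open here.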

For urban data objects, an important aspect of data quality is the representation and description of the data, where a valid and clear mapping is required from the dataset to a complete description of real physical space (Wand & Wang, 1996). First, data are acquired from different sources and brought together for query and extraction. Second, data users should have a comprehensive, clear and accurate understanding of the data, which relies on how the data is described and perceived. Finally, this understanding enables the communication and exchange of information. In this process, a core requirement for data quality governance is a unified language and common understanding across multiple departments and fields. To form a coherent whole from the massive data of complex urban systems, all subjects and elements in the city need to follow unified measurement standards and metrics. A linear, sector-based architecture makes the entire city system a heterogeneous distributed system composed of subsystems with different content structures. Thus, the key is to achieve data consistency and interoperability among the various businesses of the city subsystems (Choenni et al., 2021), with the aim of eliminating ambiguities and enabling cooperation. To make this happen, the core task is to understand the correlations and interactions between city subsystems and map them to the data space, while maintaining the integrity of the system and avoiding disruption at the data level. For this reason, urban data standardization is a necessary task, which is further discussed in Sect. 6.3.
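The idea of mapping heterogeneous subsystem data onto unified standards can be sketched as follows. This is a minimal illustration under stated assumptions: the department names, field names, unit factor and the `to_unified` function are all hypothetical and do not describe any standard proposed in this paper.

```python
# Hypothetical per-department mappings from local field names to a shared schema.
FIELD_MAPS = {
    "transport": {"road_id": "asset_id", "len_km": "length_m", "dist": "district"},
    "utilities": {"pipe_no": "asset_id", "length_m": "length_m", "district_name": "district"},
}

# Hypothetical unit conversions, keyed by (department, local field name).
UNIT_FACTORS = {("transport", "len_km"): 1000.0}


def to_unified(source, record):
    """Translate one department record into the shared schema.

    Fields without an agreed mapping are dropped rather than guessed.
    """
    mapping = FIELD_MAPS[source]
    unified = {"source": source}
    for local_field, value in record.items():
        if local_field not in mapping:
            continue
        factor = UNIT_FACTORS.get((source, local_field))
        unified[mapping[local_field]] = value * factor if factor is not None else value
    return unified


if __name__ == "__main__":
    print(to_unified("transport", {"road_id": "R1", "len_km": 2.5, "dist": "A", "lanes": 4}))
    print(to_unified("utilities", {"pipe_no": "P9", "length_m": 120.0, "district_name": "A"}))
```

Keeping the mappings as explicit data rather than code is one possible design choice: the agreed vocabulary can then be reviewed and revised by the departments involved without changing the translation logic.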

5.2.2 Data governance on value

The governance on data value aims to achieve the optimal exploitation of data value through appropriate ways (services, applications, decisions or products). In fact, the value of data in business is also directly related to data quality (Kwon et al., 2014; Merino et al., 2015).

To make data usable, a key step is to link data to application scenarios. From the perspective of urban governance and city operations, data are expected to meet our needs or goals for complex urban systems, which requires a pathway linking data to scenarios. One reason why current data applications in urban governance practice fall short of expectations is that there is no clear mechanism to fully integrate research results with practical applications and qualitative decisions. This is manifested in two aspects. First, the experience of new smart city construction shows that the integration and aggregation of data is still unfinished, because there is no clear demand or interest to create spontaneous momentum and drive the work, and thus data are stored in static platforms without effective flow and utilization. Second, it is difficult to apply the outcomes of scientific research to the actual urban environment, apart from some mature AI technologies such as general computer vision (CV) and natural language processing (NLP) models. Many complex results in urban science, urban computing and computational social science remain experimental and are hardly applied to actual decision-making. Meanwhile, due to the lack of risk and effect assessment, the usage of urban data is often limited to data visualization, simple statistical analysis and other preliminary applications, while the goals of intelligence and wisdom are far from being achieved. With simulation as the basic method for understanding and intervening in complex systems, a low-cost and high-efficiency "simulation-experiment" platform system should be developed to match the supply of real data with the demands of real scenarios, and to output the optimal strategies from multiple experiments for the real city operation process. In general, the core of data value governance is to establish a complete technology system and application mechanism for the data value chain based on this platform.
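The matching of data supply with scenario demand described above can be made concrete with a small sketch. It assumes a catalog that lists the fields each dataset provides and a list of the fields each scenario requires; the dataset names, scenario names and the `match_supply_demand` function are hypothetical, and a real platform would need far richer metadata.

```python
# Hypothetical catalog: dataset name -> fields it provides.
CATALOG = {
    "traffic_counts": {"road_id", "vehicles_per_hour"},
    "air_quality": {"station_id", "pm25"},
    "street_lights": {"lamp_id", "status"},
}

# Hypothetical scenarios: scenario name -> fields it requires.
SCENARIOS = {
    "congestion_alert": {"road_id", "vehicles_per_hour"},
    "pollution_exposure": {"pm25", "population"},
}


def match_supply_demand(catalog, scenarios):
    """Report, per scenario, which datasets contribute and which fields are unmet.

    Also returns the datasets that no scenario uses (data without demand).
    """
    supplied = set().union(*catalog.values())
    report, used = {}, set()
    for name, needs in scenarios.items():
        sources = sorted(ds for ds, fields in catalog.items() if fields & needs)
        used.update(sources)
        report[name] = {"sources": sources, "missing": sorted(needs - supplied)}
    idle = sorted(set(catalog) - used)
    return report, idle


if __name__ == "__main__":
    report, idle = match_supply_demand(CATALOG, SCENARIOS)
    for scenario, result in report.items():
        print(scenario, result)
    print("datasets without demand:", idle)
```

In this toy example, one scenario is fully served, one is missing a field that no dataset supplies, and one dataset has no consumer at all, which mirrors both sides of the supply and demand mismatch.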

5.2.3 Data governance on security

Data security has become the core of information security in the era of big data. Both data quality and data value require data security as a prerequisite and guarantee, which mainly includes two dimensions: data security and data protection security. The former is mainly concerned with the integrity, confidentiality and availability of data, protecting the security of the data content itself and ensuring the realization of its value. The latter mainly includes the security of the information infrastructure and system that carries data resources. It protects the security of the data infrastructures and guarantees that the carrier function does not fail.

In the era of the digital economy, maximizing the value of data often relies on the aggregation, transmission, processing and analysis of massive and diverse data. Such dynamic, data-intensive activities involve more diverse governance themes, agents and interests. The connotation and extension of the concept of data security are continuously expanding, and the concept of "data security governance" has emerged. Gartner defines data security governance as "a subset of data governance that cuts across all aspects of data governance and emphasizes the security attributes of data, which is used to develop and implement data security policies and thus process and protect data with unified rules". As a basic guarantee and an important part of data governance, "data security governance" emphasizes the security of data and balances it with data availability by forming organizations and formulating strategic rules, which overlaps heavily with the public policy system of data governance.

5.3 System framework: cognition, methodology, technology and practice

The proposed framework aims to address the common key problems encountered in the engineering practice of urban data governance. Key problems in urban data governance are detected and analyzed at the cognitive, methodological, technical and practical levels respectively, which leads to the establishment of a systematic solution (Fig. 3).

Fig. 3 Systematic framework of city data governance engineering containing four levels

5.3.1 Cognitive level

At the cognitive level, the first things to be explored are the objectives of constructing and implementing a city data governance project, as well as the key factors affecting the realization of those objectives. City data governance, as a human-implemented system project, should fundamentally serve an artificial function or purpose.

First, the main objective should be a holistic optimal solution for processing and utilizing urban data. Since city data governance is considered part of broader city governance, and humanism is always a fundamental principle, this goal should serve the development and well-being of the people living in the city. Habitat Science, as an important theory for explaining complex urban systems, has set sustainable development as its goal and human harmony as its vision (Wu, 2001). As a specialized topic, the core concept of the smart city is also gradually moving towards sustainable development, which emphasizes environmental, economic and social sustainability (Toli & Murtagh, 2020). From this perspective, studying data elements and data utilization is a milestone rather than the ultimate goal, and it should ultimately serve the city's sustainable development goals (Paskaleva et al., 2017). Only through this goal-setting process can we detect local problems or obstacles that may arise during the implementation of a data governance project, and solve complex problems based on the principles of humanistic theory (Guo et al., 2022).

For another thing, this goal should be approached gradually through an iterative process. Data governance is a long-term, continuous and complex process. From the view of urban planning, a goal can be either an overall goal or a milestone; the latter is set in response to the actual status and problems of a certain period, or to the most urgent needs of the current situation. With the achievement of each milestone, a continuous iterative evolution can be formed that gradually approaches the overall goal.

5.3.2 Methodological level

At the methodological level, it is important to study an effective pathway for implementing data governance across the whole city with the participation of multiple agents. Since the presence of multiple agents is a basic feature of many processes, including city operation, city governance and data governance, the coexistence and synergy of multiple agents is the prerequisite for implementing a city data governance project. Urban agents involved in the project generally include government, market (or enterprise) and society (or public) (Zhang, 2010), each of which has its own positions, interests and demands. Thus, the key problem is to implement data governance based on proper management and coordination of the distribution of interests among multiple agents in order to achieve development goals (Chen et al., 2019). This is directly related to the nature of the silo effect. Indeed, research on the planning behavior of complex systems has explored how people can coordinate multiple decisions under different conditions (Lai, 2019).

In the latest practice of system-engineering-based smart city construction, the “top-level design” structure undertakes the overall planning and arrangement of the relevant agents. Top-level design is comprehensive and systematic planning from a macro perspective, covering the design idea, design goal, design environment, design process, design content, design method and quality inspection of the design results. This method is based on scientific and explicit ideas and is characterized by comprehensiveness and systematization, and thus can effectively guide and promote various social, economic and engineering works (Fang et al., 2021). Nevertheless, the experience and lessons of previous smart city construction show that the top-level design approach also has limitations. Cities are still treated as static IT systems, and effective connections have not been established between top-level design and urban development goals. Due to the lack of authority and guidance for project construction, as well as defects in the planning system and mechanism, present planning cannot provide effective guidance for, or strong constraints on, specific construction work. The problems of system fragmentation and data silos can hardly be solved under the present top-level design.

Nonetheless, design thinking remains an effective tool in dealing with complex issues of urban data governance (Choenni et al., 2021). With the presence of multiple agents and common goals, the planning and design approach usually serves as an effective method to guide the systematic implementation of data governance projects, which integrates technology and management and involves public policies related to data agents. While there is uncertainty in city data governance engineering, the planning and design process makes use of knowledge and technology to assist work that requires the cooperation of multiple professionals in accordance with their common goals (Wu, 2001), so as to balance the interests of all stakeholders while remaining aligned with strategic objectives (Eke & Ebohon, 2020). Planning is particularly important for the successful implementation of a project, and a strongly binding plan ensures that construction is carried out in accordance with the established directions and requirements. Therefore, it is necessary to scientifically prepare and effectively implement a data governance plan for urban operation, to guide the direction of data resource allocation in urban operation, to regulate the behavior of governance agents, and to maintain the coherence and stability of urban data governance goals.

5.3.3 Technical level

At the technical level, there is a need to study problems related to the application of big data governance and common ICT in the urban domain. Previous studies have focused on engineering technology in city construction, stressing the top-down application of ideal (standardized) technical models and project samples, while lacking substantive thinking on the path to realization (Guo et al., 2022). Besides, the lack of technical application scenarios and the inapplicability of technology in real data environments have become important bottlenecks limiting the further development of smart cities.

Specifically, studies on engineering technology should meet the requirement of improving the efficiency of data governance while ensuring the security of urban data. Specific technologies for data governance may include communication technology, data science, computer hardware and software, and so on, each of which has its own application conditions and functions. Considering that real data in city operation often include sensitive information, such as social privacy data and public security data, and that the practical value of scientific research needs to be verified with real data, secure and efficient data supply should be the core of the techniques in a city data governance project. On the security side, safe and reliable cryptography and secure data transmission technologies, such as blockchain, trusted computing and zero-trust architecture, are key research topics where more breakthroughs are needed, while the autonomy and controllability of underlying technologies such as data storage and data retrieval are also important aspects of data security. On the efficiency side, several solutions have been proposed by technical researchers, including the use of crowdsourcing platforms, laboratory environments or other technologies as intermediaries to reduce data security risks and ensure control over data processing and its results. This is an important field where technology research should make progress. City data governance is a huge and complex system project, and the combination of multiple technological innovations should be considered to make up for the shortcomings of an individual technology or a single technology system (Batty et al., 2012), so as to overcome the technical obstacles faced by projects implemented in megacities.

5.3.4 Practical level

At the practical level, the implementation of all engineering projects should go through a feasibility study stage. After the requirement of technical feasibility is met, the economic feasibility study further evaluates the inputs and outputs of urban data governance engineering activities through scientific methods. Most of the processes in urban data governance, from the construction of data platforms and the preparation of data standards to the promulgation of data regulations, can incur costs, which include not only direct economic costs but also a series of direct or indirect social costs. From an economic perspective, a main concern of city data governance projects is whether the various costs produce corresponding utility values and economic benefits, and whether outputs are maximized by means of improved decision-making, timely problem detection, effective predictive analysis, etc. For city-scale large datasets, this trade-off between the level of data governance and cost-effectiveness becomes extremely important (Abraham et al., 2019), and it is one of the main pathways to sustainable development (Paskaleva et al., 2017).

The problems of economic feasibility in engineering practice arise from the objective condition of limited resources. Input–output based cost–benefit analysis becomes necessary in practice to deal with the contradiction between the limited resources (including available data, capital, human resources, etc.) for engineering construction and the increasing number of urban operation elements and public demands for urban governance scenarios. To invest limited resources in multiple issues, it is inevitable to make choices and trade-offs on the allocation of resources, clarify the coverage and execution level of data governance, and choose the optimal course of action. How to quantify the inputs and outputs of city data governance engineering activities, measure the costs and benefits of the solution strategies and tools used, and make the whole data governance practice measurable turns out to be the key issue in engineering practice.
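
To make the allocation trade-off concrete, the following is a minimal sketch, not a method taken from this study. It assumes that each candidate governance activity can be given a single estimated cost and a single estimated benefit, and it funds activities in descending benefit-to-cost order until a budget is exhausted. The class `GovernanceActivity`, the function `select_activities` and all numbers are hypothetical, and the greedy rule is a simple heuristic rather than an optimal solver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GovernanceActivity:
    """A candidate data governance activity (all values are illustrative)."""
    name: str
    cost: float      # estimated input, in arbitrary budget units (must be > 0)
    benefit: float   # estimated output, in comparable units


def select_activities(activities, budget):
    """Greedy heuristic: fund activities in descending benefit-to-cost order.

    Returns the funded activities and the unspent budget. This does not
    guarantee the optimal combination; it only illustrates the trade-off.
    """
    ranked = sorted(activities, key=lambda a: a.benefit / a.cost, reverse=True)
    chosen, remaining = [], budget
    for activity in ranked:
        if activity.cost <= remaining:
            chosen.append(activity)
            remaining -= activity.cost
    return chosen, remaining


# Hypothetical candidates; the names echo the cost sources listed in the text.
candidates = [
    GovernanceActivity("data platform", cost=50.0, benefit=120.0),
    GovernanceActivity("data standards", cost=20.0, benefit=70.0),
    GovernanceActivity("data regulations", cost=30.0, benefit=45.0),
    GovernanceActivity("quality audits", cost=40.0, benefit=50.0),
]

chosen, remaining = select_activities(candidates, budget=100.0)
for activity in chosen:
    print(f"{activity.name}: ratio {activity.benefit / activity.cost:.2f}")
print("unspent budget:", remaining)
```

In practice, social costs and benefits are hard to express on one scale, so the single `benefit` value is the strongest assumption in this sketch.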

6 Urban data governance engineering practice: a case study on smart epidemic prevention

Engineering is the carrier of scientific methodologies and technologies, and projects are the specific pathway to smart city and digital government development. Under the conceptual framework, theoretical cognition, methodologies and technologies should be fully verified and should obtain sufficient feedback through project construction and operation in practice. Facing the two major issues of data-aided urban governance and development, project practice should be able to effectively support the functions of megacities in achieving good social order, public safety and rapid economic growth. Considering the urgent and practical needs of epidemic prevention and control in megacities, we illustrate the basic logic of implementing a city data governance project with the case of the Technology-enabled Epidemic-Resistence Data Foundation (TER-DF) in Shenzhen, China, which supports high-precision epidemic survey and prevention.

6.1 Case background

In the past two years, the Shenzhen government has focused on the construction of a smart epidemiological investigation system project for epidemic prevention and control. The system serves the needs of epidemiological investigation and prevention work at all levels of urban administration, across city, district, street, community and grid, with the purpose of ensuring immediate delivery of instructions, tasks and the latest information, and enabling information sharing, cooperation and timely responses among multiple civic departments. To support information transmission, decision-making, execution and other related work, the core of the project needed to be the construction of a unified database based on data governance, which became the TER-DF project.

6.2 Data governance in TER-DF

TER-DF was initially designed and developed following the four-level conceptual framework (Fig. 4).

Fig. 4 Practice on smart epidemic prevention applied urban data governance system framework

At the cognitive level, the objectives of data governance should first be clarified. With concern for citizens’ feelings, the overall goal of the project is to maximize the detection and control of risk factors while minimizing the impact on the normal life of most people. The project construction goals comprise two phases. The short-term goal is to directly support rapid and effective information acquisition and transmission in epidemiological investigation. "Rapid" means smooth data transmission networks and data exchange approaches, while "effective" refers to verified data quality in terms of authenticity and availability. By contrast, the long-term goal is to build a unified urban data infrastructure at the application level, which not only supports the present epidemiological investigation but can also be applied in similar conditions in the future for information processing and public security management, so as to deal with both common and urgent issues, achieve maximum reuse and avoid repeated system construction and information acquisition.

At the methodological level, system engineering development was combined with policy system design. For one thing, being a smart city information system project, TER-DF follows the general path of information system development. A workflow was adopted that runs from comprehensive demand analysis, overall system design and structure design for each layer to specific schemes for organization and implementation. For another thing, in order to guarantee the effective operation of data governance in the project, the main agents involved in the data governance activities, together with their relationships and operation methods, were planned and designed in advance. The project was deployed at both city and district levels, each of which has clear functional and authority boundaries. System operation and data flow can be clearly divided into stages according to the rights and responsibilities of the relevant departments. As a result, four sets of guidelines for epidemiological investigation and prevention compatible with the project operation were proposed to facilitate data collaboration between multiple agents.

At the technical level, architecture design and integrated development of the major data governance technologies were conducted in line with the big data governance technology stack. Based on integrated and collective technical research and development, an efficient spatio-temporal search engine technology for large-scale concurrent and heterogeneous data was achieved to improve the capacity of data governance and information acquisition. Key technical breakthroughs include spatio-temporal data query and analysis, structured data query, graph association analysis, scheduling of multi-source heterogeneous query tasks, high-performance and real-time signal analysis, and accelerated video feature analysis. These techniques were tested and verified on real data and scenes during project construction.
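
As a rough illustration of what a spatio-temporal data query asks for, the sketch below filters records by a bounding box and a time window using a plain linear scan. It is a toy example under stated assumptions and does not describe the TER-DF search engine, whose internals are not given here; a production system would rely on spatial and temporal indexes. The `Event` record, the `range_query` function and the sample coordinates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """A hypothetical geo-referenced, time-stamped record."""
    event_id: str
    lon: float
    lat: float
    time: datetime


def range_query(events, lon_min, lon_max, lat_min, lat_max, start, end):
    """Return events inside the bounding box and the closed time window.

    Linear scan for clarity only; no index is built.
    """
    return [
        e for e in events
        if lon_min <= e.lon <= lon_max
        and lat_min <= e.lat <= lat_max
        and start <= e.time <= end
    ]


# Illustrative records only; none of these values come from the project.
events = [
    Event("e1", 114.05, 22.54, datetime(2022, 3, 1, 9, 0)),
    Event("e2", 114.30, 22.70, datetime(2022, 3, 1, 10, 0)),  # outside the box
    Event("e3", 114.06, 22.55, datetime(2022, 3, 2, 9, 0)),   # outside the window
]

hits = range_query(
    events,
    lon_min=114.0, lon_max=114.1,
    lat_min=22.5, lat_max=22.6,
    start=datetime(2022, 3, 1, 0, 0),
    end=datetime(2022, 3, 1, 23, 59),
)
print([e.event_id for e in hits])
```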

At the practical level, quantitative evaluation indicators of data governance were taken as the guidance for project operation. The workflow for the project was established with three stages, i.e., data preparation, research and decision-making, and on-site implementation. Evaluation methods and indicators for project operation were defined for each stage, encompassing technical indicators such as automaticity of data generation and precision control, administrative performance indicators such as the data reporting rate and error rate of each department, and overall effectiveness indicators such as on-site rapid disposal time and epidemic case rate. Based on the costs of project construction and operation, a basic threshold was set for each evaluation indicator, with the goal of making the project benefits meet the expected level. Thus, the data governance process can be constantly verified, evaluated and improved through quantitative indicators and the status of benefits, and practical experience can be accumulated to continuously narrow the gap and finally converge on the project objectives.
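
The threshold-based evaluation described above can be sketched as follows. This is an illustrative sketch only: the stage and indicator names follow the text, but the `Indicator` class, the `failing_by_stage` function, and every value and threshold are hypothetical and are not taken from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indicator:
    """One evaluation indicator with its threshold (values are hypothetical)."""
    stage: str
    name: str
    value: float
    threshold: float
    higher_is_better: bool

    def meets_threshold(self) -> bool:
        # Rates such as reporting rate should be at or above the threshold;
        # error rates and disposal times should be at or below it.
        if self.higher_is_better:
            return self.value >= self.threshold
        return self.value <= self.threshold


def failing_by_stage(indicators):
    """Group the names of indicators that miss their threshold by stage."""
    report = {}
    for ind in indicators:
        report.setdefault(ind.stage, [])
        if not ind.meets_threshold():
            report[ind.stage].append(ind.name)
    return report


indicators = [
    Indicator("data preparation", "data reporting rate", 0.97, 0.95, True),
    Indicator("data preparation", "error rate", 0.04, 0.02, False),
    Indicator("on-site implementation", "rapid disposal time (hours)", 3.0, 4.0, False),
]

report = failing_by_stage(indicators)
for stage, failed in report.items():
    print(stage, "->", failed if failed else "all thresholds met")
```

Re-running such a check after each operating period gives one simple way to track whether the gap to the project objectives is narrowing.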

7 Discussion

So far, by extracting the basic elements of urban data governance, we have described the meaning and characteristics of urban data governance as complex system engineering, proposed three key research fields, namely data quality, data value and data security, and established a system framework covering the cognitive, methodological, technical and practical levels to form a benign feedback loop. To our knowledge, this is the first study to review the comprehensive process of urban data governance from a systematic and holistic perspective, and to propose systematic solutions and explicit operation procedures from an engineering perspective. More practical problems might arise in the engineering practice of urban data governance implementation. Thus, in this section we further discuss the research object of engineering (i.e., data), the theoretical cognition of the governance process, and data standardization practice.

7.1 City, administrative and public data

As mentioned earlier, much of the research on urban data governance has focused on government data, which raises the question of what "urban data" comprises. The concept of "data" used in the government context mainly refers to public data and government data. According to the current legal definitions, public data can be considered data generated and managed by government agencies for administration and public service during their operation, which may involve infrastructure, healthcare, transportation, education, telecommunications, finance, etc. By contrast, government data can be defined as the data that is used by the government to fulfill its duties. The core connotation of government data is data collected, acquired and produced on the basis of legal administrative power for the performance of administrative functions, featuring government agencies as its origin, as well as a clear scope and boundary of ownership. Generally speaking, the two concepts are similar in emphasizing data as a kind of asset.

The connotation of urban data needs to be extended beyond public or government data. Considering the openness and integrality of data and systems, and in order to limit the research content strictly to the urban context while applying the concept of the city as a complex system, urban data can be seen as the digital description and records of all the objects and their relationships during city operation, including source data and their derived data such as statistics, indicators, etc. In fact, this explanation avoids defining and limiting data from a single perspective (e.g., source). From the whole city’s perspective, the government should act as the coordinator and supervisor of data operation and transmission throughout the whole process of city construction and digital economy development, and organize multiple forces to jointly extract the value of data and safeguard the rights and interests related to data. Besides, for the processing of heterogeneous data, in addition to adopting different technical methods, the means of hierarchical classification and categorization of data should also be extended. For example, in terms of data quality, data with different quality levels can be used for tasks with corresponding data quality requirements to save costs in data processing. In terms of data value, rules for data utilization should be set based on comprehensive consideration of rights and responsibilities. In future research, different types of data can also be integrated through “contextualization” or “scenario” to facilitate more comprehensive data processing (Keith et al., 2020; Othman & Beydoun, 2013).
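
The idea of matching data quality levels to task requirements can be sketched as below. This is a minimal illustration under the assumption that quality can be graded on a single ordered scale; the three levels in `QualityLevel`, the function `cheapest_sufficient` and the dataset names are hypothetical and are not part of any existing standard.

```python
from enum import IntEnum


class QualityLevel(IntEnum):
    """A hypothetical ordered quality scale; a higher value means more processing."""
    RAW = 1
    CLEANED = 2
    VERIFIED = 3


def cheapest_sufficient(datasets, required):
    """Pick the lowest-quality dataset that still meets the task requirement.

    `datasets` maps a dataset name to its QualityLevel. Choosing the lowest
    sufficient level avoids spending processing effort the task does not need.
    Returns None when no dataset is good enough.
    """
    candidates = [
        (level, name) for name, level in datasets.items() if level >= required
    ]
    return min(candidates)[1] if candidates else None


# Illustrative catalogue only.
datasets = {
    "sensor_feed": QualityLevel.RAW,
    "traffic_counts": QualityLevel.CLEANED,
    "case_registry": QualityLevel.VERIFIED,
}

print(cheapest_sufficient(datasets, QualityLevel.RAW))
print(cheapest_sufficient(datasets, QualityLevel.CLEANED))
```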

7.2 Theoretical study on related concept and method

There are very few systematic studies on urban data governance theories. As mentioned earlier, data governance, as a comprehensive activity based on the interests of stakeholders, requires analysis of the different interests in urban operational data and urban governance issues, exploration of the inherent laws and development trends, and summarization and refinement of governance theories to guide in-depth practice, so as to promote the sustainable development of data-driven cities. Therefore, there is an urgent need for comprehensive theoretical studies on urban data governance.

A previous study on digital libraries has pointed out that the theoretical basis of data governance consists of self-organization theory, stakeholder theory, and humanism theory (Guo et al., 2022). First, self-organization theory addresses how cities exchange materials and information with the external environment in the process of data governance, how to coordinate, decouple and guide the relationships between various elements inside and outside the system, and how to promote the effectiveness of the whole governance system. Second, stakeholder theory is not only helpful for recognizing the internal and external environment of a system and for promoting the handling and coordination of various complex relationships inside and outside the system, but also beneficial for deeper governance practice. Last, the theory of humanism can be a comprehensive principle to drive the progress of multiple governance agents, such as unlocking their creativity, improving the design of information services and products, and promoting spatial reengineering and environmental optimization. For example, the theory can provide a solid basis for the establishment of data governance mechanisms such as incentive mechanisms, constraint mechanisms and target responsibility mechanisms, and transform passive agents driven by external forces into active ones that act spontaneously.

Urban planning, as a typical multi-disciplinary and interdisciplinary activity managing urban objects, could provide much useful methodological experience for data governance. As the guidance and core of a development blueprint, urban planning must directly serve the purpose of urban development, which emphasizes its guiding nature. Theoretical planning research usually interacts with the government’s decision-making processes. On the one hand, the results of planning research can be fed back to the government and provide support or suggestions for its decision-making, especially at the local level. On the other hand, the government's guiding opinions on urban development also influence the directions of theoretical research (Cao & Zhang, 2019). Previous research has revealed the uncertainty and complexity in urban planning (Christensen, 1985; Ekman, 2018; Lai, 2019), and some researchers have pointed out that methods such as scenario planning can be used to cope with the problem of uncertainty (Abou Jaoude et al., 2022). In conclusion, the methods and systemic structure of urban planning might provide adequate guidance for the complex practice of urban data governance planning and implementation.

7.3 City data standardization as an important approach

Existing urban data standards can be categorized into four topics: urban data resource system, urban data model description, urban data governance, and urban data integration and services. These standards are often part of the smart city standards series, and they provide macro guidance on urban data definitions, data scope, classification methods, organizational structure, etc. However, these existing standards have not yet played an effective role in practical applications, and many problems remain. First, they have not fully satisfied the demands for interconnection of application systems and for data sharing and exchange. Second, there is still duplicate construction of data resources and data management tools, which hinders experience accumulation and continuous development. Third, many divergences remain unresolved regarding the definition of urban data and the standardization of data structure. Last, the evaluation of data quality still faces the problem of missing standards (Steffek & Wegmann, 2021). While there is a long way to go towards a complete standard system, the role of government in the establishment of standards has rarely been discussed (Gal & Rubinfeld, 2018).

From the perspective of data utilization, data standardization can improve data quality in terms of portability and interoperability, which makes it the key to facilitating and improving data usage. In fact, standardization is a prerequisite for the operation of systems like smart cities, where the exchange of data across domains and industries is crucial (Gal & Rubinfeld, 2018). Besides, standardization can also create substantial benefits when the synergy of data is essential. Without standardization, no further progress can be made in solving the problem of isolated information islands resulting from the separation of systems. Data standardization and data governance should be integrated and mutually reinforcing (Gal & Rubinfeld, 2018). On the one hand, data governance can enhance the level of quantification of standards and facilitate their implementation and discernment. On the other hand, data governance will become a mandatory part of data standardization. While the standardization process requires sufficient data and information as support, data governance can provide high-quality data resources for standard development and improve the level of information feedback during standard implementation.
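
To illustrate how a shared standard supports interoperability, the sketch below maps records from two departments that use different field names onto one common schema. It is a minimal example under stated assumptions: the department keys, the alias table `FIELD_ALIASES`, the field list `STANDARD_FIELDS` and the function `standardize` are hypothetical and do not come from any published urban data standard.

```python
# Hypothetical common schema shared by all data providers.
STANDARD_FIELDS = ("timestamp", "longitude", "latitude")

# Hypothetical per-source mapping from local field names to standard names.
FIELD_ALIASES = {
    "dept_a": {"ts": "timestamp", "lon": "longitude", "lat": "latitude"},
    "dept_b": {"time": "timestamp", "x": "longitude", "y": "latitude"},
}


def standardize(record, source):
    """Rename a source record's fields to the common schema.

    Raises ValueError when a standard field is missing, so that
    non-conforming records are caught at exchange time.
    """
    aliases = FIELD_ALIASES[source]
    renamed = {aliases.get(key, key): value for key, value in record.items()}
    missing = [field for field in STANDARD_FIELDS if field not in renamed]
    if missing:
        raise ValueError(f"{source} record is missing fields: {missing}")
    return {field: renamed[field] for field in STANDARD_FIELDS}


# Two records describing the same observation in different local formats.
record_a = {"ts": "2022-03-01T09:00:00", "lon": 114.05, "lat": 22.54}
record_b = {"time": "2022-03-01T09:00:00", "x": 114.05, "y": 22.54}

standard_a = standardize(record_a, "dept_a")
standard_b = standardize(record_b, "dept_b")
print(standard_a == standard_b)
```

Once both records share one schema, they can be compared, merged or exchanged without source-specific handling, which is the portability benefit the paragraph describes.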

8 Conclusion

The paper introduces a system framework for city data governance, which describes the key issues and solutions at the cognitive, methodological, technical, and practical levels respectively.

The conceptual framework is proposed to address long-standing practical problems of data utilization and data governance in the urban governance process in Shenzhen, and these problems are also commonly found in other large and medium-sized Chinese cities undergoing smart city development. While the goal of the framework is to better realize data-driven city governance and development, it should also be pointed out that it requires appropriate infrastructure, a policy system and technical capacity together to support its accomplishment. Different cities should also develop data governance practices that are suitable for their own purposes and values, based on their distinctive strategic objectives, development conditions and actual status.

Data governance is a complex and comprehensive topic, which covers a series of specialized technical areas across the whole data lifecycle, from data collection, data storage and data processing to data analysis, data application and data security. It also encompasses practical issues such as data standardization, data sharing, data opening, and data services. In addition to professional IT and big data technology, many non-technical matters are also involved, such as public policy, institutional norms, and even social and humanistic issues. Since multidisciplinary and interdisciplinary fields are intertwined and interrelated in the complex urban system, data governance projects require both in-depth theoretical research and extensive field practice. For this reason, more diverse departments, institutions, enterprises and organizations from the fields of urban development and data technology are expected to be involved and to form a scientific and technical community on data governance. Members of such a community can jointly draw up plans, participate in both theoretical and technical research, and ultimately put data governance projects into practice, so as to make creative and substantial progress in urban governance and development, and create real value from the practice of digitalization to benefit all participants and every citizen.