Public entities around the world are increasingly deploying AI and algorithmic decision-making systems to support public services or to exercise their enforcement powers. The rationale for the public sector to use these systems is similar to that of the private sector: to increase the efficiency and speed of transactions and to lower costs (UK Government 2016; GSA 2020). However, public entities are first and foremost established to meet the needs of the members of society and to protect the safety, fundamental rights, and wellbeing of those they serve. Their existence is justified by the promise of such service and protection. People agree to abide by rules knowing that they will be served in return, that decisions will not be arbitrary, and that there are means of redress and of assigning responsibility when a harm occurs. Therefore, public entities are held to a higher level of accountability and transparency than private ones, which are profit-driven and might not necessarily treat the interests of the public as a priority.

Currently, AI systems are deployed by the public sector at various administrative levels without robust due diligence, monitoring, or transparency. This results in a growing entanglement between private vendors and public actors, and a blurring of the lines of accountability and responsibility. Public sector actors are also keenly aware of the gap between their existing internal capability and capacity and what is needed to properly procure and manage these systems (Executive Order 2020; OECD 2020). This paper critically maps out the challenges in the procurement of AI systems by public entities and the long-term implications that necessitate AI-specific procurement guidelines and processes. This dual-pronged exploration covers both the new complexities and risks introduced by AI systems and the institutional capabilities impacting the decision-making process. AI-specific public procurement guidelines are urgently needed to protect fundamental rights and due process.

1 Literature review

When a public entity deploys an AI system to provide a public service or to enforce its powers, individual members of the public have little choice to opt out of the use of the system or of being subjected to it. An individual in an unbalanced power relationship with a government entity cannot easily challenge the procurement and implementation of a system. Individualizing the harms and impact of a system can also make it difficult to distinguish between personal experience and group-level collective harms. To a certain extent, this power imbalance is corrected by the transparency and accountability mechanisms available in the public procurement process, which oblige the public actor to provide access to information. The entity may be required to conduct assessments, disclose the details and findings, be ready to share further information if requested, and answer to the public. The public and civil society, in turn, can use this information to understand the impact of the system on certain groups, society, or the environment. Such insight can also help the public challenge the system’s fairness and request its modification or termination. This evaluation may also result in consequences for the public actor. However, the ability of the public entity to effectively share information, and of society to benefit from the process and hold the entity accountable, can be significantly impacted by the introduction of complex algorithmic systems. This impact is compounded when these AI systems are proprietary.

In the United States, after civil society and legislators voiced concerns over privacy and bias in facial recognition technology (Buolamwini and Gebru 2018; NAACP 2022), the Internal Revenue Service (IRS) limited the use of ID.me, a biometric identity verification software. News headlines show algorithms found to be biased against African American defendants in predicting recidivism for sentencing and bail decisions (Angwin et al. 2016), leading to false arrests (Hill 2020), or downgrading the results of students from underperforming schools (BBC 2020).

As these examples continue to grow, accountability concerns grow in parallel. It is now customary to list algorithmic bias cases at the beginning of each research paper to draw attention to how ubiquitous algorithmic systems have become and how biased these systems might be. However, despite the implications for fundamental rights and due process, the literature covering the nuanced challenges of AI in public systems is still growing slowly. This paper highlights the current research and practice gap, focusing on public procurement guidelines for AI systems.

The literature review of this paper covers policy documents, academic research, and civil society reports. Several policy and regulatory developments are envisaged to govern public and private use of AI systems, such as the European Commission’s draft AI Act, which proposes bans on certain AI systems. The draft bill requires providers developing, and public entities using, high-risk AI systems to assess those systems, engage in ongoing risk management, and register their assessments and documentation in a public database (European Commission 2021). The Council of Europe is also working on a legally binding transversal instrument, which proposes that certain AI systems and practices used by public actors be banned. The Council’s Ad Hoc Committee on AI recommends that human rights impact assessments be conducted for AI systems which might have a negative impact on health, safety, and fundamental rights (Council of Europe 2021). The Government of Canada requires public entities to conduct impact assessments prior to the production of an algorithmic system (Government of Canada 2020), while the UK regulator provides guidance to organizations on how to explain AI practices (Information Commissioner’s Office 2020). The French parliament requires all algorithms used by the government to be made open and accessible to the public (L’Assemblée nationale 2016). The United States executive branch has established principles for the use of AI in the Federal government (Executive Order 2019, 2020), while the National Institute of Standards and Technology is drafting an AI Risk Management Framework (NIST 2022).

In addition to these regulatory discussions, academic researchers surface the impact of algorithmic systems in the public sector and call for algorithmic accountability (Barocas and Selbst 2016; Calo and Citron 2021; Cooper et al. 2022; Crump 2016; Diakopoulos 2014; Eubanks 2018; Kroll et al. 2017; O’Neil 2016; Pasquale 2015; Richardson et al. 2019; Schwartz 1992; Veale et al. 2018; Young et al. 2019) and for impact assessments to be made mandatory (A Civil Society Statement 2021; Ada Lovelace Institute 2021; Kaminski and Malgieri 2019; Reisman et al. 2018). A robust literature identifies the need for transparency and public disclosures. Such disclosures can take the form of transparent procurement documentation, mandated human rights impact assessments, registries, and specification documents detailing the qualities of the datasets used and the design decisions embedded in the AI systems (Bender and Friedman 2018; Gebru et al. 2021; Hind et al. 2019; Holland et al. 2018).
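
To make the idea of a specification document concrete, the following is a purely hypothetical sketch of the kind of structured fields such a disclosure might contain, loosely inspired by the datasheet and data statement proposals cited above; the names and values are invented and do not follow any official schema.

```python
# Hypothetical sketch (not any official schema): the kind of structured
# disclosure the datasheet/data statement proposals point toward.
# All names and values below are invented for illustration.
dataset_disclosure = {
    "dataset_name": "example_benefits_claims_2020",      # hypothetical dataset
    "collected_by": "Example Agency",
    "collection_period": "2018-2020",
    "population_covered": "benefit applicants in one region",
    "known_gaps": ["non-digital applicants underrepresented"],
    "intended_use": "training an eligibility triage model",
    "prohibited_uses": ["law enforcement targeting"],
    "last_reviewed": "2023-01-01",
}

for field, value in dataset_disclosure.items():
    print(f"{field}: {value}")
```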

In the context of AI systems used by the public sector, this multi-layered complexity can also mean that the public actor itself does not understand the system it is procuring and deploying. Institutional capacity limitations, in both the procurement and implementation phases, may result in discriminatory or faulty systems embedded in the core functions of the entity. A great number of current regulatory efforts, as well as the technical research, focus on a requirement of explainability of AI systems (Adadi and Berrada 2018; Dwork et al. 2012; Forsythe 1995; Hajian and Domingo-Ferrer 2013; Ribeiro et al. 2016). Explainability usually focuses on the technical transparency of the components of AI systems. The assumption is that if the behavior of the model and its outcomes can be explained to different parties, then the system can be scrutinized for accuracy, mathematical definitions of fairness, and model behavior. Other studies analyze the effect of explainability in AI on user trust and attitudes toward AI (Shin 2021). However, technical transparency might not always be available. The US Federal Acquisition Regulation (FAR), the primary acquisition document, together with agency acquisition regulations, gives the government unlimited rights in data except for copyrighted works. The FAR “specifically excludes the source code, algorithms, processes, formulas, and flow charts of the software” from the Form, Fit, Function data (US FAR 2022). Even if all information were available, as Busuioc remarks, “significant technical expertise asymmetries run to the detriment of [public sector] users, further compounded in the public sector by resource shortages and cut‐back pressures on public services, often driving the adoption of algorithms in the public sector” (Busuioc 2021). In short, the ability of public procurement teams to understand the accurate functioning of algorithmic systems is constrained by informational asymmetries, multiple sources of bias (Hickok et al. 2022; Brown et al. 2021), current procurement guidelines, human biases in perception (Shin 2022), and the multi-layered complexities detailed above. These constraints then create a butterfly effect on how algorithmic systems impact society.
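
As a minimal illustration of the kind of technical transparency the explainability literature targets, the sketch below fits an interpretable model to invented data and reads back its coefficients. The feature names and data are hypothetical; real procured systems are typically proprietary and far more complex, which is precisely why this kind of inspection is often unavailable to procurement teams.

```python
# Minimal sketch (hypothetical data and feature names): the kind of
# "technical transparency" the explainability literature targets --
# inspecting how an interpretable model weighs its inputs.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
# Hypothetical applicant features
X = np.column_stack([
    rng.normal(50, 15, n),     # income (thousands)
    rng.poisson(0.5, n),       # prior administrative flags
    rng.integers(0, 20, n),    # years at current address
])
# Hypothetical eligibility outcome
y = (0.04 * X[:, 0] - 1.2 * X[:, 1] + 0.05 * X[:, 2]
     + rng.normal(0, 1, n)) > 2

model = LogisticRegression(max_iter=1000).fit(X, y)
for name, coef in zip(["income", "prior_flags", "years_at_address"],
                      model.coef_[0]):
    # Sign and magnitude serve as a crude global explanation of the model.
    print(f"{name:>18}: {coef:+.3f}")
```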

5 Challenges regarding fairness

Fairness in algorithms has been discussed in different public system use cases such as welfare eligibility (Eubanks 2018; Lecher 2018), immigration detention (Koulish 2016), or recidivism (Angwin 2016; Larson et al. 2016). In the absence of AI-specific public procurement guidelines and with lower levels of institutional capability (Dunleavy et al. 2007), public actors may implement AI systems which result in unintended negative impacts on individuals or society. Public actors interact with society (Sloane et al. 2021). They might procure a proprietary system developed without consideration for existing policy motivations, values, regulatory rules, or fundamental rights. A four-part formula can help explain how AI systems may magnify or deepen the existing inequities and biases within society.

$$\text{Values} + \text{Data} + \text{Algorithmic Models} = \text{Outcomes}$$

Humans encode their values within all the systems and structures they build. Value encoding which does not consider the diversity of perspectives and experiences results in empowering and privileging one group’s values and perspectives over others. Value misalignment, on the other hand, means that what we want an AI system to do and what the system actually does may be very different, leading to serious unintended consequences (Birhane et al. 2022). So even when we intentionally try to encode certain values, we might get it wrong.

The data that train AI models are collected by humans, shaped by humans, and are about humans. “Every data set involving people implies subjects and objects, those who collect and those who make up the collected. It is imperative to remember that on both sides we have human beings” (Onuoha 2016). Every such dataset reflects historical and structural inequities.

Algorithmic models work on mathematical definitions and functions: they optimize the functions they are given. There are multiple definitions of algorithmic fairness (Verma and Rubin 2018). Shin points out that the meaning of algorithmic fairness is context dependent and that there is no widely accepted definition (Shin 2020). Sometimes different definitions of fairness cannot be simultaneously achieved (Berk et al. 2018; Chouldechova 2017; Friedler et al. 2021; Kleinberg et al. 2017; Mitchell et al. 2021). The issue is compounded by the dependency on ‘only’ mathematical formulations of fairness: what cannot be formalized to ensure fairness or equity cannot be part of an automated system. A corporate vendor developing technological solutions will end up simplifying the problems, and public policies will be translated into what can be quantified and coded.
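
A minimal sketch with invented numbers can illustrate the tension between fairness definitions: when base rates differ between two groups, a decision rule that equalizes selection rates across groups still produces different false positive rates (cf. Chouldechova 2017; Kleinberg et al. 2017). All data below are hypothetical.

```python
# Hypothetical sketch: one set of decisions scored against two common
# fairness definitions. With differing base rates, equal selection rates
# do not imply equal error rates.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
group = rng.integers(0, 2, n)                   # two hypothetical groups
base_rate = np.where(group == 0, 0.3, 0.5)      # differing true outcome rates
actual = rng.random(n) < base_rate              # true outcomes

score = actual + rng.normal(0, 0.5, n)          # noisy signal of the outcome
predicted = np.zeros(n, dtype=bool)
for g in (0, 1):
    mask = group == g
    threshold = np.quantile(score[mask], 0.6)   # select top 40% in each group
    predicted[mask] = score[mask] > threshold   # -> equal selection rates

for g in (0, 1):
    mask = group == g
    selection_rate = predicted[mask].mean()
    fpr = predicted[mask & ~actual].mean()      # false positive rate
    print(f"group {g}: selection rate {selection_rate:.2f}, "
          f"false positive rate {fpr:.2f}")
```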

A public entity must have the means to interrogate an AI system and to understand the consequences of deployment. These risks must be examined at the procurement stage. However, AI systems are sociotechnical systems; in other words, they are made up of both social and technical elements. They interact with their environments. Their behavior and outcomes are shaped by their interactions with humans and the environment. In return, they shape their environments and change the behavior of those around them (Dobbe et al. 2018).
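
A toy simulation, with entirely hypothetical numbers, can illustrate one way such systems shape their own environment: if inspections are allocated according to past records and each inspection generates new records, an early imbalance persists in the data even when the underlying rates are identical. This is a sketch of a feedback loop, not a model of any specific deployed system.

```python
# Toy sketch with hypothetical numbers of a data feedback loop: inspections
# allocated by recorded history generate more records where records were
# already higher, even though the true rates in both districts are the same.
import numpy as np

rng = np.random.default_rng(3)
true_rate = np.array([0.10, 0.10])     # two districts, identical true rates
records = np.array([5.0, 4.0])         # slightly uneven historical records

for step in range(10):
    inspections = (100 * records / records.sum()).astype(int)  # allocate by history
    new_found = rng.binomial(inspections, true_rate)            # incidents found
    records = records + new_found

print("recorded incidents per district:", records.round(1))
```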

The incentive for the AI vendor, in such cases, is the ability to collect data, train its AI models, use the organization as a reference for a further sale, or simply establish itself within the organization (Laperruque 2017) for a prolonged contract. The more connected an AI system becomes within the organization, the harder it becomes to decouple later. Palantir is used as an example for multiple angles of the same question in this paper. The vendor is transparent in suggesting that ‘The systemic failures of government institutions to provide for the public will continue to require both the public and private sectors to transform themselves’ and that it wants to become ‘the central operating system not only for individual institutions but for entire industries’ (Palantir 2020). However, the company is by no means the only example where theoretical concerns about AI systems in the public domain turn into reality. As Marietje Schaake, the director of Stanford’s Cyber Policy Centre, warns: “We’re building a software house of cards which is sold as a service to the public but can be a liability to society. There’s an asymmetry of knowledge and power and accountability, a question of what we’re able to know in the public interest. Private power over public processes is growing exponentially with access to data and talent.” (Howden et al. 2021).

Public disclosures are recommended to enhance transparency. However, as detailed above, they are not always available. Effective oversight and enforcement mechanisms are crucial to compel transparency and shed light on the actions of public actors. However, we also need to treat such transparency as a means to an end. While very useful in its own right, a focus on technical components and outcomes misses an understanding of the social elements (Wieringa 2020). We also need to deliberate values and choices, and to enforce responsibility and accountability.

8 Challenges for accountability

When a private vendor is engaged to provide public services or access to public services, two issues emerge. First, such contracting means that public entities, intentionally or unintentionally, transfer some of their responsibility to a private company. A public entity, which should carry a higher duty of care, outsources its services to a profit-driven entity through AI systems, except that some of the obligations to the public disappear in the transfer. Mulligan and Bamberger refer to “procurement as policy”, whereby algorithmic systems “frequently displace discretion previously held by either policymakers charged with ordering that discretion, or individual front-end government employees on whose judgment governments previously relied…When the adoption of those systems is governed by procurement, the policies they embed receive little or no agency or outside expertise beyond that provided by the vendor: no public participation, no reasoned deliberation, and no factual record. Design decisions are left to private third-party developers. Government responsibility for policymaking is abdicated.” (Mulligan and Bamberger 2019). Through procurement conditions and contractual arrangements, a public entity can ensure that the vendor carries responsibility and liability for system outcomes or malfunction. However, this still leaves the vendor answering only to the public entity, and leaves the affected individuals and communities having to deal with private entities. In cases where a vendor does not have competition in the market, it can also use its power to deflect accountability and liability if a harm occurs. In 2021, the Internal Revenue Service (IRS) signed an $86 million contract with ID.me to provide biometric identity verification services. The arrangement required taxpayers to submit their biometrics in the form of a selfie to authenticate their identity. ID.me claims to already serve 27 states and multiple federal agencies (Rappeport and Hill 2022). If the service does not perform equally and equitably across different demographics due to skin tone, age, or gender, a taxpayer might be penalized for the error. The National Institute of Standards and Technology observes that rates of false positives for Asian and African American faces relative to images of Caucasians can range from a factor of 10 to 100 times (NIST 2019). Alternatively, if ID.me databases are breached, the taxpayer might be subject to identity theft at the highest level, since one cannot change one’s biometric identifiers (Buolamwini 2022). Although this arrangement between the IRS and ID.me was put on hold after advocacy groups pushed back, the vendor still has contracts across multiple jurisdictions as a public entity partner to verify unemployment insurance applications and still impacts millions of individuals (ACLU 2022a; Metz 2021).
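
A back-of-the-envelope calculation, using invented volumes and a hypothetical baseline rate rather than figures from the NIST report, illustrates what a 10- to 100-fold disparity in false positive rates could mean at the scale of a nationwide verification service.

```python
# Illustrative arithmetic only: the attempt volume and baseline false
# positive rate below are assumptions, not figures from NIST (2019).
attempts = 1_000_000          # hypothetical verification attempts for a group
baseline_fpr = 0.001          # hypothetical baseline false positive rate

for factor in (1, 10, 100):
    false_matches = int(attempts * baseline_fpr * factor)
    print(f"{factor:>3}x baseline -> ~{false_matches:,} false matches "
          f"per {attempts:,} attempts")
```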

The second emerging issue is the ability of private vendors to hide behind IP protections. In the absence of a regulation or a contractual requirement which mandates disclosure, a private company does not have any incentive to share its design decisions or code with any actor. This makes it impossible to analyze how these systems work, audit their validity, reliability, or accuracy, or have an ongoing debate about whether they should be in use. Busuioc, analyzing the limitations of algorithmic systems and the implications such limitations pose for public accountability, calls attention to the emerging accountability gap. Referring to Pasquale’s work, Busuioc highlights how he traces a shift in this context from “legitimacy‐via‐transparency to reassurance‐via‐secrecy” (Busuioc 2021; Pasquale 2011).

As Moss et al. argue, even “voluntary commitments to auditing and transparency do not constitute accountability… [as] they do not meet the standard of accountability to an external forum” (Moss et al. 2021). Currently, most public entities are not subject to any governance mechanisms which require a transparent internal and external management of all AI systems used by an entity. Although several policy examples are emerging globally (Central Digital and Data Office 2021; City of Amsterdam 2020; City of New York 2020; Executive Order 2020; Government of Canada 2020; Government of New Zealand 2020; L’Assemblée nationale 2016; Seattle 2017; UK Office for AI 2020), in most cases even the public entity itself does not have a full picture of its entanglement with a private AI vendor. For example, even a city-level law enforcement entity may not know exactly all the systems used across its different departments, how data is integrated, and how the outcomes are shaping its practices and policy. This makes it hard for the public and civil society to engage with the right partners, find information, and hold anyone accountable. In his definition of accountability, Bovens requires five integral parts: (1) an actor, (2) a forum, (3) a relationship between the two, in which (4) the actor is obliged to explain and justify its conduct, the forum can pose questions and pass judgement, and the actor might face (5) consequences (Bovens 2007). In a situation where the actor(s) cannot be properly assigned due to distributed and transferred responsibilities, and vendors are not obliged to explain the behavior of their AI systems, it becomes extremely hard to assign any accountability and consequences when AI systems harm individuals or groups, or infringe upon human rights. In their 2017 article, Kroll et al. write, “accountability mechanisms and legal standards that govern decision processes have not kept pace with technology” (Kroll et al. 2017).

9 Limitations and future research directions

There are several limitations to this research work. The first relates to information availability and asymmetry. The research is limited to publicly available documents such as academic literature, government reports and registries, investigative journalism, litigation texts, and private discussions with practitioners. Both the public actor procuring the AI systems and the vendor developing an AI system currently contribute to the unavailability of easily accessible information. The public actor may have an interest in keeping the details of its intelligence or enforcement systems behind a wall of protections. This interest might derive from legitimate concerns about counter actions and the possibility of malicious actors gaming the system (Veale et al. 2018). Alternatively, the agency itself might not have access to proprietary algorithms due to prior contractual commitments or current exemptions in procurement regulation. The vendor, on the other hand, contributes to the information asymmetry by benefiting from legal protections for trade secrets (Katyal 2019). The vendor might also be concerned about liability or an employee backlash it might receive if the details of its cooperation with government were to become public (Campbell 2018; Shane and Wakabayashi 2018).

A different set of limitations relates to the reproducibility and replicability of the outcomes of AI systems. Even with full access to these machine-learning systems and with technical literacy, it might still be impossible to trace back a particular decision of the AI system and reproduce the exact same result. This creates a situation where an individual whose rights are infringed (or an entity acting on behalf of the individual) may not be able to trace back or replicate the discriminatory decisions.
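
The following sketch, using invented data, illustrates one source of this difficulty: two trainings of the same model on the same data, differing only in random initialization, can assign different scores to the same individual’s case.

```python
# Minimal sketch (hypothetical data): identical training data, identical
# architecture, different random initialization -- the same individual's
# case can receive different scores, complicating exact reproduction.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 5))                                  # hypothetical case features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, 500)) > 0      # hypothetical outcomes

applicant = X[:1]                                              # the same individual's case
for seed in (0, 1):
    model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                          random_state=seed).fit(X, y)
    print(f"seed {seed}: P(positive) = {model.predict_proba(applicant)[0, 1]:.3f}")
```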

Another limitation refers to the incentives embedded in the design, development, procurement, and implementation of AI systems. Both the public actor and the vendor can state what problem(s) they are solving with the AI system. However, such public statements or disclosures usually do not include the organizational incentives impacting decisions. Developers or procurement officials may be incentivized to complete due diligence in less time or with fewer resources, possibly in ways that conflict with responsible design and development or with in-depth due diligence.

This research mapped out the challenges in the procurement of AI systems by public entities and the long-term implications that necessitate AI-specific procurement guidelines and processes. Future research can provide an analysis of the benefits and limitations of transparency, especially in the form of public disclosures. How do public disclosures contribute to the governance of AI systems? What are the limitations of disclosures? Can emerging technology be used in new ways to contribute to meaningful participation of society in debates impacting fundamental rights and due process? This kind of inquiry and in-depth analysis can be replicated across many jurisdictions globally, as every country has different procurement regulations, infrastructure, and governance mechanisms.

In any procurement environment, humans ultimately conduct the due diligence, review the available documents, and make judgements. Any transparency method, whether in the form of disclosures, datasheets, explainability reports, or notices, needs to be understood accurately by its audience. This means the capacity, capability, and perceptions of public procurement officials play a crucial role. A robust literature analyzes human biases, communication approaches, and requirements for different types of explanations of AI systems. However, future research can also focus on the sense-making and perceptions of public officials operating within the confines of government bureaucracy and politics, and on how they judge current AI principles and the social value of the systems they are involved in procuring.

10 Conclusion: recommendations for public agencies

As this paper has mapped out the challenges and risks in the procurement of AI systems by public entities and the long-term implications, recommendations for what an AI-specific public procurement guideline and process should include are also necessary.

The issue at hand is not a single AI system or device, but the whole ecosystem. AI-specific procurement guidelines and governance must be applicable to devices used by the public entity and its agents; to externally collected and acquired data; to the AI software fed by these datasets; to AI platforms which connect disparate software; and to the cloud-based hosting infrastructures. Public agencies are already moving their data and communications infrastructures to cloud-based hosting systems. These systems are owned by a handful of major technology companies, which creates an inevitable dependency on private vendors: the public sector will not be able to sustain its own infrastructure. If not governed with public interest and fundamental rights in mind, this eventual entanglement will mean that some vendors become too big to fail. They will be too powerful and will set the terms of the engagement. The situation becomes more concerning when vendors are involved in very high-stakes decisions like law enforcement, border management, intelligence, or health and benefit systems. Even if the public entity is interested in severing its relationship, as illustrated by the Europol Analysis System case (European Parliament 2020), or if the system is not working as expected, the entity might not be able to easily terminate its contract and disentangle itself from the relationship.

If private AI systems are deployed within the public sector, human rights, the rule of law, and commitment to the principles of fairness, accountability, and transparency must be required. Otherwise, public actors will have embedded systems without an independent capability to maintain them or the skills to monitor their performance. Alternative oversight and accountability mechanisms will also not be available due to the initial lack of transparency or to subcontracting arrangements. A note of caution is necessary here. Most of the issues explained above about corporate AI systems are also applicable to systems built in-house by public entities. These systems are still sociotechnical systems. The motives and values of the developers and the institution will still be embedded in these AI systems, and the need for governance mechanisms and accountability structures remains. Therefore, the solution to corporate entanglement and dependency is not simply building these systems internally. Obligations and documentation within an AI-specific procurement process must be applicable to both external and in-house development cases.

A recent case in point was a lawsuit claiming that Immigration and Customs Enforcement (ICE) created a “secret no-release policy” and manipulated its risk assessment algorithm to recommend only one decision. The Velesaca v. Decker case challenged the automatic and indefinite incarceration of virtually all of the thousands of people ICE arrested between 2017 and 2020 for alleged immigration offenses. The algorithm used to recommend whether an arrestee be released or detained until a hearing was changed in 2015 and again in 2017, removing the ability to recommend release, even for arrestees who posed no threat (Robertson 2020). The detainees were not subject to due process and never had any chance at recourse. The settlement in the case in March 2022 secures the right to a fair release assessment for everyone arrested by ICE in New York (ACLU 2022b; Velesaca v Decker 2020).

The example of how a risk profiling system forced the Dutch government to resign should be a reminder for all public entities. Systeem Risico Indicatie (SyRI), an algorithm used by the Dutch government to detect possible social welfare fraud, was found to be discriminatory against people with dual nationality and low income. The authorities started claiming back benefits from families who were flagged by the system, without proof that they had committed such fraud. The claims pushed tens of thousands of families into poverty and separated more than a thousand children from their families into foster care. Some victims committed suicide (Heikkila 2022). The District Court of The Hague found that “under article 8 of the ECHR, the Netherlands did not strike a fair balance between privacy and the benefits of the use of new technologies to prevent and combat fraud because SyRI was ‘insufficiently clear and verifiable’” (Court of Hague 2020). A parliamentary report into the childcare benefits scandal found institutional bias and authorities hiding information or misleading the Parliament about the facts (Dutch Parliament 2020). In response, the Dutch parliament adopted a motion in April 2022 making it mandatory to conduct a human rights impact assessment before algorithms are used to make evaluations or decisions about people and, where possible, to make impact assessments public (Dutch Parliament 2022).
In May 2022, the Netherlands Court of Audit found that six of the nine algorithms it audited did not meet basic requirements and exposed the government to various risks, ranging from inadequate control over an algorithm’s performance and impact to bias, data leaks, and unauthorized access (Netherlands Court of Audit 2022).

Another requirement in the public procurement process is to determine whether an AI system is the right solution to a need or problem. We need to be aware of techno-solutionism and focus on the structural causes of an issue, not on the parts of the issue we can collect data about and patch algorithmic systems over. Such a determination must be made by engaging, internally and externally, multidisciplinary public officials and impacted communities in the decisions (Hickok 2021). The voices of the impacted communities must be heard and respected, and the obstacles preventing them from participating in such engagements must be removed. Especially for cases where a system makes determinations about a person’s life and liberty, ability to exercise fundamental rights, or access to resources, impact assessments and documentation must be mandated. In parallel, public entities must engage with impacted communities and civil society in a transparent, multi-stakeholder manner which respects participation parity, to agree on AI-specific procurement guidelines and on reporting and disclosure requirements.

The public must have access to relevant information in a way that facilitates meaningful engagement. In October 2021, Eric Lander and Dr Alondra Nelson, then White House Office of Science and Technology Policy Director and Deputy Director, stated ‘Powerful technologies should be required to respect our democratic values and abide by the central tenet that everyone should be treated fairly…country [US] should clarify the rights and freedoms we expect data-driven technologies to respect…enumerating the rights is just a first step. Possibilities include the federal government refusing to buy software or technology products that fail to respect these rights, requiring federal contractors to use technologies that adhere to this “bill of rights,” or adopting new laws and regulations to fill gaps. States might choose to adopt similar practices.’ (Lander and Nelson 2021). In the same way, where decisions have serious implications for individuals, algorithms can neither be secret (proprietary) nor uninterpretable (Busuioc 2021; Rudin 2019). AI systems are developed by humans; however, these systems are often mistakenly perceived as independent, objective, unquestionable technologies. Therefore, the outcomes of these systems should not be used as a substitute for other steps in a due process. Both public and private actors must be held accountable for decisions and outcomes. Procurement, development, and implementation must be subject to robust governance and enforcement mechanisms. These mechanisms necessitate both initial internal capacity building and ongoing capacity enhancement as the science and technology advance. Data generated, collected, processed, and used by humans can never be bias-free. An appreciation of this fact, and an understanding of the risks specific to AI systems and their sociotechnical aspects, should make public actors pay even more attention to due diligence. Procurement regulations should be updated to include obligations for developers to share details of data qualities, model design decisions, and optimization techniques and processes when required by the public entity. Additionally, procurement guidelines should require a capable internal workforce to be in place before a procurement decision is made for an algorithmic system.

For a functioning democracy in which both fundamental rights and the rule of law are prioritized, society first needs an agreement, a social contract, on what kind of systems should be allowed or banned. As Gabriela Ramos, Assistant Director-General for the Social and Human Sciences of UNESCO, suggests, ‘AI technologies can be used to strengthen government accountability and can produce many benefits for democratic action, participation, and pluralism, making democracy more direct and responsive. However, [such technologies] can also be used to strengthen repressive capabilities and for manipulation purposes’ (Ramos 2022). An engaged public debate and discourse should result in a basic agreement about which systems should be prioritized and which systems should never be implemented.