Introduction

The EU Falsified Medicines Directive [1] “introduces harmonised European measures to fight medicine falsifications and ensure that medicines are safe and that the trade in medicines is rigorously controlled”. Such obligatory safety features, legal framework, and record-keeping requirements have arguably imposed stricter controls on the manufacturing of medicines. While the pharmaceutical industry has consistently improved its manufacturing processes in compliance with good manufacturing practices, it is well documented that falsification of medicines continues [2] and has led to disastrous consequences worldwide [3]. Consequently, different organisations, including the World Health Organization, have long called for distinct remediation strategies [4].

There have been a number of technology-centric approaches to hinder the proliferation of fake medicines [5] including mobile apps for drug authentication and tracking, packaging enhancements such as digital tagging, radio-frequency identification, and web portals to verify pharmacies. However, such approaches have mostly concentrated on the distribution and packaging of medicines rather than on the quality and reliability of their production. That is to say, scant research has been conducted on organically ensuring the traceability and originality of medicines from the perspective of their manufacturing data.

From an industrial perspective, the current gold standard for data management is set out in the FDA’s “Data Integrity and Compliance with current Good Manufacturing Practices” [6], where the term “ALCOA+” is defined. As a set of principles to be followed throughout the data life cycle to achieve data integrity, ALCOA+ states that data should be Attributable, Legible, Contemporaneous, Original, and Accurate, as well as complete, consistent, enduring, and available (the “+” of the acronym).

This paper advocates the use of blockchain networks to better ascribe and ensure the manufacturing originality of drugs, as captured by the O of ALCOA+: the Original principle, which requires assurance that data stem from their primary source. While blockchains are well established in the cryptocurrency domain, their systematic application in the pharma industry remains an open problem, particularly from a regulatory perspective.

The main contributions of this paper are: (i) demonstrating the feasibility of using blockchain technologies to univocally assess originality, as defined by ALCOA+, via raw data from pharma manufacturing batch records; and (ii) systematically ensuring traceability and end-to-end verification in a scalable manner within the drug manufacturing process.

The proposed method comprises a private Ethereum network with proof-of-authority (PoA) consensus and smart contracts, i.e., privacy-preserving verifiable programs stored on a blockchain that enforce their terms automatically, without the assistance of a trusted authority. Specifically, our smart contracts programmatically record the hash and the identifier of pharma records in a blockchain. They also enforce the originality principle by comparing the data committed to our blockchain network, e.g., a report that has the same identifier but a different hash ought to be detected as unoriginal. Since both the logged data and the business logic are tamper-proof and blockchain-embedded, the smart contracts detect the originality infringement and identify the corresponding type of falsification. Running our ALCOA+ front-end in Valencia and the Ethereum network in Dublin, we have evaluated our approach employing a temporal data series of 1300 reports generated from real pharma production lines. Out of these reports, 300 were randomly tampered with, i.e., made unoriginal (falsified), with four common data falsification types randomly applied to the falsified reports.

The results of the ALCOA+ evaluation show that our approach accurately classifies all the manufacturing records, whether original or not, and, for the unoriginal ones, it also explains their source of falsification. This empirical evaluation has additionally studied the latency and performance implications of the geographically distributed system, as the assessment tool front-end ran in Valencia and the blockchain backend in Dublin.

This paper is organised as follows. In “Related Work” section, we describe the relevant related work for data integrity applications within the pharma industry. Then, in “Originality Assessment Approach” section, we introduce the proposed approach for assessing the originality of drug manufacturing records through blockchain technologies. In “Evaluation and Results” section, we present our evaluation methodology and the empirical evaluation results of our originality assessment approach and corresponding tool. Finally, in “Conclusions” section, we provide the concluding remarks of this work.

Related Work

The World Economic Forum (WEF) has recently coined the term TradeTech to denote the technologies and innovations that enable trade to be more efficient, inclusive and equitable, and ultimately harness the innovations of the Industry 4.0 technologies to support the public good [7]. More specifically, the WEF suggests the continual improvement and optimisation of manufacturing processes by relying on digital sensing, machine learning, and blockchain, among other technologies. While sensors have been widely employed in manufacturing for well over two decades [8], their enhanced computing capabilities and connectivity, and their full integration with other technologies, continue to be an active area of research and innovation.

The use of blockchain for data integrity assurance is still in its infancy; however, its potential to augment the capabilities of dynamic distributed network environments is widely accepted. Blockchain technologies can securely enable storage, sharing, and data analytics in data-driven computer networks while preserving user privacy, trustworthy network control, and decentralised resource management [9]. Such a network-oriented approach can be useful for fraud prevention and for real-time analytics, with a significant impact on data integrity. Relevant examples of blockchain for data security in distributed networks have also been provided in Kumar et al. [16].

Regulatory bodies are continually increasing their auditing requirements from traditional manual exercises to automated ALCOA+ trails, where detailed transactions capture complete traces of sensors and production lines to ultimately track the fabrication of pharmaceutical drugs at scale. This is not a new problem. Rattan [17] discusses the large number of regulations that, since 1963, have been put forward by several regulatory bodies. To assess data integrity, the proposed methods have initially been simple checklists, self-audits, and self-inspection techniques, mostly focused on self-prevention for random audit trails, instead of continuous monitoring and evaluation of manufacturing data.

More recently, regulatory bodies have identified the need to ensure data integrity in “The GAMP Guide: Records and Data Integrity” [18]. It comprises guidelines for the implementation and management of good practices (GxP)-regulated records and data, where the GxP are the accepted gold standard and framework for all automated manufacturing industries. It also provides a framework for regulatory focus, data governance, data life cycle, culture and human factors, and the application of quality risk management to data integrity. Pharma manufacturing industries are mostly automated, which implies full compliance with GxP and a large amount of data continuously generated.

Efficiently addressing ALCOA+ requirements arguably calls for novel technological solutions to dynamically analyse all generated data to assist qualified personnel and auditors to trace the originality of medicines.

Emerging domains, such as Pharma 4.0, are facing privacy risks and security vulnerabilities. Since manufacturing lines involve more connected devices in the information network, it becomes a big data reality with a decentralised topology. Therefore, such heterogeneity requires quality assurance as well as security mechanisms to prevent attacks or data threats. In this context, blockchain has been proposed as a solution for these scenarios [19, 20].

Widely described as immutable time-stamped data structures, blockchains are built as peer-to-peer networks where participant nodes verify interactions concurrently using decentralised consensus protocols. Data is stored in blocks that are “chained”, i.e., each block holds the hash of the previous block, thus creating a ledger. Because of these tamper-proof characteristics, blockchain has emerged as a solution to enable secure traceability of information and is therefore used to ensure information provenance in multiple domains. Its scalability and resilience have proven effective, particularly in conjunction with PoA consensus mechanisms for the generation of new blocks [21].
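The “chaining” described above can be sketched in a few lines of Python; this is a minimal illustration of the hash-link idea, not the actual Ethereum block format:

```python
# Minimal hash-chain sketch: each block's hash covers the previous
# block's hash plus its own payload, so altering any block breaks
# every link that follows it.
import hashlib

def block_hash(prev_hash: str, payload: str) -> str:
    """Hash a block as the SHA-256 of its predecessor's hash and payload."""
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

genesis = block_hash("0" * 64, "genesis")
block_1 = block_hash(genesis, "tx: report A")
block_2 = block_hash(block_1, "tx: report B")
# Tampering with "tx: report A" changes block_1, which then no longer
# matches the prev_hash embedded in block_2, exposing the alteration.
```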

There is scant research concerning the application of blockchain technologies in the pharma industry to assess data originality in its manufacturing lines. However, blockchain adoption has been explored in general big data projects [9, 22], including smart cities [31]. A notable industry example is the ledger-based solution developed by Modum.io and the University of Zürich, which has been documented as a pioneering option for the safe transportation of controlled substances and, potentially, any temperature-controlled goods.

As a governance mechanism, blockchain has also been proposed to keep track of the end-to-end supply chain and exchange of medicament using cryptocurrency in Taiwan [32]. Such an approach promotes a safe commercial exchange for government agencies, manufacturers, pharmacies, large wholesalers, hospitals, and potentially patients. A similar approach has been proposed for the detection of counterfeited medicines [33]. Finally, a distributed ledger approach has been put forward to facilitate the return and redistribution of unused drugs in multi-level supply chains, a highly controversial matter in most countries due to the associated health risks [34]. Blockchain technologies have also been trialled to trace the serialisation packaging process at an Italian pharma manufacturer [35], mostly linking the marking of each medicine box and its unique identifier with the existing enterprise resource planning system.

Contribution

Table 1 includes a comparison of pharma-related blockchain applications. To the best of our knowledge, there is no comprehensive technological approach ensuring traceability, authenticity, security, and the other data quality dimensions [36] that are sine qua non in pharma manufacturing. Therefore, this paper proposes the use of blockchain networks to reinforce data integrity, specifically by assessing data originality.

Table 1 Comparison of blockchain applications in pharma

Originality Assessment Approach

Ensuring the authenticity and traceability of pharma manufacturing processes should arguably be addressed by preserving original records and comparing any changes against them. Despite the perceived simplicity of the ALCOA+ originality principle, it has been challenging to ensure the authenticity of an original batch manufacturing report, particularly when the data sources are fully automated, multi-device production lines within large pharma facilities.

The proposed approach is hence to verify the originality by assessing the data using a blockchain network. In this context, we describe (i) a pharma-related use case where we are applying the assessment tool, (ii) the blockchain network, and (iii) the implementation of our approach.

Use Case: Pharma Manufacturing Records

The SPuMoNI consortium includes a leading pharma manufacturing partner, Instituto De Angeli (IDA). IDA produces more than 2000 drug batches annually in their Italian facilities. Their production lines generate a large number of datasets and data streams from real production lines of different unit operations such as milling, granulation, coating, and tablet pressing.

During the manufacturing of a specific product, the production lines execute a set of steps based on a Recipe. A Recipe is the protocol that describes in detail the fabrication process of a certain product. It is composed of a set of Phases, and each Phase is formed from a set of Instructions. An Instruction is a single action implemented within the manufacturing process. There are various types of Instructions, such as setting a mixing machine, verifying the quantities of raw materials, transferring the product to the next step, checking the cleaning stage of a particular robot, etc. Complementing the action to be executed, each Instruction serves as a checkpoint within the manufacturing process. A Recipe may have some variations and/or updates; therefore, each Recipe also has an associated Recipe Version.

A Recipe describes the tasks that must be accomplished, including the optimal value and the range of acceptable values for each parameter. However, the production line routine may have deviations that must be controlled. Hence, all production lines are monitored by a network of sensors, not only to automate production but also to control the quality of the process. When an Instruction is executed, depending on its nature, certain parameters must be checked, such as the temperature of a melting step, the mixing speed, the number of cycles, or the amount of product processed, or it must be verified that the Instruction has been completed as expected. The sensors record data directly from the production lines, including the start and finish date and time of each Instruction. These data must be collected, together with the information about the person in charge of the Instruction. We explore the feasibility of tracking and ensuring originality in pharmaceutical manufacturing by applying the proposed approach to the data recorded by these sensor networks.
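As an illustration, a single parameter check of this kind can be sketched in Python as follows; the function name and the numeric values are hypothetical, not taken from the IDA production lines:

```python
# Hypothetical sketch: checking one recorded sensor value against the
# optimal value and acceptable range that a Recipe would define.

def check_parameter(recorded: float, optimal: float,
                    lower: float, upper: float) -> dict:
    """Return a small deviation record for one Instruction parameter."""
    return {
        "recorded": recorded,
        "optimal": optimal,
        "deviation": recorded - optimal,       # signed distance from optimum
        "within_range": lower <= recorded <= upper,
    }

# Example: mixing speed recorded at 52 rpm, optimal 50 rpm, range 45-55 rpm
result = check_parameter(52.0, 50.0, 45.0, 55.0)
```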

These production raw data are structured and organised as Reports. A Report contains all data related to the production of a single batch of a particular drug, and therefore, it includes all information that would be reviewed in an audit trail. At the main level of a Report, the attributes are related to the batch information, such as the batch code, the Recipe code, its version, and the Qualified Person. The Qualified Person is responsible for assuring the quality of medicines available on the market [37]; hence, this person reviews not only the overall batch production which is recorded in the Report but also the compliance of ALCOA+ principles. Moreover, a Report contains a list of used materials as well as the data recorded by sensors. It follows the Recipe structure: (i) a list of Phases that contains a set of Instructions; (ii) each Instruction item includes a list of parameters to be controlled (as indicated in the Recipe) and the data recorded during the process. Figure 1 is a representation of the process from the Recipe protocol to getting the Report object.
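The Recipe/Phase/Instruction hierarchy of a Report can be sketched with Python dataclasses; the field names below are assumptions derived from the description above, not the tool's actual schema:

```python
# Illustrative sketch of the Report structure described in the text;
# field names are assumptions, not the real data model.
from dataclasses import dataclass

@dataclass
class Instruction:
    name: str
    parameters: dict       # parameters to be controlled, as per the Recipe
    recorded_data: dict    # sensor data, start/finish timestamps, staff

@dataclass
class Phase:
    name: str
    instructions: list     # list of Instruction

@dataclass
class Report:
    batch_code: str
    recipe_code: str
    recipe_version: str
    qualified_person: str  # person assuring the quality of the batch
    used_materials: list
    phases: list           # list of Phase, mirroring the Recipe structure

report = Report("0031", "R7", "V2", "DR1", ["API-X"],
                [Phase("P0", [Instruction("mix", {"speed_rpm": 50}, {})])])
```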

Fig. 1
figure 1

Graphical representation of the workflow of manufacturing data. A Recipe includes Phases and Instructions. Each Instruction integrates the parameters to be controlled and monitored. The production line employs the required materials to execute the multiple phases. Finally, the Report is generated incorporating data related to the production. While orange boxes represent data from the manufacturing process, blue boxes illustrate the Recipe structure. The Report includes (i) Batch information (i.e., Batch code, Recipe code and its versions, and the Qualified Person responsible for the batch production), (ii) Used Materials, and (iii) the sensors data, i.e., date and time and the staff in charge

From the regulatory point of view, an audit trail evaluates random batch manufacturing records. These audits review all data generated during the process, whereby the sensor data are needed not only for quality control, but also for regulatory compliance. In this context, ensuring the originality of batch records may provide significant support for pharma manufacturing industries.

Blockchain Network in SPuMoNI

Blockchain has been widely adopted to provide authenticity, immutability, and transparency in decentralised environments. Decentralisation prevents single authorities from controlling the data. In this context, blockchain employs a distributed architecture that eliminates centralised authorities and uses immutability to prevent the alteration of past records. Moreover, by using Ethereum private networks and gas amounts, we have been able to empirically establish the feasibility of quantitatively marshalling service levels and their associated quality of service for blockchain networks [38].

It is therefore possible to have an end-to-end verification of any process and, consequently, consistent verification of the corresponding data. In this scenario, we have adopted blockchain technologies to confidently ensure the originality of ALCOA+ principles taking advantage of such intrinsic properties. Specifically, our blockchain module involves (i) a private Ethereum network, (ii) PoA as consensus algorithm, and (iii) smart contracts. Figure 2 illustrates the behaviour of the network as transactions are received.

Fig. 2
figure 2

Private Ethereum network configured with proof-of-authority consensus algorithm. The transactions are committed into blocks and mined by validator nodes. Once approved, the block is added to the chain. Currently, our private Ethereum network is composed of two validator nodes. The Originality Smart Contracts structure the data and evaluate the data originality

Private Ethereum Network

Ethereum is one of the most popular blockchain technologies for implementing decentralised, transaction-based systems. In addition, an Ethereum network supports smart contracts, which are immutable pieces of code that enable automated decision-making. Each Ethereum action requires an amount of gas, i.e., a computational fee charged for performing specific Ethereum transactions. Our network is composed of two nodes working as miners, which receive and validate the transactions to be added into blocks using the PoA consensus algorithm; PoA requires at least two miner nodes to start chaining blocks in the network. Ethereum provides JavaScript Object Notation (JSON) with Remote Procedure Call (RPC), i.e., JSON-RPC, which allows a front-end or an application to communicate with the Ethereum network. While JSON allows us to exchange data between a browser and a server, RPC allows us to perform requests in a network. Therefore, JSON-RPC defines the data structure, methods, and rules for communicating with the network.
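For concreteness, the JSON-RPC envelope sent to a Geth node can be built as follows; `eth_blockNumber` is a standard Ethereum JSON-RPC method, while the endpoint URL in the comment is an assumption:

```python
# Minimal sketch of a JSON-RPC 2.0 request body as sent to an Ethereum
# node (e.g., Geth) over its HTTP endpoint.
import json

def make_rpc_request(method: str, params: list, request_id: int = 1) -> str:
    """Build the JSON-RPC 2.0 request body for a given Ethereum method."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "id": request_id,
    })

payload = make_rpc_request("eth_blockNumber", [])
# The payload would be POSTed to the node, e.g. http://localhost:8545
```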

Consensus Algorithm

A consensus algorithm is used to achieve agreement on data transactions in a distributed network, ensuring that the next block to be added to the blockchain is unique and reliable. Our private Ethereum network relies on PoA, where the miner nodes that work as validators are aware of all identities through a reputation-based approach. A node is able to validate transactions if at least \(\frac{N}{2}{+}{1}\) network nodes have previously identified it as an honest node (where N represents the number of trusted nodes). Specifically, PoA works in our private Ethereum network as follows:

  • Each validator holds a fixed time slot to validate blocks. During that time slot, the corresponding node is the network leader.

  • Each node is enabled to validate transactions every \(\frac{N}{2}{+}{1}\) blocks (with a mining frequency of \(\frac{1}{\frac{N}{2}{+}{1}}\)).

  • A maximum of \({N}{-}(\frac{N}{2}{+}{1})\) nodes are allowed to propose blocks in the same time slot. When \(N=2\), there are no simultaneous nodes validating blocks, just the leader. When N is greater than 2, multiple nodes can propose blocks within the same time slot as the leader, e.g., with \(N=16\), 7 nodes are allowed to validate blocks at the same time. If one node is down, the remaining network participants, which are able to mine transactions, will validate all transactions submitted to the network.

  • The GHOST protocol [39] is applied if multiple nodes are validating the same transactions, simultaneously. This protocol privileges the leader.

  • The nodes that constantly propose invalid transactions reduce their reputation and, consequently, can be excluded from the list of reliable validators.
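The thresholds in the rules above can be computed directly; the sketch below assumes integer (floor) division in \(\frac{N}{2}\), which matches the worked examples for \(N=2\) and \(N=16\):

```python
# Sketch of the PoA thresholds described above: a validator needs
# floor(N/2)+1 endorsements, and at most N - (floor(N/2)+1) nodes may
# propose blocks within the same time slot.

def endorsement_threshold(n: int) -> int:
    """Minimum honest-node endorsements required to validate (N/2 + 1)."""
    return n // 2 + 1

def max_simultaneous_proposers(n: int) -> int:
    """Maximum nodes allowed to propose blocks in one slot: N - (N/2 + 1)."""
    return n - endorsement_threshold(n)

# With N = 2, only the leader proposes; with N = 16, 7 nodes may propose.
```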

Our PoA configuration requires a “master” miner responsible for adding new miners, keeping the blockchain network fully private. Being hosted and managed by a private entity, the network prevents dishonest nodes from participating and, consequently, avoids potential security-related attacks.

Smart Contracts

The implemented private Ethereum network enables the deployment of smart contracts incorporating a dedicated data structure to manipulate transactions in the distributed network. When data is centrally stored, it can be easily manipulated to serve hidden interests. Given their tamper-proof nature, smart contracts aim to ensure complete data authenticity, i.e., to prevent manipulation by unethical stakeholders and to preserve user and data provenance. The proposed solution uses smart contracts to structure the data in the blockchain network and to store the hash and Identifier values of pharmaceutical reports. Specifically, the smart contracts support a collection of transactions to access the stored information and assess the originality of the reports. Therefore, this work relies on smart contracts to evaluate the Original principle of ALCOA+ and, consequently, the integrity of data produced by pharmaceutical manufacturing lines. Originality is assessed by the smart contracts by comparing the data stored in the blockchain network in the form of hashes and Identifiers. Thus, if a report with the same identifier but a different hash is submitted to the network, the smart contract detects that report as not original.
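The comparison the contract performs can be mirrored in a short Python sketch; the actual on-chain logic is written in Solidity, and the class below is only an off-chain stand-in for the contract's state:

```python
# Python sketch mirroring the comparison performed by the Originality
# Smart Contract (the real contract is written in Solidity). The store
# maps each report Identifier to the hash first recorded for it.

class OriginalityStore:
    def __init__(self):
        self._records = {}                   # identifier -> hash

    def submit(self, identifier: str, report_hash: str) -> str:
        stored = self._records.get(identifier)
        if stored is None:
            self._records[identifier] = report_hash
            return "Original"                # first time this Identifier is seen
        if stored == report_hash:
            return "Original"                # same report uploaded twice
        return "Non-original"                # same Identifier, different hash

store = OriginalityStore()
store.submit("BATCH-0031", "aa11")           # -> "Original"
store.submit("BATCH-0031", "bb22")           # -> "Non-original"
```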

Implementation

The originality assessment tool aims to ensure the authenticity of data acquired and generated in pharma manufacturing lines. Towards this purpose, the tool is supported by blockchain infrastructure and provides a dashboard as the user interface.

Blockchain Infrastructure

The blockchain infrastructure is composed of a private Ethereum network that uses Go-ethereum (Geth) as the Ethereum client and Solidity as the smart contract language. The private Ethereum network was configured with a block period of 2 s to improve network performance [40]. The network provides cost-free processing, i.e., transactions are submitted with a gas price of 0. Each OpenStack instance has 16 GiB of RAM, 8 CPUs, and 160 GiB of HDD. The originality assessment tool uses 6.5 MB of storage, and 1000 reports take up 35 MB of JSON files and 139 MB of database information.
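A Geth clique (PoA) genesis configuration with this 2-second block period can be sketched as follows; the `chainId` and `gasLimit` values are illustrative assumptions, and a real `genesis.json` would additionally encode the validator addresses in its `extradata` field:

```python
# Hedged sketch of a Geth clique (PoA) genesis configuration matching
# the 2-second block period described above; chainId and gasLimit are
# illustrative assumptions, not the deployed values.
genesis = {
    "config": {
        "chainId": 1515,            # assumed private-network id
        "clique": {
            "period": 2,            # seconds between blocks
            "epoch": 30000,         # default clique epoch length
        },
    },
    "difficulty": "1",
    "gasLimit": "8000000",          # assumed block gas limit
    "alloc": {},                    # no pre-funded accounts required
}
# Transactions are then submitted with gasPrice = 0 for cost-free processing.
```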

The assessment is based on uploading a new batch record to the Ethereum network as a smart contract and verifying the uniqueness of all its information. The originality assessment is achieved by the verification of data authenticity, evaluating if the batch record has been corrupted. This software is designed for a direct interaction with the production line database. However, it is also possible to manually upload new batch reports. The workflow of the originality assessment tool is described in Fig. 3.

Fig. 3
figure 3

Originality Assessment Workflow. The process starts with the input of a new report, either uploaded automatically from the production line or uploaded manually by a user (depicted here as a generic user, e.g., an auditor or any authorised person). The new batch is assessed by the Originality Smart Contract in the Ethereum network. The originality assessment is performed against the reports previously uploaded to the Ethereum network, evaluating the uniqueness of the new data by comparing it with the existing stored information. The results of the assessment are visualised in the originality assessment tool, where the user can explore the reports and see whether they are original or not; each non-original report is provided with a trace that includes the source of data falsification. This workflow schema is proposed for a distributed system; hence, the different components are labelled in red with their general locations

When a new batch is manufactured, its data is uploaded in JSON format and then parsed into a Report. After that, the Report hash is calculated by converting the Report into a map (a Python dictionary in this case), serialising the map into a String, encoding that String as a byte array, and hashing it with the SHA256 algorithm. In addition, the Report Identifier is calculated as a combination of the batch code, order code, recipe code, recipe version code, and product code. The Identifier and the hash are uploaded into the Originality Smart Contract instance allocated in the Ethereum network. In this step, the Report revision number is also computed, by checking whether another report with the same identification data is already in the Ethereum network.
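The hash and Identifier computation can be sketched as below; the exact serialisation of the real tool (field ordering, separator characters) may differ, and the field names are assumptions:

```python
# Sketch of the Report hash and Identifier computation described above.
# The real tool's serialisation details (field order, separators) are
# assumptions here.
import hashlib
import json

def report_hash(report: dict) -> str:
    """Serialise the Report dictionary to a string and hash it with SHA-256."""
    serialised = json.dumps(report, sort_keys=True)       # dict -> string
    return hashlib.sha256(serialised.encode("utf-8")).hexdigest()

def report_identifier(report: dict) -> str:
    """Combine the codes that make up the Report Identifier."""
    return "-".join([report["batch_code"], report["order_code"],
                     report["recipe_code"], report["recipe_version_code"],
                     report["product_code"]])
```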

To obtain the Originality results, the following cases are checked: (i) if the Identifier is unique, the Report is considered Original; (ii) if the Identifier is not unique but the hash is equal to the hash of a previously uploaded Report with the very same Identifier, the current Report is also considered Original; this case occurs when the very same Report is uploaded twice; and (iii) if the Identifier is not unique and the hash is distinct from the hash of a previously uploaded Report with the very same Identifier, the current Report is considered Non-original, and the differences between the two versions are calculated and provided. The complete Identifier combines the Identifier with the current Revision number. The Revision number is 0 when the Report is original; otherwise, it denotes a different version. Figure 4 summarises our approach for the originality assessment process.
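The three cases, together with the Revision number update, can be sketched as a single function; a plain dictionary stands in for the smart contract state, and the list index plays the role of the Revision number:

```python
# Sketch of the three originality cases plus the Revision number update.
# A plain dict stands in for the smart contract state: it maps each
# Identifier to the list of hashes seen so far (index = Revision number).

def assess(store: dict, identifier: str, new_hash: str):
    """Return (result, revision) for a newly submitted Report."""
    if identifier not in store:
        store[identifier] = [new_hash]
        return "Original", 0                     # case (i): unique Identifier
    if new_hash in store[identifier]:
        # case (ii): the very same Report uploaded again
        return "Original", store[identifier].index(new_hash)
    store[identifier].append(new_hash)           # case (iii): new revision
    return "Non-original", len(store[identifier]) - 1
```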

Fig. 4
figure 4

Proposed Originality Assessment Process. The orange box represents the originality assessment tool, where Identifier &hash are calculated and the originality-related results (i.e., whether the report is original or not) are computed. The green box represents the Originality Smart Contract, where Identifier &hash are uploaded to the Ethereum network and the originality assessment is performed. The blue workflow illustrates the Report uploading, and the grey workflow represents the originality assessment. Also, the Revision number update is represented in a blue box, when the result of the assessment is “Non-original”

Dashboard

For easy interaction with and comprehension of the results, the Originality Assessment tool has a user interface built using the Django framework and Python 3, together with the Bootstrap 4 web front-end framework. The main dashboard screen contains a summary table of the assessed reports including the batch code, the Recipe executed, the version of the Recipe, the Revision number, and the originality assessment results. The tool also provides detailed information at the report level: a user can explore the interactive view for a single Originality assessment result. The information of each batch is presented in a table that includes (i) the batch code, (ii) the Recipe executed, (iii) the version of the Recipe, (iv) the Revision number, (v) the Qualified Person, and (vi) a breakdown of the assessment results. Additionally, the Report identifier, the Report version number, and the data trace (which identifies data disagreements between the current Report and its previous version) are included. Some examples of this interface are shown in the “Results” section.

Evaluation and Results

The originality assessment tool provides support for assessing and ensuring data originality in pharmaceutical manufacturing. A critical measure of the effectiveness of the approach is whether the originality assessment tool is capable of accurately detecting manufacturing records that are not original. A further measure of its usefulness is whether it correctly identifies the root cause (i.e., the source of falsification) of the non-original records. To empirically evaluate our approach, we have designed an evaluation methodology that allows us to measure the originality assessment accuracy as well as the system performance.

Evaluation Setup and Methodology

To evaluate our approach, we upload a set of batch reports, a subset of which are randomly falsified. We have defined five types of data falsification based on typical situations that should be detected in a real-world scenario: (i) adulteration of the Qualified Person, (ii) adulteration of the staff in charge of an Instruction, (iii) adulteration of the date and time of recorded data, (iv) adulteration of the value of some sensor data, and (v) a combination of the former types.

Due to privacy requirements of the Fareva-IDA facilities, we have employed data fabrication techniques to generate realistic temporal series for 1000 Reports based on real raw data from their production lines, which account for some six months of their manufacturing operations. In addition, 300 reports were randomly selected to be generated as non-original, i.e., falsified. Each of these non-original Reports may exhibit one (scenarios i–iv) or multiple (scenario v) of the data falsification types defined above; the number and types of individual falsifications applied in the fifth scenario were randomly selected for each non-original Report. The purpose of generating this random dataset of Reports was to simulate a real-world situation where data falsification occurs without it being known, by design, which data have been falsified. This allows us to evaluate our tool’s capability to detect the source of such falsifications.
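A random falsification step of this kind could be sketched as follows; the field names, the tampered values, and the `falsify` helper are all hypothetical illustrations, not the actual data fabrication code:

```python
# Illustrative sketch of randomly applying the falsification types
# (i)-(iv), or several at once as in scenario (v), to a Report dict.
# All field names and tampered values are hypothetical.
import copy
import random

FALSIFICATIONS = ["qualified_person", "staff", "datetime", "sensor_value"]

def falsify(report: dict, rng: random.Random) -> dict:
    tampered = copy.deepcopy(report)          # keep the original untouched
    # scenario (v): one or more individual falsifications applied at once
    kinds = rng.sample(FALSIFICATIONS, k=rng.randint(1, len(FALSIFICATIONS)))
    instr = tampered["phases"][0]["instructions"][0]
    if "qualified_person" in kinds:
        tampered["qualified_person"] = "DR4"  # adulterated identity
    if "staff" in kinds:
        instr["staff"] = "Mrs. Elisabeth"     # adulterated staff in charge
    if "datetime" in kinds:
        instr["start"] += 5                   # recording time shifted by 5 s
    if "sensor_value" in kinds:
        instr["speed_rpm"] += 5               # sensor value increased by 5 rpm
    return tampered
```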

Furthermore, the uploading and assessment time for each Report has also been measured, in order to characterise the overall performance of the distributed system, as the Ethereum network is located at the NCI Cloud Competency Centre in Dublin and the Originality Assessment tool runs at the UPV facilities in Valencia. This distributed environment has allowed us to study the latency of the originality assessment process. Since the running time is recorded for each Report, we have also compared the times for evaluating original and non-original Reports, distinguishing between the five (i–v) types of falsification, in order to detect potential performance variability due to data falsification.

Results

Table 2 summarises the results and shows that the originality assessment tool has accurately detected, as falsified, all the Reports that were randomly altered (as compared to their originals). Therefore, our approach shows 100% accuracy in detecting non-original data. Furthermore, the tool has successfully identified, for each such report, the source of data falsification.

Table 2 Confusion matrix for the originality assessment results of 1300 Reports evaluated

Dashboard Results Visualisation

This subsection shows some examples of result visualisation on the developed dashboard. Figure 5 shows the main menu, which presents a sample list of the assessed reports together with a results summary. The non-original reports are highlighted in orange.

Fig. 5
figure 5

Dashboard of the Originality Assessment tool. The dashboard provides a list of assessed reports. The reports identified as having originality-related issues are highlighted in orange. The columns represent the batch code, the Recipe executed, the version of the Recipe, the Qualified Person responsible for the batch data, the Revision number, and the original assessment result

Once the user clicks on one of the assessed reports of Fig. 5, the dashboard shows the detailed results of its assessment. Figure 6 presents an example of this interface for scenario 5, multiple sources of data falsification. For examples of scenarios 1 to 4, please see the supplementary material.

Fig. 6
figure 6

Multiple data modifications: Batch 0031. This figure shows a comparison between the originality assessment results of Batch 0031 Revision Number 0 (a) and Batch 0031 Revision Number 1 (b), where result a is “Original” and result b is “Non-original”. Five data adulterations are present: (1) the qualified person has changed from Data Responsible 1 (DR1) to Data Responsible 4 (DR4); (2) in Instruction 0 of Phase 0, the staff has been changed from Mr. Smith to Mrs. Elisabeth; (3) in Instruction 0 of Phase 0, the recording timestamp has been shifted forward by 5 s; (4) in Instruction 1 of Phase 0, the staff has been changed from Mr. Lopez to Mrs. Elisabeth; and, finally, (5) in Instruction 1 of Phase 0, the original Speed data has been increased by 5 rpm. These data adulterations are summarised in the Revision 1 trace (b)

Performance Characterisation

The mean time for uploading and assessing the 1300 Reports was 8.00 s with a standard deviation of 0.221 s (normally distributed per the Shapiro-Wilk normality test, \(\alpha = 0.05\)).

To analyse potential discrepancies in the performance results between all five falsification scenarios and the original Reports, we have applied a one-way ANOVA test followed by a Tukey Honest Significant Differences post hoc test.

Statistically, there was no significant difference among the analysed scenarios (\(\alpha = 0.05\)). As such, there is no evidence of any effect of data falsification on the performance of the proposed originality assessment tool. Figure 7 shows box plots for each of the six groups of Reports.
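This analysis can be reproduced on synthetic timing data with SciPy; the group sizes below are illustrative assumptions, while the normal parameters (mean 8.00 s, SD 0.221 s) follow the measurements reported here:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic per-group upload+assessment times (s), drawn from the reported
# overall distribution; the real analysis used the measured times per Report.
groups = {
    name: rng.normal(loc=8.00, scale=0.221, size=200)
    for name in ["original", "s1", "s2", "s3", "s4", "s5"]
}

# Normality check on the pooled times (Shapiro-Wilk, alpha = 0.05).
pooled = np.concatenate(list(groups.values()))
_, shapiro_p = stats.shapiro(pooled)

# One-way ANOVA across the six groups, followed by the Tukey Honest
# Significant Differences post hoc test (pairwise group comparisons).
_, anova_p = stats.f_oneway(*groups.values())
tukey = stats.tukey_hsd(*groups.values())
```

Since all six groups are drawn from the same distribution in this sketch, the ANOVA and Tukey p-values should be non-significant, mirroring the finding above.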

Fig. 7
figure 7

System performance pertaining to the experiments included in Table 2: original reports, scenario 1, scenario 2, scenario 3, scenario 4, and scenario 5

Conclusions

The pharmaceutical industry is a data-intensive domain. Its manufacturing lines continuously generate large amounts of data that must be collected and kept ALCOA+ compliant. However, the risk of negligent or unintentional falsification is high in pharma environments. In this context, the pharmaceutical industry requires effective solutions to improve its manufacturing processes in terms of ALCOA+ compliance. Blockchain, together with smart contracts, has been shown to be a promising technology for data authenticity. To address this need, we propose a novel blockchain-based approach for assessing originality (i.e., the “O” in ALCOA+). The proposed method is composed of a private Ethereum network incorporating smart contracts to detect data falsifications.
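As a simplified, hedged sketch of the underlying idea (not the actual smart-contract code), originality can be assessed by comparing a record's digest against the digest anchored on-chain when the batch record was first written; the record fields here are hypothetical:

```python
import hashlib
import json

def digest(record: dict) -> str:
    # Canonical JSON serialisation so equal records hash identically.
    return hashlib.sha256(
        json.dumps(record, sort_keys=True).encode("utf-8")
    ).hexdigest()

# Digest anchored on the blockchain at recording time (Revision 0).
original = {"batch": "0031", "qp": "DR1", "speed_rpm": 120}
anchored = digest(original)

# A later revision with an adulterated Speed value fails the check.
revision = {"batch": "0031", "qp": "DR1", "speed_rpm": 125}
is_original = digest(revision) == anchored  # False
```

Because the anchored digest is immutable on-chain, any later adulteration of the off-chain record is detectable, which is the property the smart contracts exploit.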

The proposed method has been evaluated on pharma batch records to which multiple types of data falsification were randomly applied, using a geographically distributed system. The results show the feasibility of our approach for supporting compliance with the ALCOA+ principles, in particular the originality principle: our tool correctly classified every record as original or non-original. Furthermore, for the non-original records, our approach provides a trace identifying the sources of data falsification. It is important to note that all experiments were performed in a controlled scenario with standard data; however, the performance characterisation results suggest that the proposed method should scale to large datasets in distributed environments. To achieve a higher readiness level, an evaluation of the proposed tool in the pharma shop-floor environment is needed. Therefore, as future work, we aim to validate our originality tool in a real pharma manufacturing environment and to integrate it within the SPuMoNI system.