1 Introduction

With the growth of data in the Internet of Things (IoT), the number of data requests and the volume of historical data to be processed keep increasing, which poses a great challenge to system processing and data storage. For the network architecture of IoT sensing nodes, a distributed data storage architecture has been proposed that stores data in the local network to reduce the cost of data transmission [1]. Data exchange across IoT systems requires a middleware architecture. An IoT data analysis request usually needs to retrieve data from multiple IoT systems, and the conversion performed by the middleware leads to low retrieval efficiency. At the same time, the storage capacity of the local network is limited, which makes it difficult to meet the requirements of persistent data storage. Therefore, a centralized IoT data management system has obvious shortcomings in data processing and storage capacity when it has to serve multiple IoT systems [2].

Applications of blockchain technology are being widely studied, and the integration of the IoT and blockchain has broad prospects. The decentralization and zero-trust characteristics of blockchain address the reliance on third-party trust agencies and the threat of malicious nodes in centralized data management [3]. Blockchain-based IoT data management has therefore attracted wide attention. Ma et al. [4] proposed a decentralized trust management and IoT big data security usage control scheme based on a permissioned blockchain, which achieves decentralized trust management and IoT data management through smart contracts. Karlsson et al. [5] proposed Vegvisir, a partition-tolerant blockchain that can be used to build a shared, tamper-proof and traceable data management system in IoT environments containing battery- and network-constrained nodes. However, long offline periods and asynchronous data on key nodes in this scheme affect the reliability of the blockchain. Ren et al. [6] proposed a blockchain-based decision feedback proxy voting and revocation scheme that allows intelligent systems to ignore heterogeneity in 6G technology. The scheme uses the attributes of decision-related nodes to achieve anonymous voting, improving decision-making and traffic command capabilities, and its security proof ensures the security and consistency of outsourced micro-services. Ren et al. [7] constructed a blockchain-based IoT data query model with the DCOMB method. This model combines IoT data streams with blockchain timestamps, which improves data interoperability and the universality of IoT database systems. To strengthen the data storage security of smart homes, Ren et al. [8] proposed an identity-based proxy aggregation signature scheme that improves signature verification efficiency, compresses storage space and reduces communication bandwidth.

This paper builds LCP-Chain (Logistics Cooperation Platform Chain), a blockchain logistics information platform based on a consortium blockchain. It is a logistics industry information platform that aims to integrate the development resources of logistics enterprises, provides a secure and trusted platform for information exchange and cooperation among logistics companies, and optimizes the design of the platform's retrieval function and consensus mechanism. To improve retrieval functionality and efficiency on the original blockchain and to support relational retrieval, this paper adopts retrieval based on dynamic information: blockchain data are analyzed, reorganized and synchronized to a MySQL database in real time, and when the platform needs relational retrieval, the MySQL database is used. To make the original blockchain consensus algorithm support Byzantine fault tolerance and to increase consensus efficiency, this paper proposes a consensus mechanism based on VRF-BFT-SmaRt, which redesigns the selection of endorsement nodes and the consensus algorithm on the basis of the original consensus mechanism. The rest of the paper is organized as follows. Section 2 reviews blockchain technology and blockchain-based logistics systems. Section 3 analyzes the retrieval problems of dynamic logistics information. Section 4 presents the blockchain data retrieval method, which is intended to improve the traceability and transparency of logistics information, promote information sharing and optimize logistics transportation. Section 5 reports the experiments, and Sect. 6 concludes the paper.

Article Highlights

(1) The data security model based on blockchain improves the security and credibility of logistics information sharing.

(2) Based on data analysis, the retrieval effect of logistics transportation information is optimized.

(3) Logistics enterprises improve data security and operational efficiency by exploring the application of blockchain technology.

2 Related work

2.1 Blockchain technology

A blockchain is essentially a distributed ledger maintained by all nodes of a P2P network under a consensus protocol [9]. It is built on a P2P network and involves every node of that network [10]. The structure of the blockchain is shown in Fig. 1.

Fig. 1 Blockchain structure

According to Fig. 1, a block consists of a block header and a block body. The block header contains the block information, the timestamp and the root of the Merkle tree, and the block body contains the Merkle tree and the transaction data. A Merkle tree verifies whether a piece of data exists in the tree by hashing the data, combining the result with the sibling hashes along its path, and comparing the final value with the hash of the root node, which ensures the integrity and security of the data. A data block or its hash value can be transmitted to other nodes for comparison, so that the block header and the block body can be located and linked. Blocks are linked in chronological order [11], and the header records block information such as the blockchain version number, the hash of the previous block and the random value (nonce) together with the timestamp. The number of transactions, the transaction list and other information are organized through the Merkle tree to complete the storage of the block's transaction data [12]. The blockchain is built on a P2P network and exists on every node of that network. In a P2P network, all nodes communicate with each other, have the same function and are equal in status. Resources shared by one node, such as computing resources, storage space and network resources, can be used by other nodes [13].
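To make the header/body split and the Merkle verification step concrete, the following Python sketch builds blocks whose header stores the previous block's hash and the Merkle root of the block's transactions, and checks a Merkle inclusion proof. It is a minimal illustration under stated assumptions: SHA-256 hashing, a JSON-serialized header with the fields named above, and the Bitcoin-style convention of duplicating the last hash when a Merkle level has an odd count. The helpers `make_block`, `block_hash` and `chain_is_valid` are hypothetical and do not describe LCP-Chain's actual block format.

```python
import hashlib
import json


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _next_level(level: list[bytes]) -> list[bytes]:
    # Pair adjacent hashes; an odd last hash is paired with itself (assumed convention).
    if len(level) % 2:
        level = level + [level[-1]]
    return [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves: list[bytes]) -> bytes:
    """Merkle root of a non-empty list of transactions."""
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def merkle_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes from leaf `index` up to the root; True means the sibling is on the left."""
    level = [sha256(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        padded = level + [level[-1]] if len(level) % 2 else level
        sibling = index ^ 1
        proof.append((padded[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return proof


def verify_inclusion(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the path hashes and compare the result with the Merkle root."""
    h = sha256(leaf)
    for sibling, sibling_is_left in proof:
        h = sha256(sibling + h) if sibling_is_left else sha256(h + sibling)
    return h == root


def make_block(prev_hash: str, transactions: list[str], timestamp: int, nonce: int = 0) -> dict:
    """Hypothetical layout: header (metadata + Merkle root) and body (transactions)."""
    root = merkle_root([tx.encode() for tx in transactions])
    header = {"version": 1, "prev_hash": prev_hash, "timestamp": timestamp,
              "nonce": nonce, "merkle_root": root.hex()}
    return {"header": header, "body": list(transactions)}


def block_hash(block: dict) -> str:
    return hashlib.sha256(json.dumps(block["header"], sort_keys=True).encode()).hexdigest()


def chain_is_valid(chain: list[dict]) -> bool:
    """Each header must match its body's Merkle root and point to the previous header's hash."""
    for i, block in enumerate(chain):
        body_root = merkle_root([tx.encode() for tx in block["body"]]).hex()
        if block["header"]["merkle_root"] != body_root:
            return False
        if i > 0 and block["header"]["prev_hash"] != block_hash(chain[i - 1]):
            return False
    return True
```

Changing any transaction changes the recomputed Merkle root, so `chain_is_valid` rejects the tampered block; changing a header changes its hash and breaks the `prev_hash` link stored in the next block.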

2.2 Logistics system based on blockchain

Logistics is a network composed of raw material suppliers, commodity manufacturers, warehousing, goods distribution centers, logistics transportation and other links, through which information flow, material flow and capital flow continuously move. Blockchain technology, a distributed network based on P2P technology, can provide a safe and effective way for all parties in logistics to share information. Its characteristics, such as secure sharing, decentralization, tamper-proofing and data traceability, make it the first choice for establishing a logistics information platform in the era of smart logistics [14].

The core technologies of blockchain systems include consensus mechanisms, security mechanisms, storage mechanisms, P2P communication mechanisms and smart contracts. Blockchain has gradually become the underlying technology of choice for building technology platforms across many industries, and each industry uses it to build systems suited to its own business characteristics. In the theoretical research on “logistics + blockchain”, introducing blockchain technology into traditional logistics requires many changes [15]. In traditional logistics, goods transported from one place to another pass through many intermediate links, such as processing, warehousing and distribution, so information communication costs are high and the information flow is opaque. Through decentralization, blockchain technology provides a trustworthy shared ledger that enables real-time tracking, monitoring and information sharing of goods throughout the whole process, making logistics more transparent, efficient and safe. The traditional logistics business process needs to be redesigned so that the decentralized and trustworthy characteristics of blockchain technology are combined with the business process to achieve real-time traceability of information. When adopting blockchain technology, logistics enterprises need to upgrade their information systems, improve the intelligence, integration and informatization of those systems while ensuring system security, and integrate logistics information with the information on the blockchain. Because transaction information in the logistics field is complex and varied, industry standards are needed to ensure the interoperability and scalability of data on the blockchain, so that logistics information can be shared throughout the process.

In blockchain logistics, blockchain technology is currently used mainly for the traceability of goods data and the anti-tampering of logistics data. During logistics transportation, logistics data are updated in real time and in large quantities, so the blockchain logistics information system must run stably and efficiently and provide rich and fast information retrieval. This requires in-depth analysis and design of the consensus efficiency and retrieval mechanism of the blockchain system [16].

3 Logistics dynamic information retrieval

In the logistics chain, individual and enterprise users frequently need relational retrieval from the logistics system, most often of the real-time transportation status of goods, so a safe and fast retrieval function is the focus of platform development. Traditional blockchain systems store data in a fixed format and do not support direct retrieval of transaction details [17]. To obtain transaction details, a user can only retrieve the complete transaction through its hash value and then extract the required information from it. To address these problems in blockchain retrieval, this paper proposes a dynamic information retrieval method [18].

In the logistics system, data can only be obtained through the terminal or the limited interface provided by the chain code, and the user can only obtain the transaction information through the transaction ID or according to the block number [19].

In many scenarios, enterprise users need to perform complex business analysis on the data of the whole system or fine-grained retrieval on transaction data [20]. However, the interface provided by the terminal does not support fine-grained operations such as relational retrieval. The main problems of retrieval in the current logistics system are as follows:

  1. Too few data retrieval functions: at present, the terminal only provides application programming interfaces for retrieving data blocks by number or transactions by identifier. These simple interfaces are insufficient for practical retrieval requirements, such as retrieving transactions by creator, endorser, called function, channel ID, timestamp, etc. [21].

  2. Insufficient support for relational retrieval: the logistics system stores data in LevelDB, an unstructured data storage system based on the key-value model. LevelDB only supports inserting data and retrieving it by key, and does not support the relational operations needed for complex retrieval of blockchain data. Although chain code provides the function GetQueryResult to perform rich queries when CouchDB is used as the world state database, this function does not work when LevelDB is used as the world state database [22]. Moreover, as the amount of stored data grows, it becomes difficult for users to find data accurately and quickly.
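The relational-retrieval gap above, and the remedy outlined in the introduction of synchronizing parsed blockchain data into a relational database, can be sketched as follows. This is a minimal illustration, not LCP-Chain's data analysis layer: it uses Python's built-in `sqlite3` as a stand-in for MySQL, and the block layout, table name and field names (`tx_id`, `creator`, `called_function`, `channel_id`, `ts`) are hypothetical.

```python
import sqlite3

# Hypothetical transactions as a data analysis layer might parse them out of blocks.
BLOCKS = [
    {"number": 0, "transactions": [
        {"tx_id": "t1", "creator": "OrgA", "called_function": "createShipment",
         "channel_id": "logistics", "ts": 100},
        {"tx_id": "t2", "creator": "OrgB", "called_function": "updateStatus",
         "channel_id": "logistics", "ts": 105},
    ]},
    {"number": 1, "transactions": [
        {"tx_id": "t3", "creator": "OrgA", "called_function": "updateStatus",
         "channel_id": "logistics", "ts": 130},
    ]},
]


def sync_blocks(conn: sqlite3.Connection, blocks: list[dict]) -> None:
    """Flatten each block's transactions into relational rows; re-syncing is idempotent on tx_id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tx ("
        " tx_id TEXT PRIMARY KEY, block_number INTEGER, creator TEXT,"
        " called_function TEXT, channel_id TEXT, ts INTEGER)"
    )
    # Secondary index so creator/time-range queries avoid a full scan.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_creator_ts ON tx (creator, ts)")
    rows = [
        (t["tx_id"], b["number"], t["creator"], t["called_function"], t["channel_id"], t["ts"])
        for b in blocks
        for t in b["transactions"]
    ]
    conn.executemany("INSERT OR REPLACE INTO tx VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()


def txs_by_creator(conn: sqlite3.Connection, creator: str, t_start: int, t_end: int) -> list[tuple]:
    """A relational query that a lookup by transaction ID or block number cannot answer directly."""
    return conn.execute(
        "SELECT tx_id, block_number FROM tx WHERE creator = ? AND ts BETWEEN ? AND ? ORDER BY ts",
        (creator, t_start, t_end),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    sync_blocks(conn, BLOCKS)
    print(txs_by_creator(conn, "OrgA", 100, 200))  # [('t1', 0), ('t3', 1)]
```

The design choice illustrated here is that the blockchain remains the source of truth while the relational copy only serves queries; the cost is the extra synchronization work during data entry.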

4 Blockchain data retrieval

Blockchain data retrieval requires two steps. The first step builds a filter over all retrieval indexes of each block to determine whether the block contains data related to the queried spatial location attribute; the second step builds a double-layer combined filter over all retrieval indexes of each block to locate precisely the index corresponding to the spatial location attribute [23]. Both layers use hash functions to construct hash-based indexes, which allow data to be retrieved quickly and return wrong results only with a very low probability; this error probability is analyzed below. The retrieval operation is shown in Fig. 2.

Fig. 2 Data retrieval process

According to Fig. 2, retrieval starts from the first block within the required time range. For each block, the filter is used to check whether the current block may contain the data to be retrieved. If it does not, the process checks whether there is any obstacle and moves on to the next block; if it may, the target data are searched for in the current block. The retrieval result is then returned, which can be the block containing the data and the corresponding location, or the complete data content, depending on actual needs. Finally, the process checks whether there are obstacles and ends the search [24]. Once an IoT node has uploaded verification data to the blockchain, the data cannot be tampered with, and the data of earlier blocks cannot be modified at the current time. Therefore, the data of all blocks except the current block are fixed and never need to be modified dynamically, so the hash calculation server can construct a filter with optimal performance for each block according to its known amount of data [25].

The filter with the best performance is the one with the lowest false positive probability and the lowest memory footprint. Assume that block \(B_{i}\) belongs to the blockchain \(B = \left( {B_{1} ,B_{2} , \ldots ,B_{i} , \ldots ,B_{s} } \right)\) and contains a fixed total of \(n_{i}\) index entries. The filter \(BF_{i}\) constructed for this block has a length of \(m_{i}\) bits and uses \(k_{i}\) hash functions. Its false positive probability, denoted \(P_{{f_{i} }}\), can then be calculated as:

$$P_{{f_{i} }} = \left( {1 - \left( {1 - \frac{1}{{m_{i} }}} \right)^{{n_{i} k_{i} }} } \right)^{{k_{i} }} \approx \left( {1 - e^{{ - \frac{{n_{i} k_{i} }}{{m_{i} }}}} } \right)^{{k_{i} }}$$
(1)

The false positive probability is minimized when \(k_{i} = \frac{{m_{i} }}{{n_{i} }} \ln 2\); for a target false positive probability \(P_{{f_{i} }}\), the corresponding filter length is \(m_{i} = - \frac{{n_{i} \ln P_{{f_{i} }} }}{{(\ln 2)^{2} }}\).
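As a worked check of these sizing formulas, the Python sketch below computes the filter length and hash count for a block from its entry count \(n_{i}\) and a target false positive probability, then evaluates the exponential approximation of Eq. (1). Rounding \(m_{i}\) up and \(k_{i}\) to the nearest integer is an implementation choice assumed here, not something the formulas prescribe.

```python
import math


def optimal_filter_params(n: int, p: float) -> tuple[int, int]:
    """Filter length m (bits) and hash count k for n entries at target false positive rate p.

    Uses m = -n ln p / (ln 2)^2 and k = (m / n) ln 2, rounding m up and k to the
    nearest integer (at least 1).
    """
    m = math.ceil(-n * math.log(p) / math.log(2) ** 2)
    k = max(1, round(m / n * math.log(2)))
    return m, k


def false_positive_rate(n: int, m: int, k: int) -> float:
    """Exponential approximation of Eq. (1): (1 - e^{-nk/m})^k."""
    return (1 - math.exp(-n * k / m)) ** k


if __name__ == "__main__":
    m, k = optimal_filter_params(n=1000, p=0.01)
    print(m, k, false_positive_rate(1000, m, k))
```

For a block with 1000 index entries and a 1% target, this gives \(m_{i} = 9586\) bits (about 9.6 bits per entry) and \(k_{i} = 7\), and the resulting false positive probability is close to 0.01. Because every block except the current one is immutable, these values can be fixed once per block.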

Write operation: the retrieval filter \(BF_{i}\) corresponding to block \(B_{i}\) is initially a bit vector of length \(m_{i}\) with all bits set to 0. The lth bit is denoted \(BF_{i} \left[ l \right]\), where \(0 \le l \le m_{i} - 1\). There are \(n_{i}\) indices in the block, and the spatial location attribute corresponding to the rth index is denoted \(e_{r}\), where \(0 \le r \le n_{i} - 1\). The filter uses a total of \(k_{i}\) hash functions, and the jth hash function is denoted \(h_{j}\), where \(0 \le j \le k_{i} - 1\) [26].

Assume that the rth spatial location attribute is to be written into filter \(BF_{i}\), that is, the element is mapped into \(BF_{i}\) by the \(k_{i}\) hash functions. Each position to which a hash function maps the element is set to 1, so inserting one spatial location attribute into \(BF_{i}\) with all hash functions can be expressed as

$$BF_{i} [h_{j} (e_{r} )] = 1,\quad 0 \le j \le k_{i} - 1$$
(2)

Inserting all spatial location attributes of the block into the corresponding filter \(BF_{i}\) can be expressed as

$$BF_{i} [h_{j} (e_{r} )] = 1,\quad 0 \le j \le k_{i} - 1,\; 0 \le r \le n_{i} - 1$$
(3)
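The write operation of Eqs. (2) and (3) can be sketched in Python as follows. The sketch assumes that the \(k_{i}\) hash functions are derived from one SHA-256 digest by double hashing, \(h_{j}(x) = (a + j b) \bmod m_{i}\), a common Bloom-filter technique substituted here for illustration; the source does not specify which hash functions LCP-Chain uses. `BlockFilter` and `build_filters` are hypothetical names, and the filter also counts hash evaluations so the total can be compared with the count in Eq. (4).

```python
import hashlib


class BlockFilter:
    """Filter BF_i of one block: an m_i-bit vector (initially all 0) and k_i hash functions."""

    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = bytearray(m)   # one byte per bit, for readability
        self.hash_count = 0        # number of h_j evaluations performed

    def _positions(self, element: str) -> list[int]:
        # Assumption: h_j(x) = (a + j*b) mod m (double hashing from one SHA-256 digest),
        # a stand-in for k independent hash functions h_0 .. h_{k-1}.
        digest = hashlib.sha256(element.encode()).digest()
        a = int.from_bytes(digest[:8], "big")
        b = int.from_bytes(digest[8:16], "big") | 1
        self.hash_count += self.k
        return [(a + j * b) % self.m for j in range(self.k)]

    def add(self, element: str) -> None:
        # Eq. (2): BF_i[h_j(e_r)] = 1 for 0 <= j <= k_i - 1.
        for pos in self._positions(element):
            self.bits[pos] = 1


def build_filters(blocks: list[list[str]], params: list[tuple[int, int]]) -> list[BlockFilter]:
    """Eq. (3): write every spatial location attribute e_r of block B_i into BF_i.

    params[i] = (m_i, k_i) for block B_i.
    """
    filters = []
    for attributes, (m, k) in zip(blocks, params):
        bf = BlockFilter(m, k)
        for e in attributes:
            bf.add(e)
        filters.append(bf)
    return filters
```

With three blocks holding 3, 1 and 2 attributes and \(k_{i}\) = 3, 2 and 4, the counters add up to \(3 \cdot 3 + 1 \cdot 2 + 2 \cdot 4 = 19\) hash evaluations, that is, the sum of \(n_{i} k_{i}\) over the blocks.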

The number of hash computations required to insert the spatial location attributes of all blocks into the filters of corresponding blocks is:

$$insert_{num} = \sum\limits_{B_{i} \in B} {n_{i} k_{i} }$$
(4)

Retrieval operation: because the time attribute of IoT data corresponds to the block timestamp, retrieving data within a certain time range only requires searching the blocks in that range. To retrieve the spatial location attribute y, first determine the retrieval time range, that is, the range of blocks to search, and then check whether each block within the range contains y [27]. For each block \(B_{i}\), check its filter \(BF_{i}\) by applying all hash functions of \(BF_{i}\) to y to compute its mapped positions, which can be expressed as:

$$l_{j} = h_{j} (y),\quad 0 \le j \le k_{i} - 1$$
(5)

If any of the positions \(BF_{i} \left[ {h_{j} \left( y \right)} \right]\) is not 1, the spatial location attribute y is definitely not in block \(B_{i}\); if all of them are 1, y is likely to be in the block [28]. Assume that one index is retrieved, the range of blocks to be searched is B, and the corresponding filter set is \(BF\), where each \(BF_{i} \in BF\) has \(k_{i}\) hash functions. The number of hash calculations required to retrieve the index is:

$$Query_{Num} = \sum\limits_{BF} {k_{i} }$$
(6)
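The retrieval operation over a time range can be sketched as follows: blocks outside the range are skipped, a block is searched only when its filter reports a possible match, and an exact check inside the candidate block separates true hits from filter false positives. As before, double hashing stands in for the \(k_{i}\) hash functions, and `Block`, `positions` and `retrieve` are hypothetical names; the default filter size of 256 bits with 4 hash functions is an arbitrary illustrative choice.

```python
import hashlib


def positions(element: str, m: int, k: int) -> list[int]:
    # Assumption: double hashing h_j(x) = (a + j*b) mod m stands in for k hash functions.
    d = hashlib.sha256(element.encode()).digest()
    a = int.from_bytes(d[:8], "big")
    b = int.from_bytes(d[8:16], "big") | 1
    return [(a + j * b) % m for j in range(k)]


class Block:
    """Hypothetical block: a timestamp, its spatial location attributes, and its filter BF_i."""

    def __init__(self, timestamp: int, attributes: list[str], m: int = 256, k: int = 4):
        self.timestamp = timestamp
        self.attributes = list(attributes)
        self.m, self.k = m, k
        self.bits = bytearray(m)
        for e in self.attributes:                 # write operation, Eqs. (2)-(3)
            for pos in positions(e, m, k):
                self.bits[pos] = 1

    def might_contain(self, y: str) -> bool:
        # y can be in B_i only if every BF_i[h_j(y)] is 1.
        return all(self.bits[pos] for pos in positions(y, self.m, self.k))


def retrieve(chain: list[Block], y: str, t_start: int, t_end: int) -> tuple[list[int], int]:
    """Return (indices of blocks holding y, number of filter false positives) in [t_start, t_end]."""
    hits, false_positives = [], 0
    for i, block in enumerate(chain):
        if not t_start <= block.timestamp <= t_end:
            continue                              # outside the time range: never checked
        if not block.might_contain(y):
            continue                              # a 0 bit proves y is absent
        if y in block.attributes:                 # precise search inside the candidate block
            hits.append(i)
        else:
            false_positives += 1
    return hits, false_positives
```

Because a filter never produces a false negative, the time-range scan cannot miss a block that actually holds y; filter errors only cost an unnecessary in-block search.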

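Error rate of the first layer of retrieval: assume that the block range is B and the corresponding filter set is \(BF\), with \(BF_{i} \in BF\). The blocks and their filters are independent of each other, and each filter \(BF_{i}\) has its own false positive probability \(P_{{f_{i} }}\). The probability that a retrieval wrongly reports at least one block in the range, together with its union-bound upper limit, is therefore:

$$P_{f} (BF) = 1 - \prod\limits_{BF} {\left( {1 - P_{{f_{i} }} } \right)} \le \sum\limits_{BF} {P_{{f_{i} }} }$$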
(7)
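A small numeric sketch shows how the sum of the per-block probabilities relates to the probability that at least one block in the range is falsely reported, assuming independent filters as stated above. The sum is an upper bound on that probability and also equals the expected number of falsely reported blocks, so it can exceed 1 for long ranges: with 100 blocks at 1% each, the sum is 1.0 while the probability of at least one false positive is about 0.634. The helper names are hypothetical.

```python
import math


def per_block_fp(n: int, m: int, k: int) -> float:
    """Eq. (1), exponential approximation, for one block filter."""
    return (1 - math.exp(-n * k / m)) ** k


def range_error(fps: list[float]) -> tuple[float, float]:
    """For independent per-block filters, return
    (P[at least one block is falsely reported], sum of the P_fi).

    The sum is the union-bound upper limit of the first value and also the
    expected number of falsely reported blocks.
    """
    at_least_one = 1 - math.prod(1 - p for p in fps)
    return at_least_one, sum(fps)


if __name__ == "__main__":
    fps = [per_block_fp(1000, 9586, 7)] * 100   # 100 blocks sized for about 1% each
    print(range_error(fps))
```

For short ranges with small per-block probabilities the two values nearly coincide, which is when the simple sum is a convenient estimate.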

5 Experiment and analysis

5.1 Experimental environment and method

The native HyperLedger Logistics blockchain system used in the experiments is built on the open-source HyperLedger Logistics v1.4.0 framework. The experimental environment is shown in Table 1.

Table 1 Configuration parameters of the experimental platform

Each network node is simulated by a virtual machine, on which the experiments are run. Each node is allocated 1 GB of memory and a 25 GB hard disk and runs Ubuntu 16.04.

Figure 3 shows the experimental network node deployment diagram of the native HyperLedger logistics blockchain system.

Fig. 3 Experimental deployment diagram of native blockchain system

As shown in Fig. 3, the experimental deployment of the native blockchain system includes sub-submission nodes, client nodes, classification codes and back book nodes. A sub-submission node monitors and collects the transaction data of the node, packages them into blocks and submits the blocks to the other nodes in the network. It must run the corresponding blockchain node software and use the same encryption algorithm and consensus protocol as the other nodes to ensure the security and reliability of the system. A client node is an ordinary user of the blockchain network that can initiate and receive transaction requests. Client nodes run client software that matches the blockchain software and connect to the main blockchain nodes over the network to access the transaction records and status information of the blockchain. A classification code is a contract running in the blockchain network that processes and verifies specific types of transaction information and records it in the blockchain to ensure its security and transparency. Classification codes are written according to specific business requirements in smart contract languages such as Solidity. A back book node stores the historical data of the blockchain network and provides backup and restoration of blockchain data; it runs backup software matching the blockchain software and connects to the main blockchain node to synchronize blockchain data and status information in real time. Node joining and exit, the verification and processing of transaction information, and the protection of network security together improve the management of the blockchain. To approximate a P2P network in a real commercial environment, this paper simulates six network nodes with virtual machines.

5.2 Experimental results and analysis

(1) Retrieval efficiency test

After 300 MB, 500 MB, 700 MB and 900 MB of data were entered into the original logistics system and the LCP-Chain system respectively, the time required for data entry in each system was recorded. For the LCP-Chain system, the data entry time includes the total time the data analysis layer spends on synchronizing, parsing, reorganizing and otherwise processing the underlying blockchain data. The results are shown in Fig. 4.

Fig. 4 Synchronization time diagram

In Fig. 4, the data entry time of both the original logistics system and the LCP-Chain system is positively correlated with the data volume. From 300 to 700 MB, the data entry time of the native logistics system increases nearly linearly, and when the data volume increases to 900 MB, the synchronization time grows faster.

With the same 300 MB of data entered into the original logistics system and the LCP-Chain system, the same transaction information was retrieved 1000 times in each system, and the average was taken as the retrieval time of one transaction. The data size was then changed to 500 MB, 700 MB and 900 MB, the same transaction information was retrieved again, and the retrieval time of both systems was recorded. The experimental data are shown in Fig. 5.
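The averaging procedure described above (repeat the same retrieval 1000 times and take the mean) can be sketched as a small timing helper. This is an illustrative assumption about how such a measurement could be taken, not the authors' harness; `query` is a hypothetical callable that would wrap either a chain code call or a relational query.

```python
import time
from typing import Callable


def average_latency_ms(query: Callable[[], object], repetitions: int = 1000) -> float:
    """Run `query` `repetitions` times and return the mean wall-clock latency in milliseconds."""
    if repetitions <= 0:
        raise ValueError("repetitions must be positive")
    start = time.perf_counter()
    for _ in range(repetitions):
        query()
    return (time.perf_counter() - start) * 1000 / repetitions
```

Timing the whole loop once and dividing by the number of repetitions yields the mean latency while keeping timer overhead out of the individual calls; reporting a median or percentiles as well would require timing each call separately.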

Fig. 5 Retrieval time

In Fig. 5, as the data volume increases from 300 to 900 MB, the retrieval time of the LCP-Chain system stays at about 100 ms and grows only slowly. At 300 MB, the retrieval time of the native logistics system is 9 times that of LCP-Chain. When the data volume increases to 500 MB, its retrieval time changes little; at 700 MB it increases by about 15%, and at 900 MB by about 17%, approaching 1000 ms. The difference arises because the LCP-Chain system retrieves logistics data directly from the relational database, whereas the HyperLedger logistics native blockchain system accesses the underlying blockchain data through chain code, so its retrieval time is longer.

(2) Algorithm performance comparison

At present, the mainstream consensus algorithms include Raft and PBFT. We select these two algorithms and the algorithm proposed in this paper and compare them in the logistics scenario. Raft is a consensus algorithm for managing a replicated log: it relies on log replication so that a group of servers in the same state execute the same commands in the same order, which keeps their data consistent. Raft divides servers into three roles, leader, follower and candidate, which can transition into one another [29]. PBFT (Practical Byzantine Fault Tolerance): traditional BFT algorithms can tolerate Byzantine faults in a distributed environment, but their poor performance and high time complexity limit their use. PBFT reduces the complexity to polynomial level, lowering the cost of Byzantine fault tolerance and making practical BFT algorithms widely applicable [30].
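The difference in fault models between Raft and PBFT can be made concrete with the standard quorum arithmetic, for example for the six simulated nodes of Sect. 5.1. The sketch below computes, for n nodes, the majority quorum and crash tolerance of Raft and the Byzantine tolerance of a PBFT-style protocol, which requires \(n \ge 3f + 1\). The Byzantine quorum \(\lceil (n + f + 1)/2 \rceil\) is the general condition that any two quorums share at least \(f + 1\) nodes and reduces to the familiar \(2f + 1\) when \(n = 3f + 1\). The proposed VRF-BFT-SmaRt mechanism is not modeled, and the function names are hypothetical.

```python
def raft_tolerance(n: int) -> tuple[int, int]:
    """Raft (crash faults only): (majority quorum size, max crashed nodes tolerated)."""
    return n // 2 + 1, (n - 1) // 2


def pbft_tolerance(n: int) -> tuple[int, int]:
    """PBFT-style BFT: (quorum size, max Byzantine nodes f), requiring n >= 3f + 1.

    The quorum ceil((n + f + 1) / 2) makes any two quorums share at least f + 1
    nodes; it equals the familiar 2f + 1 when n = 3f + 1.
    """
    f = (n - 1) // 3
    return (n + f + 2) // 2, f


if __name__ == "__main__":
    for n in (4, 6, 7, 10):
        print(n, raft_tolerance(n), pbft_tolerance(n))
```

With six nodes, Raft survives two crashed nodes but no Byzantine node, whereas a PBFT-style protocol tolerates one Byzantine node with quorums of four, which is why Byzantine-tolerant protocols exchange more messages and have lower throughput.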

The consensus algorithm proposed in this paper is compared with Raft and PBFT in terms of throughput, and the throughput results are shown in Fig. 6.

Fig. 6 Throughput comparison of consensus algorithms

In Fig. 6, the throughput of the system decreases slowly as the number of nodes in the blockchain network increases. Because Raft does not need to consider Byzantine nodes, its throughput is much higher than that of PBFT and of the proposed algorithm. This performance comes at the expense of the stability and security of the system, so Raft cannot meet the requirements of the LCP-Chain blockchain logistics information platform for a consensus algorithm. The throughput of the proposed algorithm is about 20% higher than that of PBFT, and the algorithm handles problem nodes well, which ensures stable operation of the system. Its throughput exceeds 1000 transactions/s, which basically meets the requirements of LCP-Chain for a consensus algorithm.

The delay comparison of consensus algorithms is shown in Fig. 7:

Fig. 7 Delay comparison of consensus algorithms

In Fig. 7, the delay of the system rises slowly as the number of nodes in the blockchain network increases. Under the same number of nodes, the delay of the proposed algorithm is lower than that of Raft and PBFT. This is because the algorithm exchanges messages to detect possible node failures and takes the necessary measures to keep data consistent across distributed node sets, so it maintains efficient operation and low delay even as the number of nodes grows.

6 Conclusion

The blockchain-based data security model can improve the security and credibility of logistics information sharing. Analysis of logistics industry data shows that the model can effectively improve the retrieval, analysis and sharing of dynamic logistics information. This study also suggests that logistics enterprises should actively explore the application of blockchain technology to improve data security and operational efficiency. Although the results are positive, a limitation is that more research is needed to find suitable blockchain applications and to promote change in the industry. Future research should explore the combination of blockchain and logistics more deeply and evaluate its value in practice. In addition, setting up a data analysis layer to synchronize the underlying data of the logistics blockchain with the relational database increases the data writing time of the platform and thus affects its transaction speed. The next step is therefore to reduce the time the data analysis layer spends on data synchronization, analysis and reorganization, and thereby reduce the data entry time of the LCP-Chain system.