1 Introduction

Data, commonly touted as the oil of the 21st century, is not only fueling the success of the tech giants (e.g., Google, Apple, Facebook, Amazon) but also driving innovation in enterprises more generally, as evidenced by the rise of data science across a variety of domains. Personal data is particularly valuable; however, in Europe the General Data Protection Regulation (GDPR) stipulates obligations with respect to personal data processing and sharing that must be fulfilled by data controllers and processors. Such obligations relate to obtaining consent from data subjects, providing transparency with respect to personal data processing and sharing, and ensuring compliance with usage restrictions.

Although a number of tools that focus on GDPR compliance [4,5,6] have recently been released, such tools are targeted at self-assessment, whereby companies are given information on their obligations after completing a standard questionnaire. In contrast, the system described herein can be used by companies to automatically check whether existing data processing and sharing practices comply with data protection related obligations. In this demo paper, we describe the SPECIAL consent, transparency and compliance system, which can be used not only to record consent but also to provide transparency to data subjects concerning the use of their personal data.

The contributions of the paper can be summarized as follows: we (i) demonstrate how usage constraints and data processing and sharing events can be expressed using the Resource Description Framework (RDF); and (ii) propose a transparency and compliance system that can automatically verify that data processing and sharing complies with the relevant usage control policies.

2 Related Work

The traditional way to obtain consent is to present a human-readable description of the processing, in which the data collected is described only in very general terms. Dynamic consent is a relatively new framework that refers to the use of modern communication media to provide transparency, enable consent management, and elicit greater involvement of data subjects from a consent perspective [3].

When it comes to transparency with respect to data processing, relevant work primarily relates to the re-purposing of existing logging mechanisms as the basis for personal data processing transparency and compliance [2]. Many of the works analyzed in [2] use a secret-key signing scheme based on Message Authentication Codes (MACs), together with a hashing algorithm, to generate chains of log records that are in turn used to ensure log confidentiality and integrity [1]. MACs are computed and verified using a shared symmetric key, typically in combination with a collision-resistant cryptographic hash function. However, only a few works [7, 8] focus specifically on personal data processing.

As for GDPR compliance, recently the Information Commissioner’s Office (ICO) in the UK [4], Microsoft [5], and Nymity [6] have developed compliance tools that enable companies to assess the compliance of their applications and business processes by completing a predefined questionnaire. In contrast to existing approaches, we propose a system that can be used to record both usage policies and data processing and sharing events in a manner that supports automatic compliance checking.

3 RDF Vocabularies for Usage Policies and Events

The vocabularies described in this section are based on the SPECIAL usage policy language and log vocabulary, which were derived from an in-depth legal analysis of use cases that require the processing and sharing of personal data to improve information and communication technology and financial services. SPECIAL usage policies can be used to denote the following information at different levels of granularity:

  • Data describes the personal data collected from the data subject.

  • Processing describes the operations that are performed on the personal data.

  • Purpose represents the objective of such processing.

  • Storage specifies where data are stored and for how long.

  • Recipients specifies with whom the data is shared.

In this paper we use the standard namespace prefixes for both rdf and rdfs, and adopt the SPECIAL vocabulary prefixes represented in Listing 1.1.

Listing 1.1. SPECIAL vocabulary prefix declarations
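
For readability, the following is a minimal Turtle sketch of the kind of prefix declarations Listing 1.1 contains; the SPECIAL, BeFit and ex: namespace IRIs shown here are illustrative assumptions rather than the normative ones.

    # Illustrative prefix declarations (namespace IRIs are assumed, not normative)
    @prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
    @prefix spl:   <http://www.specialprivacy.eu/langs/usage-policy#> .
    @prefix splog: <http://www.specialprivacy.eu/langs/splog#> .
    @prefix svd:   <http://www.specialprivacy.eu/vocabs/data#> .
    @prefix svpr:  <http://www.specialprivacy.eu/vocabs/processing#> .
    @prefix svpu:  <http://www.specialprivacy.eu/vocabs/purposes#> .
    @prefix svr:   <http://www.specialprivacy.eu/vocabs/recipients#> .
    @prefix svl:   <http://www.specialprivacy.eu/vocabs/locations#> .
    @prefix befit: <http://www.example.com/befit#> .      # use-case namespace (assumed)
    @prefix ex:    <http://www.example.com/policies#> .   # example namespace used only in the sketches below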

Usage Policies. Using the SPECIAL usage policy language, basic usage policies can be specified as OWL classes, as shown in Listing 1.2 (represented using the OWL functional syntax for conciseness), whereby the permission to perform SomeProcessing of SomeDataCategory for SomePurpose has been given to SomeRecipient in compliance with SomeStorage restrictions.

Listing 1.2. A basic usage policy expressed as an OWL class
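
As a rough illustration of what such a policy class might look like, the following sketch uses the OWL functional syntax to define a basic usage policy as the intersection of restrictions over the five policy attributes; the property names (spl:hasData, spl:hasProcessing, spl:hasPurpose, spl:hasRecipient, spl:hasStorage) and the ex: class names are assumptions introduced for illustration, not necessarily the terms used in the actual listing.

    EquivalentClasses(
      ex:SomeUsagePolicy
      ObjectIntersectionOf(
        ObjectSomeValuesFrom(spl:hasData ex:SomeDataCategory)
        ObjectSomeValuesFrom(spl:hasProcessing ex:SomeProcessing)
        ObjectSomeValuesFrom(spl:hasPurpose ex:SomePurpose)
        ObjectSomeValuesFrom(spl:hasRecipient ex:SomeRecipient)
        ObjectSomeValuesFrom(spl:hasStorage ex:SomeStorage)
      )
    )

Intuitively, an operation is authorized by this policy if it processes data falling under ex:SomeDataCategory, performs processing of type ex:SomeProcessing for ex:SomePurpose, shares the result only with ex:SomeRecipient, and respects the ex:SomeStorage restrictions.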

Data Processing and Sharing Events. The SPECIAL policy log vocabulary is used to represent data processing and sharing events. The event log extract in Listing 1.3 (represented using Turtle) relates to a new processing event concerning a data subject identified as befit:Sue, which took place on 03.01.2018 at 13:20 (i.e., the validity time) and was recorded a few seconds later (i.e., the transaction time). The actual data captured can be traced via the splog:eventContent property, which is detailed in Listing 1.4 and is usually stored in a separate knowledge base, while a hash of the content is stored in the event log.

Listing 1.3. A data processing event expressed using the SPECIAL policy log vocabulary
Listing 1.4. The content of the data processing event
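
The following Turtle fragment is a simplified sketch of the kind of records Listings 1.3 and 1.4 contain, reusing the illustrative prefixes from above; apart from splog:eventContent, which is named in the text, the class and property names (and the hash value) are hypothetical and shown only to convey the structure.

    # Illustrative log entry (Listing 1.3 style); only splog:eventContent is taken
    # from the text, the remaining names are hypothetical.
    befit:log-entry-9876 a splog:LogEntry ;
        splog:dataSubject      befit:Sue ;
        splog:validityTime     "2018-01-03T13:20:00Z"^^xsd:dateTime ;
        splog:transactionTime  "2018-01-03T13:20:08Z"^^xsd:dateTime ;
        splog:eventContent     befit:event-content-9876 ;
        splog:eventContentHash "d2ab00..." .

    # Illustrative event content (Listing 1.4 style), typically stored in a
    # separate knowledge base.
    befit:event-content-9876
        spl:hasData       svd:Location ;
        spl:hasProcessing svpr:Analyse ;
        spl:hasPurpose    svpu:Health ;
        spl:hasRecipient  svr:Ours ;
        spl:hasStorage    svl:ControllerServers .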

Compliance Checking. In order to verify that data processing and sharing events comply with the corresponding usage policies specified by data subjects, we use OWL reasoning to decide whether the authorized operations specified by a data subject through their given consent subsume the specific data processing and sharing events recorded in the transparency log.
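
Continuing the illustrative names used above, checking whether the logged event complies with Sue's consent amounts to asking the reasoner whether an axiom of the following form is entailed, where ex:SuesConsentPolicy is a hypothetical class built from Sue's consent in the same way as the policy sketched after Listing 1.2:

    SubClassOf(
      ObjectIntersectionOf(
        ObjectSomeValuesFrom(spl:hasData svd:Location)
        ObjectSomeValuesFrom(spl:hasProcessing svpr:Analyse)
        ObjectSomeValuesFrom(spl:hasPurpose svpu:Health)
        ObjectSomeValuesFrom(spl:hasRecipient svr:Ours)
        ObjectSomeValuesFrom(spl:hasStorage svl:ControllerServers)
      )
      ex:SuesConsentPolicy
    )

If the class expression describing the event is subsumed by the consent policy class, the event is compliant; otherwise it does not comply with the given consent.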

4 A Scalable Consent, Transparency and Compliance Architecture

The SPECIAL demo system architecture, which is depicted in Fig. 1, enables transparency and compliance checking based on usage policies and events expressed using the aforementioned vocabularies.

Fig. 1. A scalable consent, transparency and compliance architecture

  • Kafka and ZooKeeper. Data processing and sharing event logs are stored in the Kafka distributed streaming platform, which in turn relies on ZooKeeper for configuration, naming, synchronization, and group services. Each application log is represented as a distinct Kafka topic, while a separate compliance topic is used to store the enriched log after compliance checks have been completed.

  • Virtuoso Triple Store. Based on our current use case requirements, we assume that consent updates are infrequent; as such, usage policies and the respective vocabularies are stored in a Virtuoso triple store.

  • Compliance Checker. The compliance checker, which embeds a HermiT reasoner, uses the consent stored in Virtuoso together with the application logs provided by Kafka to check that data processing and sharing complies with the relevant usage control policies. The results of this check are written to a new Kafka topic; an illustrative sketch of such an enriched record is given after this list.

  • Elasticsearch. As logs can be serialized using JSON-LD, it is possible to benefit from the faceted browsing capabilities of Elasticsearch and the out-of-the-box visualization capabilities provided by Kibana.

  • Consent and Transparency & Compliance Backends. Interaction between the various architectural components is managed by mu.semte.ch, an open-source microservices framework for building RDF-enabled applications.

  • Consent and Transparency & Compliance Dashboards. Users interact with the system via the consent management dashboard and the transparency and compliance dashboard. The former supports granting and revoking consent for data processing and sharing, while the latter provides the data subject with transparency with respect to data processing and sharing events in a digestible manner.
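
As an illustration of what might be written to the compliance topic, the following Turtle sketch enriches the log entry sketched in Section 3 with the outcome of the check; the splog:complianceStatus and related names are hypothetical and serve only to convey the idea of annotating each logged event with its compliance result.

    # Hypothetical enrichment of a log entry with the compliance check outcome
    # (property and class names are illustrative, not part of the published vocabulary).
    befit:log-entry-9876
        splog:complianceStatus splog:Compliant ;
        splog:checkedAgainst   ex:SuesConsentPolicy ;
        splog:checkedAt        "2018-01-03T13:20:15Z"^^xsd:dateTime .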

5 Conclusion

The objective of this demo paper is to introduce the SPECIAL consent, transparency and compliance system, which is built around the Kafka distributed streaming platform. Future work includes benchmarking the various components of the system from a performance and scalability perspective, and hardening it against various security attacks.