1 Introduction

In recent years, organizations and governments have made large volumes of useful Open Data available on the web. Publishers frequently release the data under a license that allows anyone to use, reuse and redistribute it. This enables interested stakeholders to analyze the data, put it in a new context, gain insights, and create innovative services. Using Vienna Open Government data (Footnote 1), for example, one can easily find Points of Interest (POI) such as public barbecue areas or bathing areas on the shores of the Danube River; from the LinkedGeoData (Footnote 2) repository, a tourist can find POIs all over the world, accessible as Linked Data. Open Data, particularly if published as Linked Open Data (LOD) in an interlinked, structured and machine-understandable manner, has large potential to inform decisions and solve problems.

However, end users are not able to directly access, explore, and combine different sources to satisfy their information needs or support their everyday decision-making due to a number of technological barriers: (i) users do not know where to find the required data sources; (ii) provided that they are aware of appropriate sources, they frequently do not have the means and skills to access them; and (iii) if users are able to collect raw data from various sources, they are typically not capable of performing the necessary data processing and data integration tasks manually.

Therefore, end users cannot, as yet, tap the potential of Open Data, but rather have to rely on applications built by others. Research into End User Programming aims to emancipate users from this dependence upon programmers and allow them to satisfy their individual needs with limited up-front learning time investment [10]. In this research tradition, widget-based mashups were developed as a visual programming paradigm that allows end users to compose ad-hoc applications by combining available widgets. Such applications use “content from more than one source to create a single new service displayed in a single graphical interface” [3], thereby increasing the value of existing data.

Following a widget-based mashup approach, we have developed a Linked Widgets platform (Footnote 3) that aims to (i) provide universal practical utility without restrictions on domain or data sources, (ii) allow users to combine multiple Open Data sources and leverage their joint value, and (iii) allow novice users to analyze, integrate and visualize data.

The platform is built upon Semantic Web technologies and its design follows three guiding principles: openness, connectedness, and reusability. Openness distinguishes the platform from similar approaches and is the key for achieving our first objective, i.e., the capability to deal with various data sources. This openness should encourage developers to implement and add new widgets to the platform. End users can reuse and connect these widgets to collect, integrate, and combine data from different sources in a dynamic and creative manner.

To achieve the second goal, we use a graph-based model to semantically describe the input and output of a widget. The platform uses the annotated models to provide semantic search, data model matching, and auto composition. Semantic search is a mechanism for the discovery of widgets that help solve a given information problem. Data model matching allows the platform to highlight compatible widgets (i.e., signal to the user which widgets can be connected). Auto composition is an innovative approach to compose complete applications automatically from a set of widgets. Research on the latter is still at an early stage of development and beyond the scope of this paper.

The remainder of this paper is organized as follows. In Sect. 2, we introduce key terms and outline the Linked Widget life cycle. Section 3 illustrates the potential of the platform by means of a sample use case. Section 4 introduces the Linked Widget model, Sect. 5 outlines the widget development process, and Sect. 6 discusses how widgets from different developers are connected. Section 7 provides pointers to related work and we conclude in Sect. 8 with an outlook on future research.

2 Linked Widget Life Cycle

Before we define the Linked Widget life cycle, it is necessary to define a set of basic terms and concepts. A widget is an “interactive single purpose application for displaying and updating local data or data on the Web, packaged in a way to allow a single download and installation on a user’s machine or mobile device” (Footnote 4). Widgets can make use of web services or web Application Programming Interfaces (APIs). Furthermore, they can access existing Open Data sources such as Open Governmental Data (OGD) and Linked Data.

Linked Widgets [18] are the key concept that our platform is based upon. They extend standard widgets with a semantic model following Linked Data principles. The semantic model describes data input/output and metadata such as provenance and license. In particular, the model consists of four main components: (i) input terminals, (ii) output terminals, (iii) options, and (iv) a processing function. Input/output terminals are used to connect widgets in a mashup and represent the data flow. Options are HTML inputs inside a widget. They provide a mechanism for users to control a widget’s behavior. Finally, the processing function defines how widgets receive input and return their output.

We distinguish three types of widgets: data widget, process widget, and presentation widget. A data widget retrieves data from a data source and provides the collected data to other widgets. Hence, it has no input terminals. A process widget takes input data from other widgets, applies operations on the data, and provides the result to other widgets. It has both input and output terminals. A presentation widget has at least one input terminal and presents the data from another widget in a particular manner (e.g., textually or visually). It has no output terminals.

A Mashup is an interconnected combination of widgets. It should contain at least one data widget providing the data and one presentation widget to display the final results.
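The three widget types and the minimal mashup condition can be expressed in terms of terminal counts alone. The following is a minimal sketch of that reading, not the platform's implementation: the `Widget` class, the function names, and the terminal counts used in the usage note are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Widget:
    """Hypothetical record holding only a widget's terminal counts."""
    name: str
    input_terminals: int
    output_terminals: int


def widget_type(widget: Widget) -> str:
    """Classify a widget by the kinds of terminals it exposes."""
    has_in = widget.input_terminals > 0
    has_out = widget.output_terminals > 0
    if has_out and not has_in:
        return "data"            # retrieves data, no input terminals
    if has_in and has_out:
        return "process"         # transforms data from other widgets
    if has_in and not has_out:
        return "presentation"    # displays data, no output terminals
    raise ValueError(f"{widget.name} has no terminals")


def is_complete_mashup(widgets: list) -> bool:
    """A mashup should contain at least one data and one presentation widget."""
    types = {widget_type(w) for w in widgets}
    return "data" in types and "presentation" in types
```

For instance, a widget with zero inputs and one output classifies as a data widget, and a list holding only process widgets fails `is_complete_mashup`.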

Figure 1 illustrates the Linked Widgets life cycle. Developers first implement and deploy their widgets on arbitrary servers; then, they pass the Uniform Resource Locator (URL) of the widget to the widget annotator module. This module allows them to add semantic annotations to their widgets. Next, the platform manager checks whether (i) the widget is accessible via the annotated URL and (ii) the vocabularies used conform to those of existing widgets, which enforces consistency throughout the platform. After this validation process, the created widget model is stored permanently as Linked Data and is accessible through a SPARQL endpoint. A published widget can be revoked if, for instance, its corresponding Open Data source is no longer available. The platform uses Pubby (Footnote 5) to provide dereferenceable Uniform Resource Identifiers (URIs).

Once widgets are made available on the platform, users can use them in their mashups. To this end, because the number of widgets and related Open Data sources may be large, users may first need to search for appropriate widgets. By executing a SPARQL query over the semantic data repository of widget models in the background, the platform allows users to (i) search for widgets based on their input or output model, and (ii) for a selected terminal of a selected widget, find all terminals of other widgets that are compatible with it.

Fig. 1. Linked Widget life cycle

The final result of a mashup is directly displayed on the platform. Alternatively, it can also be shown via the mashup publication module and be shared and published on other websites. Users can also package a mashup as a new widget.

In conclusion, the platform is versatile, open and extensible. Maintaining this platform is economical since developers can store widgets externally and both the data retrieval and data processing tasks take place either in the client’s browser or on the widget server. Finally, although widgets come from different developers and end users, they can be reused in an efficient manner: (i) users can creatively combine Linked Widgets from different developers to compose LOD applications, (ii) they can reuse LOD applications from others, but change the parameters of the constituent widgets, (iii) they can reuse a composed LOD application as a new widget, and (iv) based on available widgets, developers can implement new widgets to support new use cases.

3 Open Data Exploration

As the Open Data paradigm gains broad support, an increasingly relevant challenge is to provide means to utilize these data. The process of working with published Open Data sources should be as intuitive as possible, especially for users without a technical background. In this section we present a motivational scenario that illustrates how the platform allows users to handle Open Data sources in an innovative fashion and fosters (re)use of data.

We organize widgets into widget collections addressing different problem domains. Each collection might use various Open Data sources. For the example use case, consider a tour guide collection of seven widgets that combine data from Google Maps, Last.fm, Flickr, and LinkedGeoData.

  • Map Pointer: Users can define a point on a map. The point’s latitude and longitude are then returned as output.

  • Music Artist Search: Via the last.fm API (Footnote 6), this widget accepts an artist’s name as input and returns the corresponding URL.

  • Music Event by Artist: Based on an artist URL this widget returns events this artist participates in while providing a time and event name filter. This is also done with the last.fm API.

  • Point of Interest Search: This widget leverages the LinkedGeoData repository to find semantically encoded POIs. Users can influence the output by providing parameters. Users can select the type of POI and the radius of retrieved POIs with respect to the incoming location.

  • Flickr Geo Image Search: By using the Flickr Image Search API (Footnote 7), this widget enriches location data with images. Users may specify a radius and result limit.

  • Google Map: This visualization widget displays points on a map. It is typically used to display the final results of a mashup.

  • Geo Merge: This widget merges two lists of point data into a single list of pairs based on their distance. Users can specify a minimum and a maximum distance between points. The Geo Merge widget therefore serves two purposes, i.e., merging of two inputs into one output and filtering based on distance constraints.
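The merge-and-filter behavior described for Geo Merge can be sketched as follows. This is an illustration under stated assumptions rather than the widget's actual code: the haversine great-circle distance, kilometres as the unit, the function names, and the dictionary layout (a `location` entry with `lat` and `long`) are all choices made here for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, assumed for this sketch


def haversine_km(a: dict, b: dict) -> float:
    """Great-circle distance in km between two {"lat", "long"} points."""
    lat1, lon1 = math.radians(a["lat"]), math.radians(a["long"])
    lat2, lon2 = math.radians(b["lat"]), math.radians(b["long"])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def geo_merge(left: list, right: list, min_km: float, max_km: float) -> list:
    """Pair every object in `left` with every object in `right` whose
    distance lies within [min_km, max_km]; all other pairs are dropped."""
    pairs = []
    for p in left:
        for q in right:
            distance = haversine_km(p["location"], q["location"])
            if min_km <= distance <= max_km:
                pairs.append((p, q))
    return pairs
```

The nested loop compares all pairs, which is adequate for the short lists a mashup typically passes around; a spatial index would be the natural refinement for larger inputs.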

Fig. 2. A mashup example

Figure 2 shows one of the mashups in this collection. It covers the following scenario: We are traveling to a city X and want to know whether our favorite music artist will give a concert there. After the concert, we want to get a drink at a bar near the venue. Is there any combination of a music event and a bar that satisfies these conditions?

Google Map displays the enriched result. In our example, we get four pairings of a bar and a nearby music event that the artist is involved in. Each bar and music event combination is enriched with illustrative Flickr images and a URI pointing to the corresponding entities at LinkedGeoData and Last.fm.

Many other combinations of widgets are possible. The platform ensures the semantic validity of widget combinations. For example, we can find (i) all past or future music events for our artist on the map by wiring Music Event by Artist directly to Google Map, or (ii) all POIs near a defined place by connecting Point of Interest Search to Google Map.

4 Linked Widget Model

We enrich the widget’s input and output data with semantic models. These semantic I/O models are essential for the subsequent search and composition processes. Furthermore, they are crucial for the effective sharing of widgets. For example, even when the number of widgets available is limited, e.g., 43 for Yahoo! Pipes [15] and 300+ for Microsoft Popfly (Footnote 8), finding appropriate widgets needed to build a particular mashup solution is already a difficult task. Existing mashup platforms usually employ a text-based approach for widget search, which is not particularly helpful for advanced widget exploration and widget composition tasks.

Figure 3 presents a part of our ontology for the modeling of Linked Widgets. Using Semantic Web technologies to describe mashups and their components is not by itself a novel approach (cf. [12, 14]). However, rather than capturing the functional semantics and focusing on input and output parameters like SAWSDL [7], OWL-S (Footnote 9), or WSMO (Footnote 10), we use a graph-based model [16, 17, 19] to formally annotate the input and output components as well as their relations. The SWRL vocabulary is reused to define the semantic relation between two nodes in the input and output graphs.

Figure 3 also shows the detailed model of the Geo Merger widget. The widget takes two arrays of arbitrary objects containing the wgs84:location property as input. Its domain is the Point class with two literal properties, i.e. lat and long. The widget output is a two-dimensional array in which each row represents two objects from two input arrays, respectively. Those objects include locations satisfying the distance filter of the Geo Merger widget.

Fig. 3. General Linked Widget model and Geo Merger model

To specify that input/output is an array of objects, we use the literal property hasArrayDimension (\(0\): single element; \(n > 0\): \(n\)-dimensional array). Because the input of Geo Merger is an “arbitrary” object, we apply the owl:Thing class to represent it in the data model.

The point, location, lat and long terms are available in different vocabularies. However, since a well-established ontology facilitates data exchange between widgets, we chose wgs84. The widget annotator module interactively recommends frequently used terms of the most popular vocabularies to developers. This eases the annotation process and fosters consistency by diminishing the use of varying terms to describe the same concepts.

With SPARQL queries, we can find a widget that receives/outputs an object containing, for instance, geographic information. Moreover, based on an input terminal, e.g., the first input of Geo Merger, we can find all output terminals that can be connected (cf. Listing 1.1). Conditions that have to be satisfied for the terminals are: (i) matching type and array dimension; and (ii) matching attributes, i.e., the set of attributes required by the input terminal must be a subset of the attributes provided by the output terminal. In the tour guide collection, the outputs of Map Pointer, Point of Interest Search, Flickr Geo Image Search, and Music Event by Artist satisfy these conditions.
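The two matching conditions can also be stated as a small predicate over terminal descriptions. The sketch below is an in-memory illustration only; the platform evaluates these conditions with SPARQL over the stored widget models. The `Terminal` class, its field names, the example terminals, and the treatment of owl:Thing as a wildcard type are assumptions introduced for this example, and no subclass reasoning is attempted.

```python
from dataclasses import dataclass

ANY_TYPE = "owl:Thing"  # assumed to accept any output type in this sketch


@dataclass(frozen=True)
class Terminal:
    """Hypothetical, simplified description of a widget terminal."""
    widget: str
    rdf_type: str
    array_dimension: int   # 0: single element; n > 0: n-dimensional array
    attributes: frozenset  # properties required (input) or provided (output)


def is_compatible(inp: Terminal, out: Terminal) -> bool:
    """True if output terminal `out` may be wired to input terminal `inp`."""
    type_matches = inp.rdf_type == ANY_TYPE or inp.rdf_type == out.rdf_type
    dimension_matches = inp.array_dimension == out.array_dimension
    # Every attribute the input requires must be provided by the output.
    attributes_match = inp.attributes <= out.attributes
    return type_matches and dimension_matches and attributes_match


def compatible_outputs(inp: Terminal, outputs: list) -> list:
    """Names of the widgets whose output terminals can feed `inp`."""
    return [out.widget for out in outputs if is_compatible(inp, out)]
```

An output terminal that provides more attributes than the input requires still matches, because only the subset relation is checked.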

Similarly, we can model a more advanced widget. Its input and output have object attributes and there can be relations between those objects. For example, when modeling the Geo Merger, if required, we can present the nearby relation between the two input points as shown in Fig. 3. Due to the graph-based description, the platform can answer questions such as “find all widgets containing the nearby relation between two locations”.

Listing 1.1.

5 Widget Development

Widgets may be created in two ways: either developers program them or end users create a mashup and save it as a new widget. In the former case, developers follow three steps: (i) inject a JavaScript file from the platform into the widget to equip it with the capability of cooperating with others, (ii) define the input and output configuration, and (iii) implement the JavaScript run(data) function, which defines how the widget processes input data. If a widget has no input terminal, the corresponding data object is null. Otherwise, during runtime, the platform collects data from all relevant output terminals to build the data object and passes it to the run function as a parameter.

Developers can use arbitrary web languages. The widget annotator module can automatically generate a skeleton of the widget as well as sample data. Developers then only need to implement the processing function. We expect this simplicity of widget development will foster developers’ productivity and creativity. Users hence have more means to explore and combine different Open Data sources.

6 Widget Cooperation

Technically, a Linked Widget is an HTML iframe wrapped in a widget skin. The platform automatically creates the skin to provide additional functionality for the widget, such as creating input and output terminals, running, caching, viewing output data, and resizing.

Iframes can trigger events, which contain messages. These messages are then consumed by other iframes that have registered a listener for these events. Based on that, we implement a communication protocol that addresses the challenge of reusing and connecting different applications on top of Open Data. The protocol is transparent to developers and can be easily extended to fit new use cases that require different types of data such as stream data or batch processed data.

As an example of how the protocol facilitates communication at runtime, consider a mashup with three widgets \(A{\rightarrow }B{\rightarrow }C\). Typically, when a user triggers an action to run a widget, e.g., widget \(C\), this action requires all widgets that provide input to this widget to run first. Because widget \(C\) requires the output from widget \(B\), which, in turn, requires output from widget \(A\), widgets \(A\) and \(B\) need to run first. Figure 4 shows the messages transferred between the platform and the widgets. The first two messages are delivered when widgets are created for the mashup. The platform sets identifiers for all widgets and then receives their terminal configurations. The communication takes place entirely in the client’s browser. After a user has created the mashup, the platform server is no longer needed because the browser and the widgets’ servers do the computation. This process reduces the platform-server load and improves performance and scalability.
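The run order for a chain such as \(A \rightarrow B \rightarrow C\) can be illustrated with a short recursive sketch. It models only the ordering and the hand-over of data to each widget's processing function, not the iframe messages themselves, and it is a Python analogue of the JavaScript run(data) convention. The function and parameter names are hypothetical, the wiring is assumed to be acyclic, and passing the collected inputs as a list is an assumption about the shape of the data object.

```python
def run_widget(target, upstream, run_functions, cache=None):
    """Run `target` after recursively running every widget that feeds it.

    upstream:      maps a widget id to the ids wired to its input terminals
    run_functions: maps a widget id to its processing function run(data)
    cache:         outputs already computed, so each widget runs only once
    """
    if cache is None:
        cache = {}
    if target in cache:
        return cache[target]
    sources = upstream.get(target, [])
    if sources:
        # Collect the outputs of all widgets connected to the inputs.
        data = [run_widget(s, upstream, run_functions, cache) for s in sources]
    else:
        # A widget without input terminals receives a null data object.
        data = None
    cache[target] = run_functions[target](data)
    return cache[target]


# Usage: running C forces A and then B to run first.
order = []
functions = {
    "A": lambda data: order.append("A") or [1, 2, 3],
    "B": lambda data: order.append("B") or [x * 2 for x in data[0]],
    "C": lambda data: order.append("C") or sum(data[0]),
}
result = run_widget("C", {"B": ["A"], "C": ["B"]}, functions)
```

In this usage example, `order` ends up as `["A", "B", "C"]`: the request to run the last widget propagates upstream before any processing function executes.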

Fig. 4. An example of widget communication

7 Related Work

Researchers have been developing mashup-based tools for years. Many of them are geared towards end users and aim to allow them to efficiently create applications by connecting simple and lightweight entities.

Aghaee and Pautasso [1] provide a good overview of mashup approaches. They discuss open research challenges which we, at least partly, address with our platform. For instance, we address the Simplicity and Expressive Power Tradeoff challenges through a semantic model. They also evaluate Yahoo! Pipes [15], IBM Mashup Center (Footnote 11), Presto Cloud (Footnote 12), and ServFace [11]. A common limitation they identify for all these platforms is that the wiring paradigm is hard to grasp for non-expert end users. We aim to overcome this barrier, for instance, by recommending valid wiring options to the user.

Other surveys of the mashup literature [2] have developed a number of evaluation criteria and identified shortcomings of existing approaches. Computer scientists addressed some of these shortcomings in more recent contributions, but others remain an open challenge. Grammel and Storey [4] review six different approaches and identify potential areas of improvement and future research. For instance, they argue that context-specific suggestions could support learning how to build and find mashups. Regarding user interface improvements, they note that designing mechanisms such as automatic mashup generation to provide starting points to end users would enhance usability drastically. This feature is also provided by the platform presented in this paper. However, detecting invalid mashups still remains a challenge that requires appropriate debugging mechanisms for non-programmers.

Super Stream Collider (SSC) [5], MashQL [6], and DERI Pipes [8] are three platforms aimed at semantic data processing. Whereas SSC consumes live stream data only, MashQL allows users to easily create a SPARQL query, using its custom query-by-diagram language. MashQL cannot aggregate data from different sources and its output visualization only supports text and table formats. DERI Pipes requires users to be familiar with Semantic Web technologies, SPARQL queries, and programming to perform semantic data processing tasks from different data sources. Several other platforms, which we mention only briefly, include Vegemite [9], Paggr [13], and Marmite [20]. They all follow a mashup-based approach to ease users’ access to data sources, but unlike our approach, they do not make use of semantic models.

8 Conclusion and Future Work

This paper presents an overview of an extensible, generic, open, economical, and sharable mashup platform for the exploration of Open Data. The platform is aimed at end users without knowledge of Semantic Web technologies or programming skills. We encapsulate semantic, graph-based models inside Linked Widgets and provide mechanisms to annotate their inputs and outputs. Leveraging these annotations, the platform combines and searches widgets based on defined semantics.

Because we encourage developers to contribute their widgets to the platform and widget development is language-independent, we expect to have a large number of versatile widgets in the future. As a consequence, users can explore more Open Data sources and easily collect and combine desired information.

In the future, this system should serve as a universal data platform and bring together both mashup developers and mashup users. This will allow users to work with different kinds of data sources (e.g., governmental, financial, environmental data etc.) and types of data (e.g., open, linked, tabular data etc.), without technical barriers. From a data perspective, the vision is to support people in their everyday decision-making.

From a technical perspective, we aim to provide a mashup platform for the exploration of Open Data following Semantic Web design principles. We also want to push these ideas a step further by lifting non-semantic data to a semantic level and leveraging its full potential.

Future research will focus on using the semantic models, especially for the widget auto-composition feature. We also need to improve the model-matching algorithm by utilizing ontology alignment techniques, since two models can use different resources from different ontologies. Another interesting direction for future research is the automatic creation of new widgets able to handle dynamic web data as an input source.