Conference Papers
The Information Workbench as a Self-Service Platform for Linked Data Applications
Peter Haase, Christian Hütter, Michael Schmidt, Andreas Schwarte.
Presented at WWW 2012, Lyon (France).
Read full paper
GovWILD: Integrating Open Government Data for Transparency
Christoph Böhm, Markus Freitag, Arvid Heise, Claudia Lehmann, Andrina Mascher, Felix Naumann, Vuk Ercegovac, Mauricio Hernandez, Peter Haase, Michael Schmidt.
Presented at WWW 2012, Lyon (France).
Read full paper
FedBench: A Benchmark Suite for Federated Semantic Data Query Processing
Michael Schmidt, Olaf Görlitz, Peter Haase, Günter Ladwig, Andreas Schwarte, Thanh Tran.
In Proc. ISWC 2011, Bonn (Germany).
Read full paper
FedX: Optimization Techniques for Federated Query Processing on Linked Data
Andreas Schwarte, Peter Haase, Katja Hose, Ralf Schenkel, Michael Schmidt.
In Proc. ISWC 2011, Bonn (Germany).
Read full paper
The Information Workbench as a Self-Service Platform for Linked Data Applications
Peter Haase, Michael Schmidt, Andreas Schwarte.
In 2nd Intl. Workshop on Consuming Linked Data (COLD), 2011, Bonn (Germany).
Read full paper
A-R-E: The Author-Review-Execute Environment
Wolfgang Müller, Isabel Rojas, Andreas Eberhart, Peter Haase and Michael Schmidt.
Read full paper
FedX: A Federation Layer for Distributed Query Processing on Linked Open Data
Andreas Schwarte, Peter Haase, Katja Hose, Ralf Schenkel, and Michael Schmidt.
Read full paper
A Scalable Kernel Approach to Learning in Semantic Graphs with Applications to Linked Data
Veli Bicer, Thanh Tran, and Anna Gossen. Accepted at the Extended Semantic Web Conference (ESWC 2011), Heraklion, Greece, June 2011.
Read full paper
Semantic Technologies for Enterprise Cloud Management
Peter Haase, Tobias Mathäß, Michael Schmidt, Andreas Eberhart, Ulrich Walther. Accepted at the International Semantic Web Conference, ISWC 2010, Shanghai, China.
Read full paper
An Evaluation of Approaches to Federated Query Processing over Linked Data
Peter Haase, Tobias Mathäß, Michael Ziller. Accepted at I-SEMANTICS 2010, Graz, Austria.
Read full paper
SBML2SMW: bridging System Biology with semantic web technologies for biomedical knowledge acquisition and hypothesis elicitation
Tobias Mathäß, Peter Haase, Hiroaki Kitano, Luca Toldo. Accepted at the 2nd Workshop of Ontologies in Biomedicine and Life Sciences, OBML2010, Mannheim, Germany.
Read full paper
Integrierte Informationsverwaltung für die Lebenswissenschaften mit der Information Workbench
Tobias Mathäß, Peter Haase. Accepted at the 2. Workshop über Daten in den Lebenswissenschaften auf der Informatik 2010, Leipzig, Germany.
Read full paper
Usability of Keyword-driven Schema-agnostic Search – A Comparative Study of Keyword Search, Faceted Search, Query Completion and Result Completion
Thanh Tran, Tobias Mathäß, Peter Haase. Accepted at the Extended Semantic Web Conference (ESWC 2010), Heraklion, Greece, June 2010.
Technical Reports
The Information Workbench – Interacting with the Web of Data
Peter Haase, Andreas Eberhart, Sebastian Godelet, Tobias Mathäß, Thanh Tran, Günter Ladwig, Andreas Wagner, October 2009.
We present the Information Workbench, an application for interacting with the Web of Data. The Information Workbench manages large amounts of structured and unstructured information, which may be imported and integrated from existing sources, and also allows end users to annotate, complete and update information collaboratively. New paradigms for accessing information include hybrid search across structured and unstructured data, keyword search combined with faceted search, and semantic query completion and interpretation, which helps users express complex information needs by automatically translating keyword queries into hybrid queries. A Living UI based on widgets for interacting with the data enables a homogeneous, seamless, continuous and personal experience.
Recently Completed Student Theses
Entity-based Search Contextualization
Holger Lamm, January 2013, Karlsruhe Institute of Technology (KIT)
The ongoing success of the Semantic Web is leading to a rapidly growing number of Linked Data sources. This trend poses new challenges for the search and discovery of data sources. We therefore introduce meta descriptions that allow sophisticated search procedures to be applied. To further improve search efficiency, we present an approach for finding contextualizing data sources based on entity clustering. Recommendations of such related data sources can support either humans or automated systems in their search. We demonstrate the practicability of our approach by implementing a real use case and show its efficiency in two comprehensive user experiments.
Read full paper
FedX: Optimization Techniques for Federated Query Processing on Linked Data
Andreas Schwarte, July 2011, Saarland University
Motivated by the ongoing success of Linked Data and the growing amount of semantic data sources available on the Web, new challenges to query processing are emerging. Especially in distributed settings that require joining data provided by multiple sources, sophisticated optimization techniques are necessary for efficient query processing. We propose novel join processing and grouping techniques to minimize the number of remote requests, and develop an effective solution for source selection in the absence of preprocessed metadata. We present FedX, a practical framework that enables efficient SPARQL query processing on heterogeneous, virtually integrated Linked Data sources. In experiments, we demonstrate the practicability and efficiency of our framework on a set of real-world queries and data sources from the Linked Open Data cloud. With FedX we achieve a significant improvement in query performance over state-of-the-art federated query engines.
Read full paper
Vergleich semantischer Suchparadigmen
Tobias Mathäß, January 2010, Karlsruhe Institute of Technology (KIT)
Semantic search systems are becoming increasingly important in the age of the Semantic Web. The organization of data in the Web of Data, with explicit relations between data objects, enables queries that go far beyond searching for keywords that occur in documents, as implemented in classical search systems. This diploma thesis presents several search paradigms, compares them with one another, integrates them into an existing infrastructure, and examines their suitability for different kinds of queries in a user study.
Information Filtering Using an Automated Widget Selection Algorithm
Daniel Kurtsiefer, September 2009, International University in Germany, Bruchsal.
Semantic technology is one of the current hot topics in research and industry. However, when building an application we face the problem that structured data is hard to perceive. Widgets offer a possible solution, as they can be programmed to operate on structured data and turn table entries into graphical representations that are easier to understand. The problem, however, is how to select fitting widgets to represent a given set of data. If left to the user, this task is a tedious, time-consuming and error-prone process, which becomes even more severe as the amount of data and the number of widgets to choose from grow. This thesis discusses a new automated widget selection algorithm, based on a genetic algorithm, that chooses the right set of widgets to display a set of data and optimizes this set according to aspects such as coverage of the entity’s properties and redundancy among the widgets. With this algorithm, selecting fitting widgets and maintaining the entity in terms of widget coverage becomes easier and less time-consuming, while scaling well with increasing numbers of both properties and widgets.
The eCloudManager – Versioning Framework
Hanjo Viets, August 2009, International University in Germany, Bruchsal.
It is hard to find management software on the market that takes care of software versions and deploys them on the virtual machines (VMs) of a Landscape as a Service (LaaS) cloud. As cloud computing and server virtualization become more and more important, a versioning tool for software distribution becomes essential. The SAP Center of Excellence needs to manage hundreds of hosts, and the software running on them needs to be versioned and updated with little effort, on many hosts at once. To solve this problem, a system that takes care of software versions is needed. Versioning is done for a variety of reasons: distinguishing development states, identifying customer versions in case support is needed, and rolling back to a proven and tested version if problems occur are only some of them. The eCloudManager – Versioning Framework developed in this thesis will be able to distribute software versions to a number of clients. In addition, it will monitor these clients in order to identify the software versions they run.
The eCloudManager – Versioning Framework will make the distribution of software versions easier, faster and more fail-safe.
Organizing and Provisioning Local Hardware Resources as a Cloud Infrastructure
Johannes Lorey, August 2009, Universität Karlsruhe.
Due to the distribution of powerful computers and to the establishment of networks connecting them, users all over the world have access to a vast pool of software and hardware resources. To utilize these resources, they usually do not need to be physically present at the respective location. Instead, the Internet is employed to enable access to various services offering different features and functionality.
These services and the infrastructure required to host them are usually summarized by the term Cloud Computing. Cloud Computing incorporates a number of diverse concepts, which differ primarily in the amount and flexibility of the resources they grant to the user. One variant is referred to as Infrastructure as a Service and offers remote control over a complete abstract computer, a so-called Virtual Machine (VM).
Amazon Web Services LLC has become one of the most successful commercial providers in this segment with its Elastic Compute Cloud (EC2). EC2 grants various benefits to its customers. For example, using virtual machines may compensate for required hardware infrastructure that is not at hand, offering dynamic deployment and usage on demand as well as pay-per-use billing. Additionally, the simplified allocation and utilization of IT resources increases user satisfaction.
Various companies are employing the Elastic Compute Cloud to outsource hardware resources because of its flexibility. However, a number of its restrictions and limitations may render this utilization problematic or redundant. First of all, the resources within the rented infrastructure are typically already present at the local data center, thereby eliminating the need for additional hardware. Furthermore, storing sensitive information on a publicly accessible service may pose a security threat for a company.
To overcome these dilemmas, this work introduces the implementation of a local Cloud Computing solution that offers the same features and interfaces as EC2. In addition to being interoperable with Amazon’s service, the implementation focuses on integrating and consolidating heterogeneous hardware and software products so that companies can utilize their established infrastructure effectively and efficiently.
Hence, different end-user and resource-provider requirements for the proposed software are identified and discussed. Based on this analysis, the introduced solution enables transparent integration of various virtualization and storage technologies. Moreover, the work at hand illustrates how replicating the concepts and interfaces of the original Elastic Compute Cloud ensures compatibility with current and future versions of Amazon’s service.
A Unified API Framework for Exposing Application Logic
Andreas Schwarte, August 2009, International University in Germany, Bruchsal.
In contemporary computer architecture, the separation of backend logic and frontend interfaces is a requirement that has to be considered when developing new software. It is commonly desired to publish and provide certain services for developers and customers in the form of a public API. With regard to today’s middleware, developers face numerous limitations: no complete and compact middleware solution is available that integrates common core technologies and services for exposing backend functionality to various endpoints.
The idea of this work is to discuss and establish a unified API framework that enables exposing application logic to various channels: the software’s backend implementation should be annotated with metadata and documentation only once, and the lightweight API framework automatically exposes the available functionality to supported channels such as CLI, Java RMI, SOAP or interactive scripting shells. In addition to supporting various channels, the API framework will integrate an aspect handling system and common middleware services for authentication, session management, directory access with LDAP and role-based security. This new approach will support and enable dynamic and interactive access to exposed application functionality and thus benefit developers in the software engineering process in terms of time- and cost-efficient API development.
Scalable Triple Stores – Sesame on the Google App Engine
Christopher Georg Haccius, August 2009, International University in Germany, Bruchsal.
RDF and the Semantic Web are new technologies of rapidly increasing importance in the areas of gathering and linking data and knowledge. Constantly growing datasets require scalable datastores to meet the increasing storage and processing requirements of RDF stores and inferences.
fluid Operations offers a product that makes use of RDF data and uses the Sesame API to manage the RDF store. The scope of this thesis is to test whether the Google App Engine, in combination with the Google DataStore, offers an efficient backend service for running the Sesame API.
First, the technologies available and used are researched and explored. Second, conceptual ideas of how a mapping from the Sesame structure to the Google DataStore might look are created and explored. The next step is to implement the conceptual ideas. Finally, the implementation is tested and the Google App Engine is evaluated for its use with an RDF store.
The Google App Engine is not the fastest available backend service when it comes to persisting data, which is especially due to the limited processing capacities of the Google servers. With respect to queries the retrieval of data from the data store returned promising results.
Due to many technical difficulties, the use of the Sesame API on the Google App Engine cannot be recommended right now. However, as both the App Engine and Sesame are still evolving technologies, and improvements and new features are added on a very regular basis, it is only a matter of time until the two technologies can be combined into a powerful RDF storage and management solution for private users. The use of this setup for enterprise applications still requires further research.
Einbindung von Cloud Computing- und Storage-Diensten in Virtual Private Data Centers aus der Reseller Perspektive
Andreas Hoffmann, July 2009, Universität Karlsruhe.
This thesis examines how cloud computing and storage services can be integrated into the operation of virtual private data centers run in the system landscape at fluid Operations GmbH and SAP AG’s Center of Excellence. The clouds used here are Amazon EC2 and S3. The integration into an existing administrative tool is considered as an additional aspect of the thesis.
fluid Operations’ eCloudManager is applied as the main administration tool for the operation of the existing system landscape. For this purpose the features of the Amazon EC2 are implemented in the internal architecture of the eCloudManager.
In preparation for the integration of cloud computing resources, an implementation of the local template concept is developed on top of Amazon EC2’s functionality. In the process, problems that occur when using SAP systems and that hinder an efficient integration are identified.
For the integration of Cloud Storage, various options suitable for backups are analyzed. For this purpose, an enhancement for Alfresco JLAN that makes Amazon S3 available over CIFS, FTP and NFS is designed and implemented. With an additional JLAN enhancement for VMFS read access, JLAN can be used universally in the given environment.
Rahmenwerk zur regelbasierten Überwachung und Steuerung von Cloud Infrastrukturen
Carsten Glose, May 2009, Universität Karlsruhe.
In this thesis, a rule-based framework for monitoring and controlling cloud infrastructures is designed and implemented. Cloud computing has several advantages for the user. However, it should be ensured that users can actually trust the availability of services in the cloud. To uphold its service level agreements, the company that operates the cloud computing infrastructure must be able to monitor that infrastructure with a management solution. Monitoring has become more complicated with the use of virtualization technology in cloud computing environments. The administration of cloud computing landscapes has also become more difficult in comparison to classic data centers.
The main task of this thesis is to extend the evaluation component of the SAP Center of Excellence’s (CoE) existing cloud management solution, which is developed by fluid Operations AG, so that the system recognizes exceptional situations and takes actions such as sending notification emails or running a custom response script. Such data analysis is not trivial, given the amount of data to be evaluated and the inconsistencies in the data. A further notification component implemented in this thesis keeps the system from sending unnecessary and irrelevant information that would overwhelm the administrator.
The starting point is an analysis of existing monitoring solutions that address this problem. Subsequently, the results of the needs and requirements analysis are presented to the Center of Excellence. The policy language and the technology to be used are decided based on the outcome of the analysis. The next step is to integrate the Drools Rule System into the management application and enhance it for the purpose of data evaluation.
The result of this thesis is the extension of this management application with a reporting and a notification component based on the Drools Rule System. The improvements created in this thesis are used in production environments and successfully support the cloud infrastructure at the SAP Center of Excellence.
Applikationsmonitoring und Directory Service für SAP Virtual Private Datacenter in der Cloud
Sebastian Schmidt, January 2009, Universität Karlsruhe.
The development of data centers from individual, local mainframes, through clusters of commodity standard servers, to widespread grids has always raised new requirements for monitoring and management software. The latest development in the area of Cloud Computing is no exception. In addition to the requirements of monitoring distributed resources, virtualization now introduces new requirements. Besides the known cloud alternatives, Infrastructure as a Service (provisioning of processing and storage power), Platform as a Service (provisioning of a platform for application development and hosting), and Software as a Service (software provisioning), there is also an area in which all of these alternatives are combined. The so-called Landscape as a Service (LaaS) is used to make complex software that is not, or only to a limited extent, multi-tenant capable available over the Internet as Software as a Service. The concept is thus directed at companies that want to outsource their entire data centers, including hardware, software, maintenance, and provisioning. SAP is carrying out a project in Walldorf that tests the LaaS Cloud for SAP products (Center of Excellence, CoE); it addresses large SAP customers and allows them to implement proof-of-concept projects. For this purpose, fluid Operations GmbH has developed monitoring and management software (SAP Manager) that enables centralized monitoring of the hosted data centers (Virtual Private Data Center, VPDC). Moreover, information and performance data on virtual and physical resources is gathered from various sources and consolidated centrally in the backend of the SAP Manager.
This thesis deals with the further development of this software, in order to come one step closer to the vision of an Autonomic Data Center with a self-service portal for customers. The developments are based on a comprehensive requirements assessment at the CoE in Walldorf and on an analysis of existing monitoring solutions in the fields of application, grid and cloud monitoring.
The analysis identifies a greater need for SAP application monitoring, as well as for the monitoring and management of VPDC network configurations (DNS, DHCP, firewall, routes). Therefore, this thesis deals with connecting VPDCs and SAP applications to the SAP Manager. This is realized through an agent on the VLM, which is the central component of each VL. In addition to systems monitoring, the CoE also requires triggered monitoring information to support work processes.
One of these workflows, which is also supported in this thesis, deals with operating system updates for the individual virtual machines (VMs) of a VPDC. In the SAP environment this is not a trivial problem, since the interdependent SAP instances must be identified, shut down in the correct order after the update, and restarted after the subsequent restart of the VM. The software created in this thesis is already being used efficiently and has led to numerous improvements in the monitoring and management of the SAP LaaS Cloud.