FedX is a practical framework for transparent access to Linked Data sources through a federation. It incorporates new sophisticated optimization techniques combined with effective variants of existing techniques and is thus a highly scalable solution for practical federated query processing.
FedX makes it easy to set up on-demand federations by specifying a list of relevant endpoints (e.g., from the LOD cloud) and to query these federations via SPARQL in a transparent and efficient way.
- Virtual Integration of heterogeneous Linked Data sources (e.g. as SPARQL endpoints)
- Transparent access to data sources through a federation
- Efficient query processing in federated environments
- On-demand federation setup at query time
- Fast and effective query execution due to new optimization techniques for federated setups
- Practical applicability & easy integration as a Sesame SAIL
- Comprehensive CLI for federated query processing from the command line
FedX is built on top of Sesame 2.6 and provides a practical federation layer implemented as a Sesame SAIL. It virtually integrates Linked Data sources using any Sesame SAIL mediator (e.g., a SPARQL endpoint or a NativeStore), and the (virtually) combined RDF graph of those sources can be used for federated query processing. Going beyond the federation support defined in the SPARQL 1.1 federation extensions, FedX makes it possible to dynamically set up federations over distributed sources and efficiently execute standard SPARQL queries (i.e., queries without federation extensions) transparently over the federation. FedX incorporates the following optimization techniques to allow for efficient query processing in a distributed setting:
- Statement sources: Examine relevant statement sources using SPARQL ASK queries
- Join order: Join reordering using a variable counting technique and heuristics
- Bound joins: Compute joins as block nested loop joins
- Exclusive groups: Group statements with same relevant source
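Three of the techniques above (source selection, exclusive groups, and join ordering) can be illustrated with a small, self-contained Python sketch. This is not FedX code: the endpoints are hypothetical in-memory triple sets, `ask` merely stands in for a SPARQL ASK request, and the names `select_sources`, `exclusive_groups`, and `join_order` are invented for this illustration. The join ordering shown is a simple greedy variant of variable counting and may differ from the heuristics FedX actually applies.

```python
from collections import defaultdict

# Hypothetical stand-ins for two SPARQL endpoints, each holding (s, p, o) triples.
ENDPOINTS = {
    "dbpedia": {
        ("ex:Obama", "rdf:type", "dbo:President"),
        ("ex:Obama", "dbo:party", "ex:Democrats"),
        ("ex:Obama", "owl:sameAs", "fb:obama"),
    },
    "nytimes": {
        ("nyt:obama", "owl:sameAs", "ex:Obama"),
        ("nyt:obama", "nyt:topicPage", "nyt:page1"),
    },
}


def is_var(term):
    return term.startswith("?")


def ask(triples, pattern):
    """Simulate SPARQL ASK: True if any triple matches the pattern.

    Simplification: a variable repeated within one pattern is not checked
    for consistent binding.
    """
    return any(
        all(is_var(p) or p == t for p, t in zip(pattern, triple))
        for triple in triples
    )


def select_sources(patterns):
    """Statement sources: ask every endpoint whether it can answer each pattern."""
    return {
        pat: sorted(name for name, data in ENDPOINTS.items() if ask(data, pat))
        for pat in patterns
    }


def exclusive_groups(sources):
    """Exclusive groups: collect patterns whose single relevant source is the same."""
    groups, rest = defaultdict(list), []
    for pat, srcs in sources.items():
        if len(srcs) == 1:
            groups[srcs[0]].append(pat)
        else:
            rest.append(pat)  # several (or no) relevant sources
    return dict(groups), rest


def join_order(patterns):
    """Greedy variable counting: next comes the pattern with the fewest free variables."""
    bound, remaining, order = set(), list(patterns), []
    while remaining:
        nxt = min(
            remaining,
            key=lambda p: sum(1 for t in p if is_var(t) and t not in bound),
        )
        order.append(nxt)
        remaining.remove(nxt)
        bound.update(t for t in nxt if is_var(t))
    return order


patterns = [
    ("?x", "nyt:topicPage", "?page"),
    ("?x", "owl:sameAs", "?pres"),
    ("?pres", "dbo:party", "?party"),
    ("?pres", "rdf:type", "dbo:President"),
]

sources = select_sources(patterns)
groups, rest = exclusive_groups(sources)
order = join_order(patterns)

print(sources)
print(groups, rest)
print(order)
```

In this toy input, the `owl:sameAs` pattern is answerable by both endpoints and therefore stays outside any group, while the two patterns that only `dbpedia` can answer could be sent to that endpoint as a single subquery. The greedy ordering starts with the `rdf:type` pattern because it has only one free variable.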
Details on the optimization techniques can be found in our publications listed below. All of our optimization techniques use SPARQL 1.0 features only, which makes our solution suitable for contemporary environments. However, we are working on integrating the SPARQL 1.1 language extensions to support the upcoming facilities. Note that our system does not need any preprocessed metadata such as statistics or indices, which makes it suitable for on-demand query processing. In particular, FedX is designed to work as an out-of-the-box system.
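To make the combination of bound joins and SPARQL 1.0 more concrete, the following Python sketch shows one way a block of intermediate bindings could be evaluated in a single remote request using only UNION. It assumes an encoding in which each binding becomes one UNION branch and the remaining free variables are renamed with the branch index, so that results can be mapped back to the binding that produced them. The helper names (`blocks`, `bound_join_query`), the example IRIs, and the block size are hypothetical; the publications describe the approach actually used.

```python
def is_var(term):
    return term.startswith("?")


def blocks(items, size):
    """Split the intermediate results into blocks (the 'block' in block nested loop)."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def bound_join_query(pattern, bindings):
    """Build one SPARQL 1.0 query that evaluates `pattern` for a block of bindings.

    Assumption: every branch leaves at least one variable free; otherwise the
    SELECT clause would be empty and an ASK query would be needed instead.
    """
    branches, projected = [], []
    for i, binding in enumerate(bindings):
        terms = []
        for term in pattern:
            if is_var(term) and term in binding:
                terms.append(binding[term])       # substitute the bound value
            elif is_var(term):
                renamed = f"{term}_{i}"           # index ties results to binding i
                terms.append(renamed)
                if renamed not in projected:
                    projected.append(renamed)
            else:
                terms.append(term)
        branches.append("{ " + " ".join(terms) + " }")
    return (
        "SELECT " + " ".join(projected)
        + " WHERE { " + " UNION ".join(branches) + " }"
    )


# Hypothetical pattern and intermediate results from an earlier join step.
pattern = ("?pres", "<http://example.org/party>", "?party")
intermediate = [
    {"?pres": "<http://example.org/Obama>"},
    {"?pres": "<http://example.org/Bush>"},
    {"?pres": "<http://example.org/Clinton>"},
]

# One remote query per block instead of one per binding.
queries = [bound_join_query(pattern, block) for block in blocks(intermediate, 2)]
for query in queries:
    print(query)
```

With a block size of 2, the three bindings above result in two remote queries rather than three. Larger blocks reduce the number of requests further at the cost of longer query strings.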
FedX is licensed under the GNU Affero General Public License (AGPL) for use in open source applications.
For proprietary, closed source applications, and other commercial applications, we offer alternative
license terms upon request.
FedX: A Federation Layer for Distributed Query Processing on Linked Open Data
Andreas Schwarte, Peter Haase, Katja Hose, Ralf Schenkel, and Michael Schmidt. ESWC 2011.
FedX: Optimization Techniques for Federated Query Processing on Linked Data
Andreas Schwarte, Peter Haase, Katja Hose, Ralf Schenkel, and Michael Schmidt. ISWC 2011.
ESWC FedX Poster
ISWC Presentation Slides
FedX originated from a Master's thesis by Andreas Schwarte (fluid Operations AG, Germany) in cooperation with the Cluster of Excellence at Saarland University. The thesis supervisors and their contact information are listed below.
- Peter Haase (fluid Operations AG, Germany)
- Michael Schmidt (fluid Operations AG, Germany)
- Katja Hose (Max-Planck Institute for Informatics, Saarbrücken, Germany)
- Ralf Schenkel (Max-Planck Institute for Informatics, Saarbrücken, Germany)
Further research was supported by the German Federal Ministry of Education and Research (BMBF) in the CollabCloud project.
To keep up to date with FedX developments, you can subscribe to our mailing list iwb-discussion (at) googlegroups.com. If you have any further questions, concerns, or issues, please do not hesitate to send an email to Andreas.Schwarte (at) fluidops.com.