TY - GEN
T1 - On metrics for measuring fragmentation of federation over SPARQL endpoints
AU - Rakhmawati, Nur Aini
AU - Karnstedt, Marcel
AU - Hausenblas, Michael
AU - Decker, Stefan
PY - 2014
Y1 - 2014
N2 - Processing a federated query in Linked Data is challenging because it needs to consider the number of sources, the source locations, and heterogeneity in hardware, software, data structure and data distribution. In this work, we investigate the relationship between data distribution and communication cost in a federated SPARQL query framework. We introduce the spreading factor as a dataset metric for computing the distribution of classes and properties across a set of data sources. To observe the relationship between the spreading factor and the communication cost, we generate 9 datasets using several data fragmentation and allocation strategies. Our experimental results showed that the spreading factor is correlated with the communication cost between a federated engine and the SPARQL endpoints. In terms of partitioning strategies, partitioning triples based on properties and classes can minimize the communication cost. However, such partitioning can also reduce the performance of the SPARQL endpoints within the federation framework.
AB - Processing a federated query in Linked Data is challenging because it needs to consider the number of sources, the source locations, and heterogeneity in hardware, software, data structure and data distribution. In this work, we investigate the relationship between data distribution and communication cost in a federated SPARQL query framework. We introduce the spreading factor as a dataset metric for computing the distribution of classes and properties across a set of data sources. To observe the relationship between the spreading factor and the communication cost, we generate 9 datasets using several data fragmentation and allocation strategies. Our experimental results showed that the spreading factor is correlated with the communication cost between a federated engine and the SPARQL endpoints. In terms of partitioning strategies, partitioning triples based on properties and classes can minimize the communication cost. However, such partitioning can also reduce the performance of the SPARQL endpoints within the federation framework.
KW - Data distribution
KW - Federated SPARQL query
KW - Linked data
KW - SPARQL endpoint
UR - http://www.scopus.com/inward/record.url?scp=84902385845&partnerID=8YFLogxK
U2 - 10.5220/0004760101190126
DO - 10.5220/0004760101190126
M3 - Conference contribution
AN - SCOPUS:84902385845
SN - 9789897580239
T3 - WEBIST 2014 - Proceedings of the 10th International Conference on Web Information Systems and Technologies
SP - 119
EP - 126
BT - WEBIST 2014 - Proceedings of the 10th International Conference on Web Information Systems and Technologies
PB - SciTePress
T2 - 10th International Conference on Web Information Systems and Technologies, WEBIST 2014
Y2 - 3 April 2014 through 5 April 2014
ER -