TY - GEN
T1 - Provenance Context Entity (PaCE)
T2 - 22nd International Conference on Scientific and Statistical Database Management, SSDBM 2010
AU - Sahoo, Satya S.
AU - Bodenreider, Olivier
AU - Hitzler, Pascal
AU - Sheth, Amit
AU - Thirunarayan, Krishnaprasad
PY - 2010
Y1 - 2010
N2 - The Resource Description Framework (RDF) format is being used by a large number of scientific applications to store and disseminate their datasets. The provenance information, describing the source or lineage of the datasets, is playing an increasingly significant role in ensuring data quality, computing the trust value of datasets, and ranking query results. Current provenance tracking approaches using the RDF reification vocabulary suffer from a number of known issues, including lack of formal semantics, use of blank nodes, and application-dependent interpretation of reified RDF triples. In this paper, we introduce a new approach called Provenance Context Entity (PaCE) that uses the notion of provenance context to create provenance-aware RDF triples. We also define the formal semantics of PaCE through a simple extension of the existing RDF(S) semantics that ensures compatibility of PaCE with existing Semantic Web tools and implementations. We have implemented the PaCE approach in the Biomedical Knowledge Repository (BKR) project at the US National Library of Medicine. The evaluations demonstrate a minimum of 49% reduction in the total number of provenance-specific RDF triples generated using the PaCE approach as compared to RDF reification. In addition, performance for complex queries improves by three orders of magnitude and remains comparable to the RDF reification approach for simpler provenance queries.
KW - Biomedical knowledge repository
KW - Context theory
KW - Provenance context entity
KW - Provenir ontology
KW - RDF reification
UR - http://www.scopus.com/inward/record.url?scp=77955047629&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77955047629&partnerID=8YFLogxK
UR - https://corescholar.libraries.wright.edu/knoesis/17
U2 - 10.1007/978-3-642-13818-8_32
DO - 10.1007/978-3-642-13818-8_32
M3 - Conference contribution
SN - 3642138179
SN - 9783642138171
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 461
EP - 470
BT - Scientific and Statistical Database Management - 22nd International Conference, SSDBM 2010, Proceedings
A2 - Gertz, Michael
A2 - Ludäscher, Bertram
PB - Springer Berlin Heidelberg
Y2 - 30 June 2010 through 2 July 2010
ER -