ctrlnum article-2133
fullrecord <?xml version="1.0"?> <dc schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd"><title lang="en-US">Schema Matching for Large-Scale Data Based on Ontology Clustering Method</title><creator>Alani, Harith Oraibi; Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia 43600 Bangi, Selangor Darul Ehsan, Malaysia</creator><creator>Saad, Saidah; Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia 43600 Bangi, Selangor Darul Ehsan, Malaysia</creator><subject lang="en-US">automatic schema matching; large-scale data; ontology; clustering; web interfaces</subject><description lang="en-US">Holistic schema matching is the process of identifying semantic correspondences among multiple schemas at once. The key challenge behind holistic schema matching lies in selecting an appropriate method that has the ability to maintain effectiveness and efficiency. Effectiveness refers to the quality of matching while efficiency refers to the time and memory consumed within the matching process. Several approaches have been proposed for holistic schema matching. These approaches were mainly dependent on clustering techniques. In fact, clustering aims to group the similar fields within the schemas in multiple groups or clusters. However, fields on schemas contain much complicated semantic relations due to schema level. Ontology which is a hierarchy of taxonomies, has the ability to identify semantic correspondences with various levels. Hence, this study aims to propose an ontology-based clustering approach for holistic schema matching. Two datasets have been used from ICQ query interfaces consisting of 40 interfaces, which refer to Airfare and Job. 
The ontology used in this study has been built using the XBenchMatch which is a benchmark lexicon that contains rich semantic correspondences for the field of schema matching. In order to accommodate the schema matching using the ontology, a rule-based clustering approach is used with multiple distance measures including Dice, Cosine and Jaccard. The evaluation has been conducted using the common information retrieval metrics; precision, recall and f-measure. In order to assess the performance of the proposed ontology-based clustering, a comparison among two experiments has been performed. The first experiment aims to conduct the ontology-based clustering approach (i.e. using ontology and rule-based clustering), while the second experiment aims to conduct the traditional clustering approaches without the use of ontology. Results show that the proposed ontology-based clustering approach has outperformed the traditional clustering approaches without ontology by achieving an f-measure of 94% for Airfare and 92% for Job datasets. 
This emphasizes the strength of ontology in terms of identifying correspondences with semantic level variation.</description><publisher lang="en-US">International Journal on Advanced Science, Engineering and Information Technology</publisher><contributor lang="en-US">Universiti Kebangsaan Malaysia</contributor><date>2017-10-30</date><type>Journal:Article</type><type>Other:info:eu-repo/semantics/publishedVersion</type><type>Other:</type><type>File:application/pdf</type><identifier>http://insightsociety.org/ojaseit/index.php/ijaseit/article/view/2133</identifier><identifier>10.18517/ijaseit.7.5.2133</identifier><source lang="en-US">International Journal on Advanced Science, Engineering and Information Technology; Vol 7, No 5 (2017); 1790-1797</source><source>2460-6952</source><source>2088-5334</source><language>eng</language><relation>http://insightsociety.org/ojaseit/index.php/ijaseit/article/view/2133/pdf_546</relation><rights lang="en-US">Authors who publish with this journal agree to the following terms:Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a&#xA0;Creative Commons Attribution License&#xA0;that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See&#xA0;The Effect of Open Access).</rights><recordID>article-2133</recordID></dc>
language eng
format Journal:Article
Journal
Other:info:eu-repo/semantics/publishedVersion
Other
Other:
File:application/pdf
File
Journal:eJournal
author Alani, Harith Oraibi; Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia 43600 Bangi, Selangor Darul Ehsan, Malaysia
Saad, Saidah; Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia 43600 Bangi, Selangor Darul Ehsan, Malaysia
author2 Universiti Kebangsaan Malaysia
title Schema Matching for Large-Scale Data Based on Ontology Clustering Method
publisher International Journal on Advanced Science, Engineering and Information Technology
publishDate 2017
topic automatic schema matching
large-scale data
ontology
clustering
web interfaces
url http://insightsociety.org/ojaseit/index.php/ijaseit/article/view/2133
http://insightsociety.org/ojaseit/index.php/ijaseit/article/view/2133/pdf_546
contents Holistic schema matching is the process of identifying semantic correspondences among multiple schemas at once. The key challenge lies in selecting a method that maintains both effectiveness and efficiency. Effectiveness refers to the quality of matching, while efficiency refers to the time and memory consumed during the matching process. Several approaches have been proposed for holistic schema matching, most of which depend on clustering techniques. Clustering groups similar fields within the schemas into multiple clusters. However, fields in schemas carry complicated semantic relations owing to differences in schema level. An ontology, which is a hierarchy of taxonomies, can identify semantic correspondences at various levels. Hence, this study proposes an ontology-based clustering approach for holistic schema matching. Two datasets, Airfare and Job, were taken from the ICQ query interfaces and consist of 40 interfaces. The ontology used in this study was built using XBenchMatch, a benchmark lexicon that contains rich semantic correspondences for the field of schema matching. To perform schema matching with the ontology, a rule-based clustering approach is used with multiple distance measures, including Dice, Cosine and Jaccard. The evaluation uses the common information retrieval metrics: precision, recall and f-measure. To assess the performance of the proposed ontology-based clustering, two experiments were compared. The first applies the ontology-based clustering approach (i.e. ontology plus rule-based clustering), while the second applies traditional clustering approaches without an ontology. 
Results show that the proposed ontology-based clustering approach outperformed the traditional clustering approaches without ontology, achieving an f-measure of 94% for the Airfare dataset and 92% for the Job dataset. This highlights the strength of ontology in identifying correspondences across varying semantic levels.
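The abstract names the Dice, Cosine and Jaccard measures and the precision, recall and f-measure metrics but does not define them. The following is a minimal sketch of their usual set-based definitions, not the authors' implementation. It assumes field labels are compared as sets of lower-cased word tokens; the function names and the sample labels are hypothetical and are not taken from the Airfare or Job datasets. A distance can be obtained as one minus the similarity.

```python
import math


def tokens(label):
    """Lower-case a field label and split it into a set of word tokens."""
    return set(label.lower().split())


def dice(a, b):
    """Dice coefficient: 2|A & B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def jaccard(a, b):
    """Jaccard coefficient: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(a, b):
    """Set-based cosine similarity: |A & B| / sqrt(|A| * |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def precision_recall_f(predicted, gold):
    """Score predicted correspondences against a gold standard.

    Both arguments are sets of (field, field) pairs.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical field labels from two query interfaces.
a = tokens("Departure city")
b = tokens("Departure airport city")

print(dice(a, b), jaccard(a, b), cosine(a, b))
```

In this sketch the three measures differ only in how they normalise the token overlap, which is why a clustering rule may reach different decisions for the same pair of fields depending on the measure chosen.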
id IOS1116.article-2133
institution Indonesian Society for Knowledge and Human Development
institution_id 204
institution_type library:special
library
library Indonesian Society for Knowledge and Human Development
library_id 78
collection International Journal on Advanced Science, Engineering and Information Technology
repository_id 1116
subject_area Computer Programs and Information Technology
city -
province DKI JAKARTA
repoId IOS1116
first_indexed 2017-10-19T19:39:21Z
last_indexed 2017-11-09T19:33:21Z
recordtype dc
_version_ 1722528162884091904