Schema Matching for Large-Scale Data Based on Ontology Clustering Method
Main Authors: | Alani, Harith Oraibi; Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia 43600 Bangi, Selangor Darul Ehsan, Malaysia, Saad, Saidah; Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia 43600 Bangi, Selangor Darul Ehsan, Malaysia |
---|---|
Other Authors: | Universiti Kebangsaan Malaysia |
Format: | Article info application/pdf eJournal |
Language: | eng |
Published: |
International Journal on Advanced Science, Engineering and Information Technology
, 2017
|
Subjects: | automatic schema matching; large-scale data; ontology; clustering; web interfaces |
Online Access: |
http://insightsociety.org/ojaseit/index.php/ijaseit/article/view/2133 http://insightsociety.org/ojaseit/index.php/ijaseit/article/view/2133/pdf_546 |
ctrlnum |
article-2133 |
---|---|
fullrecord |
<?xml version="1.0"?>
<dc schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd"><title lang="en-US">Schema Matching for Large-Scale Data Based on Ontology Clustering Method</title><creator>Alani, Harith Oraibi; Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia 43600 Bangi, Selangor Darul Ehsan, Malaysia</creator><creator>Saad, Saidah; Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia 43600 Bangi, Selangor Darul Ehsan, Malaysia</creator><subject lang="en-US">automatic schema matching; large-scale data; ontology; clustering; web interfaces</subject><description lang="en-US">Holistic schema matching is the process of identifying semantic correspondences among multiple schemas at once. The key challenge behind holistic schema matching lies in selecting an appropriate method that has the ability to maintain effectiveness and efficiency. Effectiveness refers to the quality of matching while efficiency refers to the time and memory consumed within the matching process. Several approaches have been proposed for holistic schema matching. These approaches were mainly dependent on clustering techniques. In fact, clustering aims to group the similar fields within the schemas in multiple groups or clusters. However, fields on schemas contain much complicated semantic relations due to schema level. Ontology which is a hierarchy of taxonomies, has the ability to identify semantic correspondences with various levels. Hence, this study aims to propose an ontology-based clustering approach for holistic schema matching. Two datasets have been used from ICQ query interfaces consisting of 40 interfaces, which refer to Airfare and Job. 
The ontology used in this study has been built using the XBenchMatch which is a benchmark lexicon that contains rich semantic correspondences for the field of schema matching. In order to accommodate the schema matching using the ontology, a rule-based clustering approach is used with multiple distance measures including Dice, Cosine and Jaccard. The evaluation has been conducted using the common information retrieval metrics; precision, recall and f-measure. In order to assess the performance of the proposed ontology-based clustering, a comparison among two experiments has been performed. The first experiment aims to conduct the ontology-based clustering approach (i.e. using ontology and rule-based clustering), while the second experiment aims to conduct the traditional clustering approaches without the use of ontology. Results show that the proposed ontology-based clustering approach has outperformed the traditional clustering approaches without ontology by achieving an f-measure of 94% for Airfare and 92% for Job datasets. 
This emphasizes the strength of ontology in terms of identifying correspondences with semantic level variation.</description><publisher lang="en-US">International Journal on Advanced Science, Engineering and Information Technology</publisher><contributor lang="en-US">Universiti Kebangsaan Malaysia</contributor><date>2017-10-30</date><type>Journal:Article</type><type>Other:info:eu-repo/semantics/publishedVersion</type><type>Other:</type><type>File:application/pdf</type><identifier>http://insightsociety.org/ojaseit/index.php/ijaseit/article/view/2133</identifier><identifier>10.18517/ijaseit.7.5.2133</identifier><source lang="en-US">International Journal on Advanced Science, Engineering and Information Technology; Vol 7, No 5 (2017); 1790-1797</source><source>2460-6952</source><source>2088-5334</source><language>eng</language><relation>http://insightsociety.org/ojaseit/index.php/ijaseit/article/view/2133/pdf_546</relation><rights lang="en-US">Authors who publish with this journal agree to the following terms:Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).</rights><recordID>article-2133</recordID></dc>
|
language |
eng |
format |
Journal:Article Journal Other:info:eu-repo/semantics/publishedVersion Other Other: File:application/pdf File Journal:eJournal |
author |
Alani, Harith Oraibi; Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia 43600 Bangi, Selangor Darul Ehsan, Malaysia Saad, Saidah; Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia 43600 Bangi, Selangor Darul Ehsan, Malaysia |
author2 |
Universiti Kebangsaan Malaysia |
title |
Schema Matching for Large-Scale Data Based on Ontology Clustering Method |
publisher |
International Journal on Advanced Science, Engineering and Information Technology |
publishDate |
2017 |
topic |
automatic schema matching; large-scale data; ontology; clustering; web interfaces |
url |
http://insightsociety.org/ojaseit/index.php/ijaseit/article/view/2133 http://insightsociety.org/ojaseit/index.php/ijaseit/article/view/2133/pdf_546 |
contents |
Holistic schema matching is the process of identifying semantic correspondences among multiple schemas at once. The key challenge lies in selecting a method that maintains both effectiveness and efficiency: effectiveness refers to the quality of the matching, while efficiency refers to the time and memory consumed by the matching process. Several approaches have been proposed for holistic schema matching, most of them based on clustering techniques, which group similar fields across the schemas into clusters. However, schema fields carry complex semantic relations that vary with schema level. An ontology, which is a hierarchy of taxonomies, can identify semantic correspondences at various levels. Hence, this study proposes an ontology-based clustering approach for holistic schema matching. Two datasets, Airfare and Job, have been used from the ICQ query interfaces, consisting of 40 interfaces. The ontology used in this study has been built using XBenchMatch, a benchmark lexicon that contains rich semantic correspondences for the field of schema matching. To perform schema matching with the ontology, a rule-based clustering approach is used with multiple distance measures, including Dice, Cosine and Jaccard. The evaluation has been conducted using the common information retrieval metrics: precision, recall and f-measure. To assess the performance of the proposed ontology-based clustering, two experiments have been compared. The first applies the ontology-based clustering approach (i.e. using ontology and rule-based clustering), while the second applies traditional clustering approaches without the ontology.
Results show that the proposed ontology-based clustering approach outperformed the traditional clustering approaches without ontology, achieving an f-measure of 94% for the Airfare dataset and 92% for the Job dataset. This emphasizes the strength of ontology in identifying correspondences across varying semantic levels. |
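The abstract names three measures (Dice, Cosine, Jaccard) and three evaluation metrics (precision, recall, f-measure) without defining them. The sketch below shows their common set-based forms in Python. It is an illustration only, not the authors' implementation: the tokenization, the function names, and the sample field labels are assumptions, and the measures are written as similarities (a distance would be one minus the similarity).

```python
import math


def tokens(label):
    """Lowercase a field label and split it into a set of word tokens.

    Hypothetical tokenizer: the record does not say how labels are split.
    """
    return set(label.lower().replace("_", " ").split())


def jaccard(a, b):
    """Jaccard similarity: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a, b):
    """Dice similarity: 2 * |A & B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def cosine(a, b):
    """Cosine similarity on binary token sets: |A & B| / sqrt(|A| * |B|)."""
    denom = math.sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0


def precision_recall_f(found, expected):
    """Precision, recall and f-measure over sets of matched field pairs."""
    true_positives = len(found & expected)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Two made-up field labels sharing one of their two tokens.
left = tokens("Departure City")
right = tokens("departure_date")
similarities = (jaccard(left, right), dice(left, right), cosine(left, right))

# Made-up matcher output versus a made-up gold standard.
found_pairs = {("from", "origin"), ("to", "dest"), ("class", "cabin"), ("adults", "return")}
expected_pairs = {("from", "origin"), ("to", "dest"), ("depart", "leave")}
scores = precision_recall_f(found_pairs, expected_pairs)
```

For the two sample labels, the shared token gives a Jaccard similarity of 1/3 and Dice and Cosine similarities of 1/2, which shows how the three measures can rank the same pair differently.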
id |
IOS1116.article-2133 |
institution |
Indonesian Society for Knowledge and Human Development |
institution_id |
204 |
institution_type |
library:special library |
library |
Indonesian Society for Knowledge and Human Development |
library_id |
78 |
collection |
International Journal on Advanced Science, Engineering and Information Technology |
repository_id |
1116 |
subject_area |
Computer Programs and Information Technology |
city |
- |
province |
DKI JAKARTA |
repoId |
IOS1116 |
first_indexed |
2017-10-19T19:39:21Z |
last_indexed |
2017-11-09T19:33:21Z |
recordtype |
dc |
_version_ |
1722528162884091904 |