I. Ontology-based Information Retrieval


Abstract: This article presents a new, ontology-based approach to information retrieval (IR). The system is based on a domain knowledge representation schema in the form of an ontology. New resources registered within the system are linked to conce…

Model elements can also be used to search for and retrieve relevant documents. If all documents are linked to the same domain model, the similarity between documents can be calculated using the conceptual structure of this domain model. Such an approach also supports 'soft' techniques, in which a search engine uses the domain model to find concepts related to those specified by the user. The search engine can thus return every document linked to concepts that are close enough to the concepts mentioned in the user's query.
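The 'soft' search idea can be illustrated with a minimal sketch. It assumes, purely for illustration, that the domain model is a graph of related concepts, that 'close enough' means within a fixed number of links, and that each document is linked to a set of concepts. The names `ONTOLOGY`, `DOC_CONCEPTS`, `concepts_within` and `soft_search` and the toy data are hypothetical; they do not describe how the Webocrat system actually measures concept proximity.

```python
from collections import deque

# Hypothetical toy domain model: concept -> directly related concepts (undirected).
ONTOLOGY = {
    "transport": ["road", "railway"],
    "road": ["transport", "highway"],
    "railway": ["transport"],
    "highway": ["road"],
    "education": ["school"],
    "school": ["education"],
}

# Hypothetical links between registered documents and ontology concepts.
DOC_CONCEPTS = {
    "doc1": {"highway"},
    "doc2": {"railway"},
    "doc3": {"school"},
}


def concepts_within(ontology, start, max_distance):
    """Return all concepts whose graph distance from `start` is <= max_distance (BFS)."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        concept = queue.popleft()
        if seen[concept] == max_distance:
            continue  # do not expand beyond the distance limit
        for neighbour in ontology.get(concept, []):
            if neighbour not in seen:
                seen[neighbour] = seen[concept] + 1
                queue.append(neighbour)
    return set(seen)


def soft_search(query_concepts, ontology, doc_concepts, max_distance=1):
    """Return (sorted) documents linked to any concept close enough to a query concept."""
    expanded = set()
    for concept in query_concepts:
        expanded |= concepts_within(ontology, concept, max_distance)
    return sorted(doc for doc, linked in doc_concepts.items() if linked & expanded)


print(soft_search({"road"}, ONTOLOGY, DOC_CONCEPTS, max_distance=1))
```

With `max_distance=1` the query concept `road` expands to `transport` and `highway`, so only `doc1` is returned; raising the limit to 2 also reaches `railway` and adds `doc2`.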

To evaluate the retrieval efficiency of this ontology-based approach, we carried out a series of experiments comparing it with two other frequently used information retrieval techniques: the vector model with the tf-idf weighting scheme and the latent semantic indexing (LSI) model. Section 2 briefly describes all three retrieval methods. Section 3 describes the data set used in the experiments and the results achieved. Finally, Section 4 summarizes the experimental results and suggests directions for future work.

2. SCHEME OF DOCUMENT RETRIEVAL

We developed a package implementing three different approaches to document retrieval: the vector representation, the latent semantic indexing (LSI) method, and the ontology-based method used in the Webocrat system. The following subsections briefly describe each approach.

2.1. VECTOR REPRESENTATION APPROACH

This well-known approach is based on a vector representation of the document collection. First, every document is passed through a set of pre-processing tools (conversion to lower case, a stop-word filter, document frequency). Then a vector of index-term weights is calculated as the internal representation of the document. These weights are calculated using the most frequently used tf-idf scheme [4]:

$$w_{ij} = tf_{ij} \times idf_i, \quad \text{where} \quad tf_{ij} = \frac{freq_{ij}}{\max_e freq_{ej}} \quad \text{and} \quad idf_i = \log\frac{N}{n_i}$$

Here $freq_{ij}$ is the number of occurrences of term $t_i$ in document $d_j$, $N$ is the number of documents in the collection, and $n_i$ is the document frequency of term $t_i$, i.e. the number of documents in the whole collection that contain $t_i$.
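The weighting scheme can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the package described here: tokenisation is plain whitespace splitting, the stop-word list is a hypothetical toy, and the natural logarithm is used because the text does not fix the base of log. The names `STOP_WORDS`, `preprocess` and `tfidf_weights` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "of", "and", "in", "to", "is"}


def preprocess(text):
    """Lower-case the text, split it on whitespace and drop stop words."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]


def tfidf_weights(docs):
    """Return one {term: w_ij} dict per tokenised document d_j.

    tf_ij = freq_ij / max_e freq_ej  (normalised by the most frequent term in d_j)
    idf_i = log(N / n_i)             (natural log; the base is an assumption)
    """
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]
    # n_i: number of documents in which term t_i occurs at least once
    doc_freq = Counter(term for c in counts for term in c)
    weights = []
    for c in counts:
        max_freq = max(c.values(), default=1)  # default guards against empty documents
        weights.append({
            term: (freq / max_freq) * math.log(n_docs / doc_freq[term])
            for term, freq in c.items()
        })
    return weights


docs = [preprocess(t) for t in [
    "Retrieval of the ontology and retrieval",
    "Vector retrieval",
    "Ontology model",
]]
weights = tfidf_weights(docs)
```

In the first document `retrieval` is the most frequent term, so its tf is 1 while `ontology` gets 0.5; a term occurring in every document would receive idf = log 1 = 0.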

Each such vector is then normalized to unit length and stored in the term-document matrix A, which is the internal representation of the whole document collection.
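Assembling the normalized vectors into A might look as follows. The sketch assumes that terms index the rows and documents the columns, which is the usual convention but is not stated in the text; `term_document_matrix` is a hypothetical helper that takes per-document weight dictionaries such as those produced by a tf-idf step.

```python
import numpy as np


def term_document_matrix(weights, vocabulary):
    """Build A (terms x documents) whose columns are unit-length document vectors.

    weights: list of {term: weight} dicts, one per document.
    vocabulary: ordered list of terms; row i of A corresponds to vocabulary[i].
    """
    row = {term: i for i, term in enumerate(vocabulary)}
    A = np.zeros((len(vocabulary), len(weights)))
    for j, doc_weights in enumerate(weights):
        for term, w in doc_weights.items():
            A[row[term], j] = w
    norms = np.linalg.norm(A, axis=0)
    norms[norms == 0] = 1.0  # leave all-zero columns as they are instead of dividing by 0
    return A / norms


A = term_document_matrix([{"a": 3.0, "b": 4.0}, {"b": 2.0}], ["a", "b"])
```

Here the first column (3, 4) has length 5 and becomes (0.6, 0.8), while the second column (0, 2) becomes (0, 1).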

To find documents relevant to a specific query Q, the query must be represented in the same way as a document Di (i.e. as a vector of index-term weights). The similarity between a query Q and a document Di is computed as the cosine of the angle between the document and query vectors:

$$sim_{TF\text{-}IDF}(Q, D_i) = \frac{D_i \cdot Q}{\lVert D_i \rVert \, \lVert Q \rVert}$$
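A minimal NumPy sketch of this ranking step follows; `cosine_scores` and `rank_documents` are hypothetical names. It computes the full cosine formula so that it also works for unnormalized input; when the columns of A and the query already have unit length, the score reduces to the plain dot product.

```python
import numpy as np


def cosine_scores(A, q):
    """Return sim(Q, D_i) = (D_i . Q) / (|D_i| |Q|) for every column D_i of A."""
    A = np.asarray(A, dtype=float)
    q = np.asarray(q, dtype=float)
    dots = A.T @ q
    denom = np.linalg.norm(A, axis=0) * np.linalg.norm(q)
    # Zero-length documents or queries get similarity 0 instead of NaN.
    return np.divide(dots, denom, out=np.zeros_like(dots), where=denom > 0)


def rank_documents(A, q):
    """Return document (column) indices sorted by decreasing similarity to q."""
    return np.argsort(-cosine_scores(A, q), kind="stable").tolist()


A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])  # 2 terms x 3 documents
q = np.array([1.0, 0.0])
print(rank_documents(A, q))  # [0, 2, 1]
```

The query shares its only term fully with document 0 (score 1), partly with document 2 (score about 0.707) and not at all with document 1 (score 0).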

2.2. LATENT SEMANTIC INDEXING APPROACH

The LSI approach is based on the singular value decomposition (SVD) of the tf-idf matrix A, which factors it into three matrices [8]:

$$A = U S V^T$$
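The decomposition and its usual use in retrieval can be sketched with `numpy.linalg.svd`. The text breaks off after the decomposition, so the rest of this sketch follows common LSI practice rather than the authors' description: only the k largest singular values are kept (k is an assumed parameter), a query vector q is folded into the reduced space as $q_k = S_k^{-1} U_k^T q$, and documents are compared with it by cosine similarity. The function names `lsi_fit`, `lsi_fold_in` and `lsi_scores` are hypothetical.

```python
import numpy as np


def lsi_fit(A, k):
    """Truncated SVD of A = U S V^T, keeping the k largest singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # s is sorted in descending order
    return U[:, :k], s[:k], Vt[:k, :]


def lsi_fold_in(q, U_k, s_k):
    """Map a term-space vector q into the k-dimensional space: S_k^{-1} U_k^T q."""
    return (U_k.T @ q) / s_k


def lsi_scores(q, U_k, s_k, Vt_k):
    """Cosine similarity between the folded-in query and each document (columns of V_k^T)."""
    q_k = lsi_fold_in(q, U_k, s_k)
    dots = Vt_k.T @ q_k
    denom = np.linalg.norm(Vt_k, axis=0) * np.linalg.norm(q_k)
    # Zero-length vectors get similarity 0 instead of NaN.
    return np.divide(dots, denom, out=np.zeros_like(dots), where=denom > 0)


A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0]])  # 3 terms x 3 documents
U_k, s_k, Vt_k = lsi_fit(A, k=2)
scores = lsi_scores(np.array([1.0, 0.0, 0.0]), U_k, s_k, Vt_k)
```

Folding a column of A in this way reproduces the corresponding column of $V_k^T$, since $U_k^T A = S_k V_k^T$; this is why the documents can be compared with the query directly in that reduced space.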
