I. Ontology-based Information Retrieval(2)


Abstract: This article presents a new, ontology-based approach to information retrieval (IR). The system is based on a domain knowledge representation schema in the form of an ontology. New resources registered within the system are linked to conce

Model elements can also be used for the search and retrieval of relevant documents. If all documents are linked to the same domain model, the similarity between documents can be calculated from the above-mentioned conceptual structure of this domain model. This approach also supports 'soft' techniques, in which a search engine uses the domain model to find concepts related to those specified by the user. The search engine can thus return every document linked to concepts that are close enough to the concepts mentioned in the user's query.
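The 'soft' search idea can be illustrated with a minimal sketch. It assumes the domain model is a graph of concepts, that "close enough" means reachable within a fixed number of edges, and that each document is linked to a set of concepts. The function names, the toy concepts and the distance measure are hypothetical and chosen only for illustration; the text does not specify how closeness is measured.

```python
from collections import deque

def related_concepts(graph, start, max_distance):
    """Map each concept within max_distance edges of a start concept to its distance."""
    seen = {concept: 0 for concept in start}
    queue = deque(start)
    while queue:
        concept = queue.popleft()
        if seen[concept] == max_distance:
            continue  # do not expand beyond the allowed distance
        for neighbour in graph.get(concept, ()):
            if neighbour not in seen:
                seen[neighbour] = seen[concept] + 1
                queue.append(neighbour)
    return seen

def soft_search(graph, links, query_concepts, max_distance=1):
    """Return documents linked to any concept close enough to the query concepts."""
    close = related_concepts(graph, query_concepts, max_distance)
    return {doc for doc, concepts in links.items() if close.keys() & concepts}

# Hypothetical domain model (adjacency lists) and document-to-concept links.
graph = {
    "transport": ["bus", "parking"],
    "bus": ["transport"],
    "parking": ["transport"],
    "taxes": [],
}
links = {"doc_a": {"bus"}, "doc_b": {"parking"}, "doc_c": {"taxes"}}

exact_hits = soft_search(graph, links, ["bus"], max_distance=0)
soft_hits = soft_search(graph, links, ["bus"], max_distance=2)
```

With a distance of 0 only documents linked to the query concept itself are returned; raising the distance also brings in documents linked to neighbouring concepts.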

To evaluate the retrieval efficiency of this ontology-based approach, we ran a series of experiments comparing it with two other frequently used information retrieval techniques: the vector model with the tf-idf weighting scheme and the latent semantic indexing model. Section 2 briefly describes all three retrieval methods. Section 3 describes the data set used in the experiments and the results achieved. Finally, Section 4 summarizes the experimental results and suggests future work.

2. SCHEME OF DOCUMENT RETRIEVAL

We developed a package with three different approaches to document retrieval: vector representation, the latent semantic indexing (LSI) method, and the ontology-based method used in the Webocrat system. Each of these approaches is briefly described in the following subsections.

2.1. VECTOR REPRESENTATION APPROACH

This well-known approach is based on a vector representation of the document collection. First, every document is passed through a set of pre-processing tools (lower-casing, stop-word filter, document frequency). Then a vector of index term weights is calculated as the document's internal representation. These weights are calculated with the most commonly used tf-idf scheme [4]:

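$$w_{ij} = \mathrm{tf}_{ij} \times \mathrm{idf}_i, \quad \text{where} \quad \mathrm{tf}_{ij} = \frac{\mathrm{freq}_{ij}}{\max_e \mathrm{freq}_{ej}} \quad \text{and} \quad \mathrm{idf}_i = \log\frac{N}{n_i},$$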

$\mathrm{freq}_{ij}$ is the number of occurrences of term $t_i$ in document $d_j$, $N$ is the number of documents in the collection, and $n_i$ is the document frequency of term $t_i$ in the whole document collection.

Such a vector is then normalized to unit length and stored in the term-document matrix A, which is the internal representation of the whole document collection.
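The weighting and normalization steps can be sketched as follows. This is a minimal illustration, not the package described in the paper: the function name `tfidf_vectors` and the toy documents are hypothetical, documents are assumed to be already pre-processed token lists, and the natural logarithm is assumed (the base only scales all weights by a constant, which the unit-length normalization removes).

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one unit-length tf-idf vector (dict: term -> weight) per document."""
    n_docs = len(docs)
    # n_i: number of documents that contain term t_i
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        freq = Counter(doc)
        if not freq:
            vectors.append({})
            continue
        max_freq = max(freq.values())
        # w_ij = (freq_ij / max_e freq_ej) * log(N / n_i)
        weights = {
            term: (count / max_freq) * math.log(n_docs / doc_freq[term])
            for term, count in freq.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm > 0:
            weights = {term: w / norm for term, w in weights.items()}
        vectors.append(weights)
    return vectors

# Hypothetical, already pre-processed documents.
docs = [
    ["ontology", "retrieval", "ontology"],
    ["vector", "retrieval"],
    ["latent", "semantic", "indexing"],
]
vectors = tfidf_vectors(docs)
```

Each returned dictionary corresponds to one column of the term-document matrix A, stored sparsely. A term that occurs in every document gets weight 0, since log(N / n_i) = 0.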

To find documents relevant to a specific query Q, the query must be represented in the same way as a document Di, i.e. as a vector of index term weights. The similarity between a query Q and a document Di is computed as the cosine of the angle between the two normalized vectors (document and query vectors):

$$\mathrm{sim}_{\mathrm{TF\text{-}IDF}}(Q, D_i) = \frac{D_i \cdot Q}{\lVert D_i \rVert \, \lVert Q \rVert}$$
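As a rough illustration of this similarity measure, the sketch below ranks a few documents against a query. Vectors are represented as sparse dictionaries; the helper names `cosine` and `rank`, as well as the toy weights, are assumptions made for the example and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two sparse vectors (dict: term -> weight)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)

def rank(query, documents):
    """Return (doc_id, similarity) pairs sorted by decreasing similarity."""
    scores = [(doc_id, cosine(query, vec)) for doc_id, vec in documents.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical document vectors and a one-term query.
documents = {
    "d1": {"ontology": 0.8, "retrieval": 0.6},
    "d2": {"vector": 0.6, "retrieval": 0.8},
    "d3": {"latent": 1.0},
}
query = {"ontology": 1.0}
ranking = rank(query, documents)
```

When both vectors are already normalized to unit length, the denominator equals 1 and the similarity reduces to the dot product.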

2.2. LATENT SEMANTIC INDEXING APPROACH

The LSI approach is based on the singular value decomposition of the tf-idf matrix A. This decomposition yields three matrices [8]:

$$A = U S V^T$$
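The decomposition can be tried out on a small matrix. The sketch below assumes NumPy is available and uses an illustrative 4-term by 3-document matrix; the values are made up and only serve to show the three factors and that their product reproduces A.

```python
import numpy as np

# Toy term-document matrix A (4 terms x 3 documents); values are illustrative.
A = np.array([
    [0.9, 0.0, 0.1],
    [0.4, 0.7, 0.0],
    [0.0, 0.7, 0.2],
    [0.1, 0.0, 0.9],
])

# Thin SVD: U is 4x3, s holds the singular values, Vt is V transposed (3x3).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
S = np.diag(s)  # diagonal matrix of singular values, largest first

# The product of the three factors reproduces A up to rounding error.
reconstructed = U @ S @ Vt
```

`np.linalg.svd` returns the singular values as a one-dimensional array sorted in decreasing order, so `np.diag` is used here to build the matrix S.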
