Research and Implementation of Feature Extraction Methods for Chinese Text Classification (4)

Source: collected from the web. Date: 2020-12-24
ABSTRACT

With the development of society, and especially the rapid development of network technology, the volume of information of all kinds is growing exponentially. Text classification makes it possible to manage huge, heterogeneous data effectively. Information retrieval and filtering built on text classification help people find the information they need in massive data sets and work more efficiently. Text classification has therefore become a popular and significant research topic.

This thesis first studies and analyzes the key techniques of text classification (TC) in detail, then focuses on feature selection and proposes a new feature selection method. Finally, we design and implement a TC system based on the new method.

① We analyze the process and key techniques of TC and study text feature selection methods. By comparing several common filter-model methods, we find that negative features and weakly correlated features degrade the quality of the selected features. To address this, the thesis proposes a new feature selection approach for TC, named SP, which is based on strong class correlation and positive class correlation. By selecting only positive and strong features, SP effectively eliminates the influence of negative and weakly correlated features and thus obtains high-quality features.

② SP is applied in the design and implementation of a Chinese text classification system (CTCS). We present the overall design of CTCS and the detailed design of its modules. The thesis studies the Chinese lexical analysis toolkit ICTCLAS and the full-text search library Lucene, combines the two into a solution for building CTCS, and finally implements the system.

③ We run a series of comparison experiments between the new feature selection method SP and common methods such as DF and CHI, and evaluate the classification results with several classification performance measures. The experimental results indicate that SP selects high-quality features, yields a low-dimensional feature vector, and reduces the dimensionality of the feature space. SP performs well for feature selection in Chinese text classification and reflects the degree of difference among classes.
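To make the baseline methods named above concrete, the following is a minimal Python sketch of DF (document frequency) and CHI (chi-square) scoring on a toy corpus. It is not the thesis's SP method, whose exact definition is not given in this abstract. The function names, the toy documents, and the sign-based filter in `positive_features` are assumptions chosen only to illustrate why a negative feature can receive a high CHI score.

```python
from collections import Counter

def document_frequency(docs):
    """DF: number of documents in which each term occurs."""
    df = Counter()
    for tokens, _label in docs:
        df.update(set(tokens))
    return df

def chi_square(docs, term, cls):
    """Return (CHI score, sign) of `term` with respect to class `cls`.

    a: docs in cls containing term      b: docs outside cls containing term
    c: docs in cls without term         d: docs outside cls without term
    The sign is +1 when the term is positively correlated with the class,
    -1 when negatively correlated, 0 when independent.
    """
    a = b = c = d = 0
    for tokens, label in docs:
        has_term = term in tokens
        if has_term and label == cls:
            a += 1
        elif has_term:
            b += 1
        elif label == cls:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0, 0
    diff = a * d - b * c
    score = n * diff ** 2 / denom
    sign = (diff > 0) - (diff < 0)
    return score, sign

def positive_features(docs, cls, k):
    """Illustrative filter (an assumption, not SP): keep the top-k terms
    by CHI score among those positively correlated with `cls`."""
    vocab = {t for tokens, _label in docs for t in tokens}
    scored = []
    for t in vocab:
        score, sign = chi_square(docs, t, cls)
        if sign > 0:
            scored.append((score, t))
    scored.sort(key=lambda p: (-p[0], p[1]))
    return [t for _score, t in scored[:k]]

# Toy corpus of already-segmented documents (hypothetical data).
docs = [
    (["football", "match", "goal"], "sports"),
    (["football", "team", "coach"], "sports"),
    (["stock", "market", "goal"], "finance"),
    (["stock", "bank", "market"], "finance"),
]

print(document_frequency(docs)["football"])
print(chi_square(docs, "football", "sports"))
print(chi_square(docs, "stock", "sports"))
print(positive_features(docs, "sports", 2))
```

On this toy data, "football" and "stock" receive the same CHI score for the class "sports", but with opposite signs, because the squared term in CHI discards the direction of the correlation. That is the kind of negative feature the abstract says degrades filter-model selection.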

Keywords: Text Classification, Feature Dimensionality Reduction, Feature Selection, Class Positive Correlation, Class Strong Correlation
