I am deeply indebted to Dr. Michael Berry, my major advisor, for his kind guidance and support. I also thank Dr. Susan Dumais, director of the Information Sciences Research Group at Bellcore, for her technical advice. In addition, she graciously allowed us
compared to the other arrays in the space. Since documents sharing similar notions would presumably lie near each other in the notion-space, a similarity measure could be used to find documents relevant to the query. The results of the search could then be used to better specify the query, allowing better retrieval performance.
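The search step described above can be illustrated with a minimal sketch. Luhn did not fix a particular similarity measure, so this example assumes cosine similarity, a common choice in later vector-space systems. It also assumes each document and the query are already arrays of notion counts over the same notions. The collection, the notion counts, and the helper names `cosine` and `rank` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length notion arrays.
    Returns 0.0 if either array is all zeros (no notions in common to compare)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)

def rank(query, docs):
    """Return (doc_id, score) pairs sorted from most to least similar to the query."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical collection: each array counts how often a document uses each of four notions.
docs = {
    "d1": [2, 0, 1, 0],
    "d2": [0, 3, 0, 1],
    "d3": [1, 0, 3, 0],
}
query = [1, 0, 1, 0]
for doc_id, score in rank(query, docs):
    print(doc_id, round(score, 3))
# d1 0.949
# d3 0.894
# d2 0.0
```

Documents that share no notions with the query score zero and fall to the bottom of the list. Refining the query from the retrieved results, as the text suggests, would amount to adjusting `query` and calling `rank` again.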
To aid in defining the notions for a document collection, Luhn suggested that two dictionaries were needed. The first dictionary would contain a listing of the notions used to define the documents’ positions in the space, along with their corresponding index numbers. The second dictionary would contain an alphabetical listing of the words worth indexing in the collection and the notions to which the words belonged. By examining the two dictionaries, the arrays used to represent queries and new documents could be constructed automatically.
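The automatic construction from Luhn's two dictionaries might look like the following minimal sketch. The notions, the word list, and the function `to_array` are all hypothetical. The sketch also assumes that an array entry counts how many words in the text belong to that notion (Luhn's exact weighting is not given here). Tokenization is a simple lowercase whitespace split with no punctuation handling.

```python
# First dictionary (hypothetical): each notion and its index number in the array.
NOTION_INDEX = {
    "computing": 0,
    "retrieval": 1,
    "storage": 2,
}

# Second dictionary (hypothetical): words worth indexing, listed alphabetically,
# each mapped to the notion(s) it belongs to. A word may belong to several notions.
WORD_NOTIONS = {
    "archive": ["storage"],
    "computer": ["computing"],
    "disk": ["computing", "storage"],
    "index": ["retrieval"],
    "search": ["retrieval"],
}

def to_array(text):
    """Build the notion array for a query or new document.

    Each word found in the second dictionary adds 1 to the entry of every
    notion it belongs to; the first dictionary supplies that entry's index.
    Words not worth indexing (absent from WORD_NOTIONS) are ignored.
    """
    array = [0] * len(NOTION_INDEX)
    for word in text.lower().split():
        for notion in WORD_NOTIONS.get(word, []):
            array[NOTION_INDEX[notion]] += 1
    return array

print(to_array("Search the disk archive index"))  # prints [1, 2, 2]
```

Because the arrays are built only from dictionary lookups, a new document or query can be placed in the notion-space without re-examining the rest of the collection.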
Although Luhn’s proposed vector-space representation lacked many important implementation details, it provided an intuitive explanation of the purpose of the vector-space model and laid the foundation for later implementations and improvements. Modern, more complex retrieval systems are at least partially based on the ideas Luhn presented in the 1950’s. However, it is interesting to note that some of the issues he raised over 40 years ago (for example, how to overcome the difficulties of synonymy and polysemy) have not yet been fully resolved.
LSI was developed to solve many of the information retrieval problems Luhn anticipated in the 1950’s. The LSI model will be discussed in Section 2.3.
2.2.2 Borko and Bernick on Reduced-Space Document Classification

A short time after Luhn’s ideas were published, H. Borko and M. Bernick [BB63] presented a method by which documents could be automatically classified into predefined categories. Although document classification has different goals from information retrieval, Borko and Bernick’s approach to document classification can be viewed as a special case of information retrieval. Like all vector-space approaches, Borko and Bernick’s method assumed that the terms in a document were a fairly reliable indicator of the document’s semantic content. They hoped documents belonging to the same classification