I am deeply indebted to Dr. Michael Berry, my major advisor, for his kind guidance and support. I also thank Dr. Susan Dumais, director of the Information Sciences Research Group at Bellcore, for her technical advice. In addition, she graciously allowed us
comparing the representation of the query to the representation of each document in the space, and can retrieve documents that don’t necessarily contain one of the search terms. Although the vector-space techniques share common characteristics with other techniques in the information retrieval hierarchy, they all share a core set of similarities that justify their own class.
Vector-space models rely on the premise that the meaning of a document can be derived from the document’s constituent terms. They represent documents as vectors of terms $d = (t_1, t_2, \ldots, t_n)$ where $t_i$ ($1 \le i \le n$) is a non-negative value denoting the single or multiple occurrences of term $i$ in document $d$. Thus, each unique term in the document collection corresponds to a dimension in the space. Similarly, a query is represented as a vector $q = (t_1, t_2, \ldots, t_n)$ where term $t_i$ ($1 \le i \le n$) is a non-negative value denoting the number of occurrences of $t_i$ (or, merely a 1 to signify the occurrence of term $t_i$) in the query [BC87]. Both the document vectors and the query vector provide the locations of the objects in the term-document space. By computing the distance between the query and other objects in the space, objects with similar semantic content to the query presumably will be retrieved.
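A minimal sketch of this representation, assuming whitespace tokenization and raw occurrence counts as the vector components. The function names `build_vocabulary` and `to_vector` and the sample texts are hypothetical, chosen only for illustration:

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each unique term in the collection to one dimension index."""
    terms = sorted({term for doc in documents for term in doc.lower().split()})
    return {term: index for index, term in enumerate(terms)}


def to_vector(text, vocabulary, binary=False):
    """Return the term vector of `text` over `vocabulary`.

    Each component is the number of occurrences of that term, or merely
    a 1 when binary=True. Terms outside the vocabulary are ignored.
    """
    counts = Counter(text.lower().split())
    vector = [0] * len(vocabulary)
    for term, count in counts.items():
        if term in vocabulary:
            vector[vocabulary[term]] = 1 if binary else count
    return vector


# Hypothetical two-document collection and query.
documents = [
    "latent semantic indexing of documents",
    "indexing documents with an inverted index",
]
vocabulary = build_vocabulary(documents)
doc_vectors = [to_vector(doc, vocabulary) for doc in documents]
query_vector = to_vector("semantic indexing indexing", vocabulary)
query_binary = to_vector("semantic indexing indexing", vocabulary, binary=True)
```

Documents and queries share the same vocabulary here, so every vector has one component per unique term in the collection and can be compared component by component.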
Vector-space models that don’t attempt to collapse the dimensions of the space treat each term independently, essentially mimicking an inverted index [FBY92]. However, vector-space models are more flexible than inverted indices since each term can be individually weighted, allowing that term to become more or less important within a document or the entire document collection as a whole. Also, by applying different similarity measures to compare queries to terms and documents, properties of the document collection can be emphasized or deemphasized. For example, the dot product (or inner product) similarity measure grows with the lengths of the vectors as well as with how closely their directions agree, so longer vectors tend to score higher. The cosine similarity measure, on the other hand, by computing the angle between the query and a term or document rather than the distance, deemphasizes the lengths of the vectors. In some cases, the directions of the vectors are a more reliable indication of the semantic similarities of the objects than the distance between the objects in the term-document space [FBY92].
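The effect of the similarity measure and of term weighting can be sketched with plain Python lists. The vectors, the weights, and the function names below are assumptions made for illustration and are not taken from the source:

```python
import math


def dot_product(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (0.0 if either has zero length)."""
    norm_u = math.sqrt(dot_product(u, u))
    norm_v = math.sqrt(dot_product(v, v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot_product(u, v) / (norm_u * norm_v)


def apply_weights(vector, weights):
    """Scale each term's component by a per-term weight."""
    return [w * x for w, x in zip(weights, vector)]


# Hypothetical three-term space.
query = [1, 1, 0]
short_doc = [1, 1, 0]   # same direction as the query
long_doc = [5, 5, 5]    # longer vector, different direction

dot_scores = (dot_product(query, short_doc), dot_product(query, long_doc))
cos_scores = (cosine_similarity(query, short_doc),
              cosine_similarity(query, long_doc))

# Down-weighting the third term pulls long_doc's direction toward the query.
weights = [1.0, 1.0, 0.1]
cos_weighted = cosine_similarity(query, apply_weights(long_doc, weights))
```

In this sketch the dot product ranks `long_doc` above `short_doc` because of its length, while the cosine measure ranks `short_doc` first because its direction matches the query exactly. Reducing the weight of the third term raises the cosine score of `long_doc`, which illustrates how weighting makes a term more or less important.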