lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Magnus Johansson" <mag...@technohuman.com>
Subject Re: understanding IR topics on this list [was: Re: Vector Space Model in Lucene?]
Date Sun, 16 Nov 2003 17:48:04 GMT
I would also like to recommend "Modern Information Retrieval"
by Ricardo Baeza-Yates 

/magnus 


Gerret Apelt writes: 

> Dror -- 
> 
> I just completed an introductory course in IR. I can recommend the 
> textbook we used: "Managing Gigabytes: Compressing and Indexing Documents 
> and Images". When I don't understand posts on this list I can typically 
> look up the theory in that book, then come back to the list and have a 
> better idea of whats going on. "Managing Gigabytes" appears to be getting 
> good reviews from most readers, but I can't compare it to similar works as 
> I haven't read any. 
> 
> I've spent some time searching for websites that introduce advanced IR 
> topics at a level that is less rigorous than academic papers. But I 
> haven't really found anything I can recommend. Suggestions welcome :) 
> 
> cheers,
> Gerret
> **
> Dror Matalon wrote: 
> 
>> Hi, 
>> 
>> I might be the only person on the list who's having a hard time
>> following this discussion. Would one of you wise folks care to point me
>> to a good "dummies", also known as an executive summary, resource about
>> the theoretical background of all of this. I understand the basic
>> premise of collecting the "words" and having pointers to documents and
>> weights, but beyond that ... 
>> 
>> TIA, 
>> 
>> Dror 
>> 
>> On Fri, Nov 14, 2003 at 12:52:15PM -0500, Chong, Herb wrote:
>>   
>> 
>>> i don't know of any open source search engine that incorporates 
>>> interterm correlation. i have been looking into how to do this in Lucene 
>>> and so far, it's not been promising. the indexing engine and file format 
>>> needs to be changed. there are very few search engines that incorporate 
>>> interterm correlation in any mathematically and linguistically rigorous 
>>> manner. i designed a couple, but they were all research experiments. 
>>> 
>>> if you are familiar with the TREC automatic adhoc track? my experiments 
>>> with the TREC-5 to TREC-7 questions produced about 0.05 to 0.10 
>>> improvement in average precision by proper use of interterm correlation. 
>>> my project at the time was cancelled after TREC-7 and so there haven't 
>>> been any new developments. 
>>> 
>>> Herb.... 
>>> 
>>> -----Original Message-----
>>> From: Andrzej Bialecki [mailto:ab@getopt.org]
>>> Sent: Friday, November 14, 2003 12:39 PM
>>> To: Lucene Users List
>>> Subject: Re: Vector Space Model in Lucene? 
>>> 
>>> Herb.... 
>>> 
>>> Hmm... Are you perhaps familiar with some open system which doesn't? I'm 
>>> curious because one of my projects (already using Lucene) could benefit 
>>> from such feature. Right now I'm using a bastardized version of Markov 
>>> chains, but it's more of a hack... 
>>> 
>>> -- 
>>> Best regards,
>>> Andrzej Bialecki 
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>> For additional commands, e-mail: lucene-user-help@jakarta.apache.org 
>>> 
>>>     
>>> 
>> 
>>   
>> 
>  
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org 
> 
 

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message