lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gerret Apelt <>
Subject understanding IR topics on this list [was: Re: Vector Space Model in Lucene?]
Date Sun, 16 Nov 2003 06:02:53 GMT
Dror --

I just completed an introductory course in IR. I can recommend the 
textbook we used: "Managing Gigabytes: Compressing and Indexing 
Documents and Images". When I don't understand posts on this list I can 
typically look up the theory in that book, then come back to the list 
and have a better idea of whats going on. "Managing Gigabytes" appears 
to be getting good reviews from most readers, but I can't compare it to 
similar works as I haven't read any.

I've spent some time searching for websites that introduce advanced IR 
topics at a level that is less rigorous than academic papers. But I 
haven't really found anything I can recommend. Suggestions welcome :)

Dror Matalon wrote:

>I might be the only person on the list who's having a hard time
>following this discussion. Would one of you wise folks care to point me
>to a good "dummies", also known as an executive summary, resource about
>the theoretical background of all of this. I understand the basic
>premise of collecting the "words" and having pointers to documents and
>weights, but beyond that ...
>On Fri, Nov 14, 2003 at 12:52:15PM -0500, Chong, Herb wrote:
>>i don't know of any open source search engine that incorporates interterm correlation.
i have been looking into how to do this in Lucene and so far, it's not been promising. the
indexing engine and file format needs to be changed. there are very few search engines that
incorporate interterm correlation in any mathematically and linguistically rigorous manner.
i designed a couple, but they were all research experiments.
>>if you are familiar with the TREC automatic adhoc track? my experiments with the TREC-5
to TREC-7 questions produced about 0.05 to 0.10 improvement in average precision by proper
use of interterm correlation. my project at the time was cancelled after TREC-7 and so there
haven't been any new developments.
>>-----Original Message-----
>>From: Andrzej Bialecki []
>>Sent: Friday, November 14, 2003 12:39 PM
>>To: Lucene Users List
>>Subject: Re: Vector Space Model in Lucene?
>>Hmm... Are you perhaps familiar with some open system which doesn't? I'm 
>>curious because one of my projects (already using Lucene) could benefit 
>>from such feature. Right now I'm using a bastardized version of Markov 
>>chains, but it's more of a hack...
>>Best regards,
>>Andrzej Bialecki
>>To unsubscribe, e-mail:
>>For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message