lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dror Matalon <d...@zapatec.com>
Subject Re: Vector Space Model in Lucene?
Date Fri, 14 Nov 2003 19:27:42 GMT
Hi,

I might be the only person on the list who's having a hard time
following this discussion. Would one of you wise folks care to point me
to a good "dummies", also known as an executive summary, resource about
the theoretical background of all of this. I understand the basic
premise of collecting the "words" and having pointers to documents and
weights, but beyond that ...

TIA,

Dror

On Fri, Nov 14, 2003 at 12:52:15PM -0500, Chong, Herb wrote:
> i don't know of any open source search engine that incorporates interterm correlation.
i have been looking into how to do this in Lucene and so far, it's not been promising. the
indexing engine and file format needs to be changed. there are very few search engines that
incorporate interterm correlation in any mathematically and linguistically rigorous manner.
i designed a couple, but they were all research experiments.
> 
> if you are familiar with the TREC automatic adhoc track? my experiments with the TREC-5
to TREC-7 questions produced about 0.05 to 0.10 improvement in average precision by proper
use of interterm correlation. my project at the time was cancelled after TREC-7 and so there
haven't been any new developments.
> 
> Herb....
> 
> -----Original Message-----
> From: Andrzej Bialecki [mailto:ab@getopt.org]
> Sent: Friday, November 14, 2003 12:39 PM
> To: Lucene Users List
> Subject: Re: Vector Space Model in Lucene?
> 
> Herb....
> 
> Hmm... Are you perhaps familiar with some open system which doesn't? I'm 
> curious because one of my projects (already using Lucene) could benefit 
> from such feature. Right now I'm using a bastardized version of Markov 
> chains, but it's more of a hack...
> 
> -- 
> Best regards,
> Andrzej Bialecki
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message