lucene-java-user mailing list archives

From "Joshua O'Madadhain" <jmad...@ics.uci.edu>
Subject inter-term correlation [was Re: Vector Space Model in Lucene?]
Date Fri, 14 Nov 2003 18:14:29 GMT
Incorporating inter-term correlation into Lucene isn't that hard; I've 
done it.  Nor is it incompatible with the vector-space model.  I'm not 
happy with the specific correlation metric that I picked, which is why 
I'm not eager to generally release the code I wrote, but I think that 
the basic mechanism that I came up with (query expansion via correlated 
terms, where the added terms were boosted according to the strength of 
the correlation) is fairly sound.  And I didn't need any changes to 
Lucene to do this.
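
A minimal sketch of this kind of boosted query expansion follows, written in Python for brevity rather than Java. It is an illustration only, not the code mentioned above: the correlation table, the thresholds, and the name expand_query are all hypothetical, and the correlation metric itself is not specified in this message. The output is a string in Lucene's query syntax, where term^N attaches a boost to a term; the original terms are left unboosted.

```python
# Hypothetical correlation table: term -> [(correlated term, strength in (0, 1])].
# The numbers are invented for illustration; no particular metric is implied.
CORRELATIONS = {
    "car": [("automobile", 0.8), ("engine", 0.35), ("road", 0.1)],
    "fast": [("quick", 0.6)],
}


def expand_query(terms, correlations, min_strength=0.2, max_added_per_term=2):
    """Build a Lucene-syntax query string from the original terms plus
    correlated terms, each boosted by its correlation strength."""
    clauses = list(terms)   # original terms keep the default boost
    seen = set(terms)       # avoid adding a term twice
    for term in terms:
        # Strongest correlations first.
        ranked = sorted(correlations.get(term, []),
                        key=lambda pair: pair[1], reverse=True)
        added = 0
        for related, strength in ranked:
            if added >= max_added_per_term:
                break
            if strength < min_strength or related in seen:
                continue
            seen.add(related)
            # "term^boost" is Lucene's query syntax for a boosted term.
            clauses.append(f"{related}^{strength:.2f}")
            added += 1
    return " ".join(clauses)


if __name__ == "__main__":
    print(expand_query(["fast", "car"], CORRELATIONS))
```

The resulting string could be handed to Lucene's query parser unchanged, which is consistent with the point that no changes to Lucene are needed. The cutoff and the per-term cap are one way to keep weakly correlated terms from diluting the query; both values are arbitrary here.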

You can get some details on the specific mechanism that I used here, if 
you're interested:

http://www.ics.uci.edu/~jmadden/research/index.html

(and go down to "Fuzzy Term Expansion and Document Reweighting", about 
halfway down.)

If you decide that my ideas are interesting enough that you want to 
have a look at my code, let me know, and perhaps we can work something 
out.

Regards,

Joshua O'Madadhain

On Friday, Nov 14, 2003, at 09:52 US/Pacific, Chong, Herb wrote:

> i don't know of any open source search engine that incorporates 
> interterm correlation. i have been looking into how to do this in 
> Lucene and so far, it's not been promising. the indexing engine and 
> file format need to be changed. there are very few search engines 
> that incorporate interterm correlation in any mathematically and 
> linguistically rigorous manner. i designed a couple, but they were all 
> research experiments.
>
> are you familiar with the TREC automatic ad hoc track? my 
> experiments with the TREC-5 to TREC-7 questions produced about 0.05 to 
> 0.10 improvement in average precision by proper use of interterm 
> correlation. my project at the time was cancelled after TREC-7 and so 
> there haven't been any new developments.
>
  jmadden@ics.uci.edu...Obscurium Per Obscurius...www.ics.uci.edu/~jmadden
   Joshua O'Madadhain: Information Scientist, Musician, Philosopher-At-Tall
  It's that moment of dawning comprehension that I live for--Bill Watterson
My opinions are too rational and insightful to be those of any organization.