lucene-java-user mailing list archives

From "Chong, Herb" <HCho...@bloomberg.com>
Subject RE: inter-term correlation [was Re: Vector Space Model in Lucene?]
Date Fri, 14 Nov 2003 18:13:35 GMT
If you didn't have to change the index, then you haven't got all the factors needed to do it
well. Term correlations can't cross sentence boundaries, and the index doesn't store sentence boundaries.
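To make the sentence-boundary point concrete, here is a minimal sketch of counting term co-occurrence only within sentences, so that pairs separated by a sentence break are never counted. It works on raw text rather than on a Lucene index, precisely because the index does not keep sentence boundaries. The class name `SentenceCooccurrence`, the `countPairs` method, the `"a|b"` pair-key format and the naive punctuation-based sentence splitter are all illustrative assumptions, not anything from Lucene.

```java
import java.util.*;

public class SentenceCooccurrence {
    // Counts how many sentences each unordered pair of distinct terms shares.
    // Terms that only co-occur across a sentence boundary are never paired.
    static Map<String, Integer> countPairs(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        // Naive sentence splitter (assumption): break on runs of '.', '!' or '?'.
        for (String sentence : text.split("[.!?]+")) {
            // Distinct lowercase terms in this sentence, sorted so each pair has one canonical key.
            SortedSet<String> terms = new TreeSet<>();
            for (String tok : sentence.toLowerCase().split("\\W+")) {
                if (!tok.isEmpty()) terms.add(tok);
            }
            List<String> list = new ArrayList<>(terms);
            for (int i = 0; i < list.size(); i++) {
                for (int j = i + 1; j < list.size(); j++) {
                    // Key "a|b" with a < b lexicographically (hypothetical key format).
                    counts.merge(list.get(i) + "|" + list.get(j), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> pairs =
            countPairs("Lucene indexes text. Vector space models rank text.");
        System.out.println(pairs.get("indexes|lucene"));        // 1
        System.out.println(pairs.containsKey("lucene|vector")); // false: different sentences
    }
}
```

Counts like these could feed whatever correlation measure one prefers; the choice of measure is left open here.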

Herb...

-----Original Message-----
From: Joshua O'Madadhain [mailto:jmadden@ics.uci.edu]
Sent: Friday, November 14, 2003 1:14 PM
To: Lucene Users List
Subject: inter-term correlation [was Re: Vector Space Model in Lucene?]


Incorporating inter-term correlation into Lucene isn't that hard; I've 
done it.  Nor is it incompatible with the vector-space model.  I'm not 
happy with the specific correlation metric that I picked, which is why 
I'm not eager to generally release the code I wrote, but I think that 
the basic mechanism that I came up with (query expansion via correlated 
terms, where the added terms were boosted according to the strength of 
the correlation) is fairly sound.  And I didn't need any changes to 
Lucene to do this.
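The general mechanism described above (expand the query with correlated terms, boosting each added term by the strength of its correlation) can be sketched without touching Lucene internals by emitting a query string in Lucene's query-parser syntax, where `term^boost` sets a term's boost. This is only a sketch under stated assumptions: the correlation table, the `minCorrelation` cutoff and the rule of taking the maximum correlation when a candidate is related to several query terms are hypothetical choices, not the specific metric or code referred to in this message.

```java
import java.util.*;

public class CorrelatedExpansion {
    // Expands a query with correlated terms. Each added term is boosted by its
    // strongest correlation with any original term (assumption: max, not sum).
    // The correlation table and minCorrelation threshold are hypothetical inputs.
    static String expand(List<String> queryTerms,
                         Map<String, Map<String, Double>> correlations,
                         double minCorrelation) {
        Set<String> original = new LinkedHashSet<>(queryTerms);
        Map<String, Double> added = new TreeMap<>();
        for (String term : original) {
            Map<String, Double> related = correlations.getOrDefault(term, Map.of());
            for (Map.Entry<String, Double> e : related.entrySet()) {
                String candidate = e.getKey();
                double strength = e.getValue();
                // Skip terms the user already typed and weakly correlated terms.
                if (original.contains(candidate) || strength < minCorrelation) continue;
                added.merge(candidate, strength, Math::max);
            }
        }
        // Query-parser syntax: original terms keep the default boost of 1.0;
        // expansion terms get term^boost, so correlations below 1.0 weigh them down.
        StringBuilder sb = new StringBuilder(String.join(" ", original));
        for (Map.Entry<String, Double> e : added.entrySet()) {
            sb.append(' ').append(e.getKey()).append('^').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> corr = Map.of(
            "vector", Map.of("space", 0.8, "matrix", 0.3),
            "model", Map.of("space", 0.5, "ranking", 0.6));
        // prints: vector model ranking^0.6 space^0.8
        System.out.println(expand(List.of("vector", "model"), corr, 0.4));
    }
}
```

The resulting string could then be handed to Lucene's `QueryParser`, which is one way to get boosted expansion without changing Lucene itself.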

You can get some details on the specific mechanism that I used here, if 
you're interested:

http://www.ics.uci.edu/~jmadden/research/index.html

(scroll to "Fuzzy Term Expansion and Document Reweighting", about 
halfway down the page.)

If you decide that my ideas are interesting enough that you want to 
have a look at my code, let me know, and perhaps we can work something 
out.

Regards,

Joshua O'Madadhain


