lucene-java-user mailing list archives

From "Joshua O'Madadhain" <>
Subject Re: inter-term correlation [was Re: Vector Space Model in Lucene?]
Date Fri, 14 Nov 2003 18:52:39 GMT
Not sure what you mean by "terms can't cross sentence boundaries".  If 
you're only using single-word terms, that's trivially true.  What is it 
that you're trying to achieve, exactly?  (Your comment makes it sound 
as though you simultaneously want and don't want sentence boundaries, 
so I'm confused.)


On Friday, Nov 14, 2003, at 10:13 US/Pacific, Chong, Herb wrote:

> If you didn't have to change the index, then you haven't got all the 
> factors needed to do it well. Terms can't cross sentence boundaries, 
> and the index doesn't store sentence boundaries.
> Herb...
> -----Original Message-----
> From: Joshua O'Madadhain []
> Sent: Friday, November 14, 2003 1:14 PM
> To: Lucene Users List
> Subject: inter-term correlation [was Re: Vector Space Model in Lucene?]
> Incorporating inter-term correlation into Lucene isn't that hard; I've
> done it.  Nor is it incompatible with the vector-space model.  I'm not
> happy with the specific correlation metric that I picked, which is why
> I'm not eager to generally release the code I wrote, but I think that
> the basic mechanism that I came up with (query expansion via correlated
> terms, where the added terms were boosted according to the strength of
> the correlation) is fairly sound.  And I didn't need any changes to
> Lucene to do this.
> You can get some details on the specific mechanism that I used here, if
> you're interested:
> (and go down to "Fuzzy Term Expansion and Document Reweighting", about
> halfway down.)
> If you decide that my ideas are interesting enough that you want to
> have a look at my code, let me know, and perhaps we can work something
> out.
> Regards,
> Joshua O'Madadhain
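
The mechanism described in the quoted message (query expansion via
correlated terms, with each added term boosted according to the strength
of its correlation) could be sketched roughly as follows. This is only an
illustration under stated assumptions, not the code mentioned above: the
class name, the correlation table, and the minCorrelation cutoff are all
hypothetical, and the correlation metric itself is left open. The only
Lucene-specific piece is the term^boost notation of the query parser
syntax, which is why no index changes are needed.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class CorrelatedQueryExpansion {

    /**
     * Expands the query with terms correlated to the original terms.
     * Each added term is boosted by its correlation strength, written in
     * Lucene query syntax as term^boost. The correlation table is assumed
     * to be computed elsewhere (hypothetical input, values in [0, 1]).
     */
    static String expand(String[] queryTerms,
                         Map<String, Map<String, Double>> correlations,
                         double minCorrelation) {
        // LinkedHashMap keeps insertion order so the output is deterministic.
        Map<String, Double> boosts = new LinkedHashMap<>();
        for (String term : queryTerms) {
            boosts.put(term, 1.0); // original terms keep full weight
        }
        for (String term : queryTerms) {
            Map<String, Double> related = correlations.get(term);
            if (related == null) {
                continue; // nothing known about this term
            }
            for (Map.Entry<String, Double> e : related.entrySet()) {
                if (e.getValue() < minCorrelation) {
                    continue; // too weakly correlated to be worth adding
                }
                // If a term is reachable from several query terms,
                // keep the strongest boost.
                boosts.merge(e.getKey(), e.getValue(), Double::max);
            }
        }
        StringBuilder query = new StringBuilder();
        for (Map.Entry<String, Double> e : boosts.entrySet()) {
            if (query.length() > 0) {
                query.append(' ');
            }
            query.append(e.getKey());
            if (e.getValue() != 1.0) {
                query.append('^')
                     .append(String.format(Locale.ROOT, "%.2f", e.getValue()));
            }
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // Made-up correlation values, purely for illustration.
        Map<String, Double> car = new LinkedHashMap<>();
        car.put("automobile", 0.8);
        car.put("vehicle", 0.5);
        car.put("road", 0.1);
        Map<String, Double> engine = new LinkedHashMap<>();
        engine.put("motor", 0.7);
        engine.put("vehicle", 0.6);
        Map<String, Map<String, Double>> correlations = new LinkedHashMap<>();
        correlations.put("car", car);
        correlations.put("engine", engine);

        // prints: car engine automobile^0.80 vehicle^0.60 motor^0.70
        System.out.println(
            expand(new String[] {"car", "engine"}, correlations, 0.3));
    }
}
```

The resulting string can be handed to the query parser as an ordinary
query, so the expansion lives entirely on the query side.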
   Joshua O'Madadhain: Information Scientist, Musician, 
  It's that moment of dawning comprehension that I live for--Bill 
My opinions are too rational and insightful to be those of any 