lucene-dev mailing list archives

From Dmitry Serebrennikov <>
Subject Re: Revival of Dmitry's Term Vector patches
Date Thu, 18 Sep 2003 22:17:16 GMT
Otis Gospodnetic wrote:

>Dmitry and others,
>One of the relatively frequently asked for features is 'conceptual
>search', or 'search by similarity', etc.  Lucene does not store term
>vectors in its index, so such searches cannot be supported.
>However, almost two years ago, Dmitry provided a large set of patches
>that added term vector support to Lucene.  We never applied those
>patches for some reason, even though the patches looked really good.
>The other day I looked at Dmitry's two year old email again.
>I applied a few diffs to my copy of Lucene and added to the source tree
>the new classes that Dmitry wrote to add term vector support.
>Unfortunately, lots of classes changed over the last two years, and not
>all patches will apply.
>I was wondering, Dmitry, if you have your term vector changes
>integrated with the current version of Lucene.  If you do, would it be
>possible for you to send the patches again?
Well, it's actually not that simple. The code of Lucene that we use is 
pretty heavily modified (by the term vector patch and by a few later 
additions, such as the TermEnum patch from 6 months ago or so). What I'd 
like to do with the file handles is to make changes in the current 
Lucene sources, do the testing and all, and then port the changes into 
our version of Lucene. This way the contribution will be readily usable. 
The term vector patches that I sent before are out there, so feel free 
to incorporate them into Lucene, but I can't really spend time on them 
right now. Plus, I think that from an IP point of view, those changes allow 
the company I work for to do things with Lucene that our competitors 
can't readily do, and these things happen to be very much key to our 
value proposition, so I really can't publish any more of those changes 
yet. Now, if Lucene acquired a similar capability from what I already 
published or from some other source, perhaps we could contribute to that 
effort later in smaller ways.

A great thing about the Apache license is that it allows this kind of 
flexibility (IANAL). This is just where I'm comfortable drawing the line 
right now. Sorry if this comes across as ungrateful... We are really 
very appreciative of the Lucene project and of the community, and we'll 
try to contribute in other ways, but this one is not available any 
more/yet. :)

>Also, I noticed that a large portion of those patches contained a good
>amount of documentation (code comments, Javadocs).  Dmitry obviously
>studied the code in depth :)  I will try extracting at least the
>documentation from that contribution.
Yes, I did read it end to end - boy, was that a learning experience! :)

>Finally, Dmitry, if you have term vector support in your local copy of
>the current Lucene sources, how are you going to make patches
>containing only the changes that you outlined in the recent email?
>Are term vector changes gone or....?
Like I said above, I'll be working with the current Lucene from CVS up 
until the changes are final, then I will port them to my copy of the code.

Perhaps later we can get back to the TermEnum changes as well. Those I 
could contribute (well, actually I already did :) ). The gist there is 
that I was able to substantially reduce garbage collection on certain 
operations, but I think someone reported that the code did not work 
correctly in some cases (presumably uses of Lucene that we don't 
encounter in our environment).

Thanks for digging the term vectors back out, Otis.
