lucene-dev mailing list archives

From "Michael McCandless" <>
Subject Re: Term pollution from binary data
Date Tue, 13 Nov 2007 10:41:02 GMT
"Chuck Williams" <> wrote:
> Doug Cutting wrote on 11/07/2007 09:26 AM:
> > Hadoop's MapFile is similar to Lucene's term index, and supports a 
> > feature where only a subset of the index entries are loaded 
> > (determined by  It would not be difficult to add 
> > such a feature to Lucene by changing TermInfosReader#ensureIndexIsRead().
> >
> > Here's a (totally untested) patch.
> Doug, thanks for this suggestion and your quick patch.
> I fleshed this out in the version of Lucene we are using, a bit after 
> 2.1.  There was an off-by-1 bug plus a few missing pieces.  The attached 
> patch is for 2.1+, but might be useful as it at least contains the 
> corrections and missing elements.  It also contains extensions to the 
> tests to exercise the patch.

Thanks Chuck, I will start from your patch & get it working on trunk.
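
For anyone following the thread, here is a rough sketch of the idea Doug describes: load only every Nth entry of a sorted term index into memory, then seek to the nearest loaded entry at or before the target and scan forward from there. This is not the attached patch and not TermInfosReader; the class name SubsampledTermIndex, the method seekStart, and the divisor parameter are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of a subsampled term index; not Lucene code. */
final class SubsampledTermIndex {
    private final List<String> terms = new ArrayList<>();    // kept index terms, still sorted
    private final List<Long> positions = new ArrayList<>();  // file position for each kept term

    /**
     * @param allTerms     index terms in sorted order, as they would be read from disk
     * @param allPositions file position associated with each index term
     * @param divisor      keep every divisor-th entry (1 keeps everything)
     */
    SubsampledTermIndex(List<String> allTerms, List<Long> allPositions, int divisor) {
        if (divisor < 1) {
            throw new IllegalArgumentException("divisor must be >= 1");
        }
        if (allTerms.size() != allPositions.size()) {
            throw new IllegalArgumentException("terms and positions differ in length");
        }
        // Entry 0 is always kept, so every term at or after the first index
        // term still has a loaded entry that precedes it.
        for (int i = 0; i < allTerms.size(); i += divisor) {
            terms.add(allTerms.get(i));
            positions.add(allPositions.get(i));
        }
    }

    /** Number of index entries actually held in memory. */
    int size() {
        return terms.size();
    }

    /**
     * Returns the position of the greatest loaded entry that is <= target,
     * or -1 if the target sorts before every loaded entry. A caller would
     * scan forward sequentially from the returned position.
     */
    long seekStart(String target) {
        int idx = Collections.binarySearch(terms, target);
        if (idx < 0) {
            idx = -idx - 2; // (insertion point) - 1
        }
        return idx < 0 ? -1L : positions.get(idx);
    }
}
```

With a divisor of N the in-memory index shrinks to roughly 1/N of its entries, at the cost of a longer sequential scan after each seek.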

> I tried integrating this into 2.3, but enough has changed so that it was 
> not straightforward (primarily for the test case extensions -- the 
> implementation seems it will apply with just a bit of manual merging).  
> Unfortunately, I have so many local changes that it has become difficult 
> to track the latest Lucene.  The task of syncing up will come soon.  
> I'll post a proper patch against the trunk in jira at a future date if 
> the issue is not already resolved before then.
> Michael McCandless wrote on 11/08/2007 12:43 AM:
> > I'll open an issue and work through this patch.
> >   
>  Michael, I did not see the issue, else would have posted this there.  
> Unfortunately, I'm pretty far behind on lucene mail these days.

Sorry, I haven't yet gotten to opening the issue.  I will try to do so

> > One thing is: I'd prefer to not use system property for this, since
> > it's so global, but I'm not sure how to better do it.
> >   
> Agree strongly that this should not be global.  Whether ctors or an 
> index-specific properties object or whatever, it is important to be able 
> to set this on some indexes and not others in a single application.
> Thanks for picking this up!

Will do!  Sorry for the delay.
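
To make the system-property concern concrete, here is a minimal sketch of the ctor alternative Chuck mentions, assuming a per-reader setting with the property kept only as a fallback default. Nothing here is a Lucene API: ExampleReader, IndexDivisorConfig, and the property name example.termIndexDivisor are hypothetical.

```java
/** Hypothetical global fallback; the property name is made up for this sketch. */
final class IndexDivisorConfig {
    static final String PROPERTY = "example.termIndexDivisor";

    /** Reads the JVM-wide default, or 1 (load every entry) if the property is unset. */
    static int fromSystemProperty() {
        return Integer.getInteger(PROPERTY, 1);
    }
}

/** Hypothetical reader that carries its own divisor instead of consulting global state. */
final class ExampleReader {
    private final String indexName;
    private final int divisor;

    /** Falls back to the JVM-wide default. */
    ExampleReader(String indexName) {
        this(indexName, IndexDivisorConfig.fromSystemProperty());
    }

    /** Per-index setting: each reader can choose its own divisor. */
    ExampleReader(String indexName, int divisor) {
        if (divisor < 1) {
            throw new IllegalArgumentException("divisor must be >= 1");
        }
        this.indexName = indexName;
        this.divisor = divisor;
    }

    String indexName() {
        return indexName;
    }

    int divisor() {
        return divisor;
    }
}
```

With something of this shape, one application could open a large index with new ExampleReader("big", 4) and a small one with new ExampleReader("small", 1), which a single system property cannot express.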

