lucene-dev mailing list archives

From robert engels <>
Subject Re: Term pollution from binary data
Date Thu, 08 Nov 2007 18:09:08 GMT
I think it would be better to have IndexReaderProperties, and  

Just seems an easier API for maintenance. It is more logical, as it  
keeps related items together.
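
A minimal sketch of what a dedicated, typed IndexReaderProperties class could look like. Everything here is hypothetical. The class is not an existing Lucene API, and the single termIndexDivisor setting is just the example from this thread. Its meaning is assumed to be "keep every Nth entry of the in-memory term index", with 1 as the default. Related reader options live in one place, and the compiler checks their types.

```java
// Hypothetical holder for optional IndexReader settings (not a real Lucene class).
final class IndexReaderProperties {
    // Assumed meaning: keep every Nth entry of the term index; 1 keeps all of them.
    private int termIndexDivisor = 1;

    IndexReaderProperties setTermIndexDivisor(int divisor) {
        if (divisor < 1) {
            throw new IllegalArgumentException("divisor must be >= 1: " + divisor);
        }
        this.termIndexDivisor = divisor;
        return this; // return this so setters can be chained
    }

    int getTermIndexDivisor() {
        return termIndexDivisor;
    }
}

class IndexReaderPropertiesDemo {
    public static void main(String[] args) {
        IndexReaderProperties props = new IndexReaderProperties().setTermIndexDivisor(4);
        // A reader constructor accepting props would read this before loading any terms.
        System.out.println(props.getTermIndexDivisor()); // prints 4
    }
}
```

A reader constructor that accepted such an object could read the divisor before loading anything. That avoids having to set it after the ctor has already run.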

On Nov 8, 2007, at 12:04 PM, Doug Cutting wrote:

> Michael McCandless wrote:
>> One thing is: I'd prefer to not use system property for this, since
>> it's so global, but I'm not sure how to better do it.
> I agree.  That was the quick-and-dirty hack.  Ideally it should be  
> a method on IndexReader.  I can think of two ways to do that:
> 1. Add a generic method like IndexReader#setProperty(String,String).
> 2. Add a specific method like IndexReader#setTermIndexDivisor(int).
> I slightly prefer the former, as it permits various IndexReader
> implementations to support arbitrary properties, at the expense of
> being untyped, but that might be overkill.  Thoughts?
>> We can't add a "setIndexDivisor(...)" method because the terms are
>> already being loaded (consuming too much RAM) in the ctor.
> Aren't indexes loaded lazily?  That's an important optimization for  
> merging, no?  For performance reasons, opening an IndexReader  
> shouldn't do much more than open files.  However, if we build a  
> more generic mechanism, we should not rely on that.
>> What if, instead, we passed down a Properties instance to IndexReader
>> ctors?  Or alternatively a dedicated class, e.g.,
>> "IndexReaderInitParameters"?  The advantage of a dedicated class is
>> it's strongly typed at compile time, and you could put things in
>> there like an optional DeletionPolicy instance as well.  I think there
>> is a growing list of these sorts of "advanced optional parameters
>> used during init" that could be handled with such an approach?
> (I probably should have read your entire message before starting to  
> respond...  But it's nice to see that we think alike!)  This is  
> similar to my (2) approach, but attempts to solve the typing issue,  
> although I'm not sure how...
> The way we handle it in Hadoop is to pass around a <String,String>  
> map in the abstract kernel, then have concrete implementation  
> classes provide static methods that access it.  So this might look  
> something like:
> public class LuceneProperties extends Properties {
>   // utility methods to handle conversion of values to and from Strings
>   void setInt(String prop, int value);
>   int getInt(String prop);
>   void setClass(String prop, Class value);
>   Class getClass(String prop);
>   Object newInstance(String prop);
>   ...
> }
> public class SegmentReaderProperties {
>   private static final String DIVISOR_PROP =
>     "org.apache.lucene.index.SegmentReader.divisor";
>   public static void setTermIndexDivisor(LuceneProperties props, int i) {
>     props.setInt(DIVISOR_PROP, i);
>   }
> }
> Then the IndexReader constructor methods could accept a  
> LuceneProperties.  No point in making this IndexReader specific,  
> since it might be useful for, e.g., IndexWriter, Searchers,  
> Directories, etc.
> An advantage of a <String,String> map over a <String,Object> map  
> for Hadoop is that it's trivial to serialize.
> Is this what you had in mind?
> Doug
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
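
A runnable sketch of the Hadoop-style pattern quoted above. It uses a String-to-String map (here java.util.Properties) with typed conversion helpers. A component-specific class holds the key name and exposes it only through static methods. Only setInt/getInt are implemented. The getInt default value and the getTermIndexDivisor reader are assumptions added for illustration, and none of these classes exist in Lucene. The last step round-trips the map through Properties.store/load to illustrate the point that string values are trivial to serialize.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Properties;

// Hypothetical: a string map plus typed conversion helpers.
class LuceneProperties extends Properties {
    void setInt(String prop, int value) {
        setProperty(prop, Integer.toString(value));
    }

    // Returns defaultValue when the property is absent (default is an assumption).
    int getInt(String prop, int defaultValue) {
        String value = getProperty(prop);
        return value == null ? defaultValue : Integer.parseInt(value.trim());
    }
}

// Hypothetical: component-specific accessors keep the key name private.
final class SegmentReaderProperties {
    private static final String DIVISOR_PROP =
        "org.apache.lucene.index.SegmentReader.divisor";

    private SegmentReaderProperties() {}

    static void setTermIndexDivisor(LuceneProperties props, int divisor) {
        props.setInt(DIVISOR_PROP, divisor);
    }

    static int getTermIndexDivisor(LuceneProperties props) {
        return props.getInt(DIVISOR_PROP, 1);
    }
}

class LucenePropertiesDemo {
    public static void main(String[] args) throws IOException {
        LuceneProperties props = new LuceneProperties();
        SegmentReaderProperties.setTermIndexDivisor(props, 4);

        // Round-trip through text: string values need no custom serializer.
        StringWriter out = new StringWriter();
        props.store(out, null);
        LuceneProperties copy = new LuceneProperties();
        copy.load(new StringReader(out.toString()));

        System.out.println(SegmentReaderProperties.getTermIndexDivisor(copy)); // prints 4
    }
}
```

Callers never spell out the key string. The core only ever handles plain strings, while each component supplies its own typed view.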
