lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael Busch <busch...@gmail.com>
Subject Re: Lucene 2.9 and deprecated IR.open() methods
Date Mon, 05 Oct 2009 17:28:51 GMT
I think we shouldn't discuss too many different things here.
To begin I'd just like to introduce the IndexConfig class, that will 
hold the parameters we currently pass to the different IndexWriter 
constructors.

If we later need to create different IndexWriter impls we can introduce 
a factory.

If we want to change some IW settings to be mandatory on IW 
instantiation we can move those parameters from IW to the Config class 
then.

If we see in the future the need to pass arguments to the different flex 
index consumers, we can add an AttributeSource or Properties hashmap to 
the config class, or maybe directly to the IndexingChain class. I don't 
really think the IndexWriter needs this flexibility right now and it 
seems like Mike hasn't seen the need thus far while working on 
LUCENE-1458 either.

Adding the Config class and deprecating all other IW constructors will 
not prevent us from doing any of the other things in the future IMO and 
is already a great start to simplify things. So let's do that first and 
discuss the other points separately when the need arises.

  Michael

On 10/5/09 5:40 AM, Uwe Schindler wrote:
> Hi Mike,
>
>    
>> I think AS is overkill for conveying configuration of IW/IR?
>>
>> Suddenly, instead of:
>>
>>    cfg.setRAMBufferSizeMB(128.0)
>>
>> I'd have to do something like this?
>>
>>
>> cfg.addAttribute(IndexWriterRAMBufferSizeAttribute.class).setRAMBufferSize
>> (128.0)
>>
>> It's too cumbersome, I think, for something that ought to be simple.
>> I'd prefer a dedicated config class with strongly typed setters
>> exposed.  Of all the "pure syntax" options so far I'd still prefer the
>> traditional "config object with setters".
>>      
> > From this point-of-view, it's also overkill for TokenStream. But as AS was
> also designed for flexible indexing it would fit very well into this area.
>
> The new query parser is a good example pro attributes. What is an argument
> against atts is the fact, that also Michael Bush didn't promote them from
> the beginning of this discussion :-) (maybe he needs also one night longer
> to think about it).
>
> Good points for AS, are e.g. the type-safety, simplicity to enhance,
> built-in defaults (you do not need to check for existence of attributes,
> just add them at the point you want to use them, like in your example -
> maybe with nicer and shorter names). With generics, AS is as simple to use
> like simple get/setters.
>
>    
>> Also, I don't think we should roll this out for all Lucene classes.  I
>> think most classes do just fine accepting args to their ctor.  EG
>> TermQuery simply takes Term to its ctor.
>>
>> I do agree IW should not be in the business of brokering changes to
>> the settings of its sub-components (eg mergeFactor, maxMergeDocs).
>> You really should make such changes directly via your merge policy.
>>      
> AttributeSource would also help us with e.g. the possibility for later
> changes to various attributes. If some of the attributes are fixed after
> construction of IW/IR, just throw IllegalStateExceptions.
>
>    
>> Finally, I'm not convinced we should lock down all settings after
>> classes are created.  (I'm not convinced we shouldn't, either).
>>
>> A merge policy has no trouble changing its mergeFactor,
>> maxMergeDocs/Size.  IW has no trouble changing the its RAM buffer
>> size, maxFieldLength, or useCompoundFile.  Sure there are some things
>> that cannot (or would be very tricky to) change, eg deletion policy.
>> But then analyzer isn't changeable today, but could be.
>>
>> But, then, I can also see it'd simplify our code to not have to deal
>> w/ such changes, reduce chance of subtle bugs, and it seems minor to
>> go and re-open your IndexWriter if you need to make a settings change?
>> (Hmm except in an NRT setting, because the reader pool would be reset;
>> really we need to get the reader pool separated from the IW instance).
>>
>> Mike
>>
>> On Mon, Oct 5, 2009 at 4:38 AM, Uwe Schindler<uwe@thetaphi.de>  wrote:
>>      
>>>>> See my second mail. The recently introduced Attributes and
>>>>>            
>>>> AttributeSource
>>>>          
>>>>> would solve this. Each component just defines its attribute interface
>>>>>            
>>>> and
>>>>          
>>>>> impl class and you pass in an AttributeSource as configuration. Then
>>>>>            
>> you
>>      
>>>> can
>>>>          
>>>>> do:
>>>>>
>>>>> AttributeSource cfg = new AttributeSource();
>>>>>
>>>>> ComponentAttribute compCfg =
>>>>>            
>> cfg.addAttribute(ComponentAttribute.class);
>>      
>>>>> compCfg.setMergeScheduler(FooScheduler.class);
>>>>>
>>>>> MergeBarAttribute mergeCfg =
>>>>>            
>> cfg.addAttribute(MergeBarAttribute.class);
>>      
>>>>> mergeCfg.setWhateverProp(1234);
>>>>> ...
>>>>> IndexWriter iw = new IndexWriter(dir, cfg);
>>>>>
>>>>> (this is just brainstorming not yet thoroughly thought about).
>>>>>            
>>>> This approach suggests IW creates its components, and while doing so
>>>> provides them your AS instance.
>>>> I personally prefer creating all these components myself, configuring
>>>> them (at the moment of creation) and passing them to IW in one way or
>>>> another.
>>>> This requires way less code, you don't have to invent elaborate
>>>> schemes of passing through your custom per-component settings and
>>>> selecting which exact component types IW should use, you don't risk
>>>> construct/postConstruct/postpostConstruct-style things.
>>>>          
>>>
>>> Not really. That was just brainstorming. But you can pass also instances
>>> instead of class names through attributesource. AttributeSurce only
>>>        
>> provides
>>      
>>> type safety for the various configuration settings (which are
>>>        
>> interfaces).
>>      
>>> But you could also create an attribute that gets the pointer to the
>>> component. So "compCfg.setMergeScheduler(FooScheduler.class);" could
>>>        
>> also be
>>      
>>> compConfig.addComponent(new FooScheduler(...));
>>>
>>> The AttributeSource approach has one other good thing:
>>> If you want to use the default settings for one attribute, you do not
>>>        
>> have
>>      
>>> to add it to the AS (or you can forget it). With the properties
>>>        
>> approach,
>>      
>>> you have to hardcode the parameter defaults and validation everywhere.
>>>        
>> As
>>      
>>> the consumer of an AttributeSource gets the attribute also by an
>>> addAttribute-call (see current indexing code consuming TokenStreams),
>>>        
>> this
>>      
>>> call would add the missing attribute with its default settings defined
>>>        
>> by
>>      
>>> the implementation class. So in the above example, if you do not want to
>>> provide the "whateverProp", leave the whole MergeBarAttribute out. The
>>> consumer (IW) would just call addAttribute(MergeBarAttribute.class),
>>>        
>> because
>>      
>>> it needs the attribute to configure itself. AS would add this attribute
>>>        
>> with
>>      
>>> default settings.
>>>
>>> Uwe
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>
>>>
>>>        
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>      
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>
>    


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message