lucene-dev mailing list archives

From "Michael McCandless" <luc...@mikemccandless.com>
Subject Re: MergePolicy interface and SegmentInfos
Date Sat, 03 Nov 2007 09:58:40 GMT

"Chris Hostetter" <hossman_lucene@fucit.org> wrote:

> I haven't really delved into the MergePolicy work that's been done, but a
> recent Jira comment got me poking around the javadocs -- MergePolicy is
> a public interface, which suggests clients are allowed to implement it,
> leading me to wonder about two things...
> 
> 1) Writing a MergePolicy requires knowing about the package protected
> SegmentInfos class ... how do we expect people to make that work? (I know
> we've said in the past that people shouldn't have to implement classes in
> the o.a.l namespace just to make things work for them.)

Good point.  Currently your class (implementing MergePolicy) must be
part of the o.a.l.index package, so you can see the package-protected
SegmentInfos/SegmentInfo classes.  I had thought that was OK.

Is it really so bad to require users to put their class into the
o.a.l.index package, when what they are doing is a very advanced
thing?

The only other option I can see is to make SegmentInfos/SegmentInfo
public.

Maybe we should add API warning caveats in the javadocs ("this API is
advanced & new & may change") like we have now for Payloads, and leave
the package-protection in place for now to limit usage to brave early
adopters (even if we intend later to make things public)?

> 2) should we instead make this an abstract base class to help "future 
> proof" ourselves against wanting to add support for more "optional" 
> methods we might want to allow MergePolicies to specify?
> 
> (this being the age-old interface vs. base class discussion ... providing a
> base class allows us to add support for new methods later by providing
> defaults; interfaces can never be changed except in major releases, i.e.
> X.0)
> 
> For example: suppose down the road we want to support an option like yonik 
> describes here...
> 
> https://issues.apache.org/jira/browse/LUCENE-1043?#action_12539675
> > More controversial: maybe even expand the number of docs that can be 
> > bulk copied by not bothering removing deleted docs if it's some very small 
> > number (unless it's an optimize). This is probably not worth it.
> 
> ...this is the kind of thing a MergePolicy could specify with some new
> method...
>      public float getMaxAllowedPercentageOfDeletedDocsIgnored() {
>         return 0.0f;
>      }
> ...that individual MergePolicies could override.

Switching to an abstract base class is a good idea.  I think it's
important to reserve the freedom to add new methods, with default
implementations, between major releases.  I'll work out a patch.
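
To make the trade-off concrete, here is a minimal sketch of how an
abstract base class lets a method be added later without breaking
existing subclasses. Everything in it is a simplified stand-in and not
the actual Lucene API: SketchMergePolicy, LegacyPolicy, LenientPolicy
and useCompoundFile() are hypothetical names chosen for illustration.
Only getMaxAllowedPercentageOfDeletedDocsIgnored() comes from the
example quoted above.

```java
// Simplified stand-in for illustration only; NOT Lucene's MergePolicy.
abstract class SketchMergePolicy {
    // Hypothetical method that every policy had to implement from day one.
    public abstract boolean useCompoundFile();

    // Added in a later minor release. Because it ships with a default body,
    // subclasses written against the older version still compile and run.
    public float getMaxAllowedPercentageOfDeletedDocsIgnored() {
        return 0.0f;
    }
}

// Written before the new method existed: no change required.
class LegacyPolicy extends SketchMergePolicy {
    @Override
    public boolean useCompoundFile() {
        return true;
    }
}

// Opts in to the new behavior by overriding the default.
class LenientPolicy extends SketchMergePolicy {
    @Override
    public boolean useCompoundFile() {
        return false;
    }

    @Override
    public float getMaxAllowedPercentageOfDeletedDocsIgnored() {
        return 1.5f;
    }
}

class BaseClassDemo {
    public static void main(String[] args) {
        SketchMergePolicy[] policies = { new LegacyPolicy(), new LenientPolicy() };
        for (SketchMergePolicy p : policies) {
            // Each policy reports either the inherited default or its override.
            System.out.println(p.getClass().getSimpleName() + " -> "
                    + p.getMaxAllowedPercentageOfDeletedDocsIgnored());
        }
    }
}
```

Had SketchMergePolicy been an interface, adding the new method would
have forced LegacyPolicy (and every other implementer) to change before
it could compile again.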

> Perhaps the broader question is: do we really want/expect people to write 
> their own MergePolicies, or is the interface just to provide an
> abstraction for picking one of the provided Impls? ... in that case, it 
> seems like we should lock down the API a bit more (we can always open it 
> up later)

I *think* people will want to implement their own merge policies,
though it is of course hard to tell at this point :).  Example use
cases: customize optimize to NOT merge the very large segments; favor
merging segments that have many pending deletes; postpone heavy merging
until overnight when search traffic is low; make a merge policy that's
free to merge non-adjacent segments (though we can't do that one until
we fix IndexWriter to accept such a MergeSpecification).
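
As a rough sketch of the first two use cases, the selection logic a
custom policy might apply could look like the code below. It assumes
hypothetical stand-in types: SegmentStats and SelectiveMergeSketch are
invented for illustration and are not SegmentInfo, SegmentInfos or any
other Lucene class, and the thresholds are arbitrary.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-segment statistics; NOT Lucene's SegmentInfo.
class SegmentStats {
    final String name;
    final long sizeInBytes;
    final int docCount;
    final int deletedDocCount;

    SegmentStats(String name, long sizeInBytes, int docCount, int deletedDocCount) {
        this.name = name;
        this.sizeInBytes = sizeInBytes;
        this.docCount = docCount;
        this.deletedDocCount = deletedDocCount;
    }

    // Fraction of documents in this segment that are marked deleted.
    double deletedRatio() {
        return docCount == 0 ? 0.0 : (double) deletedDocCount / docCount;
    }
}

class SelectiveMergeSketch {
    // Use case 1: during optimize, leave segments larger than maxBytes alone.
    // Index order is preserved in the result.
    static List<SegmentStats> optimizeCandidates(List<SegmentStats> segments, long maxBytes) {
        List<SegmentStats> result = new ArrayList<>();
        for (SegmentStats s : segments) {
            if (s.sizeInBytes <= maxBytes) {
                result.add(s);
            }
        }
        return result;
    }

    // Use case 2: favor segments whose deleted fraction is at least minRatio.
    static List<SegmentStats> deleteHeavy(List<SegmentStats> segments, double minRatio) {
        List<SegmentStats> result = new ArrayList<>();
        for (SegmentStats s : segments) {
            if (s.deletedRatio() >= minRatio) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<SegmentStats> segments = new ArrayList<>();
        segments.add(new SegmentStats("_0", 5000L, 1000, 10));
        segments.add(new SegmentStats("_1", 100L, 100, 50));
        segments.add(new SegmentStats("_2", 200L, 200, 0));

        for (SegmentStats s : optimizeCandidates(segments, 1000L)) {
            System.out.println("optimize candidate: " + s.name);
        }
        for (SegmentStats s : deleteHeavy(segments, 0.25)) {
            System.out.println("delete-heavy: " + s.name);
        }
    }
}
```

Note that either filter can return segments that are not adjacent in
the index, so a real policy would still have to group its candidates
into runs of adjacent segments until IndexWriter can accept anything
else.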

Mike
