lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: IndexReader plugins
Date Tue, 14 Apr 2009 10:34:43 GMT
On Mon, Apr 13, 2009 at 9:44 AM, Earwin Burrfoot <earwin@gmail.com> wrote:
> On Mon, Apr 13, 2009 at 17:14, Michael McCandless
> <lucene@mikemccandless.com> wrote:
>> On Mon, Apr 13, 2009 at 9:02 AM, Earwin Burrfoot <earwin@gmail.com> wrote:
>>
>>>> I think I'd lean towards only at construction.  Seems dangerous to
>>>> allow swap in/out at some later time.
>>> I have several points pro-runtime:
>>> 1. We have a lot of static factory methods, it's gonna be a problem
>>> passing plugins everywhere on creation.
>>
>> "We" is who here?  Couldn't "they refactor their code to callback into
>> those static factory methods?
> "We" as in "(un)fortunate users of Lucene".
> IndexReader.java is littered with the likes of:
> public static IndexReader open(final Directory directory,
> IndexDeletionPolicy deletionPolicy) throws CorruptIndexException,
> IOException;

But I don't understand why is this a problem...

In the early binding approach (where component is established at
creation of a SegmentReader), you'd pass something that can create new
components, regardless of whether we use static factory vs ctors to
birth a new reader.  Ie, this seems non-differentiating to the early
vs late binding discussion.

>> Presumably you'd pass into Lucene a "CreatesXYZComponent" instance
>> which given a SegmentInfo (or a public'd version of it, with
>> extensibility, etc) returns an XYZComponent.
> I want to pass a component into toplevel reader, it will be
> consequently notified of subreader creation/deletion and given a
> chance to reuse its instance for them, or provide a new one.

I still feel this is backwards, on "first blush".

Meaning, for each SegmentReader I'd like to ask your "creator" to give
me a component that can read this segment, right now (early binding).
Ie, each SegmentReader has a private final instance of the
"PostingsComponent" (say).

Sure, it could be sometimes you'll share a given component instance
across more than one segment, but I'd expect that to be the exception
not the rule.

In Lucene today, everything is a "private" component, though for
reading the doc stores, this is wasteful in the case where the doc
store files are shared across segments. So that'd be one case today
where sharing would be useful.  Maybe you could focus on that as your
first proof point for this change?

>>> 2. public <T extends IRP> void bindPlugin(Class<? super T> type,
T
>>> instance) is ..er.. typesafer than passing Map<Class, IRP> in
>>> constructor.
>>
>> Can you make this more concrete?  (I'm having trouble following... the
>> generics).
> Let's define the following:
> interface Plugin {}
> interface Service1 {}
> interface Service2 {}
> class Impl implements Plugin, Service1, Service2 {}
>
> IndexReader: public <T extends Plugin> void bindPlugin(Class<? super
> T> type, T instance);
>
> Now we can invoke bindPlugin like this:
> indexReader.bindPlugin(Service1.class, new Impl());
> indexReader.bindPlugin(Service2.class, new Impl());
>
> Nice points:
> It won't compile if Impl doesn't implement Plugin (which we need for
> attach/detach notifications).
> It won't compile if Impl doesn't implement the service interface we're
> binding it to.
> Service interface doesn't have to extend Plugin, so attach/detach
> methods that are a contract between a plugin and its host won't be
> visible to service consumers that will get an instance through:
> IndexReader: public <T> T getPlugin(Class<T> type);
> Service1 srv = indexReader.getPlugin(Service1.class);
>
> If you're passing all plugins as a bunch during creation, you have to
> use wither Map, or a List of Pairs.
> Map<Class, Plugin> plugins;
> new SegmentReader(somestuff, plugins);
> That way you're able to do less type checking than in previous example.

With the early binding approach, you wouldn't pass all plugins during
creation; you'd pass a factory object that exposes methods like:

  getPostingsComponent(SegmentInfo)
  getStoredFieldsComponent(SegmentInfo)
  getValueSourceComponent(SegmentInfo)

SegmentReader during creation would ask your factory for all the
components it needs, and hold them privately, as final.

We'd still have our strong type checking?  So, "strong type checking"
also seems non-differentiating wrt early vs late binding decision.

>>> 3. Plugins support attachment/detachment and they don't need to know
>>> exact reason they were attached/detached for - there should be no
>>> difference between closing a reader and removing plugin from it.
>> But what's the use case here?  Why would a SegmentReader need to
>> detach/attach a new XYZComponent?
> We already have it for free. A component should receive notification
> when SegmentReader is created. A component should also receive
> notification when SegmentReader is close()d. Nobody prevents us from
> using these notifications at a different point in time.

I understand we got something for free... it's that I don't see the
value of that "something", but I do see added API complexity.

>> And... why not simply open a new SegmentReader?  With this change,
>> opening a new reader can be an extremely low-cost operation since you
>> could share already opened XYZComponents.  A SegmentReader simply
>> becomes a set of components held together.
> I lean to post-construction attach/detach vs passing plugins in ctor
> not because I want to abuse it for inflight indexReader
> reconfiguration, it's just simpler to use with the way IndexReaders
> are created.

I guess net/net I still can't see what we gain by decoupling
instantiation of a SegmentReader from binding of its components (late
binding).  As best I can tell it adds complexity to the API without
gaining us anything (at least anything I understand so far).

But since you seem quite convinced of its worth, maybe code up a
simple demonstration (patch on Lucene's trunk) of what it buys us,
where early binding would not buy us that benefit, and then we can
iterate from there?

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message