lucene-dev mailing list archives

From Earwin Burrfoot
Subject Re: IndexReader plugins
Date Tue, 14 Apr 2009 16:51:20 GMT
>> > With the early binding approach, you wouldn't pass all plugins during
>> > creation; you'd pass a factory object that exposes methods like:
>> >
>> >  getPostingsComponent(SegmentInfo)
>> >  getStoredFieldsComponent(SegmentInfo)
>> >  getValueSourceComponent(SegmentInfo)
>> That basically kills the whole idea.
> Heh.  I can confirm that that approach turns out to be less flexible than we
> might hope.
Are you speaking of the getPostings/StoredFields/ValueSourceComponent
factory, or of my initial proposal?

>> My initial reason for adding index plugins was to support user-written
>> components that have strong 1-1 binding with segments.
> I consider solving this problem crucial for Lucy.
> Earwin/Kirill, I wasn't able to think of any way to pull this off except to
> install components using a hash table and retrieve them by string identifier.
> Can you think of any other options?
That is exactly what I am offering, except that I key by Class<?> and not
String. It's a little bit faster and allows a type-safe component
retrieval method.
I don't see any issues with this design, except some strange people
retrieving a component per-hit while searching. I do not pity them :)
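
A minimal sketch of what I mean by keying on Class<?>. The names
ComponentRegistry and FilterCache are hypothetical, made up for this
illustration; none of this is existing Lucene API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry, for illustration only; not an existing Lucene class.
final class ComponentRegistry {
    private final Map<Class<?>, Object> components = new HashMap<>();

    // Store a component under its class token; the generic signature
    // guarantees the value is an instance of the key type.
    <T> void register(Class<T> type, T component) {
        components.put(type, component);
    }

    // Type-safe lookup: Class.cast spares the caller an unchecked cast.
    // Returns null when nothing is registered for the given type.
    <T> T get(Class<T> type) {
        return type.cast(components.get(type));
    }
}

// Hypothetical user-written component bound to a single segment.
final class FilterCache {
    final String segmentName;

    FilterCache(String segmentName) {
        this.segmentName = segmentName;
    }
}
```

The intended usage is to call get(FilterCache.class) once per segment and
hold on to the result, rather than doing a map lookup for every hit.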

>> Filter caches, Query caches, Value caches, Sort caches, Clustering caches,
>> whatever.  The same plugin system can also support lucene-internal
>> components with similarly strong binding to segments/indexes.
> Can you elaborate on that?
What exactly do you want to hear? A description of each component I mentioned?

>> If you introduce that factory to create components, you hardcode
>> component types once again, and one can't add a new type of component
>> without patching Lucene.
> Not necessarily.  The list of fixed components can be augmented with an
> auxiliary list.
If you have a generic API that works well enough, what is the point of
making partial specializations? The API should be minimal.

> However, in Lucy, I'm tempted to strip down the API for SegReader so that you
> would almost always access data by grabbing a component first.  Keeping the
> interface minimal makes supporting wildly disparate back ends more
> straightforward.
Yup! You read my thoughts.

>> Also, I strongly believe components should receive a reader when binding.
>> If they need segmentInfo - they should get it from reader, if they need
>> anything else and it is private - there should be a getter for it.
> Hmm.  This is how I've been thinking about having the factory methods work in
> Lucy (translated to Java):
>  public class MyArchitecture extends Architecture {
>    public DataReader makeDocReader(Schema schema, Folder folder,
>                                    Snapshot snapshot, Segment segment) {
>      return new ZlibDocReader(schema, folder, snapshot, segment);
>    }
>    // ...
>  }
> Do you think something like this would work better?
>  public class MyArchitecture extends Architecture {
>    public void registerDocReader(SegReader reader) {
>      ZlibDocReader docReader = new ZlibDocReader(reader.getSchema(),
>        reader.getFolder(), reader.getSnapshot(), reader.getSegment());
>      reader.register("DocReader", docReader);
>    }
>    // ...
>  }
Absolutely. Your next DocReader implementation will most probably need
something off SegReader that you forgot to include here ->
makeDocReader(Schema schema, Folder folder, Snapshot snapshot, Segment segment).
What's more, some other (non-DocReader) component will require its own
totally different stuff, or maybe it will need only create/destroy
notifications and no data.
I'm trying to build a generic API that will also be partially immune
to Lucene's dreaded "We no longer need this stuff, let's deprecate it
and waste time, manpower and API clarity to support it for the three
upcoming years".
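
To illustrate the binding point above, here is a rough sketch in which the
reader is the only thing a component receives. SegReader here is a
stripped-down stand-in, and ReaderPlugin, SegmentNameCache and
LifecycleLogger are hypothetical names invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a segment reader; the real one exposes far more.
final class SegReader {
    private final String segmentName;

    SegReader(String segmentName) {
        this.segmentName = segmentName;
    }

    // Components pull whatever they need through getters like this one.
    String getSegmentName() {
        return segmentName;
    }
}

// Hypothetical plugin contract: the reader is the only argument, so the
// signature does not change when a new component needs different data.
interface ReaderPlugin {
    void bind(SegReader reader);
    void unbind(SegReader reader);
}

// A component that needs data takes it from the reader at bind time.
final class SegmentNameCache implements ReaderPlugin {
    String cachedName;

    public void bind(SegReader reader) {
        cachedName = reader.getSegmentName();
    }

    public void unbind(SegReader reader) {
        cachedName = null;
    }
}

// A component that needs no data, only create/destroy notifications.
final class LifecycleLogger implements ReaderPlugin {
    final List<String> events = new ArrayList<>();

    public void bind(SegReader reader) {
        events.add("open");
    }

    public void unbind(SegReader reader) {
        events.add("close");
    }
}
```

Both components share one signature. If a later component needs something
else, the change is a new getter on the reader, not a new factory method.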

Kirill Zakharenko/Кирилл Захаренко
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785
