lucene-dev mailing list archives

From Marvin Humphrey
Subject Re: IndexReader plugins
Date Tue, 14 Apr 2009 16:16:35 GMT
On Tue, Apr 14, 2009 at 03:52:39PM +0400, Earwin Burrfoot wrote:

> > With the early binding approach, you wouldn't pass all plugins during
> > creation; you'd pass a factory object that exposes methods like:
> >
> >  getPostingsComponent(SegmentInfo)
> >  getStoredFieldsComponent(SegmentInfo)
> >  getValueSourceComponent(SegmentInfo)
> That basically kills the whole idea.

Heh.  I can confirm that that approach turns out to be less flexible than we
might hope.  

In KinoSearch svn trunk, SegReader is now made up entirely of pluggable
components.  These components are loaded via factory methods from the
Architecture class.  It's very nice to be able to override
architecture.makeDocReader() and install your own custom class.  

However, the system doesn't provide a way to install new kinds of components
beyond that fixed set.  That causes problems for custom Query subclasses that
might rely on specialized data -- for example, an RTreeQuery which needs data
from an RTreeReader.

> My initial reason for adding index plugins was to support user-written
> components that have strong 1-1 binding with segments. 

I consider solving this problem crucial for Lucy.

Earwin/Kirill, I wasn't able to think of any way to pull this off except to
install components using a hash table and retrieve them by string identifier.
Can you think of any other options?
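
A minimal sketch of the hash-table approach described above, assuming a plain
string-keyed map.  ComponentRegistry, its register() and fetch() methods, and
the RTreeReader stub are all hypothetical names used for illustration; they
are not the KinoSearch or Lucy API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for the part of a segment reader that holds
// pluggable components, keyed by string identifier.
class ComponentRegistry {
    private final Map<String, Object> components = new HashMap<>();

    // Install a component under a name; refuse to silently overwrite one.
    void register(String name, Object component) {
        Objects.requireNonNull(component, "component");
        if (components.putIfAbsent(name, component) != null) {
            throw new IllegalStateException("already registered: " + name);
        }
    }

    // Look a component up by name; the cast is checked at lookup time.
    <T> T fetch(String name, Class<T> type) {
        Object component = components.get(name);
        if (component == null) {
            throw new IllegalArgumentException("no component named: " + name);
        }
        return type.cast(component);
    }
}

// Hypothetical custom component carrying specialized per-segment data.
class RTreeReader {
    final String segmentName;

    RTreeReader(String segmentName) {
        this.segmentName = segmentName;
    }
}
```

A custom query could then call fetch("RTreeReader", RTreeReader.class) on each
segment's registry.  The trade-off is that a misspelled or missing name only
fails at run time, not at compile time.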

Discussion on lucy-dev at...

> Filter caches, Query caches, Value caches, Sort caches, Clustering caches,
> whatever.  The same plugin system can also support lucene-internal
> components with similarily strong binding to segments/indexes.

Can you elaborate on that?

> If you introduce that factory to create components, you hardcode
> component types once again, and one can't add a new type of component
> without patching Lucene.

Not necessarily.  The list of fixed components can be augmented with an
auxiliary list.
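
One way to picture that, as a rough sketch only: a factory keeps its hardcoded
component types but also accepts extra makers at run time.  ComponentFactory,
addAuxiliary(), and makeComponents() are hypothetical names, and segments and
components are reduced to strings to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical factory: two fixed component types plus an auxiliary list.
class ComponentFactory {
    // One user-supplied entry: a name and a maker taking the segment name.
    private static final class Aux {
        final String name;
        final Function<String, Object> maker;

        Aux(String name, Function<String, Object> maker) {
            this.name = name;
            this.maker = maker;
        }
    }

    private final List<Aux> auxiliary = new ArrayList<>();

    // Fixed component types; placeholder strings stand in for real readers.
    Object makePostingsComponent(String segment) {
        return "postings:" + segment;
    }

    Object makeStoredFieldsComponent(String segment) {
        return "stored:" + segment;
    }

    // Add a new component type without changing the factory itself.
    void addAuxiliary(String name, Function<String, Object> maker) {
        auxiliary.add(new Aux(name, maker));
    }

    // Build every component for one segment: fixed ones first, then the
    // auxiliary ones in the order they were added.
    Map<String, Object> makeComponents(String segment) {
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("Postings", makePostingsComponent(segment));
        out.put("StoredFields", makeStoredFieldsComponent(segment));
        for (Aux aux : auxiliary) {
            out.put(aux.name, aux.maker.apply(segment));
        }
        return out;
    }
}
```

Note that the auxiliary entries still end up being retrieved by name, so this
amounts to the string-identifier lookup again for anything outside the fixed
set.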

However, in Lucy, I'm tempted to strip down the API for SegReader so that you
would almost always access data by grabbing a component first.  Keeping the
interface minimal makes supporting wildly disparate back ends more

> Also, I strongly believe components should receive a reader when binding.
> If they need segmentInfo - they should get it from reader, if they need
> anything else and it is private - there should be a getter for it.

Hmm.  This is how I've been thinking about having the factory methods work in
Lucy (translated to Java):

  public class MyArchitecture extends Architecture {
    public DataReader makeDocReader(Schema schema, Folder folder,
                                    Snapshot snapshot, Segment segment) {
      return new ZlibDocReader(schema, folder, snapshot, segment);
    }
    // ...
  }

Do you think something like this would work better?

  public class MyArchitecture extends Architecture {
    public void registerDocReader(SegReader reader) {
      ZlibDocReader docReader = new ZlibDocReader(reader.getSchema(),
        reader.getFolder(), reader.getSnapshot(), reader.getSegment());
      reader.register("DocReader", docReader);
    }
    // ...
  }

Marvin Humphrey
