lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: IndexReader plugins
Date Mon, 13 Apr 2009 10:45:09 GMT
I think this (truly componentizing SegmentReader) makes tons of sense.
 After all, a SegmentReader is just a bunch of separate components
handling different parts of the index.

This is really orthogonal to LUCENE-831 (the field cache is just one
component).  They can land in either order...

Earwin, do you want to take an initial stab (patch) at this?

It'll be interesting to see how the components API handles near
real-time search, because we want/expect components to be able to
merge themselves efficiently "in RAM" when possible.  E.g., if the field
cache already has certain fields loaded, they can be merged in RAM; if
not, they should be merged on disk.  If the field cache has pending
changes (in a future world when CSF makes it possible to suddenly
change, say, the price of certain documents), then the components must
properly implement clone (ideally incremental copy-on-write cloning).
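
To make the cloning requirement concrete, here is a minimal sketch of a
component that clones cheaply and copies its data only on the first write.
PriceColumn and its methods are hypothetical names, not part of Lucene, and
this copies the whole array on write; a truly incremental scheme would copy
only the touched blocks.

```java
import java.util.Arrays;

/** Hypothetical per-segment component holding one mutable column of values. */
final class PriceColumn {
    private long[] values;   // may be shared with clones until someone writes
    private boolean shared;  // true while another instance may see the same array

    PriceColumn(long[] values) {
        this.values = values;
        this.shared = false;
    }

    /** Cheap clone: both instances keep pointing at the same array. */
    @Override
    public PriceColumn clone() {
        PriceColumn copy = new PriceColumn(values);
        copy.shared = true;
        this.shared = true;
        return copy;
    }

    long get(int docId) {
        return values[docId];
    }

    /** Copies the array on the first write so other instances never see the change. */
    void set(int docId, long value) {
        if (shared) {
            values = Arrays.copyOf(values, values.length);
            shared = false;
        }
        values[docId] = value;
    }
}
```

A reopened reader would call clone() on the component and apply pending
changes to its own copy, leaving searches against the old reader untouched.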

Mike

On Sun, Apr 12, 2009 at 7:34 PM, Earwin Burrfoot <earwin@gmail.com> wrote:
> To support my dream of kicking fieldCache out of the core and to add
> some extensibility to Lucene, I want to introduce IndexReaderPlugins.
> Rough pseudocode follows:
>
> interface IndexReaderPlugin {
>        void attach(SegmentReader reader);
>        void detach(SegmentReader reader);
>
>        void attach(MultiSegmentReader reader);
>        void detach(MultiSegmentReader reader);
> }
>
> IndexReader.java:
> private Map<Class<?>, IndexReaderPlugin> plugins;
>
> On opening/closing a top-level/segment reader we iterate over the plugins:
> for (IndexReaderPlugin plugin : plugins.values())
>    plugin.attach(reader);
>
> The map is initially passed to the top-level reader and then shared with
> the low-level readers. We can also retrieve plugins:
> public <T> T plugin(Class<T> pluginType);
>
> then we can do something like:
> indexReader.plugin(ValueSource.class).doSomething // lucene code
> indexReader.plugin(FieldsCache.class).forField(LAST_UPDATE_TIME).doSomething // my code
> filter.apply(indexReader.plugin(FilterCache.class)) // my code
>
> Benefits are numerous. We get rid of alien code like:
> +++ src/java/org/apache/lucene/index/SegmentReader.java (working copy)
> @@ -83,6 +86,8 @@
> +  protected ValueSource valueSource;
> +
> @@ -555,6 +560,8 @@
> +
> +      valueSource = new CachingValueSource(this, new UninversionValueSource(this));
>
> If I don't need ValueSource attached to my readers, I won't have it.
> If I need my custom caches attached to my readers, I can do it in a
> natural way instead of hacking around MergeScheduler, or comparing
> subreader lists.
> If I want, I can replace Lucene's native ValueSource with my own
> implementation, and all Lucene classes that use it will happily accept
> it.
>
> On second thought, we shouldn't share plugin map across subreaders. If
> we allow attach(SegmentReader reader) to return an instance of plugin
> (plugin decides if it is the same instance always, or per-reader), and
> populate the map for subreader with results of attach invoked on
> toplevel reader map, we'll turn this code:
> segmentReader.plugin(SomeClass.class).segmentReaderDependentMethod(segmentReader);
> into:
> segmentReader.plugin(SomeClass.class).segmentReaderDependentMethod();
> which makes more sense.
>
> Anyway, the general idea is still the same.
>
> --
> Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
> Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
> ICQ: 104465785
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>
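
The revised proposal quoted above (attach returns the plugin instance that a
given reader should use) might look roughly like the sketch below. All names
here (SegmentReaderSketch, PluginFactory, DocCountCache) are hypothetical
stand-ins, not Lucene classes, and the sketch uses current Java syntax.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical top-level registration: produces the plugin a reader will use. */
interface PluginFactory<T> {
    // May return one shared instance or a fresh one per reader.
    T attach(SegmentReaderSketch reader);
}

/** Hypothetical stand-in for a segment-level reader. */
final class SegmentReaderSketch {
    private final String name;
    private final Map<Class<?>, Object> plugins = new HashMap<>();

    SegmentReaderSketch(String name, Map<Class<?>, PluginFactory<?>> factories) {
        this.name = name;
        // Populate this reader's own map with the results of attach().
        for (Map.Entry<Class<?>, PluginFactory<?>> e : factories.entrySet()) {
            plugins.put(e.getKey(), e.getValue().attach(this));
        }
    }

    String name() {
        return name;
    }

    /** Typed lookup; returns null if no plugin of that type is attached. */
    <T> T plugin(Class<T> type) {
        return type.cast(plugins.get(type));
    }
}

/** Example plugin bound to one reader, so its methods need no reader argument. */
final class DocCountCache {
    private final SegmentReaderSketch reader;

    DocCountCache(SegmentReaderSketch reader) {
        this.reader = reader;
    }

    String describe() {
        return "cache for " + reader.name();
    }
}
```

With a factory registered under DocCountCache.class, a call such as
segmentReader.plugin(DocCountCache.class).describe() no longer needs the
reader passed back in.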
