lucene-dev mailing list archives

From "Mark Miller (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation
Date Thu, 16 Apr 2009 21:36:14 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699880#action_12699880
] 

Mark Miller commented on LUCENE-831:
------------------------------------

Okay, now that I halfway understand this issue, I think I have to go back to the basic motivations.
The original big win was taken away by LUCENE-1483, so let's see if we really need a new API for the
wins we have left.

h3. Advantages of new API (roughly as it is in the patch)
* FieldCache is an interface and it would be nice to move to an abstract class; ExtendedFieldCache
is ugly.
* Avoid the global sync, keyed by IndexReader, needed to access the cache.
* It's easier/cleaner to block caching by multireaders (though I am almost thinking I would prefer
warnings/advice about performance and encouragement to move to per segment).
* It becomes easier to share a ValueSource instance across readers.
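
To make the second and last points concrete, here is a minimal sketch, not the code in the patch. Every name in it (SketchValueSource, SketchSegmentReader, CountingUninverter, PerReaderCacheDemo) is a hypothetical stand-in rather than a Lucene class. It assumes each segment reader owns its own cache, so a lookup locks only that reader, while one uninverting source is shared by several readers.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical abstract base class (not Lucene's ValueSource). An abstract
// class can grow new methods with default bodies without breaking subclasses.
abstract class SketchValueSource {
    abstract int[] getInts(SketchSegmentReader reader, String field);
}

// Hypothetical segment reader that owns its cache, so lookups lock only
// this reader instead of a global static map shared by all readers.
final class SketchSegmentReader {
    private final Map<String, int[]> fieldData; // stand-in for indexed terms
    private final SketchValueSource source;
    private final Map<String, int[]> cache = new HashMap<String, int[]>();

    SketchSegmentReader(Map<String, int[]> fieldData, SketchValueSource source) {
        this.fieldData = fieldData;
        this.source = source;
    }

    int[] rawField(String field) {
        return fieldData.get(field);
    }

    synchronized int[] getInts(String field) {
        int[] values = cache.get(field);
        if (values == null) {
            values = source.getInts(this, field);
            cache.put(field, values);
        }
        return values;
    }
}

// Holds no per-reader state, so one instance can serve many readers.
// The counter exists only for the demo and is not thread safe.
final class CountingUninverter extends SketchValueSource {
    int loads = 0;

    @Override
    int[] getInts(SketchSegmentReader reader, String field) {
        loads++;
        return reader.rawField(field).clone(); // copy stands in for uninversion
    }
}

final class PerReaderCacheDemo {
    static SketchSegmentReader reader(int[] prices, SketchValueSource source) {
        Map<String, int[]> data = new HashMap<String, int[]>();
        data.put("price", prices);
        return new SketchSegmentReader(data, source);
    }

    public static void main(String[] args) {
        CountingUninverter shared = new CountingUninverter();
        SketchSegmentReader seg0 = reader(new int[] {3, 1, 2}, shared);
        SketchSegmentReader seg1 = reader(new int[] {9, 8}, shared);
        seg0.getInts("price");
        seg0.getInts("price"); // served from seg0's own cache
        seg1.getInts("price");
        System.out.println("uninversions: " + shared.loads);
    }
}
```

In this sketch two independent readers never contend on a common lock, which is the property the global static map cannot offer.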

h3. Disadvantages of new API
If we want only SegmentReaders to have a ValueSource, you can't efficiently back the old API
with the new one, which causes a jump in RAM requirements if you straddle the two APIs and ask
for the same array data from each.
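
A rough illustration of that RAM jump, assuming the old API keeps one array spanning the whole index while the new one keeps one array per segment. StraddleSketch and its methods are hypothetical names made up for this example, and plain array copies stand in for uninversion.

```java
import java.util.ArrayList;
import java.util.List;

final class StraddleSketch {
    // New-style: one array per segment (the copy stands in for uninversion).
    static List<int[]> perSegmentArrays(List<int[]> segments) {
        List<int[]> arrays = new ArrayList<int[]>();
        for (int[] segment : segments) {
            arrays.add(segment.clone());
        }
        return arrays;
    }

    // Old-style: a single array covering every document in the index.
    static int[] topLevelArray(List<int[]> segments) {
        int total = 0;
        for (int[] segment : segments) {
            total += segment.length;
        }
        int[] all = new int[total];
        int offset = 0;
        for (int[] segment : segments) {
            System.arraycopy(segment, 0, all, offset, segment.length);
            offset += segment.length;
        }
        return all;
    }

    // Counts the int slots held by a set of cached arrays.
    static long intsHeld(List<int[]> arrays) {
        long held = 0;
        for (int[] array : arrays) {
            held += array.length;
        }
        return held;
    }
}
```

Code that asks both APIs for the same field ends up holding the per-segment arrays and the top-level array at once, so the slot count is twice the number of documents.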

For someone with a custom Parser, it's probably a higher barrier to implement a ValueSource
(presumably one that works per field) and init a Reader with it than to simply pass the Parser
on a SortField. However, Parser stops making sense if we end up being able to back ValueSource
with column stride fields. We could allow a ValueSource to be passed on the SortField (the current
incarnation of this patch), but then you have to go back to a global cache, keyed by reader, for
the ValueSources passed that way (you would also still have the settable ValueSource on each
segment reader).
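
A sketch of what that global cache might look like, assuming the override source is bound to a single field. OverrideSource, SortFieldSketch and GlobalOverrideCache are hypothetical names, not the classes in the patch, and the reader is just an Object here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical per-field source supplied by the user on a sort field.
interface OverrideSource {
    int[] load(Object reader);
}

// Hypothetical sort field carrying an optional source override.
final class SortFieldSketch {
    final String field;
    final OverrideSource source;

    SortFieldSketch(String field, OverrideSource source) {
        this.field = field;
        this.source = source;
    }
}

final class GlobalOverrideCache {
    // reader -> (source -> values). Weak keys let an entry be dropped once
    // its reader is no longer referenced elsewhere.
    private static final Map<Object, Map<OverrideSource, int[]>> CACHE =
        new WeakHashMap<Object, Map<OverrideSource, int[]>>();

    // The single static lock is the global sync point that a purely
    // per-reader design would avoid.
    static synchronized int[] get(Object reader, SortFieldSketch sortField) {
        Map<OverrideSource, int[]> perReader = CACHE.get(reader);
        if (perReader == null) {
            perReader = new HashMap<OverrideSource, int[]>();
            CACHE.put(reader, perReader);
        }
        int[] values = perReader.get(sortField.source);
        if (values == null) {
            values = sortField.source.load(reader);
            perReader.put(sortField.source, values);
        }
        return values;
    }
}
```

Because the source arrives with the query rather than living on the reader, the only place left to remember its values is a map like this one, which brings back the shared lock.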

h3. Advantages of staying with old API
Avoid forcing a large migration on users, with possible RAM requirement penalties if they don't
switch from deprecated code (we are doing something similar with LUCENE-1483 even without
deprecated code, though: if you were using an external multireader FieldCache that matched a sort
FieldCache key, you'd double your RAM requirements).

h3. Thoughts
If we stayed with the old API, we could still allow a custom FieldCache to be supplied. We
could still back FieldCacheImpl with Uninverter to reduce code. We could still have CachingFieldCache.
Though CachingValueSource is much better :) FieldCache implies caching, and so the name would
be confusing. We could also avoid CachingFieldCache though, as just making a pluggable FieldCache
would allow alternate caching implementations (with a bit more effort).

We could deprecate the Parser methods and force supplying a new FieldCache impl for custom
uninversion to get to an API suitable to be backed by CSF.

Or:

We could also move to ValueSource, but allow a ValueSource on multi-readers. That would probably
make straddling the APIs much more feasible (and efficient) in the default case. We could
advise that it's best to work per segment, but leave the option to the user.
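
As a sketch of why this helps, assume the old-style lookup simply delegates to the cache owned by the top-level reader. MultiReaderSketch and LegacyFacadeSketch are hypothetical names used only for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical top-level (multi-segment) reader that is allowed to own a
// ValueSource-style cache of its own.
final class MultiReaderSketch {
    private final Map<String, int[]> fieldData; // stand-in for indexed terms
    private final Map<String, int[]> cache = new HashMap<String, int[]>();
    int uninversions = 0; // demo counter only

    MultiReaderSketch(Map<String, int[]> fieldData) {
        this.fieldData = fieldData;
    }

    synchronized int[] getInts(String field) {
        int[] values = cache.get(field);
        if (values == null) {
            values = fieldData.get(field).clone(); // copy stands in for uninversion
            uninversions++;
            cache.put(field, values);
        }
        return values;
    }
}

// Hypothetical facade with the shape of the old lookup: it holds no cache
// and hands back whatever the reader already holds.
final class LegacyFacadeSketch {
    static int[] getInts(MultiReaderSketch reader, String field) {
        return reader.getInts(field);
    }
}
```

Old and new callers then receive the same array instance, so asking through both APIs does not duplicate the data.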

h3. Conclusion
I am not sure. I thought I was convinced we might as well not even move from FieldCache at
all, but now that I've written a bit out, I'm thinking it would be worth going to ValueSource.
I'm just not positive on what we should support. SortField ValueSource override keyed by reader?
ValueSources on MultiReaders?

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff,
fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff,
LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch,
LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch,
LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Complete overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completely independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundant caching as client code
>     migrates to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

