lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation
Date Sat, 11 Apr 2009 12:58:15 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698089#action_12698089 ]

Michael McCandless commented on LUCENE-831:
-------------------------------------------

bq. Iterate iterate iterate I suppose.

Hear, hear!  Ready, set, GO!

bq. We may need the multireader to do the full array for back compat.

Can't we just do the "make massive array & copy sub arrays in" step
inside the old FieldCache?  (And deprecate the old FieldCache
entirely.)
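
A minimal sketch of that "massive array" step, assuming the per-segment
values are already available as int[] arrays in segment order.
SegmentArrayMerger and mergeInts are hypothetical names for illustration,
not part of FieldCache or the patch:

```java
/** Hypothetical helper (not Lucene API): builds one top-level array from per-segment arrays. */
final class SegmentArrayMerger {
    private SegmentArrayMerger() {}

    /** Concatenates the per-segment arrays in order, so segment N starts at its doc base. */
    static int[] mergeInts(int[][] perSegment) {
        int total = 0;
        for (int[] segment : perSegment) {
            total += segment.length;
        }
        int[] merged = new int[total];
        int docBase = 0; // offset of the current segment's first document
        for (int[] segment : perSegment) {
            System.arraycopy(segment, 0, merged, docBase, segment.length);
            docBase += segment.length;
        }
        return merged;
    }
}
```

The copy doubles memory use for the field while both forms are alive,
which is one reason to confine it to the deprecated back-compat path.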

bq. I don't think any of the custom type stuff is set up to work yet.

How about we create a ValueSource abstract base class, that defines
abstract byte[] getBytes(IndexReader r, String field),
int[] getInts(IndexReader r, String field), etc.  (Just like
ExtendedFieldCache).

This is subclassed into things like UninversionValueSource (what
FieldCache does today) and CSFValueSource (in the future), both of
which take an IndexReader when created.

UninversionValueSource should provide basic ways to customize the
uninversion.  Hopefully, we can share more code than the current
FieldCacheImpl does (eg, a single "enum terms & term docs" loop that
switches out to a "handler" to deal with each term & doc, w/
subclasses that handle byte, int, etc.).
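
A rough sketch of the single-loop-plus-handler idea.  To stay
self-contained it models postings as a Map from term text to doc ids
instead of using Lucene's term enumeration; TermDocHandler, IntHandler,
ByteHandler and Uninverter are all hypothetical names:

```java
import java.util.Map;

/** Hypothetical callback invoked once per (term, document) pair during uninversion. */
interface TermDocHandler {
    void handle(String termText, int docId);
}

/** Parses each term as an int and stores it at the document's slot. */
final class IntHandler implements TermDocHandler {
    final int[] values;

    IntHandler(int maxDoc) {
        values = new int[maxDoc];
    }

    @Override
    public void handle(String termText, int docId) {
        values[docId] = Integer.parseInt(termText);
    }
}

/** Same idea for byte values; only the parsing and the array type differ. */
final class ByteHandler implements TermDocHandler {
    final byte[] values;

    ByteHandler(int maxDoc) {
        values = new byte[maxDoc];
    }

    @Override
    public void handle(String termText, int docId) {
        values[docId] = Byte.parseByte(termText);
    }
}

final class Uninverter {
    private Uninverter() {}

    /**
     * The one shared loop: walks every term and every document containing it.
     * The map (term text -> doc ids) is an assumption standing in for the index.
     */
    static void uninvert(Map<String, int[]> postings, TermDocHandler handler) {
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            for (int docId : entry.getValue()) {
                handler.handle(entry.getKey(), docId);
            }
        }
    }
}
```

With this shape, adding a new value type means writing one more handler
rather than another copy of the loop.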

And then I can also make MyFunkyValueSource (for extensibility) that
does whatever to produce the values.

Then we make CachingValueSource, that wraps any other ValueSource.

And finally expose a way in IndexReader to set its ValueSource when
you open it?  It would default to
CachingValueSource(UninversionValueSource()).  I think we should
require that you set this on opening the reader, and you can't later
change it.

This would mean a single CachingValueSource can be used for more than
one reader, which is good because IndexReader.open would send it down
to all SegmentReaders it opens.

This would then replace *CacheKey.
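
A minimal sketch of the decoupling described above, under stated
assumptions: a plain Object stands in for IndexReader, only getInts is
shown, and the WeakHashMap-based cache policy is an illustrative choice
rather than anything the patch prescribes:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

/** Stand-in for the proposed ValueSource; a plain Object replaces IndexReader here. */
abstract class ValueSource {
    abstract int[] getInts(Object reader, String field);
}

/** Decorator that caches the results of any other ValueSource per (reader, field). */
final class CachingValueSource extends ValueSource {
    private final ValueSource delegate;

    // Weak keys, so a reader's entries can be collected once the reader is unreachable.
    private final Map<Object, Map<String, int[]>> cache = new WeakHashMap<>();

    CachingValueSource(ValueSource delegate) {
        this.delegate = delegate;
    }

    @Override
    synchronized int[] getInts(Object reader, String field) {
        Map<String, int[]> perReader = cache.computeIfAbsent(reader, r -> new HashMap<>());
        // Value generation happens only on a cache miss, and only in the delegate.
        return perReader.computeIfAbsent(field, f -> delegate.getInts(reader, f));
    }
}
```

Because the cache is keyed on the reader, one CachingValueSource
instance can serve several segment readers, and value generation can be
swapped without touching the caching code.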

This approach is not that different from what we have today, but I
think there are important differences:

  * Decouple value generation (ValueSource) from caching

  * Tell IndexReader what its ValueSource is, so eg when you do
    sorting the sort pulls from your ValueSource and not a global
    default one.

  * Hopefully don't duplicate so much code (eg uninversion)

Other thoughts:

  * Presumably, at this point, the arrays returned by field cache
    should be considered readonly by the app, right?  So cloning
    a reader should simply make a shallow clone of the cache.  (Longer
    term, with CSF as the source, we think updating fields should be
    possible, so we'd need a copy-on-write solution, like we now do w/
    deleted docs).

  * Looks like some accidental regressions snuck in, eg in
    DirIndexReader:
{code}
-    final String[] files = dir.listAll();
+    final String[] files = dir.list();
{code}
    and in IndexReader:
{code}
protected IndexReader(Directory directory) {
     this();
-    this.directory = directory;
}
{code}

  * Do we even need ComparatorFactory*?  Seems like this patch
    shouldn't be in the business of creating comparators.

  * You should hit an UnsupportedOperationException (UOE) if you call
    getXXX() on a MultiReader

  * Shouldn't FieldCache be deprecated entirely?  I would think, going
    forward, I interact only w/ the IndexReader's default ValueSource?
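
A small sketch of the shallow-clone and copy-on-write idea from the
first bullet above.  CowInts is a hypothetical holder, not a Lucene
class, and it is not thread-safe:

```java
/** Hypothetical holder for cached values, illustrating shallow clone plus copy-on-write. */
final class CowInts {
    private int[] values;
    private boolean shared; // true while another holder may reference the same array

    CowInts(int[] values) {
        this.values = values;
    }

    int get(int docId) {
        return values[docId];
    }

    /** Shallow clone: both holders reference the same array until one of them writes. */
    CowInts shallowClone() {
        CowInts copy = new CowInts(values);
        copy.shared = true;
        this.shared = true;
        return copy;
    }

    /** Copies the array before the first write if it may still be shared. */
    void set(int docId, int value) {
        if (shared) {
            // Conservative: the other holder keeps its own flag, so it may also copy once.
            values = values.clone();
            shared = false;
        }
        values[docId] = value;
    }
}
```

While the arrays are treated as read-only, set() is never called and
cloning costs only a reference copy.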


> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff,
>                      fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff,
>                      LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch,
>                      LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Completely overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completely independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundant caching as client code
>     migrates to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


