lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <>
Subject [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation
Date Sat, 11 Apr 2009 15:33:14 GMT


Uwe Schindler commented on LUCENE-831:

How about we create a ValueSource abstract base class, that defines
abstract byte[] getBytes(IndexReader r, String field),
int[] getInts(IndexReader r, String field), etc. (Just like

This is subclassed to things like UninversionValueSource (what
FieldCache does today), CSFValueSource (in the future) both of which
take an IndexReader when created.

UninversionValueSource should provide basic ways to customize the
uninversion. Hopefully, we can share more code than the current
FieldCacheImpl does (eg, a single "enum terms & terms docs" loop that
switches out to a "handler" to deal with each term & doc, w/
subclasses that handle byte, int, etc.).
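
A minimal sketch of the shared "enum terms & term docs" loop with a pluggable handler, in plain Java without Lucene types. The inverted index is modelled as a map from term text to doc IDs, and the names UninversionSketch, TermHandler, uninvert and IntHandler are all hypothetical; nothing here is FieldCacheImpl's actual code.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: postings are modelled as term -> doc IDs, standing in for
// Lucene's term and term-docs enumeration. All names are hypothetical.
final class UninversionSketch {

    /** Receives every (term, doc) pair visited by the shared loop. */
    interface TermHandler {
        void handle(String term, int doc);
    }

    /** The single shared loop: enumerate terms, then the docs of each term. */
    static void uninvert(Map<String, int[]> postings, TermHandler handler) {
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            for (int doc : entry.getValue()) {
                handler.handle(entry.getKey(), doc);
            }
        }
    }

    /** Type-specific handler that fills an int[] indexed by doc ID. */
    static final class IntHandler implements TermHandler {
        final int[] values;

        IntHandler(int maxDoc) {
            values = new int[maxDoc];
        }

        @Override
        public void handle(String term, int doc) {
            values[doc] = Integer.parseInt(term);
        }
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new TreeMap<>();
        postings.put("7", new int[] {0, 2});
        postings.put("42", new int[] {1});

        IntHandler ints = new IntHandler(3);
        uninvert(postings, ints);
        System.out.println(Arrays.toString(ints.values));
    }
}
```

A byte or float variant would only need another TermHandler; the loop itself stays unchanged.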

And then I can also make MyFunkyValueSource (for extensibility) that
does whatever to produce the values.

Then we make CachingValueSource, that wraps any other ValueSource.
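
The wrapping idea could look roughly like the sketch below. It assumes a simplified ValueSource that is already bound to its reader and exposes only getInts(String field); CountingSource is a made-up stand-in used to show that the wrapped source is consulted once per field.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: a simplified ValueSource, not the API proposed in the issue.
final class CachingSketch {

    abstract static class ValueSource {
        abstract int[] getInts(String field);
    }

    /** Hypothetical raw source that counts how often it is asked to load. */
    static final class CountingSource extends ValueSource {
        int loads = 0;

        @Override
        int[] getInts(String field) {
            loads++;
            return new int[] {field.length()};
        }
    }

    /** Wraps any ValueSource and remembers the array produced per field. */
    static final class CachingValueSource extends ValueSource {
        private final ValueSource delegate;
        private final Map<String, int[]> cache = new ConcurrentHashMap<>();

        CachingValueSource(ValueSource delegate) {
            this.delegate = delegate;
        }

        @Override
        int[] getInts(String field) {
            // Load through the delegate on the first request only.
            return cache.computeIfAbsent(field, delegate::getInts);
        }
    }

    public static void main(String[] args) {
        CountingSource raw = new CountingSource();
        ValueSource cached = new CachingValueSource(raw);
        cached.getInts("price");
        cached.getInts("price");
        System.out.println("loads = " + raw.loads);
    }
}
```

Because the cache lives inside the wrapper instance rather than in a global static map, two independent readers would not contend on the same lock.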

And finally expose a way in IndexReader to set its ValueSource when
you open it? It would default to
CachingValueSource(UninversionValueSource()). I think we should
require that you set this on opening the reader, and you can't later
change it.
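
One way to picture "set on open, never changed" is a final field assigned by a factory method. In this sketch Reader is a hypothetical stand-in for IndexReader, and the open(...) overloads are assumptions for illustration, not existing Lucene signatures.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: Reader stands in for IndexReader and is not Lucene's API.
final class ReaderSketch {

    interface ValueSource {
        int[] getInts(String field);
    }

    /** Stand-in for uninversion: builds a fresh array on every call. */
    static final class UninversionValueSource implements ValueSource {
        @Override
        public int[] getInts(String field) {
            return new int[] {field.length()};
        }
    }

    static final class CachingValueSource implements ValueSource {
        private final ValueSource delegate;
        private final Map<String, int[]> cache = new ConcurrentHashMap<>();

        CachingValueSource(ValueSource delegate) {
            this.delegate = delegate;
        }

        @Override
        public int[] getInts(String field) {
            return cache.computeIfAbsent(field, delegate::getInts);
        }
    }

    static final class Reader {
        // final and without a setter: fixed for the lifetime of the reader
        private final ValueSource valueSource;

        private Reader(ValueSource valueSource) {
            this.valueSource = valueSource;
        }

        /** Opens with the default: caching on top of uninversion. */
        static Reader open() {
            return open(new CachingValueSource(new UninversionValueSource()));
        }

        /** Opens with a caller-supplied source. */
        static Reader open(ValueSource valueSource) {
            return new Reader(Objects.requireNonNull(valueSource));
        }

        ValueSource getValueSource() {
            return valueSource;
        }
    }

    public static void main(String[] args) {
        Reader reader = Reader.open();
        System.out.println(reader.getValueSource().getClass().getSimpleName());
    }
}
```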

I like this idea, but I am a little concerned about having only one ValueSource per reader.
That makes it hard to plug in different sources for different field types.

E.g., someone has a CSF field, a TrieField, and several normal int/float fields. Each of these
fields needs a different ValueSource: the CSF field can be loaded from payloads, the TrieField
by decoding the prefix-encoded values, and the others the way it is done now.

So the IndexReader's ValueSource should be a Map of FieldValueSources, so that the user can register
FieldValueSources for different field types.
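
A rough sketch of such a registry, assuming it is keyed by field name and falls back to a default source for unregistered fields. FieldValueSource is reduced to a single getInts method here, and the lambdas in main merely mark which source answered; none of this is an existing Lucene interface.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Sketch only: a per-field registry in front of several value sources.
final class PerFieldSketch {

    interface FieldValueSource {
        int[] getInts(String field);
    }

    /** Dispatches each field to its registered source, else to a fallback. */
    static final class PerFieldValueSource implements FieldValueSource {
        private final Map<String, FieldValueSource> byField = new HashMap<>();
        private final FieldValueSource fallback;

        PerFieldValueSource(FieldValueSource fallback) {
            this.fallback = fallback;
        }

        PerFieldValueSource register(String field, FieldValueSource source) {
            byField.put(field, source);
            return this;
        }

        @Override
        public int[] getInts(String field) {
            return byField.getOrDefault(field, fallback).getInts(field);
        }
    }

    public static void main(String[] args) {
        // Marker arrays only show which hypothetical source handled the field.
        FieldValueSource uninversion = field -> new int[] {0};
        FieldValueSource payloads = field -> new int[] {1};
        FieldValueSource trieDecoder = field -> new int[] {2};

        PerFieldValueSource sources = new PerFieldValueSource(uninversion)
                .register("csfField", payloads)
                .register("trieField", trieDecoder);

        System.out.println(Arrays.toString(sources.getInts("csfField")));
        System.out.println(Arrays.toString(sources.getInts("plainInt")));
    }
}
```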

The idea of setting the ValueSource on the IndexReader is nice: we could then simply remove
the extra SortField constructors I added in LUCENE-1478, because the type used for sorting
could be specified when creating the IndexReader. Search code would then simply say "sort
by fields a, b, c" without knowing what type each field is. The sort code would get the type
and the arrays from the underlying IndexReaders.
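
To illustrate sorting by field name only, the sketch below lets a hypothetical Reader hand out a per-field comparator over doc IDs, so the calling code never mentions a field type. The methods registerInts, registerStrings and docComparator are invented for this example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Sketch only: the reader knows each field's type; the sort code does not.
final class SortSketch {

    static final class Reader {
        private final Map<String, Comparator<Integer>> byField = new HashMap<>();

        Reader registerInts(String field, int[] values) {
            byField.put(field, Comparator.comparingInt((Integer doc) -> values[doc]));
            return this;
        }

        Reader registerStrings(String field, String[] values) {
            byField.put(field, Comparator.comparing((Integer doc) -> values[doc]));
            return this;
        }

        Comparator<Integer> docComparator(String field) {
            return byField.get(field);
        }
    }

    /** Sorts doc IDs 0..maxDoc-1 by the named fields, in order of priority. */
    static Integer[] sortDocs(Reader reader, int maxDoc, String... fields) {
        Comparator<Integer> order = (a, b) -> 0;
        for (String field : fields) {
            order = order.thenComparing(reader.docComparator(field));
        }
        Integer[] docs = new Integer[maxDoc];
        for (int i = 0; i < maxDoc; i++) {
            docs[i] = i;
        }
        Arrays.sort(docs, order);
        return docs;
    }

    public static void main(String[] args) {
        Reader reader = new Reader()
                .registerInts("a", new int[] {2, 1, 2})
                .registerStrings("b", new String[] {"x", "y", "a"});
        System.out.println(Arrays.toString(sortDocs(reader, 3, "a", "b")));
    }
}
```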

The same applies to the current function query value sources (just a question: would the function
query value sources then be obsolete, so that they could be merged with the "new" ValueSource?).

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>                 Key: LUCENE-831
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>         Attachments:, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff,
> fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff,
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch,
> LUCENE-831.patch, LUCENE-831.patch
> Motivation:
> 1) Completely overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completely independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundant caching as client code
>     migrates to new API.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
