lucene-dev mailing list archives

From "Earwin Burrfoot (JIRA)" <>
Subject [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation
Date Sun, 12 Apr 2009 11:48:14 GMT


Earwin Burrfoot commented on LUCENE-831:

I'm using a similar approach.

There's a FieldType that governs conversions from a Java type into Lucene strings and declares
the 'abilities' of that type. For example, the conversion may be order-preserving (all numerics plus some others),
or the converted values may be meaningfully prefix-searched (like TreeId, which is essentially an
int[] used to represent things like nested category trees). Some types can also declare themselves
as derivatives of others, like DateType being derived from LongType.
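A minimal Java sketch of such a FieldType, assuming hypothetical names (FieldType, Ability, LongType and DateType as written here are illustrative, not the author's code or a Lucene API). LongType shows one way to get an order-preserving conversion: flip the sign bit and write fixed-width hex, so string order matches numeric order. DateType reuses LongType's encoding via epoch milliseconds and inherits its abilities.

```java
import java.util.Date;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical capabilities a field type can declare.
enum Ability { ORDER_PRESERVING, PREFIX_SEARCHABLE }

// Converts a Java value of type T to and from the String form kept in the index.
interface FieldType<T> {
    String toIndexed(T value);
    T fromIndexed(String indexed);
    Set<Ability> abilities();
}

// Flips the sign bit and writes 16 hex digits, so that lexicographic order
// of the strings matches numeric order of the longs.
class LongType implements FieldType<Long> {
    public String toIndexed(Long value) {
        return String.format("%016x", value ^ Long.MIN_VALUE);
    }

    public Long fromIndexed(String indexed) {
        return Long.parseUnsignedLong(indexed, 16) ^ Long.MIN_VALUE;
    }

    public Set<Ability> abilities() {
        return EnumSet.of(Ability.ORDER_PRESERVING);
    }
}

// Derived type: delegates to LongType by converting Date <-> epoch millis.
class DateType implements FieldType<Date> {
    private final LongType base = new LongType();

    public String toIndexed(Date value) {
        return base.toIndexed(value.getTime());
    }

    public Date fromIndexed(String indexed) {
        return new Date(base.fromIndexed(indexed));
    }

    public Set<Ability> abilities() {
        return base.abilities(); // inherits order preservation from LongType
    }
}
```

Deriving by delegation keeps the "derived from" relationship explicit without tying DateType to LongType's class hierarchy.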

Then there's a FieldInfo that defines the field name, the FieldType used for it, and the actions we're
going to take on the field: e.g. whether we want to sort on it, build clusters with certain characteristics,
load values of this field for each found document, use fast range filters, store/filter on the
field being null/not-null, apply transforms to the field before storing/searching, copy the value
of the field to another field (possibly with a transformation) when indexing, etc. From the FieldType
and the desired actions, FieldInfo is able to deduce tokenize/index/store/cache behaviour, and
can tell that additional Lucene fields are required (e.g. for handling null/not-null searches,
trie ranges, or a special sort form).
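A minimal sketch of how a FieldInfo might deduce Lucene-level behaviour from its requested actions. The Action enum, the deduction rules and the "$isset"/"$trie" suffixes for auxiliary fields are all assumptions for illustration; the original comment does not spell them out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical actions a schema may request for a field.
enum Action { SORT, LOAD, FILTER_NULL, FAST_RANGE }

// Deduces indexing flags and auxiliary field names from the requested actions.
final class FieldInfo {
    final String name;
    final Set<Action> actions = EnumSet.noneOf(Action.class);

    FieldInfo(String name, Action... requested) {
        this.name = name;
        Collections.addAll(actions, requested);
    }

    // Loading a value for each hit requires it to be stored (or cached).
    boolean stored() {
        return actions.contains(Action.LOAD);
    }

    // Sorting and range filtering want a single untokenized term per document.
    boolean tokenized() {
        return !(actions.contains(Action.SORT) || actions.contains(Action.FAST_RANGE));
    }

    // Extra Lucene fields this logical field needs (naming scheme is assumed).
    List<String> auxiliaryFields() {
        List<String> extra = new ArrayList<>();
        if (actions.contains(Action.FILTER_NULL)) extra.add(name + "$isset");
        if (actions.contains(Action.FAST_RANGE)) extra.add(name + "$trie");
        return extra;
    }
}
```

For instance, new FieldInfo("salary", Action.SORT, Action.FILTER_NULL) would be indexed untokenized and unstored, with one extra "salary$isset" field for null/not-null filtering.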

Then there's an interface that contains FieldInfo constants plus a special constant, FieldEnum
FIELDS = fieldsOf(ResumeFields.class), which is essentially a navigable list of all FieldInfos
defined in this interface and the interfaces it extends (this lets me have CommonFields, with ResumeFields
extends CommonFields and VacancyFields extends CommonFields).
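One way fieldsOf could be written, sketched under assumptions: FieldInfo here is a stripped-down hypothetical stand-in holding only a name, and the "navigable list" is modelled as a name-sorted map. It relies on the fact that Class.getFields() on an interface also returns the constants of all its superinterfaces.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

// Stripped-down stand-in for a field definition (hypothetical).
final class FieldInfo {
    final String name;
    FieldInfo(String name) { this.name = name; }
}

// Collects every FieldInfo constant of an interface and its superinterfaces.
final class FieldEnum {
    private final Map<String, FieldInfo> byName = new TreeMap<>();

    static FieldEnum fieldsOf(Class<?> schema) {
        FieldEnum result = new FieldEnum();
        // For an interface, getFields() includes constants inherited from superinterfaces.
        for (Field f : schema.getFields()) {
            if (Modifier.isStatic(f.getModifiers()) && FieldInfo.class.isAssignableFrom(f.getType())) {
                try {
                    FieldInfo info = (FieldInfo) f.get(null);
                    result.byName.put(info.name, info);
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return result;
    }

    FieldInfo get(String name) { return byName.get(name); }
    int size() { return byName.size(); }
}

interface CommonFields {
    FieldInfo ID = new FieldInfo("id");
}

interface ResumeFields extends CommonFields {
    FieldInfo SALARY = new FieldInfo("salary");
    // Declared last so SALARY is already initialized when fieldsOf reads it.
    FieldEnum FIELDS = FieldEnum.fieldsOf(ResumeFields.class);
}
```

Declaration order matters: interface constants initialize top to bottom, so FIELDS has to come after the FieldInfo constants it collects.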

FieldType, and consequently FieldInfo, is type-parameterized with the Java type associated
with the field, so you get the benefit of type safety when storing/loading/searching the field.
All Filters/Queries/Sorters/Loaders/Document accept a FieldInfo instead of a String for the field
name, so, for example, Filters.Range(field, fromValue, fromInclusive, toValue, toInclusive)
knows whether to use a simple range filter or a trie one, ensures the from/to values are of the proper
type, and converts them properly. Filters.IsSet(field) can consult an additional field created
during indexing, or access a FieldCache. DocLoader will get a value for the field either
from the index or from the cache. Etc., etc., etc.
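A sketch of the dispatch a typed range factory could perform, assuming hypothetical TypedField, RangePlan and RangeKind types. To stay self-contained it builds a plan object instead of calling real Lucene filter classes, and the encoder function stands in for the FieldType conversion.

```java
import java.util.function.Function;

// Hypothetical typed field: name, Java type, encoder, and whether a trie copy was indexed.
final class TypedField<T extends Comparable<T>> {
    final String name;
    final Class<T> type;
    final boolean trieIndexed;
    final Function<T, String> encode; // stands in for the FieldType conversion

    TypedField(String name, Class<T> type, boolean trieIndexed, Function<T, String> encode) {
        this.name = name;
        this.type = type;
        this.trieIndexed = trieIndexed;
        this.encode = encode;
    }
}

enum RangeKind { TERM_RANGE, TRIE_RANGE }

// Describes which Lucene field and filter style a range would use.
final class RangePlan {
    final String luceneField;
    final RangeKind kind;
    final String from, to; // null means open-ended
    final boolean fromInclusive, toInclusive;

    RangePlan(String luceneField, RangeKind kind, String from, boolean fromInclusive,
              String to, boolean toInclusive) {
        this.luceneField = luceneField;
        this.kind = kind;
        this.from = from;
        this.fromInclusive = fromInclusive;
        this.to = to;
        this.toInclusive = toInclusive;
    }
}

final class Filters {
    static <T extends Comparable<T>> RangePlan range(TypedField<T> field, T from, boolean fromInclusive,
                                                      T to, boolean toInclusive) {
        // Generics check from/to at compile time; cast() also catches raw or unchecked callers.
        T lo = field.type.cast(from);
        T hi = field.type.cast(to);
        if (lo != null && hi != null && lo.compareTo(hi) > 0) {
            throw new IllegalArgumentException("from > to");
        }
        // Use the auxiliary trie field when one was indexed, else a plain term range.
        RangeKind kind = field.trieIndexed ? RangeKind.TRIE_RANGE : RangeKind.TERM_RANGE;
        String target = field.trieIndexed ? field.name + "$trie" : field.name;
        return new RangePlan(target, kind,
                lo == null ? null : field.encode.apply(lo), fromInclusive,
                hi == null ? null : field.encode.apply(hi), toInclusive);
    }
}
```

The caller never names the trie field or picks the filter type; both follow from what the field's schema entry says was indexed.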

While I like the resulting schema style very much, I don't want to see the likes of it within
Lucene core. Better to have some contrib/extension/whatever that builds on core-defined primitives.
That way, if someone needs to build their own, somewhat divergent schema, they can easily do it instead
of trying to fit theirs over Lucene's. For the very same reason I'd like to see field caches
moved out of the core, built on the same in-core IndexReader segment creation/deletion/whatever
hooks that users will use to build their extensions.
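A sketch of what a field cache living outside the core might look like if the core exposed segment lifecycle hooks. SegmentListener and PerSegmentCache are hypothetical names; Lucene at the time had no such listener interface, so this only illustrates the shape of the proposal.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical core hook: the reader notifies listeners as segments appear and go away.
interface SegmentListener {
    void segmentOpened(String segmentName);
    void segmentClosed(String segmentName);
}

// Extension-side cache built only on that hook, keyed per segment.
final class PerSegmentCache<V> implements SegmentListener {
    private final Map<String, V> values = new ConcurrentHashMap<>();
    private final Function<String, V> loader;

    PerSegmentCache(Function<String, V> loader) {
        this.loader = loader;
    }

    // Warm eagerly when a segment appears.
    public void segmentOpened(String segmentName) {
        values.computeIfAbsent(segmentName, loader);
    }

    // Drop the entry when the segment is gone, so nothing leaks.
    public void segmentClosed(String segmentName) {
        values.remove(segmentName);
    }

    V get(String segmentName) {
        return values.computeIfAbsent(segmentName, loader);
    }

    int size() {
        return values.size();
    }
}
```

Because entries are per segment, a reopened reader that shares most segments with its predecessor only loads the new ones.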

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>                 Key: LUCENE-831
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>         Attachments:, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff,
fieldcache-overhaul.diff, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, LUCENE-831.03.31.2008.diff,
LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch,
LUCENE-831.patch, LUCENE-831.patch
> Motivation:
> 1) Completely overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completely independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundant caching as client code
>     migrates to new API.
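Points (a) and (d) above can be illustrated with a small sketch: each reader owns its own cache (no global static map, so independent readers never contend on one lock), and the populated keys can be listed to warm a new reader. CacheKey and ReaderCache are hypothetical names, not the API that was eventually committed for this issue.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache key: field name plus a parser identifier.
record CacheKey(String field, String parser) {}

// Cache owned by a single reader instance rather than a global static map.
final class ReaderCache {
    private final Map<CacheKey, Object> entries = new ConcurrentHashMap<>();

    Object get(CacheKey key, Function<CacheKey, Object> loader) {
        return entries.computeIfAbsent(key, loader);
    }

    // (d) expose which keys are populated, as an immutable snapshot.
    Set<CacheKey> keys() {
        return Set.copyOf(entries.keySet());
    }

    // Warm another reader's cache with the same keys, using that reader's loader.
    void warm(ReaderCache target, Function<CacheKey, Object> loader) {
        for (CacheKey k : keys()) {
            target.get(k, loader);
        }
    }
}
```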

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
