lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2665) Rework FieldCache to be more flexible/general
Date Fri, 24 Sep 2010 10:50:32 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914407#action_12914407 ]

Michael McCandless commented on LUCENE-2665:
--------------------------------------------

Here are some of the problems w/ FC that I'd love to see fixed here:
 
  * The source of values should be fully flexible/pluggable -- FC does
    uninversion, CSF pulls the array image from the index, an app can
    plug in its own source.  All three of these "sources" should be
    consumed via the same API (eg ByteValues w/ getValue(int docID)).

  * Value lookup needs to be a method call, w/ optional
    getBackingArray(), for (manual today, automatic tomorrow) code
    specialization.

  * Uninversion is dangerous -- eg if you accidentally have multiple
    values per field, they "silently" overwrite one another.

  * When you do legitimately have multiple values (eg numeric fields),
    the Parser interface is also too inflexible -- eg the exception to
    stop visiting terms, the inability to specify which (users have
    requested "first only" and "last only") of multiple values should
    be kept, etc.

  * Cache should be stored/accessible via the reader, not in separate
    external WeakHashMap.  The eviction policy should be fully
    visible/controllable by the app (or maybe app optionally hands us
    a cache impl/factory).  There should be no static FC.DEFAULT that
    we have today.

  * Insanity shouldn't be allowed/possible -- it's just too dangerous
    today that we allow this.  We should at least make it really hard
    to do, by accident (eg, like you must use SlowMultiReader to prove
    your insanity).  EG caching values @ the MultiReader level.  Or,
    LUCENE-2527 (fasterButMoreRAM true/false causing a double entry).

  * The entries are too strongly tied to field names.  I may want
    virtual entries, not backed by a "real" field.  EG, say I want to
    do a "blended" sort, say mixing in recency with relevance... I
    should be able to name this "RelevanceAndRecency" (say), which is not
    a real field.  I back this w/ my own FloatValues impl, which
    under-the-hood somehow combines the two "sources" and presents a
    FloatValues interface.  Then I should be able to pass a SortField
    somehow referencing my dynamic/virtual field.

  * Cannot support multiple values per doc (this is a future
    nice-to-have-but-don't-preclude sort of thing)
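
A minimal sketch of the first two points in the list above (one lookup
API for every source, with an optional backing array), assuming a
hypothetical ByteValues interface.  Only the names ByteValues,
getValue(int docID) and getBackingArray() come from the comment; the
class names, the null-means-no-array convention and the sum() consumer
are illustrative assumptions, not Lucene API.

```java
public class ValueSourceSketch {

    /** Hypothetical per-document lookup shared by every value source. */
    public interface ByteValues {
        byte getValue(int docID);

        /** Optional fast path; null here means "no backing array" (an assumption). */
        default byte[] getBackingArray() {
            return null;
        }
    }

    /** Source backed by an array, e.g. produced by uninversion or loaded from the index. */
    public static final class ArrayByteValues implements ByteValues {
        private final byte[] values;

        public ArrayByteValues(byte[] values) {
            this.values = values;
        }

        @Override
        public byte getValue(int docID) {
            return values[docID];
        }

        @Override
        public byte[] getBackingArray() {
            return values;
        }
    }

    /** App-supplied source that computes values on the fly and has no array. */
    public static final class ComputedByteValues implements ByteValues {
        @Override
        public byte getValue(int docID) {
            return (byte) (docID % 3);
        }
    }

    /** Consumer: manually specialized on the array when one exists, else method calls. */
    public static long sum(ByteValues source, int maxDoc) {
        long total = 0;
        byte[] array = source.getBackingArray();
        if (array != null) {
            for (int doc = 0; doc < maxDoc; doc++) {
                total += array[doc];
            }
        } else {
            for (int doc = 0; doc < maxDoc; doc++) {
                total += source.getValue(doc);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new ArrayByteValues(new byte[] {1, 2, 3}), 3));
        System.out.println(sum(new ComputedByteValues(), 4));
    }
}
```

The consumer never needs to know which kind of source it was handed;
the array branch is the hand-written specialization that the comment
hopes can later be generated automatically.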
 
With these fixes, flex scoring (LUCENE-2392), the per-doc stats
(unique term count, total term count, boost, etc.) should all become
pluggable value sources.
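
The "virtual entry" idea from the list above could look roughly like
the following sketch.  FloatValues is named in the comment, but its
shape here, the blend() helper, the linear weighting and the name-keyed
registry are all assumptions made for illustration; how a SortField
would reference such an entry is left open, as it is in the comment.

```java
import java.util.HashMap;
import java.util.Map;

public class VirtualFieldSketch {

    /** Hypothetical per-document float lookup. */
    public interface FloatValues {
        float getValue(int docID);
    }

    /** Combines two sources into one; the linear blend is just an example formula. */
    public static FloatValues blend(FloatValues relevance, FloatValues recency, float recencyWeight) {
        return docID -> (1 - recencyWeight) * relevance.getValue(docID)
                + recencyWeight * recency.getValue(docID);
    }

    /** Hypothetical registry keyed by an arbitrary name, not by a real indexed field. */
    public static final class ValueSourceRegistry {
        private final Map<String, FloatValues> sources = new HashMap<>();

        public void register(String name, FloatValues values) {
            sources.put(name, values);
        }

        public FloatValues get(String name) {
            FloatValues values = sources.get(name);
            if (values == null) {
                throw new IllegalArgumentException("no value source named " + name);
            }
            return values;
        }
    }

    public static void main(String[] args) {
        float[] relevance = {1.0f, 0.5f};
        float[] recency = {0.0f, 1.0f};

        ValueSourceRegistry registry = new ValueSourceRegistry();
        registry.register("RelevanceAndRecency",
                blend(doc -> relevance[doc], doc -> recency[doc], 0.5f));

        FloatValues blended = registry.get("RelevanceAndRecency");
        System.out.println(blended.getValue(0));
        System.out.println(blended.getValue(1));
    }
}
```

A sort could then ask the registry for "RelevanceAndRecency" exactly as
it would for a source backed by a real field.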


> Rework FieldCache to be more flexible/general
> ---------------------------------------------
>
>                 Key: LUCENE-2665
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2665
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Ryan McKinley
>         Attachments: LUCENE-2665-FieldCacheOverhaul.patch
>
>
> The existing FieldCache implementation is very rigid and does not allow much flexibility.
> In trying to implement simple features, it points to much larger structural problems.
> This patch aims to take a fresh approach to how we work with the FieldCache.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



