lucene-dev mailing list archives

From "Shai Erera (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs
Date Fri, 17 Sep 2010 14:01:33 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910591#action_12910591 ]

Shai Erera commented on LUCENE-2649:
------------------------------------

One thing I've wanted to do for a long time, but never got around to, is to open up FieldCache
to allow the application to populate the entries from other sources, specifically payloads.
I wrote a sorting solution which relies solely on payloads, and wanted to contribute it to
Lucene, but because FieldCache lacks hook points, I didn't find the time to do the necessary
refactoring.

Sorting based on payload data has several advantages:
# It's much faster to read than iterating on the lexicon and parsing the term values into
sortable values.
# If your application needs to sort over tens of millions of documents, or if it needs
to keep its RAM usage low, you can do the sort while reading the payload data as the search
happens. It's slower than if the values were in RAM, but the current FieldCache does not allow you
to sort w/o RAM consumption.
# You don't inflate your lexicon w/ sort values, which would affect other searches. In some situations,
you would otherwise add a unique term per document for the sort values (such as when sorting by date
with up to millisecond precision).
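
To make the second point concrete, here is a rough sketch of sorting while the payload data is read during the search. It is not the sort-by-payload package mentioned in this comment and it uses no Lucene classes: PayloadSortSketch, Hit, encodeKey, decodeKey, topN and the payloadOf callback are all hypothetical names, and the 8-byte big-endian payload layout is an assumption made for illustration. Only the n best hits are held in memory, and the sort key is decoded from bytes rather than parsed from term text.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.IntFunction;

// Hypothetical sketch: none of these names are Lucene APIs.
class PayloadSortSketch {

    // A matching document paired with the sort key decoded from its payload.
    static final class Hit {
        final int doc;
        final long key;

        Hit(int doc, long key) {
            this.doc = doc;
            this.key = key;
        }
    }

    // Encodes a sort key as an 8-byte big-endian payload (assumed layout).
    static byte[] encodeKey(long key) {
        return ByteBuffer.allocate(Long.BYTES).putLong(key).array();
    }

    // Decodes the payload back into a sortable long; no term parsing involved.
    static long decodeKey(byte[] payload) {
        return ByteBuffer.wrap(payload).getLong();
    }

    // Returns the n hits with the smallest keys, in ascending key order.
    // payloadOf stands in for reading a document's payload from the index
    // as the search visits that document.
    static List<Hit> topN(int[] matchingDocs, IntFunction<byte[]> payloadOf, int n) {
        if (n <= 0) {
            return Collections.emptyList();
        }
        Comparator<Hit> byKey = Comparator.comparingLong((Hit h) -> h.key);
        // Max-heap on key: the root is the worst hit currently kept.
        PriorityQueue<Hit> heap = new PriorityQueue<>(byKey.reversed());
        for (int doc : matchingDocs) {
            long key = decodeKey(payloadOf.apply(doc));
            if (heap.size() < n) {
                heap.add(new Hit(doc, key));
            } else if (key < heap.peek().key) {
                heap.poll();                 // evict the current worst hit
                heap.add(new Hit(doc, key));
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort(byKey);
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for per-document sort values stored as payloads.
        long[] keys = {50L, 10L, 40L, 20L, 30L};
        List<Hit> top = topN(new int[] {0, 1, 2, 3, 4}, doc -> encodeKey(keys[doc]), 2);
        for (Hit h : top) {
            System.out.println("doc=" + h.doc + " key=" + h.key);
        }
    }
}
```

The memory used is proportional to n rather than to the number of documents, which is the trade-off described above: no per-document array is kept, at the cost of reading each matching document's payload at search time.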

I'm bringing this up so that if you consider any refactoring of FieldCache, you can keep it
in mind; I'd appreciate that. If the right hooks open up, I'll make time to contribute my
sort-by-payload package. If not, it'll need to wait until I can find the time to
do the refactoring myself.



> FieldCache should include a BitSet for matching docs
> ----------------------------------------------------
>
>                 Key: LUCENE-2649
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2649
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Ryan McKinley
>             Fix For: 4.0
>
>         Attachments: LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch
>
>
> The FieldCache returns an array representing the values for each doc.  However there
> is no way to know if the doc actually has a value.
> This should be changed to return an object representing the values *and* a BitSet for
> all valid docs.
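
As an illustration of the request above, a minimal sketch of a result object that pairs the per-document values with a BitSet of documents that actually have one. CachedInts and its members are hypothetical names chosen for this example, not the API in the attached patches, which may look quite different.

```java
import java.util.BitSet;

// Hypothetical sketch: these names are not Lucene APIs.
class CachedInts {
    final int[] values;     // one slot per document; stays 0 where no value exists
    final BitSet hasValue;  // bit is set for each document that has a value

    CachedInts(int maxDoc) {
        this.values = new int[maxDoc];
        this.hasValue = new BitSet(maxDoc);
    }

    // Records a value for a document and marks the document as valid.
    void set(int doc, int value) {
        values[doc] = value;
        hasValue.set(doc);
    }

    // Distinguishes "value is 0" from "document has no value".
    boolean exists(int doc) {
        return hasValue.get(doc);
    }
}
```

With a bare array, a document whose value is 0 and a document with no value look identical; here exists(doc) tells them apart.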

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

