lucene-dev mailing list archives

From "Shai Erera (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4372) CachingCollector.create(boolean, boolean, double) is trappy
Date Mon, 10 Sep 2012 15:12:09 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452055#comment-13452055 ]

Shai Erera commented on LUCENE-4372:
------------------------------------

CachingCollector has this in its javadocs:

{code}
 * Caches all docs, and optionally also scores, coming from
 * a search, and is then able to replay them to another
 * collector.  You specify the max RAM this class may use.
 * Once the collection is done, call {@link #isCached}. If
 * this returns true, you can use {@link #replay(Collector)}
 * against a new collector.  If it returns false, this means
 * too much RAM was required and you must instead re-run the
 * original search.
{code}

Notice the last sentence about isCached returning false.
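To make the documented contract concrete, here is a small self-contained Java model of it. BudgetedHitCache, its HitSink callback, and the cost of 4 bytes per doc ID plus 4 per score are hypothetical, not Lucene's API or its real RAM accounting. The model wraps no other collector, which is the situation the issue title calls trappy: once the budget is exceeded, the hits cached so far are discarded, isCached() returns false, and the only recovery is to re-run the search.

```java
import java.util.Arrays;

/**
 * Hypothetical model of the CachingCollector contract quoted above; not Lucene's code.
 * Assumed cost: 4 bytes per cached doc ID, plus 4 bytes per score when scores are cached.
 */
public class BudgetedHitCache {

    /** Receives replayed hits; stands in for "a new collector". */
    public interface HitSink {
        void hit(int doc, float score);
    }

    private final boolean cacheScores;
    private final int maxHits;      // capacity implied by the RAM budget
    private int[] docs = new int[16];
    private float[] scores;
    private int size = 0;
    private boolean cached = true;  // becomes false once the budget is exceeded

    public BudgetedHitCache(boolean cacheScores, double maxRAMMB) {
        this.cacheScores = cacheScores;
        int bytesPerHit = cacheScores ? 8 : 4;
        this.maxHits = (int) Math.max(0, Math.min(Integer.MAX_VALUE - 8,
                maxRAMMB * 1024 * 1024 / bytesPerHit));
        this.scores = cacheScores ? new float[16] : null;
    }

    /** Called once per matching document during the original search. */
    public void collect(int doc, float score) {
        if (!cached) {
            return;                 // over budget: later hits are ignored too
        }
        if (size == maxHits) {
            cached = false;         // budget exceeded: drop everything cached so far
            docs = null;
            scores = null;
            return;
        }
        if (size == docs.length) {  // grow, but never beyond the budget
            int newLength = (int) Math.min(maxHits, 2L * docs.length);
            docs = Arrays.copyOf(docs, newLength);
            if (cacheScores) {
                scores = Arrays.copyOf(scores, newLength);
            }
        }
        docs[size] = doc;
        if (cacheScores) {
            scores[size] = score;
        }
        size++;
    }

    public boolean isCached() {
        return cached;
    }

    /** Replays cached hits in collection order; scores are NaN when not cached. */
    public void replay(HitSink sink) {
        if (!cached) {
            throw new IllegalStateException("hits were discarded; re-run the original search");
        }
        for (int i = 0; i < size; i++) {
            sink.hit(docs[i], cacheScores ? scores[i] : Float.NaN);
        }
    }

    public static void main(String[] args) {
        // 16 bytes at 8 bytes per hit leaves room for exactly 2 hits in this model.
        BudgetedHitCache tiny = new BudgetedHitCache(true, 16.0 / (1024 * 1024));
        for (int doc = 0; doc < 5; doc++) {
            tiny.collect(doc, 1.0f);
        }
        System.out.println("tiny budget cached? " + tiny.isCached());

        BudgetedHitCache roomy = new BudgetedHitCache(true, 1.0);
        for (int doc = 0; doc < 5; doc++) {
            roomy.collect(doc, doc * 0.5f);
        }
        System.out.println("1 MB budget cached? " + roomy.isCached());
        roomy.replay((doc, score) -> System.out.println(doc + " -> " + score));
    }
}
```

Throwing from replay() when nothing is cached is a choice made for this sketch; the quoted javadoc only says the caller must check isCached() first.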

Should we just fix the javadocs of the static create() method (even though they already point to the
class javadocs)?

I don't see any alternative: if the user specifies too low a RAM limit, what can you do besides
discarding the cached docs and documenting that behavior? I'd hate to see exceptions thrown...
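On the caller side, the documented contract amounts to "replay if cached, otherwise re-run the search". The sketch below models that flow without Lucene: runSearch is a hypothetical stand-in for executing the original query, and a bounded list plays the role of the cache. When the budget is too small, the cost shows up as a second search rather than as missing hits, but only if the caller actually checks the flag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

/**
 * Hypothetical caller-side pattern: cache hits within a budget and replay them,
 * or re-run the original search when the cache overflowed.
 */
public class ReplayOrRerun {

    /** Number of times the (simulated) original search has executed. */
    public static int searchRuns = 0;

    /** Stand-in for running the original query: docs 0, 2, 4, 6 and 8 "match". */
    public static void runSearch(IntConsumer collector) {
        searchRuns++;
        for (int doc = 0; doc < 10; doc += 2) {
            collector.accept(doc);
        }
    }

    /** Returns the hits for a second consumer, preferring the cached copy. */
    public static List<Integer> secondPass(int maxCachedHits) {
        List<Integer> cache = new ArrayList<>();
        boolean[] overflowed = {false};   // mirrors isCached() == false
        runSearch(doc -> {
            if (overflowed[0]) {
                return;
            }
            if (cache.size() == maxCachedHits) {
                overflowed[0] = true;     // budget exceeded: discard, as documented
                cache.clear();
                return;
            }
            cache.add(doc);
        });

        List<Integer> hits = new ArrayList<>();
        if (!overflowed[0]) {
            hits.addAll(cache);                // cheap path: replay from the cache
        } else {
            runSearch(doc -> hits.add(doc));   // fallback: run the original search again
        }
        return hits;
    }

    public static void main(String[] args) {
        searchRuns = 0;
        System.out.println(secondPass(100) + " after " + searchRuns + " search(es)");
        searchRuns = 0;
        System.out.println(secondPass(3) + " after " + searchRuns + " search(es)");
    }
}
```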
                
> CachingCollector.create(boolean, boolean, double) is trappy
> -----------------------------------------------------------
>
>                 Key: LUCENE-4372
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4372
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Robert Muir
>
> Followup to LUCENE-3102.
> Shai proposed a method that just caches all scores so they can be replayed:
> {quote}
> Do you think we can modify this Collector to not necessarily wrap another Collector?
> We have such a Collector, which stores (in memory) all matching doc IDs + scores (if required).
> Those are later fed into several processes that operate on them (e.g. fetch more info from
> the index etc.). I am thinking we can make CachingCollector optionally wrap another Collector,
> and then someone can reuse it by setting the RAM limit to unlimited (we should have a constant
> for that) in order to simply collect all matching docs + scores.
> {quote}
> But Mike had concerns about the RAM usage:
> {quote}
> I'd actually rather not have the constant – ie, I don't want to make
> it easy to be unlimited? It seems too dangerous... I'd rather your
> code has to spell out 10*1024 so you realize you're saying 10 GB (for
> example).
> {quote}
> My concern here is what happens when you don't specify enough: I think those hits are
> just silently dropped (which is worse than using lots of RAM).
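
The use case in Shai's proposal quoted above (keep every matching doc ID, and optionally its score, so several later processes can read them) can be sketched as a standalone class. AllHitsCollector is a hypothetical name, and it deliberately enforces no RAM limit, which is exactly the trade-off Mike objects to.

```java
import java.util.Arrays;

/**
 * Hypothetical "collect everything" sink: stores every matching doc ID and,
 * optionally, its score, without wrapping any other collector. No RAM limit
 * is enforced; memory grows with the number of hits.
 */
public class AllHitsCollector {
    private final boolean keepScores;
    private int[] docs = new int[8];
    private float[] scores;
    private int size;

    public AllHitsCollector(boolean keepScores) {
        this.keepScores = keepScores;
        this.scores = keepScores ? new float[8] : null;
    }

    public void collect(int doc, float score) {
        if (size == docs.length) {  // grow geometrically
            docs = Arrays.copyOf(docs, size * 2);
            if (keepScores) {
                scores = Arrays.copyOf(scores, size * 2);
            }
        }
        docs[size] = doc;
        if (keepScores) {
            scores[size] = score;
        }
        size++;
    }

    public int size() {
        return size;
    }

    /** Copy of the collected doc IDs, in collection order. */
    public int[] docIds() {
        return Arrays.copyOf(docs, size);
    }

    /** Copy of the collected scores; only available when scores were kept. */
    public float[] scores() {
        if (!keepScores) {
            throw new IllegalStateException("scores were not kept");
        }
        return Arrays.copyOf(scores, size);
    }

    public static void main(String[] args) {
        AllHitsCollector c = new AllHitsCollector(true);
        for (int doc = 0; doc < 20; doc++) {
            c.collect(doc * 3, doc / 10f);
        }
        // Downstream processes can read the same hits as often as needed.
        int[] ids = c.docIds();
        float[] s = c.scores();
        System.out.println(c.size() + " hits, last doc=" + ids[ids.length - 1]
                + ", last score=" + s[s.length - 1]);
    }
}
```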

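To put numbers on Mike's point about spelling out 10*1024: because the budget parameter is in MB, 10*1024 means 10 GB. The helper below estimates how many hits a budget holds under a simplifying cost model that is an assumption, not Lucene's exact accounting: 4 bytes per doc ID plus 4 bytes per score, with array overhead ignored.

```java
/**
 * Rough capacity of a RAM budget expressed in MB. Hypothetical cost model:
 * 4 bytes per cached doc ID, plus 4 bytes per cached score; overhead ignored.
 */
public class RamBudget {

    public static long maxHits(double maxRAMMB, boolean cacheScores) {
        long bytesPerHit = cacheScores ? 8 : 4;
        return (long) (maxRAMMB * 1024 * 1024 / bytesPerHit);
    }

    public static void main(String[] args) {
        double tenGB = 10 * 1024;  // spelled out in MB, as Mike suggests: 10 GB
        System.out.println(maxHits(tenGB, true));   // 1342177280 (10 * 2^30 bytes / 8)
        System.out.println(maxHits(tenGB, false));  // 2684354560
        System.out.println(maxHits(64, true));      // 8388608
    }
}
```

In this model a 64 MB budget with scores tops out at about 8.4 million hits; any hits beyond that are exactly the ones Robert worries about being dropped without notice.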
--
This message is automatically generated by JIRA.
