lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4372) CachingCollector.create(boolean, boolean, double) is trappy
Date Mon, 10 Sep 2012 15:18:08 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452060#comment-13452060 ]

Robert Muir commented on LUCENE-4372:
-------------------------------------

btw: what confuses me is the "run-out-of-RAM" case: once the cache has overflowed,
why do we even bother collecting the overflowed docs here (other.collect)?

{code}
      if (curDocs == null) {
        // Cache was too large
        cachedScorer.score = scorer.score();
        cachedScorer.doc = doc;
        other.collect(doc);
        return;
      }
{code}
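
To make the overflow branch concrete, here is a minimal sketch with no Lucene dependency. The class OverflowSketch, its doc-count budget (standing in for the RAM limit) and the IntConsumer callback are all hypothetical stand-ins, not the real CachingCollector. It only illustrates one possible reading: if the wrapped collector is the real consumer of the first pass, it still needs every hit after the cache is dropped, while a cache-only collector has nowhere left to put them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

/** Simplified, hypothetical stand-in for a caching collector; not Lucene's code. */
class OverflowSketch {
    private final IntConsumer other;   // wrapped collector, or null for cache-only use
    private final int maxDocsToCache;  // stands in for the RAM budget
    private List<Integer> curDocs = new ArrayList<>();

    OverflowSketch(IntConsumer other, int maxDocsToCache) {
        this.other = other;
        this.maxDocsToCache = maxDocsToCache;
    }

    void collect(int doc) {
        if (curDocs != null && curDocs.size() >= maxDocsToCache) {
            curDocs = null; // budget exceeded: drop the whole cache, replay is no longer possible
        }
        if (curDocs != null) {
            curDocs.add(doc);
        }
        // Forward every hit, cached or not, so the wrapped collector's first pass stays complete.
        if (other != null) {
            other.accept(doc);
        }
    }

    boolean isCached() {
        return curDocs != null;
    }

    void replay(IntConsumer target) {
        if (curDocs == null) {
            throw new IllegalStateException("cache was dropped; re-run the search instead");
        }
        for (int doc : curDocs) {
            target.accept(doc);
        }
    }

    public static void main(String[] args) {
        // Wrapped mode: the first pass still sees all five hits after the overflow.
        List<Integer> firstPass = new ArrayList<>();
        OverflowSketch wrapped = new OverflowSketch(doc -> firstPass.add(doc), 2);
        for (int doc = 0; doc < 5; doc++) {
            wrapped.collect(doc);
        }
        System.out.println("wrapped: firstPass=" + firstPass + ", isCached=" + wrapped.isCached());

        // Cache-only mode: after the overflow nothing holds the hits any more.
        OverflowSketch cacheOnly = new OverflowSketch(null, 2);
        for (int doc = 0; doc < 5; doc++) {
            cacheOnly.collect(doc);
        }
        System.out.println("cache-only: isCached=" + cacheOnly.isCached());
    }
}
```

In this sketch a caller has to check isCached() before replay(); in the cache-only case a false result means the hits are gone and the search must be run again.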


                
> CachingCollector.create(boolean, boolean, double) is trappy
> -----------------------------------------------------------
>
>                 Key: LUCENE-4372
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4372
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Robert Muir
>
> Followup to LUCENE-3102.
> Shai proposed a method that just caches all scores so they can be replayed:
> {quote}
> Do you think we can modify this Collector to not necessarily wrap another Collector?
> We have such Collector which stores (in-memory) all matching doc IDs + scores (if required).
> Those are later fed into several processes that operate on them (e.g. fetch more info from
> the index etc.). I am thinking, we can make CachingCollector optionally wrap another Collector
> and then someone can reuse it by setting RAM limit to unlimited (we should have a constant
> for that) in order to simply collect all matching docs + scores.
> {quote}
> But Mike had concerns about the RAM usage:
> {quote}
> I'd actually rather not have the constant – ie, I don't want to make
> it easy to be unlimited? It seems too dangerous... I'd rather your
> code has to spell out 10*1024 so you realize you're saying 10 GB (for
> example).
> {quote}
> My concern here is what happens when you don't specify enough RAM: I think those hits are
> just silently dropped (which is worse than using lots of RAM).
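
For a feel of what a budget like Mike's 10*1024 buys, the arithmetic can be sketched as below. The per-hit sizes are assumptions (4 bytes per int doc ID plus 4 bytes per float score, ignoring array and object overhead), and RamBudgetSketch and maxDocsToCache are hypothetical names, not part of the Lucene API.

```java
/** Back-of-the-envelope RAM budget arithmetic; the per-hit byte sizes are assumptions. */
class RamBudgetSketch {
    static final int BYTES_PER_DOC_ID = Integer.BYTES; // 4 bytes for an int doc ID
    static final int BYTES_PER_SCORE = Float.BYTES;    // 4 bytes for a float score

    /** Number of hits that fit in maxRamMB megabytes under the assumptions above. */
    static long maxDocsToCache(double maxRamMB, boolean cacheScores) {
        int bytesPerDoc = BYTES_PER_DOC_ID + (cacheScores ? BYTES_PER_SCORE : 0);
        return (long) ((maxRamMB * 1024 * 1024) / bytesPerDoc);
    }

    public static void main(String[] args) {
        // 10 GB budget, doc IDs plus scores (8 bytes per hit).
        System.out.println(maxDocsToCache(10 * 1024, true));
        // 1 MB budget, doc IDs only (4 bytes per hit).
        System.out.println(maxDocsToCache(1.0, false));
    }
}
```

Under these assumptions a 1 MB budget holds 262144 doc IDs without scores, so a limit that looks generous in MB can still be exceeded by a broad query, which is the case where hits would go missing.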
