lucene-dev mailing list archives

From "Joel Bernstein (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-9125) CollapseQParserPlugin allocations are index based, not query based
Date Tue, 17 May 2016 18:30:13 GMT

    [ https://issues.apache.org/jira/browse/SOLR-9125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15287208#comment-15287208 ]

Joel Bernstein commented on SOLR-9125:
--------------------------------------

Yeah, the CollapsingQParserPlugin can use a lot of memory. The original design goal was to
increase performance for collapsing on high-cardinality fields and large result sets, as opposed
to large indexes. It was really designed to support fast collapse queries on large e-commerce
catalogs, which are still typically small compared to other data sets.

If we can find a way to maintain the performance and shrink the memory usage, this would be
a great thing.



> CollapseQParserPlugin allocations are index based, not query based
> ------------------------------------------------------------------
>
>                 Key: SOLR-9125
>                 URL: https://issues.apache.org/jira/browse/SOLR-9125
>             Project: Solr
>          Issue Type: Improvement
>          Components: query parsers
>            Reporter: Jeff Wartes
>            Priority: Minor
>              Labels: collapsingQParserPlugin
>
> Among other things, CollapsingQParserPlugin’s OrdScoreCollector allocates space per-query for:
> 1 int (doc id) per ordinal
> 1 float (score) per ordinal
> 1 bit (FixedBitSet) per document in the index
>  
> So the higher the cardinality of the thing you’re grouping on, and the more documents
> in the index, the more memory gets consumed per query. Since high cardinality and large indexes
> are the use-cases CollapseQParserPlugin was designed for, I thought I'd point this out.
> My real issue is that this does not vary based on the number of results in the query,
> either before or after collapsing, so a query that results in one doc consumes the same amount
> of memory as one that returns all of them. All of the Collectors suffer from this to some
> degree, but I think OrdScore is the worst offender.
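
To make the per-query cost described above concrete, here is a rough back-of-the-envelope sketch. It is not Solr code: the class name, the method name, and the example figures in main are all hypothetical. It assumes a 4-byte int and a 4-byte float per ordinal, and one bit per document stored in 64-bit words (which is how Lucene's FixedBitSet is backed), and it ignores array headers and every other allocation the collector makes.

```java
public final class CollapseMemoryEstimate {

    /**
     * Hypothetical helper: estimates the bytes allocated per query from the
     * three structures listed in the issue description.
     *
     * @param ordinals number of distinct values in the collapse field
     * @param maxDoc   number of documents in the index
     */
    public static long estimateBytes(long ordinals, long maxDoc) {
        long docIdBytes = ordinals * Integer.BYTES;            // 1 int (doc id) per ordinal
        long scoreBytes = ordinals * Float.BYTES;              // 1 float (score) per ordinal
        long bitSetBytes = ((maxDoc + 63) / 64) * Long.BYTES;  // 1 bit per document, rounded up to whole longs
        return docIdBytes + scoreBytes + bitSetBytes;
    }

    public static void main(String[] args) {
        // Illustrative numbers only, not taken from the issue.
        long ordinals = 10_000_000L;
        long maxDoc = 100_000_000L;
        System.out.println(estimateBytes(ordinals, maxDoc) + " bytes per query");
    }
}
```

Note that the number of hits does not appear anywhere in the estimate, which is the point of the issue: under these assumptions, a query matching one document would allocate the same amount as a query matching every document.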



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

