lucene-solr-dev mailing list archives

From "Patrick Jungermann (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-236) Field collapsing
Date Thu, 07 Jan 2010 18:06:57 GMT

    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797716#action_12797716 ]

Patrick Jungermann commented on SOLR-236:
-----------------------------------------

Hi all,

we are using Solr's trunk with the latest patch of {{2009-12-24 09:54 AM}}. The index contains
~3.5 million documents with string-based identifiers of up to 50 characters.

The document that ranked at position 1 for our prefix query without collapsing was not even
within the top 10 results with collapsing enabled. We were using the option {{collapse.maxdocs=150}};
after changing it to 15000, the results seem to be as expected. From that we concluded that
there has to be a problem with the sorting of the uncollapsed docset.
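
The suspicion can be illustrated with a toy model. This is only a sketch, assuming the docset
is cut down to {{collapse.maxdocs}} entries before it is ordered by score; the function name,
the tuple layout and the sample data are hypothetical and not taken from the patch.

```python
def top_after_collapse(docs, maxdocs, sort_first):
    """Collapse at most `maxdocs` candidates to one document per group.

    docs: list of (doc_id, group, score) tuples in index order.
    sort_first: if True, order by score before truncating to maxdocs.
    """
    if sort_first:
        candidates = sorted(docs, key=lambda d: d[2], reverse=True)
    else:
        candidates = list(docs)
    candidates = candidates[:maxdocs]

    best = {}  # group -> (doc_id, score) of the best document seen so far
    for doc_id, group, score in candidates:
        if group not in best or score > best[group][1]:
            best[group] = (doc_id, score)

    # Rank the surviving group representatives by score, best first.
    ranked = sorted(best.values(), key=lambda entry: entry[1], reverse=True)
    return [doc_id for doc_id, _ in ranked]


# 200 low-scoring documents, followed in index order by the best match.
docs = [(f"doc{i}", f"group{i}", 1.0) for i in range(200)]
docs.append(("best", "group-best", 9.0))

unsorted_small = top_after_collapse(docs, maxdocs=150, sort_first=False)
unsorted_large = top_after_collapse(docs, maxdocs=15000, sort_first=False)
sorted_small = top_after_collapse(docs, maxdocs=150, sort_first=True)
```

In this model the best match disappears from {{unsorted_small}}, but comes back at the top as
soon as the limit is large enough or the docset is sorted first, which matches what we observed.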


We also noticed a huge memory leak when using collapsing. We configured the component
with {{<searchComponent name="query" class="org.apache.solr.handler.component.CollapseComponent"/>}}.
As long as the option {{collapse.field}} is not set, it works normally and there are no memory
problems so far. Once the Solr server receives requests with collapsing enabled, the whole memory
fills up after only a few requests (oldgen cannot be freed; eden space is heavily in use; ...).
Using a profiler, we noticed that the filterCache was extraordinarily large. We suspect
a caching problem (the collapse cache was not enabled).


Additionally, it would be very useful if the parameter {{collapse=true|false}} worked
again and could be used to enable/disable the collapsing functionality. Currently, the mere
presence of a field chosen for collapsing enables this feature, so it is not possible to configure
the fields for collapsing within the request handlers. With such a parameter, we could configure
the fields there and only enable/disable collapsing per request, as is conveniently done for
other components (highlighting, faceting, ...).
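
A minimal sketch of the difference, as a toy model rather than the component's real code: the
function names and the field {{group_id}} are hypothetical, and it is an assumption that
{{collapse}} would default to false when absent, as it does for highlighting and faceting.

```python
def collapsing_enabled_current(params):
    """Behaviour as described above: a collapse.field alone switches collapsing on."""
    return bool(params.get("collapse.field"))


def collapsing_enabled_proposed(handler_defaults, request_params):
    """Proposed behaviour: the handler may preconfigure collapse.field,
    and the request decides via collapse=true|false (assumed default: false)."""
    merged = {**handler_defaults, **request_params}  # request overrides defaults
    flag = merged.get("collapse", "false").lower() == "true"
    return flag and bool(merged.get("collapse.field"))


# Hypothetical request handler defaults naming the collapse field.
handler_defaults = {"collapse.field": "group_id"}

always_on = collapsing_enabled_current(handler_defaults)
off_by_default = collapsing_enabled_proposed(handler_defaults, {})
on_per_request = collapsing_enabled_proposed(handler_defaults, {"collapse": "true"})
```

With the current behaviour, putting {{collapse.field}} into the handler defaults would collapse
every request; with the proposed flag, only requests that send {{collapse=true}} would be collapsed.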


Patrick

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch,
>                      collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch,
>                      field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
>                      field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
>                      field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
>                      field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
>                      field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch,
>                      field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff,
>                      field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch,
>                      SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch,
>                      SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch,
>                      SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to
> a single entry in the result set. Site collapsing is a special case of this, where all results
> for a given web site is collapsed into one or two entries in the result set, typically with
> an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

