lucene-solr-dev mailing list archives

From Iván de Prado (JIRA) <j...@apache.org>
Subject [jira] Commented: (SOLR-236) Field collapsing
Date Thu, 13 Nov 2008 17:00:51 GMT

    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647335#action_12647335 ]

Iván de Prado commented on SOLR-236:
------------------------------------

I attached a patch named collapsing-patch-to-1.3.0-ivan.patch. The patch applies to Solr 1.3.0.

In the comment "Karsten Sperling - 06/Nov/07 02:06 PM", Karsten wrote:
{quote}
Inverted the logic of the filter DocSet created by CollapseFilter to contain the documents
that are to be collapsed instead of the ones that are to be kept. Without this collapse.maxdocs
doesn't work.
{quote}

I found that this approach consumes a lot of memory, even if your query is limited to
a small number of documents. I also found that there is no advantage in using collapse.maxdocs
if it neither speeds up queries nor reduces the amount of memory needed.

So I decided to revert Karsten's change in order to make field collapsing faster and less
resource-consuming when querying smaller datasets.

WARNING: This patch changes the semantics of collapse.maxdocs. Before this patch, collapse.maxdocs
was used only to reduce the number of docs checked for grouping, while the remaining documents,
which were not grouped, were still presented in the result.

With the current patch, only documents that were examined for grouping can appear in the result.
These semantics have two benefits:
- The amount of resources can be controlled per query
- No ungrouped content is presented.
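
To illustrate the difference, here is a toy model in Python. It is only a sketch and not the code of the patch: the function collapse, the keep_unexamined flag and the sample data are hypothetical, and it assumes the simplest case where one document is kept per field value.

```python
def collapse(docs, max_docs, keep_unexamined):
    """Toy model of collapse.maxdocs (not Solr code).

    docs is a ranked list of (doc_id, field_value) pairs. Only the first
    max_docs entries are examined for grouping, and the first document
    seen for each field value is kept.
    """
    seen = set()
    result = []
    for doc_id, value in docs[:max_docs]:
        if value not in seen:
            seen.add(value)
            result.append(doc_id)
    if keep_unexamined:
        # Behaviour before this patch: documents past the limit are
        # returned as they are, without being grouped.
        result.extend(doc_id for doc_id, _ in docs[max_docs:])
    return result


docs = [(1, "a.com"), (2, "a.com"), (3, "b.com"), (4, "a.com"), (5, "b.com")]
before = collapse(docs, 3, keep_unexamined=True)   # [1, 3, 4, 5]
after = collapse(docs, 3, keep_unexamined=False)   # [1, 3]
```

In the "before" case, docs 4 and 5 show up ungrouped even though their field values are already represented. In the "after" case the result is limited to what was examined.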


> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.4
>
>         Attachments: collapsing-patch-to-1.3.0-ivan.patch, field-collapsing-extended-592129.patch,
field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff,
field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch,
SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, solr-236.patch
>
>
> This patch include a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to
a single entry in the result set. Site collapsing is a special case of this, where all results
for a given web site is collapsed into one or two entries in the result set, typically with
an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation add 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

