lucene-solr-dev mailing list archives

From "Martijn van Groningen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-236) Field collapsing
Date Mon, 31 Aug 2009 11:10:33 GMT

    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749464#action_12749464 ]

Martijn van Groningen commented on SOLR-236:
--------------------------------------------

Hi Thomas, currently neither collapsing algorithm stores the ids of the collapsed
documents.
In order to have this functionality I think the following has to be done:
1) In the doCollapsing(...) methods of both concrete implementations of DocumentCollapser,
the collapsed documents have to be stored. Depending on what you want, you can store them in
one big list or in one list per most relevant document. The most relevant document (the
document head) is the document that does *not* collapse.
2) In the getCollapseInfo(...) method in AbstractDocumentCollapser you then need to output
these collapsed documents. If you are storing the collapsed documents in one big list, then
adding a new NamedList with the collapsed documents would be fine, I guess. If you are storing
the collapsed documents per document head, then I would add the collapsed document ids to the
existing resDoc named list. It is important that you return the Solr unique id instead of the
Lucene id.
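
To make the "list per document head" option concrete, here is a rough sketch in plain Java.
It is not code from the patch: the class CollapsedIdStore, its methods, and the
uniqueKeyLookup function are hypothetical names, and the lookup only stands in for however
the collapser would resolve a Lucene doc id to the Solr unique key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntFunction;

/** Hypothetical helper for illustration; not part of the SOLR-236 patch. */
class CollapsedIdStore {
    // Unique key of the document head -> unique keys of the documents collapsed under it.
    private final Map<String, List<String>> collapsedByHead = new LinkedHashMap<>();

    // Assumed lookup from an internal Lucene doc id to the Solr unique key.
    private final IntFunction<String> uniqueKeyLookup;

    CollapsedIdStore(IntFunction<String> uniqueKeyLookup) {
        this.uniqueKeyLookup = uniqueKeyLookup;
    }

    /** Records that collapsedDoc was collapsed under headDoc (both are Lucene doc ids). */
    void add(int headDoc, int collapsedDoc) {
        collapsedByHead
            .computeIfAbsent(uniqueKeyLookup.apply(headDoc), key -> new ArrayList<>())
            .add(uniqueKeyLookup.apply(collapsedDoc));
    }

    /** Returns the stored ids, keyed by document head, in insertion order. */
    Map<String, List<String>> asMap() {
        return collapsedByHead;
    }
}
```

Something like doCollapsing(...) would call add(...) each time a document collapses, and
getCollapseInfo(...) would then write the entries of asMap() into the response.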

This is just one approach, but what is the reason that you want this functionality? I guess
it would be much easier to do a second query after the collapse query. In this second
query you disable field collapsing (by not setting collapse.field) and you set, for example,
fq=[collapse.field]:[collapse.value].
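
A minimal sketch of building the parameters for such a second query, using Solr's
field:value filter query syntax. The class FollowUpQuery and its build method are
hypothetical, the value is quoted as a phrase on the assumption that it may contain spaces
or special characters, and URLEncoder.encode with a Charset argument needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not part of Solr or the patch. */
class FollowUpQuery {
    /**
     * Builds the query string for the second request: the same q as the collapse
     * query, no collapse.field, and a filter query restricting the results to one
     * collapse group.
     */
    static String build(String q, String collapseField, String collapseValue) {
        // Escape backslashes and double quotes so the value is a valid quoted phrase.
        String escaped = collapseValue.replace("\\", "\\\\").replace("\"", "\\\"");
        String fq = collapseField + ":\"" + escaped + "\"";
        return "q=" + URLEncoder.encode(q, StandardCharsets.UTF_8)
            + "&fq=" + URLEncoder.encode(fq, StandardCharsets.UTF_8);
    }
}
```

For example, build("ipod", "site", "example.com") produces the parameters for a request
that returns every document whose site field is example.com, collapsed or not.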

Potentially the number of collapsed documents can be very large, and in that situation it can
have an impact on performance. Therefore I think that this functionality should be disabled
by default, in the same way that collapseInfoDoc and collapseInfoCount are managed.
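
As a sketch of the "disabled by default" idea: the parameter name
collapse.includeCollapsedDocs below is made up for illustration, and a plain Map stands in
for the request parameters.

```java
import java.util.Map;

/** Hypothetical option reader for illustration; not part of the patch. */
class CollapseOptions {
    // Hypothetical parameter name, chosen only for this example.
    static final String INCLUDE_COLLAPSED_IDS = "collapse.includeCollapsedDocs";

    /** Returns true only when the parameter is explicitly set to "true". */
    static boolean includeCollapsedIds(Map<String, String> params) {
        // Boolean.parseBoolean(null) is false, so a missing parameter leaves the feature off.
        return Boolean.parseBoolean(params.get(INCLUDE_COLLAPSED_IDS));
    }
}
```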

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch,
> collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch,
> field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch,
> field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch,
> field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff,
> field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch,
> SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to
> a single entry in the result set. Site collapsing is a special case of this, where all results
> for a given web site is collapsed into one or two entries in the result set, typically with
> an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

