lucene-solr-dev mailing list archives

From "Martijn van Groningen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-236) Field collapsing
Date Sun, 20 Dec 2009 15:46:18 GMT

    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792997#action_12792997 ]

Martijn van Groningen commented on SOLR-236:
--------------------------------------------

ttdi,
The latest patch is not in sync with the latest trunk. You can try to patch to the trunk or
use a previous patch for the 1.4 code.

Yonik,
The parameter descriptions are a bit poor. The response format of the older patches contains
two separate lists of collapse group counts. The first list holds counts per most relevant
document id and is enabled or disabled with the collapse.info.doc param. The second list holds
counts per fieldvalue of the most relevant document and is controlled with the collapse.info.count
param. Now that the response format has changed we should rename these params to something more
descriptive. Maybe something like collapse.showCount, which adds the collapse count to the collapse
group in the response (defaults to true), and collapse.showFieldValue, which adds the fieldvalue
of the most relevant document to the group (defaults to false)?
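
To make the proposed names concrete, here is a minimal sketch of how a client might build such a request. The collapse.showCount and collapse.showFieldValue names are only the proposal from this comment, not parameters of any existing patch, and the host, path, query and field name are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; only the collapse.* names come from this thread.
BASE_URL = "http://localhost:8983/solr/select"


def build_collapse_query(query, field, show_count=True, show_field_value=False):
    """Build a request URL using the *proposed* parameter names.

    collapse.showCount and collapse.showFieldValue are suggestions from
    this comment; the defaults mirror the ones suggested there.
    """
    params = {
        "q": query,
        "collapse.field": field,
        "collapse.showCount": str(show_count).lower(),
        "collapse.showFieldValue": str(show_field_value).lower(),
    }
    return BASE_URL + "?" + urlencode(params)


# "ipod" and "manufacturer" are made-up example values.
url = build_collapse_query("ipod", "manufacturer")
print(url)
```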

The collapse.maxdocs param specifies when to abort field-collapsing: after n documents have
been processed. I have never used it. I can imagine that one would use it to shorten the search time.

The collapse.includeCollapsedDocs.fl param enables a collapse collector that collects the
documents that have been discarded and outputs the specified fields of the discarded documents
to the fieldcollapse response per collapse group (* for all fields). The parameter name does not
reflect that behaviour entirely. Do you think that collapse.collectDiscardedDocuments.fl is better?
Personally, however, I would not use this, because of the negative impact it has on performance.
Usually one wants to know something like the average / highest / lowest price of a collapse
group. The AggregateCollapseCollector would fit that need better.
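
A rough illustration of why aggregating is cheaper than collecting discarded documents: the sketch below keeps only a few running numbers per group instead of the documents themselves. It is not the AggregateCollapseCollector from the patch. The function, the class and the field names are made up, and it assumes the aggregate covers every document in the group, including the one that is kept.

```python
from dataclasses import dataclass


@dataclass
class PriceAggregate:
    """Running statistics for one collapse group (constant memory per group)."""
    count: int = 0
    total: float = 0.0
    lowest: float = float("inf")
    highest: float = float("-inf")

    def add(self, price):
        self.count += 1
        self.total += price
        self.lowest = min(self.lowest, price)
        self.highest = max(self.highest, price)

    @property
    def average(self):
        return self.total / self.count if self.count else 0.0


def aggregate_groups(ranked_docs, collapse_field, value_field):
    """Keep the first (most relevant) doc per group and aggregate value_field.

    Assumption: ranked_docs is already sorted by relevance, and the
    aggregate includes the kept document as well as the discarded ones.
    """
    heads = {}
    aggregates = {}
    for doc in ranked_docs:
        key = doc[collapse_field]
        if key not in heads:
            heads[key] = doc
            aggregates[key] = PriceAggregate()
        aggregates[key].add(doc[value_field])
    return list(heads.values()), aggregates


docs = [
    {"id": 1, "brand": "a", "price": 10.0},
    {"id": 2, "brand": "a", "price": 30.0},
    {"id": 3, "brand": "b", "price": 5.0},
    {"id": 4, "brand": "a", "price": 20.0},
]
heads, aggs = aggregate_groups(docs, "brand", "price")
```

The discarded documents are never stored here, which is the point: the response only needs one small record per group.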

bq. Should I be able to specify a completely different sort within a group? collapse.sort=...
seems nice... what are the implications? One bit of strangeness: it would seem to allow a
highly ranked document responsible for the group being at the top of the list being dropped
from the group due to a different sort criteria within the group. It's not necessarily an
implementation problem though (sort values for the group should be maintained separately).

I'm not sure about that. It would make things more complicated. Sorting the discarded documents
in combination with the collapse.includeCollapsedDocs.fl functionality would maybe make more
sense. 

bq. The most basic question about the interface would be how to present groups. Do we stick
with a linear document list and supplement that with extra info in a different part of the
response (as the current approach takes)? Or stick that extra info in with some of the documents
somehow? Or if collapse=true, replace the list of documents with a list of groups, each which
can contain many documents? Which will be easiest for clients to deal with? If you were starting
from scratch and didn't have to deal with any of Solr's current shortcomings, what would it
look like?

I think the latter would make more sense, because field-collapsing does change the search
result. It would just make it more obvious.

bq. Is there a way to specify the number of groups that I want back instead of the number
of documents?
No, there is not, but if the list of documents is replaced with a list of groups then the rows
parameter should be used to indicate the number of groups to be displayed instead of the number
of documents to be displayed.
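
A small sketch of what "rows counts groups" could look like if the document list were replaced by a list of groups. This is only an illustration of the idea, not the response format of any patch. The groupValue and docs keys and the function name are hypothetical.

```python
def group_results(ranked_docs, collapse_field, start=0, rows=10):
    """Return a page of groups, where start/rows count groups, not documents.

    Assumption: ranked_docs is sorted by relevance, so a group is ordered
    by the position of its most relevant document.
    """
    groups = {}  # dicts keep insertion order, so groups stay in rank order
    for doc in ranked_docs:
        groups.setdefault(doc[collapse_field], []).append(doc)
    page = list(groups.items())[start:start + rows]
    return [{"groupValue": value, "docs": members} for value, members in page]


docs = [
    {"id": 1, "site": "a"},
    {"id": 2, "site": "b"},
    {"id": 3, "site": "a"},
    {"id": 4, "site": "c"},
    {"id": 5, "site": "b"},
]
# rows=2 returns two groups, which together hold four documents.
page = group_results(docs, "site", rows=2)
```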

Just one thought I had about the algorithm you propose. If you only create collapse groups
for the top ten documents, then what about the total count of the search? Unique documents
outside the top ten documents are not being grouped (if I understand you correctly), and that
would impact the total count compared with how it currently works.
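
To illustrate the concern with a toy example: the sketch below compares the total when every document is collapsed against the total when only the first n groups are built. It reflects my reading of the proposal, not its actual implementation. In particular it assumes that documents whose field value is outside the first n groups are each counted individually. All names are made up.

```python
def total_full_collapse(ranked_docs, field):
    """Total count when every document is collapsed: one per distinct value."""
    return len({doc[field] for doc in ranked_docs})


def total_top_n_only(ranked_docs, field, n):
    """Total count when groups are only created for the first n distinct values.

    Assumption: documents whose value is not among those n groups are left
    uncollapsed and therefore each count as one result.
    """
    top_values = []
    for doc in ranked_docs:
        if doc[field] not in top_values:
            top_values.append(doc[field])
            if len(top_values) == n:
                break
    ungrouped = sum(1 for doc in ranked_docs if doc[field] not in top_values)
    return len(top_values) + ungrouped


docs = [{"site": s} for s in ["a", "b", "a", "c", "c", "b", "c"]]
full = total_full_collapse(docs, "site")      # three distinct sites
partial = total_top_n_only(docs, "site", 2)   # groups a and b, plus three "c" docs
```

With this data the full collapse reports 3 results while the top-2-only variant reports 5, so the total would differ from what is reported today.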

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch,
collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch,
field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch,
field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff,
field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch,
SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch,
SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch,
SOLR-236_collapsing.patch
>
>
> This patch include a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to
a single entry in the result set. Site collapsing is a special case of this, where all results
for a given web site is collapsed into one or two entries in the result set, typically with
an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation add 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

