lucene-solr-dev mailing list archives

From "Will Johnson" <wjohn...@GETCONNECTED.COM>
Subject RE: [jira] Commented: (SOLR-236) Field collapsing
Date Tue, 05 Jun 2007 13:19:55 GMT
I haven't looked at any of the patches, but I can comment on some other uses
for the feature that are in production today with major vendors.  While
it's used for site collapsing à la Google, it's also heavily used in
ecommerce settings.  Check out BestBuy.com/circuitcity/etc and do a
search for some really generic word like 'cable' and notice all the
groups of items; BB shows 3 per group, CC shows 1 per group.  In each
case it's not clear that the number of docs is really limited at all, i.e.
it's more important to get back all the categories with n docs per
category and the counts per category than it is to get back a fixed
number of results or even categories for that matter.  Also notice that
neither of these sites allows you to page through the categorized
results.
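
As a rough illustration of the "n docs per category plus counts" behavior
described above, here is a minimal Python sketch. The function name, the
"score" key, and the sample documents are hypothetical; this is not taken
from either site or from the SOLR-236 patches.

```python
from collections import defaultdict


def collapse_by_category(docs, field, n):
    """Group docs by `field`; keep the top-n docs per group plus the full count."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[field]].append(doc)

    result = {}
    for category, members in groups.items():
        # Highest-scoring members first, so the slice keeps the best n.
        ranked = sorted(members, key=lambda d: d["score"], reverse=True)
        result[category] = {"count": len(ranked), "docs": ranked[:n]}
    return result


# Hypothetical result set for a generic query such as 'cable'.
docs = [
    {"id": 1, "category": "hdmi", "score": 0.9},
    {"id": 2, "category": "usb", "score": 0.8},
    {"id": 3, "category": "hdmi", "score": 0.7},
    {"id": 4, "category": "hdmi", "score": 0.4},
]
collapsed = collapse_by_category(docs, "category", n=2)
```

Every category comes back regardless of how many documents matched, and the
count reflects the whole group rather than only the docs shown.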

I'd also point out that many vendors require the collapsing field to be
an int instead of a string and then force the front end to do the
mapping.  Just one more thing to consider....

- will

 

-----Original Message-----
From: Yonik Seeley (JIRA) [mailto:jira@apache.org] 
Sent: Tuesday, June 05, 2007 9:01 AM
To: solr-dev@lucene.apache.org
Subject: [jira] Commented: (SOLR-236) Field collapsing


    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501550 ]

Yonik Seeley commented on SOLR-236:
-----------------------------------

I guess adjacent collapsing can make sense when one is sorting by the
field that is being collapsed.

For the normal collapsing though, this patch appears to implement it by
changing the sort order to the collapsing field (normally not desired).
For example, if sorting by relevance and collapsing on a field, one
would normally want the groups sorted by relevance (with the group
relevance defined as the max score of its members).
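
A minimal Python sketch of that ordering, assuming each document is a dict
with a "score" key (the function name and sample data are hypothetical, and
this is not how the patch under discussion works):

```python
from collections import defaultdict


def groups_by_relevance(docs, field):
    """Return (value, members) pairs ordered by each group's max score."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[field]].append(doc)

    # Sort members within each group, best first.
    for members in groups.values():
        members.sort(key=lambda d: d["score"], reverse=True)

    # Group relevance is the max member score, i.e. the first member's score.
    return sorted(groups.items(), key=lambda kv: kv[1][0]["score"], reverse=True)


docs = [
    {"id": 1, "site": "a", "score": 0.5},
    {"id": 2, "site": "b", "score": 0.8},
    {"id": 3, "site": "a", "score": 0.9},
    {"id": 4, "site": "c", "score": 0.3},
]
ranked = groups_by_relevance(docs, "site")
```

Here site "a" ranks first because its best member scores 0.9, even though
its other member scores below site "b".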

As far as how to do paging, it makes sense to rigidly define it in terms
of number of documents, regardless of how many documents are in each
group.  Going back to google, it always displays the first 10 documents,
but a variable number of groups.   That does mean that a group could be
split across pages.  It would actually be much simpler (IMO) to always
return a fixed number of groups rather than a fixed number of documents,
but I don't think this would be less useful to people.  Thoughts?
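
To make the two paging options concrete, here is a small Python sketch. It
assumes the groups are already ordered and have distinct keys; the function
names and the sample data are hypothetical.

```python
from itertools import groupby


def page_by_docs(groups, start, rows):
    """Fixed number of documents per page; a group may be split across pages."""
    flat = [(key, doc) for key, members in groups for doc in members]
    window = flat[start:start + rows]
    # Regroup the slice; consecutive entries with the same key form one group.
    return [
        (key, [doc for _, doc in items])
        for key, items in groupby(window, key=lambda pair: pair[0])
    ]


def page_by_groups(groups, start, rows):
    """Fixed number of groups per page; the document count varies."""
    return groups[start:start + rows]


# Ordered groups as (collapse value, document ids).
groups = [("a", [1, 2, 3]), ("b", [4]), ("c", [5, 6])]

first_doc_page = page_by_docs(groups, 0, 2)
second_doc_page = page_by_docs(groups, 2, 2)
first_group_page = page_by_groups(groups, 0, 2)
```

With two documents per page, group "a" is split between the first and second
pages; with two groups per page, the first page carries four documents.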

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.2
>            Reporter: Emmanuel Keller
>         Attachments: field_collapsing_1.1.0.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a
> given field to a single entry in the result set. Site collapsing is a
> special case of this, where all results for a given web site is
> collapsed into one or two entries in the result set, typically with an
> associated "more documents from this site" link. See also Duplicate
> detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed
> before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version (1.2)
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)
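
One possible reading of the three parameters, sketched in Python. The
semantics here are an assumption drawn only from the description above:
"adjacent" limits each consecutive run of equal values to collapse.max
results, while "normal" limits each value across the whole result list. The
function and its arguments are hypothetical and do not reproduce the patch,
which (per the comment above) re-sorts by the collapsing field.

```python
from collections import Counter


def collapse(docs, field, collapse_type="normal", collapse_max=1):
    """Drop results beyond collapse_max per field value (normal) or per run (adjacent)."""
    if collapse_type not in ("normal", "adjacent"):
        raise ValueError(f"unknown collapse type: {collapse_type!r}")

    kept = []
    seen = Counter()     # per-value totals, used by "normal"
    previous = object()  # sentinel that equals no real field value
    run_length = 0       # length of the current run, used by "adjacent"

    for doc in docs:
        value = doc[field]
        if collapse_type == "adjacent":
            run_length = run_length + 1 if value == previous else 1
            previous = value
            if run_length <= collapse_max:
                kept.append(doc)
        else:
            seen[value] += 1
            if seen[value] <= collapse_max:
                kept.append(doc)
    return kept


# Results in their original sort order, with the site each one belongs to.
docs = [{"id": i, "site": s} for i, s in enumerate("aabaaa", start=1)]

adjacent_ids = [d["id"] for d in collapse(docs, "site", "adjacent", 1)]
normal_ids = [d["id"] for d in collapse(docs, "site", "normal", 1)]
adjacent_two = [d["id"] for d in collapse(docs, "site", "adjacent", 2)]
```

Under this reading, adjacent collapsing lets site "a" reappear after the "b"
result interrupts the run, while normal collapsing keeps only its first
result.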

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

