lucene-solr-dev mailing list archives

From "Shalin Shekhar Mangar (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-1682) Implement CollapseComponent
Date Tue, 12 Jan 2010 12:42:54 GMT

    [ https://issues.apache.org/jira/browse/SOLR-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799180#action_12799180 ]

Shalin Shekhar Mangar commented on SOLR-1682:
---------------------------------------------

bq. Shalin, I tried your patch out and I ran into a few problems with sorting and the collapse counts, which turned out to be bugs.

Thanks Martijn.

{quote}
Though I have a question about the response format. When collapse.threshold is > 1 and
more than one document is collapsed, the collapse.count is named group.size, and the field
group.numFound is added as well. Why did you give it a different name?
{quote}

Actually, I intended to rename "collapse.value" to "group.value" and "collapse.count" to "group.numFound",
but I forgot to do it in both places.
* group.numFound = the total number of documents belonging to this group (i.e. having the same group.value)
* group.size = the number of documents in this result set belonging to the same group, which is equal to min(group.numFound, collapse.threshold)

So when collapse.threshold = 1, group.size = 1 and group.numFound will be equal to the number
of documents in the same group. Suppose collapse.threshold = 5 but group.numFound = 4; then
group.size = 4. The group.size is required so that a client can read all docs belonging to the same group without
having to maintain a set. Let me know if you have suggestions for better names than these.
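A small sketch of the two counts, assuming (hypothetically) that the response lists each group's documents contiguously and exposes each group's group.size. The class and method names below are illustrative only and do not come from the SOLR-1682 patch. It shows group.size = min(group.numFound, collapse.threshold) and how a client can slice groups out of a flat document list by counting, without keeping a set of seen group values.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of group.size vs. group.numFound; not the patch code. */
public class GroupSizeDemo {

    /** group.size = min(group.numFound, collapse.threshold). */
    static int groupSize(int groupNumFound, int collapseThreshold) {
        return Math.min(groupNumFound, collapseThreshold);
    }

    /**
     * Assumes the docs of each group appear contiguously in the result list.
     * Knowing each group's size lets the client split the list by counting alone,
     * with no set of already-seen group values.
     */
    static List<List<String>> splitIntoGroups(List<String> docs, List<Integer> groupSizes) {
        List<List<String>> groups = new ArrayList<>();
        int pos = 0;
        for (int size : groupSizes) {
            groups.add(new ArrayList<>(docs.subList(pos, pos + size)));
            pos += size;
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(groupSize(10, 1)); // threshold 1: prints 1
        System.out.println(groupSize(4, 5));  // only 4 docs in the group: prints 4
        List<String> docs = List.of("a1", "a2", "a3", "a4", "b1");
        List<Integer> sizes = List.of(groupSize(4, 5), groupSize(1, 5));
        System.out.println(splitIntoGroups(docs, sizes)); // prints [[a1, a2, a3, a4], [b1]]
    }
}
```

The example mirrors the two cases in the comment: with collapse.threshold = 1 every group.size is 1, and with collapse.threshold = 5 a group of four documents reports group.size = 4.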

{quote}
When collapse.threshold is larger than one, two collectors are used. I understand that a different
algorithm is used in each situation. But now the search is also done twice. Wouldn't it
be better to have two completely distinct collectors that don't depend on one another?
{quote}

We can have distinct collectors. The CollapsedDocCollector uses some of the data that TopGroupCollector
gathers, which is why it uses TopGroupCollector directly. We could instead keep references to only the individual objects
that are needed. As I said, this is just a proof of concept (PoC) and not the final design.
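One way to picture "keeping references to the individual objects that are needed" is sketched below. This is plain Java with hypothetical class names, not the Lucene Collector API and not the patch's TopGroupCollector or CollapsedDocCollector. The second collector receives only an immutable map of per-group counts instead of a reference to the first collector. Note that this decouples the collectors but still makes two passes; it does not by itself remove the second search.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of two decoupled collectors; not the Lucene/Solr API. */
public class DistinctCollectorsDemo {

    /** First pass: counts documents per group value (group.numFound). */
    static final class GroupCountCollector {
        private final Map<String, Integer> counts = new HashMap<>();

        void collect(String groupValue) {
            counts.merge(groupValue, 1, Integer::sum);
        }

        /** Hands out an immutable copy, so the second pass never sees this collector. */
        Map<String, Integer> counts() {
            return Map.copyOf(counts);
        }
    }

    /** Second pass: keeps up to `threshold` docs per group; depends only on a Map. */
    static final class CollapsedDocCollector {
        private final int threshold;
        private final Map<String, Integer> groupCounts;
        private final Map<String, List<Integer>> kept = new HashMap<>();

        CollapsedDocCollector(int threshold, Map<String, Integer> groupCounts) {
            this.threshold = threshold;
            this.groupCounts = groupCounts;
        }

        void collect(int docId, String groupValue) {
            List<Integer> docs = kept.computeIfAbsent(groupValue, k -> new ArrayList<>());
            if (docs.size() < threshold) {
                docs.add(docId);
            }
        }

        int groupSize(String groupValue) {
            return kept.getOrDefault(groupValue, List.of()).size();
        }

        int groupNumFound(String groupValue) {
            return groupCounts.getOrDefault(groupValue, 0);
        }
    }

    public static void main(String[] args) {
        String[] groupOfDoc = {"a", "b", "a", "a", "c", "a"}; // group value per doc id
        GroupCountCollector first = new GroupCountCollector();
        for (String g : groupOfDoc) {
            first.collect(g);
        }
        CollapsedDocCollector second = new CollapsedDocCollector(2, first.counts());
        for (int doc = 0; doc < groupOfDoc.length; doc++) {
            second.collect(doc, groupOfDoc[doc]);
        }
        System.out.println(second.groupNumFound("a") + " " + second.groupSize("a")); // prints 4 2
    }
}
```

The design choice being illustrated is only the narrower dependency: either collector can be replaced or tested on its own, because the contract between them is a plain map rather than a collector instance.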

I'll give a new patch with the names fixed for both the cases.

> Implement CollapseComponent
> ---------------------------
>
>                 Key: SOLR-1682
>                 URL: https://issues.apache.org/jira/browse/SOLR-1682
>             Project: Solr
>          Issue Type: Sub-task
>          Components: search
>            Reporter: Martijn van Groningen
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: field-collapsing.patch, SOLR-1682.patch, SOLR-236.patch
>
>
> Child issue of SOLR-236. This issue is dedicated to field collapsing in general and all
> its code (CollapseComponent, DocumentCollapsers and CollapseCollectors). The main goal is
> to finalize the request parameters and response format.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

