lucene-solr-dev mailing list archives

From "Karsten Sperling (JIRA)" <j...@apache.org>
Subject [jira] Updated: (SOLR-236) Field collapsing
Date Tue, 06 Nov 2007 22:06:51 GMT

     [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karsten Sperling updated SOLR-236:
----------------------------------

    Attachment: field-collapsing-extended-592129.patch

I've done some work on the field collapsing patch, making a number of additions and changes,
and am posting this patch (against revision 592129) here for discussion.

- Added a collapse.facet = before|after parameter to control whether faceting happens before
or after collapsing.
- Renamed collapse.max to collapse.threshold -- this value sets the number of collapsible
hits after which collapsing actually kicks in (collapse.max is still supported as an alias).
- Added a collapse.maxdocs parameter that limits the number of documents that CollapseFilter
will process to create the filter DocSet. The intention of this is to be able to limit the
time collapsing will take for very large result sets (obviously at the expense of accurate
collapsing in those cases).
- Inverted the logic of the filter DocSet created by CollapseFilter so that it contains the
documents that are to be collapsed instead of the ones that are to be kept. Without this,
collapse.maxdocs doesn't work.
- Added collapse.info.doc and collapse.info.count parameters to provide more control over
what gets returned in the collapse_counts extra results.
- Made a minimal change to SolrIndexSearcher.getDocListC() to support passing both the filter
and filterList parameters. In most cases this was already handled anyway.
- Did some general refactoring and added comments and a test case.
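
To make the parameters above easier to discuss, here is a minimal Python sketch of the
collapsing logic as I read it. It is not the code from the patch (which is Java inside
CollapseFilter); the function name, the (doc id, field value) input format and the exact
threshold semantics (keep up to `threshold` hits per group, collapse the rest) are
assumptions made for illustration. It returns the inverted filter, i.e. the documents to be
collapsed, which is why documents beyond `maxdocs` simply stay in the result set.

```python
from collections import defaultdict

def docs_to_collapse(ranked_docs, threshold=1, collapse_type="normal", maxdocs=None):
    """Return the set of doc ids to hide (the inverted filter).

    ranked_docs: list of (doc_id, field_value) pairs in ranking order.
    All names here are hypothetical; this only illustrates the idea.
    """
    if maxdocs is not None:
        # Documents past the limit are never examined, so they are never collapsed.
        ranked_docs = ranked_docs[:maxdocs]

    collapsed = set()
    seen = defaultdict(int)        # "normal": hits seen so far per field value
    prev_value, run = object(), 0  # "adjacent": length of the current run

    for doc_id, value in ranked_docs:
        if collapse_type == "adjacent":
            run = run + 1 if value == prev_value else 1
            prev_value = value
            count = run
        else:
            seen[value] += 1
            count = seen[value]
        if count > threshold:
            collapsed.add(doc_id)
    return collapsed

# Hypothetical ranked hits: (doc id, value of the collapse field).
hits = [(1, "a.com"), (2, "a.com"), (3, "b.com"), (4, "a.com"), (5, "a.com")]

normal = docs_to_collapse(hits, threshold=1)
adjacent = docs_to_collapse(hits, threshold=1, collapse_type="adjacent")
limited = docs_to_collapse(hits, threshold=1, maxdocs=3)
print(sorted(normal), sorted(adjacent), sorted(limited))
```

With maxdocs=3 only document 2 ends up in the filter; documents 4 and 5 are never looked
at and therefore remain visible, which is the accuracy trade-off mentioned above. Had the
filter listed the documents to keep, those unprocessed documents would have been dropped.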

If somebody with deeper Solr/Lucene knowledge could review these changes it would be much
appreciated.

Karsten


> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>         Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch,
>                      field_collapsing_1.3.patch, SOLR-236-FieldCollapsing.patch,
>                      SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to
> a single entry in the result set. Site collapsing is a special case of this, where all results
> for a given web site is collapsed into one or two entries in the result set, typically with
> an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)
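
As an illustration of the three query parameters listed in the description, the following
Python sketch builds a request URL. Only the collapse.* parameter names come from the
issue; the host, handler path, query string and the field name "site" are hypothetical
placeholders.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, query and field name; adjust to the actual setup.
base_url = "http://localhost:8983/solr/select"
params = {
    "q": "apache",
    "collapse.field": "site",     # field used to group results
    "collapse.type": "adjacent",  # "normal" is the default per the description
    "collapse.max": 1,            # results allowed before collapsing
}
url = base_url + "?" + urlencode(params)
print(url)
```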

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

