lucene-solr-dev mailing list archives

From Trey <solrt...@gmail.com>
Subject Re: [jira] Commented: (SOLR-236) Field collapsing
Date Tue, 09 Feb 2010 02:25:40 GMT
I also think the isTokenized() check/exception should be removed.  It is
probably a common use case to have a single-valued "tokenized" field, e.g.
a case-insensitive string (a text field where the only filter applied is a
LowerCaseFilterFactory).  I think that as long as it's documented that field
collapsing "doesn't work" for fields with multiple tokens then it shouldn't
be an issue.  That certainly seems better to me than preventing a perfectly
valid use case, since you wouldn't get any results anyway.


 if (schemaField.getType().isTokenized()) {
   throw new RuntimeException("Could not collapse, because collapse field is tokenized");
 }

I agree that it would be better to "check" if the field has multiple values
or not.  In the meantime, though, perhaps the "remove the check and log a
warning" approach would suffice?
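
A rough sketch of what "remove the check and log a warning" could look like.
This is not the actual patch code: FieldTypeInfo is a hypothetical stand-in
for the schema's field type, and CollapseFieldCheck / checkCollapseField are
made-up names used only for illustration.

```java
import java.util.Optional;
import java.util.logging.Logger;

public class CollapseFieldCheck {
    private static final Logger LOG =
            Logger.getLogger(CollapseFieldCheck.class.getName());

    /** Hypothetical stand-in for the schema's field type; not Solr's API. */
    public interface FieldTypeInfo {
        boolean isTokenized();
    }

    /**
     * Instead of throwing when the collapse field is tokenized, log a
     * warning and return it so the caller could also add it to the response.
     * Returns an empty Optional when the field is not tokenized.
     */
    public static Optional<String> checkCollapseField(String fieldName,
                                                      FieldTypeInfo type) {
        if (type.isTokenized()) {
            String warning = "Collapse field '" + fieldName
                    + "' is tokenized; collapse results may be incorrect if"
                    + " any document has more than one token in this field.";
            LOG.warning(warning);
            return Optional.of(warning);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // A lowercased single-token field: still allowed, but warned about.
        System.out.println(checkCollapseField("title_lc", () -> true).isPresent());
        // A plain string field: no warning.
        System.out.println(checkCollapseField("id", () -> false).isPresent());
    }
}
```

Returning the message rather than only logging it leaves room for the
"warning in the response" idea mentioned in the quoted comment below.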


-Trey


On Tue, Jan 19, 2010 at 5:46 AM, Martijn van Groningen (JIRA) <
jira@apache.org> wrote:

>
>    [
> https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802186#action_12802186]
>
> Martijn van Groningen commented on SOLR-236:
> --------------------------------------------
>
> If the field is tokenized and has more than one token, your field collapse
> result will become incorrect. What happens, if I remember correctly, is that
> it will only collapse on the field's last token. This of course leads to
> weird collapse groups. Users that only have one token per collapse field
> are out of luck because of this check. Somehow I think we should let
> the user know that it is not possible to collapse on a tokenized field (at
> least with multiple tokens). Maybe by adding a warning in the response. Still,
> I think the exception is clearer, but it also prohibits it, of course.
>
> bq. Or someone could come after me and write a patch that checks for
> multi-tokened fields somehow and throws an exception.
> Checking if a tokenized field contains only one token is really
> inefficient, because you have to check every collapse field of all
> documents. Now the check is done based on the field's definition in the
> schema.
>
> > Field collapsing
> > ----------------
> >
> >                 Key: SOLR-236
> >                 URL: https://issues.apache.org/jira/browse/SOLR-236
> >             Project: Solr
> >          Issue Type: New Feature
> >          Components: search
> >    Affects Versions: 1.3
> >            Reporter: Emmanuel Keller
> >            Assignee: Shalin Shekhar Mangar
> >             Fix For: 1.5
> >
> >         Attachments: collapsing-patch-to-1.3.0-dieter.patch,
> collapsing-patch-to-1.3.0-ivan.patch,
> collapsing-patch-to-1.3.0-ivan_2.patch,
> collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch,
> field-collapse-4-with-solrj.patch, field-collapse-5.patch,
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
> field-collapse-5.patch, field-collapse-5.patch,
> field-collapse-solr-236-2.patch, field-collapse-solr-236.patch,
> field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch,
> field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff,
> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff,
> quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch,
> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch,
> SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch,
> SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch,
> SOLR-236_collapsing.patch
> >
> >
> > This patch includes a new feature called "Field collapsing".
> > "Used in order to collapse a group of results with similar value for a
> given field to a single entry in the result set. Site collapsing is a
> special case of this, where all results for a given web site is collapsed
> into one or two entries in the result set, typically with an associated
> "more documents from this site" link. See also Duplicate detection."
> > http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> > The implementation adds 3 new query parameters (SolrParams):
> > "collapse.field" to choose the field used to group results
> > "collapse.type" normal (default value) or adjacent
> > "collapse.max" to select how many continuous results are allowed before
> collapsing
> > TODO (in progress):
> > - More documentation (on source code)
> > - Test cases
> > Two patches:
> > - "field_collapsing.patch" for current development version
> > - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> > P.S.: Feedback and misspelling correction are welcome ;-)
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
