lucene-solr-dev mailing list archives

From "Peter Karich (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-236) Field collapsing
Date Thu, 04 Mar 2010 09:46:35 GMT

    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841147#action_12841147 ]

Peter Karich commented on SOLR-236:
-----------------------------------

Regarding the OutOfMemory problem: we are now testing the suggested change in production.

I replaced the float array with a TreeMap<Integer, Float>. The change was nearly trivial
(I cannot easily provide a patch because we are using an older patch, although I could post
the 3 changed files).

I used a TreeMap instead of a HashMap because the advance method in the class
NonAdjacentDocumentCollapser.PredefinedScorer needs the tailMap method:

{noformat}
public int advance(int target) throws IOException {
    // tailMap needs a sorted map such as TreeMap:
    // position iter at the first entry whose doc id is >= target
    iter = scores.tailMap(target).entrySet().iterator();
    if (iter.hasNext())
        return target;
    else
        return NO_MORE_DOCS;
}
{noformat}
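The snippet above returns target even when target itself has no entry in the map. If the scorer is meant to follow the contract of Lucene's DocIdSetIterator.advance (return the first doc id greater than or equal to target), a variant could look like the minimal sketch below. SparseScoreIterator, put and score are hypothetical names used for illustration only; they are not taken from the patch, and the sketch assumes Java 7 or later.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for a scorer backed by a sparse docId -> score map. */
public class SparseScoreIterator {
    // Same sentinel value as Lucene's DocIdSetIterator.NO_MORE_DOCS.
    public static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Only documents that actually have a score occupy memory.
    private final TreeMap<Integer, Float> scores = new TreeMap<>();
    private int doc = -1;

    public void put(int docId, float score) {
        scores.put(docId, score);
    }

    /** Moves to the first stored doc id >= target, or NO_MORE_DOCS if none exists. */
    public int advance(int target) {
        Map.Entry<Integer, Float> entry = scores.ceilingEntry(target);
        doc = (entry == null) ? NO_MORE_DOCS : entry.getKey();
        return doc;
    }

    /** Score of the current doc; 0 if the iterator is not positioned on a stored doc. */
    public float score() {
        Float s = scores.get(doc);
        return s == null ? 0f : s;
    }
}
```

ceilingEntry does the same lookup as tailMap(target) followed by reading the first entry, without creating an intermediate view. Whether PredefinedScorer needs this stricter behaviour depends on how its callers use the return value, which is not shown here.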

Then, I think, I discovered a bug or inconsistent behaviour: if I run the test
FieldCollapsingIntegrationTest.testNonAdjacentCollapse_withFacetingBefore, the scores array
is created via new float[maxDocs] in the old version, but it is never filled with any
values. With the TreeMap, Float value1 = values.get(doc1); therefore returns null in the
method NonAdjacentDocumentCollapser.FloatValueFieldComparator.compare (the size of the
TreeMap is 0!). I work around this via

{noformat}
if (value1 == null)
    value1 = 0f;
if (value2 == null)
    value2 = 0f;
{noformat}

although should the compare method be called at all if no docs are in the scores array ... ?
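The workaround can also be kept in one place. The minimal sketch below assumes that a missing entry should count as a score of 0, which is what the untouched float array returned in the old version. NullSafeScoreComparator and scoreOf are hypothetical names, and the sort direction of the real FloatValueFieldComparator is not known from this comment.

```java
import java.util.Comparator;
import java.util.Map;

/** Hypothetical comparator over doc ids whose scores live in a sparse map. */
public class NullSafeScoreComparator implements Comparator<Integer> {
    private final Map<Integer, Float> values;

    public NullSafeScoreComparator(Map<Integer, Float> values) {
        this.values = values;
    }

    // A missing entry is treated as 0f, matching the default of a float[] slot.
    private float scoreOf(int doc) {
        Float v = values.get(doc);
        return v == null ? 0f : v;
    }

    @Override
    public int compare(Integer doc1, Integer doc2) {
        return Float.compare(scoreOf(doc1), scoreOf(doc2));
    }
}
```

With an empty map every comparison returns 0, so the incoming order is preserved by a stable sort.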

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.5
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch,
collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch,
field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch,
field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff,
field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch,
SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch,
SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch,
SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to
a single entry in the result set. Site collapsing is a special case of this, where all results
for a given web site is collapsed into one or two entries in the result set, typically with
an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

