lucene-dev mailing list archives

From David Tuška (JIRA) <j...@apache.org>
Subject [jira] Commented: (SOLR-236) Field collapsing
Date Wed, 04 Aug 2010 15:41:16 GMT

    [ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895297#action_12895297
] 

David Tuška commented on SOLR-236:
----------------------------------

Hello, I found some bugs in "Field collapsing".
I tested with the solr-1.4.1 patch and will try to test with the trunk patch (rev. 955615) too.

1) No collapse_counts/results are returned if collapseCount==1,
although the documents themselves are still returned.

http://localhost:8080/solr_tour/select/?q=nl_counter%3A1%0D%0A&start=0&rows=10&indent=on&sort=c_price_from_orig+asc&collapse.field=nl_tour_id&collapse.threshold=1&collapse.type=adjacent&collapse.debug=true

{code:xml} 
<lst name="collapse_counts">
  <str name="field">nl_tour_id</str>
  <lst name="results"/>
  <lst name="debug">
    <str name="Docset type">HashDocSet(26)</str>
    <long name="Total collapsing time(ms)">0</long>
    <long name="Create uncollapsed docset(ms)">0</long>
    <long name="Get fieldvalues from fieldcache (ms)">0</long>
    <long name="AdjacentDocumentCollapser collapsing time(ms)">0</long>
    <long name="Creating collapseinfo time(ms)">0</long>
    <long name="Convert to bitset time(ms)">0</long>
    <long name="Create collapsed docset time(ms)">0</long>
  </lst>
</lst>
<result name="response" numFound="26" start="0">
10x <doc></doc> 
...
{code}
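
For reference, the request parameters above can be decoded with a small standalone helper. This is only a sketch: the class name QueryParamsSketch and the method parse are made up for illustration and are not part of Solr. It shows that q carries a trailing CR/LF (%0D%0A) and that collapse.type=adjacent is in effect, which matches the AdjacentDocumentCollapser entry in the debug output.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of Solr): decodes a URL query string. */
public class QueryParamsSketch {

    /** Splits "a=b&c=d" into a map, URL-decoding keys and values. */
    public static Map<String, String> parse(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    public static void main(String[] args) {
        // Query string copied from the request in this comment.
        String query = "q=nl_counter%3A1%0D%0A&start=0&rows=10&indent=on"
                + "&sort=c_price_from_orig+asc&collapse.field=nl_tour_id"
                + "&collapse.threshold=1&collapse.type=adjacent&collapse.debug=true";
        for (Map.Entry<String, String> e : parse(query).entrySet()) {
            // Make control characters visible when printing.
            String shown = e.getValue().replace("\r", "\\r").replace("\n", "\\n");
            System.out.println(e.getKey() + " = [" + shown + "]");
        }
    }
}
```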

Looking into the code, I found some problematic parts:

In NonAdjacentDocumentCollapser.java, the doCollapsing function has a wrong condition and a problem with the priorityQueue:

{code:title=NonAdjacentDocumentCollapser.java}
protected void doCollapsing(DocSet uncollapsedDocset, FieldCache.StringIndex values) {

  for (DocIterator i = uncollapsedDocset.iterator(); i.hasNext();) {
    int currentId = i.nextDoc();
    String currentValue = values.lookup[values.order[currentId]];

    NonAdjacentCollapseGroup collapseDoc = collapsedDocs.get(currentValue);

    if (collapseDoc == null) {
      ..
    }

    Integer dropOutId = (Integer) collapseDoc.priorityQueue.insertWithOverflow(currentId);

    // IMHO this must be >=, not >
    if (++collapseDoc.totalCount > collapseThreshold) {
      collapseDoc.collapsedDocuments++;

      // This is a problem too: with collapse.threshold=1, if the group has only one doc,
      // collapseDoc.priorityQueue.insertWithOverflow does not return it
      if (dropOutId != null)
      {
        for (CollapseCollector collector : collectors) {
          collector.documentCollapsed(dropOutId, collapseDoc, collapseContext);
        }
      }
    }
  }
}
{code} 
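
The two remarks in the snippet above can be tried out in isolation with a minimal standalone model. This is a sketch under assumptions, not the patch code: ThresholdSketch, Group and the local insertWithOverflow are hypothetical stand-ins (the real code uses a Lucene priority queue), and the counter "callbacks" only stands in for the calls to collector.documentCollapsed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

/** Simplified stand-in for the non-adjacent collapser; NOT the patch code. */
public class ThresholdSketch {

    /** Hypothetical per-group state, loosely mirroring NonAdjacentCollapseGroup. */
    public static final class Group {
        public final PriorityQueue<Integer> queue = new PriorityQueue<>();
        public int totalCount;
        public int collapsedDocuments;
        public int callbacks; // stands in for collector.documentCollapsed(...) calls
    }

    /** Bounded insert: returns the evicted (smallest) element, or null if nothing overflowed. */
    public static Integer insertWithOverflow(PriorityQueue<Integer> queue, int maxSize, int docId) {
        queue.add(docId);
        return queue.size() > maxSize ? queue.poll() : null;
    }

    /** Runs the loop from the snippet; 'inclusive' switches the condition from > to >=. */
    public static Map<String, Group> collapse(String[] values, int threshold, boolean inclusive) {
        Map<String, Group> groups = new HashMap<>();
        for (int docId = 0; docId < values.length; docId++) {
            Group g = groups.computeIfAbsent(values[docId], k -> new Group());
            Integer dropOutId = insertWithOverflow(g.queue, threshold, docId);
            g.totalCount++;
            boolean reached = inclusive ? g.totalCount >= threshold : g.totalCount > threshold;
            if (reached) {
                g.collapsedDocuments++;
                if (dropOutId != null) {
                    g.callbacks++;
                }
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] values = {"a", "b", "b"}; // "a" is a group with a single doc
        for (boolean inclusive : new boolean[] {false, true}) {
            Group a = collapse(values, 1, inclusive).get("a");
            System.out.println((inclusive ? ">=" : "> ") + " group a: collapsedDocuments="
                    + a.collapsedDocuments + " callbacks=" + a.callbacks);
        }
    }
}
```

In this model the single-doc group "a" never triggers a callback with threshold 1, whichever comparison is used, because nothing has overflowed from the queue yet. Changing > to >= only changes collapsedDocuments, which suggests that both points would have to be addressed together.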

In AdjacentDocumentCollapser.java, doCollapsing has a problem in the initializing condition:
if there is only one doc, only the initializing branch is executed; the else-if and else branches
are never processed, so collector.documentCollapsed and collector.documentHead are never called.


{code:title=AdjacentDocumentCollapser.java}
protected void doCollapsing(DocSet uncollapsedDocset, FieldCache.StringIndex values) {
  ...
  String collapseValue = null;
  ...
  for (DocIterator i = uncollapsedDocset.iterator(); i.hasNext();) {
    int currentId = i.nextDoc();
    String currentValue = values.lookup[values.order[currentId]];

    // Initializing
    if (collapseValue == null) {
      repeatCount = 0;
      collapseCount = 0;
      collapseId = currentId;
      collapseValue = currentValue;

      // Collapse the document if the field value is the same and
      // we have a run of at least collapseThreshold uncollapsedDocset.
    }
    // IMHO this must be a plain if, not an else-if
    else if (collapseValue.equals(currentValue))
    {
      if (++repeatCount >= collapseThreshold) {
        collapseCount++;
        for (CollapseCollector collector : collectors) {
          CollapseGroup valueToCollapse = new AdjacentCollapseGroup(collapseId, currentValue);
          collector.documentCollapsed(currentId, valueToCollapse, collapseContext);
        }
      } else {
        addDoc(currentId);
      }
    }
    else
    {
      ...
    }
    ...
  }
  ...
}
{code} 
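
The branch structure above can be traced with a small standalone model. Again this is only a sketch: AdjacentBranchSketch and trace are hypothetical names, the collector calls are replaced by branch labels, and the reset in the final else branch is an assumption, since that part is elided in the snippet.

```java
import java.util.ArrayList;
import java.util.List;

/** Traces which branch of the if / else-if / else chain each doc takes; NOT the patch code. */
public class AdjacentBranchSketch {

    public static List<String> trace(String[] values) {
        List<String> branches = new ArrayList<>();
        String collapseValue = null;
        for (String currentValue : values) {
            if (collapseValue == null) {
                // Initializing branch: no collector is called here.
                collapseValue = currentValue;
                branches.add("init");
            } else if (collapseValue.equals(currentValue)) {
                // Branch that contains the documentCollapsed / addDoc logic.
                branches.add("same-value");
            } else {
                // Assumption: the elided branch starts a new run with the current value.
                collapseValue = currentValue;
                branches.add("new-value");
            }
        }
        return branches;
    }

    public static void main(String[] args) {
        System.out.println(trace(new String[] {"x"}));           // [init]
        System.out.println(trace(new String[] {"x", "x", "y"})); // [init, same-value, new-value]
    }
}
```

With a single doc the trace contains only "init", so none of the branches that talk to the collectors is reached.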

2) I have a problem with sorting. I need to sort the CollapseGroups by the c_price_from_orig field.
With "sort=c_price_from_orig+asc" in the request, the returned CollapseGroups are sorted
by c_price_from_orig (the minimum over the collapsed docs in each group),
but some CollapseGroups are skipped, and the doc with the minimum c_price_from_orig
is not returned first!

I will try to debug this problem and report it in more detail.
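
To make the expected ordering concrete, here is a minimal sketch of sorting groups by their minimum price. Everything in it is hypothetical: GroupSortSketch, Doc and the sample values are made up for illustration, and it only models the result expected from "sort=c_price_from_orig+asc", not how the patch computes it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Models the expected group order: ascending by the minimum price in each group. */
public class GroupSortSketch {

    /** Hypothetical document with only the two fields relevant here. */
    public static final class Doc {
        final String tourId; // stands in for nl_tour_id
        final double price;  // stands in for c_price_from_orig

        public Doc(String tourId, double price) {
            this.tourId = tourId;
            this.price = price;
        }
    }

    public static List<String> groupsByMinPrice(List<Doc> docs) {
        Map<String, Double> minPrice = new HashMap<>();
        for (Doc d : docs) {
            minPrice.merge(d.tourId, d.price, Math::min); // keep the cheapest doc per group
        }
        List<String> ids = new ArrayList<>(minPrice.keySet());
        ids.sort(Comparator.comparingDouble(minPrice::get));
        return ids;
    }

    public static void main(String[] args) {
        // Made-up sample data.
        List<Doc> docs = Arrays.asList(
                new Doc("t1", 300.0), new Doc("t1", 120.0),
                new Doc("t2", 90.0), new Doc("t3", 200.0));
        System.out.println(groupsByMinPrice(docs)); // [t2, t1, t3]
    }
}
```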


Thanks for your reply, and sorry for my English.

Best regards,
David

> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: Next
>
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch,
>                      collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, DocSetScoreCollector.java,
>                      field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch,
>                      field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
>                      field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
>                      field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
>                      field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch,
>                      field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff,
>                      field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, NonAdjacentDocumentCollapser.java,
>                      NonAdjacentDocumentCollapserTest.java, quasidistributed.additional.patch, SOLR-236-1_4_1.patch,
>                      SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch,
>                      SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch,
>                      SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch,
>                      SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>
> This patch include a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given field to
> a single entry in the result set. Site collapsing is a special case of this, where all results
> for a given web site is collapsed into one or two entries in the result set, typically with
> an associated "more documents from this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation add 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

