lucene-dev mailing list archives

From "Yonik Seeley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-1854) wrong calc of numFound in DistributedSearch
Date Wed, 22 Aug 2012 13:31:42 GMT

    [ https://issues.apache.org/jira/browse/SOLR-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439482#comment-13439482 ]

Yonik Seeley commented on SOLR-1854:
------------------------------------

Distributed search removes *some* duplicates (only those in the top N it retrieves from each shard).
Duplicate docs are actually an error condition, and Solr is just trying to degrade gracefully.
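
The numbers in the report quoted below (190 with the default rows=10, 100 with &rows=200) can be reproduced with a small standalone simulation. This is only a sketch of the counting behaviour, not Solr's actual code: the class NumFoundSketch, the ShardResponse type, and the queryShard helper are hypothetical names made up for illustration, and it assumes both shards hold the same 100 documents, as in the reporter's example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NumFoundSketch {

    /** Hypothetical stand-in for one shard's response: total hits plus the top-N ids it returned. */
    static final class ShardResponse {
        final String shard;
        final long numFound;
        final List<String> topIds;

        ShardResponse(String shard, long numFound, List<String> topIds) {
            this.shard = shard;
            this.numFound = numFound;
            this.topIds = topIds;
        }
    }

    /** Simulates a shard holding ids doc0..doc(hits-1) that returns only the first `rows` of them. */
    static ShardResponse queryShard(String shard, int hits, int rows) {
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(hits, rows); i++) {
            top.add("doc" + i);
        }
        return new ShardResponse(shard, hits, top);
    }

    /** Sums per-shard totals and subtracts one for every duplicate id seen among the returned docs. */
    static long mergedNumFound(List<ShardResponse> responses) {
        Map<String, String> uniqueDoc = new HashMap<>();
        long numFound = 0;
        for (ShardResponse r : responses) {
            numFound += r.numFound;
            // Only the ids that were actually returned can be compared,
            // so duplicates beyond the top N are never detected.
            for (String id : r.topIds) {
                String prevShard = uniqueDoc.put(id, r.shard);
                if (prevShard != null) {
                    numFound--; // duplicate detected
                }
            }
        }
        return numFound;
    }

    public static void main(String[] args) {
        for (int rows : new int[] {10, 200}) {
            List<ShardResponse> responses = Arrays.asList(
                queryShard("shard1", 100, rows),
                queryShard("shard2", 100, rows));
            System.out.println("rows=" + rows + " numFound=" + mergedNumFound(responses));
        }
    }
}
```

With rows=10 only 10 of the 100 duplicates are visible to the merge step, so the sum 200 drops to 190; with rows=200 all 100 are visible and the result is 100.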
                
> wrong calc of numFound in DistributedSearch
> -------------------------------------------
>
>                 Key: SOLR-1854
>                 URL: https://issues.apache.org/jira/browse/SOLR-1854
>             Project: Solr
>          Issue Type: Bug
>          Components: SearchComponents - other
>    Affects Versions: 1.4
>            Reporter: Lutz Pumpenmeier
>
> When I search two indices with the shards parameter in a distributed search, the numFound
> value in the result is incorrect when the number of rows found in the second index is larger
> than the &rows parameter in the query string and there are many identical hits in both
> indices.
> Simple example: use the same index for both shards. Try a distributed search with a query
> that will find, let's say, 100 hits in each index. numFound will be 190 if the default for
> rows is 10. It should be 100. If you add &rows=200 to the query string, numFound is correct.
> I think the error is in QueryComponent.mergeIds:
>         for (int i=0; i<docs.size(); i++) {
>           SolrDocument doc = docs.get(i);
>           Object id = doc.getFieldValue(uniqueKeyField.getName());
>           String prevShard = uniqueDoc.put(id, srsp.getShard());
>           if (prevShard != null) {
>             // duplicate detected
>             numFound--;
> because the comparison for identical ids is only done for the docs.size() documents
> returned by each shard.
> thanks
>  lutz
