lucene-dev mailing list archives

From "Russell Black (Issue Comment Edited) (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Edited] (SOLR-3109) group=true requests result in numerous redundant shard requests
Date Wed, 08 Feb 2012 22:40:59 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204065#comment-13204065 ]

Russell Black edited comment on SOLR-3109 at 2/8/12 10:39 PM:
--------------------------------------------------------------

Martijn, I also noticed that {{TopGroupsShardResponseProcessor}} can't deal with multiple
ShardRequests (although it looks like it wouldn't be too hard to add this ability).  At any
rate, your approach of returning a single ShardRequest containing all relevant shards sounds
like the right one.  I went one step further and refactored {{TopGroupsShardRequestFactory.java}}
because there was significant code duplication in the class's two primary methods.  

In my testing I also discovered a closely related problem.  The bug is in the data structure
used to map search groups to the shards which contain them.  {{ResponseBuilder.searchGroupToShard}}
assumes that a given search group only resides on one shard.  I could not find this assumption
documented anywhere, nor can I find a reason such a restriction need be imposed.  This structure
is populated by {{SearchGroupShardResponseProcessor}}.  There is a race condition there, wherein
the last shard to report a search group will be assumed to be the only shard containing the
search group.  This data structure is used in {{TopGroupsShardRequestFactory.createRequestForSpecificShards()}}
to know which shards to query.  This means you can get a different set of shards to query
depending on shard query order.  

I have changed the structure to allow a search group to be present in multiple shards.  
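
For illustration, here is a minimal Java sketch of the difference between a single-valued and a multi-valued mapping. It is not the attached patch and does not use Solr's classes: {{GroupToShardSketch}}, {{singleValued}}, {{multiValued}}, and the string-pair report format are all hypothetical names chosen for this example. It assumes each shard reports the groups it holds, in arrival order.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: the names below are hypothetical, not Solr's classes.
class GroupToShardSketch {

    // Each report is a {groupValue, shardAddress} pair, in arrival order.
    static Map<String, String> singleValued(List<String[]> reports) {
        Map<String, String> groupToShard = new HashMap<>();
        for (String[] report : reports) {
            // put() overwrites, so the last shard to report a group wins.
            groupToShard.put(report[0], report[1]);
        }
        return groupToShard;
    }

    static Map<String, Set<String>> multiValued(List<String[]> reports) {
        Map<String, Set<String>> groupToShards = new HashMap<>();
        for (String[] report : reports) {
            // Accumulate every shard that reported the group.
            groupToShards.computeIfAbsent(report[0], k -> new TreeSet<>()).add(report[1]);
        }
        return groupToShards;
    }

    public static void main(String[] args) {
        List<String[]> order1 = Arrays.asList(
                new String[] {"g1", "shard1"}, new String[] {"g1", "shard2"});
        List<String[]> order2 = Arrays.asList(
                new String[] {"g1", "shard2"}, new String[] {"g1", "shard1"});
        System.out.println(singleValued(order1) + " vs " + singleValued(order2));
        System.out.println(multiValued(order1) + " vs " + multiValued(order2));
    }
}
```

With the single-valued map, the shard recorded for {{g1}} depends on which report arrives last; with the multi-valued map, both arrival orders yield the same set of shards.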

Patch to follow.  
                
                  
> group=true requests result in numerous redundant shard requests
> ---------------------------------------------------------------
>
>                 Key: SOLR-3109
>                 URL: https://issues.apache.org/jira/browse/SOLR-3109
>             Project: Solr
>          Issue Type: Bug
>          Components: search
>    Affects Versions: 3.5, 4.0
>         Environment: 64-bit Linux, sharded environment
>            Reporter: Russell Black
>            Assignee: Martijn van Groningen
>            Priority: Critical
>              Labels: patch, performance
>         Attachments: SOLR-3109.patch, SOLR-3109.patch, SOLR-3109.patch
>
>
> During the second phase of a group query, the collator sends a query to each of the shards.
> The purpose of this query is for shards to respond with the doc ids that match the set of
> group ids returned from the first phase.  The problem is that it sends this second query to
> each shard multiple times.  Specifically, in an environment with n shards, each shard will
> be hit with an identical query n times during the second phase of query processing, resulting
> in O(_n_ ^2^) performance where _n_ is the number of shards.
> I have traced this bug down to a single line in {{TopGroupsShardRequestFactory.java}},
> and I am attaching a patch.
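
The request counts in the description above can be checked with a small counting model. This is only a sketch under stated assumptions: it models "n identical requests, each reaching all n shards" versus "one request reaching all n shards", and does not reproduce the actual code in {{TopGroupsShardRequestFactory.java}}. The class and method names ({{SecondPhaseFanOutSketch}}, {{hitsPerShard}}, {{total}}) are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative counting model only; it does not reproduce Solr's request code.
class SecondPhaseFanOutSketch {

    // Tallies how many times each shard is queried when `requests` identical
    // requests are each sent to every shard.
    static Map<String, Integer> hitsPerShard(List<String> shards, int requests) {
        Map<String, Integer> hits = new TreeMap<>();
        for (int i = 0; i < requests; i++) {
            for (String shard : shards) {
                hits.merge(shard, 1, Integer::sum);
            }
        }
        return hits;
    }

    // Sums the per-shard tallies into a total number of shard queries.
    static int total(Map<String, Integer> hits) {
        int sum = 0;
        for (int count : hits.values()) {
            sum += count;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<String> shards = Arrays.asList("shard1", "shard2", "shard3", "shard4");
        // Reported behavior: n requests, each reaching all n shards.
        System.out.println(total(hitsPerShard(shards, shards.size())));
        // A single request covering all shards.
        System.out.println(total(hitsPerShard(shards, 1)));
    }
}
```

In this model the total number of shard queries grows as n * n in the reported case and as n when a single request covers all shards.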


