lucene-dev mailing list archives

From "Andy Laird (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-2592) Pluggable shard lookup mechanism for SolrCloud
Date Thu, 14 Jun 2012 06:44:42 GMT

    [ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294837#comment-13294837 ]

Andy Laird commented on SOLR-2592:
----------------------------------

I have tried out Michael's patch and would like to share some feedback with the community.
We are running a very recent build from the 4x branch, but I grabbed this patch from trunk
and tried it out anyway...

Our needs were driven by the fact that, currently, the counts returned when using field collapse
are only accurate when the documents getting collapsed together are all on the same shard
(see comments for https://issues.apache.org/jira/browse/SOLR-2066).  For our case we collapse
on a field, xyz, so we need to ensure that all documents with the same value for xyz are on
the same shard (overall distribution is not a problem here) if we want counting to work.
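To illustrate why co-location matters, here is a small sketch in Java. It assumes a simplified
model in which the coordinating node just adds up the number of distinct groups each shard
reports. That model, the class name GroupCountSketch, and the sample values are illustrative
assumptions, not Solr code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; this is not how Solr is implemented.
final class GroupCountSketch {

    // Assumed model: each shard reports its own distinct-group count
    // and the coordinator sums those numbers.
    static int summedPerShardGroupCount(List<List<String>> shards) {
        int total = 0;
        for (List<String> shard : shards) {
            total += new HashSet<>(shard).size();
        }
        return total;
    }

    // Ground truth: distinct group values across all shards.
    static int trueGroupCount(List<List<String>> shards) {
        Set<String> all = new HashSet<>();
        for (List<String> shard : shards) {
            all.addAll(shard);
        }
        return all.size();
    }

    public static void main(String[] args) {
        // Each inner list holds the xyz values of the documents on one shard.
        List<List<String>> scattered = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("b", "c"));      // "b" lives on both shards
        List<List<String>> coLocated = Arrays.asList(
                Arrays.asList("a", "b", "b"),
                Arrays.asList("c"));           // every xyz value on one shard

        System.out.println("scattered:  summed=" + summedPerShardGroupCount(scattered)
                + " true=" + trueGroupCount(scattered));
        System.out.println("co-located: summed=" + summedPerShardGroupCount(coLocated)
                + " true=" + trueGroupCount(coLocated));
    }
}
```

Under this model the two numbers agree only when no xyz value appears on more than one shard,
which is the property we need the sharding to guarantee.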

I grabbed the latest patch (dbq_fix.patch) in hopes of finding a solution to our problem.
The great news is that Michael's patch worked like a charm for what we needed -- thank you
kindly, Michael, for this effort!  The not-so-good news is that, for our particular issue, we
needed a way to get at data other than the uniqueKey (the only data available with ShardKeyParser)
-- in our case, the xyz field data.  Since this implementation provides nothing but the
uniqueKey, we had to encode the xyz data in our uniqueKey (e.g. newUniqueKey = what-used-to-be-our-uniqueKey
+ xyz), which is certainly less than ideal and adds unsavory coupling.
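A minimal sketch of that workaround in Java follows. Everything in it is an assumption made
for illustration: the class CompositeKeyRouting, the "!" separator, and the CRC32-based shard
choice are hypothetical and do not reproduce ShardKeyParser or any other interface from the
patch. It only shows the idea of appending xyz to the old uniqueKey and routing on the xyz
part alone.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper; not part of Solr or of the SOLR-2592 patch.
final class CompositeKeyRouting {

    // Assumed separator; an xyz value must never contain it.
    static final char SEPARATOR = '!';

    // newUniqueKey = what-used-to-be-our-uniqueKey + separator + xyz
    static String compositeKey(String originalId, String xyz) {
        if (xyz.indexOf(SEPARATOR) >= 0) {
            throw new IllegalArgumentException("xyz must not contain " + SEPARATOR);
        }
        return originalId + SEPARATOR + xyz;
    }

    // Recovers the routing portion (the xyz value) from a composite key.
    static String shardKey(String compositeKey) {
        int idx = compositeKey.lastIndexOf(SEPARATOR);
        if (idx < 0) {
            throw new IllegalArgumentException("not a composite key: " + compositeKey);
        }
        return compositeKey.substring(idx + 1);
    }

    // Picks a shard from the xyz part only, so equal xyz values share a shard.
    static int shardFor(String compositeKey, int numShards) {
        CRC32 crc = new CRC32();
        crc.update(shardKey(compositeKey).getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % numShards); // getValue() is non-negative
    }

    public static void main(String[] args) {
        String k1 = compositeKey("doc-1", "red");
        String k2 = compositeKey("doc-2", "red");
        System.out.println(k1 + " -> shard " + shardFor(k1, 4));
        System.out.println(k2 + " -> shard " + shardFor(k2, 4));
    }
}
```

Because only the xyz suffix is hashed, the two sample documents land on the same shard even
though their original ids differ. The cost is the coupling noted above: every uniqueKey now
carries the xyz value, so the key format and the routing logic have to change together.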

Nonetheless, as a fix for a last-minute gotcha (our counts with field collapse need to be accurate
in a multi-shard environment), I was happily surprised at how easy it was to find a solution
to our particular problem with this patch.  I would definitely like to see a second iteration
that incorporates the ability to get at other document data.  Then you could do whatever you
want by looking at dates and other fields, etc., though I understand that this probably goes
quite a bit deeper into the codebase, especially with distributed search.

                
> Pluggable shard lookup mechanism for SolrCloud
> ----------------------------------------------
>
>                 Key: SOLR-2592
>                 URL: https://issues.apache.org/jira/browse/SOLR-2592
>             Project: Solr
>          Issue Type: New Feature
>          Components: SolrCloud
>    Affects Versions: 4.0
>            Reporter: Noble Paul
>         Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch
>
>
> If the data in a cloud can be partitioned on some criteria (say range, hash, attribute
> value etc) It will be easy to narrow down the search to a smaller subset of shards and in
> effect can achieve more efficient search.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org