lucene-solr-user mailing list archives

From Amrit Sarkar <sarkaramr...@gmail.com>
Subject Re: Solr Document Routing
Date Thu, 01 Jun 2017 08:42:07 GMT
Sorry, the Confluence link:
https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Thu, Jun 1, 2017 at 2:11 PM, Amrit Sarkar <sarkaramrit2@gmail.com> wrote:

> Sathyam,
>
> It seems your interpretation is wrong: CloudSolrClient calculates which
> shard each incoming document belongs to (it hashes the document id and
> determines the range it falls into). Since you have 10 shards, the document
> belongs to exactly one of them; that is what gets calculated before the
> document is pushed to the leader of that shard.
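[Editor's note: the hash-range routing described above can be sketched as follows. This is an illustrative model only, not Solr's actual code: Solr's CompositeIdRouter uses MurmurHash3, while `zlib.crc32` is used here as a stand-in so the sketch stays stdlib-only.]

```python
# Illustrative sketch of SolrCloud-style hash-range routing: the signed
# 32-bit hash space is split into one contiguous range per shard (as
# recorded per shard in state.json), and a document is routed to the
# shard whose range contains the hash of its id.
# NOTE: zlib.crc32 is a stand-in for Solr's MurmurHash3.
import zlib

NUM_SHARDS = 10

def shard_ranges(num_shards):
    """Split the signed 32-bit space into contiguous per-shard ranges."""
    span = 1 << 32
    step = span // num_shards
    ranges = []
    start = -(1 << 31)
    for i in range(num_shards):
        # Last shard absorbs the rounding remainder.
        end = start + step - 1 if i < num_shards - 1 else (1 << 31) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges

def route(doc_id, ranges):
    """Hash the document id and find the shard whose range contains it."""
    h = zlib.crc32(doc_id.encode("utf-8"))
    h = h - (1 << 32) if h >= (1 << 31) else h  # reinterpret as signed 32-bit
    for shard, (lo, hi) in enumerate(ranges):
        if lo <= h <= hi:
            return shard
    raise AssertionError("hash outside 32-bit space")

ranges = shard_ranges(NUM_SHARDS)
# The same id always hashes to the same shard, which is why the client
# (or any receiving node) can compute the owner deterministically.
print(route("BQECDwZGTCEBHZZBBiIP", ranges))
```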
>
> This blog post covers document routing in much more detail:
> https://lucidworks.com/2013/06/13/solr-cloud-document-routing/
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>
> On Thu, Jun 1, 2017 at 11:52 AM, Sathyam <sathyam.doraswamy@gmail.com>
> wrote:
>
>> Hi,
>>
>> I am indexing documents into a 10-shard collection (testcollection, with
>> no replicas) in a Solr 6 cluster using CloudSolrClient. Looking at the
>> Solr logs, I saw a lot of peer-to-peer document distribution going on.
>>
>> An example log statement is as follows:
>> 2017-06-01 06:07:28.378 INFO  (qtp1358444045-3673692) [c:testcollection
>> s:shard8 r:core_node7 x:testcollection_shard8_replica1]
>> o.a.s.u.p.LogUpdateProcessorFactory [testcollection_shard8_replica1]
>>  webapp=/solr path=/update params={update.distrib=TOLEADER&distrib.from=
>> http://10.199.42.29:8983/solr/testcollection_shard7_replica1
>> /&wt=javabin&version=2}{add=[BQECDwZGTCEBHZZBBiIP
>> (1568981383488995328), BQEBBQZB2il3wGT/0/mB (1568981383490043904),
>> BQEBBQZFnhOJRj+m9RJC (1568981383491092480), BQEGBgZIeBE1klHS4fxk
>> (1568981383492141056), BQEBBQZFVTmRx2VuCgfV (1568981383493189632)]} 0 25
>>
>> When I went through the CloudSolrClient code on grepcode, I saw that the
>> client itself figures out which server to hit by hashing the document id
>> and looking up the shard range information in state.json.
>> So it is quite confusing to me why data is being distributed between
>> peers, given that there is no replication and each shard's only replica
>> is the leader.
>>
>> I would like to know why this is happening and how to avoid it, or
>> whether the log statement above means something else and I am
>> misinterpreting it.
>>
>> --
>> Sathyam Doraswamy
>>
>
>
