lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Solr Document Routing
Date Thu, 01 Jun 2017 16:14:10 GMT
Can you check if those IDs are on shard8? You can do this by pointing
the URL at the core and specifying &distrib=false...
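A minimal sketch of that check (the host and core name are copied from the log further down in this thread; adjust them to your cluster). Querying a core directly with `distrib=false` restricts the search to that single core instead of fanning out across the collection:

```python
# Build a direct-to-core query URL; distrib=false disables distributed search.
# Host, port, and core name below are assumptions taken from the log example.
from urllib.parse import urlencode

core_url = "http://10.199.42.29:8983/solr/testcollection_shard8_replica1"
params = {
    "q": "id:BQECDwZGTCEBHZZBBiIP",  # one of the ids from the log
    "distrib": "false",              # search only this core
    "wt": "json",
}
query_url = f"{core_url}/select?{urlencode(params)}"
print(query_url)
# Fetch it with urllib.request.urlopen(query_url) against a live cluster,
# or paste the URL into curl/a browser.
```

If the document is found with `distrib=false`, it physically lives on that core.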

Best,
Erick

On Thu, Jun 1, 2017 at 1:42 AM, Amrit Sarkar <sarkaramrit2@gmail.com> wrote:
> Sorry, The confluence link:
> https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>
> On Thu, Jun 1, 2017 at 2:11 PM, Amrit Sarkar <sarkaramrit2@gmail.com> wrote:
>
>> Sathyam,
>>
>> It seems your interpretation is wrong: CloudSolrClient hashes the document
>> id and determines which hash range, and therefore which shard, the incoming
>> document belongs to. Since you have 10 shards, the document will fall into
>> exactly one of them; that is what is being calculated, and the document is
>> then pushed to the leader of that shard.
>>
>> This post covers the details:
>> https://lucidworks.com/2013/06/13/solr-cloud-document-routing/
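The routing described above can be sketched as follows. This is a simplified illustration of hash-range routing, not Solr's actual implementation: Solr's compositeId router uses MurmurHash3_x86_32 over a signed 32-bit range, while this stand-in uses CRC32 over an unsigned range purely to show the range lookup.

```python
# Simplified sketch of compositeId-style routing: hash the id into a 32-bit
# space divided into NUM_SHARDS contiguous ranges, and pick the range.
# CRC32 is a stand-in here; Solr actually uses MurmurHash3_x86_32.
import zlib

NUM_SHARDS = 10
HASH_SPACE = 1 << 32  # full unsigned 32-bit hash space

def shard_for(doc_id: str) -> int:
    h = zlib.crc32(doc_id.encode())      # unsigned 32-bit hash of the id
    return h * NUM_SHARDS // HASH_SPACE  # index of the range it falls into

# Same id always maps to the same shard, which is why CloudSolrClient
# can send the document straight to that shard's leader.
print(shard_for("BQECDwZGTCEBHZZBBiIP"))
```

The real hash ranges for each shard are recorded in `state.json`, which is what CloudSolrClient consults before sending the update.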
>>
>> Amrit Sarkar
>> Search Engineer
>> Lucidworks, Inc.
>> 415-589-9269
>> www.lucidworks.com
>> Twitter http://twitter.com/lucidworks
>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>>
>> On Thu, Jun 1, 2017 at 11:52 AM, Sathyam <sathyam.doraswamy@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I am indexing documents to a 10-shard collection (testcollection, with no
>>> replicas) in a Solr 6 cluster using CloudSolrClient. Looking at the Solr
>>> logs, I saw a lot of peer-to-peer document distribution going on.
>>>
>>> An example log statement is as follows:
>>> 2017-06-01 06:07:28.378 INFO  (qtp1358444045-3673692) [c:testcollection
>>> s:shard8 r:core_node7 x:testcollection_shard8_replica1]
>>> o.a.s.u.p.LogUpdateProcessorFactory [testcollection_shard8_replica1]
>>>  webapp=/solr path=/update params={update.distrib=TOLEADER&distrib.from=
>>> http://10.199.42.29:8983/solr/testcollection_shard7_replica1
>>> /&wt=javabin&version=2}{add=[BQECDwZGTCEBHZZBBiIP
>>> (1568981383488995328), BQEBBQZB2il3wGT/0/mB (1568981383490043904),
>>> BQEBBQZFnhOJRj+m9RJC (1568981383491092480), BQEGBgZIeBE1klHS4fxk
>>> (1568981383492141056), BQEBBQZFVTmRx2VuCgfV (1568981383493189632)]} 0 25
>>>
>>> When I went through the CloudSolrClient code on grepcode, I saw that the
>>> client itself finds out which server it needs to hit by hashing the
>>> document id and getting the shard range information from state.json.
>>> So it is quite confusing to me why data is being distributed between
>>> peers, since there is no replication and every core is a leader.
>>>
>>> I would like to know why this is happening and how to avoid it, or whether
>>> the above log statement means something else and I am misinterpreting it.
>>>
>>> --
>>> Sathyam Doraswamy
>>>
>>
>>
