lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Separating Search and Indexing in SolrCloud
Date Sat, 17 Dec 2016 20:24:24 GMT
bq: I am more concerned with indexing memory requirements at volume

By and large this isn't much of a problem. ramBufferSizeMB in
solrconfig.xml governs how much memory Solr consumes for buffering
documents during indexing. When that limit is exceeded, the buffer is
flushed to disk as a new segment. I've rarely heard of indexing being
a memory issue. Anecdotally, I haven't seen any throughput benefit
from buffer sizes over 128MB.
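As a rough illustration of why indexing memory stays bounded, here is a toy Python model of a RAM-limited indexing buffer. The class and its fields are hypothetical, the 100 MB limit simply stands in for whatever ramBufferSizeMB is set to, and real Lucene RAM accounting is more involved than a running byte count:

```python
from dataclasses import dataclass, field
from typing import List

MB = 1024 * 1024


@dataclass
class ToyIndexBuffer:
    """Toy model of a RAM-bounded indexing buffer.

    Hypothetical illustration only: this is not Lucene's IndexWriter, which
    accounts for RAM differently and may also flush for other reasons.
    """
    ram_buffer_size_mb: float
    buffered_bytes: int = 0
    peak_buffered_bytes: int = 0
    flushed_segments: List[int] = field(default_factory=list)

    def add_document(self, size_bytes: int) -> None:
        self.buffered_bytes += size_bytes
        self.peak_buffered_bytes = max(self.peak_buffered_bytes, self.buffered_bytes)
        # Once the buffer exceeds the configured limit, write it out as a new segment.
        if self.buffered_bytes > self.ram_buffer_size_mb * MB:
            self.flush()

    def flush(self) -> None:
        if self.buffered_bytes:
            self.flushed_segments.append(self.buffered_bytes)
            self.buffered_bytes = 0


# Index 1,000 documents of 0.5 MB each (500 MB total) through a 100 MB buffer.
buf = ToyIndexBuffer(ram_buffer_size_mb=100)
for _ in range(1000):
    buf.add_document(512 * 1024)

print(len(buf.flushed_segments), buf.peak_buffered_bytes / MB)  # 4 100.5
```

However much is indexed, the buffer in this model never holds more than the configured limit plus one document; extra volume just produces more flushed segments.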

You're correct in that master/slave style replication would use less
memory on the slave, although there are other costs. Rather than the
data for document X being sent to the replicas once, as in SolrCloud,
that data is re-sent to the slave every time it's merged into a new
segment.
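To put rough numbers on this, a toy Python model can count the bytes one replica or slave receives for the same documents. The function, its simplified merge policy (merge whenever `merge_factor` segments pile up at one level), and the assumption that a segment's size equals its documents' raw size are all hypothetical; real merge policies such as TieredMergePolicy and index compression will change the numbers, but not the shape of the comparison:

```python
from collections import defaultdict
from typing import Tuple


def replication_traffic(num_flushes: int, docs_per_flush: int, doc_bytes: int,
                        merge_factor: int = 10) -> Tuple[int, int]:
    """Return (solrcloud_bytes, master_slave_bytes) sent to one replica/slave.

    Hypothetical model: every flushed segment holds docs_per_flush documents,
    a segment's size equals the raw bytes of its documents, and whenever
    merge_factor segments accumulate at one level they are merged into a single
    segment at the next level (a simplified log-structured merge policy).
    """
    levels = defaultdict(list)  # merge level -> sizes of segments at that level
    master_slave_bytes = 0
    for _ in range(num_flushes):
        segment = docs_per_flush * doc_bytes
        levels[0].append(segment)
        master_slave_bytes += segment  # slave fetches the newly flushed segment
        level = 0
        while len(levels[level]) >= merge_factor:
            merged = sum(levels.pop(level))
            levels[level + 1].append(merged)
            master_slave_bytes += merged  # ...and fetches every merged segment again
            level += 1
    # SolrCloud: the leader forwards each document to the replica exactly once.
    solrcloud_bytes = num_flushes * docs_per_flush * doc_bytes
    return solrcloud_bytes, master_slave_bytes


print(replication_traffic(num_flushes=100, docs_per_flush=10, doc_bytes=1000))
# (1000000, 3000000)
```

In this run each document reaches the slave three times (once when flushed, then in two merges) versus once under SolrCloud; a larger index with more merge levels re-sends it more often. The trade-off is that the SolrCloud replica has to index each document itself, which the slave does not.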

That said, memory issues are _far_ more prevalent on the search side
of things so unless this is a proven issue in your environment I would
fight other fires.....

Best,
Erick

On Fri, Dec 16, 2016 at 1:06 PM, Jaroslaw Rozanski <me@jarekrozanski.com> wrote:
> Thanks, that issue looks interesting!
>
> On 16/12/16 16:38, Pushkar Raste wrote:
>> This kind of separation is not supported yet. There is, however, some work
>> going on; you can read about it at
>> https://issues.apache.org/jira/browse/SOLR-9835
>>
>> This unfortunately would not support soft commits and hence would not be a
>> good solution for near real time indexing.
>>
>> On Dec 16, 2016 7:44 AM, "Jaroslaw Rozanski" <me@jarekrozanski.com> wrote:
>>
>>> Sorry, not what I meant.
>>>
>>> The leader is responsible for distributing update requests to the
>>> replicas, so eventually all replicas have the same state as the leader.
>>> Not a problem.
>>>
>>> It is more about the performance of this. If I gather correctly, normal
>>> replication happens via standard update requests, not by, say, segment copy.
>>>
>>> Which means an update on the leader is as "expensive" as on a replica.
>>>
>>> Hence, if my understanding is correct, sending search requests to replicas
>>> only, in an index-heavy environment, would bring no benefit.
>>>
>>> So the question is: is there a mechanism in SolrCloud (not the legacy
>>> master/slave set-up) to make one node take the load of indexing while
>>> other nodes focus on searching?
>>>
>>> This is not a question about SolrClient, because it is clear how to direct
>>> search requests to specific nodes. This is more about index optimization,
>>> so that certain nodes (i.e. replicas) could suffer less from high-volume
>>> indexing while serving search requests.
>>>
>>>
>>>
>>>
>>> On 16/12/16 12:35, Dorian Hoxha wrote:
>>>> The leader is the source of truth. Do you expect to make the replica the
>>>> source of truth or something? That doesn't make sense.
>>>> What people do, in Solr and in other DBs, is send writes to the
>>>> leader/master and reads to the replicas/slaves.
>>>>
>>>> On Fri, Dec 16, 2016 at 1:31 PM, Jaroslaw Rozanski <me@jarekrozanski.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> According to the documentation, in normal operation (not recovery) in a
>>>>> SolrCloud configuration the leader sends the updates it receives to all
>>>>> the replicas.
>>>>>
>>>>> This means all nodes in the shard perform the same effort to index a
>>>>> single document. Correct?
>>>>>
>>>>> Is there then a benefit to *not* sending search requests to the leader,
>>>>> but only to the replicas?
>>>>>
>>>>> Given an index- and search-heavy SolrCloud system, is it possible to
>>>>> separate search nodes from indexing nodes?
>>>>>
>>>>>
>>>>> RE: Solr 5.5.0
>>>>>
>>>>> --
>>>>> Jaroslaw Rozanski | e: me@jarekrozanski.com
>>>>> 695E 436F A176 4961 7793  5C70 AFDF FB5E 682C 4D3D
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
>
>
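The question that opens this thread (Solr 5.5) comes down to how SolrCloud replicates: the leader forwards each update request and every replica indexes the document itself, rather than copying segments. A toy Python model with hypothetical class names makes the consequence concrete: routing all queries to replicas takes query load off the leader, but the replicas still do the full indexing work.

```python
from itertools import cycle
from typing import List


class ToyNode:
    """Hypothetical stand-in for one core (leader or replica) of a shard."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.docs_indexed = 0
        self.queries_served = 0

    def index(self, doc: dict) -> None:
        self.docs_indexed += 1  # every node analyzes and indexes the document itself


class ToyShard:
    """Toy model of document-based replication in one SolrCloud shard."""

    def __init__(self, leader: ToyNode, replicas: List[ToyNode]) -> None:
        self.leader = leader
        self.replicas = replicas
        # Round-robin queries over the replicas only (the leader if there are none).
        self._query_targets = cycle(replicas or [leader])

    def update(self, doc: dict) -> None:
        # The leader indexes locally, then forwards the same update to every replica.
        self.leader.index(doc)
        for replica in self.replicas:
            replica.index(doc)

    def search(self) -> ToyNode:
        node = next(self._query_targets)
        node.queries_served += 1
        return node


leader, r1, r2 = ToyNode("leader"), ToyNode("replica1"), ToyNode("replica2")
shard = ToyShard(leader, [r1, r2])
for i in range(1000):
    shard.update({"id": str(i)})
for _ in range(10):
    shard.search()

for node in (leader, r1, r2):
    print(node.name, node.docs_indexed, node.queries_served)
```

Every node ends up with 1000 indexed documents, while only the replicas serve queries, which is why, as Pushkar notes, separating indexing from search needs a different replication mechanism rather than just query routing.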
