lucene-solr-user mailing list archives

From Jaroslaw Rozanski ...@jarekrozanski.com>
Subject Re: Separating Search and Indexing in SolrCloud
Date Sat, 17 Dec 2016 20:54:53 GMT
Hi Erick,

So what does this buffer represent? What does it actually store: the raw
update request or the analyzed document?

The documentation suggests that it stores the actual update requests.

Obviously an analyzed document can and will occupy much more space than the
raw one. Also, analysis will create a lot of new allocations and subsequent
GC work.

Yes, you are probably right that search puts more stress on the node and is
the main memory user, but the combination of:
- non-trivial analysis,
- high volume of updates and
- search on the same node

seems to add fuel to the fire.

From the previous response by Pushkar, it is clear that this separation is
not achievable with the existing SolrCloud mechanisms.

Thanks


On 17/12/16 20:24, Erick Erickson wrote:
> bq: I am more concerned with indexing memory requirements at volume
> 
> By and large this isn't much of a problem. ramBufferSizeMB in
> solrconfig.xml governs how much memory is consumed in Solr for
> indexing. When that limit is exceeded, the buffer is flushed to disk.
> I've rarely heard of indexing being a memory issue. Anecdotally I
> haven't seen throughput benefit with buffer sizes over 128M.
> 
> You're correct in that master/slave style replication would use less
> memory on the slave, although there are other costs. I.e. rather than
> the data for document X being sent to the replicas once as in
> SolrCloud, that data is re-sent to the slave every time it's merged
> into a new segment.
> 
> That said, memory issues are _far_ more prevalent on the search side
> of things so unless this is a proven issue in your environment I would
> fight other fires.....
> 
> Best,
> Erick
> 
> On Fri, Dec 16, 2016 at 1:06 PM, Jaroslaw Rozanski <me@jarekrozanski.com> wrote:
>> Thanks, that issue looks interesting!
>>
>> On 16/12/16 16:38, Pushkar Raste wrote:
>>> This kind of separation is not supported yet. There is, however, some
>>> work going on; you can read about it at
>>> https://issues.apache.org/jira/browse/SOLR-9835
>>>
>>> This unfortunately would not support soft commits and hence would not be a
>>> good solution for near real time indexing.
>>>
>>> On Dec 16, 2016 7:44 AM, "Jaroslaw Rozanski" <me@jarekrozanski.com> wrote:
>>>
>>>> Sorry, not what I meant.
>>>>
>>>> The leader is responsible for distributing update requests to the
>>>> replicas, so eventually all replicas have the same state as the leader.
>>>> Not a problem.
>>>>
>>>> It is more about the performance of this. If I gather correctly, normal
>>>> replication happens by standard update request, not by, say, segment copy.
>>>>
>>>> Which means an update on the leader is as "expensive" as on a replica.
>>>>
>>>> Hence, if my understanding is correct, sending search requests to replicas
>>>> only, in an index-heavy environment, would bring no benefit.
>>>>
>>>> So the question is: is there a mechanism in SolrCloud (not the legacy
>>>> master/slave set-up) to make one node take the load of indexing while
>>>> other nodes focus on searching?
>>>>
>>>> This is not a question of SolrClient, because it is clear how to direct
>>>> search requests to specific nodes. This is more about index optimization,
>>>> so that certain nodes (i.e. replicas) could suffer less from high-volume
>>>> indexing while serving search requests.
>>>>
>>>>
>>>>
>>>>
>>>> On 16/12/16 12:35, Dorian Hoxha wrote:
>>>>> The leader is the source of truth. You expect to make the replica the
>>>>> source of truth or something? That doesn't make sense.
>>>>> What people do is send writes to the leader/master and reads to
>>>>> replicas/slaves in other solr/other-dbs.
>>>>>
>>>>> On Fri, Dec 16, 2016 at 1:31 PM, Jaroslaw Rozanski <me@jarekrozanski.com>
>>>>> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> According to the documentation, in normal operation (not recovery) in a
>>>>>> Solr Cloud configuration the leader sends updates it receives to all the
>>>>>> replicas.
>>>>>>
>>>>>> This means all nodes in the shard perform the same effort to index a
>>>>>> single document. Correct?
>>>>>>
>>>>>> Is there then a benefit to *not* sending search requests to the leader,
>>>>>> but only to replicas?
>>>>>>
>>>>>> Given an index- and search-heavy Solr Cloud system, is it possible to
>>>>>> separate search from indexing nodes?
>>>>>>
>>>>>>
>>>>>> RE: Solr 5.5.0
>>>>>>
>>>>>> --
>>>>>> Jaroslaw Rozanski | e: me@jarekrozanski.com
>>>>>> 695E 436F A176 4961 7793  5C70 AFDF FB5E 682C 4D3D

-- 
Jaroslaw Rozanski | e: me@jarekrozanski.com
695E 436F A176 4961 7793  5C70 AFDF FB5E 682C 4D3D
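
As a rough illustration of the flush behaviour Erick describes (memory used for indexing is bounded by the buffer limit, and the buffer is flushed to disk once that limit is exceeded), here is a toy model in Python. It is only a sketch: the function name `simulate_flushes`, the per-document sizes and the 100 MB limit are assumptions made for illustration, and it does not reflect how Lucene actually accounts for buffered documents.

```python
def simulate_flushes(doc_sizes_mb, ram_buffer_mb=100.0):
    """Toy model of a size-bounded indexing buffer (hypothetical helper).

    Documents accumulate in memory; once the buffered total exceeds the
    limit, the buffer is "flushed" and starts again from empty.
    Returns (number_of_flushes, peak_buffered_mb).
    """
    buffered = 0.0
    peak = 0.0
    flushes = 0
    for size in doc_sizes_mb:
        buffered += size
        peak = max(peak, buffered)
        # Flush only after the limit has been exceeded, as described above.
        if buffered > ram_buffer_mb:
            flushes += 1
            buffered = 0.0
    return flushes, peak


if __name__ == "__main__":
    # Ten documents whose in-memory footprint is assumed to be 40 MB each.
    flushes, peak = simulate_flushes([40] * 10, ram_buffer_mb=100)
    print(f"flushes={flushes}, peak buffered={peak} MB")
```

In this model the peak buffered size overshoots the limit by at most one document, however many documents are indexed, which is the sense in which the buffer setting caps indexing memory. The per-document cost of analysis and the resulting GC churn raised earlier in the thread are outside what this sketch captures.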

