lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Sharding and Replication
Date Sat, 22 Jun 2013 16:23:51 GMT
Yeah, there's been talk of making this configurable, but there are
more pressing priorities so far.

So just to be clear, is this theoretical or practical? I know of several very
high-performance situations where 1,000 updates/sec (and I'm assuming
that it's 1,000 docs/sec not 1,000 batches of 1,000 docs) hasn't caused
problems here. So unless you're actually seeing performance problems
as opposed to fearing that there _might_ be, I'd just go on to the next
urgent problem.

Best
Erick

On Fri, Jun 21, 2013 at 8:34 PM, Asif <tallasif@gmail.com> wrote:
> Erick,
>
> Thanks for your reply.
>
> You are right about 10 updates being batched up - it was hard to figure out
> due to the large number of updates and the logging that happens in our system.
>
> We are batching 1000 updates every time.
>
> Here is my observation from leader and replica -
>
> 1. Leader logs are clearly indicating that 1000 updates arrived - [ (1000
> adds)],commit=]
> 2. On replica - for each 1000 document adds on leader - I see a lot of
> requests on replica - with no indication of how many updates in each
> request.
>
> Digging a little into the Solr code, I found that the variable I am
> interested in, maxBufferedAddsPerServer, is set to 10 -
>
> http://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/update/SolrCmdDistributor.java?view=markup
>
> This means that for a batch update of 1000 documents we will see 100
> requests per replica - which translates into 100 writes per collection per
> second in our system.
>
> Should this variable be made configurable via solrconfig.xml (or any other
> appropriate place)?
>
> A little background about the system we are trying to build - a real-time
> analytics solution using Solr Cloud + atomic updates - we have a very
> high volume of writes - going as high as 1000 updates a second (possibly
> more in the long run).
>
> - Asif
>
> On Sat, Jun 22, 2013 at 4:21 AM, Erick Erickson <erickerickson@gmail.com> wrote:
>
>> Updates are batched, but it's on a per-request basis. So, if
>> you're sending one document at a time you won't get any
>> batching. If you send 10 docs at a time and they happen to
>> go to 10 different shards, you'll get 10 different update
>> requests.
>>
>> If you're sending 1,000 docs per update you should be seeing
>> some batching going on.
>>
>> bq:  but why not batch them up or give an option to batch N
>> updates in either of the above cases
>>
>> I suspect what you're seeing is that you're not sending very
>> many docs per update request and so are being misled.
>>
>> But that's a guess since you haven't provided much in the
>> way of data on _how_ you're updating.
>>
>> bq: the cloud eventually starts to fail
>> How? Details matter.
>>
>> Best
>> Erick
>>
>> On Wed, Jun 19, 2013 at 4:23 AM, Asif <tallasif@gmail.com> wrote:
>> > Hi,
>> >
>> > I had questions on implementation of Sharding and Replication features of
>> > Solr/Cloud.
>> >
>> > 1. I noticed that when sharding is enabled for a collection - individual
>> > requests are sent to each node serving as a shard.
>> >
>> > 2. Replication too follows above strategy of sending individual documents
>> > to the nodes serving as a replica.
>> >
>> > I am working with a system that requires a massive number of writes - I have
>> > noticed that due to above reason - the cloud eventually starts to fail
>> > (even though I am using an ensemble).
>> >
>> > I do understand the reason behind individual updates - but why not batch
>> > them up or give an option to batch N updates in either of the above cases - I
>> > did come across a presentation that talked about batching 10 updates for
>> > replication at least, but I do not think this is the case.
>> > - Asif
>>
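
The arithmetic in the thread (a 1,000-document batch forwarded in buffers of 10 giving 100 requests per replica) can be sketched as below. This is only a back-of-the-envelope model, assuming the leader flushes to a replica every time its buffer fills, as Asif's reading of maxBufferedAddsPerServer suggests; the function name and the flush behavior are illustrative assumptions, not Solr's actual code.

```python
import math


def forwarded_requests(batch_size: int, max_buffered_adds: int = 10) -> int:
    """Estimate leader-to-replica requests for one client batch.

    Assumption (not taken from Solr's source): the leader sends one
    request to the replica each time `max_buffered_adds` adds have
    accumulated, plus one final request for any remainder.
    """
    if batch_size < 0 or max_buffered_adds <= 0:
        raise ValueError("batch_size must be >= 0 and max_buffered_adds > 0")
    return math.ceil(batch_size / max_buffered_adds)


if __name__ == "__main__":
    # Compare a few client batch sizes against the buffer of 10 from the thread.
    for batch in (1, 10, 1000):
        print(batch, "docs ->", forwarded_requests(batch), "request(s) per replica")
```

Under this model a larger buffer would cut the request count proportionally (a buffer of 100 would give 10 requests for the same batch), which is the effect a configurable setting would have.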
