From: Alexey Kozhemiakin
To: "solr-user@lucene.apache.org"
Subject: RE: Sharding and Replication
Date: Fri, 9 Aug 2013 20:17:24 +0000

+1, I'd like to vote for this issue: https://issues.apache.org/jira/browse/SOLR-4956
It would be useful to have this parameter configurable.

When we index hundreds of millions of documents into a 4-shard SolrCloud in batches of 20K, the overhead of this chatty conversation with the replicas and other shards is significant. We didn't perform detailed measurements, but increasing this hardcoded value improved our indexing throughput from 1.2 million up to 3 million docs per minute.

I agree that in the general case it is more correct to reduce the value, but it would be nice to be able to control it for specific cases and environments.

Alex
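For illustration, here is a minimal SolrJ sketch of the kind of client-side batching described above. The ZooKeeper hosts, collection name, fields, and document count are placeholders for this sketch, not details taken from the thread.

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer {
    public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper ensemble and collection name.
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("metrics");

        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 1000000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Integer.toString(i));
            doc.addField("value_l", i);
            batch.add(doc);

            // One update request per 20K documents rather than one per document.
            if (batch.size() == 20000) {
                server.add(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            server.add(batch);
        }
        server.commit();
        server.shutdown();
    }
}

Even with client-side batches this large, the adds forwarded to other shards and replicas are re-split into small buffered requests (the hardcoded 10 discussed below), which appears to be where the per-request overhead described in this thread comes from.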
-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com]
Sent: Sunday, June 23, 2013 20:41
To: solr-user@lucene.apache.org
Subject: Re: Sharding and Replication

Asif:

Thanks, this is great info and may add to the priority of making this configurable.

I raised a JIRA, see: https://issues.apache.org/jira/browse/SOLR-4956
and feel free to add anything you'd like or correct anything I didn't get right.

Best
Erick

On Sat, Jun 22, 2013 at 10:16 PM, Asif wrote:
> Erick,
>
> It's a completely practical problem - we are exploring Solr to build a real-time analytics/data solution for a system handling about 1000 qps. We have various metrics that are stored as different collections on the cloud, which means a very high volume of writes. The cloud also needs to support about 300-400 qps.
>
> We initially tested with a single Solr node on a 16-core / 24 GB box for a single metric. We saw that writes were not an issue at all - Solr was handling them extremely well. We were also able to achieve about 200 qps from a single node.
>
> When we set up the cloud (an ensemble on 6 boxes), we saw very high CPU usage on the replicas. Up to 10 cores were getting used for writes on the replicas. Hence my concern with respect to batch updates for the replicas.
>
> BTW, I altered maxBufferedAddsPerServer to 1000 - and now CPU usage is very similar to the single-node installation.
>
> - Asif
>
> On Sat, Jun 22, 2013 at 9:53 PM, Erick Erickson wrote:
>> Yeah, there's been talk of making this configurable, but there are more pressing priorities so far.
>>
>> So just to be clear, is this theoretical or practical? I know of several very high-performance situations where 1,000 updates/sec (and I'm assuming that it's 1,000 docs/sec, not 1,000 batches of 1,000 docs) hasn't caused problems. So unless you're actually seeing performance problems, as opposed to fearing that there _might_ be some, I'd just go on to the next urgent problem.
>>
>> Best
>> Erick
>>
>> On Fri, Jun 21, 2013 at 8:34 PM, Asif wrote:
>> > Erick,
>> >
>> > Thanks for your reply.
>> >
>> > You are right about 10 updates being batched up - it was hard to figure out due to the large number of updates/logging that happens in our system.
>> >
>> > We are batching 1000 updates every time.
>> >
>> > Here is my observation from the leader and a replica -
>> >
>> > 1. The leader logs clearly indicate that 1000 updates arrived - [ (1000 adds)],commit=]
>> > 2. On the replica - for each 1000 document adds on the leader - I see a lot of requests, with no indication of how many updates are in each request.
>> >
>> > Digging a little into the Solr code, I found the variable I am interested in - maxBufferedAddsPerServer, which is set to 10 -
>> >
>> > http://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/update/SolrCmdDistributor.java?view=markup
>> >
>> > This means that for a batch update of 1000 documents we will be seeing 100 requests to each replica - which translates into 100 writes per collection per second in our system.
>> >
>> > Should this variable be made configurable via solrconfig.xml (or any other appropriate place)?
>> >
>> > A little background about the system we are trying to build - a real-time analytics solution using SolrCloud + atomic updates - we have a very high volume of writes, going as high as 1000 updates a second (possibly more in the long run).
>> >
>> > - Asif
>> >
>> > On Sat, Jun 22, 2013 at 4:21 AM, Erick Erickson wrote:
>> >> Updates are batched, but it's on a per-request basis. So, if you're sending one document at a time you won't get any batching. If you send 10 docs at a time and they happen to go to 10 different shards, you'll get 10 different update requests.
>> >>
>> >> If you're sending 1,000 docs per update you should be seeing some batching going on.
>> >>
>> >> bq: but why not batch them up or give an option to batch N updates in either of the above cases
>> >>
>> >> I suspect what you're seeing is that you're not sending very many docs per update request and so are being misled.
>> >>
>> >> But that's a guess, since you haven't provided much in the way of data on _how_ you're updating.
>> >>
>> >> bq: the cloud eventually starts to fail
>> >>
>> >> How? Details matter.
>> >>
>> >> Best
>> >> Erick
>> >>
>> >> On Wed, Jun 19, 2013 at 4:23 AM, Asif wrote:
>> >> > Hi,
>> >> >
>> >> > I had questions on the implementation of the Sharding and Replication features of Solr/SolrCloud.
>> >> >
>> >> > 1. I noticed that when sharding is enabled for a collection, individual requests are sent to each node serving as a shard.
>> >> >
>> >> > 2. Replication follows the same strategy of sending individual documents to the nodes serving as replicas.
>> >> >
>> >> > I am working with a system that requires a massive number of writes - I have noticed that, for the above reason, the cloud eventually starts to fail (even though I am using an ensemble).
>> >> >
>> >> > I do understand the reason behind individual updates - but why not batch them up, or give an option to batch N updates, in either of the above cases? I did come across a presentation that talked about batching 10 updates for replication at least, but I do not think this is the case.
>> >> >
>> >> > - Asif
>> >>
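A small, self-contained sketch of the behaviour discussed in this thread: a 1000-document batch against a hardcoded buffer of 10 adds per server, plus one hypothetical way the limit could be exposed as a setting. This is illustrative code only - not the actual SolrCmdDistributor source and not the SOLR-4956 patch - and the system property name is invented for the example.

// Illustrative only - not the actual SolrCmdDistributor code or the SOLR-4956 patch.
public class BufferedAddsExample {

    // The behaviour described in the thread: a fixed buffer of 10 adds per target server.
    static final int HARD_CODED_LIMIT = 10;

    // One hypothetical way to make it tunable, e.g. -Dsolr.maxBufferedAddsPerServer=1000
    // (property name made up for this sketch).
    static final int maxBufferedAddsPerServer =
            Integer.getInteger("solr.maxBufferedAddsPerServer", HARD_CODED_LIMIT);

    public static void main(String[] args) {
        int batchSize = 1000; // documents in one client update request
        int requestsPerReplica =
                (batchSize + maxBufferedAddsPerServer - 1) / maxBufferedAddsPerServer;
        // With the default of 10, a 1000-document batch is forwarded as 100 separate
        // requests per replica - the fan-out observed in the replica logs above.
        System.out.println(requestsPerReplica + " forwarded requests per replica");
    }
}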