Subject: Re: SOLR replicas performance
From: Luca Quarello
To: solr-user@lucene.apache.org
Date: Fri, 8 Jan 2016 15:56:59 +0100

Hi Erick,

I used Solr 5.3.1 and I honestly expected response times with the replica
configuration to be close to the response times without replicas. Do you
agree?
I read here
http://lucene.472066.n3.nabble.com/Solr-Cloud-Query-Scaling-td4110516.html
that "Queries do not need to be routed to leaders; they can be handled by
any replica in a shard. Leaders are only needed for handling update
requests."

I haven't observed this behaviour. In my case CONF2 and CONF3 have all
replicas on VM2, but when I analyse core utilization during a request it
is at 100% on both machines. Why?

Best,
Luca

On Tue, Jan 5, 2016 at 5:08 PM, Erick Erickson wrote:

> What version of Solr? Prior to 5.2 the replicas were doing lots of
> unnecessary work/being blocked, see:
>
> https://lucidworks.com/blog/2015/06/10/indexing-performance-solr-5-2-now-twice-fast/
>
> Best,
> Erick
>
> On Tue, Jan 5, 2016 at 6:09 AM, Matteo Grolla wrote:
> > Hi Luca,
> > not sure if I understood well. Your question is "Why are index times
> > on a SolrCloud collection with 2 replicas higher than on SolrCloud
> > with 1 replica", right?
> > Well, with 2 replicas all docs have to be separately indexed in 2
> > places and Solr has to confirm that both indexings went well.
> > Indexing times are lower on a SolrCloud collection with 2 shards
> > (just one replica, the leader, per shard) because docs are indexed
> > just once and the load is spread over 2 servers instead of one.
> >
> > 2015-12-30 2:03 GMT+01:00 Luca Quarello:
> >
> >> Hi,
> >>
> >> I have a 260M-document index (90 GB) with this structure:
> >>
> >> [the <field .../> definitions were stripped from the archived
> >> message; only attributes such as stored="true", multiValued="false",
> >> termVectors="false", termPositions="false" and termOffsets="false"
> >> survive]
> >>
> >> The fragment field contains XML messages.
> >>
> >> There is a search function that provides the messages satisfying a
> >> search criterion.
> >>
> >> TARGET:
> >>
> >> To find the configuration that optimizes the response time of a
> >> two-instance Solr cloud on 2 VMs, each with 8 cores and 32 GB of RAM.
> >>
> >> TEST RESULTS:
> >>
> >> Configurations:
> >>
> >> - the best configuration without replicas:
> >>   - CONF1: 16 shards of 17M documents (8 per VM)
> >> - configurations with replicas:
> >>   - CONF2: 8 shards of 35M documents with a replication factor of 1
> >>   - CONF3: 16 shards of 35M documents with a replication factor of 1
> >>
> >> Executed tests:
> >>
> >> - sequential requests
> >> - 5 parallel requests
> >> - 10 parallel requests
> >> - 20 parallel requests
> >>
> >> in two scenarios: during an indexing phase and not.
> >>
> >> The calls are:
> >> http://localhost:8983/solr/sepa/select?q=+fragment%3A*AAA*+&fq=marked%3AT&fq=-fragmentContentType%3ABULK&start=0&rows=100&sort=creationTimestamp+desc%2Cid+asc
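
(As an aside, for anyone trying to reproduce this: below is a rough SolrJ
sketch of the call above, plus a non-distributed variant that queries one
core directly with distrib=false, which makes it easy to time a single
replica in isolation. The ZooKeeper address and the replica core name are
placeholders, not the real hosts of this cluster, and debug=track is only
there so the response shows which shard/replica handled each phase of the
distributed request.)

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class ReplicaQueryCheck {
        public static void main(String[] args) throws Exception {
            // Distributed query against the whole "sepa" collection:
            // it fans out to one replica of every shard.
            // The ZooKeeper connect string is a placeholder.
            try (CloudSolrClient cloud =
                     new CloudSolrClient("zkhost1:2181,zkhost2:2181,zkhost3:2181")) {
                cloud.setDefaultCollection("sepa");
                SolrQuery q = new SolrQuery("fragment:*AAA*");
                q.addFilterQuery("marked:T", "-fragmentContentType:BULK");
                q.setStart(0);
                q.setRows(100);
                q.setSort("creationTimestamp", SolrQuery.ORDER.desc);
                q.addSort("id", SolrQuery.ORDER.asc);
                // "track" debug section lists which shard/replica served each phase
                q.set("debug", "track");
                QueryResponse rsp = cloud.query(q);
                System.out.println("distributed: QTime=" + rsp.getQTime()
                        + " numFound=" + rsp.getResults().getNumFound());
            }

            // Same query against a single core, bypassing the distributed
            // fan-out. The core URL is a placeholder; the real core names
            // are visible in the admin UI.
            try (HttpSolrClient single =
                     new HttpSolrClient("http://vm2:8983/solr/sepa_shard1_replica2")) {
                SolrQuery q = new SolrQuery("fragment:*AAA*");
                q.addFilterQuery("marked:T", "-fragmentContentType:BULK");
                q.setRows(100);
                q.set("distrib", "false"); // answer from this core only
                QueryResponse rsp = single.query(q);
                System.out.println("single core: QTime=" + rsp.getQTime()
                        + " numFound=" + rsp.getResults().getNumFound());
            }
        }
    }

(If the track output shows per-shard sub-requests landing on both VMs,
that alone would account for the 100% core utilization on both machines,
since every distributed query has to consult each shard once.)
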
> >>
> >> Test results:
> >>
> >> All the tests showed an I/O utilization of 100 MB/s while loading data
> >> into the disk cache, a disk cache utilization of 20 GB and a core
> >> utilization of 100% (all 8 cores).
> >>
> >> No indexing (average / maximum response time):
> >>
> >>                 CONF1          CONF2          CONF3
> >> sequential      4.1 / 6.9      12.3 / 17.4    6.9 / 9.9
> >> 5 parallel      15.6 / 19.1    32.5 / 34.2    33.2 / 37.5
> >> 10 parallel     23.6 / 30.2    45.4 / 49      46 / 51
> >> 20 parallel     48 / 52.2      64.6 / 74      68 / 83
> >>
> >> During indexing (in the Solr admin console, is it possible to view
> >> the total indexing throughput? I only find it for a single shard):
> >>
> >>                 CONF1          CONF2          CONF3
> >> sequential      7.7 / 9.5      12.3 / 19      10 / 18.9
> >> 5 parallel      26.8 / 28.4    39 / 40.8      36.5 / 41.9
> >> 10 parallel     31.8 / 37.8    56.6 / 62.9    63.7 / 64.1
> >> 20 parallel     42 / 52.5      79 / 116       85 / 120
> >>
> >> I have two questions:
> >>
> >> - The response times of the configurations with replicas are worse
> >>   than those of the configuration without replicas (about three times
> >>   worse for sequential requests). Is this an expected result?
> >> - Why don't the replicas help to reduce the response time during
> >>   index inserting and updating?
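
P.S. A check that might help with the first question: the request can be
pinned to an explicit list of replica cores with the shards parameter, and
shards.info=true adds the elapsed time and hit count of every per-shard
sub-request to the response. Something along these lines, where the host
and core names are placeholders for the real ones in the cluster:

  http://vm2:8983/solr/sepa/select?q=fragment%3A*AAA*&fq=marked%3AT&fq=-fragmentContentType%3ABULK&rows=100&sort=creationTimestamp+desc%2Cid+asc&shards=vm2:8983/solr/sepa_shard1_replica2,vm2:8983/solr/sepa_shard2_replica2&shards.info=true

If the response times stay the same when only VM2's cores are listed, the
replicas really are serving the queries on their own, and the per-shard
times in shards.info should show whether the extra latency of CONF2/CONF3
comes from slower individual shards or from merging the 100 sorted rows
returned by each shard.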