From: Vidhya Kailash
Date: Thu, 1 Nov 2018 09:14:00 -0400
Subject: Re: Solr cluster tuning
To: solr-user@lucene.apache.org

Thank you Erick and Daniel for your prompt responses. We were trying a few
things (moving to G1GC, and optimizing by dropping some fields that do not
need to be indexed and stored), hence the late response.

First of all, an overview of the environment: we have a four-node SolrCloud
cluster with two indexes, each spread across 4 shards with 2 replicas. Each
node has 30GB of RAM, all dedicated to running SolrCloud alone, of which
15GB is allocated to the JVM and the rest is left for the OS to manage. All
the indexes together take up just 1.4GB on disk. We are running version 7.4
with a dedicated ZooKeeper cluster.

Something of concern I see in the Solr Admin UI is the memory usage:

[image: image.png]

This is what I see from running top:

[image: image.png]

Is there a general rule for how much memory to leave for OS caching with an
index of about 2GB?

To answer Erick's question: no, we are not indexing at the same time. In
fact, we have stopped indexing just to test that theory and don't see any
improvement, so I don't think I need to worry about autocommit then, right?

Daniel, we did try what you suggested (warm up the cache, then run a slow
test and a fast test) and we still see the slow test yielding slower
results.

Any thoughts, anyone? Your responses are much appreciated.

thanks
Vidhya


On Wed, Oct 24, 2018 at 6:40 PM Erick Erickson <erickerickson@gmail.com> wrote:
> To add to Daniel's comments: are you indexing at the same time? Say
> your autocommit time is 10 seconds. For the sake of argument, let's say
> it takes 15 queries to warm your searcher. Let's further say that the
> average time for those 15 queries is 500ms each, and once the searcher
> is warmed the average time drops to 100ms. If you fire a lot of queries
> in that window, you'll have an average close to 100ms.

> OTOH, if you only fire 15 queries over that 10 seconds, the average
> would be 500ms.
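>
> (For reference, that interval is whatever maxTime you have under the
> autoCommit/autoSoftCommit elements in solrconfig.xml; a rough sketch
> with illustrative values:
>
>   <autoCommit>
>     <maxTime>10000</maxTime>
>     <openSearcher>false</openSearcher>
>   </autoCommit>
>   <autoSoftCommit>
>     <maxTime>10000</maxTime>
>   </autoSoftCommit>
>
> It's the commits that open a new searcher, i.e. soft commits or hard
> commits with openSearcher=true, that throw away the caches and trigger
> warming again.)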

> My guess is your autowarm counts for filterCache and queryResultCache
> are the default 0, and if you set them to, say, 20 each, much of your
> problem would disappear. Ditto if you stopped indexing. Both point to
> the searchers having to pull data into memory from disk and/or rebuild
> caches.
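>
> Concretely, those counts are the autowarmCount attribute on the cache
> definitions in solrconfig.xml; a sketch (the classes and sizes below are
> just the stock example values):
>
>   <query>
>     <filterCache class="solr.FastLRUCache" size="512"
>                  initialSize="512" autowarmCount="20"/>
>     <queryResultCache class="solr.LRUCache" size="512"
>                       initialSize="512" autowarmCount="20"/>
>   </query>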

> Best,
> Erick
> On Wed, Oct 24, 2018 at 1:37 PM Davis, Daniel (NIH/NLM) [C]
> <daniel.davis@nih.gov> wrote:
> >
> > Usually, slow responses are due to I/O waits getting the data off of
> > the disk. So, to me, this seems the more likely cause: as you bombard
> > the server with queries, you cause more and more of the data needed to
> > answer them to be pulled into memory.
> >
> > To verify this, I'd bombard your server with queries to warm it up, and
> > then repeat your test with the queries coming in slowly or quickly.
> >
> > If it still holds up, then either something other than Solr is running
> > on that server and taking memory from Solr, or your index is somewhat
> > too big for your server. Linux likes to overcommit memory - try setting
> > vm.swappiness to something low, like 10, rather than the default 60.
> > Look for anything on the server with Solr that may be competing with it
> > for I/O resources and causing its pages to swap out.
> >
> > Also, look at the size of your index data.
> >
> > This is general advice for dealing with inverted indexes - some of the
> > Solr engineers on this list may have some very specific ideas, such as
> > merging activity or other background tasks running when the query load
> > is lighter. I wouldn't know how to check for these things, but would
> > think they wouldn't affect query response time that badly.
> >
> > -----Original Message-----
> > From: Vidhya Kailash <vidhya.kailash@gmail.com>
> > Sent: Wednesday, October 24, 2018 4:22 PM
> > To: solr-user@lucene.apache.org
> > Subject: Solr cluster tuning
> >
> > We are currently using SolrCloud version 7.4 with the SolrJ API to
> > fetch data from collections. We recently deployed our code to
> > production and noticed that response times are higher when the number
> > of incoming requests is lower.
> >
> > But strangely, if we bombard the system with more and more requests, we
> > get much better response times.
> >
> > My suspicion is that the client is closing connections sooner for the
> > slower requests and later for the faster ones.
> >
> > We tried tuning by passing a custom HttpClient to SolrJ and also by
> > updating the HttpShardHandlerFactory settings, for example
> > maxThreadIdleTime = 60000 and socketTimeout = 180000.
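> >
> > For reference, these are HttpShardHandlerFactory parameters; in
> > solr.xml the shardHandlerFactory section looks roughly like this
> > (showing just the two values above):
> >
> >   <shardHandlerFactory name="shardHandlerFactory"
> >                        class="HttpShardHandlerFactory">
> >     <int name="socketTimeout">180000</int>
> >     <int name="maxThreadIdleTime">60000</int>
> >   </shardHandlerFactory>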
> >
> > Wondering what other tuning we can do to make this perform the same
> > irrespective of the number of requests.
> >
> > Thanks!
> >
> > Vidhya


--
Vidhya Kailash