Subject: Re: SOLR replicas performance
From: Luca Quarello
To: solr-user@lucene.apache.org
Date: Fri, 8 Jan 2016 15:56:59 +0100

Hi Erick,

I used Solr 5.3.1 and I honestly expected response times with the replica
configuration to be close to the response times without replicas. Do you
agree?
I read here
http://lucene.472066.n3.nabble.com/Solr-Cloud-Query-Scaling-td4110516.html
that "Queries do not need to be routed to leaders; they can be handled by
any replica in a shard. Leaders are only needed for handling update
requests."

I haven't observed this behaviour. In my case CONF2 and CONF3 have all
replicas on VM2, but when I analyse core utilization during a request it
is at 100% on both machines. Why?

Best,
Luca

On Tue, Jan 5, 2016 at 5:08 PM, Erick Erickson wrote:

> What version of Solr? Prior to 5.2 the replicas were doing lots of
> unnecessary work/being blocked, see:
>
> https://lucidworks.com/blog/2015/06/10/indexing-performance-solr-5-2-now-twice-fast/
>
> Best,
> Erick
>
> On Tue, Jan 5, 2016 at 6:09 AM, Matteo Grolla wrote:
> > Hi Luca,
> > not sure if I understood well. Your question is "Why are index times
> > on a SolrCloud collection with 2 replicas higher than on SolrCloud
> > with 1 replica", right?
> > Well, with 2 replicas all docs have to be separately indexed in 2
> > places and Solr has to confirm that both indexings went well.
> > Indexing times are lower on a SolrCloud collection with 2 shards
> > (just one replica, the leader, per shard) because docs are indexed
> > just once and the load is spread over 2 servers instead of one.
> >
> > 2015-12-30 2:03 GMT+01:00 Luca Quarello:
> >
> >> Hi,
> >>
> >> I have a 260M-document index (90 GB) with this structure:
> >>
> >> [the <field .../> definitions were stripped from the archived
> >> message; only attributes such as stored="true", multiValued="false",
> >> termVectors="false", termPositions="false" and termOffsets="false"
> >> survive]
> >>
> >> The fragment field contains XML messages.
> >>
> >> There is a search function that provides the messages satisfying a
> >> search criterion.
> >>
> >> TARGET:
> >>
> >> To find the configuration that optimizes the response time of a
> >> two-instance Solr cloud on 2 VMs, each with 8 cores and 32 GB of RAM.
> >>
> >> TEST RESULTS:
> >>
> >> Configurations:
> >>
> >> - the best configuration without replicas:
> >>   - CONF1: 16 shards of 17M documents (8 per VM)
> >> - configurations with replicas:
> >>   - CONF2: 8 shards of 35M documents with a replication factor of 1
> >>   - CONF3: 16 shards of 35M documents with a replication factor of 1
> >>
> >> Executed tests:
> >>
> >> - sequential requests
> >> - 5 parallel requests
> >> - 10 parallel requests
> >> - 20 parallel requests
> >>
> >> in two scenarios: during an indexing phase and not.
> >>
> >> The calls are:
> >> http://localhost:8983/solr/sepa/select?q=+fragment%3A*AAA*+&fq=marked%3AT&fq=-fragmentContentType%3ABULK&start=0&rows=100&sort=creationTimestamp+desc%2Cid+asc
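
(As an aside, for anyone trying to reproduce this: below is a rough SolrJ
sketch of the call above, plus a non-distributed variant that queries one
core directly with distrib=false, which makes it easy to time a single
replica in isolation. The ZooKeeper address and the replica core name are
placeholders, not the real hosts of this cluster, and debug=track is only
there so the response shows which shard/replica handled each phase of the
distributed request.)

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class ReplicaQueryCheck {
        public static void main(String[] args) throws Exception {
            // Distributed query against the whole "sepa" collection:
            // it fans out to one replica of every shard.
            // The ZooKeeper connect string is a placeholder.
            try (CloudSolrClient cloud =
                     new CloudSolrClient("zkhost1:2181,zkhost2:2181,zkhost3:2181")) {
                cloud.setDefaultCollection("sepa");
                SolrQuery q = new SolrQuery("fragment:*AAA*");
                q.addFilterQuery("marked:T", "-fragmentContentType:BULK");
                q.setStart(0);
                q.setRows(100);
                q.setSort("creationTimestamp", SolrQuery.ORDER.desc);
                q.addSort("id", SolrQuery.ORDER.asc);
                // "track" debug section lists which shard/replica served each phase
                q.set("debug", "track");
                QueryResponse rsp = cloud.query(q);
                System.out.println("distributed: QTime=" + rsp.getQTime()
                        + " numFound=" + rsp.getResults().getNumFound());
            }

            // Same query against a single core, bypassing the distributed
            // fan-out. The core URL is a placeholder; the real core names
            // are visible in the admin UI.
            try (HttpSolrClient single =
                     new HttpSolrClient("http://vm2:8983/solr/sepa_shard1_replica2")) {
                SolrQuery q = new SolrQuery("fragment:*AAA*");
                q.addFilterQuery("marked:T", "-fragmentContentType:BULK");
                q.setRows(100);
                q.set("distrib", "false"); // answer from this core only
                QueryResponse rsp = single.query(q);
                System.out.println("single core: QTime=" + rsp.getQTime()
                        + " numFound=" + rsp.getResults().getNumFound());
            }
        }
    }

(If the track output shows per-shard sub-requests landing on both VMs,
that alone would account for the 100% core utilization on both machines,
since every distributed query has to consult each shard once.)
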
> >>
> >> Test results:
> >>
> >> All the tests showed an I/O utilization of 100 MB/s while loading data
> >> into the disk cache, a disk cache utilization of 20 GB and a core
> >> utilization of 100% (all 8 cores).
> >>
> >> No indexing (average / maximum response time):
> >>
> >>                 CONF1          CONF2          CONF3
> >> sequential      4.1 / 6.9      12.3 / 17.4    6.9 / 9.9
> >> 5 parallel      15.6 / 19.1    32.5 / 34.2    33.2 / 37.5
> >> 10 parallel     23.6 / 30.2    45.4 / 49      46 / 51
> >> 20 parallel     48 / 52.2      64.6 / 74      68 / 83
> >>
> >> During indexing (in the Solr admin console, is it possible to view
> >> the total indexing throughput? I only find it for a single shard):
> >>
> >>                 CONF1          CONF2          CONF3
> >> sequential      7.7 / 9.5      12.3 / 19      10 / 18.9
> >> 5 parallel      26.8 / 28.4    39 / 40.8      36.5 / 41.9
> >> 10 parallel     31.8 / 37.8    56.6 / 62.9    63.7 / 64.1
> >> 20 parallel     42 / 52.5      79 / 116       85 / 120
> >>
> >> I have two questions:
> >>
> >> - The response times of the configurations with replicas are worse
> >>   than those of the configuration without replicas (about three times
> >>   worse for sequential requests). Is this an expected result?
> >> - Why don't the replicas help to reduce the response time during
> >>   index inserting and updating?
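
P.S. A check that might help with the first question: the request can be
pinned to an explicit list of replica cores with the shards parameter, and
shards.info=true adds the elapsed time and hit count of every per-shard
sub-request to the response. Something along these lines, where the host
and core names are placeholders for the real ones in the cluster:

  http://vm2:8983/solr/sepa/select?q=fragment%3A*AAA*&fq=marked%3AT&fq=-fragmentContentType%3ABULK&rows=100&sort=creationTimestamp+desc%2Cid+asc&shards=vm2:8983/solr/sepa_shard1_replica2,vm2:8983/solr/sepa_shard2_replica2&shards.info=true

If the response times stay the same when only VM2's cores are listed, the
replicas really are serving the queries on their own, and the per-shard
times in shards.info should show whether the extra latency of CONF2/CONF3
comes from slower individual shards or from merging the 100 sorted rows
returned by each shard.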