lucene-solr-user mailing list archives

From Luca Quarello <lucaquare...@gmail.com>
Subject Re: SOLR replicas performance
Date Fri, 08 Jan 2016 15:36:57 GMT
Hi Matteo,
there are two questions:

   - "Why are response times on a SolrCloud collection with 1 replica
   higher than on SolrCloud without replicas?"

           Configuration1: SolrCloud with two 8-core VMs, each with 8 shards of 17M docs
           Configuration2: SolrCloud with two 8-core VMs, each with 8 shards of 17M docs (8 masters and 8 replicas)

I recorded worse response times for the replica configuration (Configuration2) in both scenarios:

   - Scenario1: I run queries without inserting records into the index
   - Scenario2: I run queries while inserting records into the index

I expect similar response times in Scenario1 and better response times for
Configuration2 in Scenario2.

Is that correct?

Thanks,
Luca

On Fri, Jan 8, 2016 at 3:56 PM, Luca Quarello <lucaquarello@gmail.com>
wrote:

> Hi Erick,
> I used Solr 5.3.1 and I honestly expected response times with the replica
> configuration to be close to the response times without replicas.
>
> Do you agree with me?
>
> I read here
> http://lucene.472066.n3.nabble.com/Solr-Cloud-Query-Scaling-td4110516.html that
> "Queries do not need to be routed to leaders; they can be handled by any
> replica in a shard. Leaders are only needed for handling update requests."
>
> I have not observed this behaviour. In my case CONF2 and CONF3 have all replicas
> on VM2, but core utilization during a request is 100% on both
> machines. Why?
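
To double-check where leaders and replicas actually sit, something like the following could summarize a Collections API CLUSTERSTATUS response per node. This is only a sketch: the sample dictionary is hand-written and hypothetical (the node names vm1/vm2 and the helper replica_layout are illustrative, not part of Solr), and it assumes the usual cluster > collections > shards > replicas nesting of that response.

```python
from collections import Counter


def replica_layout(cluster_status, collection):
    """Count leaders and non-leader replicas per node.

    `cluster_status` is assumed to be the parsed JSON of a CLUSTERSTATUS
    call (action=CLUSTERSTATUS on /solr/admin/collections).
    """
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    layout = {}
    for shard in shards.values():
        for replica in shard["replicas"].values():
            # The leader flag is assumed to arrive as the string "true".
            is_leader = str(replica.get("leader", "false")).lower() == "true"
            role = "leader" if is_leader else "replica"
            layout.setdefault(replica["node_name"], Counter())[role] += 1
    return layout


# Hypothetical, hand-written fragment; NOT real output from the cluster.
sample = {
    "cluster": {"collections": {"sepa": {"shards": {
        "shard1": {"replicas": {
            "core_node1": {"node_name": "vm1:8983_solr", "leader": "true"},
            "core_node2": {"node_name": "vm2:8983_solr"},
        }},
        "shard2": {"replicas": {
            "core_node3": {"node_name": "vm1:8983_solr", "leader": "true"},
            "core_node4": {"node_name": "vm2:8983_solr"},
        }},
    }}}}
}

for node, roles in sorted(replica_layout(sample, "sepa").items()):
    print(node, dict(roles))
```
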
>
> Best,
> Luca
>
> On Tue, Jan 5, 2016 at 5:08 PM, Erick Erickson <erickerickson@gmail.com>
> wrote:
>
>> What version of Solr? Prior to 5.2 the replicas were doing lots of
>> unnecessary work/being blocked, see:
>>
>> https://lucidworks.com/blog/2015/06/10/indexing-performance-solr-5-2-now-twice-fast/
>>
>> Best,
>> Erick
>>
>> On Tue, Jan 5, 2016 at 6:09 AM, Matteo Grolla <matteo.grolla@gmail.com>
>> wrote:
>> > Hi Luca,
>> >       not sure if I understood correctly. Your question is
>> > "Why are indexing times on a SolrCloud collection with 2 replicas higher
>> > than on SolrCloud with 1 replica", right?
>> > Well, with 2 replicas all docs have to be indexed separately in 2 places,
>> > and Solr has to confirm that both indexing operations went well.
>> > Indexing times are lower on a SolrCloud collection with 2 shards (just
>> > one replica, the leader, per shard) because docs are indexed just once and
>> > the load is spread across 2 servers instead of one.
>> >
>> > 2015-12-30 2:03 GMT+01:00 Luca Quarello <lucaquarello@gmail.com>:
>> >
>> >> Hi,
>> >>
>> >> I have a 260M-document index (90 GB) with this structure:
>> >>
>> >>
>> >>   <field name="fragment" type="text_general" indexed="true" stored="true"
>> >>          multiValued="false" termVectors="false" termPositions="false"
>> >>          termOffsets="false"/>
>> >>   <field name="parentId" type="long" indexed="false" stored="true" multiValued="false"/>
>> >>   <field name="fragmentContentType" type="string" indexed="false" stored="true" multiValued="false"/>
>> >>   <field name="creationDate" type="date" indexed="true" stored="true" multiValued="false"/>
>> >>   <field name="creationTimestamp" type="date" indexed="true" stored="true" multiValued="false"/>
>> >>   <field name="visibility" type="string" indexed="true" stored="true" multiValued="false"/>
>> >>   <field name="category" type="string" indexed="true" stored="true" multiValued="false"/>
>> >>   <field name="marked" type="string" indexed="true" stored="true" multiValued="false"/>
>> >>
>> >>   <!-- catchall field, containing all other searchable text fields
>> >>        (implemented via copyField further on in this schema) -->
>> >>   <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
>> >>
>> >>   <copyField source="fragment" dest="text"/>
>> >>   <copyField source="parentId" dest="text"/>
>> >>   <copyField source="fragmentContentType" dest="text"/>
>> >>   <copyField source="creationDate" dest="text"/>
>> >>   <copyField source="visibility" dest="text"/>
>> >>   <copyField source="category" dest="text"/>
>> >>   <copyField source="marked" dest="text"/>
>> >>
>> >>
>> >> where the fragment field contains XML messages.
>> >>
>> >> There is a search function that returns the messages satisfying a
>> >> search criterion.
>> >>
>> >>
>> >> TARGET:
>> >>
>> >> To find the configuration that optimizes the response time of a
>> >> two-instance SolrCloud running on 2 VMs, each with 8 cores and 32 GB
>> >>
>> >>
>> >> TEST RESULTS:
>> >>
>> >>
>> >>    1. Configurations:
>> >>
>> >>       - the best configuration without replicas
>> >>          - CONF1: 16 shards of 17M documents (8 per VM)
>> >>       - configurations with replicas
>> >>          - CONF2: 8 shards of 35M documents with replication factor of 1
>> >>          - CONF3: 16 shards of 35M documents with replication factor of 1
>> >>
>> >>    2. Executed tests
>> >>
>> >>       - sequential requests
>> >>       - 5 parallel requests
>> >>       - 10 parallel requests
>> >>       - 20 parallel requests
>> >>
>> >>       in two scenarios: during an indexing phase and without indexing
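
For anyone who wants to repeat the measurement, the test levels listed above (sequential, 5, 10 and 20 parallel requests) could be driven by a small harness along these lines. It is a rough sketch, not the tool used for the numbers in this thread: the function names, the fixed request count per level and the fake_fetch stand-in are all assumptions, and http_fetch would have to be pointed at a real Solr URL.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def timed_call(fetch, url):
    """Return the wall-clock seconds taken by one fetch(url) call."""
    start = time.perf_counter()
    fetch(url)
    return time.perf_counter() - start


def run_level(fetch, url, parallel, total_requests):
    """Issue total_requests calls using `parallel` worker threads.

    Returns (average, maximum) response time in seconds.
    parallel=1 corresponds to the sequential test.
    """
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        times = list(pool.map(lambda _: timed_call(fetch, url),
                              range(total_requests)))
    return sum(times) / len(times), max(times)


def http_fetch(url):
    """Real fetch: download and discard the response body."""
    with urlopen(url) as resp:
        resp.read()


def fake_fetch(url):
    """Stand-in for dry runs without a Solr server (assumption: 10 ms)."""
    time.sleep(0.01)


# Dry run with the stand-in; swap fake_fetch for http_fetch against Solr.
for parallel in (1, 5, 10, 20):
    avg, worst = run_level(fake_fetch, "http://localhost:8983/solr/sepa/select",
                           parallel, 20)
    print(f"{parallel:>2} parallel: avg {avg:.3f}s, max {worst:.3f}s")
```
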
>> >>
>> >>
>> >> The calls are:
>> >> http://localhost:8983/solr/sepa/select?q=+fragment%3A*AAA*+&fq=marked%3AT&fq=-fragmentContentType%3ABULK&start=0&rows=100&sort=creationTimestamp+desc%2Cid+asc
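
Since the URL-encoded form is hard to read, here is a small Python sketch that decodes the request above into its individual Solr parameters. It only uses the standard library; the URL is the one from this message, joined onto one logical string.

```python
from urllib.parse import urlsplit, parse_qs

# The test query from this thread, split only for readability.
url = ("http://localhost:8983/solr/sepa/select?"
       "q=+fragment%3A*AAA*+&fq=marked%3AT&fq=-fragmentContentType%3ABULK"
       "&start=0&rows=100&sort=creationTimestamp+desc%2Cid+asc")

# parse_qs turns '+' into spaces and resolves %XX escapes;
# repeated keys (the two fq filters) are collected into one list.
params = parse_qs(urlsplit(url).query)

for name, values in params.items():
    for value in values:
        print(f"{name} = {value.strip()}")
```

Decoded, the request is a wildcard query on fragment (*AAA*), filtered by marked:T and by excluding fragmentContentType:BULK, returning the first 100 rows sorted by creationTimestamp descending and then id ascending.
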
>> >>
>> >>
>> >>    3. Test results
>> >>
>> >>       All the tests showed an I/O utilization of 100MB/s while
>> >>       loading data into the disk cache, a disk cache utilization of 20GB
>> >>       and a core utilization of 100% (all 8 cores)
>> >>
>> >>
>> >>
>> >>    - No indexing (average time / maximum time)
>> >>       - CONF1
>> >>          - sequential: 4.1 / 6.9
>> >>          - 5 parallel: 15.6 / 19.1
>> >>          - 10 parallel: 23.6 / 30.2
>> >>          - 20 parallel: 48 / 52.2
>> >>       - CONF2
>> >>          - sequential: 12.3 / 17.4
>> >>          - 5 parallel: 32.5 / 34.2
>> >>          - 10 parallel: 45.4 / 49
>> >>          - 20 parallel: 64.6 / 74
>> >>       - CONF3
>> >>          - sequential: 6.9 / 9.9
>> >>          - 5 parallel: 33.2 / 37.5
>> >>          - 10 parallel: 46 / 51
>> >>          - 20 parallel: 68 / 83
>> >>
>> >>
>> >>
>> >>    - Indexing (average time / maximum time). Is it possible to view the
>> >>      total throughput in the Solr admin console? I can only find it
>> >>      per shard.
>> >>       - CONF1
>> >>          - sequential: 7.7 / 9.5
>> >>          - 5 parallel: 26.8 / 28.4
>> >>          - 10 parallel: 31.8 / 37.8
>> >>          - 20 parallel: 42 / 52.5
>> >>       - CONF2
>> >>          - sequential: 12.3 / 19
>> >>          - 5 parallel: 39 / 40.8
>> >>          - 10 parallel: 56.6 / 62.9
>> >>          - 20 parallel: 79 / 116
>> >>       - CONF3
>> >>          - sequential: 10 / 18.9
>> >>          - 5 parallel: 36.5 / 41.9
>> >>          - 10 parallel: 63.7 / 64.1
>> >>          - 20 parallel: 85 / 120
>> >>
>> >>
>> >>
>> >> I have two questions:
>> >>
>> >>    - The response times of the configurations with replicas are worse
>> >>    than those of the configuration without replicas (about three times
>> >>    worse in the sequential test case). Is this an expected result?
>> >>    - Why don't replicas help to reduce the response time while
>> >>    documents are being inserted and updated in the index?
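
The "about three times" figure can be checked against the averages reported above for the no-indexing scenario. A minimal Python sketch of that arithmetic follows: the numbers are copied from this thread (decimal commas written as points), while the slowdown helper and the choice of CONF1 as baseline are just for illustration.

```python
# Average response times without indexing, as reported in this thread.
no_indexing_avg = {
    "CONF1": {"sequential": 4.1, "5 parallel": 15.6,
              "10 parallel": 23.6, "20 parallel": 48.0},
    "CONF2": {"sequential": 12.3, "5 parallel": 32.5,
              "10 parallel": 45.4, "20 parallel": 64.6},
    "CONF3": {"sequential": 6.9, "5 parallel": 33.2,
              "10 parallel": 46.0, "20 parallel": 68.0},
}


def slowdown(table, baseline="CONF1"):
    """Ratio of each configuration's average to the baseline's, per test level."""
    base = table[baseline]
    return {
        conf: {level: round(avg / base[level], 2) for level, avg in rows.items()}
        for conf, rows in table.items()
        if conf != baseline
    }


ratios = slowdown(no_indexing_avg)
for conf, rows in ratios.items():
    print(conf, rows)
```

With these inputs the sequential ratio is 3.0 for CONF2 and about 1.68 for CONF3, and the gap narrows as the number of parallel requests grows.
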
>> >>
>>
>
>
