lucene-solr-user mailing list archives

From Kent Mu <solr.st...@gmail.com>
Subject Re: solrcloud consumes more time than solr when write index
Date Thu, 14 Jul 2016 12:27:52 GMT
Correcting the URL:

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201607.mbox/%3CCAMCstK6rv0NWH3tqG0MBo%3D1kccDHTH4JQP-sNFvTuEzd2mUYFA%40mail.gmail.com%3E

2016-07-14 1:17 GMT+08:00 Jeff Wartes <jwartes@whitepages.com>:

> There’s another thread on this list going on right now touching on the
> need to optimize, might be worth reading.
>
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201607.mbox/%3C61f3d01f-c3ef-2d71-7112-6a88b01458f6@elyograg.org%3E
>
>
> On 7/12/16, 6:25 PM, "Kent Mu" <solr.study@gmail.com> wrote:
>
> >Dear Mr. Wartes,
> >Thanks for your reply. Well, I see. For Solr we do have replicas, and for
> >SolrCloud we have 5 shards, each with one leader and one replica. The
> >index holds nearly 100 million documents. Do you mean we do not need
> >to optimize the index?
> >
> >Thanks!
> >Kent
> >
> >2016-07-12 23:02 GMT+08:00 Jeff Wartes <jwartes@whitepages.com>:
> >
> >> Well, two thoughts:
> >>
> >>
> >> 1. If you’re not using SolrCloud, presumably you don’t have any replicas.
> >> If you are, presumably you do. This makes for a biased comparison, because
> >> SolrCloud won’t acknowledge a write until it’s been safely written to all
> >> replicas. In short, SolrCloud write time is max(per-replica write time).
> >> The more replicas you add, the bigger the chance that some replica randomly
> >> takes longer (a GC pause, perhaps?), and the longer your overall write time,
> >> assuming a fixed number of indexing threads.
> >> 2. The parallelism of the optimize operation across replicas has gone back
> >> and forth a bit, and I’m not sure what it was doing in 4.9. However, at one
> >> point the optimize happened per replica, serially. So it’d do
> >> shard1_replica1, then when that was done, shard1_replica2, then
> >> shard2_replica1, etc. Other versions of Solr would do those at the same
> >> time. Again, I don’t know if you’re comparing to a non-replicated Solr
> >> index, but that could explain some of the difference.
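A toy model of the two timing effects described above. This is not Solr code: the class, its method names and the millisecond figures are hypothetical. An update that is acknowledged only after every replica has written it finishes when the slowest replica finishes. A serial, replica-by-replica optimize costs the sum of the per-replica times, while a parallel one costs roughly the slowest replica's time.

```java
import java.util.Arrays;

/**
 * Hypothetical toy model (not Solr code) of replica timing:
 * acknowledged writes wait for the slowest replica, and a serial
 * optimize adds up per-replica times while a parallel one does not.
 */
public class ReplicaTiming {

    // A write acknowledged only after all replicas finish: latency = max of per-replica times.
    static long ackedWriteMillis(long[] perReplicaMillis) {
        return Arrays.stream(perReplicaMillis).max().orElse(0L);
    }

    // Optimize run on one replica after another: total = sum of per-replica times.
    static long serialOptimizeMillis(long[] perReplicaMillis) {
        return Arrays.stream(perReplicaMillis).sum();
    }

    // Optimize run on all replicas at once: total = slowest replica.
    static long parallelOptimizeMillis(long[] perReplicaMillis) {
        return ackedWriteMillis(perReplicaMillis);
    }

    public static void main(String[] args) {
        // Hypothetical write timings: one replica hit a ~400 ms GC pause.
        long[] writes = {20, 22, 420, 21};
        System.out.println("acked write: " + ackedWriteMillis(writes) + " ms");

        // Hypothetical optimize timings for shard1_replica1, shard1_replica2, ...
        long[] optimizes = {60_000, 62_000, 58_000, 61_000};
        System.out.println("serial optimize: " + serialOptimizeMillis(optimizes) + " ms");
        System.out.println("parallel optimize: " + parallelOptimizeMillis(optimizes) + " ms");
    }
}
```

With 5 shards of 2 cores each, as in this thread, a serial optimize would touch 10 cores one after another. Under this model, that alone could make it several times slower than optimizing a single standalone index of similar per-core size.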
> >>
> >> There’s a sort of obligatory comment at this point that optimize
> >> doesn’t necessarily save you a lot. There are certainly cases where it
> >> does, but if you haven’t already, you’ll want to validate that you have one
> >> of them and that you’re not just doing unnecessary work.
> >>
> >>
> >> On 7/12/16, 7:41 AM, "Kent Mu" <solr.study@gmail.com> wrote:
> >>
> >> >Hello, has anybody else come across this issue? Can anybody help me?
> >> >
> >> >2016-07-11 23:17 GMT+08:00 Kent Mu <solr.study@gmail.com>:
> >> >
> >> >> Hi friends!
> >> >>
> >> >> Solr version: 4.9.0.
> >> >>
> >> >> We use both Solr and SolrCloud in our project, at the same time.
> >> >> But we find that SolrCloud consumes more time than Solr when writing
> >> >> the index: it takes nearly 5 or more times longer. Why is that?
> >> >>
> >> >> In our project, we have a scheduled job that adds documents to the index
> >> >> and then calls "optimize(false, true, 2)" to optimize the added index.
> >> >> I wonder if it is caused by SolrCloud internals: when writing the index,
> >> >> SolrCloud needs to determine which shard each document should be stored
> >> >> in. And when optimizing, does the replica need some time to synchronize
> >> >> the data from the leader?
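A minimal sketch of what "determining which shard" involves. It assumes hash-range routing like SolrCloud's default compositeId router, which hashes the document's uniqueKey and maps the hash into one of the shards' contiguous hash ranges. Solr uses MurmurHash3 for this. The sketch swaps in 32-bit FNV-1a to stay self-contained, so the shard numbers it prints will not match a real cluster. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch of hash-range document routing, in the spirit of
 * SolrCloud's compositeId router. FNV-1a stands in for MurmurHash3,
 * so results do NOT match real Solr shard assignments.
 */
public class HashRangeRouter {

    // 32-bit FNV-1a over the UTF-8 bytes of the id (stand-in for MurmurHash3).
    static int fnv1a32(String id) {
        int hash = 0x811C9DC5;
        for (byte b : id.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xFF);
            hash *= 0x01000193;
        }
        return hash;
    }

    // Split the 32-bit hash space into numShards equal contiguous ranges
    // and return the 0-based index of the range that contains the id's hash.
    static int shardFor(String id, int numShards) {
        long unsigned = fnv1a32(id) & 0xFFFFFFFFL;    // 0 .. 2^32 - 1
        return (int) ((unsigned * numShards) >>> 32); // 0 .. numShards - 1
    }

    public static void main(String[] args) {
        for (String id : new String[] {"doc-1", "doc-2", "doc-3"}) {
            System.out.println(id + " -> shard" + (shardFor(id, 5) + 1));
        }
    }
}
```

Per document, routing is just a hash plus a range lookup. That cost is likely small next to forwarding the update to the shard leader and on to its replicas, which is where the replica effects described earlier in the thread come in.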
> >> >>
> >> >> And what about queries? Will SolrCloud also take more time than
> >> >> Solr when querying data?
> >> >>
> >>
> >>
>
>
