lucene-solr-user mailing list archives

From Shalin Shekhar Mangar <shalinman...@gmail.com>
Subject Re: SolrCloud scaling/optimization for high request rate
Date Mon, 29 Oct 2018 07:51:18 GMT
What do your cache statistics look like? What are the hit ratio, size,
evictions, etc.?
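
For reference, here is a minimal Python sketch of building a request for those numbers against the admin/metrics endpoint that Erick mentions further down, and of summarizing the reply. The `group`, `prefix` and `wt` parameters belong to the Metrics API; the base URL, the registry name `solr.core.example`, and the exact shape and key names of the sample response (`hitratio`, `size`, `evictions`) are assumptions based on a typical Solr 7.x reply, so check them against your own output.

```python
import json
from urllib.parse import urlencode


def metrics_url(base_url, prefix="CACHE.searcher"):
    """Build a Metrics API URL restricted to core-level metrics with the given prefix."""
    query = urlencode({"group": "core", "prefix": prefix, "wt": "json"})
    return f"{base_url}/admin/metrics?{query}"


def summarize_caches(response):
    """Return {(registry, cache_name): (hitratio, size, evictions)}.

    Assumes response["metrics"] maps registry names to metric maps, which
    may differ between Solr versions and response-writer settings.
    """
    summary = {}
    for registry, metrics in response.get("metrics", {}).items():
        for name, stats in metrics.items():
            # Only cache metrics carry a hit ratio; skip everything else.
            if isinstance(stats, dict) and "hitratio" in stats:
                summary[(registry, name)] = (
                    stats["hitratio"],
                    stats.get("size"),
                    stats.get("evictions"),
                )
    return summary


# Hypothetical sample response, invented purely for illustration.
SAMPLE = json.loads("""
{
  "metrics": {
    "solr.core.example": {
      "CACHE.searcher.filterCache": {"hitratio": 0.5, "size": 512, "evictions": 100},
      "CACHE.searcher.queryResultCache": {"hitratio": 0.25, "size": 128, "evictions": 0}
    }
  }
}
""")

if __name__ == "__main__":
    print(metrics_url("http://localhost:8983/solr"))
    for key, (hitratio, size, evictions) in summarize_caches(SAMPLE).items():
        print(key, hitratio, size, evictions)
```

A low hit ratio combined with a high eviction count on a cache would suggest that it is too small for the query mix.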

More comments inline:

On Sat, Oct 27, 2018 at 8:23 AM Erick Erickson <erickerickson@gmail.com>
wrote:

> Sofiya:
>
> I haven't said so before, but it's a great pleasure to work with
> someone who's done a lot of homework before pinging the list. The only
> unfortunate bit is that it usually means the simple "Oh, I can fix
> that without thinking about it much" doesn't work ;)
>
> 2.  I'll clarify a bit here. Any TLOG replica can become the leader.
> Here's the process for an update:
> > doc comes in to the leader (may be TLOG)
> > doc is forwarded to all TLOG replicas, _but it is not indexed there_.
> > If the leader fails, the other TLOG replicas have enough documents in _their_ tlogs to "catch up" and one is elected
> > You're totally right that PULL replicas cannot become leaders
> > having all TLOG replicas means that the CPU cycles otherwise consumed by indexing are available for query processing.
>
> The point here is that TLOG replicas don't need to expend CPU cycles
> to index documents, freeing up all those cycles for serving queries.
>
> Now, that said you report that QPS rate doesn't particularly seem to
> be affected by whether you're indexing or not, so that makes using
> TLOG and PULL replicas less likely to solve your problem. I was
> thinking about your statement that you index as fast as possible....
>
>
> 6. This is a little surprising. Here's my guess: You're indexing in
> large batches and the batch is only really occupying a thread or two,
> so it's effectively serialized and thus not consuming a huge amount of
> resources.
>

The CloudSolrClient parallelizes updates to each shard leader. But in this
case, there is only 1 shard so all updates are serialized. All indexing
activity is therefore being performed by a single CPU at a time.
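
To make the one-shard point concrete, here is a toy Python model of client-side batching by shard. It is only an illustration: the `crc32`-modulo bucketing is a stand-in chosen for this sketch and is not Solr's actual document routing, and `group_by_shard` is a hypothetical helper, not a SolrJ method.

```python
import zlib
from collections import defaultdict


def group_by_shard(doc_ids, num_shards):
    """Bucket document ids per shard using a toy hash (NOT Solr's real router)."""
    batches = defaultdict(list)
    for doc_id in doc_ids:
        shard = zlib.crc32(doc_id.encode("utf-8")) % num_shards
        batches[shard].append(doc_id)
    return dict(batches)


doc_ids = [f"doc-{i}" for i in range(1000)]

# One shard: every document lands in the same batch, so there is only one
# request stream to send and nothing to run in parallel.
one_shard = group_by_shard(doc_ids, 1)

# Several shards: the same documents split into independent batches that a
# client could send to the different shard leaders concurrently.
four_shards = group_by_shard(doc_ids, 4)

if __name__ == "__main__":
    print("batches with 1 shard:", len(one_shard))
    print("batches with 4 shards:", len(four_shards))
```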


>
> So unless G1 really solves a lot of problems, more replicas are
> indicated. On machines with large amounts of RAM and lots of CPUs, one
> other option is to run multiple JVMs per physical node, which is
> sometimes helpful.
>
> One other possibility. In Solr 7.5, you have a ton of metrics
> available. If you hit the admin/metrics endpoint you'll see 150-200
> available metrics. Apart from running a profiler to see what's
> consuming the most cycles, the metrics can give you a view into what
> Solr is doing and may help you pinpoint what's using the most cycles.
>
> Best,
> Erick
> On Fri, Oct 26, 2018 at 12:23 PM Toke Eskildsen <toes@kb.dk> wrote:
> >
> > David Hastings <hastings.recursive@gmail.com> wrote:
> > > Would adding the docValues in the schema, but not reindexing, cause
> > > errors? I.e., only apply the doc values after the next reindex, but in
> > > the meantime keep functioning as if there were none until then?
> >
> > As soon as you specify in the schema that a field has docValues=true,
> > Solr treats all existing documents as having docValues enabled for that
> > field. As there is no docValues content, docValues-aware functionality such
> > as sorting and faceting will not work for that field until the documents
> > have been re-indexed.
> >
> > - Toke Eskildsen
>


-- 
Regards,
Shalin Shekhar Mangar.
