lucene-solr-user mailing list archives

From Novin Novin <toe.al...@gmail.com>
Subject Re: Always use leader for searching queries
Date Wed, 03 Jan 2018 09:51:18 GMT
Hi Erick,

Thanks for your reply.

[ First of all, replicas can be off in terms of counts for the soft
commit interval. The commits don't all happen on the replicas at the
same wall-clock time. Solr promises eventual consistency, in this case
NOW-autocommit time.]

I realized that. To rule it out, I turned off auto soft commit for the
time being, but nothing changed: the non-leader replica still had extra
documents.

[ So my first question is whether the replicas in the shard are
inconsistent as of, say, NOW-your_soft_commit_time. I'd add a fudge
factor of 10 seconds earlier just to be sure I was past autowarming.
This does require that there be a time stamp. Absent a timestamp, you
could suspend indexing for a few minutes and run the test like below.]

While data was being indexed, I checked the counts on both replicas. I
found that the leader replica always has 3 docs fewer than the other
replica. I don't think they were off by NOW-soft_commit_time. CloudSolrClient
adds something like "_stateVer_=main:114" to the query, which I assume is
there to keep results consistent between the two replicas.
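
For reference, the timestamp check Erick describes could be expressed as
a filter query that ignores anything newer than the soft commit interval
plus the 10 second fudge factor. This is only a sketch: the field name
"timestamp" is hypothetical (the thread never names one), and the 20000 ms
interval is the auto soft commit value mentioned earlier in the thread.

```java
public class CutoffFilter {
    // Builds a Solr filter query that keeps only documents older than
    // (soft commit interval + fudge). The field name is an assumption;
    // use whatever date field the schema actually defines.
    public static String cutoffFilter(String timestampField, long softCommitMs, long fudgeMs) {
        long seconds = (softCommitMs + fudgeMs + 999) / 1000; // round up to whole seconds
        return timestampField + ":[* TO NOW-" + seconds + "SECONDS]";
    }

    public static void main(String[] args) {
        // 20000 ms soft commit + 10000 ms fudge factor
        System.out.println(cutoffFilter("timestamp", 20000, 10000));
        // prints: timestamp:[* TO NOW-30SECONDS]
    }
}
```

Passing the resulting string as an fq parameter to each replica should make
the counts comparable even while indexing continues.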

[Adding &distrib=false to your command and directing it at a specific
_core_ (something like collection1_shard1_replica1) will only return
data from that core.]
I probably don't need to do this because I have only one shard, but I did
it anyway and the counts were still different.
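
A rough sketch of the per-core check, for anyone following along. It only
builds the request URLs; the host and the core names Main_shard1_replica1
and Main_shard1_replica2 are assumptions based on the usual naming pattern,
not values taken from my cluster.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class CoreCountUrl {
    // Builds a select URL aimed at one specific core. distrib=false keeps
    // the request on that core; rows=0 returns only numFound.
    public static String countUrl(String baseUrl, String coreName, String query) {
        try {
            return baseUrl + "/" + coreName + "/select?q="
                    + URLEncoder.encode(query, "UTF-8")
                    + "&rows=0&distrib=false&wt=json";
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr"; // assumed host and port
        // Hypothetical core names for the two replicas of "Main"
        for (String core : new String[] {"Main_shard1_replica1", "Main_shard1_replica2"}) {
            System.out.println(countUrl(base, core, "*:*"));
        }
    }
}
```

Comparing numFound from the two responses shows which replica holds the
extra documents.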

[When you say you index every minute, I'm guessing you only index for
part of that minute, is that true? In that case you might get more
consistency if, instead of relying totally on your autoconfig
settings, specify commitWithin on your update command. That should
force the commits to happen more closely in-sync, although still not
perfect.]

We receive data every minute, so whenever we have new data we send it to
SolrCloud through a queue. You said not to rely on the autoconfig
settings. Do you mean I should turn off autocommit and use commitWithin
from SolrJ, or leave autoCommit as it is and also use commitWithin from
the SolrJ client?
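
To make the question concrete: as I understand it, commitWithin can be
passed as a request parameter on the update handler, and SolrJ has add()
overloads that take a commitWithin value in milliseconds. The sketch below
only builds the HTTP form of such a request; the host and the 10000 ms
value are assumptions for illustration, not a recommendation.

```java
public class CommitWithinUrl {
    // Builds an update URL that asks Solr to commit the posted documents
    // within the given number of milliseconds.
    public static String updateUrl(String baseUrl, String collection, int commitWithinMs) {
        if (commitWithinMs <= 0) {
            throw new IllegalArgumentException("commitWithinMs must be positive");
        }
        return baseUrl + "/" + collection + "/update?commitWithin=" + commitWithinMs;
    }

    public static void main(String[] args) {
        // "Main" is the collection from this thread; host and interval are assumed
        System.out.println(updateUrl("http://localhost:8983/solr", "Main", 10000));
        // prints: http://localhost:8983/solr/Main/update?commitWithin=10000
    }
}
```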

I apologize if I am not clear. Thanks again for your help.

Thanks in advance,
Navin





On Tue, 2 Jan 2018 at 18:05 Erick Erickson <erickerickson@gmail.com> wrote:

> First of all, replicas can be off in terms of counts for the soft
> commit interval. The commits don't all happen on the replicas at the
> same wall-clock time. Solr promises eventual consistency, in this case
> NOW-autocommit time.
>
> So my first question is whether the replicas in the shard are
> inconsistent as of, say, NOW-your_soft_commit_time. I'd add a fudge
> factor of 10 seconds earlier just to be sure I was past autowarming.
> This does require that there be a time stamp. Absent a timestamp, you
> could suspend indexing for a few minutes and run the test like below.
>
> Adding &distrib=false to your command and directing it at a specific
> _core_ (something like collection1_shard1_replica1) will only return
> data from that core.
>
> When you say you index every minute, I'm guessing you only index for
> part of that minute, is that true? In that case you might get more
> consistency if, instead of relying totally on your autoconfig
> settings, specify commitWithin on your update command. That should
> force the commits to happen more closely in-sync, although still not
> perfect.
>
> Another option if you're totally and completely sure that your commits
> happen _only_ from your indexing program is to fire the commit at the
> end of the run from your SolrJ program.
>
> Let us know,
> Erick
>
> On Tue, Jan 2, 2018 at 9:33 AM, Novin Novin <toe.alean@gmail.com> wrote:
> > Hi Erick,
> >
> > You are right, it is XY Problem.
> >
> > Allow me to explain as best I can. I have two replicas of one
> > collection called "Main". When I was using the search feature in my
> > application I got two different numFound counts. So I started digging,
> > and after spending 2-3 hours I found that one replica has a higher
> > numFound count than the other (the replica with the higher count was
> > not the leader). I am not sure how it ended up like that. This count
> > difference affects paging on my application side, not the Solr side.
> >
> > Extra info that might be useful to know:
> > Same query, not a single letter of difference.
> > auto soft commit 20000
> > soft commit 60000
> > indexing data every minute.
> >
> > Let me know if you need to know anything else. Any help would be
> > highly appreciated.
> >
> > Thanks in advance,
> > Navin
> >
> >
> >
> > On Tue, 2 Jan 2018 at 15:14 Erick Erickson <erickerickson@gmail.com> wrote:
> >
> >> This seems like an XY problem. You're asking how to do X
> >> because you think it will solve problem Y without telling
> >> us what Y is.
> >>
> >> I say this because on the surface this seems to defeat the
> >> purpose behind SolrCloud. Why would you want to only make
> >> use of one piece of hardware? That will limit your throughput,
> >> so why bother to have replicas in the first place?
> >>
> >> Or is this some kind of diagnostic you're trying to implement?
> >>
> >> Best,
> >> Erick
> >>
> >> On Tue, Jan 2, 2018 at 5:08 AM, Novin Novin <toe.alean@gmail.com> wrote:
> >> > Hi guys,
> >> >
> >> > I am using Solr 5.5.4 and the same version of SolrJ. My question:
> >> > is there any way I can tell CloudSolrClient to use only the leader
> >> > for queries?
> >> >
> >> > Thanks in advance.
> >> > Navin
> >>
>
