lucene-solr-user mailing list archives

From Walter Underwood <wun...@wunderwood.org>
Subject Re: Always use leader for searching queries
Date Wed, 03 Jan 2018 17:09:21 GMT
If you have a field for the indexed datetime, you can use a filter query to get rid of recent
updates that might be in transit. I’d use double the autocommit time, to leave time for
the followers to index.

If the autocommit interval is one minute:

fq=indexed_datetime:[* TO NOW-2MINUTES]
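
A minimal sketch of generating that filter from the autocommit interval, in Python with only the standard library. The function names lag_filter and select_params are made up for illustration, and indexed_datetime is assumed to be the name of your indexed-time field. It writes the lag in SECONDS, which Solr date math accepts alongside MINUTES.

```python
from urllib.parse import urlencode


def lag_filter(autocommit_seconds, field="indexed_datetime", factor=2):
    """Return an fq clause hiding documents newer than factor * autocommit."""
    # Doubling the autocommit interval leaves time for the followers to index.
    lag = int(autocommit_seconds * factor)
    return f"{field}:[* TO NOW-{lag}SECONDS]"


def select_params(query, autocommit_seconds):
    """URL-encode q plus the lag filter, ready to append to a /select URL."""
    return urlencode({"q": query, "fq": lag_filter(autocommit_seconds)})
```

For a one-minute autocommit, lag_filter(60) returns indexed_datetime:[* TO NOW-120SECONDS].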

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Jan 3, 2018, at 8:58 AM, Erick Erickson <erickerickson@gmail.com> wrote:
> 
> [I probably do not need to do this because I have only one shard, but I did
> it anyway and the count was different.]
> 
> This isn't what I meant. I meant to query each replica directly
> _within_ the same shard. Your problem statement is that the leader and
> replicas (I use "followers") have different document counts. How are
> you verifying this? Through the admin UI? Using &distrib=false is
> useful when you want to query each core directly (and you have to use
> the core name) in some automated fashion.
> 
> [I have actually turned off auto soft commit for the time being but
> nothing changed]
> 
> OK, I'm assuming then that you issue a manual commit sometime, right?
> Here's what I'd do:
> 1> turn off indexing
> 2> issue a commit (soft, or hard with openSearcher=true)
> 3> now look at your doc counts on each replica.
> 
> If the counts are different then something's not right. Solr tries
> very hard not to lose data, so it's concerning if the leader and
> replicas have different counts.
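
The check described above (stop indexing, commit, then compare counts per replica) could be scripted roughly as follows. This is a sketch in Python using only the standard library, not a tested tool: the helper names are invented, and the base URLs and core names you pass in are whatever your admin UI shows. It sends distrib=false to each core so that each count comes from that core alone.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def count_url(base_url, core):
    """Build a /select URL that counts documents on one core only."""
    params = urlencode({"q": "*:*", "rows": 0, "distrib": "false", "wt": "json"})
    return f"{base_url.rstrip('/')}/{core}/select?{params}"


def num_found(body):
    """Extract numFound from a Solr JSON response body."""
    return json.loads(body)["response"]["numFound"]


def replica_counts(cores):
    """Query each (base_url, core) pair directly; return {core: numFound}."""
    counts = {}
    for base_url, core in cores:
        with urlopen(count_url(base_url, core), timeout=10) as resp:
            counts[core] = num_found(resp.read().decode("utf-8"))
    return counts


def consistent(counts):
    """True when every replica reports the same document count."""
    return len(set(counts.values())) <= 1
```

A call might look like replica_counts([("http://localhost:8983/solr", "main_shard1_replica1"), ("http://localhost:8984/solr", "main_shard1_replica2")]), where the hosts and core names are placeholders.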
> 
> Best,
> Erick
> 
> On Wed, Jan 3, 2018 at 1:51 AM, Novin Novin <toe.alean@gmail.com> wrote:
>> Hi Erick,
>> 
>> Thanks for your reply.
>> 
>> [ First of all, replicas can be off in terms of counts for the soft
>> commit interval. The commits don't all happen on the replicas at the
>> same wall-clock time. Solr promises eventual consistency, in this case
>> NOW-autocommit time.]
>> 
>> I realized that. To rule it out, I have turned off auto soft commit for
>> the time being, but nothing changed. The non-leader replica still had extra
>> documents.
>> 
>> [ So my first question is whether the replicas in the shard are
>> inconsistent as of, say, NOW-your_soft_commit_time. I'd add a fudge
>> factor of 10 seconds earlier just to be sure I was past autowarming.
>> This does require that there be a time stamp. Absent a timestamp, you
>> could suspend indexing for a few minutes and run the test like below.]
>> 
>> While data was being indexed, I checked the counts on both replicas. I
>> found that the leader always has 3 documents fewer than the other replica.
>> I don't think they were off by NOW-soft_commit_time. CloudSolrClient adds
>> something like "_stateVer_=main:114" to the query, which I assume is meant
>> to keep results consistent between the two replicas.
>> 
>> [Adding &distrib=false to your command and directing it at a specific
>> _core_ (something like collection1_shard1_replica1) will only return
>> data from that core.]
>> I probably do not need to do this because I have only one shard, but I did
>> it anyway and the count was different.
>> 
>> [When you say you index every minute, I'm guessing you only index for
>> part of that minute, is that true? In that case you might get more
>> consistency if, instead of relying totally on your autoconfig
>> settings, specify commitWithin on your update command. That should
>> force the commits to happen more closely in-sync, although still not
>> perfect.]
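
For reference, a rough sketch of attaching commitWithin to an update request over HTTP, in Python with only the standard library. The function name, host, and document are hypothetical, and it assumes the collection's /update handler accepts a JSON array of documents. The value of commitWithin is in milliseconds.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_update_request(base_url, collection, docs, commit_within_ms):
    """Build (but do not send) a POST to /update carrying commitWithin."""
    query = urlencode({"commitWithin": commit_within_ms})
    url = f"{base_url.rstrip('/')}/{collection}/update?{query}"
    body = json.dumps(docs).encode("utf-8")
    return Request(url, data=body, headers={"Content-Type": "application/json"})
```

Passing the returned request to urllib.request.urlopen would send it. From SolrJ, the equivalent is to supply a commitWithin value when adding documents.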
>> 
>> We receive data every minute, so whenever we have new data we send it to
>> SolrCloud using a queue. You said not to rely on the autocommit settings.
>> Do you mean I should turn off autocommit and use commitWithin from SolrJ,
>> or leave autoCommit as it is and also use commitWithin from the SolrJ client?
>> 
>> I apologize if I am not clear. Thanks for your help again.
>> 
>> Thanks in advance,
>> Navin
>> 
>> 
>> 
>> 
>> 
>> On Tue, 2 Jan 2018 at 18:05 Erick Erickson <erickerickson@gmail.com> wrote:
>> 
>>> First of all, replicas can be off in terms of counts for the soft
>>> commit interval. The commits don't all happen on the replicas at the
>>> same wall-clock time. Solr promises eventual consistency, in this case
>>> NOW-autocommit time.
>>> 
>>> So my first question is whether the replicas in the shard are
>>> inconsistent as of, say, NOW-your_soft_commit_time. I'd add a fudge
>>> factor of 10 seconds earlier just to be sure I was past autowarming.
>>> This does require that there be a time stamp. Absent a timestamp, you
>>> could suspend indexing for a few minutes and run the test like below.
>>> 
>>> Adding &distrib=false to your command and directing it at a specific
>>> _core_ (something like collection1_shard1_replica1) will only return
>>> data from that core.
>>> 
>>> When you say you index every minute, I'm guessing you only index for
>>> part of that minute, is that true? In that case you might get more
>>> consistency if, instead of relying totally on your autoconfig
>>> settings, specify commitWithin on your update command. That should
>>> force the commits to happen more closely in-sync, although still not
>>> perfect.
>>> 
>>> Another option if you're totally and completely sure that your commits
>>> happen _only_ from your indexing program is to fire the commit at the
>>> end of the run from your SolrJ program.
>>> 
>>> Let us know,
>>> Erick
>>> 
>>> On Tue, Jan 2, 2018 at 9:33 AM, Novin Novin <toe.alean@gmail.com> wrote:
>>>> Hi Erick,
>>>> 
>>>> You are right, it is an XY problem.
>>>> 
>>>> Allow me to explain as best I can. I have two replicas of one collection
>>>> called "Main". When I used the search feature in my application, I got two
>>>> different numFound counts. After digging for 2-3 hours, I found that one
>>>> replica has a higher numFound count than the other (the replica with the
>>>> higher count is not the leader). I am not sure how it ended up like that.
>>>> This count difference affects paging on my application side, not on the
>>>> Solr side.
>>>> 
>>>> Extra info that might be useful to know:
>>>> Same query, not a single letter of difference.
>>>> auto soft commit 20000
>>>> soft commit 60000
>>>> indexing data every minute.
>>>> 
>>>> Let me know if you need to know anything else. Any help would be highly
>>>> appreciated.
>>>> 
>>>> Thanks in advance,
>>>> Navin
>>>> 
>>>> 
>>>> 
>>>> On Tue, 2 Jan 2018 at 15:14 Erick Erickson <erickerickson@gmail.com> wrote:
>>>> 
>>>>> This seems like an XY problem. You're asking how to do X
>>>>> because you think it will solve problem Y without telling
>>>>> us what Y is.
>>>>> 
>>>>> I say this because on the surface this seems to defeat the
>>>>> purpose behind SolrCloud. Why would you want to only make
>>>>> use of one piece of hardware? That will limit your throughput,
>>>>> so why bother to have replicas in the first place?
>>>>> 
>>>>> Or is this some kind of diagnostic you're trying to implement?
>>>>> 
>>>>> Best,
>>>>> Erick
>>>>> 
>>>>> On Tue, Jan 2, 2018 at 5:08 AM, Novin Novin <toe.alean@gmail.com> wrote:
>>>>>> Hi guys,
>>>>>> 
>>>>>> I am using Solr 5.5.4 and the same version of SolrJ. My question: is
>>>>>> there any way I can tell the cloud Solr client to use only the leader
>>>>>> for queries?
>>>>>> 
>>>>>> Thanks in advance.
>>>>>> Navin
>>>>> 
>>> 

