lucene-solr-user mailing list archives

From sgaron cse <sgaron....@gmail.com>
Subject Re: Realtime get not always returning existing data
Date Thu, 27 Sep 2018 17:48:27 GMT
So this is a SOLR core where we keep configuration data, so it is almost
never written to. The statistics for the core say it was last modified 4
hours ago, yet I got doc:null from the API an hour ago. Also, you don't
need a lot of data in the core for this to happen: this core has only
11 documents in it. The document I'm trying to fetch is about 45KB, if
that matters.

Another thing to note: this SOLR cloud instance is running multiple cores
(9 cores total) and some of them are getting completely hammered. But I
figured that each core is its own thing; I may be wrong.

BTW, I'm not 100% familiar with SOLR cloud, but I see in the Replication
section that Master (Searching) and Master (Replicable) show a different
version and a different gen. Not sure if that matters, or what it means.

Thanks for your help,
Steve
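Erick's reproduction idea in the quoted reply below (index a handful of docs, then hammer RTG calls against those IDs and watch for a failure) could be sketched roughly like this. It is a minimal Python sketch under stated assumptions: `hammer_rtg` and its `fetch` callable are hypothetical names, and `fetch` is assumed to wrap one Realtime Get call and return the value of the "doc" field (None when Solr answered doc:null).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Optional


def hammer_rtg(
    fetch: Callable[[str], Optional[dict]],
    ids: Iterable[str],
    rounds: int = 1000,
    workers: int = 8,
) -> Counter:
    """Call `fetch` for every id, `rounds` times, from `workers` threads.

    `fetch` is a hypothetical callable (an assumption, not a Solr API) that
    performs one Realtime Get and returns the "doc" field, or None for
    doc:null. Returns a Counter mapping each id to how often it came back null.
    """
    ids = list(ids)
    # Build the full list of calls up front: every id, repeated `rounds` times.
    calls = [doc_id for _ in range(rounds) for doc_id in ids]
    nulls: Counter = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so they line up with `calls`.
        for doc_id, doc in zip(calls, pool.map(fetch, calls)):
            if doc is None:
                nulls[doc_id] += 1
    return nulls
```

Running it once while all indexing is paused, and once while the busy cores are being hammered, would speak to Erick's question about whether the nulls only show up when indexing is going on.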

On Thu, Sep 27, 2018 at 1:30 PM Erick Erickson <erickerickson@gmail.com> wrote:

> Steve:
>
> Thanks. So theoretically I should be able to set up a cluster, index a
> bunch of docs to it and then just hammer RTG calls against those IDs
> and sometimes see a failure?
>
> Hmmm, I guess a follow-up question is whether there's any indexing
> going on at all when this happens. Or, more specifically, whether
> there's any time you see this problem when there's _no_ indexing going
> on.
>
> I understand that it's not recently-indexed docs that are not being
> found, but if there's indexing going on, searchers are being opened,
> caches flushed and the like. So if this happens even when there's no
> indexing going on, it'd help to reproduce/track it down.
>
> Erick
> On Thu, Sep 27, 2018 at 10:11 AM sgaron cse <sgaron.cse@gmail.com> wrote:
> >
> > Hey Erick,
> >
> > We're using SOLR 7.3.1, which is not the latest but still not too far
> > back.
> >
> > No, the document has not been recently indexed; in fact, I can use the
> > /search API endpoint to find it. But I need a fast way to find
> > documents that have not necessarily been indexed yet, so /search is out
> > of the question. Also, for context, the doc was last modified 3 days
> > ago, but we are still seeing the occasional doc:null return from the
> > Realtime Get API.
> >
> > Steve
> >
> > On Thu, Sep 27, 2018 at 12:52 PM Erick Erickson <erickerickson@gmail.com>
> > wrote:
> >
> > > What version of Solr are you running? Mostly that's for curiosity.
> > >
> > > Is the doc that's not returned something you've recently indexed?
> > > Here's a possible scenario:
> > > You send the doc out to be indexed. The primary forwards the doc to
> > > the followers. Before the follower has a chance to process it
> > > (process, not commit), you issue an RTG against that doc and it
> > > happens to be routed to a node that hasn't received it from the
> > > leader yet. Does this sound plausible in your scenario?
> > >
> > > Hmmm, I suppose it's not even a requirement that the request gets sent
> > > to a follower; it could easily be "in process" on the leader/primary.
> > >
> > > Best,
> > > Erick
> > > On Wed, Sep 26, 2018 at 11:55 AM sgaron cse <sgaron.cse@gmail.com>
> > > wrote:
> > > >
> > > > Hey all,
> > > >
> > > > We're trying to use SOLR for our document store and are facing some
> > > > issues with the Realtime Get API. Basically, we're doing an API call
> > > > from multiple endpoints to retrieve configuration data. The document
> > > > that we are retrieving does not change at all, but sometimes the API
> > > > returns a null document ({doc:null}). I'd say 99.99% of the time we
> > > > can retrieve the document fine, but once in a blue moon we get the
> > > > null document. The problem is that for us, if SOLR returns null, that
> > > > means the document does not exist, and because this is a document
> > > > that should be there, it causes all sorts of problems in our system.
> > > >
> > > > The API I call is the following:
> > > > http://{server_ip}/solr/config/get?id={id}&wt=json&fl=_source_
> > > >
> > > > As far as I understand from reading the documentation, the Realtime
> > > > Get API should get me the document no matter what, even if the
> > > > document is not yet committed to the index.
> > > >
> > > > I see no errors whatsoever in the SOLR logs that could help me with
> > > > this problem. In fact, there are no errors at all.
> > > >
> > > > As for our setup, because we're still in the testing phase, we only
> > > > have two SOLR instances running on the same box in cloud mode with
> > > > replication=1, which means that the core we run the Realtime Get on
> > > > is only present in one of the two instances. Our script randomly
> > > > chooses which instance it sends the query to, but as far as I
> > > > understand, in cloud mode the API call should be dispatched
> > > > automatically to the right instance.
> > > >
> > > > Am I missing anything here? Is it possible that there is a race
> > > > condition in the Realtime Get API that could return null data even
> > > > if the document exists?
> > > >
> > > > Thanks,
> > > > Steve
> > >
>
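For reference, the Realtime Get call from Steve's original message (http://{server_ip}/solr/config/get?id={id}&wt=json&fl=_source_) can be sketched in Python as below. This is a minimal sketch, not something from the thread: the helper names `rtg_url`, `parse_rtg` and `fetch_doc` are hypothetical, and it assumes the single-id Realtime Get response shape ({"doc": {...}} when found, {"doc": null} when not). It keeps a null result distinct from an unexpected response, so the caller decides what a null means instead of treating it as proof that the document does not exist.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen


def rtg_url(server: str, collection: str, doc_id: str, fl: str = "_source_") -> str:
    """Build a single-id Realtime Get URL: /solr/<collection>/get?id=..."""
    # urlencode escapes ids containing '&', spaces and similar characters.
    query = urlencode({"id": doc_id, "wt": "json", "fl": fl})
    return f"http://{server}/solr/{collection}/get?{query}"


def parse_rtg(body: str) -> Optional[dict]:
    """Return the document from a single-id RTG response, or None for doc:null."""
    payload = json.loads(body)
    if "doc" not in payload:
        # Not the single-id shape we expect: fail loudly rather than
        # reporting "document does not exist".
        raise ValueError(f"unexpected RTG response: {payload!r}")
    return payload["doc"]


def fetch_doc(server: str, collection: str, doc_id: str,
              timeout: float = 5.0) -> Optional[dict]:
    """Perform one Realtime Get over plain HTTP (no retries, no auth)."""
    # urlopen raises HTTPError for non-2xx statuses, so Solr error
    # responses never reach parse_rtg as a silent None.
    with urlopen(rtg_url(server, collection, doc_id), timeout=timeout) as resp:
        return parse_rtg(resp.read().decode("utf-8"))
```

Something like `fetch_doc("10.0.0.1:8983", "config", some_id)` (placeholder host and id) would return the stored fields, or None exactly when Solr itself answered doc:null.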
