lucene-solr-user mailing list archives

From Shawn Heisey
Subject Re: Realtime get not always returning existing data
Date Fri, 28 Sep 2018 17:21:15 GMT
On 9/28/2018 6:09 AM, sgaron cse wrote:
> because this is a test deployment replica is set to 1 so as far as I
> understand, data will not be replicated for this core. Basically we have
> two SOLR instances running on the same box. One on port 8983, the other on
> port 8984. We have 9 cores on this SOLR cloud deployment, 5 of which on the
> instance on port 8983 and the other 4 on port 8984.

A question that isn't really related to the problem you're investigating 
now:  Why are you running two Solr instances on the same machine?  9 
cores is definitely not too many for one Solr instance.

> As far as I can tell
> all cores suffer from the occasional null document. But the one that I can
> easily see error from is a config core where we store configuration data
> for our system. Since the configuration data should always be there we
> throw exceptions as soon as we get a null document which is why I noticed
> the problem.

When you say "null document" do you mean that you get no results, or 
that you get a result with a document, but that document has nothing in 
it?  Are there any errors returned or logged by Solr when this happens?

> Our client code that connects to the APIs randomly chooses between all the
> different ports because it does not know which instance it should ask. So
> no, we did not try sending directly to the instance that has the data but
> since there is no replica there is no way that this should get out of sync.

I was suggesting this as a troubleshooting step, not a change to how you 
use Solr.  The goal is to determine what happens if you send a request 
with distrib=false directly to the instance and core that contains the 
document, and whether that behaves differently from a more generic 
collection-directed query.  That should help narrow down exactly where 
to look for the problem.
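
A rough sketch of what such a direct request could look like from a small Python script. The base URL, the core name and the document id used here are placeholders, not values from your setup, and I am assuming the default /get handler is what your client uses for realtime get:

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def build_rtg_url(base_url, core, doc_id, distrib=False):
    """Build a realtime-get URL aimed at one specific core.

    base_url, core and doc_id are supplied by the caller; nothing here
    is discovered from the cluster.
    """
    params = {"id": doc_id, "wt": "json"}
    if not distrib:
        # Ask the receiving core to answer from its own data only.
        params["distrib"] = "false"
    return f"{base_url.rstrip('/')}/{quote(core)}/get?{urlencode(params)}"


def realtime_get(base_url, core, doc_id, distrib=False, timeout=5.0):
    """Return the document dict, or None when the response has a null doc."""
    url = build_rtg_url(base_url, core, doc_id, distrib)
    with urlopen(url, timeout=timeout) as resp:
        body = json.load(resp)
    # With the single "id" parameter, the handler answers {"doc": ...}.
    return body.get("doc")
```

Calling realtime_get("http://localhost:8983/solr", "<core name>", "<id>") against the instance that holds the core, and then the same call with distrib=True against the other port, would show whether the two paths behave differently.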

If you wait a few seconds, does the problem go away?  When using real 
time get, a new document must be written to a segment and a new realtime 
searcher must be created before you can get that document.  These things 
typically happen very quickly, but they are not instantaneous.
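
One way to check whether waiting helps is to retry the lookup a few times with a short pause and record which attempt succeeds. This is only a sketch: fetch stands for whatever call your client already makes to retrieve the document (assumed to return None for a missing document), and the attempt count and delay are arbitrary values:

```python
import time


def get_with_retry(fetch, attempts=5, delay=0.5, sleep=time.sleep):
    """Call fetch() until it returns a non-None document.

    Returns (document, attempt_number). If every attempt comes back
    empty, returns (None, attempts).
    """
    for attempt in range(1, attempts + 1):
        doc = fetch()
        if doc is not None:
            return doc, attempt
        if attempt < attempts:
            # Pause before the next try; no pause after the last one.
            sleep(delay)
    return None, attempts
```

If the document reliably shows up on the second or third attempt, that points at timing. If it never shows up, the problem is somewhere else.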

> To add to what Chris was saying, although the core that is seeing the
> issue is not hit very hard, other cores in the setup will be. We are
> building a clustering environment that has auto-scaling so if we are under
> heavy load, we can easily have 200-300 clients hitting the SOLR instance
> simultaneously.

That much traffic is going to need multiple replicas on separate 
hardware, with something in place to do load balancing. Unless your code 
is Java and you can use CloudSolrClient, I would recommend an external 
load balancer.

