lucene-solr-user mailing list archives

From Hendrik Haddorp <>
Subject Re: Recovery Issue - Solr 6.6.1 and HDFS
Date Sat, 09 Dec 2017 12:46:10 GMT

For the HDFS case, wouldn't it be nice if there were a mode in which the
replicas just read the same index files as the leader? After all, the
data is already on a shared, readable file system, so why would one even
need to replicate the transaction log files?


On 08.12.2017 21:07, Erick Erickson wrote:
> bq: Will TLOG replicas use less network bandwidth?
> No, probably more bandwidth. TLOG replicas work like this:
> 1> the raw docs are forwarded
> 2> the old-style master/slave replication is used
> So what you do save is CPU processing on the TLOG replica in exchange
> for increased bandwidth.
> Since the only thing forwarded in NRT replicas (outside of recovery)
> is the raw documents, I expect that TLOG replicas would _increase_
> network usage. The deal is that TLOG replicas can take over leadership
> if the leader goes down so they must have an
> up-to-date-after-last-index-sync set of tlogs.
> At least that's my current understanding...
> Best,
> Erick
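
For reference, the replica types Erick describes are chosen when a collection is created. A sketch of requesting TLOG-only replicas through the Collections API on Solr 7.x (the collection name, shard/replica counts, and host are assumptions):

```
# Request TLOG replicas instead of NRT ones at collection-creation time;
# nrtReplicas=0 means every replica is TLOG-type (and leader-eligible).
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=mycoll&numShards=5&tlogReplicas=2&nrtReplicas=0"
```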
> On Fri, Dec 8, 2017 at 12:01 PM, Joe Obernberger
> <> wrote:
>> Anyone have any thoughts on this?  Will TLOG replicas use less network
>> bandwidth?
>> -Joe
>> On 12/4/2017 12:54 PM, Joe Obernberger wrote:
>>> Hi All - this same problem happened again, and I think I partially
>>> understand what is going on.  The part I don't know is what caused any of
>>> the replicas to go into full recovery in the first place, but once they do,
>>> they cause network interfaces on servers to go fully utilized in both in/out
>>> directions.  It appears that when a solr replica needs to recover, it calls
>>> on the leader for all the data.  In HDFS, the data from the leader's point
>>> of view goes:
>>> HDFS --> Solr Leader Process -->Network--> Replica Solr Process -->HDFS
>>> Do I have this correct?  That poor network in the middle becomes a
>>> bottleneck and causes other replicas to go into recovery, which causes more
>>> network traffic.  Perhaps going to TLOG replicas with 7.1 would be better
>>> with HDFS?  Would it be possible for the leader to send a message to the
>>> replica to instead get the data straight from HDFS instead of going from one
>>> solr process to another?  HDFS would better be able to use the cluster since
>>> each block has 3x replicas.  Perhaps there is a better way to handle
>>> replicas with a shared file system.
>>> Our current plan to fix the issue is to go to Solr 7.1.0 and use TLOG.
>>> Good idea?  Thank you!
>>> -Joe
>>> On 11/22/2017 8:17 PM, Erick Erickson wrote:
>>>> Hmm. This is quite possible. Any time things take "too long" it can be
>>>> a problem. For instance, if the leader sends docs to a replica and
>>>> the request times out, the leader throws the follower into "Leader
>>>> Initiated Recovery". The smoking gun here is that there are no errors
>>>> on the follower, just the notification that the leader put it into
>>>> recovery.
>>>> There are other variations on the theme, but it all boils down to
>>>> this: when communications fall apart, replicas go into recovery.
>>>> Best,
>>>> Erick
>>>> On Wed, Nov 22, 2017 at 11:02 AM, Joe Obernberger
>>>> <> wrote:
>>>>> Hi Shawn - thank you for your reply. The index is 29.9TBytes as reported
>>>>> by:
>>>>> hadoop fs -du -s -h /solr6.6.0
>>>>> 29.9 T  89.9 T  /solr6.6.0
>>>>> The 89.9TBytes is due to HDFS having 3x replication.  There are
>>>>> about 1.1 billion documents indexed, and we index about 2.5 million
>>>>> documents per day.
>>>>> Assuming an even distribution, each node is handling about 680GBytes
>>>>> of index, so our cache covers about 1.4% of it.  Perhaps 'relatively
>>>>> small block cache' was an understatement!  This is why we split the
>>>>> largest collection into two, where one is data going back 30 days,
>>>>> and the other is all the data.  Most of our searches are not longer
>>>>> than 30 days back.  The 30 day index is 2.6TBytes total.  I don't
>>>>> know how the HDFS block cache splits between collections, but the 30
>>>>> day index performs acceptably for our specific application.
>>>>> If we wanted to cache 50% of the index, each of our 45 nodes would
>>>>> need a block cache of about 350GBytes.  I'm accepting offers of
>>>>> DIMMs!
>>>>> What I believe caused our 'recovery, fail, retry loop' was that one
>>>>> of our servers died.  This caused HDFS to start replicating blocks
>>>>> across the cluster, which produced a lot of network activity.  When
>>>>> this happened, I believe there was high network contention for
>>>>> specific nodes in the cluster; their network interfaces became pegged
>>>>> and requests for HDFS blocks timed out.  When that happened,
>>>>> SolrCloud went into recovery, which caused more network traffic.  Fun
>>>>> stuff.
>>>>> -Joe
>>>>> On 11/22/2017 11:44 AM, Shawn Heisey wrote:
>>>>>> On 11/22/2017 6:44 AM, Joe Obernberger wrote:
>>>>>>> Right now, we have a relatively small block cache due to the
>>>>>>> requirements that the servers run other software.  We tried to find
>>>>>>> the best balance between block cache size and RAM for programs,
>>>>>>> while still giving enough for local FS cache.  This came out to be
>>>>>>> 84 128M blocks - or about 10G for the cache per node (45 nodes
>>>>>>> total).
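
For context, the block cache described above ('84 128M blocks') is configured on Solr's HdfsDirectoryFactory in solrconfig.xml, where each slab is 128MB. A sketch matching those numbers (the HDFS home path is an assumption):

```xml
<!-- 84 slabs x 128MB per slab is roughly 10G of block cache per node. -->
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://nameservice/solr6.6.0</str>
  <bool name="solr.hdfs.blockcache.enabled">true</bool>
  <int name="solr.hdfs.blockcache.slab.count">84</int>
  <bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>
</directoryFactory>
```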
>>>>>> How much data is being handled on a server with 10GB allocated for
>>>>>> caching HDFS data?
>>>>>> The first message in this thread says the index size is 31TB, which
>>>>>> is *enormous*.  You have also said that the index takes 93TB of disk
>>>>>> space.  If the data is distributed somewhat evenly, then the answer
>>>>>> to my question would be that each of those 45 Solr servers would be
>>>>>> handling over 2TB of data.  A 10GB cache is *nothing* compared to
>>>>>> that.
>>>>>> When index data that Solr needs to access for an operation is not in
>>>>>> the cache and Solr must actually wait for disk and/or network I/O,
>>>>>> the resulting performance usually isn't very good.  In most cases
>>>>>> you would need to have enough memory to fully cache the index data
>>>>>> ... but less than half a percent is not going to be enough.
>>>>>> Thanks,
>>>>>> Shawn
