lucene-solr-user mailing list archives

From Hendrik Haddorp <hendrik.hadd...@gmx.net>
Subject Re: Recovery Issue - Solr 6.6.1 and HDFS
Date Sat, 09 Dec 2017 12:46:10 GMT
Hi,

For the HDFS case, wouldn't it be nice if there were a mode in which the
replicas just read the same index files as the leader? After all, the data
is already on a shared, readable file system, so why would one even need to
replicate the transaction log files?

regards,
Hendrik

On 08.12.2017 21:07, Erick Erickson wrote:
> bq: Will TLOG replicas use less network bandwidth?
>
> No, probably more bandwidth. TLOG replicas work like this:
> 1> the raw docs are forwarded
> 2> the old-style master/slave replication is used
>
> So what you do save is CPU processing on the TLOG replica in exchange
> for increased bandwidth.
>
> Since the only thing forwarded to NRT replicas (outside of recovery)
> is the raw documents, I expect that TLOG replicas would _increase_
> network usage. The deal is that TLOG replicas can take over leadership
> if the leader goes down so they must have an
> up-to-date-after-last-index-sync set of tlogs.
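>
> For reference, a minimal sketch of how one opts into TLOG replicas via the
> Collections API in 7.x (collection name, shard count and replica counts
> here are just placeholders):
>
>    curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&tlogReplicas=2&nrtReplicas=0"
>
> With nrtReplicas=0 and tlogReplicas=2 every replica keeps a tlog, and the
> non-leaders pull index segments from the leader instead of indexing each
> document themselves -- which is the 1>/2> behavior above.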
>
> At least that's my current understanding...
>
> Best,
> Erick
>
> On Fri, Dec 8, 2017 at 12:01 PM, Joe Obernberger
> <joseph.obernberger@gmail.com> wrote:
>> Anyone have any thoughts on this?  Will TLOG replicas use less network
>> bandwidth?
>>
>> -Joe
>>
>>
>> On 12/4/2017 12:54 PM, Joe Obernberger wrote:
>>> Hi All - this same problem happened again, and I think I partially
>>> understand what is going on.  The part I don't know is what caused any of
>>> the replicas to go into full recovery in the first place, but once they do,
>>> they drive the network interfaces on the servers to full utilization in
>>> both the in and out directions.  It appears that when a Solr replica needs
>>> to recover, it calls on the leader for all the data.  In HDFS, the data
>>> from the leader's point of view goes:
>>>
>>> HDFS --> Solr Leader Process -->Network--> Replica Solr Process -->HDFS
>>>
>>> Do I have this correct?  That poor network in the middle becomes a
>>> bottleneck and causes other replicas to go into recovery, which causes more
>>> network traffic.  Perhaps going to TLOG replicas with 7.1 would be better
>>> with HDFS?  Would it be possible for the leader to send a message to the
>>> replica telling it to get the data straight from HDFS instead of copying it
>>> from one Solr process to another?  HDFS would be better able to spread that
>>> load across the cluster, since each block has 3x replicas.  Perhaps there
>>> is a better way to handle replicas with a shared file system.
>>>
>>> Our current plan to fix the issue is to go to Solr 7.1.0 and use TLOG.
>>> Good idea?  Thank you!
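>>>
>>> (For the switch itself, one way we might do it - just a sketch, with
>>> placeholder collection/shard names - is to add TLOG replicas alongside the
>>> existing NRT ones via the Collections API and drop the NRT copies once the
>>> new replicas are active:
>>>
>>>    curl "http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1&type=TLOG"
>>>
>>> repeated per shard, followed by DELETEREPLICA for the old NRT cores.)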
>>>
>>> -Joe
>>>
>>>
>>> On 11/22/2017 8:17 PM, Erick Erickson wrote:
>>>> Hmm. This is quite possible. Any time things take "too long" it can be
>>>> a problem. For instance, if the leader sends docs to a replica and
>>>> the request times out, the leader throws the follower into "Leader
>>>> Initiated Recovery". The smoking gun here is that there are no errors
>>>> on the follower, just the notification that the leader put it into
>>>> recovery.
>>>>
>>>> There are other variations on the theme; it all boils down to this: when
>>>> communications fall apart, replicas go into recovery.....
>>>>
>>>> Best,
>>>> Erick
>>>>
>>>> On Wed, Nov 22, 2017 at 11:02 AM, Joe Obernberger
>>>> <joseph.obernberger@gmail.com> wrote:
>>>>> Hi Shawn - thank you for your reply. The index is 29.9TBytes as reported
>>>>> by:
>>>>> hadoop fs -du -s -h /solr6.6.0
>>>>> 29.9 T  89.9 T  /solr6.6.0
>>>>>
>>>>> The 89.9TBytes is due to HDFS having 3x replication.  There are about
>>>>> 1.1 billion documents indexed and we index about 2.5 million documents
>>>>> per day.  Assuming an even distribution, each node is handling about
>>>>> 680GBytes of index, so our cache size is 1.4%.  Perhaps 'relatively
>>>>> small block cache' was an understatement!  This is why we split the
>>>>> largest collection into two, where one is data going back 30 days, and
>>>>> the other is all the data.  Most of our searches are not longer than 30
>>>>> days back.  The 30 day index is 2.6TBytes total.  I don't know how the
>>>>> HDFS block cache splits between collections, but the 30 day index
>>>>> performs acceptably for our specific application.
>>>>>
>>>>> If we wanted to cache 50% of the index, each of our 45 nodes would need
>>>>> a block cache of about 350GBytes.  I'm accepting offers of DIMMs!
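>>>>>
>>>>> Rough numbers behind that, for anyone checking the math:
>>>>>
>>>>>    29.9 TB of index / 45 nodes ~= 680 GB of index per node
>>>>>    10 GB cache per node        ~= 1.4-1.5% of that per-node index
>>>>>    680 GB * 50%                ~= 340 GB of block cache per node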
>>>>>
>>>>> What I believe caused our 'recovery, fail, retry loop' was that one of
>>>>> our servers died.  This caused HDFS to start replicating blocks across
>>>>> the cluster and produced a lot of network activity.  When this happened,
>>>>> I believe there was high network contention for specific nodes in the
>>>>> cluster; their network interfaces became pegged and requests for HDFS
>>>>> blocks timed out.  When that happened, SolrCloud went into recovery,
>>>>> which caused more network traffic.  Fun stuff.
>>>>>
>>>>> -Joe
>>>>>
>>>>>
>>>>> On 11/22/2017 11:44 AM, Shawn Heisey wrote:
>>>>>> On 11/22/2017 6:44 AM, Joe Obernberger wrote:
>>>>>>> Right now, we have a relatively small block cache due to the
>>>>>>> requirement that the servers run other software.  We tried to find
>>>>>>> the best balance between block cache size and RAM for programs, while
>>>>>>> still giving enough for local FS cache.  This came out to be 84 128M
>>>>>>> blocks - or about 10G for the cache per node (45 nodes total).
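>>>>>>>
>>>>>>> For the curious, a sketch of how that sizing is expressed when starting
>>>>>>> Solr on HDFS (the NameNode address is a placeholder; each slab is 128M
>>>>>>> of off-heap cache, so the slab count is the knob):
>>>>>>>
>>>>>>>    bin/solr start -c \
>>>>>>>      -Dsolr.directoryFactory=HdfsDirectoryFactory \
>>>>>>>      -Dsolr.hdfs.home=hdfs://namenode:8020/solr6.6.0 \
>>>>>>>      -Dsolr.hdfs.blockcache.slab.count=84 \
>>>>>>>      -Dsolr.hdfs.blockcache.direct.memory.allocation=true
>>>>>>>
>>>>>>> 84 slabs * 128M ~= 10.5G, which also has to fit within
>>>>>>> -XX:MaxDirectMemorySize when direct memory allocation is enabled.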
>>>>>> How much data is being handled on a server with 10GB allocated for
>>>>>> caching HDFS data?
>>>>>>
>>>>>> The first message in this thread says the index size is 31TB, which is
>>>>>> *enormous*.  You have also said that the index takes 93TB of disk
>>>>>> space.  If the data is distributed somewhat evenly, then the answer to
>>>>>> my question would be that each of those 45 Solr servers would be
>>>>>> handling over 2TB of data.  A 10GB cache is *nothing* compared to 2TB.
>>>>>>
>>>>>> When index data that Solr needs to access for an operation is not in
>>>>>> the cache and Solr must actually wait for disk and/or network I/O, the
>>>>>> resulting performance usually isn't very good.  In most cases you don't
>>>>>> need to have enough memory to fully cache the index data ... but less
>>>>>> than half a percent is not going to be enough.
>>>>>>
>>>>>> Thanks,
>>>>>> Shawn
>>>>>>

