lucene-solr-user mailing list archives

From Joe Obernberger <joseph.obernber...@gmail.com>
Subject Re: Recovery Issue - Solr 6.6.1 and HDFS
Date Wed, 22 Nov 2017 19:02:08 GMT
Hi Shawn - thank you for your reply.  The index is 29.9TBytes as 
reported by:
hadoop fs -du -s -h /solr6.6.0
29.9 T  89.9 T  /solr6.6.0

The 89.9TBytes is due to HDFS having 3x replication.  There are about 
1.1 billion documents indexed and we index about 2.5 million documents 
per day.  Assuming an even distribution, each node is handling about 
680GBytes of index.  So our cache size is 1.4%. Perhaps 'relatively 
small block cache' was an understatement! This is why we split the 
largest collection into two, where one is data going back 30 days, and 
the other is all the data.  Most of our searches go back no more than 30 
days.  The 30 day index is 2.6TBytes total.  I don't know how the 
HDFS block cache splits between collections, but the 30 day index 
performs acceptably for our specific application.

If we wanted to cache 50% of the index, each of our 45 nodes would need 
a block cache of about 350GBytes.  I'm accepting offers of DIMMs!
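
For anyone who wants to redo the arithmetic above with their own numbers, here is a rough sketch in Python. It only uses the figures quoted in this thread (29.9TBytes of index, 45 nodes, 84 blocks of 128M per node). The function names are made up for illustration, and it assumes binary units (1 TByte = 1024 GBytes) and a perfectly even spread of the index across nodes, neither of which is guaranteed on a real cluster.

```python
# Back-of-the-envelope block cache sizing from the figures in this thread.
# Assumptions (not confirmed by the thread): binary units and an even
# distribution of index data across all nodes.

GB_PER_TB = 1024
MB_PER_GB = 1024


def per_node_index_gb(index_tb: float, nodes: int) -> float:
    """Index data each node handles, in GBytes, assuming an even spread."""
    return index_tb * GB_PER_TB / nodes


def block_cache_gb(block_count: int, block_mb: int) -> float:
    """Total block cache per node, in GBytes."""
    return block_count * block_mb / MB_PER_GB


def cache_needed_gb(index_gb: float, target_fraction: float) -> float:
    """Cache per node needed to hold target_fraction of that node's index."""
    return index_gb * target_fraction


index_gb = per_node_index_gb(index_tb=29.9, nodes=45)
cache_gb = block_cache_gb(block_count=84, block_mb=128)
cached_fraction = cache_gb / index_gb
half_index_gb = cache_needed_gb(index_gb, 0.5)

print(f"index per node:      {index_gb:.0f} GB")
print(f"block cache per node: {cache_gb:.1f} GB")
print(f"fraction cached:     {cached_fraction:.1%}")
print(f"cache for 50%:       {half_index_gb:.0f} GB")
```

With these inputs the per-node index comes out near 680GBytes and the cached fraction near 1.5%, in line with the estimates above; the exact percentage shifts slightly depending on whether the cache is counted as 10G or 10.5G.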

I believe our 'recovery, fail, retry loop' was caused by the death of one 
of our servers.  That made HDFS start re-replicating blocks across the 
cluster, which produced a lot of network activity.  I believe this led 
to high network contention on specific nodes in the cluster: their 
network interfaces became pegged and requests for HDFS blocks timed 
out.  When that happened, SolrCloud went into recovery, which caused 
even more network traffic.  Fun stuff.

-Joe


On 11/22/2017 11:44 AM, Shawn Heisey wrote:
> On 11/22/2017 6:44 AM, Joe Obernberger wrote:
>> Right now, we have a relatively small block cache due to the
>> requirements that the servers run other software.  We tried to find
>> the best balance between block cache size, and RAM for programs, while
>> still giving enough for local FS cache.  This came out to be 84 128M
>> blocks - or about 10G for the cache per node (45 nodes total).
> How much data is being handled on a server with 10GB allocated for
> caching HDFS data?
>
> The first message in this thread says the index size is 31TB, which is
> *enormous*.  You have also said that the index takes 93TB of disk
> space.  If the data is distributed somewhat evenly, then the answer to
> my question would be that each of those 45 Solr servers would be
> handling over 2TB of data.  A 10GB cache is *nothing* compared to 2TB.
>
> When index data that Solr needs to access for an operation is not in the
> cache and Solr must actually wait for disk and/or network I/O, the
> resulting performance usually isn't very good.  In most cases you don't
> need to have enough memory to fully cache the index data ... but less
> than half a percent is not going to be enough.
>
> Thanks,
> Shawn

