lucene-solr-user mailing list archives

From Joe Obernberger <>
Subject Re: Solr on HDFS
Date Fri, 02 Aug 2019 12:57:53 GMT
Thank you.  No, while the cluster is using Cloudera for HDFS, we do not 
use Cloudera to manage the Solr cluster.  If it is a 
configuration/architecture issue, what can I do to fix it?  I'd like a 
system where servers can come and go, but the indexes stay available and 
recover automatically.  Is that possible with HDFS?

While pointing an alias at other collections would be an option, if the 
affected collection is the only collection, or one that is currently 
needed in a live system, we can't bring it down, re-create it, and 
re-index it when that process may take weeks.

Any ideas?
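
For reference, a minimal sketch of the manual lock cleanup described in the
original message quoted below. It assumes the lock files are named
write.lock (the usual Lucene lock file name) and that the indexes live
under a hypothetical HDFS path such as /solr; neither detail is stated in
this thread. It only shells out to the standard `hdfs dfs -ls -R` and
`hdfs dfs -rm` commands, and it should only be run while the affected Solr
node is stopped.

```python
import subprocess
from typing import List

# Assumption: the lock file name; adjust if your setup uses a different one.
LOCK_NAME = "write.lock"


def find_lock_files(ls_output: str, lock_name: str = LOCK_NAME) -> List[str]:
    """Extract lock-file paths from the text output of `hdfs dfs -ls -R`."""
    paths = []
    for line in ls_output.splitlines():
        # Entry lines have 8 fields: perms, replication, owner, group,
        # size, date, time, path. Other lines (e.g. "Found N items") do not.
        fields = line.split(None, 7)
        if len(fields) != 8 or not fields[0].startswith("-"):
            continue  # skip directories and non-entry lines
        path = fields[7]
        if path.rsplit("/", 1)[-1] == lock_name:
            paths.append(path)
    return paths


def list_locks(solr_root: str) -> List[str]:
    """Run `hdfs dfs -ls -R` under solr_root and return lock-file paths."""
    result = subprocess.run(
        ["hdfs", "dfs", "-ls", "-R", solr_root],
        check=True, capture_output=True, text=True,
    )
    return find_lock_files(result.stdout)


def remove_locks(paths: List[str], dry_run: bool = True) -> List[List[str]]:
    """Build (and, unless dry_run, execute) one `hdfs dfs -rm` per lock file."""
    commands = [["hdfs", "dfs", "-rm", p] for p in paths]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```

With the node stopped, `remove_locks(list_locks("/solr"))` returns the
commands it would run; pass `dry_run=False` to actually delete the locks.
This only automates the manual step. It does not address the full
re-replication that follows when the node comes back up.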


On 8/1/2019 6:15 PM, Angie Rabelero wrote:
> I don’t think you’re using Cloudera or Ambari, but Ambari has an option to delete
> the locks. This seems more a configuration/architecture issue than a reliability issue. You
> may want to spin up an alias while you bring down, clear locks and directories, recreate and
> index the affected collection, while you work on your other issues.
> On Aug 1, 2019, at 16:40, Joe Obernberger <> wrote:
> Been using Solr on HDFS for a while now, and I'm seeing an issue with redundancy/reliability.
> If a server goes down, when it comes back up, it will never recover because of the lock files
> in HDFS. That Solr node needs to be brought down manually, the lock files deleted, and then
> brought back up.  At that point, it appears to copy all the data for its replicas.  If the
> index is large, and new data is being indexed, in some cases it will never recover. The replication
> retries over and over.
> How can we make a reliable Solr Cloud cluster when using HDFS that can handle servers
> coming and going?
> Thank you!
> -Joe
