lucene-solr-user mailing list archives

From Shawn Heisey <apa...@elyograg.org>
Subject Re: Running Solr on HDFS - Disk space
Date Thu, 07 Jun 2018 13:30:36 GMT
On 6/7/2018 6:41 AM, Greenhorn Techie wrote:
> Since HDFS has its own replication mechanism, with an HDFS replication
> factor of 3 and a SolrCloud replication factor of 3, does that mean
> each document will have around 9 copies stored in
> HDFS? If so, is there a way to configure HDFS or Solr so that only three
> copies are maintained overall?

Yes, that is exactly what happens.
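A rough sketch of the multiplication. It assumes every SolrCloud replica keeps its full, independent index on HDFS and that HDFS applies the same block replication factor to all of it. The function names and the 50 GB index size are hypothetical illustration values, not taken from the thread:

```python
def physical_copies(hdfs_replication: int, solr_replicas: int) -> int:
    """Physical copies of each document on disk (hypothetical helper).

    Each SolrCloud replica is a separate index, and HDFS stores every
    block of each of those indexes `hdfs_replication` times.
    """
    return hdfs_replication * solr_replicas


def raw_disk_usage_gb(index_size_gb: float, hdfs_replication: int, solr_replicas: int) -> float:
    """Raw disk consumed by one shard's index across the cluster (hypothetical helper).

    Ignores any other files Solr may write to HDFS.
    """
    return index_size_gb * physical_copies(hdfs_replication, solr_replicas)


if __name__ == "__main__":
    print(physical_copies(3, 3))          # 9
    print(raw_disk_usage_gb(50.0, 3, 3))  # 450.0
```

With both replication factors at 3, a shard whose index is 50 GB consumes roughly 450 GB of raw HDFS capacity.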

SolrCloud replication assumes that each of its replicas is a completely 
independent index.  I am not aware of anything in Solr's HDFS support 
that can use one HDFS index directory for multiple replicas.  At the 
most basic level, a Solr index is a Lucene index.  Lucene goes to great 
lengths to make sure that an index *CANNOT* be used in more than one place.

Perhaps somebody who is more familiar with HDFSDirectoryFactory can 
offer you a solution.  But as far as I know, there isn't one.

Thanks,
Shawn

