lucene-solr-user mailing list archives

From Shawn Heisey <apa...@elyograg.org>
Subject Re: Solr on HDFS: AutoAddReplica does not add a replica
Date Thu, 12 Jan 2017 16:42:25 GMT
On 1/11/2017 7:14 PM, Chetas Joshi wrote:
> This is what I understand about how Solr works on HDFS. Please correct me
> if I am wrong.
>
> Although solr shard replication Factor = 1, HDFS default replication = 3.
> When the node goes down, the solr server running on that node goes down and
> hence the instance (core) representing the replica goes down. The data is
> on HDFS (distributed across all the datanodes of the hadoop cluster with 3X
> replication).  This is the reason why I have kept replicationFactor=1.
>
> As per the link:
> https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS
> One benefit to running Solr in HDFS is the ability to automatically add new
> replicas when the Overseer notices that a shard has gone down. Because the
> "gone" index shards are stored in HDFS, a new core will be created and the
> new core will point to the existing indexes in HDFS.
>
> This is the expected behavior of Solr overseer which I am not able to see.
> After a couple of hours a node was assigned to host the shard but the
> status of the shard is still "down" and the instance dir is missing on that
> node for that particular shard_replica.

As I said before, I know very little about HDFS, so the following could
be wrong, but it makes sense so I'll say it:

I would imagine that Solr doesn't know or care what your HDFS
replication is ... the only replicas it knows about are the ones that it
is managing itself.  The autoAddReplicas feature manages *SolrCloud*
replicas, not HDFS replicas.
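
To make the distinction concrete: replicationFactor and autoAddReplicas are parameters of the SolrCloud Collections API and are set per collection, separately from any HDFS replication setting. The following is a minimal sketch that only builds the request URLs and sends nothing. The host, port, and collection name are assumptions for illustration, not values from this thread.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, action, **params):
    """Build a Solr Collections API URL; no request is sent."""
    query = urlencode({"action": action, **params, "wt": "json"})
    return f"{base_url}/admin/collections?{query}"


# Assumed Solr base URL and a hypothetical collection name.
SOLR = "http://localhost:8983/solr"
COLLECTION = "example_collection"

# SolrCloud-level settings: one replica per shard, with autoAddReplicas on.
create_url = collections_api_url(
    SOLR,
    "CREATE",
    name=COLLECTION,
    numShards=2,
    replicationFactor=1,
    autoAddReplicas="true",
)

# CLUSTERSTATUS reports the state of each replica that SolrCloud manages.
status_url = collections_api_url(SOLR, "CLUSTERSTATUS", collection=COLLECTION)

print(create_url)
print(status_url)
```

The replica states in the CLUSTERSTATUS response describe the SolrCloud replicas only; they do not reflect how many copies of each block HDFS keeps.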

I have seen people say that multiple SolrCloud replicas will take up
additional space in HDFS -- they do not point at the same index files.
This is because Lucene, to operate properly, must lock an index and
prevent any other thread/process from writing to it at the same time.
When you index, SolrCloud updates all replicas independently
-- the only time indexes are replicated is when you add a new replica or
a serious problem has occurred and an index needs to be recovered.
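
As a rough illustration of why two writers cannot share one index directory, here is a toy model of an exclusive lock file on a local filesystem. It is not Lucene's or Solr's actual locking code; the helper name is hypothetical, and only the general idea (one writer holds a lock file in the index directory, and a second writer is refused) is taken from the explanation above.

```python
import os
import tempfile


def try_acquire_write_lock(index_dir):
    """Return the lock path if this caller got the lock, else None."""
    lock_path = os.path.join(index_dir, "write.lock")
    try:
        # O_EXCL makes creation fail if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return None
    os.close(fd)
    return lock_path


with tempfile.TemporaryDirectory() as index_dir:
    first = try_acquire_write_lock(index_dir)   # first writer gets the lock
    second = try_acquire_write_lock(index_dir)  # second writer is refused
    os.remove(first)                            # first writer releases it
    third = try_acquire_write_lock(index_dir)   # the lock is available again

print(first is not None, second is None, third is not None)
```

In this model, a second replica that wants to write must use its own directory, which matches the observation that each SolrCloud replica keeps its own copy of the index.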

Thanks,
Shawn

