lucene-dev mailing list archives

From "Harsh J (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-6305) Ability to set the replication factor for index files created by HDFSDirectoryFactory
Date Thu, 04 May 2017 02:54:04 GMT

    [ https://issues.apache.org/jira/browse/SOLR-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15996083#comment-15996083 ]

Harsh J commented on SOLR-6305:
-------------------------------

[~thelabdude] is right in the description, BTW. The Hadoop APIs let you pass an arbitrary
replication value via the FileSystem.create API; when passed, it overrides the client's local
default (the dfs.replication config). Solr's current API usage effectively asks the NameNode
for its default replication factor and then creates the file with that value, ignoring the
local configuration. As a result, you cannot control the replication factor of Solr index
files without changing the whole HDFS cluster's default.
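
To illustrate the override described above, here is a minimal, Hadoop-free sketch in Java of
how a writer could pick the replication factor: use an explicitly configured value when one is
present, and fall back to the NameNode's default otherwise. The class name ReplicationChoice,
the method resolve, and the idea of an operator-supplied override string are assumptions made
for illustration; they are not existing Solr or Hadoop names. In a real writer the result would
presumably be passed as the short replication argument of
FileSystem.create(Path, boolean, int, short, long), with the fallback taken from
FsServerDefaults.getReplication().

```java
/** Sketch only: decide which replication factor a new index file should get. */
public class ReplicationChoice {

    /**
     * @param configured    hypothetical operator-supplied override (may be null or blank)
     * @param serverDefault value the NameNode reports as its default replication
     * @return the override when present, otherwise the server default
     */
    static short resolve(String configured, short serverDefault) {
        // No override given: keep today's behaviour and use the server default.
        if (configured == null || configured.trim().isEmpty()) {
            return serverDefault;
        }
        short value = Short.parseShort(configured.trim());
        if (value < 1) {
            throw new IllegalArgumentException("replication must be >= 1, got " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        short serverDefault = 3; // stands in for fsDefaults.getReplication()
        System.out.println(resolve(null, serverDefault)); // prints 3
        System.out.println(resolve("1", serverDefault));  // prints 1
    }
}
```

Keeping the fallback to the server default means clusters that set no override would see no
change in behaviour.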

> Ability to set the replication factor for index files created by HDFSDirectoryFactory
> -------------------------------------------------------------------------------------
>
>                 Key: SOLR-6305
>                 URL: https://issues.apache.org/jira/browse/SOLR-6305
>             Project: Solr
>          Issue Type: Improvement
>          Components: hdfs
>         Environment: hadoop-2.2.0
>            Reporter: Timothy Potter
>
> HdfsFileWriter doesn't allow us to create files in HDFS with a different replication
> factor than the configured DFS default because it uses:
> {{FsServerDefaults fsDefaults = fileSystem.getServerDefaults(path);}}
> Since we have two forms of replication going on when using HDFSDirectoryFactory, it would
> be nice to be able to set the HDFS replication factor for the Solr directories to a lower
> value than the default. I realize this might reduce the chance of data locality but since
> Solr cores each have their own path in HDFS, we should give operators the option to reduce
> it.
> My original thinking was to just use Hadoop setrep to customize the replication factor,
> but that's a one-time shot and doesn't affect new files created. For instance, I did:
> {{hadoop fs -setrep -R 1 solr49/coll1}}
> My default dfs replication is set to 3 ^^ I'm setting it to 1 just as an example
> Then added some more docs to the coll1 and did:
> {{hadoop fs -stat %r solr49/hdfs1/core_node1/data/index/segments_3}}
> 3 <-- should be 1
> So it looks like new files don't inherit the repfact from their parent directory.
> Not sure if we need to go as far as allowing different replication factor per collection
> but that should be considered if possible.
> I looked at the Hadoop 2.2.0 code to see if there was a way to work through this using
> the Configuration object but nothing jumped out at me ... and the implementation for
> getServerDefaults(path) is just:
>   public FsServerDefaults getServerDefaults(Path p) throws IOException {
>     return getServerDefaults();
>   }
> Path is ignored ;-)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


