accumulo-notifications mailing list archives

From "Dave Marion (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (ACCUMULO-118) accumulo could work across HDFS instances, which would help it to scale past a single namenode
Date Fri, 24 May 2013 14:14:19 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13666314#comment-13666314 ]

Dave Marion edited comment on ACCUMULO-118 at 5/24/13 2:12 PM:
---------------------------------------------------------------

Personally, I am not a fan of the hash idea. I would rather see a mapping of namespace prefix
to NameNode (NN) in the configuration (ns1 = hdfs://host:port, ns2 = hdfs://host:port). I'm thinking
ahead to table file load balancing across namespaces and to backups (see my comment from 3/Apr/12).
For example, if you quiesced the database and performed a backup, you could then change the
namespace mapping so that ns1 and ns2 both point to the same hdfs://host:port if for some reason
you lost the second HDFS instance (it crashed, you wanted to remove it, etc.).
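A minimal sketch of the prefix-to-NameNode mapping described above, written in Java since that is Accumulo's language. The class NamespaceMap, its map/resolve methods, the "ns1:/path" file-reference form, and the host names nn1/nn2 are all hypothetical, not an existing Accumulo API. The point it illustrates is that files are recorded against a logical prefix, so re-pointing ns2 at ns1's NameNode changes where existing ns2 references resolve without rewriting them.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: maps logical namespace prefixes (e.g. "ns1") to
// NameNode URIs, mirroring configuration entries like "ns1 = hdfs://host:port".
final class NamespaceMap {
    private final Map<String, URI> prefixToNameNode = new HashMap<>();

    // Adds a prefix, or re-points an existing one at a different NameNode.
    void map(String prefix, String nameNodeUri) {
        prefixToNameNode.put(prefix, URI.create(nameNodeUri));
    }

    // Resolves a hypothetical logical reference "prefix:/absolute/path"
    // to a full HDFS URI using the current mapping.
    URI resolve(String logicalRef) {
        int colon = logicalRef.indexOf(':');
        if (colon <= 0) {
            throw new IllegalArgumentException("missing namespace prefix: " + logicalRef);
        }
        String prefix = logicalRef.substring(0, colon);
        String path = logicalRef.substring(colon + 1);
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("path must be absolute: " + logicalRef);
        }
        URI nameNode = prefixToNameNode.get(prefix);
        if (nameNode == null) {
            throw new IllegalArgumentException("unmapped namespace: " + prefix);
        }
        // An absolute path keeps the NameNode's scheme and authority.
        return nameNode.resolve(path);
    }
}
```

For example, after map("ns2", "hdfs://nn1:8020"), the reference ns2:/accumulo/tables/1/F0000.rf resolves to hdfs://nn1:8020/accumulo/tables/1/F0000.rf, i.e. ns1's NameNode.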

This could also allow for an upgrade of Hadoop while Accumulo is still running. Consider the
scenario where ns1 is on racks 1 & 2 and ns2 is on racks 3 & 4 of a cluster, and the files of
table T are spread across ns1 and ns2. You could configure the table file load balancer (a new
feature) to put new files on ns2, then recompact the table so that all of its files are now on
ns2. Once this has been done for all tables (and walogs), you can shut down ns1 and upgrade to
a new version of Hadoop.
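A toy sketch of the drain-and-upgrade procedure above, assuming a pluggable strategy that picks the namespace for each new file. NamespaceChooser, PinnableChooser, FileLayout, and their methods are hypothetical names, not Accumulo classes. Simplifying assumptions: each table is treated as a single tablet, a full compaction merges the table's files into one new file, and walogs are not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pluggable strategy that picks the namespace for a new file.
interface NamespaceChooser {
    String choose(String table, List<String> namespaces);
}

// Hypothetical table file load balancer: round-robins new files across all
// namespaces unless a table has been pinned to a single namespace.
final class PinnableChooser implements NamespaceChooser {
    private final Map<String, String> pinned = new HashMap<>();
    private int next = 0;

    void pin(String table, String namespace) {
        pinned.put(table, namespace);
    }

    @Override
    public String choose(String table, List<String> namespaces) {
        String target = pinned.get(table);
        if (target != null) {
            return target;
        }
        String ns = namespaces.get(next % namespaces.size());
        next++;
        return ns;
    }
}

// Toy model of where each table's files live (one namespace entry per file).
final class FileLayout {
    private final Map<String, List<String>> filesByTable = new HashMap<>();
    private final List<String> namespaces;
    private final NamespaceChooser chooser;

    FileLayout(List<String> namespaces, NamespaceChooser chooser) {
        this.namespaces = namespaces;
        this.chooser = chooser;
    }

    // A new file (e.g. from a flush) goes wherever the chooser says.
    void writeFile(String table) {
        filesByTable.computeIfAbsent(table, t -> new ArrayList<>())
                    .add(chooser.choose(table, namespaces));
    }

    // A full compaction merges all of the table's files into one new file,
    // whose namespace is again picked by the chooser.
    void recompact(String table) {
        List<String> files = filesByTable.get(table);
        if (files == null || files.isEmpty()) {
            return;
        }
        List<String> merged = new ArrayList<>();
        merged.add(chooser.choose(table, namespaces));
        filesByTable.put(table, merged);
    }

    // True once no table has any file left on the given namespace.
    boolean isDrained(String namespace) {
        for (List<String> files : filesByTable.values()) {
            if (files.contains(namespace)) {
                return false;
            }
        }
        return true;
    }
}
```

With two files of T spread across ns1 and ns2, pinning T to ns2 and calling recompact("T") leaves no file of T on ns1, so isDrained("ns1") becomes true once every table has been handled this way.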
                
> accumulo could work across HDFS instances, which would help it to scale past a single namenode
> ----------------------------------------------------------------------------------------------
>
>                 Key: ACCUMULO-118
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-118
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: master, tserver
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>            Priority: Blocker
>             Fix For: 1.6.0
>
>         Attachments: ACCUMULO-118-01.txt
>
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>
> Consider using full path names to files, which would allow the servers to access the files on any HDFS file system.
> Work may exist elsewhere to run HDFS using a number of NameNode instances to break up the namespace.
> We may need a pluggable strategy to determine namespace for new files.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
