accumulo-notifications mailing list archives

From "Keith Turner (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-118) accumulo could work across HDFS instances, which would help it to scale past a single namenode
Date Tue, 28 May 2013 15:26:20 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13668366#comment-13668366
] 

Keith Turner commented on ACCUMULO-118:
---------------------------------------

bq.  I would rather see a mapping of namespace prefix to NN in the configuration (ns1 = hdfs://host:port,
ns2 = hdfs://host:port)

I agree; making this explicit and straightforward is the way to go. However, storing
the namespace config in accumulo-site.xml seems error-prone, because each node reads its own
copy of the file. In the worst case node1 defines ns1=hdfs://nn3 and node2 defines
ns2=hdfs://nn4. I would advocate storing this mapping only in ZooKeeper.

bq.  I'm thinking forward to table file load balancing across namespaces and backups (see
my comment from 3/Apr/12). 

[~dlmarion] Would avoiding storing direct pointers to namenodes in the metadata table be sufficient
to satisfy this? That is, always have a level of indirection, like viewfs?

[~ecn] Would there be any reason for a single tablet to spread its files across multiple
namenodes? Tablets currently have a directory column that tells a tablet where to create new
files. This could be converted to an absolute path/URL. When a tablet creates a new file,
it uses this path. There may be some small efficiency gain when opening multiple files for
a tablet if all of the calls go to the same namenode.

{noformat}
   1< srv:dir  namespace://ns1/accumulo/tables/abc
   1< file:namespace://ns1/accumulo/tables/abc/F0000002.rf []    196,1
{noformat}
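
A minimal sketch of how such a namespace:// entry could be resolved through one shared prefix-to-namenode mapping. Everything here is an assumption for illustration: the NamespaceResolver class is hypothetical, the in-memory Map stands in for whatever ends up stored in ZooKeeper, and the namenode hosts and ports in the usage note are made up.

```java
import java.net.URI;
import java.util.Map;

// Hypothetical helper, not part of Accumulo: resolves namespace:// paths
// from the metadata table against a prefix -> namenode mapping.
final class NamespaceResolver {
  private final Map<String, URI> namenodes;

  NamespaceResolver(Map<String, URI> namenodes) {
    // Defensive copy so later changes to the caller's map are not seen here.
    this.namenodes = Map.copyOf(namenodes);
  }

  // Turns namespace://ns1/accumulo/... into <namenode URI for ns1>/accumulo/...
  URI resolve(String path) {
    URI u = URI.create(path);
    if (!"namespace".equals(u.getScheme())) {
      throw new IllegalArgumentException("not a namespace URI: " + path);
    }
    // The authority component carries the namespace prefix (e.g. ns1).
    URI namenode = namenodes.get(u.getAuthority());
    if (namenode == null) {
      throw new IllegalArgumentException("unknown namespace: " + u.getAuthority());
    }
    // The absolute path replaces whatever path the namenode URI had.
    return namenode.resolve(u.getPath());
  }
}
```

With a mapping of ns1 to hdfs://nn1:8020, resolving the file entry above would give hdfs://nn1:8020/accumulo/tables/abc/F0000002.rf. Because the metadata table only holds the prefix, moving a namespace to another namenode means changing one mapping entry rather than rewriting file entries.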

bq. Perhaps a per-table configuration of the hash function?

Could possibly have a plugin that's called to choose the value of srv:dir for a new tablet.
The input to this function could be the KeyExtent and the list of available namespaces, and it
could return a namespace/URL. The default implementation could hash+mod the tablet's end row
and use that as an index into the list of namespaces.
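
A rough sketch of what that default hash+mod chooser might look like. The NamespaceChooser interface and HashModNamespaceChooser class are hypothetical names, and the KeyExtent is reduced to just its end row (a byte array) so the example stays self-contained; a real plugin would presumably receive the full extent.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical plugin interface, not an existing Accumulo API.
interface NamespaceChooser {
  // endRow stands in for the KeyExtent; null means the last tablet of a table.
  String choose(byte[] endRow, List<String> namespaces);
}

// Default strategy: hash the end row, then mod by the number of namespaces.
final class HashModNamespaceChooser implements NamespaceChooser {
  @Override
  public String choose(byte[] endRow, List<String> namespaces) {
    if (namespaces.isEmpty()) {
      throw new IllegalArgumentException("no namespaces configured");
    }
    // Arrays.hashCode returns 0 for null, so the last tablet maps to index 0.
    // floorMod keeps the index non-negative even for negative hash codes.
    int index = Math.floorMod(Arrays.hashCode(endRow), namespaces.size());
    return namespaces.get(index);
  }
}
```

One caveat with this approach: the result depends on the size and order of the namespace list, so adding a namespace changes where new tablets land. That seems acceptable here because the choice is recorded in srv:dir when the tablet is created and is not recomputed later.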

[~ecn] Once a design is settled on, I think it would be useful if the design doc outlined
how this new feature will interact with bulk import, clone table, export/import table, and
offline MapReduce.

                
> accumulo could work across HDFS instances, which would help it to scale past a single namenode
> ----------------------------------------------------------------------------------------------
>
>                 Key: ACCUMULO-118
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-118
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: master, tserver
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>            Priority: Blocker
>             Fix For: 1.6.0
>
>         Attachments: ACCUMULO-118-01.txt
>
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>
> Consider using full path names to files, which would allow the servers to access the
> files on any HDFS file system.
> Work may exist elsewhere to run HDFS using a number of NameNode instances to break up
> the namespace.
> We may need a pluggable strategy to determine namespace for new files.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
