accumulo-notifications mailing list archives

From "Keith Turner (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-118) accumulo could work across HDFS instances, which would help it to scale past a single namenode
Date Tue, 28 May 2013 15:26:20 GMT


Keith Turner commented on ACCUMULO-118:

bq.  I would rather see a mapping of namespace prefix to NN in the configuration (ns1 = hdfs://host:port,
ns2 = hdfs://host:port)

I agree; I think making this explicit and straightforward is the way to go. However, storing
the namespace config in accumulo-site.xml seems error prone, because nothing keeps the copies
on different nodes consistent. In the worst case node1 defines
ns1=hdfs://nn3 and node2 defines ns2=hdfs://nn4. I would advocate storing this mapping
only in zookeeper.
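
As a rough illustration of the prefix-to-namenode mapping being discussed, here is a minimal sketch of resolving a namespace-style path against such a mapping. The class name NamespaceResolver, the namespace:// scheme handling, and the idea of passing the mapping in as a plain Map are all assumptions for illustration; none of this is existing Accumulo code, and loading the mapping from zookeeper is left out.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper; not part of Accumulo. */
final class NamespaceResolver {
    // Maps a namespace name (e.g. "ns1") to a namenode URI (e.g. "hdfs://host:port").
    private final Map<String, String> nameNodes;

    NamespaceResolver(Map<String, String> nameNodes) {
        // Copy so later changes to the caller's map do not affect resolution.
        this.nameNodes = new HashMap<>(nameNodes);
    }

    /** Turns namespace://ns1/some/path into <namenode for ns1>/some/path. */
    String resolve(String path) {
        URI uri = URI.create(path);
        if (!"namespace".equals(uri.getScheme())) {
            throw new IllegalArgumentException("not a namespace path: " + path);
        }
        String base = nameNodes.get(uri.getAuthority());
        if (base == null) {
            throw new IllegalArgumentException("unknown namespace: " + uri.getAuthority());
        }
        return base + uri.getPath();
    }
}
```

With every server reading the same mapping from one place, a path such as namespace://ns1/accumulo/tables/abc resolves to the same namenode everywhere, which is the property the per-node accumulo-site.xml approach cannot guarantee.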

bq.  I'm thinking forward to table file load balancing across namespaces and backups (see
my comment from 3/Apr/12). 

[~dlmarion] Would avoiding storing direct pointers to namenodes in the metadata table be sufficient
to satisfy this? That is, always have a level of indirection like viewfs?

[~ecn] Would there be any reason for a single tablet to spread its files across multiple
namenodes?  Tablets currently have a directory column that tells a tablet where to create new
files.  This could be converted to an absolute path/url. When a tablet creates a new file,
it uses this path.  There may be some small efficiency gain when opening multiple files for
a tablet if all of the calls go to the same namenode.

   1< srv:dir  namespace://ns1/accumulo/tables/abc
   1< file:namespace://ns1/accumulo/tables/abc/F0000002.rf []    196,1

bq. Perhaps a per-table configuration of the hash function?

Could possibly have a plugin that's called to choose the value of srv:dir for a new tablet.
The input to this function could be the KeyExtent and the list of available namespaces, and it
could return a namespace/url.  The default implementation could hash+mod the tablet's end row
and use that as an index into the list of namespaces.
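
A minimal sketch of what such a default hash+mod chooser might look like. The class name HashModNamespaceChooser and its signature are hypothetical, and the end row is passed as a raw byte array rather than a KeyExtent to keep the example self-contained; a real plugin would presumably be an interface with this as one implementation.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical default chooser; not an existing Accumulo interface. */
final class HashModNamespaceChooser {
    /**
     * Picks a namespace for a new tablet by hashing its end row.
     * endRow may be null, as it is for the last tablet of a table.
     */
    static String choose(byte[] endRow, List<String> namespaces) {
        if (namespaces.isEmpty()) {
            throw new IllegalArgumentException("no namespaces configured");
        }
        // A null end row falls back to a fixed hash of 0.
        int hash = (endRow == null) ? 0 : Arrays.hashCode(endRow);
        // floorMod keeps the index non-negative even when the hash is negative.
        int index = Math.floorMod(hash, namespaces.size());
        return namespaces.get(index);
    }
}
```

One consequence of plain hash+mod is that the chosen index depends on the size and order of the namespace list, so adding a namespace changes where new tablets with a given end row land. That only affects tablets created afterwards, since existing tablets keep whatever srv:dir they were assigned.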

[~ecn] Once a design is settled on, I think it would be useful if the design doc outlined
how this new feature will interact with bulk import, clone table, export/import table, and
offline map reduce.

> accumulo could work across HDFS instances, which would help it to scale past a single
> ----------------------------------------------------------------------------------------------
>                 Key: ACCUMULO-118
>                 URL:
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: master, tserver
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>            Priority: Blocker
>             Fix For: 1.6.0
>         Attachments: ACCUMULO-118-01.txt
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
> Consider using full path names to files, which would allow the servers to access the
files on any HDFS file system.
> Work may exist elsewhere to run HDFS using a number of NameNode instances to break up
the namespace.
> We may need a pluggable strategy to determine namespace for new files.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators