accumulo-notifications mailing list archives

From "Keith Turner (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-118) accumulo could work across HDFS instances, which would help it to scale past a single namenode
Date Tue, 28 May 2013 19:57:20 GMT


Keith Turner commented on ACCUMULO-118:

I was looking at some docs on viewfs.  If possible, I think we should not do anything
that would preclude using viewfs.  It seems that if URIs were supported for tablet dirs and
files (along with a way to choose a tablet dir), this would almost be enough to support it:

  1;m srv:dir viewfs://clusterX/accumulo1/tables/abc
  1;m file:viewfs://ns1/accumulo1/tables/abc/F0000002.rf []    196,1

  1< srv:dir viewfs://clusterX/accumulo2/tables/abc
  1< file:viewfs://ns1/accumulo2/tables/abc/F0000003.rf []    196,1
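
To make the entries above concrete, here is a minimal sketch of how a viewfs-style URI could be resolved to a physical location through a client-side mount table. The mount table contents, the hdfs://nn1 and hdfs://nn2 targets, and the function name are assumptions made for illustration; this is not Hadoop's implementation.

```python
# Hypothetical mount table: (viewfs authority, mount point) -> target URI.
# The targets hdfs://nn1 and hdfs://nn2 are assumed for illustration only.
MOUNTS = {
    ("clusterX", "/accumulo1"): "hdfs://nn1/accumulo1",
    ("clusterX", "/accumulo2"): "hdfs://nn2/accumulo2",
}


def resolve_viewfs(uri, mounts=MOUNTS):
    """Translate viewfs://<authority>/<path> to the URI behind its mount point."""
    prefix = "viewfs://"
    if not uri.startswith(prefix):
        raise ValueError("not a viewfs URI: " + uri)
    authority, _, rest = uri[len(prefix):].partition("/")
    path = "/" + rest

    # Pick the longest mount point that is a whole-component prefix of the path.
    best = None
    for (auth, mount), target in mounts.items():
        if auth != authority:
            continue
        if path == mount or path.startswith(mount + "/"):
            if best is None or len(mount) > len(best[0]):
                best = (mount, target)
    if best is None:
        raise LookupError("no mount point for " + uri)

    mount, target = best
    return target + path[len(mount):]


print(resolve_viewfs("viewfs://clusterX/accumulo1/tables/abc"))
```

With a table like this, the two tablet dirs shown above would land on different namenodes while the metadata entries keep the same viewfs form.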

If we want to further develop our own indirection layer, then maybe we should define our own
URI prefix, something like ans://.  How independent should this URI be?  Something like
ans://<namespace name>/<path> would assume that you know where to look <namespace name> up.
If the URI were like ans://<zookeepers>+<instance id>+<namespace name>/<path>, it would be
more self-contained.  I do not think it's necessary to make it self-contained; it's for
internal use and would be translated as needed.
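
As a rough illustration of the two URI forms discussed above, the following sketch parses both the short form and the self-contained form. The ans:// scheme is only a proposal in this comment; the AnsUri type, the parse_ans function, and the sample zookeeper and instance values are hypothetical.

```python
from typing import NamedTuple, Optional


class AnsUri(NamedTuple):
    """Parsed form of a proposed ans:// URI (hypothetical type)."""
    namespace: str
    path: str
    zookeepers: Optional[str] = None
    instance_id: Optional[str] = None


def parse_ans(uri):
    """Parse ans://<namespace name>/<path> or
    ans://<zookeepers>+<instance id>+<namespace name>/<path>."""
    prefix = "ans://"
    if not uri.startswith(prefix):
        raise ValueError("not an ans URI: " + uri)
    authority, _, rest = uri[len(prefix):].partition("/")
    parts = authority.split("+")
    if len(parts) == 1:
        # Short form: the reader must already know where to look the namespace up.
        zookeepers, instance_id, namespace = None, None, parts[0]
    elif len(parts) == 3:
        # Self-contained form: carries the zookeepers and instance id.
        zookeepers, instance_id, namespace = parts
    else:
        raise ValueError("unexpected authority: " + authority)
    if not namespace:
        raise ValueError("missing namespace name: " + uri)
    return AnsUri(namespace, "/" + rest, zookeepers, instance_id)


print(parse_ans("ans://ns1/accumulo/tables/abc"))
print(parse_ans("ans://zk1:2181,zk2:2181+inst42+ns1/accumulo/tables/abc"))
```

Both forms yield the same namespace and path; only the self-contained form says where the namespace is defined.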

I was thinking about how bulk import will work in this federated world.  Below is one way
this could work.

 * Client calls import dir w/ /foo1
 * Accumulo client code uses local config to convert /foo1 to the URI hdfs://nn1/foo1
 * hdfs://nn1/foo1 is passed to Accumulo server code via thrift
 * Accumulo server code looks at the URI to determine where to move the files, and determines
   that it has the Accumulo dir hdfs://nn1/accumulo
 * Moves files in hdfs://nn1/foo1 to hdfs://nn1/accumulo/tables/abc
 * Replaces hdfs://nn1/accumulo/tables/abc with ans://ns1/accumulo/tables/abc
 * Does bulk import of files in ans://ns1/accumulo/tables/abc
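
The steps above can be sketched as pure URI bookkeeping, with no real file system calls. Everything named here is an assumption for illustration: the DEFAULT_FS setting, the ACCUMULO_DIRS mapping from file system to Accumulo dir and namespace name, the hdfs://nn2 entry, and both function names.

```python
# Hypothetical client-side setting: the default file system for bare paths.
DEFAULT_FS = "hdfs://nn1"

# Hypothetical server-side mapping: file system -> (Accumulo dir, namespace name).
# This models "Accumulo needs a dir on each namenode".
ACCUMULO_DIRS = {
    "hdfs://nn1": ("/accumulo", "ns1"),
    "hdfs://nn2": ("/accumulo", "ns2"),
}


def client_qualify(path, default_fs=DEFAULT_FS):
    """Client side: turn a bare path like /foo1 into a full URI."""
    if "://" in path:
        return path  # already a URI
    return default_fs + path


def server_plan(import_uri, table_id, accumulo_dirs=ACCUMULO_DIRS):
    """Server side: decide where files move to and which URI to bulk import from."""
    scheme, _, rest = import_uri.partition("://")
    authority, _, _ = rest.partition("/")
    fs = scheme + "://" + authority
    if fs not in accumulo_dirs:
        raise LookupError("no Accumulo dir on " + fs)
    accumulo_dir, namespace = accumulo_dirs[fs]
    table_path = accumulo_dir + "/tables/" + table_id
    return {
        "move_from": import_uri,                        # files supplied by the client
        "move_to": fs + table_path,                     # same namenode, so a rename suffices
        "import_from": "ans://" + namespace + table_path,  # URI used for the bulk import
    }


plan = server_plan(client_qualify("/foo1"), "abc")
for step, uri in plan.items():
    print(step, uri)
```

Keeping the move on the same namenode as the source files is the reason the server needs an Accumulo dir per namenode in this sketch.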

Is this how this should work?  The scenario above implies that Accumulo needs a dir on each
namenode and a way of mapping URIs to the appropriate Accumulo dir.  We need to work through
this scenario w/ viewfs also.

> accumulo could work across HDFS instances, which would help it to scale past a single namenode
> ----------------------------------------------------------------------------------------------
>                 Key: ACCUMULO-118
>                 URL:
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: master, tserver
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>            Priority: Blocker
>             Fix For: 1.6.0
>         Attachments: ACCUMULO-118-01.txt
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
> Consider using full path names to files, which would allow the servers to access the
> files on any HDFS file system.
> Work may exist elsewhere to run HDFS using a number of NameNode instances to break up
> the namespace.
> We may need a pluggable strategy to determine namespace for new files.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see:
