accumulo-notifications mailing list archives

From "Eric Newton (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-118) accumulo could work across HDFS instances, which would help it to scale past a single namenode
Date Sat, 01 Feb 2014 17:56:12 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13888661#comment-13888661 ]

Eric Newton commented on ACCUMULO-118:
--------------------------------------

bq.  I think this feature was merged in before it was complete

Probably.  But it was a pretty massive change, and maintaining it as a patch set, even with
git's help, would have been very hard.

bq. I did not realize all of the problems absolute paths could cause

Nor would we have, had it not been merged in.

bq. should have started with administrative use cases

I think we are getting better at this.  For example, I can think of lots of ways that the
initial WAL implementation caused a lot of grief for unsuspecting administrators.  We fixed
these problems only after it was released into the wild, based on feedback from the
administrators.  Ultimately, the fix was moving the WAL to HDFS, and then ferreting out all
the settings to make HDFS an appropriate store for the WAL.

I think the use case of "what if administrators change the URL of a NN?" is a reasonable one,
but was certainly not anything I was thinking about when I was changing thousands of lines
of code to use full paths.  The more subtle issues, such as determining aliases for namespaces
(hdfs://example:9000 vs hdfs://example.com:9000) and recognizing real namespaces under viewfs,
are the sort of things that we will only find through actual use.
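
To make the alias problem concrete, here is a minimal sketch of treating two spellings of the same namespace as equal. It is not Accumulo code: the class NamespaceAliases, its alias table, and its method names are all hypothetical, and it assumes the administrator supplies the alias mapping explicitly.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper, not part of Accumulo: treats configured aliases
// such as "example:9000" and "example.com:9000" as the same namespace.
public class NamespaceAliases {
    // Maps an alternate authority (host:port) to its canonical authority.
    private final Map<String, String> aliases = new HashMap<>();

    public void addAlias(String alias, String canonical) {
        aliases.put(alias, canonical);
    }

    // Rewrites the authority of a full path to its canonical form.
    // Paths without a scheme or authority are returned unchanged.
    public URI canonicalize(URI path) {
        String scheme = path.getScheme();
        String authority = path.getAuthority();
        if (scheme == null || authority == null) {
            return path;
        }
        String canonical = aliases.getOrDefault(authority, authority);
        return URI.create(scheme + "://" + canonical + path.getPath());
    }

    // Two paths share a namespace when scheme and canonical authority match.
    public boolean sameNamespace(URI a, URI b) {
        URI ca = canonicalize(a);
        URI cb = canonicalize(b);
        return Objects.equals(ca.getScheme(), cb.getScheme())
            && Objects.equals(ca.getAuthority(), cb.getAuthority());
    }
}
```

A plain string comparison of the two URLs would treat them as different file systems; the sketch only works because someone told it about the alias, which is exactly the kind of knowledge that tends to surface through actual use.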

My initial goal of using concrete paths to simplify debugging might have been the wrong choice.
 Using some kind of indirect configuration that points to a real namespace (like viewfs) may
have been better.  But, that requires that you value "administrators should be able to easily
move a NN to a new URL."  The ability to do this with the old relative paths was not a design
goal, so much as a useful result of using the shortest name possible for each file.
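
As a rough illustration of why the old relative paths made a NN move easy, the sketch below resolves a stored reference against whatever base is currently configured. The class PathResolution and the example URLs are hypothetical, not taken from the Accumulo code base.

```java
import java.net.URI;

// Hypothetical illustration, not Accumulo code.
public class PathResolution {
    // Resolves a stored file reference against the currently configured base.
    // A reference that already carries a scheme is a full path and is
    // returned unchanged, so it keeps pointing at the old NN URL.
    public static URI resolve(URI configuredBase, String stored) {
        URI ref = URI.create(stored);
        if (ref.isAbsolute()) {
            return ref;
        }
        // The base must end with "/" for the relative reference to be appended.
        return configuredBase.resolve(ref);
    }
}
```

With a relative reference such as tables/1/f.rf, changing the configured base from one NN URL to another changes every resolved path at once. A stored full path ignores the new base, which is where the "what if administrators change the URL of a NN?" problem comes from.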

bq. These really seem to be the long pole in the tent for the 1.6 release

Seems to me to not be so far behind namespaces. Constructive criticism includes suggestions
on how to make things better.  Working code is even more constructive.

> accumulo could work across HDFS instances, which would help it to scale past a single namenode
> ----------------------------------------------------------------------------------------------
>
>                 Key: ACCUMULO-118
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-118
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: master, tserver
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>            Priority: Blocker
>             Fix For: 1.6.0
>
>         Attachments: ACCUMULO-118-01.txt, ACCUMULO-118-02.txt
>
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>
> Consider using full path names to files, which would allow the servers to access the files on any HDFS file system.
> Work may exist elsewhere to run HDFS using a number of NameNode instances to break up the namespace.
> We may need a pluggable strategy to determine namespace for new files.
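
One possible shape for the "pluggable strategy" mentioned in the issue description is sketched below. The interface NamespaceChooser, the round-robin implementation, and the helper newFilePath are all hypothetical names invented for illustration; they do not describe what was actually implemented for this issue.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a pluggable namespace strategy, not Accumulo code.
public class NamespaceChoosers {
    // Strategy interface: pick the namespace that should hold a new file.
    public interface NamespaceChooser {
        String choose(List<String> namespaces);
    }

    // Simplest possible strategy: cycle through the configured namespaces.
    public static class RoundRobinChooser implements NamespaceChooser {
        private final AtomicInteger next = new AtomicInteger();

        @Override
        public String choose(List<String> namespaces) {
            if (namespaces.isEmpty()) {
                throw new IllegalArgumentException("no namespaces configured");
            }
            // floorMod keeps the index non-negative if the counter overflows.
            int i = Math.floorMod(next.getAndIncrement(), namespaces.size());
            return namespaces.get(i);
        }
    }

    // Builds the full path for a new file under the chosen namespace.
    public static String newFilePath(NamespaceChooser chooser,
                                     List<String> namespaces,
                                     String relativePath) {
        return chooser.choose(namespaces) + "/" + relativePath;
    }
}
```

Keeping the choice behind an interface would let a site swap in a different policy (by table, by free space, and so on) without touching the code that creates files.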



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
