hadoop-hdfs-dev mailing list archives

From Chris Nauroth <cnaur...@hortonworks.com>
Subject Re: Another thought on client-side support of HDFS federation
Date Fri, 27 May 2016 19:40:41 GMT
Hello Tianyi HE,

I noticed that a similar design for a federation proxying model has just
been proposed on Apache JIRA HDFS-10467.  You might want to join the
conversation there.

https://issues.apache.org/jira/browse/HDFS-10467


--Chris Nauroth




On 5/2/16, 10:32 AM, "Colin McCabe" <cmccabe@apache.org> wrote:

>Hi Tianyi HE,
>
>Thanks for sharing this!  This reminds me of the httpfs daemon.  This
>daemon basically sits in front of an HDFS cluster and accepts requests,
>which it serves by forwarding them to the underlying HDFS instance.
>There is some documentation about it here:
>https://hadoop.apache.org/docs/stable/hadoop-hdfs-httpfs/index.html
>
>Since httpfs uses an org.apache.hadoop.fs.FileSystem instance, it seems
>like you could plug in the org.apache.hadoop.fs.viewfs.ViewFileSystem
>class and be up and running with federation.  I haven't tried this, but
>I would expect that it would work, unless there are bugs in ViewFS itself.
>
>The big advantage of httpfs is that it provides a webhdfs-style REST
>interface.  As you said, this kind of interface makes it simple to use
>any language with REST bindings, without worrying about using a thick
>client.
>
>The big disadvantage of httpfs is that you must move both metadata and
>data operations through the httpfs daemon.  This could become a
>performance bottleneck.  It seems like you are concerned about this
>bottleneck.
>
>We also have webhdfs.  Unlike httpfs, webhdfs doesn't require all the
>data to move through its daemon.  With webhdfs, the client talks to
>DataNodes directly.
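To make the httpfs/webhdfs difference concrete: both daemons speak the same `/webhdfs/v1/<path>?op=...` REST API, so a client in any language only needs an HTTP library. The following is a hedged Java 11+ sketch. The class name `WebHdfsOpen` and the host names are hypothetical placeholders. Port 50070 (the NameNode HTTP port in Hadoop 2.x; 9870 in 3.x) and port 14000 (the HttpFS default) are typical defaults that may differ on a given cluster.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical helper around the WebHDFS REST "OPEN" call (also served by HttpFS). */
public class WebHdfsOpen {

    /** Builds http://host:port/webhdfs/v1/<path>?op=OPEN&user.name=<user>. */
    static URI openUri(String host, int port, String path, String user) {
        try {
            // The multi-argument URI constructor percent-encodes illegal characters.
            return new URI("http", null, host, port, "/webhdfs/v1" + path,
                    "op=OPEN&user.name=" + user, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("bad WebHDFS path: " + path, e);
        }
    }

    /**
     * Issues OPEN without following redirects and reports the first hop.
     * Against a NameNode's WebHDFS endpoint we would expect a 307 whose
     * Location header names a DataNode (data bypasses the NameNode); against
     * HttpFS we would expect the bytes to come back from the HttpFS daemon.
     */
    static String firstHop(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NEVER)
                .build();
        HttpResponse<Void> resp = client.send(
                HttpRequest.newBuilder(uri).GET().build(),
                HttpResponse.BodyHandlers.discarding());
        return resp.statusCode() + " "
                + resp.headers().firstValue("Location").orElse("(no redirect)");
    }
}
```

The redirect is what keeps webhdfs off the data path: the client follows the `Location` to a DataNode, while httpfs has to relay every byte itself.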
>
>I wonder if extending httpfs or webhdfs would be a better approach than
>starting from scratch.  There is a maintenance burden for adding new
>services and daemons.  This was our motivation for removing hftp, for
>example.  It's certainly something to think about.
>
>best,
>Colin
>
>
>On Thu, Apr 28, 2016, at 17:55, 何天一 wrote:
>> Hey guys,
>> 
>> My associates have investigated HDFS federation recently, and it turns
>> out to be quite a good solution for improving scalability on the
>> NameNode/DataNode side.
>> 
>> However, we encountered some problems on the client side, because:
>> A) For historical reasons, we use clients in multiple languages to
>> access HDFS (e.g. python-snakebite, or perhaps libhdfs++). So we either
>> implement multiple versions of ViewFS or we give up the consistent view
>> (which can be confusing to users).
>> B) We have Hadoop client configuration deployed on client nodes, which
>> we do not have control over. Also, releasing a new configuration could
>> be a really heavy operation, because it needs to be pushed to several
>> thousand nodes while keeping them consistent (say a node is down
>> throughout the operation and then comes back online; it could still
>> hold a stale version of the configuration).
>> 
>> So we set out to explore another solution to these problems, and came
>> up with a proxy model: build an RPC proxy in front of the NameNodes.
>> All clients talk to the proxy when they need to consult a NameNode, and
>> the proxy decides which NameNode the request should go to according to
>> the mount table.
>> This solved our problem: all clients were seamlessly upgraded with
>> federation support.
>> We open sourced the proxy recently: https://github.com/bytedance/nnproxy
>> (BTW, all kinds of feedback are welcome.)
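The proxy model described above comes down to a mount-table lookup for each request path. A minimal Java sketch, assuming the mount table is a plain map from mount point to NameNode address; the class `MountTableRouter` and the `hdfs://nnN:8020` addresses are hypothetical, and this is not how nnproxy itself is implemented:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical mount-table router for an RPC proxy in front of several
 * NameNodes. Illustrative only; not nnproxy's implementation.
 */
public class MountTableRouter {
    // Mount point (e.g. "/user") -> NameNode address (e.g. "hdfs://nn1:8020").
    private final Map<String, String> mounts = new HashMap<>();

    public void addMount(String mountPoint, String nameNode) {
        mounts.put(normalize(mountPoint), nameNode);
    }

    /**
     * Returns the NameNode whose mount point is the longest component-wise
     * prefix of path, or null if no mount point (not even "/") matches.
     */
    public String resolve(String path) {
        String p = normalize(path);
        while (true) {
            String nameNode = mounts.get(p);
            if (nameNode != null) {
                return nameNode;
            }
            if (p.equals("/")) {
                return null; // walked up to the root without a match
            }
            // Drop the last component: "/a/b/c" -> "/a/b", "/a" -> "/".
            int slash = p.lastIndexOf('/');
            p = (slash == 0) ? "/" : p.substring(0, slash);
        }
    }

    // Requires an absolute path and strips trailing slashes (except for "/").
    private static String normalize(String path) {
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("absolute path required: " + path);
        }
        String p = path;
        while (p.length() > 1 && p.endsWith("/")) {
            p = p.substring(0, p.length() - 1);
        }
        return p;
    }
}
```

Walking up one path component at a time (rather than doing a raw string-prefix match) keeps `/user2` from being routed by a `/user` mount. Operations that name two paths, such as rename, may resolve to different NameNodes; ViewFS rejects renames across mount points, and a proxy would likely need a similar rule.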
>> 
>> But there are still a few issues. For example, several modifications
>> need to be made inside Hadoop IPC to support RPC forwarding. We
>> released the corresponding patches with the nnproxy project
>> (https://github.com/bytedance/nnproxy/tree/master/hadoop-patches), but
>> it would be better to have them merged into Apache trunk. Does anyone
>> think it's worth it?
>> 
>> 
>> -- 
>> Cheers,
>> Tianyi HE
>> (+86) 185 0042 4096
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
>For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org
>
>



