hadoop-hdfs-user mailing list archives

From Joey Echeverria <j...@cloudera.com>
Subject Re: problem of large distributed system access hdfs
Date Wed, 30 Nov 2011 14:38:41 GMT
You could check out Hoop[1], a REST interface for accessing HDFS.
Since it's REST based, you can easily load balance clients across
multiple servers. You'll have to write the C/C++ code for
communicating with Hoop, but that shouldn't require too much more than
a thin wrapper around an HTTP client library. Also, Hoop will be
included in a future version of Hadoop[2].

-Joey

[1] http://cloudera.github.com/hoop/docs/latest/index.html
[2] https://issues.apache.org/jira/browse/HDFS-2178
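
As a rough illustration of the "thin wrapper" idea above, here is a minimal C
sketch of the part that does not depend on any particular HTTP library:
choosing one of several Hoop servers and building the request URL. Everything
named here is an assumption for illustration: the helper names
(hoop_pick_server, hoop_build_url), the host names, port 14000, and the URL
layout, which follows the WebHDFS-style /webhdfs/v1/<path>?op=...&user.name=...
convention. Hoop's actual paths and parameters may differ, so check its
documentation before relying on this.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: pick a server by session id so that sessions are
 * spread evenly across the available REST gateways. */
const char *hoop_pick_server(const char *const *servers, size_t count,
                             unsigned long session_id)
{
    assert(servers != NULL && count > 0);
    return servers[session_id % count];
}

/* Percent-encode an HDFS path, leaving '/' and unreserved characters as-is.
 * Returns 0 on success, -1 if the output buffer is too small. */
static int encode_path(const char *path, char *out, size_t out_len)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t n = 0;

    for (const unsigned char *p = (const unsigned char *)path; *p; ++p) {
        int keep = (*p >= 'A' && *p <= 'Z') || (*p >= 'a' && *p <= 'z') ||
                   (*p >= '0' && *p <= '9') || *p == '-' || *p == '_' ||
                   *p == '.' || *p == '~' || *p == '/';
        size_t need = keep ? 1 : 3;

        if (n + need + 1 > out_len)
            return -1;
        if (keep) {
            out[n++] = (char)*p;
        } else {
            out[n++] = '%';
            out[n++] = hex[*p >> 4];
            out[n++] = hex[*p & 15];
        }
    }
    if (n + 1 > out_len)
        return -1;
    out[n] = '\0';
    return 0;
}

/* Hypothetical helper: build a request URL for one file operation.
 * ASSUMPTION: WebHDFS-style layout; adjust to match the real Hoop API.
 * Returns 0 on success, -1 on a relative path or a too-small buffer. */
int hoop_build_url(const char *base, const char *hdfs_path, const char *op,
                   const char *user, char *out, size_t out_len)
{
    char encoded[1024];
    int written;

    if (hdfs_path[0] != '/')
        return -1;
    if (encode_path(hdfs_path, encoded, sizeof encoded) != 0)
        return -1;
    written = snprintf(out, out_len, "%s/webhdfs/v1%s?op=%s&user.name=%s",
                       base, encoded, op, user);
    return (written < 0 || (size_t)written >= out_len) ? -1 : 0;
}
```

Each forked process would call hoop_pick_server() with its session id, build
the URL, and hand it to whatever HTTP client library it links against (for
example libcurl), so no JVM is needed in the client processes.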

On Wed, Nov 30, 2011 at 9:25 AM, Zhanwei Wang <hadoop@wangzw.org> wrote:
> Hi everyone,
>
> I have a problem enabling our distributed system to access HDFS.
>
>
>
> The background:
>
> In our system, we have 4~6 segment instances on one physical node, and each
> segment forks a new process to handle each new session. So when a client
> connects to our system, 4~6 processes work on that session on one physical
> node. We need to handle 250 sessions at the same time, which means
> 1000~1500 processes on one physical node. That is OK for our system.
>
>
>
> Now we want to enable our system to access HDFS, and we use libhdfs to do
> that. It works fine with only a few concurrent sessions, but when too many
> clients connect to the system, the system can no longer access HDFS.
>
> The reason is that libhdfs creates one JVM for each process, and our machine
> cannot afford 1000~1500 JVMs. libhdfs reports an error that it cannot create
> the JVM because of memory limits.
>
>
>
> We tried to work around this problem by using a few processes as proxies that
> access HDFS and exchange data with the other processes, but the proxy
> processes could become a performance bottleneck.
>
>
>
> My questions:
>
> 1)      Can different processes share a JVM if we use libhdfs to access HDFS?
>
> 2)      Is there any other good solution? Maybe I have just failed to find it.
>
> 3)      Is there any ongoing project to implement a C/C++ HDFS client
> without a JVM? Is it a good idea to create such a project? If it is, I will be
> very glad to contribute my time.
>
> Thanks
>
> Best Regards
>
> ------------------------------
>
> Zhanwei Wang
>



-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434
