hadoop-hdfs-user mailing list archives

From "Zhanwei Wang" <had...@wangzw.org>
Subject problem of large distributed system access hdfs
Date Wed, 30 Nov 2011 14:25:47 GMT
Hi everyone


I have run into a problem while enabling our distributed system to access HDFS.


The background:

In our system, we run 4~6 segment instances on each physical node, and each segment forks
a new process to handle each new session. So when a client connects to our system, 4~6
processes work on that session on one physical node. We need to handle 250 sessions
at the same time, which means 1000~1500 processes on one physical node. That is OK for our system.


Now we want our system to access HDFS, and we use libhdfs to do that. It works fine when
we only handle a few concurrent sessions, but when too many clients connect to the system,
it can no longer access HDFS.

The reason is that libhdfs creates one JVM for each process, and our machines cannot afford
1000~1500 JVMs. libhdfs reports an error saying that it cannot create the JVM because of memory limits.
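A back-of-the-envelope sketch of the numbers above. It assumes, as described, that each segment instance forks one process per session and that libhdfs embeds a JVM in every process that uses it. The function names are illustrative only.

```python
def processes_per_node(segments_per_node: int, concurrent_sessions: int) -> int:
    # Each segment instance forks one process per session.
    return segments_per_node * concurrent_sessions


def jvms_per_node(segments_per_node: int, concurrent_sessions: int) -> int:
    # libhdfs starts the JVM inside the calling process (via JNI), so every
    # process that touches HDFS carries its own JVM: the counts are equal.
    return processes_per_node(segments_per_node, concurrent_sessions)


if __name__ == "__main__":
    for segments in (4, 6):
        print(segments, "segments:", jvms_per_node(segments, 250), "JVMs per node")
```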


We tried to work around this problem by using a few processes as proxies that access HDFS and
exchange data with the other processes, but the proxy processes could become a performance bottleneck.
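A minimal sketch of the proxy workaround, written in Python for brevity. It assumes a single proxy process that would own the only HDFS connection (and so the only JVM), with forked session processes sending it requests over queues. `fake_hdfs_read` is a hypothetical stand-in for a libhdfs read, and the `(worker_id, offset, length)` request format is an assumption. The sketch shows the shape of the design, not a production implementation.

```python
import multiprocessing as mp

CHUNK = 16  # bytes each session asks for (illustrative value)


def fake_hdfs_read(offset: int, length: int) -> bytes:
    """Hypothetical stand-in for a libhdfs read performed by the proxy.

    A real proxy would read through its single HDFS connection; here we
    just derive deterministic bytes from the offset.
    """
    return bytes(ord("a") + (offset + i) % 26 for i in range(length))


def session_worker(worker_id, requests, reply, results):
    """A session process: it never touches HDFS (or a JVM) itself."""
    requests.put((worker_id, worker_id * CHUNK, CHUNK))  # ask the proxy
    results.put((worker_id, reply.get()))                # report what came back


def run_proxy_demo(num_workers: int = 4) -> dict:
    ctx = mp.get_context("fork")  # fork, like the segments do (POSIX only)
    requests = ctx.Queue()                               # all workers -> proxy
    replies = [ctx.Queue() for _ in range(num_workers)]  # proxy -> worker w
    results = ctx.Queue()                                # workers -> caller
    procs = [
        ctx.Process(target=session_worker,
                    args=(w, requests, replies[w], results))
        for w in range(num_workers)
    ]
    for p in procs:
        p.start()

    # Proxy loop: this single process is the only one that would hold a JVM.
    # Every request is served here one at a time, which is why the proxy
    # can become the bottleneck.
    for _ in range(num_workers):
        worker_id, offset, length = requests.get()
        replies[worker_id].put(fake_hdfs_read(offset, length))

    received = dict(results.get() for _ in range(num_workers))
    for p in procs:
        p.join()  # join only after draining the queues, to avoid blocking
    return received


if __name__ == "__main__":
    for wid, data in sorted(run_proxy_demo().items()):
        print(wid, data.decode())
```

Because every request funnels through one loop, throughput is capped by that single process. Running several proxies and spreading sessions across them would reduce this bottleneck but not remove it.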


My questions:

1)      Can different processes share a JVM when using libhdfs to access HDFS?

2)      Is there any other good solution? Maybe I have simply missed it.

3)      Is there any ongoing project to implement a C/C++ HDFS client that does not need a JVM?
Is it a good idea to start such a project? If so, I would be very glad to contribute my time.

Best Regards

Zhanwei Wang