hadoop-common-user mailing list archives

From Alex Loddengaard <a...@cloudera.com>
Subject Re: hdfs on public internet/wan
Date Thu, 28 May 2009 02:02:30 GMT
It sounds like HDFS probably isn't the right tool for you.  When new
nodes join the cluster, the administrator needs to rebalance the cluster
in order for the new nodes to receive existing data.  Without rebalancing,
newly written data will be stored on those new nodes, but old data will
not be redistributed to them.
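
To make the rebalancing point concrete, here is a minimal sketch in Python.
It is a toy model, not Hadoop code: the node names and sizes are made up,
and it only mirrors the balancer's documented notion of "balanced" (each
node's utilization within a threshold of the cluster-wide utilization, with
10 percentage points as the usual default).

```python
# Toy model of "balanced" in the sense the HDFS balancer uses: every node's
# utilization is within `threshold` percentage points of the cluster's
# overall utilization. Node names and sizes are hypothetical.

def utilization(used, capacity):
    """Percent of capacity in use."""
    return 100.0 * used / capacity


def unbalanced_nodes(nodes, threshold=10.0):
    """Return the sorted names of nodes whose utilization differs from the
    cluster utilization by more than `threshold` percentage points.

    `nodes` maps node name -> (used_gb, capacity_gb).
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(cap for _, cap in nodes.values())
    cluster_util = utilization(total_used, total_capacity)
    return sorted(
        name
        for name, (used, cap) in nodes.items()
        if abs(utilization(used, cap) - cluster_util) > threshold
    )


# Three existing nodes, each 80% full: nothing is out of balance.
cluster = {"dn1": (800, 1000), "dn2": (800, 1000), "dn3": (800, 1000)}
before = unbalanced_nodes(cluster)

# One empty node joins. Cluster utilization drops to 60%, so every node is
# now more than 10 points away from it until blocks are moved.
cluster["dn4"] = (0, 1000)
after = unbalanced_nodes(cluster)
```

In this model a single empty node joining is enough to put every node
outside the threshold, which is why a cluster whose membership changes
constantly would need to be rebalanced constantly.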

If a node leaves the cluster for about 10 minutes, the master will start
re-replicating the blocks that were on that node onto other nodes in the
cluster.  The point is that, though HDFS can handle nodes dying and new
nodes being added, it's not designed for this to happen all the time.
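
The "about 10 minutes" figure can be sketched as simple arithmetic.  The
following Python snippet assumes the stock settings as I understand them (a
5-minute recheck interval and a 3-second heartbeat) and the usual rule of
twice the recheck interval plus ten heartbeats; the function names are
hypothetical, and your cluster's configuration may differ.

```python
# Sketch of the timeout after which the master treats a data node as dead.
# The default values are assumptions about a stock configuration.

def dead_node_timeout_seconds(recheck_interval_ms=300_000, heartbeat_interval_s=3):
    """Seconds without a heartbeat before a node is considered dead:
    2 * recheck interval + 10 * heartbeat interval."""
    return (2 * recheck_interval_ms + 10 * 1000 * heartbeat_interval_s) / 1000


def triggers_rereplication(offline_seconds, **config):
    """True if a node offline for `offline_seconds` would be declared dead,
    at which point its blocks get copied to other nodes."""
    return offline_seconds > dead_node_timeout_seconds(**config)


timeout = dead_node_timeout_seconds()        # 630.0 seconds, i.e. 10.5 minutes
short_outage = triggers_rereplication(300)   # 5 minutes offline: no
long_outage = triggers_rereplication(900)    # 15 minutes offline: yes
```

Under these assumptions, any volunteer machine that drops off for more than
about ten and a half minutes causes all of its blocks to be copied
elsewhere, which gets expensive over a WAN.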

Similarly, HDFS doesn't have any security.  You would have to configure
your own firewall to limit access.  I imagine doing so would be really
annoying when not all machines are behind the same router.

So anyway, you may want to consider other file systems (perhaps there is
something P2P out there?) for what you're trying to do.

Hope this helps.

Alex

On Wed, May 27, 2009 at 1:11 PM, Lukasz Szybalski <szybalski@gmail.com> wrote:

> Hello,
> I want to set up HDFS to be used as a public-like file system where,
> aside from a few core computers that will be running the masters, there
> would be x number of data nodes/computers located across the internet.
>
> How do I set up master servers, and then 3-65+ slave servers, where
> each server can join or leave at any time?
> How would I control how slave servers are added? Assuming they would
> give me their IP and available size, in return I would need to
> provide them with...?
> Should the ssh account that is used be created in some special way? No
> shell access? or some restrictions? (command?)
> Are there any specific differences that should be accounted for in
> this "public" version of hadoop cluster?
>
>
> Let me know.
>
> Thanks,
> Lucas
>
