hadoop-hdfs-user mailing list archives

From Eric <eric.x...@gmail.com>
Subject Re: How to get large data into HDFS
Date Tue, 29 Mar 2011 14:22:18 GMT
Hi Will,

In theory, your only bottlenecks are the network and the number of datanodes
you have running, so it should scale quite well. I'm very interested to hear
about your experience after adding more writers.

Thanks,
Eric

2011/3/29 Will Maier <wcmaier@hep.wisc.edu>

> Hi Eric-
>
> On Tue, Mar 29, 2011 at 03:20:38PM +0200, Eric wrote:
> > I'm interested in hearing how you get data into and out of HDFS. Are you
> > using tools like Flume? Are you using fuse_dfs? Are you putting files on
> > HDFS with "hadoop dfs -put ..."? And how does your method scale? Can you
> > move terabytes of data per day? Or are we talking gigabytes?
>
> I'm currently migrating our ~600TB datastore to HDFS. To transfer the data,
> we iterate through the raw files stored on our legacy data servers and write
> them to HDFS using `hadoop fs -put`. So far, I've limited the number of
> servers participating in the migration, so we've only had on the order of 20
> parallel writers. This week, I plan to increase that by at least an order of
> magnitude. I expect to be able to scale the migration horizontally without
> impacting our current production system. Then, when the transfers are
> complete, we can cut our protocol endpoints over without significant
> downtime. At least, that's the plan. ;)
>
> --
>
> Will Maier - UW High Energy Physics
> cel: 608.438.6162
> tel: 608.263.9692
> web: http://www.hep.wisc.edu/~wcmaier/
>
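
A minimal sketch of the kind of per-server parallel `hadoop fs -put` loop
described above. The source and destination paths, the writer count of 20,
and the flat destination layout are illustrative assumptions, not details
from Will's setup; it also assumes GNU xargs and a configured `hadoop`
client on each legacy server.

    # Hypothetical example: push every file under a local source tree into a
    # single HDFS directory with up to 20 parallel `hadoop fs -put` writers.
    SRC=/data/legacy        # local source tree on the legacy server (assumed)
    DST=/user/migration     # HDFS target directory (assumed)

    # Create the target directory first; ignore the error if it already exists.
    hadoop fs -mkdir "$DST"

    # Run up to 20 puts at a time. Basenames are assumed to be unique, since
    # this sketch does not recreate the source directory structure in HDFS.
    find "$SRC" -type f -print0 |
      xargs -0 -P 20 -I{} hadoop fs -put {} "$DST"/

Each additional server running the same loop adds another batch of writers,
which is the horizontal scaling discussed in the thread.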
