hadoop-hdfs-user mailing list archives

From Ted Dunning <tdunn...@maprtech.com>
Subject Re: How to create a lot files in HDFS quickly?
Date Mon, 30 May 2011 03:52:01 GMT
First, it is virtually impossible to create 100 million files in HDFS,
because the name node keeps the entire namespace in RAM and simply can't
hold that many entries.
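
(Back-of-the-envelope, my numbers rather than anything from this thread:
the name node keeps every file, directory and block as an object on its
Java heap, at very roughly 150 bytes each, so 100 million one-block files
is on the order of 100M x 2 objects x 150 bytes, i.e. ~30 GB of heap,
before you store a single byte of actual data.)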

Secondly, file creation is bottlenecked by the name node, so files can't
be created at more than about 1,000 per second (and sustaining more than
half that rate is somewhat difficult).
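
For what it's worth, if you do want to approach that rate, the usual trick
is to issue creates from many threads (or several client machines) so the
name node RPCs overlap, instead of one "hadoop fs -touchz" per file from a
shell loop.  A rough sketch against the Java FileSystem API (the thread
count, file counts and /bench path are made up for illustration; tune them
against your own name node):

  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;
  import java.util.concurrent.TimeUnit;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class BulkCreate {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();      // picks up core-site.xml etc.
      final FileSystem fs = FileSystem.get(conf);    // DFS client, safe to share here
      final int filesPerThread = 50000;              // illustrative only
      int threads = 20;                              // illustrative only

      ExecutorService pool = Executors.newFixedThreadPool(threads);
      for (int t = 0; t < threads; t++) {
        final int id = t;
        pool.submit(new Runnable() {
          public void run() {
            try {
              for (int i = 0; i < filesPerThread; i++) {
                // Each create() costs a name node RPC; closing the empty
                // stream still leaves a file (and a namespace entry) behind.
                fs.create(new Path("/bench/t" + id + "/f" + i)).close();
              }
            } catch (Exception e) {
              e.printStackTrace();
            }
          }
        });
      }
      pool.shutdown();
      pool.awaitTermination(1, TimeUnit.DAYS);
      fs.close();
    }
  }

Even so, all the threads are just filling the same name node queue faster,
so expect the curve to flatten well before the theoretical limit.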

Thirdly, you need to check your cluster size, because each data node can
only store a limited number of blocks (exactly how many differs from
version to version of Hadoop).  For small clusters this is a tighter limit
than the name node's capacity.

Why is it that you need to do this?

Perhaps there is a workaround?  Consider, for instance, HAR files (Hadoop
archives).
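
If the test only needs the *logical* files to exist, packing them into
archives is far kinder to the name node: one HAR holds millions of small
files behind a single archive directory of index and part files.  The
paths below are placeholders, but the tool itself ships with Hadoop:

  hadoop archive -archiveName test.har -p /user/me/input /user/me/output

The contents are then readable through the har:// filesystem, e.g.
har:///user/me/output/test.har/somefile, with no per-file create load on
the name node.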


2011/5/29 ccxixicc <ccxixicc@foxmail.com>

> Hi all,
> I'm doing a test and need to create lots of files (100 million) in HDFS.
> I'm using a shell script to do this, and it's very, very slow. How can I
> create a lot of files in HDFS quickly?
> Thanks
