hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: fs.local.block.size vs file.blocksize
Date Sun, 12 Aug 2012 13:48:06 GMT
Hi Ellis,

Note that in Hadoop-land, the term "block size" generally refers to the
chunking size used by HDFS writers and readers; it is not the same
thing as a local filesystem's notion of block size.

On Thu, Aug 9, 2012 at 6:40 PM, Ellis H. Wilson III <ellis@cse.psu.edu> wrote:
> Hi all!
> Can someone please briefly explain the difference?  I do not see deprecated
> warnings for fs.local.block.size when I run with them set and I see two
> copies of RawLocalFileSystem.java (the other is local/RawLocalFs.java).

The right parameter still appears to be "fs.local.block.size" when it
comes to "getDefaultBlockSize" calls via the file:/// filesystem or
other filesystems that have not overridden the default behavior.
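For example, if you want to change the value that file:/// reports, it can be set in core-site.xml; a minimal sketch (the 64 MB value is purely illustrative):

```xml
<!-- core-site.xml: block size reported for file:/// paths -->
<property>
  <name>fs.local.block.size</name>
  <value>67108864</value> <!-- 64 MB, illustrative value -->
</property>
```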

> The things I really need to get answers to are:
> 1. Is the default boosted to 64MB from Hadoop 1.0 to Hadoop 2.0?  I believe
> it is, but want validation on that.

dfs.blocksize, which applies only to HDFS, has not changed from its
default of 64 MB.
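For reference, that value is typically set (or overridden) in hdfs-site.xml; a minimal sketch showing the default:

```xml
<!-- hdfs-site.xml: block size used for newly created HDFS files -->
<property>
  <name>dfs.blocksize</name>
  <value>67108864</value> <!-- 64 MB default -->
</property>
```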

> 2. Which one controls shuffle block-size?

There is no "shuffle block size": shuffle output goes to the local
filesystem, which has no block-size concept in Hadoop. Can you
elaborate on what you mean here?

> 3. If I have a single machine non-distributed instance, and point it at
> file://, do both of these control the persistent data's block size or just
> one of them or what?

LocalFileSystem does not chunk files into blocks. It writes and reads
regular files, just as you would in any language.

> 4. Is there any way to run with say a 512MB blocksize for the persistent
> data and the default 64MB blocksize for the shuffled data?

See (2).
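That said, if all you want is a larger block size for the persistent HDFS data (the shuffle, per (2), has no block size to tune), dfs.blocksize can be raised cluster-wide; a hedged sketch of a 512 MB setting in hdfs-site.xml:

```xml
<!-- hdfs-site.xml: raise the HDFS block size for new files to 512 MB -->
<property>
  <name>dfs.blocksize</name>
  <value>536870912</value> <!-- 512 MB -->
</property>
```

The same property can also be supplied per job at submission time rather than cluster-wide, since it only affects files written after the change.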

> Thanks!

Do let us know if you have further questions.

> ellis

Harsh J
