hadoop-mapreduce-user mailing list archives

From Koji Noguchi <knogu...@yahoo-inc.com>
Subject Re: TestDFSIO writes files on HDFS with wrong block size?
Date Thu, 20 May 2010 15:37:54 GMT
Kiyoshi,

Block size is set by the client, so there is no need to restart the
daemons, reformat HDFS, or change the server-side configs.

$ ls -l testfile.txt
-rw-r--r-- 1 knoguchi users 202145 May  1  2009 testfile.txt
$ hadoop dfs -put testfile.txt /user/knoguchi/testfile.txt
$ hadoop dfs -Ddfs.block.size=10240 -put testfile.txt /user/knoguchi/testfile2.txt
$ hadoop fsck /user/knoguchi/testfile.txt | grep "Total blocks"
 Total blocks (validated):      1 (avg. block size 202145 B)
$ hadoop fsck /user/knoguchi/testfile2.txt | grep "Total blocks"
 Total blocks (validated):      20 (avg. block size 10107 B)
$ 
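The counts that fsck prints fall out of simple ceiling division, so the transcript above can be sanity-checked with a few lines of plain Python (a sketch added for illustration; `block_count` is a hypothetical helper, not a Hadoop API):

```python
import math

def block_count(file_size: int, block_size: int) -> int:
    # HDFS stores a file as ceil(file_size / block_size) blocks;
    # the last block may be shorter than the configured block size.
    return math.ceil(file_size / block_size)

# testfile2.txt: 202145 bytes written with -Ddfs.block.size=10240
n = block_count(202145, 10240)
print(n)            # 20 blocks, as fsck reported
print(202145 // n)  # 10107 B average block size, as fsck reported

# Kiyoshi's case: a 1024MB file against the 64MB default block size
blocks = block_count(1024 * 2**20, 64 * 2**20)
print(blocks, blocks * 3)  # 16 blocks; 48 counting 3x replication
```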

Koji


On 5/20/10 7:56 AM, "Kiyoshi Mizumaru" <kiyoshi.mizumaru@gmail.com> wrote:

> Unfortunately it does not work as I expected.
> 
> Cleaning up the previous Hadoop instance's data by removing all the
> files and directories under dfs.name.dir and dfs.data.dir, and then
> formatting a new HDFS with hadoop namenode -format, gave me a fresh
> Hadoop instance as I expected.
> 
> It seems that changing the configuration files and formatting HDFS
> (and restarting all the daemons, of course) is not enough to change
> the replication and block size. Is that correct?
> 
> 
> On Wed, May 19, 2010 at 2:14 PM, Kiyoshi Mizumaru
> <kiyoshi.mizumaru@gmail.com> wrote:
>> Hi Koji,
>> 
>> Thank you for your reply.
>> I'll try what you wrote and see if it works as expected.
>> 
>> By the way, what does `client-side config' mean?
>> dfs.replication and dfs.block.size are currently written in
>> conf/hdfs-site.xml.  Where should I put them instead?
>> 
>> 
>> On Tue, May 18, 2010 at 3:01 AM, Koji Noguchi <knoguchi@yahoo-inc.com> wrote:
>>> Hi Kiyoshi,
>>> 
>>> In case you haven't received a reply, try
>>> 
>>> hadoop jar hadoop-*-test.jar TestDFSIO -Ddfs.block.size=536870912 -Ddfs.replication=1 ....
>>> 
>>> If that works, add them as part of your client-side config.
>>> 
>>> Koji
>>> 
>>> 
>>> On 5/13/10 11:38 PM, "Kiyoshi Mizumaru" <kiyoshi.mizumaru@gmail.com> wrote:
>>> 
>>>> Hi all, this is my first post to this list; if I'm not posting in
>>>> the appropriate place, please let me know.
>>>> 
>>>> 
>>>> I have just created a Hadoop instance and its HDFS is configured as:
>>>>   dfs.replication = 1
>>>>   dfs.block.size = 536870912 (512MB)
>>>> 
>>>> Then I typed the following command to run TestDFSIO against this instance:
>>>>   % hadoop jar hadoop-*-test.jar TestDFSIO -write -nrFiles 1 -fileSize 1024
>>>> 
>>>> A 1024MB file should consist of 2 blocks of 512MB each, but the
>>>> filesystem browser shows that /benchmarks/TestDFSIO/io_data/test_io_0
>>>> consists of 16 blocks of 64MB, and its replication is 3, so 48 blocks
>>>> are displayed in total.
>>>> 
>>>> This is not what I expected; does anyone know what's wrong?
>>>> 
>>>> I'm using Cloudera's Distribution for Hadoop (hadoop-0.20-0.20.2+228-1)
>>>> with Sun Java 6 (jdk-6u19-linux-amd64).  Thanks in advance, and sorry
>>>> for my poor English; I'm still learning it.
>>>> --
>>>> Kiyoshi
>>> 
>>> 
>> 

