hadoop-common-user mailing list archives

From "Puri, Aseem" <Aseem.P...@Honeywell.com>
Subject RE: More Replication on dfs
Date Thu, 16 Apr 2009 05:03:05 GMT
Hi,
	My problem is not that my data is under-replicated. I have 3
data nodes. In my hadoop-site.xml I also set the configuration as:

  <property>
  <name>dfs.replication</name>
  <value>2</value>
  </property>

But even after this, data is still replicated on 3 nodes instead of 2.

Please tell me what the problem could be.

Thanks & Regards
Aseem Puri
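
A quick way to rule out a configuration that is not being picked up is to check which value a client would actually resolve. The sketch below is only an illustration: the `<configuration>` wrapper, the `SITE_XML` string and the `effective_value` helper are assumptions, not part of Hadoop. It relies on two facts: the built-in default for dfs.replication is 3, and the replication factor is recorded per file when the file is created, so a client that does not load the edited hadoop-site.xml (an HBase process with its own conf directory, for instance) would still create files with 3 replicas.

```python
import xml.etree.ElementTree as ET

# Hypothetical hadoop-site.xml contents. Only the dfs.replication property
# comes from the message above; the <configuration> wrapper is assumed.
SITE_XML = """
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
"""

# Default shipped with Hadoop, used when no site file overrides it.
HDFS_DEFAULT_REPLICATION = "3"


def effective_value(xml_text, key, default):
    """Return the value of `key` in a Hadoop-style config, or `default`."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == key:
            return prop.findtext("value", default).strip()
    return default


# A client that loads the edited site file resolves the override.
print(effective_value(SITE_XML, "dfs.replication", HDFS_DEFAULT_REPLICATION))

# A client that loads an empty configuration falls back to the default.
print(effective_value("<configuration/>", "dfs.replication", HDFS_DEFAULT_REPLICATION))
```

Files already written with 3 replicas keep that target after the configuration changes; lowering it for existing files is a separate step (for example with the fs shell's setrep command).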



-----Original Message-----
From: Raghu Angadi [mailto:rangadi@yahoo-inc.com] 
Sent: Wednesday, April 15, 2009 2:58 AM
To: core-user@hadoop.apache.org
Subject: Re: More Replication on dfs

Aseem,

Regarding over-replication, it is mostly an app-related issue, as Alex mentioned.

But if you are concerned about under-replicated blocks in the fsck output:

These blocks should not stay under-replicated if you have enough nodes 
and enough space on them (check the NameNode web UI).

Try grepping for one of the blocks in the NameNode log (and the datanode logs as 
well, since you have just 3 nodes).

Raghu.
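
The "enough nodes" condition can be checked against the fsck output quoted further down in this message. The following is a rough sketch, not a Hadoop tool: the regular expression is inferred from two sample lines only, and the helper name `parse_under_replicated` is made up for illustration. It compares each file's target replication with the number of datanodes in the summary, and re-derives the summary percentages. Since HDFS keeps at most one replica of a block on a given datanode, a target of 3 cannot be met while only 2 datanodes are reporting.

```python
import re

# Two lines from the fsck output quoted in this thread, with the mail
# client's line wraps joined back together.
FSCK_LINES = [
    "/hbase/test/903188508/tags/info/4897652949308499876:  Under replicated "
    "blk_-5193695109439554521_3133. Target Replicas is 3 but found 1 replica(s).",
    "/user/HadoopAdmin/input/file01.txt:  Under replicated "
    "blk_2744169131466786624_1001. Target Replicas is 3 but found 2 replica(s).",
]

# Assumed line shape, inferred only from the samples above.
LINE_RE = re.compile(
    r"^(?P<path>.+?):\s+Under replicated (?P<block>blk_-?\d+_\d+)\. "
    r"Target Replicas is (?P<target>\d+) but found (?P<found>\d+) replica\(s\)\.$"
)


def parse_under_replicated(line):
    """Return (path, target, found) for an under-replicated line, else None."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return m.group("path"), int(m.group("target")), int(m.group("found"))


# Figures copied from the fsck summary quoted in this thread.
TOTAL_BLOCKS = 401
AVG_REPLICATION = 1.3117207
MISSING_REPLICAS = 675
UNDER_REPLICATED = 399
DATANODES = 2

# Replicas that exist now, and the total fsck would expect to see.
existing = round(TOTAL_BLOCKS * AVG_REPLICATION)
expected = existing + MISSING_REPLICAS
missing_pct = round(100 * MISSING_REPLICAS / existing, 3)

# One reading that reproduces the expected total (an inference, not fsck
# output): 399 blocks with target 3 and the remaining 2 with target 2.
inferred = UNDER_REPLICATED * 3 + (TOTAL_BLOCKS - UNDER_REPLICATED) * 2

for line in FSCK_LINES:
    path, target, found = parse_under_replicated(line)
    print(path, "target", target, "found", found,
          "satisfiable:", target <= DATANODES)
print(existing, expected, missing_pct, inferred)
```

Under these assumptions the numbers are self-consistent: 526 existing replicas, 1201 expected, and 675 missing, which is 128.327 % of the existing ones. That suggests nearly all files were created with a target of 3 even though the reported default is 2, and that the target cannot be reached until a third datanode reports in or the per-file replication is lowered.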

Puri, Aseem wrote:
> Alex,
> 
> Output of the $ bin/hadoop fsck / command after running an HBase data insert
> command on a table is:
> 
> .....
> .....
> .....
> .....
> .....
> /hbase/test/903188508/tags/info/4897652949308499876:  Under replicated blk_-5193695109439554521_3133. Target Replicas is 3 but found 1 replica(s).
> .
> /hbase/test/903188508/tags/mapfiles/4897652949308499876/data:  Under replicated blk_-1213602857020415242_3132. Target Replicas is 3 but found 1 replica(s).
> .
> /hbase/test/903188508/tags/mapfiles/4897652949308499876/index:  Under replicated blk_3934493034551838567_3132. Target Replicas is 3 but found 1 replica(s).
> .
> /user/HadoopAdmin/hbase table.doc:  Under replicated blk_4339521803948458144_1031. Target Replicas is 3 but found 2 replica(s).
> .
> /user/HadoopAdmin/input/bin.doc:  Under replicated blk_-3661765932004150973_1030. Target Replicas is 3 but found 2 replica(s).
> .
> /user/HadoopAdmin/input/file01.txt:  Under replicated blk_2744169131466786624_1001. Target Replicas is 3 but found 2 replica(s).
> .
> /user/HadoopAdmin/input/file02.txt:  Under replicated blk_2021956984317789924_1002. Target Replicas is 3 but found 2 replica(s).
> .
> /user/HadoopAdmin/input/test.txt:  Under replicated blk_-3062256167060082648_1004. Target Replicas is 3 but found 2 replica(s).
> ...
> /user/HadoopAdmin/output/part-00000:  Under replicated blk_8908973033976428484_1010. Target Replicas is 3 but found 2 replica(s).
> Status: HEALTHY
>  Total size:    48510226 B
>  Total dirs:    492
>  Total files:   439 (Files currently being written: 2)
>  Total blocks (validated):      401 (avg. block size 120973 B) (Total open file blocks (not validated): 2)
>  Minimally replicated blocks:   401 (100.0 %)
>  Over-replicated blocks:        0 (0.0 %)
>  Under-replicated blocks:       399 (99.50124 %)
>  Mis-replicated blocks:         0 (0.0 %)
>  Default replication factor:    2
>  Average block replication:     1.3117207
>  Corrupt blocks:                0
>  Missing replicas:              675 (128.327 %)
>  Number of data-nodes:          2
>  Number of racks:               1
> 
> 
> The filesystem under path '/' is HEALTHY
> Please tell what is wrong.
> 
> Aseem
> 
> -----Original Message-----
> From: Alex Loddengaard [mailto:alex@cloudera.com] 
> Sent: Friday, April 10, 2009 11:04 PM
> To: core-user@hadoop.apache.org
> Subject: Re: More Replication on dfs
> 
> Aseem,
> 
> How are you verifying that blocks are not being replicated?  Have you run
> fsck?  *bin/hadoop fsck /*
>
> I'd be surprised if replication really wasn't happening.  Can you run fsck
> and pay attention to "Under-replicated blocks" and "Mis-replicated blocks"?
> In fact, can you just copy-paste the output of fsck?
> 
> Alex
> 
> On Thu, Apr 9, 2009 at 11:23 PM, Puri, Aseem <Aseem.Puri@honeywell.com> wrote:
> 
>> Hi
>>        I also tried the command $ bin/hadoop balancer. But still the
>> same problem.
>>
>> Aseem
>>
>> -----Original Message-----
>> From: Puri, Aseem [mailto:Aseem.Puri@Honeywell.com]
>> Sent: Friday, April 10, 2009 11:18 AM
>> To: core-user@hadoop.apache.org
>> Subject: RE: More Replication on dfs
>>
>> Hi Alex,
>>
>>        Thanks for sharing your knowledge. So far I have three
>> machines, and I want to check the behavior of Hadoop, so I want the
>> replication factor to be 2. I started my Hadoop server with
>> replication factor 3. After that I uploaded 3 files to run the word
>> count program. But since all my files are stored on one machine and
>> also replicated to the other datanodes, my map reduce program takes
>> input from one datanode only. I want my files to be on different
>> datanodes so that I can check the functionality of map reduce properly.
>>
>>        Also, before starting my Hadoop server again with replication
>> factor 2, I formatted all datanodes and deleted all old data manually.
>>
>> Please suggest what I should do now.
>>
>> Regards,
>> Aseem Puri
>>
>>
>> -----Original Message-----
>> From: Mithila Nagendra [mailto:mnagendr@asu.edu]
>> Sent: Friday, April 10, 2009 10:56 AM
>> To: core-user@hadoop.apache.org
>> Subject: Re: More Replication on dfs
>>
>> To add to the question, how does one decide what the optimal replication
>> factor for a cluster is? For instance, what would be the appropriate
>> replication factor for a cluster consisting of 5 nodes?
>> Mithila
>>
>> On Fri, Apr 10, 2009 at 8:20 AM, Alex Loddengaard <alex@cloudera.com>
>> wrote:
>>
>>> Did you load any files when replication was set to 3?  If so, you'll have to
>>> rebalance:
>>>
>>> <http://hadoop.apache.org/core/docs/r0.19.1/commands_manual.html#balancer>
>>> <http://hadoop.apache.org/core/docs/r0.19.1/hdfs_user_guide.html#Rebalancer>
>>>
>>> Note that most people run HDFS with a replication factor of 3.  There have
>>> been cases when clusters running with a replication of 2 discovered new
>>> bugs, because replication is so often set to 3.  That said, if you can do
>>> it, it's probably advisable to run with a replication factor of 3 instead
>>> of 2.
>>>
>>> Alex
>>>
>>> On Thu, Apr 9, 2009 at 9:56 PM, Puri, Aseem <Aseem.Puri@honeywell.com> wrote:
>>>> Hi
>>>>
>>>>            I am a new Hadoop user. I have a small cluster with 3
>>>> Datanodes. In hadoop-site.xml the value of the dfs.replication property is 2,
>>>> but it is still replicating data on 3 machines.
>>>>
>>>>
>>>>
>>>> Please tell why is it happening?
>>>>
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Aseem Puri
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>

