hadoop-common-user mailing list archives

From Aaron Kimball <aa...@cloudera.com>
Subject Re: More Replication on dfs
Date Fri, 17 Apr 2009 04:26:38 GMT
That setting will instruct future file writes to replicate two-fold. It
has no bearing on existing files; replication is set on a per-file
basis, so existing files keep the replication factor they were written with.

Use the command: bin/hadoop fs -setrep [-R] repl_factor filename...

to change the replication factor for files already in HDFS.
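For example, to drop existing files to a replication of 2 (the paths below are just the ones from your fsck output; substitute your own):

```shell
# Set replication to 2 for a single existing file
bin/hadoop fs -setrep 2 /user/HadoopAdmin/input/file01.txt

# Or recursively for everything under a directory
bin/hadoop fs -setrep -R 2 /user/HadoopAdmin

# Then re-run fsck to confirm the target replication changed
bin/hadoop fsck /user/HadoopAdmin -files -blocks
```

These commands need a running cluster, so the output here depends on your setup.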
- Aaron

On Wed, Apr 15, 2009 at 10:04 PM, Puri, Aseem <Aseem.Puri@honeywell.com> wrote:

> Hi
>        My problem is not that my data is under replicated. I have 3
> data nodes. In my hadoop-site.xml I also set the configuration as:
>
>  <property>
>  <name>dfs.replication</name>
>  <value>2</value>
>  </property>
>
> But even after this, data is replicated on 3 nodes instead of two.
>
> Now, please tell me what the problem could be.
>
> Thanks & Regards
> Aseem Puri
>
> -----Original Message-----
> From: Raghu Angadi [mailto:rangadi@yahoo-inc.com]
> Sent: Wednesday, April 15, 2009 2:58 AM
> To: core-user@hadoop.apache.org
> Subject: Re: More Replication on dfs
>
> Aseem,
>
> Regarding over-replication, it is mostly an application-related issue,
> as Alex mentioned.
>
> But if you are concerned about under-replicated blocks in fsck output :
>
> These blocks should not stay under-replicated if you have enough nodes
> and enough space on them (check NameNode webui).
>
> Try grep-ing for one of the blocks in the NameNode log (and the datanode
> logs as well, since you have just 3 nodes).
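For example (the log line below is made up just to show the idea; real NameNode messages differ across Hadoop versions, and on a real cluster you would point grep at the logs under $HADOOP_HOME/logs instead):

```shell
# Illustrative only: fabricate one NameNode-style log line containing a
# block id reported by fsck, then grep for it.
sample_log=/tmp/namenode-sample.log
cat > "$sample_log" <<'EOF'
2009-04-15 22:10:01 INFO FSNamesystem: ask 10.0.0.2:50010 to replicate blk_-1213602857020415242_3132 to 10.0.0.3:50010
EOF

# On a real cluster, grep the actual NameNode and datanode log files.
grep "blk_-1213602857020415242" "$sample_log"
```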
>
> Raghu.
>
> Puri, Aseem wrote:
> > Alex,
> >
> > Output of the $ bin/hadoop fsck / command after running the HBase data
> > insert command in a table is:
> >
> > .....
> > .....
> > .....
> > .....
> > .....
> > /hbase/test/903188508/tags/info/4897652949308499876:  Under replicated blk_-5193695109439554521_3133. Target Replicas is 3 but found 1 replica(s).
> > .
> > /hbase/test/903188508/tags/mapfiles/4897652949308499876/data:  Under replicated blk_-1213602857020415242_3132. Target Replicas is 3 but found 1 replica(s).
> > .
> > /hbase/test/903188508/tags/mapfiles/4897652949308499876/index:  Under replicated blk_3934493034551838567_3132. Target Replicas is 3 but found 1 replica(s).
> > .
> > /user/HadoopAdmin/hbase table.doc:  Under replicated blk_4339521803948458144_1031. Target Replicas is 3 but found 2 replica(s).
> > .
> > /user/HadoopAdmin/input/bin.doc:  Under replicated blk_-3661765932004150973_1030. Target Replicas is 3 but found 2 replica(s).
> > .
> > /user/HadoopAdmin/input/file01.txt:  Under replicated blk_2744169131466786624_1001. Target Replicas is 3 but found 2 replica(s).
> > .
> > /user/HadoopAdmin/input/file02.txt:  Under replicated blk_2021956984317789924_1002. Target Replicas is 3 but found 2 replica(s).
> > .
> > /user/HadoopAdmin/input/test.txt:  Under replicated blk_-3062256167060082648_1004. Target Replicas is 3 but found 2 replica(s).
> > ...
> > /user/HadoopAdmin/output/part-00000:  Under replicated blk_8908973033976428484_1010. Target Replicas is 3 but found 2 replica(s).
> > Status: HEALTHY
> >  Total size:    48510226 B
> >  Total dirs:    492
> >  Total files:   439 (Files currently being written: 2)
> >  Total blocks (validated):      401 (avg. block size 120973 B) (Total open file blocks (not validated): 2)
> >  Minimally replicated blocks:   401 (100.0 %)
> >  Over-replicated blocks:        0 (0.0 %)
> >  Under-replicated blocks:       399 (99.50124 %)
> >  Mis-replicated blocks:         0 (0.0 %)
> >  Default replication factor:    2
> >  Average block replication:     1.3117207
> >  Corrupt blocks:                0
> >  Missing replicas:              675 (128.327 %)
> >  Number of data-nodes:          2
> >  Number of racks:               1
> >
> >
> > The filesystem under path '/' is HEALTHY
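A quick sanity check on the summary numbers above, assuming fsck reports "Missing replicas" as a percentage of the replicas that actually exist rather than of the target count:

```shell
# existing replicas ~= total blocks * average block replication;
# the missing-replica percentage is then missing / existing.
awk 'BEGIN {
    existing = 401 * 1.3117207          # blocks * avg replication
    printf "%.0f %.3f\n", existing, 100 * 675 / existing
}'
```

This reproduces the reported 128.327 %, which suggests the percentage is relative to existing replicas, not targets.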
> > Please tell what is wrong.
> >
> > Aseem
> >
> > -----Original Message-----
> > From: Alex Loddengaard [mailto:alex@cloudera.com]
> > Sent: Friday, April 10, 2009 11:04 PM
> > To: core-user@hadoop.apache.org
> > Subject: Re: More Replication on dfs
> >
> > Aseem,
> >
> > How are you verifying that blocks are not being replicated?  Have you run
> > fsck?  *bin/hadoop fsck /*
> >
> > I'd be surprised if replication really wasn't happening.  Can you run
> > fsck
> > and pay attention to "Under-replicated blocks" and "Mis-replicated
> > blocks?"
> > In fact, can you just copy-paste the output of fsck?
> >
> > Alex
> >
> > On Thu, Apr 9, 2009 at 11:23 PM, Puri, Aseem <Aseem.Puri@honeywell.com> wrote:
> >
> >> Hi
> >>        I also tried the command $ bin/hadoop balancer, but I still see
> >> the same problem.
> >>
> >> Aseem
> >>
> >> -----Original Message-----
> >> From: Puri, Aseem [mailto:Aseem.Puri@Honeywell.com]
> >> Sent: Friday, April 10, 2009 11:18 AM
> >> To: core-user@hadoop.apache.org
> >> Subject: RE: More Replication on dfs
> >>
> >> Hi Alex,
> >>
> >>        Thanks for sharing your knowledge. So far I have three
> >> machines, and since I want to check the behavior of Hadoop, I want the
> >> replication factor to be 2. I started my Hadoop server with a
> >> replication factor of 3. After that I uploaded 3 files to run the word
> >> count program. But since all my files are stored on one machine and
> >> replicated to the other datanodes as well, my MapReduce program takes
> >> input from only one datanode. I want my files on different datanodes so
> >> that I can check the functionality of MapReduce properly.
> >>
> >>        Also, before starting my Hadoop server again with replication
> >> factor 2, I formatted all datanodes and deleted all the old data
> >> manually.
> >>
> >> Please suggest what I should do now.
> >>
> >> Regards,
> >> Aseem Puri
> >>
> >>
> >> -----Original Message-----
> >> From: Mithila Nagendra [mailto:mnagendr@asu.edu]
> >> Sent: Friday, April 10, 2009 10:56 AM
> >> To: core-user@hadoop.apache.org
> >> Subject: Re: More Replication on dfs
> >>
> >> To add to the question, how does one decide the optimal replication
> >> factor for a cluster? For instance, what would be an appropriate
> >> replication factor for a cluster consisting of 5 nodes?
> >> Mithila
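One way to frame that trade-off: higher replication costs raw capacity but tolerates more simultaneous node failures. A sketch with made-up capacity numbers (500 GB per node, not a recommendation):

```shell
# Usable capacity vs. replication factor for a hypothetical 5-node
# cluster where each node contributes 500 GB of raw disk.
nodes=5
per_node_gb=500
raw_gb=$((nodes * per_node_gb))

for r in 1 2 3; do
    # With r replicas, up to r - 1 nodes can fail without data loss.
    echo "replication=$r: ~$((raw_gb / r)) GB usable, tolerates $((r - 1)) node failure(s)"
done
```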
> >>
> >> On Fri, Apr 10, 2009 at 8:20 AM, Alex Loddengaard <alex@cloudera.com>
> >> wrote:
> >>
> >>> Did you load any files when replication was set to 3?  If so, you'll
> >>> have to rebalance:
> >>>
> >>> http://hadoop.apache.org/core/docs/r0.19.1/commands_manual.html#balancer
> >>> http://hadoop.apache.org/core/docs/r0.19.1/hdfs_user_guide.html#Rebalancer
> >>>
> >>> Note that most people run HDFS with a replication factor of 3.  There
> >>> have been cases when clusters running with a replication of 2 discovered
> >>> new bugs, because replication is so often set to 3.  That said, if you
> >>> can do it, it's probably advisable to run with a replication factor of 3
> >>> instead of 2.
> >>>
> >>> Alex
> >>>
> >>> On Thu, Apr 9, 2009 at 9:56 PM, Puri, Aseem <Aseem.Puri@honeywell.com> wrote:
> >>>> Hi
> >>>>
> >>>>            I am a new Hadoop user. I have a small cluster with 3
> >>>> datanodes. In hadoop-site.xml the value of the dfs.replication
> >>>> property is 2, but it is still replicating data on 3 machines.
> >>>>
> >>>>
> >>>>
> >>>> Please tell me why this is happening.
> >>>>
> >>>>
> >>>>
> >>>> Regards,
> >>>>
> >>>> Aseem Puri
