That setting tells HDFS to replicate files written from now on two-fold. It
has no bearing on existing files: replication is set on a per-file basis, so
files already in the DFS keep the replication factor they were written with.

To change the replication factor for files already in HDFS, use the command:

  bin/hadoop fs -setrep [-R] repl_factor filename...
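
As a rough sketch of how the -setrep command could be applied and then
checked: the function names, the HADOOP_CMD variable, and the example path
(borrowed from the fsck output quoted below) are only illustrative
assumptions, not part of Hadoop itself.

```shell
# Illustrative wrapper only: HADOOP_CMD, set_replication and show_replication
# are made-up names. Run from the Hadoop install directory, or point
# HADOOP_CMD at your own launcher script.
HADOOP_CMD="${HADOOP_CMD:-bin/hadoop}"

# Recursively set the replication factor of everything under an HDFS path.
#   $1 = desired replication factor, $2 = HDFS file or directory
set_replication() {
    $HADOOP_CMD fs -setrep -R "$1" "$2"
}

# Print per-file block details so the replica counts can be checked.
#   $1 = HDFS file or directory
show_replication() {
    $HADOOP_CMD fsck "$1" -files -blocks
}
```

For example: set_replication 2 /user/HadoopAdmin, followed by
show_replication /user/HadoopAdmin. Extra replicas are cleaned up in the
background, so fsck may keep reporting the old counts for a short while.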
- Aaron
On Wed, Apr 15, 2009 at 10:04 PM, Puri, Aseem <Aseem.Puri@honeywell.com> wrote:
> Hi
> My problem is not that my data is under replicated. I have 3
> data nodes. In my hadoop-site.xml I also set the configuration as:
>
> <property>
> <name>dfs.replication</name>
> <value>2</value>
> </property>
>
> But even after this, data is replicated on 3 nodes instead of 2.
>
> Can you please tell me what the problem could be?
>
> Thanks & Regards
> Aseem Puri
>
> -----Original Message-----
> From: Raghu Angadi [mailto:rangadi@yahoo-inc.com]
> Sent: Wednesday, April 15, 2009 2:58 AM
> To: core-user@hadoop.apache.org
> Subject: Re: More Replication on dfs
>
> Aseem,
>
> Regarding over-replication, it is mostly an app-related issue, as Alex
> mentioned.
>
> But if you are concerned about under-replicated blocks in the fsck output:
>
> These blocks should not stay under-replicated if you have enough nodes
> and enough space on them (check the NameNode web UI).
>
> Try grepping for one of the blocks in the NameNode log (and the datanode
> logs as well, since you have just 3 nodes).
>
> Raghu.
>
> Puri, Aseem wrote:
> > Alex,
> >
> > Output of the $ bin/hadoop fsck / command after running an HBase data
> > insert command on a table is:
> >
> > .....
> > .....
> > .....
> > .....
> > .....
> > /hbase/test/903188508/tags/info/4897652949308499876: Under replicated
> > blk_-5193695109439554521_3133. Target Replicas is 3 but found 1 replica(s).
> > .
> > /hbase/test/903188508/tags/mapfiles/4897652949308499876/data: Under replicated
> > blk_-1213602857020415242_3132. Target Replicas is 3 but found 1 replica(s).
> > .
> > /hbase/test/903188508/tags/mapfiles/4897652949308499876/index: Under replicated
> > blk_3934493034551838567_3132. Target Replicas is 3 but found 1 replica(s).
> > .
> > /user/HadoopAdmin/hbase table.doc: Under replicated
> > blk_4339521803948458144_1031. Target Replicas is 3 but found 2 replica(s).
> > .
> > /user/HadoopAdmin/input/bin.doc: Under replicated
> > blk_-3661765932004150973_1030. Target Replicas is 3 but found 2 replica(s).
> > .
> > /user/HadoopAdmin/input/file01.txt: Under replicated
> > blk_2744169131466786624_1001. Target Replicas is 3 but found 2 replica(s).
> > .
> > /user/HadoopAdmin/input/file02.txt: Under replicated
> > blk_2021956984317789924_1002. Target Replicas is 3 but found 2 replica(s).
> > .
> > /user/HadoopAdmin/input/test.txt: Under replicated
> > blk_-3062256167060082648_1004. Target Replicas is 3 but found 2 replica(s).
> > ...
> > /user/HadoopAdmin/output/part-00000: Under replicated
> > blk_8908973033976428484_1010. Target Replicas is 3 but found 2 replica(s).
> > Status: HEALTHY
> > Total size: 48510226 B
> > Total dirs: 492
> > Total files: 439 (Files currently being written: 2)
> > Total blocks (validated): 401 (avg. block size 120973 B)
> > (Total open file blocks (not validated): 2)
> > Minimally replicated blocks: 401 (100.0 %)
> > Over-replicated blocks: 0 (0.0 %)
> > Under-replicated blocks: 399 (99.50124 %)
> > Mis-replicated blocks: 0 (0.0 %)
> > Default replication factor: 2
> > Average block replication: 1.3117207
> > Corrupt blocks: 0
> > Missing replicas: 675 (128.327 %)
> > Number of data-nodes: 2
> > Number of racks: 1
> >
> >
> > The filesystem under path '/' is HEALTHY
> > Please tell me what is wrong.
> >
> > Aseem
> >
> > -----Original Message-----
> > From: Alex Loddengaard [mailto:alex@cloudera.com]
> > Sent: Friday, April 10, 2009 11:04 PM
> > To: core-user@hadoop.apache.org
> > Subject: Re: More Replication on dfs
> >
> > Aseem,
> >
> > How are you verifying that blocks are not being replicated? Have you run
> > fsck? *bin/hadoop fsck /*
> >
> > I'd be surprised if replication really wasn't happening. Can you run fsck
> > and pay attention to "Under-replicated blocks" and "Mis-replicated blocks"?
> > In fact, can you just copy-paste the output of fsck?
> >
> > Alex
> >
> > On Thu, Apr 9, 2009 at 11:23 PM, Puri, Aseem <Aseem.Puri@honeywell.com> wrote:
> >
> >> Hi
> >> I also tried the command $ bin/hadoop balancer, but I still see the
> >> same problem.
> >>
> >> Aseem
> >>
> >> -----Original Message-----
> >> From: Puri, Aseem [mailto:Aseem.Puri@Honeywell.com]
> >> Sent: Friday, April 10, 2009 11:18 AM
> >> To: core-user@hadoop.apache.org
> >> Subject: RE: More Replication on dfs
> >>
> >> Hi Alex,
> >>
> >> Thanks for sharing your knowledge. So far I have three machines, and
> >> I need to check the behavior of Hadoop, so I want the replication
> >> factor to be 2. I started my Hadoop server with replication factor 3.
> >> After that I uploaded 3 files to run a word count program. But since
> >> all my files are stored on one machine and also replicated to the
> >> other datanodes, my map reduce program takes input from one datanode
> >> only. I want my files to be on different datanodes so that I can check
> >> the functionality of map reduce properly.
> >>
> >> Also, before starting my Hadoop server again with replication factor 2,
> >> I formatted all datanodes and deleted all old data manually.
> >>
> >> Please suggest what I should do now.
> >>
> >> Regards,
> >> Aseem Puri
> >>
> >>
> >> -----Original Message-----
> >> From: Mithila Nagendra [mailto:mnagendr@asu.edu]
> >> Sent: Friday, April 10, 2009 10:56 AM
> >> To: core-user@hadoop.apache.org
> >> Subject: Re: More Replication on dfs
> >>
> >> To add to the question, how does one decide what the optimal replication
> >> factor is for a cluster? For instance, what would be the appropriate
> >> replication factor for a cluster consisting of 5 nodes?
> >> Mithila
> >>
> >> On Fri, Apr 10, 2009 at 8:20 AM, Alex Loddengaard <alex@cloudera.com> wrote:
> >>
> >>> Did you load any files when replication was set to 3? If so, you'll have
> >>> to rebalance:
> >>>
> >>> <http://hadoop.apache.org/core/docs/r0.19.1/commands_manual.html#balancer>
> >>> <http://hadoop.apache.org/core/docs/r0.19.1/hdfs_user_guide.html#Rebalancer>
> >>>
> >>> Note that most people run HDFS with a replication factor of 3. There have
> >>> been cases when clusters running with a replication of 2 discovered new
> >>> bugs, because replication is so often set to 3. That said, if you can do
> >>> it, it's probably advisable to run with a replication factor of 3 instead
> >>> of 2.
> >>>
> >>> Alex
> >>>
> >>> On Thu, Apr 9, 2009 at 9:56 PM, Puri, Aseem <Aseem.Puri@honeywell.com> wrote:
> >>>> Hi
> >>>>
> >>>> I am a new Hadoop user. I have a small cluster with 3 datanodes. In
> >>>> hadoop-site.xml the value of the dfs.replication property is 2, but
> >>>> it is still replicating data on 3 machines.
> >>>>
> >>>> Can you please tell me why this is happening?
> >>>>
> >>>>
> >>>>
> >>>> Regards,
> >>>>
> >>>> Aseem Puri