hadoop-hdfs-user mailing list archives

From Friso van Vollenhoven <fvanvollenho...@xebia.com>
Subject Re: replicating existing blocks?
Date Thu, 19 May 2011 17:54:44 GMT
Hi,
The replication config values state the number of copies of each block that
will eventually exist. So, 1 means each block exists on only one node (no
redundancy); 3 (the default) means each block is stored on three different
nodes.
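
For example, the cluster-wide default can be set in hdfs-site.xml with a
property entry along these lines (a minimal sketch; the value shown is only
illustrative):

<property>
  <name>dfs.replication</name>
  <value>3</value>
  <description>Default replication factor for newly created files when no
  replication is specified at create time.
  </description>
</property>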

BTW: the job file is copied to HDFS and HDFS takes care of the replication.


Friso



On 19 May 2011, at 18:44, Steve Cohen wrote:

> One last question about these replication values. If dfs.replication
> and mapred.submit.replication are set to 1, does that mean they get
> copied one time so there are two dfs blocks and two job files or does
> it mean there is one dfs block and one job file?
> 
> Thanks,
> Steve Cohen
> 
> On Thu, May 19, 2011 at 2:43 AM, Friso van Vollenhoven
> <fvanvollenhoven@xebia.com> wrote:
>> I believe it's this:
>> 
>> <property>
>>  <name>mapred.submit.replication</name>
>>  <value>10</value>
>>  <description>The replication level for submitted job files.  This
>>  should be around the square root of the number of nodes.
>>  </description>
>> </property>
>> 
>> You can set it per job in the job specific conf and/or in mapred-site.xml.
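
For instance, assuming the job driver runs through ToolRunner (so generic
options are parsed), a per-job override could be passed on the command line
roughly like this; the jar name, class, and paths are placeholders:

hadoop jar my-job.jar com.example.MyJob \
  -D mapred.submit.replication=3 \
  /input /output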
>> 
>> 
>> Friso
>> 
>> 
>> 
>> On 19 May 2011, at 03:42, Steve Cohen wrote:
>> 
>>> Where is the default replication factor on job files set? Is it
>>> different than the dfs.replication setting in hdfs-site.xml?
>>> 
>>> Sent from my iPad
>>> 
>>> On May 18, 2011, at 9:10 PM, Joey Echeverria <joey@cloudera.com> wrote:
>>> 
>>>> Did you run a map reduce job?
>>>> 
>>>> I think the default replication factor on job files is 10, which
>>>> obviously doesn't work well on a pseudo-distributed cluster.
>>>> 
>>>> -Joey
>>>> 
>>>> On Wed, May 18, 2011 at 5:07 PM, Steve Cohen <mail4steve@gmail.com> wrote:
>>>>> Thanks for the answer. Earlier, I asked about why I get occasional
>>>>> "not replicated yet" errors. Now, I had dfs.replication set to one.
>>>>> What replication could it have been doing? Did the error messages
>>>>> actually mean that the file couldn't get created in the cluster?
>>>>> 
>>>>> Thanks,
>>>>> Steve Cohen
>>>>> 
>>>>> 
>>>>> 
>>>>> On May 18, 2011, at 6:39 PM, Todd Lipcon <todd@cloudera.com> wrote:
>>>>> 
>>>>>> Tried to send this, but apparently SpamAssassin finds emails about
>>>>>> "replicas" to be spammy. This time with less rich text :)
>>>>>> 
>>>>>> On Wed, May 18, 2011 at 3:35 PM, Todd Lipcon <todd@cloudera.com> wrote:
>>>>>>> 
>>>>>>> Hi Steve,
>>>>>>> Running setrep will indeed change those files. Changing
>>>>>>> "dfs.replication" just changes the default replication value for
>>>>>>> files created in the future. Replication level is a file-specific
>>>>>>> property.
>>>>>>> Thanks
>>>>>>> -Todd
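
Concretely, raising the replication factor of files that already exist in HDFS
is done with setrep, along these lines (the path is illustrative; -R recurses
into directories and -w waits until the new factor has been reached):

hadoop fs -setrep -R -w 2 /user/steve/data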
>>>>>>> 
>>>>>>> On Wed, May 18, 2011 at 3:32 PM, Steve Cohen <mail4steve@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> Say I add a datanode to a pseudo cluster and I want to change the
>>>>>>>> replication factor to 2. I see that I can either run hadoop fs -setrep
>>>>>>>> or change the hdfs-site.xml value for dfs.replication. But does either
>>>>>>>> of these cause the existing blocks to replicate?
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Steve Cohen
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Todd Lipcon
>>>>>>> Software Engineer, Cloudera
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Todd Lipcon
>>>>>> Software Engineer, Cloudera
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Joseph Echeverria
>>>> Cloudera, Inc.
>>>> 443.305.9434
>> 
>> 

