hadoop-common-user mailing list archives

From "Bejoy KS" <bejoy.had...@gmail.com>
Subject Re: Number of retries
Date Thu, 22 Mar 2012 20:00:05 GMT
Hi Mohit
     To add on, duplicates won't appear if your output is written to an HDFS file, because
once one attempt of a task completes, only that attempt's output file is copied to the final
output destination; the files generated by the other task attempts, which get killed, are
simply discarded.
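
For illustration, a minimal sketch of why this works with the stock
FileOutputFormat/FileOutputCommitter (the class name HdfsOutputExample and the
paths below are made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class HdfsOutputExample {
  public static void main(String[] args) throws Exception {
    // Each task attempt writes its part file into a temporary attempt
    // directory under the job output path; on task commit only the
    // successful attempt's files are promoted to the final output, and
    // output from killed (e.g. speculative) attempts is discarded.
    Job job = new Job(new Configuration(), "hdfs-output-example");
    FileOutputFormat.setOutputPath(job, new Path("/user/example/out"));
    // While running, an attempt writes under something like:
    //   /user/example/out/_temporary/_attempt_..._m_000000_0/part-m-00000
    // and is renamed into /user/example/out only if that attempt commits.
  }
}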

Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: "Bejoy KS" <bejoy.hadoop@gmail.com>
Date: Thu, 22 Mar 2012 19:55:55 
To: <common-user@hadoop.apache.org>
Reply-To: bejoy.hadoop@gmail.com
Subject: Re: Number of retries

      If your job writes to a database atomically (each row committed as it is written),
this issue can pop up. You can avoid it only by disabling speculative execution.
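
For example, using the Hadoop 1.x property names, something along these lines
(the job name is made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class NoSpeculationExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Stop the framework from launching duplicate (speculative)
    // attempts of slow tasks, for both the map and reduce phases.
    conf.setBoolean("mapred.map.tasks.speculative.execution", false);
    conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
    Job job = new Job(conf, "db-writing-job");
  }
}

Note that even with speculation off, a failed task is still re-attempted (up to
mapred.map.max.attempts), so the database writes ideally need to be idempotent
as well.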
Drilling down from the web UI to the task level will show you the tasks that had multiple
attempts.
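
If it helps, the job CLI can pull out similar information; roughly like this
(the job id and output dir are placeholders):

hadoop job -history <job-output-dir>
hadoop job -list-attempt-ids job_201203221200_0001 map completed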

------Original Message------
From: Mohit Anchlia
To: common-user@hadoop.apache.org
ReplyTo: common-user@hadoop.apache.org
Subject: Number of retries
Sent: Mar 23, 2012 01:21

I am seeing a weird problem where there are duplicate rows in the database.
I am wondering if this is caused by some internal retries. Is there a way
to see which tasks were retried? I am not sure what else could cause it,
because when I look at the output data I don't see any duplicates in the
file.

Bejoy KS

Sent from handheld, please excuse typos.