hadoop-mapreduce-user mailing list archives

From "Jarus, Nathan" <jar...@amazon.com>
Subject RE: DBOutputWriter timing out writing to database
Date Fri, 03 Aug 2012 19:50:51 GMT
Thanks for the alternatives, but I'd ideally like to do all this inside the MR job itself as
I want to be able to programmatically run it regularly, and any additional steps just add

Looking through sample code on Google, I never see anybody using the Progressable passed in
to the output format, and pretty much every time someone has a problem with job timeouts they're
just told to increase the timeout. This seems to me like treating the symptom rather than the
actual problem. Does the Progressable actually do anything?

From: Sonal Goyal [mailto:sonalgoyal4@gmail.com]
Sent: Thursday, August 02, 2012 10:35 PM
To: Jarus, Nathan
Subject: Re: DBOutputWriter timing out writing to database

Hi Nathan,

I saw your question on the mailing list. If your target database is MySQL, HIHO at https://github.com/sonalgoyal/hiho
is an open source tool that provides a highly optimized write operation to the database.
Please feel free to try it out and let me know if you see any issues.

Best Regards,
Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
Nube Technologies <http://www.nubetech.co>

On Fri, Aug 3, 2012 at 12:34 AM, Jarus, Nathan <jarusn@amazon.com> wrote:

I'm running Hadoop 0.20.205 and am using the DBOutputFormat to write to a database. For small
datasets, my jobs work perfectly, but for larger jobs, writing to the database takes longer
than 600 seconds and Hadoop times out my reduce tasks. Looking at the source for DBOutputFormat,
it seems the Progressable never gets updated while the insert query is being run. How do I
modify/subclass DBOutputFormat to update this so my jobs can finish?
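
One common way to approach the problem described above is to call progress() from a helper thread while the blocking database call runs. The following is a minimal sketch of that idea, not code from this thread and not a tested DBOutputFormat patch. The nested Progressable interface is a stand-in that mirrors the single progress() method of org.apache.hadoop.util.Progressable so the example compiles without Hadoop; ProgressHeartbeat, runWithHeartbeat, and the interval values are hypothetical names chosen for illustration, and Thread.sleep stands in for the slow insert.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class ProgressHeartbeat {

    /** Stand-in for org.apache.hadoop.util.Progressable (assumption: same single method). */
    interface Progressable {
        void progress();
    }

    /**
     * Runs the blocking work on the calling thread while a daemon thread
     * calls progress.progress() every intervalMillis milliseconds.
     * The heartbeat is stopped whether the work succeeds or throws.
     */
    static <T> T runWithHeartbeat(Progressable progress, long intervalMillis, Callable<T> work)
            throws Exception {
        ScheduledExecutorService ticker = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "progress-heartbeat");
            t.setDaemon(true); // never keep the JVM alive just for the heartbeat
            return t;
        });
        ticker.scheduleAtFixedRate(progress::progress, intervalMillis, intervalMillis,
                TimeUnit.MILLISECONDS);
        try {
            return work.call();
        } finally {
            ticker.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger ticks = new AtomicInteger();
        String result = runWithHeartbeat(ticks::incrementAndGet, 50, () -> {
            Thread.sleep(300); // hypothetical placeholder for the slow batch insert and commit
            return "committed";
        });
        System.out.println(result + " after " + ticks.get() + " progress calls");
    }
}
```

In a DBOutputFormat subclass, the same wrapper would presumably go around whichever record writer call blocks on the insert, with the Progressable handed to getRecordWriter passed in as the first argument. Whether progress calls made from a helper thread reset the task timeout on 0.20.205 is an assumption worth verifying on a small job before relying on it.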

