hadoop-mapreduce-issues mailing list archives

From "Aaron Kimball (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1203) DBOutputFormat: add batch size support for JDBC and receive DBWritable object in value not in key
Date Fri, 13 Nov 2009 02:04:45 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777331#action_12777331 ]

Aaron Kimball commented on MAPREDUCE-1203:

That's a good point. I have not done a partial interrupt test. It would likely fail.

The current logic performs the updates via {{Statement.executeBatch()}}. Its behavior is relatively
poorly specified, especially for partial failures (a driver "may or may
not" continue processing statements after one of the batched statements fails).
So I'm not sure {{executeBatch()}} is the way to go. What are your thoughts on the following plan?

* Change from executeBatch() to a series of executeUpdate() calls
* Add another flag to DBConfiguration that controls error handling (ignore
all errors / log errors but continue / fail on error). The default is fail on error.
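
The plan above could look roughly like the following sketch. It assumes a hypothetical three-valued ErrorPolicy setting (no such flag exists in DBConfiguration today) and uses a small UpdateAction interface as a stand-in for a single PreparedStatement.executeUpdate() call, so the error handling can be read without a live database connection. All class and method names here are made up for illustration.

```java
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; none of this is existing Hadoop API.
enum ErrorPolicy { IGNORE, LOG_AND_CONTINUE, FAIL }

// Stands in for one PreparedStatement.executeUpdate() call.
interface UpdateAction {
    void run() throws SQLException;
}

final class PolicyWriter {
    private final ErrorPolicy policy;
    // Collected error messages; a real writer would use a logger instead.
    private final List<String> logged = new ArrayList<>();

    PolicyWriter(ErrorPolicy policy) {
        this.policy = policy;
    }

    /** Runs each update in order and returns how many succeeded. */
    int writeAll(List<UpdateAction> updates) throws SQLException {
        int succeeded = 0;
        for (UpdateAction update : updates) {
            try {
                update.run();
                succeeded++;
            } catch (SQLException e) {
                switch (policy) {
                    case FAIL:
                        throw e; // default: the first error fails the task
                    case LOG_AND_CONTINUE:
                        logged.add(e.getMessage());
                        break;
                    case IGNORE:
                        break;
                }
            }
        }
        return succeeded;
    }

    List<String> loggedErrors() {
        return logged;
    }
}
```

With FAIL, the first SQLException propagates to the caller, which can then roll back. With the other two policies the loop keeps going and reports how many statements went through.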

So the default behavior would still be one transaction per task, which fails on SQLException. But
that failure could be suppressed when you want to bulk-insert many records per task and
don't care if this happens incrementally over multiple task attempts.

I haven't benchmarked it, but it is possible that executeBatch() is higher performance than
a loop around executeUpdate() due to the reduced number of round-trips between the client
and server. So maybe executeUpdate() should be used only in the intermediate-spill case, and
not in the atomic-commit case.
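
If executeBatch() is kept for the atomic-commit case, the partial-failure ambiguity can at least be detected after the fact. Here is a minimal sketch, assuming a hypothetical helper class BatchOutcome (not part of Hadoop or JDBC). JDBC reports a batch failure as a BatchUpdateException, and the length of getUpdateCounts() shows whether the driver stopped at the first error or kept going and marked failures with Statement.EXECUTE_FAILED.

```java
import java.sql.BatchUpdateException;
import java.sql.Statement;

// Hypothetical helper for illustration only.
final class BatchOutcome {
    final int succeeded;
    final int failed;
    final int notAttempted;

    private BatchOutcome(int succeeded, int failed, int notAttempted) {
        this.succeeded = succeeded;
        this.failed = failed;
        this.notAttempted = notAttempted;
    }

    /** Interprets the update counts of a failed batch of batchSize statements. */
    static BatchOutcome from(int batchSize, BatchUpdateException e) {
        int[] counts = e.getUpdateCounts();
        if (counts == null) {
            // Assumption: no counts means nothing is known to have succeeded.
            counts = new int[0];
        }
        int ok = 0;
        int bad = 0;
        for (int c : counts) {
            if (c == Statement.EXECUTE_FAILED) {
                bad++;
            } else {
                ok++; // a row count or Statement.SUCCESS_NO_INFO
            }
        }
        int skipped = 0;
        if (counts.length < batchSize) {
            // Driver stopped at the first error: the next statement failed
            // and the remaining ones were never run.
            bad++;
            skipped = batchSize - counts.length - 1;
        }
        return new BatchOutcome(ok, bad, skipped);
    }
}
```

A caller could use notAttempted > 0 as the signal that the driver gave up early, and decide from there whether to roll back or retry the remainder.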


> DBOutputFormat: add batch size support for JDBC and receive DBWritable object in value not in key
> --------------------------------------------------------------------------------------------------
>                 Key: MAPREDUCE-1203
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1203
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Alexander Schwid
>            Assignee: Aaron Kimball
>            Priority: Minor
>         Attachments: HADOOP-4331.patch, patch.txt
> package mapred.lib.db
> added batch size support for JDBC in DBOutputFormat
> receive DBWritable object in value, not in key, in DBOutputFormat

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
