hadoop-common-dev mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4331) DBOutputFormat: add batch size support for JDBC and receive DBWritable object in value not in key
Date Mon, 06 Oct 2008 15:57:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637120#action_12637120 ]

Enis Soztutar commented on HADOOP-4331:
---------------------------------------

I am not convinced that splitting the batch further within a reduce is the right approach. It is
better to add all the values of a reduce in a single transaction to preserve atomicity. If an error
occurs in the transaction, none of the records in the reduce should be inserted; otherwise, when the
reduce is restarted, some of the records might be duplicated.
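
As a rough illustration of the all-or-nothing behaviour described above, here is a minimal
plain-JDBC sketch. It is not the code in DBOutputFormat or in the attached patch; the class name
AtomicReduceWriter, the method writeAll, and the row representation (one Object[] per record)
are hypothetical and chosen only for the example. It queues every record of a reduce on one
PreparedStatement, executes the batch once, and commits once; on any SQLException it rolls back,
so a restarted reduce starts from a clean state.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical helper, not part of Hadoop: writes all rows of one reduce atomically.
public class AtomicReduceWriter {

    /**
     * Inserts every row inside a single JDBC transaction.
     * Either all rows are committed, or the transaction is rolled back.
     */
    public static void writeAll(Connection conn, String insertSql, List<Object[]> rows)
            throws SQLException {
        conn.setAutoCommit(false); // group all inserts into one transaction
        try (PreparedStatement stmt = conn.prepareStatement(insertSql)) {
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    stmt.setObject(i + 1, row[i]); // JDBC parameter indexes are 1-based
                }
                stmt.addBatch(); // queue the row; nothing is committed yet
            }
            stmt.executeBatch(); // send all queued rows in one batch
            conn.commit();       // single commit for the whole reduce
        } catch (SQLException e) {
            try {
                conn.rollback(); // discard every row so a retry cannot create duplicates
            } catch (SQLException rollbackError) {
                e.addSuppressed(rollbackError);
            }
            throw e;
        }
    }
}
```

A caller would pass an open connection and a parameterized statement, for example
writeAll(conn, "INSERT INTO t (id, name) VALUES (?, ?)", rows), where the table t and its columns
are placeholders. Whether the database really discards the rows on rollback depends on the
driver and on the table supporting transactions.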

Is there a specific performance/driver-related reason to add batch sizes? 

> DBOutputFormat: add batch size support for JDBC and receive DBWritable object in value not in key
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4331
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4331
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.20.0
>            Reporter: Alexander Schwid
>            Priority: Minor
>             Fix For: 0.20.0
>
>         Attachments: patch.txt
>
>
> package mapred.lib.db
> added batch size support for JDBC in DBOutputFormat
> receive DBWritable object in value, not in key, in DBOutputFormat

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

