hadoop-common-issues mailing list archives

From "Aaron Kimball (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-4331) DBOutputFormat: add batch size support for JDBC and recieve DBWritable object in value not in key
Date Tue, 10 Nov 2009 21:37:28 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Kimball updated HADOOP-4331:
----------------------------------

    Attachment: HADOOP-4331.patch

I would like to request that this issue be reopened. When exporting from HDFS into a database,
a reducer is not always necessary. In a map-only job, the map tasks can write directly
to the database, avoiding the cost of a shuffle/reduce step. But some map tasks may be very
large (e.g., when reading from gzipped files), expanding to 1 million or more records per task.

In this case, the user should be able to state that a potential loss of atomicity is
acceptable. (In most use cases, database users already prevent duplicate rows through
primary keys or other uniqueness constraints in the database itself.)

I am attaching a new patch synced to mapreduce trunk. By default, intermediate spills to the
db are disabled (spill size=0), but you can set the spill size to a nonzero number of records
instead.
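
To illustrate the spill behavior described above, here is a minimal sketch. It is not taken
from the attached patch: the names SpillingWriter, BatchSink, and spillSize are hypothetical,
and the sink stands in for the JDBC side, where a flush would typically amount to
PreparedStatement.executeBatch() followed by Connection.commit().

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the database side: receives one batch and commits it. */
interface BatchSink<T> {
    void commitBatch(List<T> rows);
}

/**
 * Buffers rows and hands them to the sink every spillSize rows.
 * A spillSize of 0 disables intermediate spills, so everything is
 * committed once, when the writer is closed.
 */
class SpillingWriter<T> {
    private final BatchSink<T> sink;
    private final int spillSize;
    private final List<T> pending = new ArrayList<>();

    SpillingWriter(BatchSink<T> sink, int spillSize) {
        if (spillSize < 0) {
            throw new IllegalArgumentException("spillSize must be >= 0");
        }
        this.sink = sink;
        this.spillSize = spillSize;
    }

    /** Buffers one row; spills if the configured threshold is reached. */
    void write(T row) {
        pending.add(row);
        if (spillSize > 0 && pending.size() >= spillSize) {
            flush();
        }
    }

    /** Commits whatever is still buffered at the end of the task. */
    void close() {
        flush();
    }

    private void flush() {
        if (pending.isEmpty()) {
            return;
        }
        // Pass a copy so the sink never sees the buffer being cleared.
        sink.commitBatch(new ArrayList<>(pending));
        pending.clear();
    }
}
```

With a nonzero spill size, a task that fails partway through may leave earlier batches
committed, which is the loss of atomicity the user would be opting into.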

> DBOutputFormat: add batch size support for JDBC and recieve DBWritable object in value not in key
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4331
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4331
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Alexander Schwid
>            Assignee: Aaron Kimball
>            Priority: Minor
>         Attachments: HADOOP-4331.patch, patch.txt
>
>
> package mapred.lib.db
> added batch size support for JDBC in DBOutputFormat
> receive DBWritable object in value, not in key, in DBOutputFormat

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

