hadoop-mapreduce-issues mailing list archives

From "Aaron Kimball (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAPREDUCE-907) Sqoop should use more intelligent splits
Date Mon, 14 Sep 2009 16:59:57 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Kimball updated MAPREDUCE-907:
------------------------------------

    Attachment: MAPREDUCE-907.patch

Attaching a new patch that uses the data-driven splits from MAPREDUCE-885. This allows
most databases to scan separate ranges of a table in parallel, which leads to much better
performance.

Some notable changes:

* The {{\-\-order-by}} parameter has been renamed to {{\-\-split-by}}. Entries are no longer
strictly ordered, eliminating a database scalability chokepoint.
** TestOrderBy has been renamed to TestSplitBy
* With data-driven splits, running multiple mappers makes sense again. This patch adds a
{{\-\-num-mappers}} / {{\-m}} parameter to control the degree of parallelism in reading.
* DataDrivenDBInputFormat is currently incompatible with Oracle. Oracle still uses the old
DBInputFormat-based import path.
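
For readers unfamiliar with range-based splitting, the idea behind {{\-\-split-by}} and {{\-\-num-mappers}} can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in DataDrivenDBInputFormat or in this patch: it assumes an integer split column whose minimum and maximum are already known, and the function names, table name, and column name are all hypothetical.

```python
def split_ranges(lo, hi, num_mappers):
    """Divide the inclusive integer range [lo, hi] into at most
    num_mappers contiguous, non-overlapping ranges of near-equal size.

    Hypothetical helper for illustration; not part of Sqoop or Hadoop.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if hi < lo:
        raise ValueError("hi must be >= lo")

    total = hi - lo + 1
    # Never create more ranges than there are distinct values.
    n = min(num_mappers, total)
    base, extra = divmod(total, n)

    ranges = []
    start = lo
    for i in range(n):
        # The first `extra` ranges absorb one leftover value each.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def split_queries(table, column, lo, hi, num_mappers):
    """Build one bounded SELECT per range, so each mapper reads only
    its own slice of the table (hypothetical query shape)."""
    return [
        f"SELECT * FROM {table} WHERE {column} >= {a} AND {column} <= {b}"
        for a, b in split_ranges(lo, hi, num_mappers)
    ]


if __name__ == "__main__":
    # Example: an assumed 'employees' table split on 'id' for 4 mappers.
    for query in split_queries("employees", "id", 1, 100, 4):
        print(query)
```

Because each mapper issues its own bounded query, no single globally ordered scan is needed, which is the scalability point made about dropping {{\-\-order-by}} above. Real split columns may be non-integer or unevenly distributed, which this sketch does not handle.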

> Sqoop should use more intelligent splits
> ----------------------------------------
>
>                 Key: MAPREDUCE-907
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-907
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/sqoop
>            Reporter: Aaron Kimball
>            Assignee: Aaron Kimball
>         Attachments: MAPREDUCE-907.patch
>
>
> Sqoop should use the new split generation / InputFormat in MAPREDUCE-885

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

