hadoop-mapreduce-issues mailing list archives

From "Aaron Kimball (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1502) Sqoop should run mysqldump in a mapper as opposed to a user-side process
Date Fri, 26 Feb 2010 17:55:28 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838963#action_12838963 ]

Aaron Kimball commented on MAPREDUCE-1502:
------------------------------------------

To be clear, the JDBC-based import mechanisms available in Sqoop have always accessed
non-distributed resources from within map tasks. This change puts mysqldump on an equal footing,
removes an extra machine (the client) from the main network transfer path, and lets
mysqldump benefit from Hadoop's ability to monitor and restart long-running processes
that get interrupted.

Sqoop gives users explicit control over parallelism: it defaults to 4 mappers, and
users can select a different number of tasks with the {{--num-mappers}} argument.
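
To illustrate what a mapper count means for a table import, here is a minimal sketch of one way a numeric key range could be divided among map tasks, each of which might then pass its slice to mysqldump via that tool's --where option. This is not taken from the patch: the function names, the {{id}} column, and the even-split strategy are all assumptions made for illustration.

```python
# Hypothetical sketch: divide an inclusive integer key range among map tasks.
# Not Sqoop's actual splitting logic; names and strategy are assumptions.

def split_key_range(lo, hi, num_mappers=4):
    """Split [lo, hi] into at most num_mappers contiguous, non-overlapping ranges."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if hi < lo:
        return []
    total = hi - lo + 1
    # Never create more ranges than there are key values.
    n = min(num_mappers, total)
    base, extra = divmod(total, n)
    ranges = []
    start = lo
    for i in range(n):
        # The first `extra` ranges absorb one leftover key each.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def where_clauses(column, lo, hi, num_mappers=4):
    """Build one SQL condition per task, suitable for a --where style filter."""
    return [
        f"{column} >= {a} AND {column} <= {b}"
        for a, b in split_key_range(lo, hi, num_mappers)
    ]


if __name__ == "__main__":
    # `id` is a hypothetical key column used only for this example.
    for clause in where_clauses("id", 1, 10):
        print(clause)
```

With the default of 4 tasks, keys 1 through 10 are divided into ranges of sizes 3, 3, 2 and 2, so no two tasks read the same row and a failed task can be retried on its own slice.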

> Sqoop should run mysqldump in a mapper as opposed to a user-side process
> ------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1502
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1502
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/sqoop
>            Reporter: Aaron Kimball
>            Assignee: Aaron Kimball
>         Attachments: MAPREDUCE-1502.patch
>
>
> Sqoop currently runs mysqldump ("direct import mode") in the local user process with
> a single thread. Better system performance and reliability could be achieved by running this
> in a parallel set of mapper tasks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

