hadoop-common-dev mailing list archives

From "Scott Carey (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5967) Sqoop should only use a single map task
Date Thu, 04 Jun 2009 02:24:08 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716131#action_12716131 ]

Scott Carey commented on HADOOP-5967:
-------------------------------------

Some databases (Postgres, at least) optimize multiple queries doing sequential scans on the
same table at the same time by having them 'tag along' with a single sequential scan, which
avoids the O(N^2) issue.  But LIMIT ... OFFSET is not guaranteed to return distinct, consistent
partitions unless it has an ORDER BY clause and all queries run in the same transaction anyway.
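
As a rough illustration of the ORDER BY point, here is a minimal Python sketch that uses the
standard-library sqlite3 module as a stand-in for a real database. The table `t`, the helper
`split_queries`, and the row counts are hypothetical and are not taken from DBInputFormat or
the attached patch. With a single connection and no concurrent writers, the "same transaction"
requirement holds trivially here; a real deployment would have to arrange that separately.

```python
import sqlite3


def split_queries(table, order_col, total_rows, num_splits):
    """Build one SELECT per split (hypothetical helper).

    ORDER BY gives every query the same deterministic row order, so the
    LIMIT/OFFSET windows do not overlap. Identifiers are interpolated
    directly for illustration only; do not do this with untrusted input.
    """
    chunk = -(-total_rows // num_splits)  # ceiling division
    return [
        f"SELECT id FROM {table} ORDER BY {order_col} "
        f"LIMIT {chunk} OFFSET {offset}"
        for offset in range(0, total_rows, chunk)
    ]


# Hypothetical 10-row table in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany(
    "INSERT INTO t (id, val) VALUES (?, ?)",
    [(i, f"row{i}") for i in range(1, 11)],
)

# Run every split query and collect the ids each one returns.
partitions = [
    [row[0] for row in conn.execute(query)]
    for query in split_queries("t", "id", 10, 3)
]
print(partitions)
```

Dropping the ORDER BY from the generated statements leaves the row order up to the database,
which is why the partitions could then overlap or miss rows.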

> Sqoop should only use a single map task
> ---------------------------------------
>
>                 Key: HADOOP-5967
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5967
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: Aaron Kimball
>            Assignee: Aaron Kimball
>            Priority: Minor
>         Attachments: single-mapper.patch
>
>
> The current DBInputFormat implementation uses SELECT ... LIMIT ... OFFSET statements
> to read from a database table. This results in several queries all accessing the
> same table at the same time. Most database implementations will use a full table
> scan for each such query, starting at row 1 and scanning down until the OFFSET is reached
> before emitting data to the client. The upshot is that we see O(n^2) performance in
> the size of the table when using a large number of mappers, whereas a single mapper would
> read through the table in O(n) time in the number of rows.
> This patch sets the number of map tasks to 1 in the MapReduce job sqoop launches.
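
The cost argument in the description can be sketched with a little arithmetic. The Python
snippet below assumes the simple model stated there: each LIMIT ... OFFSET query scans from
row 1 through the end of its own window. The function name `rows_scanned` and the table sizes
are hypothetical, and the model ignores optimizations such as shared sequential scans.

```python
def rows_scanned(total_rows, num_mappers):
    """Total rows the database reads across all split queries.

    Assumed model: a query with a given OFFSET and LIMIT scans
    OFFSET + LIMIT rows before it finishes.
    """
    chunk = -(-total_rows // num_mappers)  # ceiling division
    scanned = 0
    offset = 0
    while offset < total_rows:
        limit = min(chunk, total_rows - offset)
        scanned += offset + limit  # rows skipped plus rows emitted
        offset += limit
    return scanned


single = rows_scanned(1_000_000, 1)    # one mapper reads each row once
many = rows_scanned(1_000_000, 100)    # 100 mappers rescan earlier rows
print(single, many, many / single)
```

Under this model, 100 mappers make the database read about 50 times as many rows as a single
mapper would, and the ratio grows with the number of mappers.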

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

