hadoop-mapreduce-issues mailing list archives

From "Leonid Furman (JIRA)" <j...@apache.org>
Subject [jira] Created: (MAPREDUCE-1339) Sqoop full table import job times out when using the split-by attribute
Date Tue, 29 Dec 2009 00:42:29 GMT
Sqoop full table import job times out when using the split-by attribute
-----------------------------------------------------------------------

                 Key: MAPREDUCE-1339
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1339
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: contrib/sqoop
    Affects Versions: 0.22.0
            Reporter: Leonid Furman
            Priority: Critical
             Fix For: 0.22.0


Problem
------------
When running the sqoop command for a full table import with the split-by attribute specified, as follows:

sqoop --connect CONNECT_STRING --username USER_NAME --password PASSWORD --table TABLE_NAME
--fields-terminated-by \\0x01 --as-textfile --warehouse-dir OUTPUT_DIR --split-by RECORD_ID

Sqoop translates the split-by attribute into an ORDER BY clause and runs the following
SQL query against the database (say, Oracle):

SELECT * FROM TABLE_NAME ORDER BY RECORD_ID

If the table has, for example, 20 million records, the ORDER BY clause significantly
increases the query's running time, eventually causing a timeout, so that no output is
written to the Hadoop file system.

Proposed solution
-------------------------
Do not append the ORDER BY clause to the SQL query if no WHERE clause is specified.
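
The proposed rule could be sketched roughly as follows. This is only an illustration in Python, not Sqoop's actual code: the function name build_import_query and its parameters are hypothetical, and it assumes the query is assembled as a plain string.

```python
def build_import_query(table, split_by=None, where=None):
    """Hypothetical helper illustrating the proposal; not Sqoop's real API."""
    query = f"SELECT * FROM {table}"
    if where:
        query += f" WHERE {where}"
        # Proposed rule: keep the ORDER BY only when a WHERE clause is present.
        if split_by:
            query += f" ORDER BY {split_by}"
    # With no WHERE clause (full table import), the ORDER BY is skipped,
    # so the database does not have to sort the whole table.
    return query


if __name__ == "__main__":
    print(build_import_query("TABLE_NAME", split_by="RECORD_ID"))
    print(build_import_query("TABLE_NAME", split_by="RECORD_ID",
                             where="RECORD_ID > 100"))
```

Under this sketch, a full table import issues an unsorted SELECT, while an import restricted by a WHERE clause keeps the current ordering behavior.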

Can there be any issues with this solution?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

