Message-ID: <695586626.80151262814176658.JavaMail.jira@brutus.apache.org>
Date: Wed, 6 Jan 2010 21:42:56 +0000 (UTC)
From: "Aaron Kimball (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Subject: [jira] Commented: (MAPREDUCE-1339) Sqoop full table import job times out when using the split-by attribute
In-Reply-To: <1165935364.1262047349450.JavaMail.jira@brutus.apache.org>

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797324#action_12797324 ]

Aaron Kimball commented on MAPREDUCE-1339:
------------------------------------------

Leonid,

Which database are you connecting to? For most databases, Sqoop imports no longer use ORDER BY. Oracle imports still do, because DataDrivenDBInputFormat doesn't support Oracle (yet). This is a known problem with importing from Oracle.

> Sqoop full table import job times out when using the split-by attribute
> -----------------------------------------------------------------------
>
> Key: MAPREDUCE-1339
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1339
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: contrib/sqoop
> Affects Versions: 0.22.0
> Reporter: Leonid Furman
> Priority: Critical
> Fix For: 0.22.0
>
>
> Problem
> ------------
> When running the sqoop command for a full table import with the split-by attribute specified, as follows:
> sqoop --connect CONNECT_STRING --username USER_NAME --password PASSWORD --table TABLE_NAME --fields-terminated-by \\0x01 --as-textfile --warehouse-dir OUTPUT_DIR --split-by RECORD_ID
> Sqoop transforms the split-by attribute into an ORDER BY clause and runs the following SQL query (on, say, Oracle):
> SELECT * FROM TABLE_NAME ORDER BY RECORD_ID
> If the table has, for example, 20 million records, the ORDER BY clause will increase the query running time significantly, eventually causing a timeout and resulting in no output being written to the Hadoop file system.
> Proposed solution
> -------------------------
> Do not append the ORDER BY clause to the SQL query if no WHERE clause is specified.
> Can there be any issues with this solution?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
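
For illustration only: the difference between a single ORDER BY query and range-based splitting on the split-by column can be sketched as below. This is a minimal Python sketch, not the actual code of Sqoop or DataDrivenDBInputFormat. The function name split_queries, its parameters, and the assumption of an integer split column whose minimum and maximum are already known are all hypothetical.

```python
def split_queries(table, column, lo, hi, num_splits):
    """Return one bounded SELECT per split, covering [lo, hi] on an integer column.

    Hypothetical helper for illustration; identifiers are interpolated
    directly, so this is not safe for untrusted input.
    """
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    if lo > hi:
        raise ValueError("lo must not exceed hi")

    total = hi - lo + 1
    # Never create more splits than there are distinct key values.
    num_splits = min(num_splits, total)

    queries = []
    start = lo
    for i in range(num_splits):
        # Spread the remainder over the first splits so sizes differ by at most one.
        size = total // num_splits + (1 if i < total % num_splits else 0)
        end = start + size  # exclusive upper bound of this split
        queries.append(
            f"SELECT * FROM {table} WHERE {column} >= {start} AND {column} < {end}"
        )
        start = end
    return queries


if __name__ == "__main__":
    # Table and column names are taken from the issue description above.
    for query in split_queries("TABLE_NAME", "RECORD_ID", 1, 20_000_000, 4):
        print(query)
```

Each generated query selects only a bounded range of RECORD_ID and contains no ORDER BY, so no task asks the database to sort the whole table. How the real input format picks its bounds and handles non-integer columns is not covered by this sketch.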