Message-ID: <3954086.19231272912237918.JavaMail.jira@thor>
Date: Mon, 3 May 2010 14:43:57 -0400 (EDT)
From: "Aaron Kimball (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Reply-To: mapreduce-issues@hadoop.apache.org
Subject: [jira] Resolved: (MAPREDUCE-1449) Sqoop Documentation about --split-by column has to be unique key seems to be wrong
In-Reply-To: <1939206602.17571265238807918.JavaMail.jira@brutus.apache.org>

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Kimball resolved MAPREDUCE-1449.
--------------------------------------

    Resolution: Won't Fix

Sqoop has been removed from MapReduce; issue moved to http://github.com/cloudera/sqoop/issues#issue/2

> Sqoop Documentation about --split-by column has to be unique key seems to be wrong
> ----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1449
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1449
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/sqoop
>    Affects Versions: 0.20.1
>            Reporter: mingran wang
>
> http://archive.cloudera.com/docs/sqoo...
> The document above states: "To guarantee correctness of your input, you must select an ordering column for which each row has a unique value. If duplicate values appear in the ordering column, the results of the import are undefined, and Sqoop will not be able to detect the error."
> I read the Sqoop source code, and it seems that the column to split by does not have to be a unique key. Moreover, when the primary key is a composite key, the Sqoop code takes only the first column of the composite key, which in most cases is not a unique key anyway.
> I also checked the output when a non-unique key is used to split, and there is nothing wrong with the result.
> I am wondering whether the document is wrong, or whether there is some hidden trickiness that I am not aware of.
> I am using sqoop 20.1.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
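
The reporter's observation about non-unique split columns can be illustrated with a small sketch. This is not Sqoop's code: it assumes, purely for illustration, that an import is divided into contiguous, non-overlapping value ranges of the split column between its minimum and maximum. The helper names (make_ranges, split_rows) and the sample rows are hypothetical. Under that assumption, every row falls into exactly one range even when the column contains duplicates, so no row is lost or imported twice; duplicates only make the splits uneven.

```python
def make_ranges(lo, hi, num_splits):
    """Cut the integer interval [lo, hi] into contiguous half-open ranges."""
    step = -(-(hi - lo + 1) // num_splits)  # ceiling division, always >= 1
    bounds = list(range(lo, hi + 1, step)) + [hi + 1]
    return list(zip(bounds[:-1], bounds[1:]))  # pairs of (start, end)


def split_rows(rows, column, num_splits):
    """Assign each row to the range that contains its split-column value."""
    values = [row[column] for row in rows]
    ranges = make_ranges(min(values), max(values), num_splits)
    return [[row for row in rows if start <= row[column] < end]
            for start, end in ranges]


# Hypothetical table: "dept" is deliberately NOT unique.
rows = [
    {"id": 1, "dept": 10},
    {"id": 2, "dept": 10},
    {"id": 3, "dept": 20},
    {"id": 4, "dept": 20},
    {"id": 5, "dept": 30},
]

splits = split_rows(rows, "dept", 2)
imported = [row for split in splits for row in split]

# Every row appears exactly once across all splits.
print(len(imported) == len(rows))
print(sorted(row["id"] for row in imported))
```

Whether the real implementation behaves this way depends on how Sqoop builds its per-split queries, which this message does not show.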