sqoop-dev mailing list archives

From "SATOSHI KONDO (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SQOOP-360) Duplication of the record in the case of using import mode
Date Tue, 01 Nov 2011 04:10:32 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140898#comment-13140898 ]

SATOSHI KONDO commented on SQOOP-360:
-------------------------------------

Here is the test case.

DDL:
create table test (id int,val1 varchar(100),primary key (id));

Test Data: 
 id | val1 
----+------
  1 | aaa
  2 | bbb
  3 | ccc
  4 | ddd
  5 | eee
  6 | fff
  7 | ggg
  8 | hhh
  9 | iii
 10 | jjj

Sqoop command:
$ sqoop import --connect jdbc:postgresql://DBIP:port/dbname --username username --password password --table test --num-mappers 100 --split-by val1

Log:
11/10/31 15:33:58 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
11/10/31 15:33:59 INFO manager.SqlManager: Using default fetchSize of 1000
11/10/31 15:33:59 INFO tool.CodeGenTool: Beginning code generation
11/10/31 15:33:59 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "test" AS t LIMIT 1
11/10/31 15:33:59 INFO orm.CompilationManager: HADOOP_HOME is /usr/lib/hadoop
11/10/31 15:33:59 INFO orm.CompilationManager: Found hadoop core jar at: /usr/lib/hadoop/hadoop-0.20.2-cdh3u0-core.jar
Note: /tmp/sqoop-mapred/compile/f6fc257a0ede3037c1a66880b928b4b2/test.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
11/10/31 15:34:00 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-mapred/compile/f6fc257a0ede3037c1a66880b928b4b2/test.jar
11/10/31 15:34:00 WARN manager.PostgresqlManager: It looks like you are importing from postgresql.
11/10/31 15:34:00 WARN manager.PostgresqlManager: This transfer can be faster! Use the --direct
11/10/31 15:34:00 WARN manager.PostgresqlManager: option to exercise a postgresql-specific fast path.
11/10/31 15:34:00 INFO mapreduce.ImportJobBase: Beginning import of test
11/10/31 15:34:02 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN("val1"), MAX("val1") FROM "test"
11/10/31 15:34:02 WARN db.TextSplitter: Generating splits for a textual index column.
11/10/31 15:34:02 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.
11/10/31 15:34:02 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.
11/10/31 15:34:02 INFO mapred.JobClient: Running job: job_201110311446_0014
11/10/31 15:34:03 INFO mapred.JobClient:  map 0% reduce 0%
11/10/31 15:34:10 INFO mapred.JobClient:  map 2% reduce 0%
11/10/31 15:34:11 INFO mapred.JobClient:  map 3% reduce 0%
11/10/31 15:34:14 INFO mapred.JobClient:  map 5% reduce 0%
11/10/31 15:34:15 INFO mapred.JobClient:  map 6% reduce 0%
11/10/31 15:34:27 INFO mapred.JobClient:  map 8% reduce 0%
11/10/31 15:34:30 INFO mapred.JobClient:  map 9% reduce 0%
11/10/31 15:34:31 INFO mapred.JobClient:  map 11% reduce 0%
11/10/31 15:34:32 INFO mapred.JobClient:  map 18% reduce 0%
11/10/31 15:34:38 INFO mapred.JobClient:  map 19% reduce 0%
11/10/31 15:34:42 INFO mapred.JobClient:  map 22% reduce 0%
11/10/31 15:34:43 INFO mapred.JobClient:  map 23% reduce 0%
11/10/31 15:34:47 INFO mapred.JobClient:  map 24% reduce 0%
11/10/31 15:34:48 INFO mapred.JobClient:  map 25% reduce 0%
11/10/31 15:34:50 INFO mapred.JobClient:  map 26% reduce 0%
11/10/31 15:34:51 INFO mapred.JobClient:  map 28% reduce 0%
11/10/31 15:34:52 INFO mapred.JobClient:  map 38% reduce 0%
11/10/31 15:34:53 INFO mapred.JobClient:  map 41% reduce 0%
11/10/31 15:34:55 INFO mapred.JobClient:  map 44% reduce 0%
11/10/31 15:34:58 INFO mapred.JobClient:  map 46% reduce 0%
11/10/31 15:35:00 INFO mapred.JobClient:  map 53% reduce 0%
11/10/31 15:35:01 INFO mapred.JobClient:  map 54% reduce 0%
11/10/31 15:35:02 INFO mapred.JobClient:  map 58% reduce 0%
11/10/31 15:35:03 INFO mapred.JobClient:  map 62% reduce 0%
11/10/31 15:35:04 INFO mapred.JobClient:  map 65% reduce 0%
11/10/31 15:35:05 INFO mapred.JobClient:  map 66% reduce 0%
11/10/31 15:35:07 INFO mapred.JobClient:  map 72% reduce 0%
11/10/31 15:35:09 INFO mapred.JobClient:  map 73% reduce 0%
11/10/31 15:35:11 INFO mapred.JobClient:  map 76% reduce 0%
11/10/31 15:35:12 INFO mapred.JobClient:  map 78% reduce 0%
11/10/31 15:35:13 INFO mapred.JobClient:  map 81% reduce 0%
11/10/31 15:35:14 INFO mapred.JobClient:  map 83% reduce 0%
11/10/31 15:35:15 INFO mapred.JobClient:  map 86% reduce 0%
11/10/31 15:35:16 INFO mapred.JobClient:  map 87% reduce 0%
11/10/31 15:35:18 INFO mapred.JobClient:  map 92% reduce 0%
11/10/31 15:35:21 INFO mapred.JobClient:  map 94% reduce 0%
11/10/31 15:35:22 INFO mapred.JobClient:  map 96% reduce 0%
11/10/31 15:35:23 INFO mapred.JobClient:  map 100% reduce 0%
11/10/31 15:35:24 INFO mapred.JobClient: Job complete: job_201110311446_0014
11/10/31 15:35:24 INFO mapred.JobClient: Counters: 12
11/10/31 15:35:24 INFO mapred.JobClient:   Job Counters 
11/10/31 15:35:24 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=3838656
11/10/31 15:35:24 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
11/10/31 15:35:24 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
11/10/31 15:35:24 INFO mapred.JobClient:     Launched map tasks=100
11/10/31 15:35:24 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
11/10/31 15:35:24 INFO mapred.JobClient:   FileSystemCounters
11/10/31 15:35:24 INFO mapred.JobClient:     HDFS_BYTES_READ=14641
11/10/31 15:35:24 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=5936796
11/10/31 15:35:24 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=79
11/10/31 15:35:24 INFO mapred.JobClient:   Map-Reduce Framework
11/10/31 15:35:24 INFO mapred.JobClient:     Map input records=13
11/10/31 15:35:24 INFO mapred.JobClient:     Spilled Records=0
11/10/31 15:35:24 INFO mapred.JobClient:     Map output records=13
11/10/31 15:35:24 INFO mapred.JobClient:     SPLIT_RAW_BYTES=14641
11/10/31 15:35:24 INFO mapreduce.ImportJobBase: Transferred 79 bytes in 83.7106 seconds (0.9437 bytes/sec)
11/10/31 15:35:24 INFO mapreduce.ImportJobBase: Retrieved 13 records.


result:
1,aaa
2,bbb
3,ccc
4,ddd
4,ddd
5,eee
5,eee
6,fff
6,fff
7,ggg
8,hhh
9,iii
10,jjj
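To check the duplication mechanically rather than by eye, the imported lines can simply be counted. The following is a minimal Python sketch, assuming the part files Sqoop wrote have been concatenated into one text (for example with hadoop fs -cat); the rows embedded below are copied from the result above, and find_duplicates is a hypothetical helper, not part of Sqoop.

```python
from collections import Counter

# Rows copied from the "result:" section above (13 lines for a 10-row table).
IMPORTED = """\
1,aaa
2,bbb
3,ccc
4,ddd
4,ddd
5,eee
5,eee
6,fff
6,fff
7,ggg
8,hhh
9,iii
10,jjj
"""


def find_duplicates(text):
    """Hypothetical helper: return {line: count} for non-empty lines seen more than once."""
    counts = Counter(line for line in text.splitlines() if line.strip())
    return {line: n for line, n in counts.items() if n > 1}


if __name__ == "__main__":
    rows = [line for line in IMPORTED.splitlines() if line.strip()]
    print(len(rows), "rows,", len(set(rows)), "distinct")
    for line, n in sorted(find_duplicates(IMPORTED).items()):
        print(f"{line} appears {n} times")
```

Run as a script, it reports 13 rows and 10 distinct rows, with 4,ddd, 5,eee and 6,fff each appearing twice, which agrees with the "Retrieved 13 records" line in the log.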

                
> Duplication of the record in the case of using import mode 
> -----------------------------------------------------------
>
>                 Key: SQOOP-360
>                 URL: https://issues.apache.org/jira/browse/SQOOP-360
>             Project: Sqoop
>          Issue Type: Bug
>          Components: connectors
>    Affects Versions: 1.3.0
>            Reporter: SATOSHI KONDO
>
> When I use Sqoop's import mode,
> I get duplicate records.
> This happens under the following conditions:
> 1. Sqoop's import mode is used.
> 2. A character-type column is passed to the "split-by" parameter.
> 3. A large value is passed to the "num-mappers" parameter,
> where "large" means large relative to the total number of records.

> For example,
> when the table has 10 records in total
> and I set the "num-mappers" parameter to 100,
> I expect to get 10 records,
> but I get more than 10.
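The TextSplitter warning in the log says that a case-insensitive sort order in the database may result in a partial import or duplicate records. The Python sketch below is a toy model of that failure mode only: the split boundaries are made-up values, the two key functions stand in for database collations, and nothing here reproduces Sqoop's actual TextSplitter or establishes that this is what caused the duplicates reported above. It shows how ranges built from boundaries sorted in one order can overlap when rows are compared in another order.

```python
# Toy model (assumptions, not Sqoop code):
# - split boundaries are sorted by code point, as Python compares str;
# - each split selects lo <= v < hi, and the last split selects lo <= v <= hi;
# - the "database" compares values through a key function standing in for a collation.


def matches(value, bounds, key):
    """Return the indices of every split whose range contains value under key."""
    hits = []
    last = len(bounds) - 2
    v = key(value)
    for i in range(len(bounds) - 1):
        lo, hi = key(bounds[i]), key(bounds[i + 1])
        upper_ok = v <= hi if i == last else v < hi
        if lo <= v and upper_ok:
            hits.append(i)
    return hits


def binary(s):
    """Code-point order, the same order used to build BOUNDS."""
    return s


def case_insensitive(s):
    """Stand-in for a case-insensitive collation."""
    return s.lower()


# Hypothetical boundaries: increasing by code point ('Z' is U+005A, before 'b' U+0062),
# but "bz" > "bb" once case is ignored, so the ranges stop being ordered.
BOUNDS = ["aaa", "bZ", "bb", "jjj"]
assert BOUNDS == sorted(BOUNDS)

# Values of val1 from the test data in this report.
ROWS = ["aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg", "hhh", "iii", "jjj"]

if __name__ == "__main__":
    for name, key in (("binary", binary), ("case-insensitive", case_insensitive)):
        counts = {row: len(matches(row, BOUNDS, key)) for row in ROWS}
        dupes = [row for row, n in counts.items() if n > 1]
        missing = [row for row, n in counts.items() if n == 0]
        print(f"{name}: duplicated={dupes} missing={missing}")
```

With code-point comparison every row falls into exactly one split; with the case-insensitive key, bbb matches two splits and would be imported twice. This toy does not reproduce the exact 4/5/6 pattern seen above. An integral split column, as the warning recommends, should avoid this particular mismatch, since the client and the database order integers the same way.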
