sqoop-dev mailing list archives

From "Juan Carlos Araya (Jira)" <j...@apache.org>
Subject [jira] [Commented] (SQOOP-3471) While doing sqoop-export mapper progress goes back causing duplicated data
Date Fri, 22 May 2020 13:45:00 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17114049#comment-17114049 ]

Juan Carlos Araya commented on SQOOP-3471:
------------------------------------------

I am seeing the same issue: if a mapper fails, it starts again from 0%, which causes duplicates.
I tried twice. The first time I was expecting 271 million records and ended up with
384M records; the second time I was again expecting 271M records and the count is already at 284M records.
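
To make the mechanism concrete, here is a rough sketch of how a single retried mapper could leave duplicates behind. It is a simplified model, not Sqoop's actual code: it assumes that rows committed by the failed attempt stay in the table, that the retry re-reads its whole input split from the start, and that the input is split evenly across mappers. The function names are made up for illustration; the batch settings, mapper count and record count are taken from the command and description quoted below.

```python
def rows_per_commit(records_per_statement, statements_per_transaction):
    """Rows written per committed transaction under the assumed batching model."""
    return records_per_statement * statements_per_transaction


def duplicates_from_retry(split_rows, commit_rows, rows_sent_before_failure):
    """Rows left behind by a failed attempt that the retry will insert again.

    Assumption: only fully committed transactions survive the failure, and
    the retried attempt starts its split again from row 0.
    """
    sent = min(rows_sent_before_failure, split_rows)
    return (sent // commit_rows) * commit_rows


if __name__ == "__main__":
    # -Dsqoop.export.records.per.statement=50000
    # -Dsqoop.export.statements.per.transaction=100
    commit_rows = rows_per_commit(50_000, 100)

    # 237,371,726 records over --num-mappers 8 (even split is an assumption)
    split_rows = 237_371_726 // 8

    # Hypothetical failure point: the attempt dies after sending 12M rows.
    dupes = duplicates_from_retry(split_rows, commit_rows, 12_000_000)

    print("rows per commit:", commit_rows)
    print("rows per mapper split:", split_rows)
    print("duplicates after one retry:", dupes)
```

Under these assumptions each commit covers 5,000,000 rows, so a single mid-split failure can add millions of duplicate rows, and several retries across mappers would add up quickly.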

> While doing sqoop-export mapper progress goes back causing duplicated data
> --------------------------------------------------------------------------
>
>                 Key: SQOOP-3471
>                 URL: https://issues.apache.org/jira/browse/SQOOP-3471
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.4.6
>            Reporter: Ruben Agudo
>            Priority: Major
>         Attachments: image-2020-04-21-10-36-15-108.png
>
>
> We are running the sqoop-export tool in Qubole to export some data from S3 back to a
> SQL Server database.
> Our issue is that sometimes one of the mappers seems to fail or restart.
> Basically, we see its progress going back, as in the following image:
> !image-2020-04-21-10-36-15-108.png!
> This is causing duplicates in our destination table. I'm a bit lost because the documentation
> says that *"If an export map task fails due to these or other reasons, it will cause the
> export job to fail."* and this is not the behaviour we are seeing.
> Unfortunately we can't reproduce it consistently.
> The command that we are running is:
> sqoop export 
>  -Dsqoop.export.records.per.statement=50000 
>  -Dsqoop.export.statements.per.transaction=100 
>  -Dsqoop.throwOnError=1 
>  --connection-manager org.apache.sqoop.manager.SQLServerManager 
>  --driver com.microsoft.sqlserver.jdbc.SQLServerDriver 
>  --connect connectionString 
>  --table config.table 
>  --export-dir config.source
>  --input-fields-terminated-by ,
>  --num-mappers 8
>  --columns theColumnsToCopy
>  --batch
>  --schema theSchema
> I removed the details that I can't share for privacy reasons.
> And the table we want to export contains 237,371,726 records.
> What could be the cause of the mapper going back in progress? And, if that happens, is
> it possible to make the sqoop export fail?
> Also, if this isn't the correct channel for this, please let me know.
> Thanks!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
