sqoop-dev mailing list archives

From "Cheolsoo Park (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SQOOP-721) Duplicating rows on export when exporting from compressed files.
Date Fri, 30 Nov 2012 03:15:59 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507060#comment-13507060 ]

Cheolsoo Park commented on SQOOP-721:


I diffed {{CombineFileInputFormat.java}} from Sqoop and Hadoop-2.0.x and confirmed that there
is only one change, as follows:
<     return codec instanceof SplittableCompressionCodec;
>     // Once we remove support for Hadoop < 2.0
>     //return codec instanceof SplittableCompressionCodec;
>     return false;
As far as I understand, the only impact of this difference is that compressed files won't
be split even when their codec is splittable. That affects performance, but it has no
impact on correctness.
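
To make the difference concrete, here is a minimal, self-contained sketch of the two variants of the check. It is not the actual Sqoop or Hadoop source: the two interfaces are hypothetical stand-ins for Hadoop's {{CompressionCodec}} and {{SplittableCompressionCodec}} so that it compiles without Hadoop on the classpath, the class and method names are invented for illustration, and the null check for uncompressed files is an assumption that the diff above does not show.

```java
public class SplitCheckSketch {

    // Hypothetical stand-ins for Hadoop's codec interfaces (assumption:
    // only the type relationship matters for this check).
    interface CompressionCodec {}
    interface SplittableCompressionCodec extends CompressionCodec {}

    // Illustrative codecs: one that cannot be split, one that can.
    static final class NonSplittableCodec implements CompressionCodec {}
    static final class SplittableCodec implements SplittableCompressionCodec {}

    // Variant that returns "codec instanceof SplittableCompressionCodec".
    // A null codec stands for an uncompressed file (assumption, not in the diff).
    static boolean isSplitableInstanceofVariant(CompressionCodec codec) {
        if (codec == null) {
            return true;
        }
        return codec instanceof SplittableCompressionCodec;
    }

    // Variant that returns false for every compressed file, so each
    // compressed file is always read as a single split.
    static boolean isSplitableAlwaysFalseVariant(CompressionCodec codec) {
        if (codec == null) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        CompressionCodec[] codecs = {
            null, new NonSplittableCodec(), new SplittableCodec()
        };
        for (CompressionCodec codec : codecs) {
            String name = (codec == null)
                ? "uncompressed"
                : codec.getClass().getSimpleName();
            System.out.println(name
                + ": instanceof variant=" + isSplitableInstanceofVariant(codec)
                + ", always-false variant=" + isSplitableAlwaysFalseVariant(codec));
        }
    }
}
```

The two variants disagree only for a codec that implements the splittable interface. In that case the always-false variant gives up parallelism within a single file, but every record is still read exactly once.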

I didn't run any tests with this patch, but given that the patch is identical to what's committed
in MAPREDUCE-1597, I think that it is fine. Please let me know if anyone has any concerns.

> Duplicating rows on export when exporting from compressed files.
> ----------------------------------------------------------------
>                 Key: SQOOP-721
>                 URL: https://issues.apache.org/jira/browse/SQOOP-721
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.4.2
>            Reporter: Jarek Jarcec Cecho
>            Assignee: Jarek Jarcec Cecho
>            Priority: Blocker
>         Attachments: bugSQOOP-721.patch, bugSQOOP-721.patch
> It appears that in some situations export will duplicate rows. It seems that this behavior
> occurs when the user is exporting compressed files that are "big enough".

