hadoop-mapreduce-issues mailing list archives

From "Aaron Kimball (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1017) Compression and output splitting for Sqoop
Date Tue, 22 Sep 2009 00:38:16 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758087#action_12758087 ]

Aaron Kimball commented on MAPREDUCE-1017:

In addition to the unit tests added in the {{org.apache.hadoop.sqoop.io}} package, I also
performed a larger-scale test of this functionality. A 1.5 GB table was imported from MySQL
to HDFS; the data was highly redundant, so compression shrank the files considerably and also
improved the import time. The arguments {{\-z \-\-direct-split-size 25000000}} were given,
so that files of approximately 25 MB each would be generated. This worked, and three files
were generated. I verified using {{head}} and {{tail}} that the files did not lose any records
and that no record spanned multiple files.

I also verified that {{\-\-direct-split-size}} worked without compression, which it does.
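The splitting behavior verified above (files rolled at a target size, with no record ever spanning two files) can be sketched roughly as follows. This is an illustrative model only; the class and method names here are hypothetical and are not Sqoop's actual API, which lives in {{org.apache.hadoop.sqoop.io}}:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch (hypothetical names, not Sqoop's API) of size-targeted
 * output splitting: the writer rolls to a new "file" only at record
 * boundaries once the current file reaches the target size, so a record is
 * never split across two output files.
 */
public class SplitSketch {
    static List<List<String>> splitRecords(List<String> records, int targetBytes) {
        List<List<String>> files = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int written = 0;
        for (String rec : records) {
            // Roll over before writing the next record, never in the middle of one.
            if (written >= targetBytes && !current.isEmpty()) {
                files.add(current);
                current = new ArrayList<>();
                written = 0;
            }
            current.add(rec);
            written += rec.length() + 1; // +1 for the record terminator
        }
        if (!current.isEmpty()) {
            files.add(current);
        }
        return files;
    }
}
```

Because the size check happens only between records, each output file may slightly overshoot the target, which is why {{\-\-direct-split-size 25000000}} yields files of approximately, not exactly, 25 MB.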

> Compression and output splitting for Sqoop
> ------------------------------------------
>                 Key: MAPREDUCE-1017
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1017
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: contrib/sqoop
>            Reporter: Aaron Kimball
>            Assignee: Aaron Kimball
>         Attachments: MAPREDUCE-1017.patch
> Sqoop "direct mode" writing will generate a single large text file in HDFS. It is important
> to be able to compress this data before it reaches HDFS. Due to the difficulty in splitting
> compressed files in HDFS for use by MapReduce jobs, data should also be split at compression
> time.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
