hadoop-common-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-9802) Support Snappy codec on Windows.
Date Wed, 31 Jul 2013 21:53:48 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-9802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Nauroth updated HADOOP-9802:
----------------------------------

    Attachment: HADOOP-9802-branch-1-win.1.patch

This work started on branch-1-win, so I'm attaching the patch for that.  I'll provide a trunk
patch soon too.  Here is a summary of the changes:
# Update the runtime library path used in hadoop.cmd so that snappy.dll can be loaded from
lib/native if the build bundled snappy into the distro.
# build.xml changes to call javah on Windows.
# Visual Studio project file changes to compile the C code.
# Windows-specific dynamic library loading code.
# Minor changes to the C code to guarantee the correct calling convention and to move a few
variable declarations to the top of the function, because MSVC doesn't support C99.
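
To illustrate the last item, here is a minimal sketch of the two kinds of C changes. It is not taken from the patch: the {{CODEC_CALL}} macro, the {{checksum_fn}} type, and both functions are hypothetical names made up for this example. It assumes that function pointers resolved from a DLL are declared with an explicit {{__cdecl}} on MSVC, and it writes all declarations at the top of the block, which is what a C89-only compiler requires.

```c
#include <stddef.h>

/* Hypothetical macro: pin the calling convention on MSVC, expand to nothing elsewhere. */
#if defined(_MSC_VER)
#define CODEC_CALL __cdecl
#else
#define CODEC_CALL
#endif

/* Hypothetical function-pointer type, standing in for a routine looked up in a DLL. */
typedef int (CODEC_CALL *checksum_fn)(const unsigned char *buf, size_t len);

/* Stand-in implementation so the sketch compiles without any external library. */
static int CODEC_CALL simple_checksum(const unsigned char *buf, size_t len)
{
    /* C89 style: every declaration precedes the first statement in the block. */
    size_t i;
    int sum;

    sum = 0;
    for (i = 0; i < len; i++) {
        sum += buf[i];
    }
    return sum;
}

/* Calls the routine through the typed pointer, as dynamically loaded code would. */
static int checksum_via_pointer(const unsigned char *buf, size_t len)
{
    checksum_fn fn;

    fn = simple_checksum;
    return fn(buf, len);
}
```

With C99, {{i}} could be declared inside the {{for}} statement and {{sum}} at its first use; hoisting them keeps the same source compiling under both MSVC and gcc.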

Assuming you have Snappy itself deployed to C:\snappy, here is the easiest way to test it:

{code}
ant clean test-core -Dwindows=true -Dsnappy.prefix=C:\snappy -Dtestcase=TestCodec
{code}

I also successfully tested creating a distro with snappy bundled:

{code}
ant clean tar -Dwindows=true -Dforrest.home=C:\apache-forrest-0.9 -Dbundle.snappy=true -Dsnappy.prefix=C:\snappy
{code}

Then, I used that distro to test running a wordcount MR job that compresses its output:

{code}
hadoop-1.3.0-SNAPSHOT\bin\hadoop.cmd jar hadoop-1.3.0-SNAPSHOT\hadoop-examples-1.3.0-SNAPSHOT.jar wordcount -D mapred.output.compress=true -D mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec /input /output
{code}

Then, I ran a grep MR job using the snappy-compressed file as input to verify that the codec
could decompress successfully:

{code}
hadoop-1.3.0-SNAPSHOT\bin\hadoop.cmd jar hadoop-1.3.0-SNAPSHOT\hadoop-examples-1.3.0-SNAPSHOT.jar grep /output/part* /grepout Apache
{code}

(My input file was our LICENSE.txt file, which is why I grepped for "Apache" in my test.)

Big thanks to [~chuanliu] who started a lot of this work.

                
> Support Snappy codec on Windows.
> --------------------------------
>
>                 Key: HADOOP-9802
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9802
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 3.0.0, 1-win, 2.1.1-beta
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>         Attachments: HADOOP-9802-branch-1-win.1.patch
>
>
> Build and test the existing Snappy codec on Windows.

