hadoop-common-issues mailing list archives

From "Ilya Ganelin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-12990) lz4 incompatibility between OS and Hadoop
Date Thu, 19 Jan 2017 20:49:26 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-12990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15830582#comment-15830582 ]

Ilya Ganelin commented on HADOOP-12990:
---------------------------------------

[~jzhuge] My proposed approach is to create a new file based loosely on hadoop/io/compress/Lz4Codec.java,
reproducing the byte structure analogous to your 4/3 hack. Does that seem reasonable?

My goal is ultimately to use this in something like Spark. Assuming the version of Hadoop we're
using is patched with the appropriate class, where would I add the logic to switch
between the two codecs (Lz4Codec vs. Lz4FrameCodec)?
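
For context, Hadoop's CompressionCodecFactory picks a codec from the file extension, and both formats conventionally end in .lz4, so the extension alone cannot tell them apart. One possible place for the switch is a check on the first four bytes of the stream. The sketch below is only an illustration under stated assumptions: Lz4FormatSniffer, Format, wrap and sniff are hypothetical names, not Hadoop or Spark APIs, and Lz4FrameCodec is the class proposed in this thread, not an existing one. It relies on the LZ4 frame magic number 0x184D2204, stored little-endian on disk.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;

// Hypothetical helper: decides which codec a caller should pick for a stream.
final class Lz4FormatSniffer {
    enum Format { LZ4_FRAME, HADOOP_BLOCK }

    // LZ4 frame magic number 0x184D2204 in on-disk (little-endian) byte order.
    private static final int[] FRAME_MAGIC = {0x04, 0x22, 0x4D, 0x18};

    // Wrap the raw stream so the four peeked bytes can be pushed back.
    static PushbackInputStream wrap(InputStream raw) {
        return new PushbackInputStream(raw, FRAME_MAGIC.length);
    }

    // Peek at the first four bytes, then restore them so the chosen
    // codec still sees the stream from its very first byte.
    static Format sniff(PushbackInputStream in) throws IOException {
        byte[] head = new byte[FRAME_MAGIC.length];
        int n = 0;
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) {
                break; // stream shorter than the magic number
            }
            n += r;
        }
        if (n > 0) {
            in.unread(head, 0, n);
        }
        if (n < head.length) {
            return Format.HADOOP_BLOCK;
        }
        for (int i = 0; i < head.length; i++) {
            if ((head[i] & 0xFF) != FRAME_MAGIC[i]) {
                return Format.HADOOP_BLOCK;
            }
        }
        return Format.LZ4_FRAME;
    }

    // Convenience overload for in-memory data.
    static Format sniff(byte[] data) {
        try {
            return sniff(wrap(new ByteArrayInputStream(data)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller would map LZ4_FRAME to the proposed Lz4FrameCodec and HADOOP_BLOCK to the existing Lz4Codec. The check is a heuristic: a Hadoop block stream whose first length field happens to equal these four bytes would be misclassified, so a separate file extension for the frame format may be the safer choice.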



> lz4 incompatibility between OS and Hadoop
> -----------------------------------------
>
>                 Key: HADOOP-12990
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12990
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io, native
>    Affects Versions: 2.6.0
>            Reporter: John Zhuge
>            Priority: Minor
>
> {{hdfs dfs -text}} hits an exception when trying to view a compressed file created by the Linux lz4 tool.
> The Hadoop version includes HADOOP-11184 "update lz4 to r123", so it uses the LZ4 library at release r123.
> Linux lz4 version:
> {code}
> $ /tmp/lz4 -h 2>&1 | head -1
> *** LZ4 Compression CLI 64-bits r123, by Yann Collet (Apr  1 2016) ***
> {code}
> Test steps:
> {code}
> $ cat 10rows.txt
> 001|c1|c2|c3|c4|c5|c6|c7|c8|c9
> 002|c1|c2|c3|c4|c5|c6|c7|c8|c9
> 003|c1|c2|c3|c4|c5|c6|c7|c8|c9
> 004|c1|c2|c3|c4|c5|c6|c7|c8|c9
> 005|c1|c2|c3|c4|c5|c6|c7|c8|c9
> 006|c1|c2|c3|c4|c5|c6|c7|c8|c9
> 007|c1|c2|c3|c4|c5|c6|c7|c8|c9
> 008|c1|c2|c3|c4|c5|c6|c7|c8|c9
> 009|c1|c2|c3|c4|c5|c6|c7|c8|c9
> 010|c1|c2|c3|c4|c5|c6|c7|c8|c9
> $ /tmp/lz4 10rows.txt 10rows.txt.r123.lz4
> Compressed 310 bytes into 105 bytes ==> 33.87%
> $ hdfs dfs -put 10rows.txt.r123.lz4 /tmp
> $ hdfs dfs -text /tmp/10rows.txt.r123.lz4
> 16/04/01 08:19:07 INFO compress.CodecPool: Got brand-new decompressor [.lz4]
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>     at org.apache.hadoop.io.compress.BlockDecompressorStream.getCompressedData(BlockDecompressorStream.java:123)
>     at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:98)
>     at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
>     at java.io.InputStream.read(InputStream.java:101)
>     at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
>     at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59)
>     at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
>     at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:106)
>     at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:101)
>     at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
>     at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
>     at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
>     at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
>     at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:118)
>     at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
>     at org.apache.hadoop.fs.FsShell.run(FsShell.java:315)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>     at org.apache.hadoop.fs.FsShell.main(FsShell.java:372)
> {code}
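
One plausible reading of the trace above: the lz4 command-line tool writes the LZ4 frame format, while Hadoop's block stream expects big-endian length fields at the start of the data. The sketch below shows what happens when the four magic bytes of an LZ4 frame are read as such a length. The class and method names are hypothetical and exist only for this illustration; the only fact taken from the LZ4 format is the magic number 0x184D2204, stored little-endian.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration, not Hadoop code.
final class Lz4HeaderMisread {
    // LZ4 frame magic number 0x184D2204 as it appears on disk.
    static final byte[] LZ4_FRAME_MAGIC = {0x04, 0x22, 0x4D, 0x18};

    // Interpret four bytes as a big-endian int, the way a
    // length-prefixed block reader would.
    static int asBigEndianLength(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes, offset, 4)
                .order(ByteOrder.BIG_ENDIAN)
                .getInt();
    }

    // Interpret the same four bytes as the frame format intends.
    static int asLittleEndianMagic(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes, offset, 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt();
    }

    public static void main(String[] args) {
        System.out.printf("read as length: %d bytes%n",
                asBigEndianLength(LZ4_FRAME_MAGIC, 0));
        System.out.printf("read as magic:  0x%08X%n",
                asLittleEndianMagic(LZ4_FRAME_MAGIC, 0));
    }
}
```

Read big-endian, the magic bytes become a length of roughly 69 MB for a 105-byte file. If the bytes that follow are treated as a length in the same way and used to size a buffer, an allocation failure like the one in the trace would not be surprising.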




