hadoop-common-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-15171) native ZLIB decompressor produces 0 bytes on the 2nd call; also incorrectly handles some zlib errors
Date Fri, 02 Feb 2018 22:33:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351016#comment-16351016 ]

Sergey Shelukhin commented on HADOOP-15171:
-------------------------------------------

Update: it turns out end() was a red herring after all - any reuse of the same object without
calling reset() causes the issue.
Given that the object does not support the zlib library model of repeatedly calling inflate
with more data, it basically never makes sense to call decompress() without calling reset()
first. Perhaps the reset call should be built in? I cannot find whether zlib itself actually
requires a reset (at least for the continuous-decompression case, it does not appear to), so
perhaps the cleanup logic could be improved as well.
In any case, the error handling should be fixed so that it does not return 0.
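
For illustration, here is a minimal sketch of the reset-before-reuse workaround, written
against the byte-array org.apache.hadoop.io.compress.Decompressor interface (the bug was hit
via direct buffers; the helper name and the loop here are illustrative, not code from the patch):

{noformat}
import java.io.IOException;
import org.apache.hadoop.io.compress.Decompressor;

public class ResetBeforeReuseSketch {
  // Decompresses one self-contained compressed segment into 'out', reusing
  // the same Decompressor across segments. Without the reset() call, the
  // second invocation exhibits the bug and produces 0 bytes.
  static int inflateSegment(Decompressor d, byte[] compressed, byte[] out)
      throws IOException {
    d.reset();                                   // the workaround under discussion
    d.setInput(compressed, 0, compressed.length);
    int total = 0;
    while (!d.finished() && total < out.length) {
      int n = d.decompress(out, total, out.length - total);
      if (n == 0) {
        break;  // needs more input, or the erroneous 0-byte result this bug reports
      }
      total += n;
    }
    return total;
  }
}
{noformat}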

> native ZLIB decompressor produces 0 bytes on the 2nd call; also incorrectly handles some zlib errors
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-15171
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15171
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 3.1.0
>            Reporter: Sergey Shelukhin
>            Assignee: Lokesh Jain
>            Priority: Blocker
>             Fix For: 3.1.0, 3.0.1
>
>
> While reading some ORC file via direct buffers, Hive gets a 0-sized buffer for a particular compressed segment of the file. We narrowed it down to the Hadoop native ZLIB codec; when the data is copied to a heap-based buffer and the JDK Inflater is used, it produces correct output (see the sketch after this quoted report). The input is only 127 bytes, so I can paste it here.
> All the other (many) blocks of the file are decompressed without problems by the same code.
> {noformat}
> 2018-01-13T02:47:40,815 TRACE [IO-Elevator-Thread-0 (1515637158315_0079_1_00_000000_0)] encoded.EncodedReaderImpl: Decompressing 127 bytes to dest buffer pos 524288, limit 786432
> 2018-01-13T02:47:40,816  WARN [IO-Elevator-Thread-0 (1515637158315_0079_1_00_000000_0)] encoded.EncodedReaderImpl: The codec has produced 0 bytes for 127 bytes at pos 0, data hash 1719565039: [e3 92 e1 62 66 60 60 10 12 e5 98 e0 27 c4 c7 f1 e8 12 8f 40 c3 7b 5e 89 09 7f 6e 74 73 04 30 70 c9 72 b1 30 14 4d 60 82 49 37 bd e7 15 58 d0 cd 2f 31 a1 a1 e3 35 4c fa 15 a3 02 4c 7a 51 37 bf c0 81 e5 02 12 13 5a b6 9f e2 04 ea 96 e3 62 65 b8 c3 b4 01 ae fd d0 72 01 81 07 87 05 25 26 74 3c 5b c9 05 35 fd 0a b3 03 50 7b 83 11 c8 f2 c3 82 02 0f 96 0b 49 34 7c fa ff 9f 2d 80 01 00
> 2018-01-13T02:47:40,816  WARN [IO-Elevator-Thread-0 (1515637158315_0079_1_00_000000_0)] encoded.EncodedReaderImpl: Fell back to JDK decompressor with memcopy; got 155 bytes
> {noformat}
> The Hadoop version is based on a 3.1 snapshot.
> The size of libhadoop.so is 824403 bytes, and libgplcompression is 78273, FWIW. Not sure how to extract versions from those.
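
For reference, here is a minimal sketch of the heap-copy fallback described in the quoted
report, using the plain JDK java.util.zip.Inflater (the helper name and the copy loop are
illustrative, not the actual EncodedReaderImpl code):

{noformat}
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

public class JdkInflaterFallbackSketch {
  // Copies the compressed bytes out of the direct buffer, inflates them
  // on-heap, writes the result into 'dest', and returns the byte count.
  static int inflateOnHeap(ByteBuffer src, ByteBuffer dest)
      throws DataFormatException {
    byte[] in = new byte[src.remaining()];
    src.get(in);                                 // the "memcopy" from the log above
    Inflater inflater = new Inflater();
    try {
      inflater.setInput(in);
      byte[] out = new byte[dest.remaining()];
      int total = 0;
      int n;
      while (!inflater.finished()
          && (n = inflater.inflate(out, total, out.length - total)) > 0) {
        total += n;
      }
      dest.put(out, 0, total);
      return total;
    } finally {
      inflater.end();                            // release native zlib state
    }
  }
}
{noformat}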



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

