hadoop-common-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
Date Tue, 13 Dec 2016 21:38:59 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746346#comment-15746346 ]

Jason Lowe commented on HADOOP-13578:

The thing I'm worried about is that when we call ZSTD_compressStream we are passing descriptors
for both the input buffer and the output buffer.  When we call ZSTD_endStream we are only
passing the descriptor for the output buffer.  Therefore I don't know how ZSTD_endStream is
supposed to finish consuming any input that ZSTD_compressStream didn't get to if it doesn't
have access to that input buffer descriptor.

Looking at the zstd code, you'll see that when ZSTD_endStream does call ZSTD_compressStream
internally, it calls it with srcSize == 0, meaning there is no more source to consume.  So if
the JNI code's last call to ZSTD_compressStream did not fully consume the input buffer's
data (i.e., input.pos was not advanced to the end of the data), then it looks like calling
ZSTD_endStream will simply flush whatever input did make it in and then end the frame.  That
matches what the documentation for ZSTD_endStream says.  So I still think we need to make sure
we do not call ZSTD_endStream if input.pos is not at the end of the input buffer after we call
ZSTD_compressStream, or we risk losing the last chunk of data if the zstd library for some
reason cannot fully consume the input buffer when we try to finish.
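
The guard described above can be sketched roughly as follows. This is only an illustration, not the patch's code: in_buf and out_buf are hypothetical stand-ins that mirror the src/size/pos and dst/size/pos shape of the zstd buffer descriptors, and fake_compress_stream / fake_end_stream are toy replacements for ZSTD_compressStream / ZSTD_endStream so the sketch builds without libzstd. The toy compressor just copies bytes and deliberately consumes only a few per call to mimic partial consumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins shaped like zstd's input/output buffer descriptors. */
typedef struct { const void *src; size_t size; size_t pos; } in_buf;
typedef struct { void *dst; size_t size; size_t pos; } out_buf;

enum { MAX_STEP = 4 };  /* toy limit: bytes consumed per call */

/* Toy stand-in for ZSTD_compressStream: copies input to output, but takes
 * at most MAX_STEP bytes per call and never overruns the output buffer. */
static size_t fake_compress_stream(out_buf *out, in_buf *in) {
    size_t n = in->size - in->pos;
    size_t room = out->size - out->pos;
    if (n > MAX_STEP) n = MAX_STEP;
    if (n > room) n = room;
    memcpy((char *)out->dst + out->pos, (const char *)in->src + in->pos, n);
    in->pos += n;
    out->pos += n;
    return 0;
}

/* Toy stand-in for ZSTD_endStream: it only sees the output buffer, so it
 * cannot consume leftover input. Writes a one-byte end marker and returns
 * the number of bytes still waiting to be flushed (0 means frame ended). */
static size_t fake_end_stream(out_buf *out) {
    if (out->pos == out->size) return 1;
    ((char *)out->dst)[out->pos++] = '!';
    return 0;
}

/* Ends the frame only after input.pos has reached the end of the input.
 * Returns 0 when the frame was ended, or -1 when the output buffer filled
 * first; in that case the caller should drain the output and call again
 * instead of ending the stream with input still pending. */
static int finish_frame(in_buf *in, out_buf *out) {
    assert(in->pos <= in->size && out->pos <= out->size);
    while (in->pos < in->size) {
        fake_compress_stream(out, in);
        if (in->pos < in->size && out->pos == out->size) {
            return -1;  /* input remains: do NOT end the stream yet */
        }
    }
    /* All input consumed, so ending the frame cannot drop data. */
    return fake_end_stream(out) == 0 ? 0 : -1;
}
```

In the actual JNI code the -1 case would presumably correspond to returning to the Java side so it can drain the compressed output and call back in, rather than ending the stream early.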

> Add Codec for ZStandard Compression
> -----------------------------------
>                 Key: HADOOP-13578
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13578
>             Project: Hadoop Common
>          Issue Type: New Feature
>            Reporter: churro morales
>            Assignee: churro morales
>         Attachments: HADOOP-13578.patch, HADOOP-13578.v1.patch, HADOOP-13578.v2.patch, HADOOP-13578.v3.patch, HADOOP-13578.v4.patch, HADOOP-13578.v5.patch, HADOOP-13578.v6.patch
>
> ZStandard: https://github.com/facebook/zstd has been used in production for 6 months by Facebook now.  v1.0 was recently released.  Create a codec for this library.
