hadoop-common-issues mailing list archives

From "churro morales (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13578) Add Codec for ZStandard Compression
Date Mon, 12 Dec 2016 21:21:59 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743180#comment-15743180 ]

churro morales commented on HADOOP-13578:
-----------------------------------------

Hi [~jlowe],

Thanks for taking the time to review. I agree with all of the above comments and will correct
those issues. Your last question was about ZSTD_endStream(). ZSTD_endStream() finishes the
frame and writes the epilogue only if the uncompressed buffer has been fully consumed;
otherwise it basically does the same thing as ZSTD_compressStream().

You are correct: if the output buffer is too small, it may not be able to flush everything.
There is a check in ZSTD_endStream() which handles this:

{code}
size_t const notEnded = ZSTD_compressStream_generic(zcs, ostart, &sizeWritten,
                                                    &srcSize, &srcSize, zsf_end);
size_t const remainingToFlush = zcs->outBuffContentSize - zcs->outBuffFlushedSize;
op += sizeWritten;
if (remainingToFlush) {
    output->pos += sizeWritten;
    return remainingToFlush + ZSTD_BLOCKHEADERSIZE /* final empty block */
                            + (zcs->checksum * 4);
}
// create the epilogue and flush it
{code}

So if there is still data to flush, the library won't finish the frame. That makes it safe
to call ZSTD_endStream() repeatedly from our framework, because we never set the finished
flag until the epilogue has been written successfully.
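
To make that concrete, here is a minimal caller-side sketch (my own example, not code from
the patch or from the zstd sources) of the streaming contract: with a deliberately tiny
output buffer, ZSTD_endStream() keeps returning the number of bytes still buffered
internally, and the frame is only finished on the call that returns 0, so calling it
repeatedly is safe.

{code}
#include <stdio.h>
#include <string.h>
#include <zstd.h>   /* zstd >= 1.0 streaming API */

int main(void) {
    const char* src = "some data we want to compress into a zstd frame";
    ZSTD_CStream* zcs = ZSTD_createCStream();
    ZSTD_initCStream(zcs, 3 /* compression level */);

    char chunk[8];            /* deliberately tiny output buffer */
    size_t total = 0;

    /* feed all of the input */
    ZSTD_inBuffer in = { src, strlen(src), 0 };
    while (in.pos < in.size) {
        ZSTD_outBuffer out = { chunk, sizeof(chunk), 0 };
        size_t const rc = ZSTD_compressStream(zcs, &out, &in);
        if (ZSTD_isError(rc)) { fprintf(stderr, "%s\n", ZSTD_getErrorName(rc)); return 1; }
        total += out.pos;     /* out.pos bytes would be written downstream here */
    }

    /* finish the frame: with a tiny buffer this loops several times, and the
       epilogue is only written on the call that finally returns 0 */
    size_t remaining;
    do {
        ZSTD_outBuffer out = { chunk, sizeof(chunk), 0 };
        remaining = ZSTD_endStream(zcs, &out);
        if (ZSTD_isError(remaining)) { fprintf(stderr, "%s\n", ZSTD_getErrorName(remaining)); return 1; }
        total += out.pos;
    } while (remaining != 0);

    printf("frame finished, %zu compressed bytes\n", total);
    ZSTD_freeCStream(zcs);
    return 0;
}
{code}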

The code in CompressorStream.java that calls our codec simply does this:

{code}
  @Override
  public void finish() throws IOException {
    if (!compressor.finished()) {
      compressor.finish();
      while (!compressor.finished()) {
        compress();
      }
    }
  }
{code}
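
For illustration, here is a toy C analogue of that contract (it is not the JNI code in the
patch; the ToyCompressor type and toy_compress() helper are made-up names): the finished
flag is only set once ZSTD_endStream() reports that the epilogue has been flushed, so a
finish() loop like the one above cannot exit early and drop buffered data.

{code}
#include <stdbool.h>
#include <string.h>
#include <zstd.h>

/* Toy analogue of the Compressor finish()/finished() contract; the real
   codec is implemented via JNI and looks different. */
typedef struct {
    ZSTD_CStream* zcs;
    bool finishRequested;   /* finish() has been called */
    bool finished;          /* set only after the epilogue is fully written */
} ToyCompressor;

/* Analogue of compress(): emits at most dstSize bytes, returns bytes produced.
   Error checks are omitted for brevity. */
static size_t toy_compress(ToyCompressor* c, ZSTD_inBuffer* in, void* dst, size_t dstSize) {
    ZSTD_outBuffer out = { dst, dstSize, 0 };
    if (in->pos < in->size) {
        ZSTD_compressStream(c->zcs, &out, in);      /* still consuming input */
    } else if (c->finishRequested && !c->finished) {
        size_t const remaining = ZSTD_endStream(c->zcs, &out);
        if (!ZSTD_isError(remaining) && remaining == 0) {
            c->finished = true;                     /* epilogue is out; only now */
        }
    }
    return out.pos;
}

int main(void) {
    ToyCompressor c = { ZSTD_createCStream(), false, false };
    ZSTD_initCStream(c.zcs, 3);

    const char* data = "payload that must survive the finish loop intact";
    ZSTD_inBuffer in = { data, strlen(data), 0 };
    char buf[8];

    /* mirrors CompressorStream.finish(): keep calling compress() until finished() */
    c.finishRequested = true;
    while (!c.finished) {
        size_t const n = toy_compress(&c, &in, buf, sizeof(buf));
        (void)n;   /* n bytes would be written to the underlying stream here */
    }

    ZSTD_freeCStream(c.zcs);
    return 0;
}
{code}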

So I believe we won't drop any data with the way things are done. Please let me know if I
am missing something obvious here :).




> Add Codec for ZStandard Compression
> -----------------------------------
>
>                 Key: HADOOP-13578
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13578
>             Project: Hadoop Common
>          Issue Type: New Feature
>            Reporter: churro morales
>            Assignee: churro morales
>         Attachments: HADOOP-13578.patch, HADOOP-13578.v1.patch, HADOOP-13578.v2.patch, HADOOP-13578.v3.patch, HADOOP-13578.v4.patch, HADOOP-13578.v5.patch, HADOOP-13578.v6.patch
>
>
> ZStandard: https://github.com/facebook/zstd has been used in production at Facebook for 6 months. v1.0 was recently released. Create a codec for this library.




