hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7845) Compress block reports
Date Mon, 02 Mar 2015 22:17:05 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343879#comment-14343879 ]

Todd Lipcon commented on HDFS-7845:

Agreed we should run various simulations before writing the code in Hadoop. My guess (from
intuition only, not testing) is that BLOSC/LZ compression will actually yield better results
than vint encoding. You would run BLOSC/LZ on the raw int arrays, not the vint-encoded ones,
and I'd guess it's actually faster to do BLOSC/LZ compress/decompress than PB-style vints:
the former is very SIMD-able, whereas the latter requires a branch per byte and so inhibits
good processor pipelining.
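To make the branch-per-byte point concrete, here is a stdlib-only Python sketch of PB-style varint encode/decode (the function names are illustrative, not from any Hadoop or protobuf source). Note the continuation-bit test inside the decode loop: it must run once per input byte, which is the data-dependent branch that defeats pipelining and SIMD.

```python
def encode_varint(n: int) -> bytes:
    """Protobuf-style varint: 7 payload bits per byte, MSB = continuation flag."""
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)  # more bytes follow: set continuation bit
        else:
            out.append(b)         # final byte: continuation bit clear
            return bytes(out)

def decode_varint(buf: bytes, pos: int = 0):
    """Decode one varint starting at pos; returns (value, next_pos).

    The `if not (b & 0x80)` test is the per-byte branch discussed above."""
    result = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not (b & 0x80):        # branch evaluated for every byte consumed
            return result, pos
        shift += 7
```

For example, a 1 GiB block length encodes to 5 bytes instead of 8, but each of those 5 bytes costs a branch on decode; fixed-width int64s in a raw array cost none.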

If you were to use BLOSC to compress the [block, gs, length] tuples, you'd set "typesize=24"
instead of "typesize=8". Can anyone gather a text dump of all blockid/genstamp/size data from
a large production DN that we could run experiments with?
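For anyone wanting to experiment before such a dump exists, here is a stdlib-only sketch of what typesize=24 buys you. The hand-rolled shuffle below mimics BLOSC's byte-shuffle filter over 24-byte [block, gs, length] records, and zlib stands in for the LZ stage (real BLOSC would just be blosc.compress(raw, typesize=24) via the python-blosc bindings). All names here are illustrative, not from HDFS code.

```python
import struct
import zlib

# One replica record: blockId, genStamp, length as little-endian int64s = 24 bytes.
RECORD = struct.Struct("<qqq")

def pack_report(replicas):
    """Pack (blockId, genStamp, length) tuples into a raw fixed-width byte array."""
    return b"".join(RECORD.pack(*r) for r in replicas)

def shuffle(buf: bytes, typesize: int) -> bytes:
    """BLOSC-style byte shuffle: emit byte 0 of every record, then byte 1, etc.
    Groups the nearly-constant high-order bytes together so LZ finds long runs."""
    n = len(buf) // typesize
    return bytes(buf[r * typesize + b] for b in range(typesize) for r in range(n))

def unshuffle(buf: bytes, typesize: int) -> bytes:
    """Inverse of shuffle(): reassemble the original record-major layout."""
    n = len(buf) // typesize
    return bytes(buf[b * n + r] for r in range(n) for b in range(typesize))

# Synthetic stand-in for a DN's block list: sequential ids, constant gs/length.
replicas = [(1073741824 + i, 1001, 134217728) for i in range(1000)]
raw = pack_report(replicas)
plain = zlib.compress(raw)
shuffled = zlib.compress(shuffle(raw, RECORD.size))
```

On realistic data (clustered block ids, low-cardinality genstamps, a handful of common block sizes) the shuffled stream should compress noticeably better than the unshuffled one, which is the effect typesize=24 is meant to capture; only a dump from a real DN will tell us by how much.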

> Compress block reports
> ----------------------
>                 Key: HDFS-7845
>                 URL: https://issues.apache.org/jira/browse/HDFS-7845
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: HDFS-7836
>            Reporter: Colin Patrick McCabe
>            Assignee: Charles Lamb
> We should optionally compress block reports using a low-cpu codec such as lz4 or snappy.

This message was sent by Atlassian JIRA
