hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7845) Compress block reports
Date Mon, 02 Mar 2015 21:36:06 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343805#comment-14343805 ]

Colin Patrick McCabe commented on HDFS-7845:

As [~arpitagarwal] pointed out, we're not dealing with a series of ints, but with a series
of protobuf vints (variable-length ints).  [~clamb] did some tests with a block report and
got around 50% compression (if I'm remembering correctly?)  [~clamb], can you comment on
whether those tests were done with vints or regular fixed-width integers?

We should probably make sure we're doing the compression test with what we're actually sending,
which is going to be a 3-tuple of [ block_id, genstamp, length ], all encoded as protobuf
vints.  Sorting is an interesting idea, but I wonder if its effectiveness diminishes when
you interleave the 3 numbers?  Of course we could store the fields separately, but then our
L1 / L2 cache hit rates would plummet when actually processing the blocks.
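To make the trade-off concrete, here's a rough sketch of the kind of experiment being discussed: encode synthetic (block_id, genstamp, length) tuples as protobuf-style varints, once interleaved and once as separately sorted columns, and compare compressed sizes.  The block distribution is entirely made up for illustration, and zlib stands in for lz4/snappy since neither is in the Python standard library:

```python
import random
import zlib

def encode_varint(n):
    # Protobuf base-128 varint: 7 payload bits per byte, MSB = continuation.
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)

random.seed(42)
# Hypothetical block report: roughly monotonic block IDs, a handful of
# distinct genstamps, lengths clustered just under a 128 MB block size.
blocks = [(10_000_000 + i * random.randint(1, 5),
           1000 + random.randint(0, 3),
           128 * 1024 * 1024 - random.randint(0, 1024))
          for i in range(10_000)]

# Wire-order layout: block_id, genstamp, length interleaved per block.
interleaved = b"".join(encode_varint(f) for t in blocks for f in t)

# Columnar layout: each field stream sorted independently before encoding.
columnar = b"".join(encode_varint(f)
                    for col in zip(*blocks)
                    for f in sorted(col))

print("interleaved:", len(zlib.compress(interleaved)))
print("columnar:   ", len(zlib.compress(columnar)))
```

This only measures compressed size; it says nothing about the cache-locality cost of the columnar layout when the NameNode walks the blocks, which is the other half of the question.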

> Compress block reports
> ----------------------
>                 Key: HDFS-7845
>                 URL: https://issues.apache.org/jira/browse/HDFS-7845
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: HDFS-7836
>            Reporter: Colin Patrick McCabe
>            Assignee: Charles Lamb
> We should optionally compress block reports using a low-cpu codec such as lz4 or snappy.

This message was sent by Atlassian JIRA
