hadoop-hdfs-issues mailing list archives

From "Charles Lamb (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient
Date Thu, 26 Feb 2015 17:45:06 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338768#comment-14338768 ]

Charles Lamb commented on HDFS-7435:


This looks really good. I like the new approach and your current patch does a pre-emptive
strike on several of the comments I was going to make on the .002 patch.

I really only have nits.

The patch needs to be rebased. There was one .rej when I applied it (obviously I worked past
that for my review).


BlockListAsLongs(Collection) needs an @param for the javadoc.

In #BlockListAsLongs(Collection), the ReplicaState is written as a varint64. I realize
it's a varint, but since it's really only a single byte in the implementation, it seems a
little heavy-handed to write it to the cos as a varint64. I also realize that it will need
to be a long on the way back out for the uc long[]. If you don't want to change it from a
varint64 in the cos, then perhaps just add a comment saying that you know it's a byte (actually
an int) in the impl but you're using a varint64 for consistency?
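FWIW, the wire cost is the same either way: a varint only emits as many bytes as the value needs, so any ReplicaState ordinal (all well under 128) still occupies a single byte even when written as a varint64. A minimal sketch of the standard protobuf base-128 scheme (not Hadoop's code, just an illustration):

```java
import java.util.Arrays;

public class VarintSketch {
    // Standard protobuf base-128 varint: 7 data bits per byte,
    // high bit set on every byte except the last.
    public static byte[] encodeVarint64(long v) {
        byte[] buf = new byte[10];  // a 64-bit value needs at most 10 bytes
        int i = 0;
        while ((v & ~0x7FL) != 0) {
            buf[i++] = (byte) ((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        buf[i++] = (byte) v;
        return Arrays.copyOf(buf, i);
    }

    public static void main(String[] args) {
        // A ReplicaState ordinal fits in one byte even as a varint64...
        System.out.println(encodeVarint64(4).length);
        // ...while larger values grow as needed.
        System.out.println(encodeVarint64(300).length);
    }
}
```

So writing it as a varint64 only costs readability, not bytes; a comment in the impl would cover that.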

Since you throw UnsupportedOperationException from multiple #remove methods, you might want
to add the class name to the message, e.g. "Sorry, remove not implemented for BlockReportListIterator".
In a similar vein, would it be appropriate to add a message to the UnsupportedOperationException
thrown by BlockReportReplica.getVisibleLength, getStorageUuid, and isOnTransientStorage?
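i.e., something along these lines (hypothetical class, purely to illustrate putting the class name in the message):

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical iterator, just to show the suggested exception message.
class BlockReportListIterator implements Iterator<Long> {
    @Override
    public boolean hasNext() {
        return false;
    }

    @Override
    public Long next() {
        throw new NoSuchElementException();
    }

    @Override
    public void remove() {
        // Naming the class makes a bare stack trace self-explanatory.
        throw new UnsupportedOperationException(
            "remove not implemented for " + getClass().getSimpleName());
    }
}
```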


getBlockReports has one line that exceeds the 80 char width.


The import of NameNodeLayoutVersion is unused.

The decl of useBlocksBuffer busts the 80 char width.


The import of ByteString is unused.


In #getBlockReports, the line under case RUR busts the 80 char limit.


Perhaps s/Protobuf optimized/Optimized protobuf/


Thanks for fixing the formatting in here.


The blocks.add(... line busts the 80 char limit.

> PB encoding of block reports is very inefficient
> ------------------------------------------------
>                 Key: HDFS-7435
>                 URL: https://issues.apache.org/jira/browse/HDFS-7435
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, namenode
>    Affects Versions: 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>            Priority: Critical
>         Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.patch,
> Block reports are encoded as a PB repeating long.  Repeating fields use an {{ArrayList}}
> with default capacity of 10.  A block report containing tens or hundreds of thousands of longs
> (3 for each replica) is extremely expensive since the {{ArrayList}} must realloc many times.
> Also, decoding repeating fields will box the primitive longs, which must then be unboxed.
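The realloc-and-boxing cost the description refers to can be sketched with plain JDK types (hypothetical helper names; this is not the actual PB decode path, just an illustration):

```java
import java.util.ArrayList;
import java.util.List;

public class BlockReportDecodeSketch {
    // Repeated-field style decode: the list starts at the default
    // capacity (10) and reallocates as it grows, boxing every long.
    static List<Long> decodeBoxed(long[] wire) {
        List<Long> out = new ArrayList<>();
        for (long v : wire) {
            out.add(v);  // autoboxes each primitive to a Long object
        }
        return out;
    }

    // Decoding straight into a primitive array sized up front avoids
    // both the reallocations and the per-element Long objects.
    static long[] decodePrimitive(long[] wire) {
        long[] out = new long[wire.length];
        System.arraycopy(wire, 0, out, 0, wire.length);
        return out;
    }
}
```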

This message was sent by Atlassian JIRA
