hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5722) Implement compression in the HTTP server of SNN / SBN instead of FSImage
Date Wed, 08 Jan 2014 03:56:56 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865040#comment-13865040
] 

Colin Patrick McCabe commented on HDFS-5722:
--------------------------------------------

bq. The design requires putting offset and length in the FSImage, and having compression inside
the file makes things difficult. Therefore this jira proposes to move compression from FSImage
to the higher-level application logic.

I don't see why having compression makes things difficult.  If the software wants to skip
an N-byte section it doesn't understand, it just asks the {{CompressedStream}} to skip N
bytes.  The stream takes care of the details of translating that into byte offsets in the
file.  It may be more efficient to do this when compression is not enabled, but that is no
reason to break the configurations of users who do have compression enabled now.
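
A minimal sketch of the skip-through-a-compressed-stream idea, assuming gzip via {{java.util.zip}} as a stand-in for whichever codec the image uses. This is not HDFS code: the class and method names ({{SkipInCompressedStream}}, {{skipFully}}, {{readAfterSkip}}) are hypothetical. The caller counts only logical (uncompressed) bytes, and the decompressing stream works out where that lands in the file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class SkipInCompressedStream {

    // Compress a byte array with gzip (assumed codec, for illustration only).
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(raw);
        }
        return buf.toByteArray();
    }

    // Skip exactly n uncompressed bytes. InputStream.skip may skip fewer
    // bytes than requested, so loop and fall back to read() to detect EOF.
    static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped > 0) {
                n -= skipped;
            } else if (in.read() >= 0) {
                n--;
            } else {
                throw new EOFException("stream ended with " + n + " bytes left to skip");
            }
        }
    }

    // Skip an unknown section of `skip` bytes, then read `len` bytes that follow it.
    static byte[] readAfterSkip(byte[] compressed, long skip, int len) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            skipFully(in, skip);
            byte[] out = new byte[len];
            new DataInputStream(in).readFully(out);
            return out;
        }
    }

    public static void main(String[] args) throws IOException {
        // 4-byte known section, 10-byte section we do not understand, 4-byte known section.
        byte[] raw = "HEAD??????????TAIL".getBytes(StandardCharsets.US_ASCII);
        byte[] tail = readAfterSkip(gzip(raw), 14, 4);
        System.out.println(new String(tail, StandardCharsets.US_ASCII));
    }
}
```

The skip still has to decompress the skipped bytes, which is the efficiency cost mentioned above, but the caller never computes offsets in the compressed file.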

I like the idea of implementing compression in the HTTP server code.  But I don't see why
we need to remove a feature that people are using, the on-disk FSImage compression feature.
 Possibly we should deprecate this feature, since HTTP compression is better for most use
cases.
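
For comparison, compression at the transport layer might look roughly like the sketch below. It is an illustration only, not the actual SNN / SBN servlet code: {{TransportCompression}}, {{bodyForResponse}} and the simplified {{Accept-Encoding}} check (which ignores {{q=0}}) are all assumptions. The image stays uncompressed on disk and is gzipped only when the client says it accepts gzip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class TransportCompression {

    // Simplified check: true if the Accept-Encoding header value lists gzip.
    // A real server would also honour quality values such as "gzip;q=0".
    static boolean clientAcceptsGzip(String acceptEncoding) {
        if (acceptEncoding == null) {
            return false;
        }
        for (String token : acceptEncoding.split(",")) {
            String coding = token.split(";", 2)[0].trim();
            if (coding.equalsIgnoreCase("gzip")) {
                return true;
            }
        }
        return false;
    }

    // Return the bytes to send: gzipped if the client accepts it, else as stored.
    static byte[] bodyForResponse(byte[] imageOnDisk, String acceptEncoding) throws IOException {
        if (!clientAcceptsGzip(acceptEncoding)) {
            return imageOnDisk;
        }
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(imageOnDisk);
        }
        return buf.toByteArray();
    }

    // Receiving side: undo the transport compression to recover the image bytes.
    static byte[] gunzip(byte[] body) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(body))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

In a real server the response would also need a {{Content-Encoding: gzip}} header so the receiver knows to decompress.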

> Implement compression in the HTTP server of SNN / SBN instead of FSImage
> ------------------------------------------------------------------------
>
>                 Key: HDFS-5722
>                 URL: https://issues.apache.org/jira/browse/HDFS-5722
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Haohui Mai
>
> The current FSImage format supports compression: there is a field in the header which
specifies the compression codec used to compress the data in the image. The main motivation
was to reduce the number of bytes to be transferred between SNN / SBN / NN.
> The main disadvantage, however, is that it requires the client to access the FSImage
in strictly sequential order. This might not fit well with the new design of FSImage. For
example, serializing the data in protobuf allows the client to quickly skip data that it does
not understand. The compression built into the format, however, complicates the calculation
of offsets and lengths. Recovering from a corrupted, compressed FSImage is also non-trivial
as off-the-shelf tools like bzip2recover are inapplicable.
> This jira proposes to move the compression from the format of the FSImage to the transport
layer, namely, the HTTP server of SNN / SBN. This design simplifies the format of FSImage,
opens up the opportunity to quickly navigate through the FSImage, and eases the process of
recovery. It also retains the benefits of reducing the number of bytes to be transferred across
the wire since there is compression on the transport layer.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
