hadoop-hdfs-issues mailing list archives

From "Scott Carey (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2699) Store data and checksums together in block file
Date Sun, 18 Dec 2011 00:11:30 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171712#comment-13171712 ]

Scott Carey commented on HDFS-2699:

That brings up a related question:  Why a 4 byte crc per 512 bytes and not per 4096 bytes?

512 aligns with the old hard drive block size: the physical media had ECC on 512-byte blocks
and could not read or write in smaller chunks than that.  New hard drives all have 4096-byte
blocks and ECC at that granularity -- no smaller chunk can be read or written.  SSDs use
4096-byte or 8192-byte blocks these days.

If the physical media is corrupting blocks, these will most likely be corrupted in 4k chunks.
A CRC per 4k decreases the checksum overhead by a factor of 8 (4 bytes per 4096 instead of
4 bytes per 512), increasing the likelihood of finding the checksum in the OS cache if it is
in a side file.  Now that CRC is accelerated by the processor and very fast, I don't think the
overhead of computing the larger-block CRC for reads smaller than 4k will matter either.
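
To put numbers on the factor of 8, here is a rough sketch of the side-file checksum size for one block.  The 64 MB block size and the function name `checksum_bytes` are assumptions for illustration only; they are not taken from the HDFS code.

```python
def checksum_bytes(data_size: int, chunk_size: int, crc_size: int = 4) -> int:
    """Total CRC bytes needed to cover data_size bytes, one CRC per chunk."""
    chunks = -(-data_size // chunk_size)  # ceiling division
    return chunks * crc_size

# Assumed block size, for illustration only.
BLOCK_SIZE = 64 * 1024 * 1024

per_512 = checksum_bytes(BLOCK_SIZE, 512)    # one 4-byte CRC per 512 bytes
per_4096 = checksum_bytes(BLOCK_SIZE, 4096)  # one 4-byte CRC per 4096 bytes

print(f"CRC per 512 bytes:  {per_512 // 1024} KB of checksums")
print(f"CRC per 4096 bytes: {per_4096 // 1024} KB of checksums")
print(f"reduction factor:   {per_512 // per_4096}")
```

Under these assumptions the checksum data for one block shrinks from 512 KB to 64 KB, which is what makes it more likely to stay resident in the OS cache.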

Inlining the CRC could decrease seek and OS pagecache overhead a lot.  Since most file systems
and OSes work on 4k blocks, HDFS could store a 4-byte CRC and 4092 bytes of data in a single
OS / disk page.  (Or eight 4-byte CRCs and 4064 bytes of data in a page.)  This has big
advantages:  If your data is in the OS pagecache, the CRC will be too -- one will never be
written to disk without the other, nor evicted from cache without the other.
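
A minimal sketch of the single-CRC page layout described above, to make the idea concrete.  This is not the HDFS on-disk format: the CRC-first ordering, the zero padding of short payloads, the use of `zlib.crc32` (plain CRC-32), and the names `pack_page` / `unpack_page` are all assumptions for illustration.

```python
import struct
import zlib

PAGE_SIZE = 4096
CRC_SIZE = 4
PAYLOAD_SIZE = PAGE_SIZE - CRC_SIZE  # 4092 bytes of data per page

def pack_page(payload: bytes) -> bytes:
    """Build one 4096-byte page: a 4-byte CRC followed by 4092 bytes of data."""
    if len(payload) > PAYLOAD_SIZE:
        raise ValueError("payload does not fit in one page")
    # Assumption: short payloads are zero-padded; a real format would also
    # have to record the true data length somewhere.
    padded = payload.ljust(PAYLOAD_SIZE, b"\x00")
    crc = zlib.crc32(padded) & 0xFFFFFFFF
    return struct.pack(">I", crc) + padded

def unpack_page(page: bytes) -> bytes:
    """Verify the inline CRC and return the 4092-byte (padded) payload."""
    if len(page) != PAGE_SIZE:
        raise ValueError("expected exactly one page")
    (stored_crc,) = struct.unpack(">I", page[:CRC_SIZE])
    payload = page[CRC_SIZE:]
    if zlib.crc32(payload) & 0xFFFFFFFF != stored_crc:
        raise IOError("checksum mismatch: page is corrupt")
    return payload

page = pack_page(b"some block data")
data = unpack_page(page)  # raises IOError if the page was corrupted
```

Because the CRC and the data it covers share one page, a single page read brings in both, so no second lookup in a side file is needed to verify the data.
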
> Store data and checksums together in block file
> -----------------------------------------------
>                 Key: HDFS-2699
>                 URL: https://issues.apache.org/jira/browse/HDFS-2699
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
> The current implementation of HDFS stores the data in one block file and the metadata (checksum)
in another block file. This means that every read from HDFS actually consumes two disk iops,
one to the datafile and one to the checksum file. This is a major problem for scaling HBase,
because HBase is usually bottlenecked on the number of random disk iops that the storage-hardware
