hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sameer Paranjpye (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1470) Rework FSInputChecker and FSOutputSummer to support checksum code sharing between ChecksumFileSystem and block level crc dfs
Date Wed, 27 Jun 2007 08:29:26 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508473

Sameer Paranjpye commented on HADOOP-1470:

> but we also need to minimize the amount of buffered, checksummed data. It's thus preferable
to keep the outermost, checksummed buffer small ...

Granted, but changing the readChunk() contract to allow multiple continguous chunks to be
read at once doesn't impact buffering. The outermost buffer would still be small. However,
when an application does a large read, the current API requires that the data be fetched piecemeal
into the InputChecker buffer, checksummed and then copied to the application buffer. Allowing
multiple contiguous chunks to be read simultaneously, would allow the application buffer to
be passed directly to readChunk() and checksummed in place and thus save a copy.

This is a small performance gain, however, a more important reason is that constraining readChunk()
to 1 chunk at a time makes it difficult to support preads.

Implementing efficient support for 'sequential preads' i.e. multiple pread invocations hitting
contiguous byte ranges is quite tricky. An implementation has to be pretty clever in order
to efficiently serve several concurrent threads all calling pread. It probably needs to keep
a pool of file descriptors around and have some complicated logic to determine, among other
- if, in a pread invocation, the position on any of its available descriptors is close to
the position in the call, so that buffered data can be used
- if the position in a call isn't a good match for any of the descriptors, then whether to
seek on an existing descriptor or open a new one. Using an existing descriptor might cause
buffered data useful for a subsequent call to be thrown away

On the other hand, an implementation thats not trying support 'sequential preads', can be
a lot simpler. It also needs to keep a pool of descriptors. But in each call it can pick an
arbitrary descriptor, seek, fetch and return the data. It would also do no buffering (except
on 1 canonical descriptor which corresponds to the file position that the implementation reports
to clients).

It's not clear that the 'sequential pread' access pattern occurs frequently in application
contexts. It certainly doesn't in the use cases we've seen so far, they almost all have very
little locality of reference across pread invocations. However, given the readChunk() API
above, an application level pread would most likely be split into multiple readChunk() calls
and require the 'sequential pread' access pattern to be supported. It feels like this access
pattern is being imposed on implementations by the readChunk() API and not by application
contexts and that is undesirable. If the readChunk() API allowed multiple chunks to be read
at once, then an application level pread would likely result in a single call to readChunk()
and would allow for a simple implementation.

> Rework FSInputChecker and FSOutputSummer to support checksum code sharing between ChecksumFileSystem
and block level crc dfs
> ----------------------------------------------------------------------------------------------------------------------------
>                 Key: HADOOP-1470
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1470
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 0.12.3
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.14.0
>         Attachments: GenericChecksum.patch, genericChecksum.patch, InputChecker-01.java
> Comment from Doug in HADOOP-1134:
> I'd prefer it if the CRC code could be shared with CheckSumFileSystem. In particular,
it seems to me that FSInputChecker and FSOutputSummer could be extended to support pluggable
sources and sinks for checksums, respectively, and DFSDataInputStream and DFSDataOutputStream
could use these. Advantages of this are: (a) single implementation of checksum logic to debug
and maintain; (b) keeps checksumming as close to possible to data generation and use. This
patch computes checksums after data has been buffered, and validates them before it is buffered.
We sometimes use large buffers and would like to guard against in-memory errors. The current
checksum code catches a lot of such errors. So we should compute checksums after minimal buffering
(just bytesPerChecksum, ideally) and validate them at the last possible moment (e.g., through
the use of a small final buffer with a larger buffer behind it). I do not think this will
significantly affect performance, and data integrity is a high priority. 

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message