hadoop-common-dev mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4379) In HDFS, sync() not yet guarantees data available to the new readers
Date Tue, 27 Jan 2009 01:10:59 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667535#action_12667535 ]

dhruba borthakur commented on HADOOP-4379:

Hi Doug,

The file length (as returned by getFileStatus) does not change with every write from the client
to the datanode. Similarly, not every fsync call from the client reaches the namenode (only
the first one per block does). As a result, the namenode has no reliable way to
know the size of a block while a writer is still writing to it.

In your case, the writer has died. The namenode waits for a timeout of one hour before it starts lease
recovery for this file. The lease recovery process will set the correct file size in the namenode
metadata. If you do not want to wait for one hour, you can trigger lease recovery manually
from your application by trying to reopen the file for append (please use FileSystem.append(pathname)).
Lease recovery will update the namenode metadata with the true length of the file.
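
To illustrate the manual trigger described above, here is a minimal Java sketch of a retry loop around the "reopen for append" step. It rests on assumptions that are not stated in this thread: that an append attempt may fail with an IOException while the dead writer's lease is still being recovered, and that retrying after a short pause is acceptable. The class LeaseRecoverySketch, the interface AppendAttempt, and the method triggerLeaseRecovery are hypothetical names; the HDFS call is abstracted behind the interface so that the sketch runs without a cluster.

```java
import java.io.IOException;

public class LeaseRecoverySketch {

    /** Hypothetical abstraction over "open the file for append, then close it". */
    @FunctionalInterface
    public interface AppendAttempt {
        void run() throws IOException;
    }

    /**
     * Runs the append attempt until it succeeds or maxAttempts is used up.
     * Returns the number of attempts made on success, or -1 if every attempt
     * failed or the thread was interrupted while waiting.
     */
    public static int triggerLeaseRecovery(AppendAttempt attempt,
                                           int maxAttempts,
                                           long sleepMillis) {
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                attempt.run();
                return i; // the file could be reopened for append
            } catch (IOException e) {
                // Assumption: the failure means lease recovery has not finished yet.
                if (i == maxAttempts) {
                    break;
                }
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return -1;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Simulated stand-in for HDFS: the first two attempts fail, the third succeeds.
        int[] calls = {0};
        int attempts = triggerLeaseRecovery(() -> {
            if (++calls[0] < 3) {
                throw new IOException("lease recovery in progress (simulated)");
            }
        }, 5, 10L);
        System.out.println("succeeded after attempts: " + attempts);
    }
}
```

Against a real cluster, the attempt would wrap the call mentioned above, for example () -> fs.append(path).close(), where fs is a FileSystem and path is the Path of the file left behind by the dead writer. Once the attempt succeeds, getFileStatus should report the recovered length.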

> In HDFS, sync() not yet guarantees data available to the new readers
> --------------------------------------------------------------------
>                 Key: HADOOP-4379
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4379
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: dhruba borthakur
>             Fix For: 0.19.1
>         Attachments: 4379_20081010TC3.java, fsyncConcurrentReaders.txt, fsyncConcurrentReaders3.patch, Reader.java, Writer.java
>
> In the append design doc (https://issues.apache.org/jira/secure/attachment/12370562/Appends.doc), it says
> * A reader is guaranteed to be able to read data that was 'flushed' before the reader opened the file
> However, this feature is not yet implemented.  Note that the operation 'flushed' is now called "sync".

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
