hadoop-common-dev mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5157) File length not reported correctly after application crash
Date Wed, 04 Feb 2009 10:19:59 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670298#action_12670298 ]

dhruba borthakur commented on HADOOP-5157:

The staleness corrects itself once either another writer opens the file for appending
or the hard limit of 1 hour (the lease recovery period) expires. But I agree that your proposal
is better. +1

It introduces additional latency for the getFileStatus() call, but if we do this only for files
that have a lease on them (i.e. a writer was writing to the file), then it should be ok.

Additionally, the current getFileStatus() call does not retrieve block location information
from the namenode. This has to be enhanced to return the location of at least the last block
of a file.
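
To make the proposal concrete, here is a minimal sketch as a toy model. It is not HDFS code: ToyNameNode, ToyDataNode and getLength() are hypothetical names invented for illustration, and the model assumes the namenode only knows the bytes in completed blocks while the primary datanode knows the current size of the last block. Only files that still have a lease pay for the extra datanode lookup.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model only: none of these types are real HDFS classes. */
public class LeaseAwareLengthSketch {

    /** Stand-in for namenode metadata: bytes in completed blocks, plus open leases. */
    static class ToyNameNode {
        final Map<String, Long> completedBytes = new HashMap<>();
        final Set<String> leased = new HashSet<>();
    }

    /** Stand-in for the primary datanode that holds each file's last block. */
    static class ToyDataNode {
        final Map<String, Long> lastBlockBytes = new HashMap<>();
    }

    /** Returns the file length, asking the datanode only when the file has a lease. */
    static long getLength(ToyNameNode nn, ToyDataNode dn, String path) {
        long recorded = nn.completedBytes.getOrDefault(path, 0L);
        if (!nn.leased.contains(path)) {
            // No writer was active, so the namenode's view is already accurate.
            return recorded;
        }
        // A writer held a lease: add the size of the last block as the datanode sees it.
        return recorded + dn.lastBlockBytes.getOrDefault(path, 0L);
    }

    public static void main(String[] args) {
        ToyNameNode nn = new ToyNameNode();
        ToyDataNode dn = new ToyDataNode();

        // A crashed writer synced 4103 bytes into the last block; the lease is still held.
        nn.completedBytes.put("/log/0", 0L);
        nn.leased.add("/log/0");
        dn.lastBlockBytes.put("/log/0", 4103L);

        System.out.println("namenode only: " + nn.completedBytes.get("/log/0"));
        System.out.println("lease-aware:   " + getLength(nn, dn, "/log/0"));
    }
}
```

In this model a closed file never touches the datanode, so the added latency is limited to files a writer had open, which is the tradeoff described above.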

> File length not reported correctly after application crash
> ----------------------------------------------------------
>                 Key: HADOOP-5157
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5157
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.19.0
>            Reporter: Doug Judd
>             Fix For: 0.20.0
> Our application (Hypertable) creates a transaction log in HDFS.  This log is written
> with the following pattern:
> out_stream.write(header, 0, 7);
> out_stream.sync();
> out_stream.write(data, 0, amount);
> out_stream.sync();
> [...]
> However, if the application crashes and then comes back up again, the following statement
> length = mFilesystem.getFileStatus(new Path(fileName)).getLen();
> returns the wrong length.  Apparently this is because this method fetches length information
> from the NameNode which is stale.  Ideally, a call to getFileStatus() would return the accurate
> file length by fetching the size of the last block from the primary datanode.
