hadoop-common-dev mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-89) files are not visible until they are closed
Date Tue, 28 Aug 2007 06:59:32 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523170 ]

dhruba borthakur commented on HADOOP-89:

1. The contents of new blocks appended to a file are visible to clients as soon as the
datanode reports the block to the namenode. This means that data is visible to clients even
before the block metadata is persisted on disk. This approach avoids a filesystem transaction
in the edit log for every new block allocation.

2. The block allocation for a file is persisted in the edit log when the file is closed.

3. A new API, FSDataOutputStream.sync(), allows an application to make data persistent on disk
even before the file is closed. Invoking this API logs a transaction into the edit log
recording the blocks currently allocated to the file. An application that is recording data
to a log file can invoke this API periodically to ensure that the contents of the log file
persist even if the application dies before closing the file.

4. The FsShell utility has a new command, invoked as "bin/hadoop dfs -tail [-f] <filename>".
When the "-f" option is used, the FsShell utility periodically polls for changes to the
file size. When a change is detected, it re-opens the file and displays the new content
that was appended to it.
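The poll-and-reopen loop that "-f" performs can be sketched as below. This is an illustration against the local filesystem using only the standard library; FsShell does the equivalent against DFS, and the class and method names here are invented for the example.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Illustrative sketch of the "-f" behavior: detect growth by comparing the
// current file size to the last offset read, then re-open and read the delta.
public class TailFollower {
    // Returns any bytes appended since lastOffset as a string, or "" if the
    // file has not grown. The caller advances lastOffset by the bytes read
    // and calls this again after a short sleep, as the polling loop does.
    public static String readNewBytes(File f, long lastOffset) throws IOException {
        long len = f.length();
        if (len <= lastOffset) return "";           // no growth detected
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            raf.seek(lastOffset);                   // re-open and skip old data
            byte[] buf = new byte[(int) (len - lastOffset)];
            raf.readFully(buf);
            return new String(buf, "UTF-8");
        }
    }
}
```

Combined with item 3, a reader polling like this sees a log file grow as the writer syncs, without the writer ever closing the file.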

> files are not visible until they are closed
> -------------------------------------------
>                 Key: HADOOP-89
>                 URL: https://issues.apache.org/jira/browse/HADOOP-89
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.1.0
>            Reporter: Yoram Arnon
>            Assignee: dhruba borthakur
>            Priority: Critical
>         Attachments: atomicCreation.patch
> the current behaviour, whereby a file is not visible until it is closed, has several flaws, including:
> 1. no practical way to know if a file/job is progressing
> 2. no way to implement files that never close, such as log files
> 3. failure to close a file results in loss of the file
> The part of the file that's written should be visible.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
