hadoop-common-dev mailing list archives

From "Tom White (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3177) Expose DFSOutputStream.fsync API though the FileSystem interface
Date Tue, 03 Jun 2008 13:26:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12601922#action_12601922 ]

Tom White commented on HADOOP-3177:

bq. I think it should be java.io.FileOutputStream since we are doing FileSystem.

But FileOutputStream is tied to Java's File abstraction, which isn't general enough for Hadoop
FileSystems. Furthermore, FileOutputStream#getFD is final, as is FileDescriptor, so we can't
use it here.

How about an interface:

public interface Syncable {
  void sync() throws IOException;
}

(Or should it be "Synchable"?) Then make DFSOutputStream implement Syncable, so that
FSDataOutputStream - which would also be a Syncable - can check whether the underlying stream
is a Syncable and, if so, call sync() on it.
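
A minimal sketch of that delegation, for illustration only. SyncableSink and WrappingStream
are hypothetical stand-ins for DFSOutputStream and FSDataOutputStream, not the real Hadoop
classes, and the fall-through behaviour for a non-Syncable stream is an assumption:

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// The interface proposed above.
interface Syncable {
  void sync() throws IOException;
}

// Hypothetical stand-in for DFSOutputStream: an underlying stream that supports sync().
class SyncableSink extends ByteArrayOutputStream implements Syncable {
  int syncCount = 0;

  @Override
  public void sync() throws IOException {
    flush();
    syncCount++; // a real implementation would push buffered data to storage here
  }
}

// Hypothetical stand-in for FSDataOutputStream: a wrapper that is itself Syncable.
class WrappingStream extends FilterOutputStream implements Syncable {
  WrappingStream(OutputStream out) {
    super(out);
  }

  @Override
  public void sync() throws IOException {
    flush(); // flushes this wrapper and the wrapped stream
    if (out instanceof Syncable) {
      ((Syncable) out).sync(); // delegate when the underlying stream supports it
    }
    // Assumption: nothing further is done for a non-Syncable stream in this sketch.
  }
}
```

With this shape, callers only ever see the wrapper's sync(); whether anything stronger than
flush() happens depends on the stream it wraps.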

What are the semantics of sync()? I think the expectation is that after sync returns the system
has successfully sync'ed buffers to disk. So if this is not true, sync() should throw an exception.
This is what java.io.FileDescriptor does. Using a subclass of IOException (java.io.SyncFailedException?)
would make this easier for callers. I realize that this description is at odds with the current
contract for DFSOutputStream#fsync, which doesn't guarantee that the data has been flushed
to persistent storage, but I wondered whether DFSOutputStream could be strengthened to make
this guarantee? 
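
For comparison, this is roughly how the java.io.FileDescriptor contract looks from a caller's
side on a local file. It is only a sketch of the semantics being proposed, not Hadoop code;
the class and method names (LocalSync, writeAndSync, tryWriteAndSync) are made up for the
example:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.SyncFailedException;

class LocalSync {
  // Writes data to a local file and returns normally only if FileDescriptor.sync() succeeds.
  static void writeAndSync(File file, byte[] data) throws IOException {
    try (FileOutputStream out = new FileOutputStream(file)) {
      out.write(data);
      out.flush();
      // Throws SyncFailedException when the buffers cannot be synchronized with the device.
      out.getFD().sync();
    }
  }

  // Because SyncFailedException is a subclass of IOException, a caller can single out
  // "the sync failed" from other I/O errors, which still propagate.
  static boolean tryWriteAndSync(File file, byte[] data) throws IOException {
    try {
      writeAndSync(file, data);
      return true;
    } catch (SyncFailedException e) {
      return false;
    }
  }
}
```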

If the FileSystem doesn't support sync, then do we get an exception when calling sync(), or
is it a no-op?
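
To make the two options concrete, here is a sketch of both policies side by side. The helper
names are hypothetical, and throwing java.io.SyncFailedException in the strict variant is an
assumption; the issue does not settle which exception (if any) would be used:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.SyncFailedException;

interface Syncable {
  void sync() throws IOException;
}

class SyncPolicies {
  // Option A: treat sync() as a no-op (beyond flush) when the stream doesn't support it.
  static void syncOrNoOp(OutputStream out) throws IOException {
    out.flush();
    if (out instanceof Syncable) {
      ((Syncable) out).sync();
    }
  }

  // Option B: fail loudly, so a normal return always means the data was synced.
  static void syncOrFail(OutputStream out) throws IOException {
    out.flush();
    if (!(out instanceof Syncable)) {
      throw new SyncFailedException("sync not supported by " + out.getClass().getName());
    }
    ((Syncable) out).sync();
  }
}
```

Option A keeps existing callers working on every FileSystem but weakens what a successful
return means; Option B keeps the guarantee but forces callers to handle the unsupported case.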

> Expose DFSOutputStream.fsync API though the FileSystem interface
> ----------------------------------------------------------------
>                 Key: HADOOP-3177
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3177
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
> In the current code, there is a DFSOutputStream.fsync() API that allows a client to flush
> all buffered data to the datanodes and also persist block locations on the namenode. This
> API should be exposed through the generic API in org.apache.hadoop.fs.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
