hadoop-common-dev mailing list archives

From "Doug Judd (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4379) In HDFS, sync() not yet guarantees data available to the new readers
Date Tue, 27 Jan 2009 07:10:59 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667602#action_12667602 ]

Doug Judd commented on HADOOP-4379:

Hi Dhruba,

I tried your suggestion, but got the following exception when trying to open the file with
the 'append' method:

SEVERE: I/O exception while getting length of file '/hypertable/servers/'
- org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /hypertable/servers/
for DFSClient_2003773208 on client, because this file is already being created
by DFSClient_423127459 on
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1088)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:1177)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.append(NameNode.java:321)
	at sun.reflect.GeneratedMethodAccessor50.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:892)
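For context: AlreadyBeingCreatedException is thrown while the namenode still holds the previous writer's lease on the file, and a common workaround at the time was to retry append() until the lease expires or is recovered. A minimal sketch of that retry pattern, simulated in plain Java without the Hadoop classes (the failure count, attempt limit, and sleep interval are illustrative, not from this message):

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class AppendRetry {
    // Retry an operation that may fail while the namenode still holds the
    // previous writer's lease on the file, backing off between attempts.
    static <T> T retry(Callable<T> op, int attempts, long sleepMs) throws Exception {
        Exception last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return op.call();
            } catch (IOException e) { // e.g. AlreadyBeingCreatedException
                last = e;
                Thread.sleep(sleepMs);
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulated: the first two attempts fail as if the old lease were still held.
        final int[] calls = {0};
        String result = retry(() -> {
            if (calls[0]++ < 2) {
                throw new IOException("failed to create file: already being created");
            }
            return "opened";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```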

I could rewrite the whole thing so that it does not depend on knowing the log length; however, it seems like it ought to be possible to obtain the actual file length in this situation.  The semantics of getFileStatus() seem a little odd: sometimes it returns the actual length of the file, and sometimes it returns a stale version of the length.  I suppose this is OK as long as it is well documented, but it should be possible to obtain the actual length of a file.  Would it be possible to add a FileSystem::length(Path path) method that returns the accurate file length by fetching the size of the last block from the primary datanode?
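To illustrate why getFileStatus() can be stale: the namenode's metadata only covers completed blocks, while bytes written into the file's partial last block are known only to the datanodes until the block is closed. A hypothetical sketch of the arithmetic behind the proposed length() method (block size and byte counts are illustrative):

```java
public class AccurateLength {
    static final long BLOCK_SIZE = 64L * 1024 * 1024; // 64 MB, the HDFS default at the time

    // The namenode reports only completed blocks; the proposed length() would
    // add the size of the partial last block, fetched from the primary datanode.
    static long accurateLength(long completedBlocks, long lastBlockBytesOnDatanode) {
        return completedBlocks * BLOCK_SIZE + lastBlockBytesOnDatanode;
    }

    public static void main(String[] args) {
        long stale = 2 * BLOCK_SIZE;                 // what getFileStatus() reports
        long actual = accurateLength(2, 1_500_000L); // what length() would report
        System.out.println(actual - stale);          // bytes invisible to getFileStatus()
    }
}
```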

- Doug

> In HDFS, sync() not yet guarantees data available to the new readers
> --------------------------------------------------------------------
>                 Key: HADOOP-4379
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4379
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: dhruba borthakur
>             Fix For: 0.19.1
>         Attachments: 4379_20081010TC3.java, fsyncConcurrentReaders.txt, fsyncConcurrentReaders3.patch,
fsyncConcurrentReaders4.patch, Reader.java, Reader.java, Writer.java, Writer.java
> In the append design doc (https://issues.apache.org/jira/secure/attachment/12370562/Appends.doc),
it says
> * A reader is guaranteed to be able to read data that was 'flushed' before the reader
opened the file
> However, this feature is not yet implemented.  Note that the operation 'flushed' is now
called "sync".

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
