hadoop-common-user mailing list archives

From Guillaume Perrot <gper...@ubikod.com>
Subject Read sequence files that are being written
Date Thu, 17 Jan 2013 11:36:34 GMT
Hi everyone,

I am using Hadoop 1.0.3.

I write logs to a Hadoop sequence file in HDFS. I call syncFS() after
each batch of logs, but I never close the file (except when I perform the
daily roll).

What I want to guarantee is that the file is available to readers while the
file is still being written.
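
To make the intended behaviour concrete, here is a minimal local-file
analogy of the pattern: write a batch, flush, keep the writer open, and let
an independent reader pick up the records. It is a sketch under stated
assumptions, not my actual code: it uses only java.io (no Hadoop classes),
flush() stands in for syncFS(), and the class name, method names and record
values are made up for illustration. Local filesystem visibility rules are
not the same as those of HDFS.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReadWhileWriting {

    // Appends one batch of length-prefixed records, then flushes WITHOUT closing.
    // flush() is only a local stand-in for syncFS(); it is not the HDFS call.
    static void writeBatch(DataOutputStream out, List<String> batch) throws IOException {
        for (String record : batch) {
            out.writeUTF(record);
        }
        out.flush();
    }

    // Opens a fresh input stream and returns every record currently in the file.
    static List<String> readAll(Path file) throws IOException {
        List<String> records = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            while (true) {
                try {
                    records.add(in.readUTF());
                } catch (EOFException endOfData) {
                    break; // no more records written so far
                }
            }
        }
        return records;
    }

    // Writes two batches and reads the file after each one,
    // while the writer stays open the whole time.
    static List<List<String>> demo() throws IOException {
        Path file = Files.createTempFile("logs", ".bin");
        List<List<String>> snapshots = new ArrayList<>();
        DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)));
        try {
            writeBatch(out, Arrays.asList("a", "b"));
            snapshots.add(readAll(file));
            writeBatch(out, Arrays.asList("c"));
            snapshots.add(readAll(file));
        } finally {
            out.close();
            Files.deleteIfExists(file);
        }
        return snapshots;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo()); // prints [[a, b], [a, b, c]]
    }
}
```

The second read sees the records from both batches even though the writer
was never closed in between, which is the behaviour I would like from the
sequence file reader.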

I can read the bytes of the sequence file via FSDataInputStream, but when I
call SequenceFile.Reader.next(key, val), it returns false on the very first
call.

I know the data is in the file since I can read it with FSDataInputStream
or with the cat command and I am 100% sure that syncFS() is called.

I checked the namenode and datanode logs: no errors or warnings. fsck
reports no corruption.

Why is SequenceFile.Reader unable to read a file that is still being written?
