hadoop-hdfs-dev mailing list archives

From Hemant Bhanawat <hema...@vmware.com>
Subject Partially written SequenceFile
Date Thu, 04 Jul 2013 09:39:56 GMT


I am working with the 2.0.2-alpha version of Hadoop and writing key-value pairs to a
SequenceFile on HDFS. I flush my data regularly using hsync() because the process that
writes to the file can terminate abruptly. My requirement is that once hsync() succeeds,
all data written before that hsync() remains available.

To verify this, I ran a test that killed the process writing to a SequenceFile after it
had called hsync(). When I read the file with the "hadoop fs -cat" command, I can see the
data, but the reported size of the file is 0, and SequenceFile.Reader.next(key, value)
returns false. I read somewhere that because the file was not closed properly, its size was
not updated at the namenode, and that this is also why next() returns false.

To fix this and make the file readable through the SequenceFile APIs, I opened the file
in append mode and closed it immediately. This corrected the file size. While doing this,
I retry if I receive a RecoveryInProgressException or an AlreadyBeingCreatedException. Now
I can read the data successfully using SequenceFile.Reader. The code I am using follows.


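    // Writer side: append records, then hsync() so they survive an abrupt exit.
    writer = SequenceFile.createWriter(fs, conf, path, Text.class, Text.class, CompressionType.NONE);

    writer.append(new Text("India"), new Text("Delhi"));
    writer.append(new Text("China"), new Text("Beijing"));
    writer.hsync();
    // The process is killed here, before writer.close() is called.

*** I expect India and China to be available, but next() returns false ***

*** Code to fix the file size ***

    while (true) {
        try {
            FileSystem fs = FileSystem.get(namenodeURI, conf);
            Path path = new Path(uri);
            FSDataOutputStream out = fs.append(path);
            out.close(); // opening for append and closing right away fixes the file size
            break;
        } catch (RecoveryInProgressException e) {
            // lease recovery is still running: retry
        } catch (AlreadyBeingCreatedException e) {
            // the previous writer's lease is still held: retry
        } catch (Exception e) {
            break; // any other failure is not retried
        }
    }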

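One thing I am unsure about is that the retry loop above never pauses and never gives up. A minimal sketch of a bounded variant follows, using only the Java standard library. The names BoundedRetry, Step, maxAttempts and pauseMillis are hypothetical and not part of any Hadoop API, and the sketch treats every IOException as retryable, which is an assumption; real code would probably retry only on the two lease-related exceptions.

```java
import java.io.IOException;

/** Hypothetical helper: retries a step a bounded number of times, pausing between attempts. */
final class BoundedRetry {

    /** One attempt of the operation, for example "open the file for append, then close it". */
    interface Step {
        void run() throws IOException;
    }

    /**
     * Runs the step until it succeeds or maxAttempts attempts have failed.
     * Returns true on success, false if every attempt failed or the thread was interrupted.
     */
    static boolean run(Step step, int maxAttempts, long pauseMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                step.run();
                return true; // the step completed, stop retrying
            } catch (IOException e) {
                // Assumption: every IOException is treated as retryable in this sketch.
                if (attempt == maxAttempts) {
                    return false; // out of attempts
                }
            }
            try {
                Thread.sleep(pauseMillis); // wait before the next attempt
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
        return false; // reached only when maxAttempts is less than 1
    }
}
```

In my case the step would be the fs.append(path) call followed by close(), and the caller could log or fail when run(...) returns false instead of looping forever.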

Could you let me know whether this approach has any shortcomings, or whether better
alternatives are available?

Hemant Bhanawat 
