hadoop-common-user mailing list archives

From Brian Long <br...@dotspots.com>
Subject How to flush SequenceFile.Writer?
Date Thu, 29 Jan 2009 23:17:43 GMT
I have a SequenceFile.Writer that I obtained via SequenceFile.createWriter
and write to using append(key, value). Because the write volume is low,
it's not uncommon for it to take over a day for my appends to finally be
flushed to HDFS (i.e. the new file will sit at 0 bytes for over a day).
Because I run map/reduce jobs on this data multiple times a day, I
want to "flush" the sequence file so the mapred jobs can pick it up when
they run.
What's the right way to do this? I'm assuming it's a fairly common use
case. Also, are writes to sequence files atomic? (That is, if I am
actively appending to a sequence file, is it always safe to read from that
same file in a mapred job?)

To be clear, I want the flushing to be time-based (controlled explicitly by
the app), not size-based. Will this create waste in HDFS somehow?
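
To make "time-based flushing" concrete, here is a minimal sketch of one way the app could control it: close the current file once an interval has elapsed and start a new one, so that readers only ever pick up closed files. It assumes, as the 0-byte observation above suggests, that data is only reliably visible once the file is closed. RecordSink and TimedRoller are hypothetical names for illustration, not Hadoop APIs; in a real setup the sink would presumably wrap a SequenceFile.Writer, with append and close delegating to it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical stand-in for a SequenceFile.Writer: append records, then close. */
interface RecordSink {
    void append(String key, String value) throws Exception;
    void close() throws Exception;
}

/** Closes the current sink and opens a fresh one once rollMillis has elapsed. */
class TimedRoller {
    private final long rollMillis;
    private final Function<Integer, RecordSink> opener; // file index -> new sink
    private RecordSink current;   // null when no file is open
    private long openedAt;        // time the current sink was opened
    private int fileIndex = 0;    // number of sinks opened so far

    TimedRoller(long rollMillis, Function<Integer, RecordSink> opener) {
        this.rollMillis = rollMillis;
        this.opener = opener;
    }

    /** Appends a record, rolling to a new sink first if the interval has passed. */
    void append(String key, String value, long nowMillis) throws Exception {
        if (current != null && nowMillis - openedAt >= rollMillis) {
            roll();
        }
        if (current == null) {
            current = opener.apply(fileIndex++);
            openedAt = nowMillis;
        }
        current.append(key, value);
    }

    /** Closes the current sink, if any; the next append opens a new one. */
    void roll() throws Exception {
        if (current != null) {
            current.close();
            current = null;
        }
    }

    int filesOpened() {
        return fileIndex;
    }
}
```

Because the write volume is low, the app could also call roll() from a timer rather than waiting for the next append. The trade-off is that a short interval produces many small files, which is the "waste" concern raised above.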

