hadoop-common-user mailing list archives

From Saptarshi Guha <saptarshi.g...@gmail.com>
Subject Textoutputformat not outputting all keys in Hadoop 0.20?
Date Sat, 05 Sep 2009 19:22:57 GMT
I'm using TextOutputFormat from mapreduce/lib/output with Hadoop
0.20, and it appears not to write all the keys to the output file
even though the write method in the RecordWriter is receiving them.
Let me explain.

1) I copied TextOutputFormat unchanged, save for some added debugging print messages:

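     public synchronized void write(K key, V value)
       throws IOException {

       // debugging print of the received key and value goes here
       boolean nullKey = key == null || key instanceof NullWritable;
       boolean nullValue = value == null || value instanceof NullWritable;
       if (nullKey && nullValue) { return; }
       if (!nullKey) { writeObject(key); }
       if (!(nullKey || nullValue)) { out.write(keyValueSeparator); }
       if (!nullValue) { writeObject(value); }
       out.write(newline);
     }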
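
One way to rule out the branching logic itself is to exercise it outside Hadoop. The following is only a rough sketch under stated assumptions: LineWriterSketch and linesWritten are hypothetical names, plain Objects with toString() stand in for Writable/Text, null stands in for NullWritable, and an in-memory BufferedOutputStream stands in for the real output stream. The closeWriter flag also shows that records handed to write can still be absent from the sink if the stream is never flushed or closed.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the record writer; this is not the Hadoop class.
final class LineWriterSketch {
    private static final byte[] SEPARATOR = "\t".getBytes(StandardCharsets.UTF_8);
    private static final byte[] NEWLINE = "\n".getBytes(StandardCharsets.UTF_8);

    private final DataOutputStream out;

    LineWriterSketch(DataOutputStream out) {
        this.out = out;
    }

    // Same branching as the write method above; null plays the role of NullWritable.
    synchronized void write(Object key, Object value) throws IOException {
        boolean nullKey = key == null;
        boolean nullValue = value == null;
        if (nullKey && nullValue) {
            return;
        }
        if (!nullKey) {
            out.write(key.toString().getBytes(StandardCharsets.UTF_8));
        }
        if (!(nullKey || nullValue)) {
            out.write(SEPARATOR);
        }
        if (!nullValue) {
            out.write(value.toString().getBytes(StandardCharsets.UTF_8));
        }
        out.write(NEWLINE);
    }

    // Closing the DataOutputStream flushes the buffered bytes to the sink.
    synchronized void close() throws IOException {
        out.close();
    }

    // Writes the 52 upper/lower case letters as keys and returns how many
    // complete lines actually reached the in-memory sink.
    static int linesWritten(boolean closeWriter) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            LineWriterSketch writer = new LineWriterSketch(
                new DataOutputStream(new BufferedOutputStream(sink)));
            for (char c = 'a'; c <= 'z'; c++) {
                writer.write(String.valueOf(c), 1);
                writer.write(String.valueOf(Character.toUpperCase(c)), 1);
            }
            if (closeWriter) {
                writer.close();
            }
            int lines = 0;
            for (byte b : sink.toByteArray()) {
                if (b == '\n') {
                    lines++;
                }
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch, linesWritten(true) sees all 52 lines, while linesWritten(false) sees none, because the few hundred bytes written never leave the buffer. Whether anything similar happens inside the real job is not established here.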


I expect 52 keys, corresponding to the upper- and lower-case letters of the
alphabet. I usually get fewer than 52 keys in the output folder: sometimes
44, sometimes another count, and once all 52.
/However/, the write method above does receive the missing K,V pairs,
as evidenced by the log file messages, i.e. I see Key=(missing key) in
the logs. Hence, for some reason, either a) it is not writing, b) it is
writing but not flushing/committing, or c) the temporary outputs are
getting deleted.
Also, if a given reducer has received e.g. 5 keys, I see log messages for
all 5 keys, of which a few (but not all) are missing from the output.
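
To pin down exactly which keys are absent from a given output file, a small checker along these lines could help. It is a sketch only: MissingKeys and its methods are hypothetical names, it assumes the keys are the single letters a-z and A-Z, and it assumes each output line is the key followed by a tab (the default TextOutputFormat separator) and the value.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Hadoop.
final class MissingKeys {
    // Assumed key set: the 52 upper/lower case letters of the alphabet.
    static Set<String> expectedKeys() {
        Set<String> keys = new TreeSet<>();
        for (char c = 'a'; c <= 'z'; c++) {
            keys.add(String.valueOf(c));
            keys.add(String.valueOf(Character.toUpperCase(c)));
        }
        return keys;
    }

    // Returns the expected keys that never appear as the first
    // tab-separated field of any line in the given output.
    static Set<String> missingFrom(List<String> outputLines) {
        Set<String> missing = expectedKeys();
        for (String line : outputLines) {
            int tab = line.indexOf('\t');
            missing.remove(tab < 0 ? line : line.substring(0, tab));
        }
        return missing;
    }
}
```

Feeding it the lines of each reducer's output file (read, for example, with java.nio.file.Files.readAllLines) and comparing the result against the keys seen in that reducer's log would show whether the losses are tied to particular keys or vary from run to run.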

SequenceFileOutputFormat does not have the same issue (all 52 keys are present).

Any ideas? Is this my bug?
Kind Regards

Version: 0.20.0, r763504
Compiled: Thu Apr 9 05:18:40 UTC 2009 by ndaley
Identifier: 200908281653

Saptarshi Guha | saptarshi.guha@gmail.com | http://www.stat.purdue.edu/~sguha
Kindness is a language which the deaf can hear and the blind can read.
		-- Mark Twain
