hadoop-common-user mailing list archives

From "Joydeep Sen Sarma" <jssa...@facebook.com>
Subject problems reading compressed sequencefiles in streaming (0.13.1)
Date Fri, 26 Oct 2007 07:29:58 GMT
I was hoping to use -inputformat SequenceFileAsTextInputFormat to process compressed SequenceFiles
in streaming jobs.


However, with a Python mapper that simply echoes each line it receives, and numreducetasks=0,
here is what the streaming job output looks like:


SEQ^F org.apache.hadoop.io.IntWritable^Yorg.apache.hadoop.io.Text^A^A'org.apache.hadoop.io.compress.GzipCodec^@^@^@^@Z+r������^F�
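
For reference, an echo mapper of the kind mentioned above can be as small as the sketch below. This is a minimal illustration, not necessarily the exact script used for this job; the function name `echo` is made up for the example.

```python
#!/usr/bin/env python
import sys


def echo(lines, out):
    """Copy every input line to the output unchanged (identity mapper)."""
    for line in lines:
        out.write(line)


if __name__ == "__main__":
    # Streaming feeds records on stdin and collects whatever is written to stdout.
    echo(sys.stdin, sys.stdout)
```

Since the mapper passes its input through untouched, whatever shows up in the job output is exactly what the input format handed to it.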


So it seems the input file was not treated as a SequenceFile; the mapper received the raw file header instead of decoded records.
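
The reason the output reads as a raw header: a SequenceFile starts with the three bytes SEQ followed by a one-byte format version, and the ^F above is byte 0x06. A small sketch of that check follows; the helper name `sequencefile_version` is hypothetical, and it only inspects the first four bytes rather than parsing the rest of the header.

```python
SEQ_MAGIC = b"SEQ"  # leading magic bytes of a SequenceFile


def sequencefile_version(data):
    """Return the version byte if data starts with the SequenceFile magic, else None."""
    if len(data) >= 4 and data[:3] == SEQ_MAGIC:
        return data[3]  # an int in Python 3
    return None
```

Running this over the first bytes a mapper receives would distinguish undecoded SequenceFile contents from ordinary text records.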


I must be missing some arguments, but I don't understand which ones. Help appreciated.