hadoop-mapreduce-user mailing list archives

From Alan Miller <someb...@squareplanet.de>
Subject Re: SequenceFile as map input
Date Thu, 08 Jul 2010 20:59:31 GMT
Hi Alex,

I'm not sure what you mean. I already set my mapper's signature to:
   public class MyMapper extends Mapper<Object, BytesWritable, Text, Text> {
      ...
      public void map(Text key, BytesWritable value, Context context)
            throws IOException, InterruptedException {
         ...
      }
   }
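A side note on the snippet above, if it reflects the real code: the class is declared `Mapper<Object, BytesWritable, Text, Text>`, but `map()` takes a `Text` key. In Java that is an overload, not an override. Hadoop's `Mapper.run()` calls `map()` for each record through the base-class signature, so the default (identity) `map()` would run instead of the custom one. The plain-Java sketch below reproduces the effect. `BaseMapper`, `ObjectKeyMapper` and `TextKeyMapper` are hypothetical stand-ins, not Hadoop code, and `String` stands in for `Text`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Hadoop's Mapper<KEYIN, VALUEIN, ...>: run() calls
// map() through the base type, much as Mapper.run() does for each record.
abstract class BaseMapper<K, V> {
    final List<String> calls = new ArrayList<>();

    // Default behaviour, analogous to the identity map() in the base class.
    void map(K key, V value) {
        calls.add("base map (identity)");
    }

    void run(K key, V value) {
        map(key, value);
    }
}

// Mirrors the snippet: the key type parameter is Object, but map() takes a
// String (standing in for Text). This is an overload, so run() never calls it.
class ObjectKeyMapper extends BaseMapper<Object, byte[]> {
    void map(String key, byte[] value) {
        calls.add("custom map");
    }
}

// The key type parameter matches map()'s key type, so this is a real
// override; @Override makes the compiler verify that.
class TextKeyMapper extends BaseMapper<String, byte[]> {
    @Override
    void map(String key, byte[] value) {
        calls.add("custom map");
    }
}
```

Declaring the mapper as `Mapper<Text, BytesWritable, Text, Text>` and marking `map()` with `@Override` would make any such mismatch a compile-time error rather than a silent fallback.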

In my map() loop the value holds the text of the original file, but
value.toString() returns the bytes as hex pairs separated by spaces.
What I'd like is the original tab-separated lists of strings (i.e. the
lines in my original files).

I see BytesWritable.getBytes() returns a byte[]. I guess I could write
my own RecordReader to convert the byte[] back to text strings, but I
thought this was something the framework would provide.
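One way to get the text back without a custom RecordReader is to decode the value's bytes inside `map()`. A caveat: `BytesWritable.getBytes()` returns the backing buffer, which can be longer than the data, so only the first `getLength()` bytes are valid. The sketch below assumes the log files were stored as UTF-8 (or plain ASCII) with `\n` or `\r\n` line endings. `LogValueDecoder` and its methods are hypothetical helpers written against a plain `byte[]` so they run without Hadoop on the classpath.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns the raw bytes of one log file (as stored in the
// BytesWritable value) back into lines and tab-separated fields.
// In a mapper it would be called roughly as:
//   LogValueDecoder.toLines(value.getBytes(), value.getLength())
final class LogValueDecoder {
    private LogValueDecoder() {}

    // Only the first 'length' bytes are valid; the rest of the buffer may be padding.
    static String toText(byte[] buffer, int length) {
        return new String(buffer, 0, length, StandardCharsets.UTF_8);
    }

    // Splits the decoded file into lines, accepting \n or \r\n endings
    // and skipping empty lines.
    static List<String> toLines(byte[] buffer, int length) {
        List<String> lines = new ArrayList<>();
        for (String line : toText(buffer, length).split("\r?\n")) {
            if (!line.isEmpty()) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Splits one line on tabs; the -1 limit keeps trailing empty fields.
    static String[] toFields(String line) {
        return line.split("\t", -1);
    }
}
```

Decoding inside `map()` keeps one whole file per record, which matches how the SequenceFile was written. The charset is an assumption; if the logs use another encoding, the `StandardCharsets` constant is the only thing to change.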

Alan

On 07/08/2010 08:42 PM, Alex Loddengaard wrote:
> Hi Alan,
>
> SequenceFiles keep track of the key and value types, so you should be
> able to use the Writables in the signature. That said, it looks like
> you're using the new API, and I admit I'm not an expert with it.
> Have you tried using the Writables in the signature?
>
> Alex
>
> On Thu, Jul 8, 2010 at 6:44 AM, Some Body <somebody@squareplanet.de> wrote:
>
>     To get around the small-files problem (I have thousands of 2 MB log
>     files), I wrote a class to convert all my log files into a single
>     SequenceFile in (Text key, BytesWritable value) format. That works
>     fine. I can run this:
>
>        hadoop fs -text /my.seq | grep peemt114.log | head -1
>        10/07/08 15:02:10 INFO util.NativeCodeLoader: Loaded the native-hadoop library
>        10/07/08 15:02:10 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
>        10/07/08 15:02:10 INFO compress.CodecPool: Got brand-new decompressor
>        peemt114.log    70 65 65 6d 74 31 31 34 09 .........[snip].......
>
>     which shows the file-name key (peemt114.log) and the file-contents
>     value, which appears to be printed as hex. The hex values up to the
>     first tab (09) translate to my hostname.
>
>     I'm trying to adapt my mapper to use the SequenceFile as input.
>
>     I changed the job's inputFormatClass to:
>        MyJob.setInputFormatClass(SequenceFileInputFormat.class);
>     and modified my mapper signature to:
>       public class MyMapper extends Mapper<Object, BytesWritable, Text, Text> {
>
>     But how do I convert the value back to Text? When I print out the
>     key/value pairs using:
>            System.out.printf("MAPPER INKEY: [%s]\n", key);
>            System.out.printf("MAPPER INVAL: [%s]\n", value.toString());
>     I get:
>        MAPPER INKEY: [peemt114.log]
>        MAPPER INVAL: [70 65 65 6d 74 31 31 34 09 .....[snip]......]
>
>     Alan
>
>
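The hostname reading of the quoted `hadoop fs -text` output can be checked by decoding the hex pairs. The sketch below is a hypothetical standard-library helper, not part of Hadoop, that turns a space-separated hex dump back into characters. It assumes one byte per character (ASCII), which holds for the hostname shown.

```java
// Hypothetical helper: decodes a space-separated hex dump such as
// "70 65 65 6d" back into a string, assuming single-byte (ASCII) characters.
final class HexDump {
    private HexDump() {}

    static String decode(String dump) {
        String trimmed = dump.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder();
        for (String pair : trimmed.split("\\s+")) {
            // Each pair is one byte written as two hex digits.
            out.append((char) Integer.parseInt(pair, 16));
        }
        return out.toString();
    }
}
```

For example, `HexDump.decode("70 65 65 6d 74 31 31 34 09")` returns `"peemt114\t"`: the hostname followed by the tab (0x09) that separates the first field.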

