hive-user mailing list archives

From "Connell, Chuck"
Subject RE: mapper is slower than hive' mapper
Date Wed, 01 Aug 2012 14:35:32 GMT
This is actually not surprising. Hive is essentially a compiler that targets MapReduce. It is
common for regular compilers (C, C#, Fortran) to emit faster assembly code than you would write
by hand. Compilers know the tricks of their target language.
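
One trick that may account for part of the difference is map-side partial aggregation, which
Hive enables by default through the hive.map.aggr setting: the mapper keeps a running sum per
group in a hash table and emits one record per group instead of one record per row. The sketch
below illustrates the idea in plain Java with no Hadoop dependency. The class name MapSideSum,
the tab delimiter, and the string group key are assumptions made for illustration; the column
layout (two grouping fields followed by the summed value) follows the mapper in the original
message. It is not Hive's actual implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration of map-side partial aggregation for
//   select sum(column1) from table group by column2, column3;
public class MapSideSum {
    // Running sums for each (column2, column3) group seen by this map task.
    private final Map<String, Double> partialSums = new LinkedHashMap<>();
    private final String delimiterRegex;

    public MapSideSum(String delimiter) {
        // Quote the delimiter so characters such as '|' are matched literally.
        this.delimiterRegex = Pattern.quote(delimiter);
    }

    // Plays the role of Mapper.map(): fold the row into a sum instead of emitting it.
    public void map(String line) {
        String[] fields = line.split(delimiterRegex);
        String groupKey = Integer.parseInt(fields[0]) + "\t" + fields[1];
        partialSums.merge(groupKey, Double.parseDouble(fields[2]), Double::sum);
    }

    // Plays the role of Mapper.cleanup(): one output record per group, not per row.
    public Map<String, Double> flush() {
        Map<String, Double> out = new LinkedHashMap<>(partialSums);
        partialSums.clear();
        return out;
    }

    public static void main(String[] args) {
        MapSideSum mapper = new MapSideSum("\t"); // tab delimiter is an assumption
        mapper.map("1\ta\t2.5");
        mapper.map("1\ta\t4.0");
        mapper.map("2\tb\t1.0");
        // Three input rows collapse into two output records.
        System.out.println(mapper.flush());
    }
}
```

In a real job the same effect could be approached by keeping such a map inside the Mapper and
writing it out in cleanup(), or by registering a combiner. Either way, far fewer records are
serialized and shuffled when many rows share a group.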

Chuck Connell
Nuance R&D Data Team
Burlington, MA

-----Original Message-----
From: Yue Guan
Sent: Wednesday, August 01, 2012 10:29 AM
Subject: mapper is slower than hive' mapper

Hi there,

I'm writing a MapReduce job to replace some Hive queries, and I find that my mapper is slower
than Hive's mapper. The Hive query looks like this:

select sum(column1) from table group by column2, column3;

My MapReduce program looks like this:

     public static class HiveTableMapper extends Mapper<BytesWritable, Text, MyKey, DoubleWritable> {

         public void map(BytesWritable key, Text value, Context context)
                 throws IOException, InterruptedException {
             // The delimiter argument is cut off in the archived message.
             String[] sLine = StringUtils.split(value.toString(), ...);
             // Key: (column2, column3); value: column1, summed downstream.
             context.write(new MyKey(Integer.parseInt(sLine[0]), sLine[1]),
                           new DoubleWritable(Double.parseDouble(sLine[2])));
         }
     }


I assume Hive is doing something similar. Is there any trick in Hive that speeds this up?
Thank you!

