hive-user mailing list archives

From Yue Guan <pipeha...@gmail.com>
Subject mapper is slower than Hive's mapper
Date Wed, 01 Aug 2012 14:28:55 GMT
Hi there,

I'm writing MapReduce jobs to replace some Hive queries, and I find that my
mapper is slower than Hive's mapper. The Hive query looks like this:

select sum(column1) from table group by column2, column3;

My MapReduce program looks like this:

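     // Imports, at the top of the enclosing file:
     import java.io.IOException;
     import org.apache.hadoop.io.BytesWritable;
     import org.apache.hadoop.io.DoubleWritable;
     import org.apache.hadoop.io.Text;
     import org.apache.hadoop.mapreduce.Mapper;
     import org.apache.hadoop.util.StringUtils;

     // MyKey and HIVE_FIELD_DELIMITER_CHAR are defined elsewhere in my program.
     public static class HiveTableMapper
             extends Mapper<BytesWritable, Text, MyKey, DoubleWritable> {

         @Override
         public void map(BytesWritable key, Text value, Context context)
                 throws IOException, InterruptedException {
             // Split the row on the field delimiter, honouring escaped delimiters.
             String[] sLine = StringUtils.split(value.toString(),
                     StringUtils.ESCAPE_CHAR, HIVE_FIELD_DELIMITER_CHAR);
             // Group key from the first two fields; the value to sum is the third.
             context.write(new MyKey(Integer.parseInt(sLine[0]), sLine[1]),
                     new DoubleWritable(Double.parseDouble(sLine[2])));
         }
     }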
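If the rows can never contain an escaped delimiter (an assumption worth checking against the data), one cheap experiment is to replace `StringUtils.split`, which as far as I can tell copies each field out one character at a time so it can honour the escape character, with a plain `indexOf` scan that also stops after the fields the mapper actually reads. The sketch below is stdlib-only Java; the class `FastFieldSplitter` and method `splitFields` are hypothetical names, and it assumes Hive's default field delimiter `\001` and the same field positions as the mapper above.

```java
import java.util.Arrays;

// Hypothetical helper: splits a Hive text row on a single-character delimiter
// WITHOUT escape handling. Only safe if fields never contain an escaped delimiter.
class FastFieldSplitter {

    // Assumption: the table uses Hive's default field delimiter, Ctrl-A (\001).
    static final char HIVE_FIELD_DELIMITER_CHAR = '\001';

    // Returns at most maxFields leading fields of line; the rest of the row is
    // never scanned or copied.
    static String[] splitFields(String line, char delimiter, int maxFields) {
        String[] fields = new String[maxFields];
        int start = 0;
        int count = 0;
        while (count < maxFields) {
            int end = line.indexOf(delimiter, start);
            if (end < 0) {
                // No more delimiters: the last field runs to the end of the line.
                fields[count++] = line.substring(start);
                break;
            }
            fields[count++] = line.substring(start, end);
            start = end + 1;
        }
        // Trim the array if the row had fewer fields than requested.
        return count == maxFields ? fields : Arrays.copyOf(fields, count);
    }

    public static void main(String[] args) {
        String row = "42" + HIVE_FIELD_DELIMITER_CHAR + "abc"
                + HIVE_FIELD_DELIMITER_CHAR + "1.5";
        String[] f = splitFields(row, HIVE_FIELD_DELIMITER_CHAR, 3);
        int groupPart1 = Integer.parseInt(f[0]);
        String groupPart2 = f[1];
        double value = Double.parseDouble(f[2]);
        System.out.println(groupPart1 + " " + groupPart2 + " " + value);
    }
}
```

In the mapper, `splitFields(value.toString(), HIVE_FIELD_DELIMITER_CHAR, 3)` would stand in for the `StringUtils.split` call; how much it helps depends on row width and is worth measuring rather than assuming.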

I assume Hive is doing something similar. Is there some trick Hive uses to
speed this up? Thank you!
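One difference worth checking: with `hive.map.aggr` enabled (Hive's default), Hive performs partial hash aggregation inside the map task for GROUP BY queries, so a mapper emits roughly one record per distinct (column2, column3) key it has seen rather than one record per input row, which cuts serialization, spill and shuffle work. The mapper above emits every row. A common way to get a similar effect in hand-written MapReduce is in-mapper combining: accumulate sums in a `HashMap` and emit them at the end, flushing early if the table grows too large. The plain-Java sketch below models only that aggregation logic; the class name, the tab-joined composite key and the flush threshold are hypothetical, and in a real `Mapper` the emitter would wrap `context.write(...)`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Sketch of in-mapper combining (map-side partial aggregation) in plain Java.
// Hypothetical wiring: add() would be called from map(), flush() from cleanup(),
// and the emitter would wrap context.write(...).
class InMapperSumAggregator {

    private final Map<String, Double> partialSums = new HashMap<>();
    private final int maxEntries;                       // hypothetical flush threshold
    private final BiConsumer<String, Double> emitter;

    InMapperSumAggregator(int maxEntries, BiConsumer<String, Double> emitter) {
        this.maxEntries = maxEntries;
        this.emitter = emitter;
    }

    // Adds one row's value to the running sum for its group key.
    void add(int key1, String key2, double value) {
        // Hypothetical composite key; a real job would keep using MyKey.
        String groupKey = key1 + "\t" + key2;
        partialSums.merge(groupKey, value, Double::sum);
        if (partialSums.size() >= maxEntries) {
            flush(); // bound memory by emitting partial sums early
        }
    }

    // Emits every partial sum and clears the table. The reducer must still add
    // up partial sums for the same key coming from different flushes or mappers.
    void flush() {
        partialSums.forEach(emitter);
        partialSums.clear();
    }

    public static void main(String[] args) {
        Map<String, Double> totals = new HashMap<>();
        int[] recordsEmitted = {0};
        InMapperSumAggregator agg = new InMapperSumAggregator(1000, (k, v) -> {
            recordsEmitted[0]++;
            totals.merge(k, v, Double::sum); // what the reducer would do
        });
        agg.add(1, "a", 1.5);
        agg.add(1, "a", 2.5);
        agg.add(2, "b", 4.0);
        agg.flush();
        System.out.println(recordsEmitted[0] + " records emitted for 3 rows: " + totals);
    }
}
```

Hadoop's combiner (`Job.setCombinerClass`) is the other usual option, but it runs on map output that has already been serialized into the sort buffer, so in-mapper combining typically saves more work at the cost of managing memory in the mapper.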

Best,

