hive-user mailing list archives

From Yue Guan <pipeha...@gmail.com>
Subject Re: mapper is slower than hive' mapper
Date Wed, 01 Aug 2012 15:11:00 GMT
Hive doesn't use Writable?! Could you please give me a pointer to the Hive
code so I can see how it does the job?

I checked the map output records and found this:
my case:
total mapper input records: 23091348
total mapper output records: 23091348
avg mapper output bytes/record: 34.819994
total combiner output records: 27298
hive:
total mapper input records: 23091348
total mapper output records: 13164
avg mapper output bytes/record: 36.199407
total combiner output records: 0

Does Hive actually do the reduce in the mapper? How does that work?
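
For reference, the counters above (13164 map output records and no combiner
output) would be consistent with the mapper keeping partial sums in a hash
table and emitting one record per group rather than one per input row. As far
as I know, Hive calls this map-side aggregation and controls it with the
hive.map.aggr setting. Below is a minimal sketch of the idea in plain Java,
not Hive's actual code. The class name MapSideAggregator, the maxEntries
threshold, and the emitted list (a stand-in for context.write()) are all
hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical stand-in for a mapper that pre-aggregates sum(column1)
 * per (column2, column3) group before anything reaches the shuffle.
 */
public class MapSideAggregator {
    // Partial sums keyed by the group-by columns. Each double[1] is a
    // mutable cell, so no new object is allocated per input record.
    private final Map<String, double[]> partialSums = new HashMap<>();
    // Assumed memory bound: flush when this many distinct keys are held.
    private final int maxEntries;
    // Stand-in for context.write(): the records that would be shuffled.
    private final List<String> emitted = new ArrayList<>();

    public MapSideAggregator(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    /** Called once per input record, in the role of Mapper.map(). */
    public void map(String groupKey, double value) {
        double[] cell = partialSums.get(groupKey);
        if (cell == null) {
            if (partialSums.size() >= maxEntries) {
                flush(); // table is full: emit partial sums and start over
            }
            cell = new double[1];
            partialSums.put(groupKey, cell);
        }
        cell[0] += value;
    }

    /** Called at end of input, in the role of Mapper.cleanup(). */
    public void flush() {
        for (Map.Entry<String, double[]> e : partialSums.entrySet()) {
            emitted.add(e.getKey() + "\t" + e.getValue()[0]);
        }
        partialSums.clear();
    }

    public List<String> emitted() {
        return emitted;
    }

    public static void main(String[] args) {
        MapSideAggregator agg = new MapSideAggregator(100_000);
        agg.map("1,a", 2.0);
        agg.map("1,a", 3.0);
        agg.map("2,b", 1.5);
        agg.flush();
        // Prints: 2 output records for 3 input records
        System.out.println(agg.emitted().size() + " output records for 3 input records");
    }
}
```

In a real Hadoop job the same pattern would presumably live in map() and
cleanup(), with the reducer still summing the partial results, since a full
table can emit the same key more than once.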



On 08/01/2012 10:41 AM, Bertrand Dechoux wrote:
> One hint would be to reduce the number of Writable instances you need.
> Create the object once and reuse it.
> By the way, Hive does not use Writable. ;)
>
> Bertrand
>
> On Wed, Aug 1, 2012 at 4:35 PM, Connell, Chuck
> <Chuck.Connell@nuance.com> wrote:
>
>     This is actually not surprising. Hive is essentially a MapReduce
>     compiler. It is common for regular compilers (C, C#, Fortran) to
>     emit faster assembler code than you write yourself. Compilers know
>     the tricks of their target language.
>
>     Chuck Connell
>     Nuance R&D Data Team
>     Burlington, MA
>
>
>     -----Original Message-----
>     From: Yue Guan [mailto:pipehappy@gmail.com]
>     Sent: Wednesday, August 01, 2012 10:29 AM
>     To: user@hive.apache.org
>     Subject: mapper is slower than hive' mapper
>
>     Hi, there
>
>     I'm writing a MapReduce job to replace some Hive queries, and I find
>     that my mapper is slower than Hive's mapper. The Hive query is like:
>
>     select sum(column1) from table group by column2, column3;
>
>     My MapReduce program looks like this:
>
>          public static class HiveTableMapper
>                  extends Mapper<BytesWritable, Text, MyKey, DoubleWritable> {
>
>              public void map(BytesWritable key, Text value, Context context)
>                      throws IOException, InterruptedException {
>                  String[] sLine = StringUtils.split(value.toString(),
>                          StringUtils.ESCAPE_CHAR, HIVE_FIELD_DELIMITER_CHAR);
>                  context.write(new MyKey(Integer.parseInt(sLine[0]), sLine[1]),
>                          new DoubleWritable(Double.parseDouble(sLine[2])));
>              }
>
>          }
>
>     I assume Hive is doing something similar. Is there any trick in
>     Hive to speed this up? Thank you!
>
>     Best,
>
>
>
>
> -- 
> Bertrand Dechoux

