hive-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: mapper is slower than hive' mapper
Date Wed, 01 Aug 2012 15:13:33 GMT
As mentioned, if you avoid calling new for every record, by reusing objects and possibly
using buffer objects, you may be able to match or beat Hive's speed. But in
the general case, Hive saves you time by letting you not worry about
low-level details like this.
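A minimal, Hadoop-free sketch of the object-reuse idea, under stated assumptions: MutableKey, MutableDouble and Sink are hypothetical stand-ins for MyKey, DoubleWritable and Hadoop's Context (whose write() serializes its arguments immediately, which is what makes reuse safe), and the input is assumed to use Hive's default '\001' field delimiter with no escaped delimiters inside fields. The key and value objects are allocated once per mapper and overwritten for each record, and the fields are located with indexOf rather than a general-purpose split.

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseSketch {
    // Hive's default field delimiter (Ctrl-A); escaped delimiters are assumed absent.
    static final char FIELD_DELIM = '\001';

    /** Hypothetical mutable composite key, standing in for MyKey with a setter. */
    static final class MutableKey {
        int id;
        String name;
        void set(int id, String name) { this.id = id; this.name = name; }
        @Override public String toString() { return id + "\t" + name; }
    }

    /** Hypothetical mutable value holder, playing the role of DoubleWritable. */
    static final class MutableDouble {
        double v;
        void set(double v) { this.v = v; }
    }

    /** Stand-in for Context.write: it must copy/serialize what it receives. */
    interface Sink { void write(MutableKey k, MutableDouble v); }

    // Allocated once per mapper instance, not once per record.
    private final MutableKey key = new MutableKey();
    private final MutableDouble value = new MutableDouble();

    /** Parses "a\001b\001c[\001...]" (at least three fields) and writes (a, b) -> c. */
    void map(String line, Sink out) {
        int first = line.indexOf(FIELD_DELIM);
        int second = line.indexOf(FIELD_DELIM, first + 1);
        int third = line.indexOf(FIELD_DELIM, second + 1);
        int end = third < 0 ? line.length() : third; // ignore any extra columns
        key.set(Integer.parseInt(line.substring(0, first)),
                line.substring(first + 1, second));
        value.set(Double.parseDouble(line.substring(second + 1, end)));
        out.write(key, value); // the same two objects are passed on every call
    }

    public static void main(String[] args) {
        ReuseSketch m = new ReuseSketch();
        List<String> written = new ArrayList<>();
        Sink sink = (k, v) -> written.add(k + "\t" + v.v); // serialize immediately
        m.map("1\001a\0012.5", sink);
        m.map("2\001b\0014.0", sink);
        System.out.println(written);
    }
}
```

In an actual Hadoop mapper the same pattern would mean keeping one DoubleWritable (and one MyKey with a setter, which is assumed here) as fields and calling set() before context.write(). Note that substring() still allocates; going further means working on the Text bytes directly.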

On Wed, Aug 1, 2012 at 10:35 AM, Connell, Chuck
<Chuck.Connell@nuance.com> wrote:
> This is actually not surprising. Hive is essentially a MapReduce compiler. It is common
> for regular compilers (C, C#, Fortran) to emit faster assembly code than you would write yourself.
> Compilers know the tricks of their target language.
>
> Chuck Connell
> Nuance R&D Data Team
> Burlington, MA
>
>
> -----Original Message-----
> From: Yue Guan [mailto:pipehappy@gmail.com]
> Sent: Wednesday, August 01, 2012 10:29 AM
> To: user@hive.apache.org
> Subject: mapper is slower than hive' mapper
>
> Hi there,
>
> I'm writing MapReduce jobs to replace some Hive queries, and I find that my mapper is slower than
> Hive's mapper. The Hive query looks like this:
>
> select sum(column1) from table group by column2, column3;
>
> My MapReduce program looks like this:
>
>      public static class HiveTableMapper extends Mapper<BytesWritable, Text, MyKey, DoubleWritable> {
>
>          @Override
>          public void map(BytesWritable key, Text value, Context context)
>                  throws IOException, InterruptedException {
>              // HIVE_FIELD_DELIMITER_CHAR is assumed to be defined elsewhere (Hive's default is '\001')
>              String[] sLine = StringUtils.split(value.toString(),
>                      StringUtils.ESCAPE_CHAR, HIVE_FIELD_DELIMITER_CHAR);
>              // A new MyKey and a new DoubleWritable are allocated for every input record
>              context.write(new MyKey(Integer.parseInt(sLine[0]), sLine[1]),
>                      new DoubleWritable(Double.parseDouble(sLine[2])));
>          }
>
>      }
>
> I assume Hive is doing something similar. Is there any trick in Hive to speed this up?
> Thank you!
>
> Best,
>
