hadoop-hive-dev mailing list archives

From "Zheng Shao (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-396) Hive performance benchmarks
Date Fri, 19 Jun 2009 19:16:07 GMT

    [ https://issues.apache.org/jira/browse/HIVE-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721942#action_12721942 ]

Zheng Shao commented on HIVE-396:

Q: Why is the Hive program faster than the Hadoop app for the first query?
A: This is definitely possible in a lot of situations.
In this particular case, it is mainly because Hive's implementation of LIKE operates on Text, while
the Hadoop app's implementation used String.find(). We used the Hadoop code from the
SIGMOD 2009 paper so that we have a consistent comparison.
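
To illustrate why matching on Text can be cheaper than matching on String, here is a minimal sketch in plain Java. It is not Hive's LIKE implementation or the benchmark's Hadoop code; the class and method names (ByteSubstringSearch, indexOf, containsViaString) are hypothetical. It assumes rows arrive as UTF-8 bytes in a buffer, which is how Text stores its data, and compares searching those bytes directly against decoding every row into a new String first.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not taken from Hive or Hadoop.
final class ByteSubstringSearch {

    /**
     * Returns the byte offset of the first occurrence of pattern within
     * data[0..length), or -1 if it does not occur. No objects are allocated.
     */
    static int indexOf(byte[] data, int length, byte[] pattern) {
        if (pattern.length == 0) {
            return 0;
        }
        for (int i = 0; i + pattern.length <= length; i++) {
            int j = 0;
            while (j < pattern.length && data[i + j] == pattern[j]) {
                j++;
            }
            if (j == pattern.length) {
                return i;
            }
        }
        return -1;
    }

    /** Baseline: decode the row into a new String for every call, then search. */
    static boolean containsViaString(byte[] data, int length, String pattern) {
        String decoded = new String(data, 0, length, StandardCharsets.UTF_8);
        return decoded.contains(pattern);
    }

    public static void main(String[] args) {
        byte[] row = "visit http://example.com/page".getBytes(StandardCharsets.UTF_8);
        byte[] pattern = "example".getBytes(StandardCharsets.UTF_8);
        System.out.println(indexOf(row, row.length, pattern) >= 0);
        System.out.println(containsViaString(row, row.length, "example"));
    }
}
```

Both methods agree on whether the pattern is present. The byte-level search is safe for UTF-8 because a valid encoded pattern can only match at character boundaries, and it avoids one decode and one String allocation per row.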
While it's possible to improve the Hadoop code in this particular case, in other cases
it's very hard to apply the same optimization to each and every Hadoop application. For example,
the map-side join (HIVE-195) provides much better efficiency for joining a very small table
with any other table, without using a reducer. Another case is that the object model in Hive is different
from Hadoop's: we reuse the same object across different rows. Details of this are in the org.apache.hadoop.hive.serde
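
The object-reuse idea mentioned above can be sketched in plain Java as follows. This is an illustration under assumptions, not code from org.apache.hadoop.hive.serde: the Row holder, the tab-separated "key<TAB>number" input format, and the method names are all hypothetical. One mutable Row is allocated once and overwritten for each input line instead of creating a new object per row.

```java
import java.util.List;

// Hypothetical illustration only; not taken from Hive's serde package.
final class ReusedRowDemo {

    /** Mutable row holder that is overwritten for every input line. */
    static final class Row {
        final StringBuilder key = new StringBuilder();
        long value;

        /** Parses "key<TAB>number" into this object, reusing its buffers. */
        void set(String line) {
            int tab = line.indexOf('\t');
            key.setLength(0);           // clear the old key, keep the buffer
            key.append(line, 0, tab);   // copy the new key characters
            // Parse the digits in place (Java 9+), without a substring.
            value = Long.parseLong(line, tab + 1, line.length(), 10);
        }
    }

    /** Sums the value column while allocating a single Row for all lines. */
    static long sumWithReuse(List<String> lines) {
        Row row = new Row();
        long total = 0;
        for (String line : lines) {
            row.set(line);
            total += row.value;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumWithReuse(List.of("a\t1", "b\t2", "c\t3")));
    }
}
```

The trade-off of this pattern is aliasing: any code that keeps a reference to the Row sees it change when the next line is parsed, so values that must outlive the current row have to be copied first.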

> Hive performance benchmarks
> ---------------------------
>                 Key: HIVE-396
>                 URL: https://issues.apache.org/jira/browse/HIVE-396
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Zheng Shao
>         Attachments: hive_benchmark_2009-06-18.pdf, hive_benchmark_2009-06-18.tar.gz
> We need some performance benchmark to measure and track the performance improvements of Hive.
> Some references:
> PIG performance benchmarks PIG-200
> PigMix: http://wiki.apache.org/pig/PigMix

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
