hive-dev mailing list archives

From "Yuntao Jia (JIRA)" <>
Subject [jira] Updated: (HIVE-600) Running TPC-H queries on Hive
Date Tue, 11 Aug 2009 22:52:14 GMT


Yuntao Jia updated HIVE-600:

    Attachment: TPC-H_on_Hive_2009-08-11.tar.gz

Attached is the report on running the TPC-H benchmark on Hive, together with the package
needed to reproduce the benchmark.

Please note that, due to limited time, we considered only the twenty-two queries in the TPC-H
benchmark and not the two refresh functions. The refresh functions will be part of future work.

Hive supports all the TPC-H queries, although some of them must be rewritten. We also set up
Hive on an eleven-node cluster and ran the benchmark. In this particular configuration, the
Price/Performance metric of the Hive system is 84.34.
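
For readers unfamiliar with the metric, the following is a minimal sketch of how the TPC-H
specification defines Price/Performance: the total system price divided by the composite
QphH@Size figure, which is the geometric mean of the Power and Throughput results. The
function names and all input numbers below are hypothetical and chosen only to make the
arithmetic easy to follow; they are not taken from the attached report, and this message
does not state how the 84.34 figure was derived.

```python
import math


def qphh_at_size(power_at_size: float, throughput_at_size: float) -> float:
    """Composite queries-per-hour metric: geometric mean of Power and Throughput."""
    return math.sqrt(power_at_size * throughput_at_size)


def price_performance(total_system_price: float, qphh: float) -> float:
    """Price/Performance: total system price divided by QphH@Size."""
    return total_system_price / qphh


# Hypothetical inputs, for illustration only (not from the report).
composite = qphh_at_size(power_at_size=100.0, throughput_at_size=400.0)
ratio = price_performance(total_system_price=10000.0, qphh=composite)
print(composite, ratio)
```

A lower Price/Performance value means less money spent per unit of query throughput.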

Please see the official TPC-H benchmark specification for the details of the benchmark and


> Running TPC-H queries on Hive
> -----------------------------
>                 Key: HIVE-600
>                 URL:
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Yuntao Jia
>            Assignee: Yuntao Jia
>         Attachments: TPC-H_on_Hive_2009-08-11.pdf, TPC-H_on_Hive_2009-08-11.tar.gz
> The goal is to run all TPC-H benchmark queries on Hive for two reasons. First, through
> those queries, we would like to find the new features that we need to put into Hive so
> that Hive supports common SQL queries. Second, we would like to measure the performance
> of Hive to find out what Hive is not good at. We can then improve Hive based on that
> information.
> For queries that are not currently supported in Hive, I will try to rewrite them as one
> or more Hive-supported queries.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
