hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Naga Vijay <nvti...@gmail.com>
Subject Re: Enhancing Query Join to speed up Query
Date Fri, 14 Jun 2013 04:35:17 GMT
Hi,

Here are the test results for the two query options shown in my initial
email -

Test Criteria - Both tables have 20 million rows each, 20 partitions, 16
buckets for item_id

Query Option 1 - Took 25 minutes, 5 seconds
Query Option 2 - Took 24 minutes, 42 seconds

Thanks for all the responses!

Naga


On Thu, Jun 13, 2013 at 6:42 PM, Navis류승우 <navis.ryu@nexr.com> wrote:

> You can use "explain" for confirming differences. For inner joins, it
> would make the same plan.
>
> 2013/6/14 Igor Tatarinov <igor@decide.com>:
> > I would expect no difference because of predicate pushdown.
> >
> > igor
> > decide.com
> >
> >
> > On Thu, Jun 13, 2013 at 11:31 AM, Naga Vijay <nvtibco@gmail.com> wrote:
> >>
> >> Sure, Will do
> >>
> >>
> >> On Thu, Jun 13, 2013 at 10:42 AM, Stephen Sprague <spragues@gmail.com>
> >> wrote:
> >>>
> >>> Hi naja,
> >>> test those two versions (or three now) and report back to the group.
>  :)
> >>> even if some smarty-pants thinks he knows the answer its always good to
> >>> confirm things are as they should be.
> >>>
> >>>
> >>> On Wed, Jun 12, 2013 at 11:54 PM, Sanjay Subramanian
> >>> <Sanjay.Subramanian@wizecommerce.com> wrote:
> >>>>
> >>>> Hi
> >>>>
> >>>> I would actually do it like this…so that the set on the left of JOIN
> >>>> becomes smaller
> >>>>
> >>>> SELECT a.item_id, a.create_dt
> >>>> FROM
> >>>>      ( SELECT
> >>>>                     item_id, create_dt
> >>>>       FROM
> >>>>                     A
> >>>>       WHERE
> >>>>                    item_id = 'I001'
> >>>>        AND
> >>>>                   category_name = 'C001'
> >>>>       )  a
> >>>> JOIN
> >>>>          b
> >>>> ON
> >>>>         a.item_id = b.item_id
> >>>> ;
> >>>>
> >>>>
> >>>> From: Naga Vijay <nvtibco@gmail.com>
> >>>> Reply-To: "user@hive.apache.org" <user@hive.apache.org>
> >>>> Date: Wednesday, June 12, 2013 9:17 PM
> >>>> To: "user@hive.apache.org" <user@hive.apache.org>
> >>>> Subject: Enhancing Query Join to speed up Query
> >>>>
> >>>> Hi,
> >>>>
> >>>> Which of the two query options is better?
> >>>>
> >>>> SELECT a.item_id, a.create_dt
> >>>> FROM   a JOIN b
> >>>> ON     (a.item_id = b.item_id)
> >>>> WHERE  a.item_id = 'I001'
> >>>> AND    a.category_name = 'C001';
> >>>>
> >>>> - or -
> >>>>
> >>>> SELECT a.item_id, a.create_dt
> >>>> FROM   a JOIN b
> >>>> ON     (a.item_id = b.item_id AND a.item_id = 'I001')
> >>>> WHERE  a.category_name = 'C001';
> >>>>
> >>>> Thanks
> >>>> Naga
> >>>>
> >>>> CONFIDENTIALITY NOTICE
> >>>> ======================
> >>>> This email message and any attachments are for the exclusive use of
> the
> >>>> intended recipient(s) and may contain confidential and privileged
> >>>> information. Any unauthorized review, use, disclosure or distribution
> is
> >>>> prohibited. If you are not the intended recipient, please contact the
> sender
> >>>> by reply email and destroy all copies of the original message along
> with any
> >>>> attachments, from your computer system. If you are the intended
> recipient,
> >>>> please be advised that the content of this message is subject to
> access,
> >>>> review and disclosure by the sender's Email System Administrator.
> >>>
> >>>
> >>
> >
>

Mime
View raw message