hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Yin Huai <huaiyin....@gmail.com>
Subject Re: UNION ALL query behaving strangely; WHERE CLAUSE is also not being honored
Date Fri, 13 Sep 2013 00:27:20 GMT
Hi,

Can you also attach the query plan (the result of EXPLAIN)? It may help to
find where is the problem.

Thanks,

Yin


On Thu, Sep 12, 2013 at 1:00 PM, Chuck Hardin <chardin@namimedia.com> wrote:

> Please bear with me, because this is a pretty large query.
>
> TL;DR:  I'm doing a UNION ALL on a bunch of subqueries.  The subqueries
> return no results, but the overall query does.  Sadly, the constraints on
> the subqueries are not being honored in the UNION ALL.
>
> I'm executing the following:
>
> SELECT lax_year,lax_month,lax_day,network_id,advertiser_id,publisher_id,
>        SUM(entries.clicks), SUM(entries.conversions),
>        SUM(entries.cost), SUM(entries.feed_calls),
>        SUM(entries.impressions), SUM(entries.revenue),
>        SUM(entries.used_listings)
> FROM
>   (
>     SELECT
>       used_listing.advertiserid AS advertiser_id,
>       -1 AS keyword,
>       used_listing.adgroupid AS adgroup_id,
>       used_listing.adid AS ad_id,
>       used_listing.feedid AS feed_id,
>       0 AS clicks,
>       used_listing.campaignid AS campaign_id,
>       concat(used_listing.publisherid, "_", used_listing.sourceid, "_",
> used_listing.subid) AS subid,
>       0 AS conversions,
>       0 AS feed_calls,
>       0 AS impressions,
>       2013 AS lax_year,
>       used_listing.publisherid AS publisher_id,
>       0 AS cost,
>       used_listing.subid AS sub_id,
>       used_listing.sourceid AS source_id,
>       used_listing.timezone AS timezone,
>       09 AS lax_month,
>       0 AS revenue,
>       used_listing.networkid AS network_id,
>       05 AS lax_day,
>       1 AS used_listings,
>       used_listing.ul_datetime AS start_timestamp,
>       -1 AS bid_id FROM used_listing
> WHERE used_listing.ul_datetime >= 1378364400
> AND   used_listing.ul_datetime < 1378450800
> AND used_listing.networkid='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'
>
>     UNION ALL
>     SELECT
>       click.click.advertiser_id AS advertiser_id,
>       click.click.keyword AS keyword,
>       click.click.adgroup_id AS adgroup_id,
>       click.click.ad_id AS ad_id,
>       click.feed_id AS feed_id,
>       1 AS clicks,
>       click.click.campaign_id AS campaign_id,
>       click.click.subid AS subid,
>       0 AS conversions,
>       0 AS feed_calls,
>       0 AS impressions,
>       2013 AS lax_year,
>       click.click.publisher_id AS publisher_id,
>       click.click.publisher_cost AS cost,
>       click.click.sub_id AS sub_id,
>       click.click.source_id AS source_id,
>       click.click.timezone AS timezone,
>       09 AS lax_month,
>       click.click.click_cost AS revenue,
>       click.click.network_id AS network_id,
>       05 AS lax_day,
>       0 AS used_listings,
>       click.start_timestamp AS start_timestamp,
>       click.click.bid_id AS bid_id FROM click
> WHERE click.start_timestamp >= 1378364400
> AND   click.start_timestamp < 1378450800
> AND click.click.network_id='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'
>
>     UNION ALL
>     SELECT
>       result.ad.advertiser_id AS advertiser_id,
>       result.bid.text AS keyword,
>       result.ad.adgroup_id AS adgroup_id,
>       result.ad.ad_id AS ad_id,
>       result.ad.feed_id AS feed_id,
>       0 AS clicks,
>       result.ad.campaign_id AS campaign_id,
>       concat(result.ad.publisher_id, "_", result.ad.source_id, "_",
> result.ad.sub_id) AS subid,
>       0 AS conversions,
>       0 AS feed_calls,
>       1 AS impressions,
>       2013 AS lax_year,
>       result.ad.publisher_id AS publisher_id,
>       0 AS cost,
>       result.ad.sub_id AS sub_id,
>       result.ad.source_id AS source_id,
>       result.ad.timezone AS timezone,
>       09 AS lax_month,
>       0 AS revenue,
>       result.ad.network_id AS network_id,
>       05 AS lax_day,
>       0 AS used_listings,
>       result.ad.start_timestamp AS start_timestamp,
>       result.bid.bid_id AS bid_id FROM impression
> WHERE result.ad.start_timestamp >= 1378364400
> AND   result.ad.start_timestamp < 1378450800
> AND result.ad.network_id='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'
>
>     UNION ALL
>     SELECT
>       conversion.conv_data.advertiser_id AS advertiser_id,
>       conversion.conv_data.keyword AS keyword,
>       conversion.conv_data.adgroup_id AS adgroup_id,
>       conversion.conv_data.ad_id AS ad_id,
>       conversion.conv_data.feed_id AS feed_id,
>       0 AS clicks,
>       conversion.conv_data.campaign_id AS campaign_id,
>       conversion.conv_data.subid AS subid,
>       1 AS conversions,
>       0 AS feed_calls,
>       0 AS impressions,
>       2013 AS lax_year,
>       conversion.conv_data.publisher_id AS publisher_id,
>       0 AS cost,
>       conversion.conv_data.sub_id AS sub_id,
>       conversion.conv_data.source_id AS source_id,
>       conversion.conv_data.timezone AS timezone,
>       09 AS lax_month,
>       0 AS revenue,
>       conversion.conv_data.network_id AS network_id,
>       05 AS lax_day,
>       0 AS used_listings,
>       conversion.start_timestamp AS start_timestamp,
>       -1 AS bid_id FROM conversion
> WHERE conversion.start_timestamp >= 1378364400
> AND   conversion.start_timestamp < 1378450800
> AND
> conversion.conv_data.network_id='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'
>
> )
> entries
> GROUP BY lax_year,lax_month,lax_day,network_id,advertiser_id,publisher_id
>
> Got all that?  Good!
>
> The first strange thing is that the overall query returns results, yet I
> get no results for any of the subqueries on which I'm performing a UNION
> ALL.
>
> The second strange thing is that the overall query does not honor the
> constraint on network_id at the very least, and probably isn't honoring the
> constraints on start_timestamp either (I haven't checked).
>
> I tried putting the constraint on the output of the UNION ALL query:
>
> WHERE entries.start_timestamp >= 1378364400
> AND   entries.start_timestamp < 1378450800
> AND entries.network_id='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'
>
> That didn't work either.  Same behavior.
>
> Can anyone help me make this work?

Mime
View raw message