hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chuck Hardin <char...@namimedia.com>
Subject UNION ALL query behaving strangely; WHERE CLAUSE is also not being honored
Date Thu, 12 Sep 2013 17:00:41 GMT
Please bear with me, because this is a pretty large query.

TL;DR:  I'm doing a UNION ALL on a bunch of subqueries.  The subqueries return no results,
but the overall query does.  Sadly, the constraints on the subqueries are not being honored
in the UNION ALL.

I'm executing the following:

SELECT lax_year,lax_month,lax_day,network_id,advertiser_id,publisher_id,
       SUM(entries.clicks), SUM(entries.conversions),
       SUM(entries.cost), SUM(entries.feed_calls),
       SUM(entries.impressions), SUM(entries.revenue),
       SUM(entries.used_listings)
FROM
  (
    SELECT
      used_listing.advertiserid AS advertiser_id,
      -1 AS keyword,
      used_listing.adgroupid AS adgroup_id,
      used_listing.adid AS ad_id,
      used_listing.feedid AS feed_id,
      0 AS clicks,
      used_listing.campaignid AS campaign_id,
      concat(used_listing.publisherid, "_", used_listing.sourceid, "_", used_listing.subid)
AS subid,
      0 AS conversions,
      0 AS feed_calls,
      0 AS impressions,
      2013 AS lax_year,
      used_listing.publisherid AS publisher_id,
      0 AS cost,
      used_listing.subid AS sub_id,
      used_listing.sourceid AS source_id,
      used_listing.timezone AS timezone,
      09 AS lax_month,
      0 AS revenue,
      used_listing.networkid AS network_id,
      05 AS lax_day,
      1 AS used_listings,
      used_listing.ul_datetime AS start_timestamp,
      -1 AS bid_id FROM used_listing
WHERE used_listing.ul_datetime >= 1378364400
AND   used_listing.ul_datetime < 1378450800
AND used_listing.networkid='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'

    UNION ALL
    SELECT
      click.click.advertiser_id AS advertiser_id,
      click.click.keyword AS keyword,
      click.click.adgroup_id AS adgroup_id,
      click.click.ad_id AS ad_id,
      click.feed_id AS feed_id,
      1 AS clicks,
      click.click.campaign_id AS campaign_id,
      click.click.subid AS subid,
      0 AS conversions,
      0 AS feed_calls,
      0 AS impressions,
      2013 AS lax_year,
      click.click.publisher_id AS publisher_id,
      click.click.publisher_cost AS cost,
      click.click.sub_id AS sub_id,
      click.click.source_id AS source_id,
      click.click.timezone AS timezone,
      09 AS lax_month,
      click.click.click_cost AS revenue,
      click.click.network_id AS network_id,
      05 AS lax_day,
      0 AS used_listings,
      click.start_timestamp AS start_timestamp,
      click.click.bid_id AS bid_id FROM click
WHERE click.start_timestamp >= 1378364400
AND   click.start_timestamp < 1378450800
AND click.click.network_id='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'

    UNION ALL
    SELECT
      result.ad.advertiser_id AS advertiser_id,
      result.bid.text AS keyword,
      result.ad.adgroup_id AS adgroup_id,
      result.ad.ad_id AS ad_id,
      result.ad.feed_id AS feed_id,
      0 AS clicks,
      result.ad.campaign_id AS campaign_id,
      concat(result.ad.publisher_id, "_", result.ad.source_id, "_", result.ad.sub_id) AS subid,
      0 AS conversions,
      0 AS feed_calls,
      1 AS impressions,
      2013 AS lax_year,
      result.ad.publisher_id AS publisher_id,
      0 AS cost,
      result.ad.sub_id AS sub_id,
      result.ad.source_id AS source_id,
      result.ad.timezone AS timezone,
      09 AS lax_month,
      0 AS revenue,
      result.ad.network_id AS network_id,
      05 AS lax_day,
      0 AS used_listings,
      result.ad.start_timestamp AS start_timestamp,
      result.bid.bid_id AS bid_id FROM impression
WHERE result.ad.start_timestamp >= 1378364400
AND   result.ad.start_timestamp < 1378450800
AND result.ad.network_id='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'

    UNION ALL
    SELECT
      conversion.conv_data.advertiser_id AS advertiser_id,
      conversion.conv_data.keyword AS keyword,
      conversion.conv_data.adgroup_id AS adgroup_id,
      conversion.conv_data.ad_id AS ad_id,
      conversion.conv_data.feed_id AS feed_id,
      0 AS clicks,
      conversion.conv_data.campaign_id AS campaign_id,
      conversion.conv_data.subid AS subid,
      1 AS conversions,
      0 AS feed_calls,
      0 AS impressions,
      2013 AS lax_year,
      conversion.conv_data.publisher_id AS publisher_id,
      0 AS cost,
      conversion.conv_data.sub_id AS sub_id,
      conversion.conv_data.source_id AS source_id,
      conversion.conv_data.timezone AS timezone,
      09 AS lax_month,
      0 AS revenue,
      conversion.conv_data.network_id AS network_id,
      05 AS lax_day,
      0 AS used_listings,
      conversion.start_timestamp AS start_timestamp,
      -1 AS bid_id FROM conversion
WHERE conversion.start_timestamp >= 1378364400
AND   conversion.start_timestamp < 1378450800
AND conversion.conv_data.network_id='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'

)
entries
GROUP BY lax_year,lax_month,lax_day,network_id,advertiser_id,publisher_id

Got all that?  Good!

The first strange thing is that the overall query returns results, yet I get no results for
any of the subqueries on which I'm performing a UNION ALL.

The second strange thing is that the overall query does not honor the constraint on network_id
at the very least, and probably isn't honoring the constraints on start_timestamp either (I
haven't checked).

I tried putting the constraint on the output of the UNION ALL query:

WHERE entries.start_timestamp >= 1378364400
AND   entries.start_timestamp < 1378450800
AND entries.network_id='e6ac40577bcc157a00f7f2d4f5bf30cc0bca6120'

That didn't work either.  Same behavior.

Can anyone help me make this work?
Mime
View raw message