hive-user mailing list archives

From Rakesh Setty <>
Subject RE: Set difference in Hive
Date Mon, 29 Jun 2009 23:42:36 GMT
Thanks very much. However, the reducer hangs with this warning:

WARN org.apache.hadoop.hive.ql.exec.JoinOperator: table 0 has more than joinEmitInterval rows for join key []

Both tables are large, and as Zheng mentions at, a large table 0 is a problem. Is there any way to overcome this?
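A simplified model of a single reducer in a reduce-side join shows why a large table 0 hurts. The sketch below is hypothetical and is not Hive's JoinOperator code. It assumes the behaviour described in the Hive join documentation: for each join key, rows from every table except the last are buffered in memory, and only the last table is streamed. Under that assumption, the set difference can be written with the large table last, e.g. `users u RIGHT OUTER JOIN page_views pv ON (pv.user = u.name) WHERE u.name IS NULL`, so that only `users` rows are buffered. The tags, data, and function name are illustrative only.

```python
from itertools import groupby
from typing import Iterable, List, Tuple

# (join key, table tag, payload); tag 0 = buffered table, tag 1 = streamed (last) table
Row = Tuple[str, int, str]


def reducer_anti_join(rows: Iterable[Row]) -> List[Tuple[str, str]]:
    """Emit streamed-table rows whose key has no buffered-table row.

    Hypothetical model: `rows` must already be sorted by (key, tag), as a
    shuffle would deliver them, so every tag-0 row for a key arrives first.
    """
    out: List[Tuple[str, str]] = []
    for key, group in groupby(rows, key=lambda r: r[0]):
        buffered: List[str] = []  # memory use grows with table-0 rows for this key
        for _, tag, payload in group:
            if tag == 0:
                buffered.append(payload)
            elif not buffered:
                # Streamed row with no match: the "u.name IS NULL" case.
                out.append((key, payload))
    return out


# Hypothetical data: tag 0 rows stand for users, tag 1 rows for page_views.
rows = sorted([
    ("alice", 0, "user"), ("carol", 0, "user"),
    ("alice", 1, "pv1"), ("bob", 1, "pv2"), ("bob", 1, "pv3"),
])
print(reducer_anti_join(rows))  # [('bob', 'pv2'), ('bob', 'pv3')]
```

In this model the buffer for a key only ever holds that key's `users` rows. If the large table were the buffered one, the buffer would instead hold every page view for a key, which appears to be the situation the warning describes.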



From: Peter Skomoroch []
Sent: Monday, June 29, 2009 4:20 PM
Subject: Re: Set difference in Hive

Here is an example of what Amr mentioned, taken from one of my Hive scripts. It returns the set of pages
not in "daily_pagecounts_table":

SELECT dt.page_id, dt.dates, dt.pageviews, dt.total_pageviews
FROM daily_timelines dt LEFT OUTER JOIN daily_pagecounts_table dp ON (dt.page_id = dp.page_id)
WHERE dp.page_id IS NULL
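Running the query against a trimmed-down copy of the two tables shows how it works. This sketch uses Python's built-in `sqlite3` module as a stand-in for Hive. The `dates` and `pageviews` columns are omitted and the data is made up. The first query shows the intermediate outer-join result, where rows with no match get NULL for `dp.page_id`. The second query adds the `IS NULL` filter, which keeps exactly those rows.

```python
import sqlite3

# Hypothetical sample data; only page_id matters for the set difference.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE daily_timelines (page_id INTEGER, total_pageviews INTEGER);
CREATE TABLE daily_pagecounts_table (page_id INTEGER);
INSERT INTO daily_timelines VALUES (1, 10), (2, 5), (3, 7);
INSERT INTO daily_pagecounts_table VALUES (1), (3);
""")

# Step 1: the outer join keeps every dt row and pads unmatched ones with NULL.
joined = conn.execute("""
    SELECT dt.page_id, dp.page_id
    FROM daily_timelines dt LEFT OUTER JOIN daily_pagecounts_table dp
      ON (dt.page_id = dp.page_id)
    ORDER BY dt.page_id
""").fetchall()
print(joined)  # [(1, 1), (2, None), (3, 3)]

# Step 2: keeping only the NULL-padded rows gives the set difference.
missing = conn.execute("""
    SELECT dt.page_id, dt.total_pageviews
    FROM daily_timelines dt LEFT OUTER JOIN daily_pagecounts_table dp
      ON (dt.page_id = dp.page_id)
    WHERE dp.page_id IS NULL
""").fetchall()
print(missing)  # [(2, 5)]
```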

On Mon, Jun 29, 2009 at 7:14 PM, Amr Awadallah <> wrote:

Do an outer join on user and filter on users.name is null.

-- amr

Rakesh Setty wrote:


I am new to Hive. What is the easiest way to get the difference between two sets?
For example, how can I convert the following SQL query to Hive?

select user from page_views where user not in (select name from users);
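Following Amr's suggestion, the NOT IN query above can be rewritten as a left outer join on the user column. The sketch below checks the rewrite against the original NOT IN form. It again uses `sqlite3` as a stand-in for Hive, with hypothetical data. ORDER BY is added only to make the output deterministic. The sketch also shows one difference between the two forms: if `users.name` contains a NULL, NOT IN returns no rows at all, while the outer-join version still returns the unmatched users.

```python
import sqlite3

# Hypothetical sample data for the two tables in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page_views (user TEXT);
CREATE TABLE users (name TEXT);
INSERT INTO page_views VALUES ('alice'), ('bob'), ('bob'), ('carol');
INSERT INTO users VALUES ('alice'), ('carol');
""")

# Original form from the question (ORDER BY added for stable output).
NOT_IN = "SELECT user FROM page_views WHERE user NOT IN (SELECT name FROM users) ORDER BY user"

# Join-based rewrite: keep page_views rows that found no matching users row.
ANTI_JOIN = """
    SELECT pv.user
    FROM page_views pv LEFT OUTER JOIN users u ON (pv.user = u.name)
    WHERE u.name IS NULL
    ORDER BY pv.user
"""

not_in_before = conn.execute(NOT_IN).fetchall()
anti_before = conn.execute(ANTI_JOIN).fetchall()
print(not_in_before, anti_before)  # [('bob',), ('bob',)] [('bob',), ('bob',)]

# Add a NULL name: NOT IN now evaluates to unknown for every row, the join does not.
conn.execute("INSERT INTO users VALUES (NULL)")
not_in_after = conn.execute(NOT_IN).fetchall()
anti_after = conn.execute(ANTI_JOIN).fetchall()
print(not_in_after, anti_after)  # [] [('bob',), ('bob',)]
```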



Peter N. Skomoroch
