hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alex Boisvert <a...@bizo.com>
Subject Re: Help with last 30 day unique user query
Date Fri, 15 Oct 2010 17:51:13 GMT
As far as I know, Hive has no built-in support for sliding-window analytics.
 There is an enhancement request here:
https://issues.apache.org/jira/browse/HIVE-896

<https://issues.apache.org/jira/browse/HIVE-896>Without such support, the
brute force way of doing things is,

SELECT COUNT(DISTINCT user_id) FROM events WHERE event_date > start_date
and event_date <= end_date;

(repeated N times to cover each day of your time window).

alex

On Thu, Oct 14, 2010 at 11:36 PM, Vijay <techvd@gmail.com> wrote:

> Hi, I need help with this scenario. We have a table of events which has
> columns date, event (not important for this discussion), and user_id. It is
> obviously easy to find number of unique users for each day. I also need to
> find number of unique users in the last 30 days for each day. This is also
> quite simple to do for one day. However, I cannot figure out how to do this
> for a range of days. Something like this is pretty straightforward in most
> RDBMS but with HiveQL has I'm finding this hard. I might be missing
> something simple though. Any help is appreciated. Ideally the query should
> also be as optimized as possible as this table could be huge.
>
> Thanks,
> Vijay
>
>

Mime
View raw message