hive-user mailing list archives

From Ashutosh Chauhan <hashut...@apache.org>
Subject Re: Help needed: Out of memory with windowing functions
Date Wed, 20 Aug 2014 15:51:52 GMT
Hi Furcy,

Sorry that you ran into this; it is a non-trivial problem. The good news
is that Harish has done some good work in this area; see the following
links. All of this will be available in the upcoming release, the date for
which is under discussion on the dev list.

1. https://issues.apache.org/jira/browse/HIVE-6999
2. https://issues.apache.org/jira/browse/HIVE-7062
3. https://issues.apache.org/jira/browse/HIVE-7063
4. https://issues.apache.org/jira/browse/HIVE-7143
5. https://issues.apache.org/jira/browse/HIVE-7344

Hope it helps,
Ashutosh


On Wed, Aug 20, 2014 at 5:34 AM, Furcy Pin <furcy.pin@flaminem.com> wrote:

> Hi all,
>
> I have an event table with (user_id, timestamp, event)
> and I'm trying to write a query to get the first 10 events for each user.
>
>
> My query goes like this :
>
> SELECT user_id, event
> FROM
> (
>   SELECT
>     user_id,
>     event,
>     ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY time) as rownum
>   FROM eventTable
> ) T
> WHERE rownum <= 10
>
> However, the table may contain millions of events for the same user, and
> I'm getting an OutOfMemory error in the reduce phase, inside the following method:
>
> org.apache.hadoop.hive.ql.udf.generic.GenericUDAFRowNumber$RowNumberBuffer.incr(GenericUDAFRowNumber.java:80)
>
> It seems that the windowing functions were designed to store a buffer containing all
> results for each "PARTITION" and to write everything out once all rows of that
> partition have been read.
>
> This makes windowing with Hive not very scalable...
>
> My questions are:
>
> a) Is there a reason why it was implemented this way rather than in a "streaming" fashion?
>
> b) Do you know how I could rewrite the query to avoid the problem (if possible without
> having to write my own UDF)?
>
> Thanks,
>
> Furcy
>
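
To make the "streaming" idea in question (a) concrete, here is a minimal Python sketch. It is not Hive's implementation and not necessarily what the JIRA issues above do. It assumes the rows reach the reducer already sorted by (user_id, time), and the function name `first_n_per_user` is hypothetical. Under that assumption the row number is just a per-user counter, so no per-partition buffer is needed.

```python
from itertools import groupby, islice
from operator import itemgetter


def first_n_per_user(sorted_rows, n=10):
    """Yield (user_id, event, rownum) for the first n rows of each user.

    sorted_rows is an iterable of (user_id, time, event) tuples that is
    ALREADY ordered by (user_id, time); that ordering is an assumption
    of this sketch, standing in for the shuffle/sort before the reducer.
    """
    # groupby walks the input lazily, one user at a time.
    for user_id, rows in groupby(sorted_rows, key=itemgetter(0)):
        # islice stops after n rows; groupby skips the rest of the group
        # without storing it, so memory use does not grow with group size.
        for rownum, (_, _time, event) in enumerate(islice(rows, n), start=1):
            yield user_id, event, rownum
```

Because the input is consumed as an iterator, a user with millions of events costs extra reading time but no extra memory in this sketch.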
