hive-user mailing list archives

From Igor Tatarinov <i...@decide.com>
Subject Re: implementing moving average as a UDF
Date Tue, 22 Feb 2011 19:55:22 GMT
Thank you, John.

It's not quite clear from the page whether my solution:
1. makes sense
2. works now
3. will work in the future if the issue is resolved/implemented

Could you elaborate?

Also, the page doesn't mention whether UDF objects are shared (between mappers)
in the current implementation. Is this a problem? Do I need to use ThreadLocal
or something similar?
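
A minimal sketch of the ThreadLocal idea follows. It assumes, without confirmation from this thread, that one UDF instance could be called from several threads at once. The class name ThreadLocalMovingAverage and its evaluate(key, value, n) signature are hypothetical stand-ins for mavg(product, price, 10). The code is plain Java and does not use Hive's UDF classes. Each thread gets its own key, window and sum, so rows handled by one thread cannot reset or pollute another thread's group.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Objects;

// Hypothetical sketch: mavg state held in a ThreadLocal, so that if one
// instance were called from several threads, each thread would see only
// its own key, window and sum. Not Hive's API.
class ThreadLocalMovingAverage {
    private static final class State {
        String key;                                      // current group key
        final Deque<Double> window = new ArrayDeque<>(); // last n values
        double sum;                                      // sum of values in window
    }

    private final ThreadLocal<State> state = ThreadLocal.withInitial(State::new);

    public double evaluate(String key, double value, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("window size must be positive");
        }
        State s = state.get();            // this thread's private state
        if (!Objects.equals(key, s.key)) { // new group: discard old state
            s.key = key;
            s.window.clear();
            s.sum = 0.0;
        }
        s.window.addLast(value);
        s.sum += value;
        while (s.window.size() > n) {     // slide the window forward
            s.sum -= s.window.removeFirst();
        }
        return s.sum / s.window.size();
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadLocalMovingAverage shared = new ThreadLocalMovingAverage();
        // Two threads use the same instance with different keys;
        // output order between the threads is not deterministic.
        Thread t1 = new Thread(() -> {
            for (double p : new double[] {1.0, 3.0, 5.0}) {
                System.out.println("a " + shared.evaluate("a", p, 2));
            }
        });
        Thread t2 = new Thread(() -> {
            for (double p : new double[] {10.0, 20.0}) {
                System.out.println("b " + shared.evaluate("b", p, 2));
            }
        });
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```

If Hive in fact gives each task its own instance and calls it from a single thread, the ThreadLocal only adds a lookup per row; it matters only under the sharing assumption stated here.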

On Tue, Feb 22, 2011 at 11:42 AM, John Sichi <jsichi@fb.com> wrote:

> Please see the discussion in this JIRA issue:
>
> https://issues.apache.org/jira/browse/HIVE-1994
>
> JVS
>
> On Feb 21, 2011, at 10:45 PM, Igor Tatarinov wrote:
>
> > I would like to implement the moving average as a UDF (instead of a
> streaming reducer). Here is what I am thinking. Please let me know if I am
> missing something here:
> >
> > SELECT product, date, mavg(product, price, 10)
> > FROM (
> >   SELECT *
> >   FROM prices
> >   DISTRIBUTE BY product
> >   SORT BY product, date
> > ) t
> >
> > I have to pass the key to mavg() because it has to detect when one
> product grouping ends and another starts.
> >
> > Unfortunately, mavg will also need to maintain state (a moving sum and
> count). That's where I worry that Hive (Hadoop?) will use a single
> instance of my UDF to process concurrent groupings, and this idea won't work.
> >
> > Is that the main issue? Is there something I can do to fix that?
> >
> > Thanks!
> > igor
> >
>
>
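
The per-group state described in the quoted message can be sketched in plain Java. This is a hypothetical illustration, not Hive's API: MovingAverage and its evaluate(key, value, n) method stand in for mavg(product, price, 10). In an actual Hive UDF the method would live in a class extending org.apache.hadoop.hive.ql.exec.UDF, and that wiring is omitted here. The sketch keeps a moving sum and count over a ring buffer of the last n values and resets when the key changes. It therefore gives correct results only if one instance sees each product's rows contiguously and in date order, which is what the DISTRIBUTE BY / SORT BY subquery is intended to provide.

```java
import java.util.Objects;

// Hypothetical sketch of the mavg(key, value, n) logic: average of the last
// n values, reset whenever the grouping key changes. Assumes rows arrive
// grouped by key and sorted by date, one row per call. Not Hive's API.
class MovingAverage {
    private String currentKey; // key of the group being processed
    private double[] window;   // ring buffer holding the last n values
    private int next;          // slot the next value is written to
    private int count;         // number of values in the window (<= n)
    private double sum;        // moving sum of the values in the window

    public double evaluate(String key, double value, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("window size must be positive");
        }
        // New group (or new window size): start from empty state.
        if (!Objects.equals(key, currentKey) || window == null || window.length != n) {
            currentKey = key;
            window = new double[n];
            next = 0;
            count = 0;
            sum = 0.0;
        }
        if (count == n) {
            sum -= window[next]; // oldest value is about to be overwritten
        } else {
            count++;
        }
        window[next] = value;
        sum += value;
        next = (next + 1) % n;
        return sum / count;
    }

    public static void main(String[] args) {
        MovingAverage mavg = new MovingAverage();
        String[] products = {"a", "a", "a", "b", "b"};
        double[] prices = {1.0, 3.0, 5.0, 10.0, 20.0};
        for (int i = 0; i < products.length; i++) {
            System.out.println(products[i] + " " + mavg.evaluate(products[i], prices[i], 2));
        }
    }
}
```

Updating the sum incrementally keeps each call O(1). Over long groups the repeated subtraction can accumulate floating-point rounding error; recomputing the sum from the buffer is a simpler but slower alternative.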
