hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From rohan monga <monga.ro...@gmail.com>
Subject Re: Invoke a UDAF inside another UDAF
Date Fri, 20 Jan 2012 23:00:54 GMT
my bad, i hastily converted the query to a wrong example.

it should be like this

select f1, avg(f2) as avg_f2, avg(f2 - avg_f2) from table group by f1;

In essence, I just want to use the value generated by one UDAF ( in this
case avg(f2) ) as a single number and then apply that value to the group
inside a different UDAF.
For e.g. if I were to use a streaming reducer, it would be something like
this

avg1 = computeSum(list) / len(list)
return computeSum(x-avg1 for x in list) / len(list)

As I write this I realize why this might not be possible [ the group
computation being done in one step and the information being lost ] :)

But why the nullpointer exception?

Regards,
--
Rohan Monga


On Fri, Jan 20, 2012 at 2:32 PM, Edward Capriolo <edlinuxguru@gmail.com>wrote:

> IMHO You can not possibly nest the percentile calculation because the
> results would be meaningless. percentile has to aggregate a set and
> pick the Nth element, But if you nest then the inner percentile only
> returns one result to the outer percentile, and that is pretty
> meaningless.
>
> (I think someone talked about this on list in the last month or so).
> Without seeing your input data and your expected results, i can not
> understand what your query wants to do, and suggest an alternative.
>
>
>
> On 1/20/12, rohan monga <monga.rohan@gmail.com> wrote:
> > thanks edward that seems to work  :)
> >
> > However, I have another query is like this
> >
> > select a, avg(b) as avg_b, percentile_approx( avg_b - percentile_approx(
> b,
> > .5), .5 ) from table1 group by a
> >
> > Here I will loose the group info if I include the inner query in the FROM
> > clause, is there a way to get this to work?
> >
> > Thanks,
> > --
> > Rohan Monga
> >
> >
> > On Fri, Jan 20, 2012 at 12:51 PM, Edward Capriolo
> > <edlinuxguru@gmail.com>wrote:
> >
> >> I think if you are grouping by b, b has to be in your select list. Try
> >> this.
> >> FROM (
> >> select b,count(a) as theCount from table one group by b
> >> ) a select mean(theCount);
> >>
> >> I think that should work.
> >>
> >> On 1/20/12, rohan monga <monga.rohan@gmail.com> wrote:
> >> > Hi,
> >> > I am trying to run a query like
> >> > "select mean(count(a)) from table1 group by b;"
> >> >
> >> > I am getting the following error
> >> > <snip>
> >> > FAILED: Hive Internal Error: java.lang.NullPointerException(null)
> >> > java.lang.NullPointerException
> >> >     at
> >> >
> >>
> org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:151)
> >> >     at
> >> >
> >>
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:656)
> >> >     at
> >> >
> >>
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:777)
> >> >     at
> >> >
> >>
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
> >> >     at
> >> >
> >>
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:88)
> >> >     at
> >> >
> >>
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:125)
> >> >     at
> >> >
> >>
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:102)
> >> >     at
> >> >
> >>
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:157)
> >> >     at
> >> >
> >>
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:7447)
> >> >     at
> >> >
> >>
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:7405)
> >> >     at
> >> >
> >>
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapGroupByOperator(SemanticAnalyzer.java:2747)
> >> >     at
> >> >
> >>
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggr1MR(SemanticAnalyzer.java:3365)
> >> >     at
> >> >
> >>
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:5858)
> >> >     at
> >> >
> >>
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6480)
> >> >     at
> >> >
> >>
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7223)
> >> >     at
> >> >
> >>
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:240)
> >> >     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:428)
> >> >     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
> >> >     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
> >> >     at
> >> >
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:253)
> >> >     at
> >> org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:210)
> >> >     at
> >> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:401)
> >> >     at
> >> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
> >> >     at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:635)
> >> >     at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:552)
> >> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >> >     at
> >> >
> >>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >> >     at
> >> >
> >>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >> >     at java.lang.reflect.Method.invoke(Method.java:597)
> >> >     at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
> >> > </snip>
> >> >
> >> > Is there a workaround ? I have tried with hive 0.7.1 and 0.8
> >> >
> >> > Thanks
> >> > --
> >> > Rohan Monga
> >> >
> >>
> >
>

Mime
View raw message