hive-user mailing list archives

From Jeff Hammerbacher <ham...@cloudera.com>
Subject Re: Aggregrate Query Fails.
Date Thu, 23 Apr 2009 03:22:14 GMT
Hey Zheng,

Thanks for the insight. Perhaps these sorts of quirks could be added to the
documentation on the wiki?

Thanks,
Jeff

On Wed, Apr 22, 2009 at 6:35 PM, Zheng Shao <zshao9@gmail.com> wrote:

> Hi Matt,
>
> There is a known bug with JOIN: all output columns from a JOIN become
> STRING.
> The stack trace shows the failure is caused by exactly that bug.
>
> The workaround right now is to replace every column whose type is not
> STRING with CAST(xxx as STRING).
>
>
> select m.description, o_buyers.num as buyers, count(1) as total
> from clickstream_output o
>   join merchandise m on (CAST(o.merchandise_id AS STRING) =
> CAST(m.merchandise_id AS STRING))
>   left outer join ( select CAST(o1.merchandise_id AS STRING) as
> merchandise_id, CAST(count(distinct o1.consumer_id) AS STRING) as num from
> clickstream_output o1
>          where o1.file_date >= '20090216' and o1.file_date <= '20090217'
> and o1.event_id = 'buy' group by CAST(o1.merchandise_id AS STRING) )
> o_buyers
>      on (CAST(o_buyers.merchandise_id AS STRING) = CAST(o.merchandise_id AS
> STRING))
> where o.file_date >= '20090216' and o.file_date <= '20090217'
> group by m.description, o_buyers.num limit 40;
>
>
> It will be fixed by https://issues.apache.org/jira/browse/HIVE-405.
>
> Zheng
>
>
> On Wed, Apr 22, 2009 at 5:09 PM, Ashish Thusoo <athusoo@facebook.com> wrote:
>
>> Can you do an explain <query> and send us the plan.
>>
>> Another thing that you may try is to put the entire subquery in the from
>> clause and then do an aggregation on it.
>>
>> i.e.
>>
>> select ..., count(1)
>> from (select ...
>>         from clickstream_output o join .....) t
>> group by t.description, t.num limit 40;
>>
>> Also, are you using the 0.3.0 release candidate?
>>
>> Ashish
>>
>> ________________________________________
>> From: Matt Pestritto [matt@pestritto.com]
>> Sent: Wednesday, April 22, 2009 3:53 PM
>> To: hive-user@hadoop.apache.org
>> Subject: Re: Aggregrate Query Fails.
>>
>> So the only change was to remove the column alias, correct?
>> Still no luck.  Same result.
>>
>> On Wed, Apr 22, 2009 at 6:49 PM, Prasad Chakka <pchakka@facebook.com> wrote:
>> That is strange... Does the query below also fail?
>>
>> select  m.description, o_buyers.num , count(1) as total
>> from clickstream_output o
>>  join merchandise m on (o.merchandise_id = m.merchandise_id)
>>  left outer join ( select o1.merchandise_id, count(distinct
>> o1.consumer_id) as num from clickstream_output o1
>>         where o1.file_date >= '20090216' and o1.file_date <= '20090217'
>> and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
>>     on (o_buyers.merchandise_id = o.merchandise_id)
>> where o.file_date >= '20090216' and o.file_date <= '20090217'
>> group by m.description, o_buyers.num limit 40 ;
>>
>>
>> ________________________________
>> From: Matt Pestritto <matt@pestritto.com>
>> Reply-To: <hive-user@hadoop.apache.org>
>> Date: Wed, 22 Apr 2009 15:43:40 -0700
>> To: <hive-user@hadoop.apache.org>
>> Subject: Re: Aggregrate Query Fails.
>>
>>
>> Thanks.
>> I tried that also earlier:
>> FAILED: Error in semantic analysis: line 9:24 Invalid Column Reference
>> buyers
>>
>> On Wed, Apr 22, 2009 at 6:40 PM, Amr Awadallah <aaa@cloudera.com> wrote:
>>
>>
In the group by, try this instead:

group by m.description, buyers limit 40 ;
>>
>> Matt Pestritto wrote:
>> Hi - I'm having a problem with the query below.  When I try to run any
>> aggregate function on a column from the sub-query, the job fails.
>> The queries and output messages are below.
>>
>> Suggestions?
>>
>> thanks in advance.
>>
>> -- works: 2 map-reduce jobs.
>> select m.description, o_buyers.num as buyers
>> from clickstream_output o
>>  join merchandise m on (o.merchandise_id = m.merchandise_id)
>>  left outer join ( select o1.merchandise_id, count(distinct
>> o1.consumer_id) as num from clickstream_output o1
>>         where o1.file_date >= '20090216' and o1.file_date <= '20090217'
>> and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
>>     on (o_buyers.merchandise_id = o.merchandise_id)
>> where o.file_date >= '20090216' and o.file_date <= '20090217' limit 40;
>>
>> Successful output: PROD7362, 3
>>
>> -- fails: 3 map-reduce jobs - 2nd reduce fails.
>> select m.description, o_buyers.num as buyers, count(1) as total
>> -- using sum or max(o_buyers.num) and removing it from the group by also fails.
>> from clickstream_output o
>>  join merchandise m on (o.merchandise_id = m.merchandise_id)
>>  left outer join ( select o1.merchandise_id, count(distinct
>> o1.consumer_id) as num from clickstream_output o1
>>         where o1.file_date >= '20090216' and o1.file_date <= '20090217'
>> and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
>>     on (o_buyers.merchandise_id = o.merchandise_id)
>> where o.file_date >= '20090216' and o.file_date <= '20090217'
>>  group by m.description, o_buyers.num limit 40 ;
>>
>>
>>
>> java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException:
>> java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be
>> cast to org.apache.hadoop.io.Text
>>
>>  at
>> org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:172)
>>  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
>>  at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)
>>
>> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
>> java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be
>> cast to org.apache.hadoop.io.Text
>>  at
>> org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:489)
>>
>>  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
>>  at
>> org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:69)
>>  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
>>
>>  at
>> org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:273)
>>  at
>> org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:516)
>>  at
>> org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
>>
>>  at
>> org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
>>  at
>> org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
>>  at
>> org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:546)
>>
>>  at
>> org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:528)
>>  at
>> org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:143)
>>
>>
>>
>>
>>
>>
>
>
> --
> Yours,
> Zheng
>
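For readers unfamiliar with the exception in the trace above: it is the standard JVM error raised when an object is downcast to a sibling type. A minimal plain-Java sketch (no Hadoop on the classpath; `HiveLong` and `HiveText` are hypothetical stand-ins for `LongWritable` and `Text`) reproduces the mechanism:

```java
// Minimal sketch of the failure mode in the stack trace above.
// HiveLong and HiveText are hypothetical stand-ins for Hadoop's
// LongWritable and Text: two classes that share an interface cannot
// be cast to one another at runtime.
interface Writable {}

class HiveLong implements Writable {}

class HiveText implements Writable {}

public class CastDemo {
    static String attemptCast() {
        Writable w = new HiveLong();   // the JOIN actually produced a long
        try {
            HiveText t = (HiveText) w; // but the plan expected a string
            return "cast succeeded: " + t;
        } catch (ClassCastException e) {
            return "ClassCastException"; // what ExecReducer reports
        }
    }

    public static void main(String[] args) {
        System.out.println(attemptCast()); // prints "ClassCastException"
    }
}
```

Presumably this is what HIVE-405 addresses: after the JOIN, the plan's declared column type (STRING) no longer matched the runtime object (a LongWritable), so GroupByOperator's cast blew up, and casting every join column to STRING keeps the declared and runtime types in agreement.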
