hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From wzc1989 <wzc1...@gmail.com>
Subject 回复: different outer join plan between hive 0.9 and hive 0.10
Date Wed, 03 Jul 2013 15:27:58 GMT
Hi navis:  
Thanks for your reply. Currently I'm working on the  temporary solution by changing the type
of filter mask and doing the performance test. I try to read the patches and source code now
and when I get better understanding of the code maybe I can help with this problem :)

--  
wzc1989
已使用 Sparrow (http://www.sparrowmailapp.com/?sig)


在 2013年7月2日星期二,上午8:24,Navis류승우 写道:

> Yes, a little bit.
>  
> IMHO, these flags could be assigned only for aliases with condition on
> 'on' clause. Then, I think, even a byte (8 flags) could be enough in
> most cases.
>  
> I'll do that if time permits.
>  
> 2013/7/1 wzc1989 <wzc1989@gmail.com (mailto:wzc1989@gmail.com)>:
> > hi navis:
> > look at the patches in (HIVE-3411, HIVE-4206, HIVE-4212, HIVE-3464), I
> > understand what you mean by "hive tags rows a filter mask as a short for
> > outer join, which can contain 16 flags. " . I wonder why not choose Long or
> > int which can contain 64/32 tags. Does adding one Long/int in every row cost
> > too much?
> >  
> > --
> > wzc1989
> > 已使用 Sparrow
> >  
> > 在 2013年5月14日星期二,下午2:17,Navis류승우 写道:
> >  
> > In short, hive tags rows a filter mask as a short for outer join,
> > which can contain 16 flags. (see HIVE-3411, plz)
> >  
> > I'll survey for a solution.
> >  
> > 2013/5/14 wzc1989 <wzc1989@gmail.com (mailto:wzc1989@gmail.com)>:
> >  
> > "hive cannot merge joins of 16+ aliases with outer join into single stage."
> > In our use case we use one table full outer join all other table to produce
> > one big table, which may exceed 16 outer join limits and will be split into
> > multi stage under hive 0.10.
> > It become very slow under hive 0.10 while we run such query well under hive
> > 0.9.
> > I believe it's due to the diff of query plan. I wonder why hive 0.10 cannot
> > merge join 16+ aliases into single stage while hive 0.9 doesn't have such
> > issue. could you explain this or give me some hint?
> >  
> > Thanks!
> >  
> > --
> > wzc1989
> > 已使用 Sparrow
> >  
> > 在 2013年5月14日星期二,下午12:26,Navis류승우 写道:
> >  
> > The error message means hive cannot merge joins of 16+ aliases with
> > outer join into single stage. It was 8 way originally (HIVE-3411) but
> > expanded to 16 later.
> >  
> > Check https://issues.apache.org/jira/browse/HIVE-3411 for details.
> >  
> > 2013/5/14 wzc1989 <wzc1989@gmail.com (mailto:wzc1989@gmail.com)>:
> >  
> > This time i cherry-pick HIVE-3464, HIVE-4212, HIVE-4206 and some related
> > commits and the above explain result matches in hive 0.9 and hive 0.10,
> > thanks!
> > But I confuse about this error msg:
> >  
> > JOINNODE_OUTERJOIN_MORETHAN_16(10142, "Single join node containing outer
> > join(s) " +
> > "cannot have more than 16 aliases"),
> >  
> > does this mean in hive0.10 when we have more than 16 outer join the query
> > plan will still have some bug?
> > I test the sql below and find the explain result still diff between hive 0.9
> > and hive 0.10.
> >  
> > explain select
> > sum(a.value) val
> > from default.test_join a
> > left outer join default.test_join b on a.key = b.key
> > left outer join default.test_join c on a.key = c.key
> > left outer join default.test_join d on a.key = d.key
> > left outer join default.test_join e on a.key = e.key
> > left outer join default.test_join f on a.key = f.key
> > left outer join default.test_join g on a.key = g.key
> > left outer join default.test_join h on a.key = h.key
> > left outer join default.test_join i on a.key = i.key
> > left outer join default.test_join j on a.key = j.key
> > left outer join default.test_join k on a.key = k.key
> > left outer join default.test_join l on a.key = l.key
> > left outer join default.test_join m on a.key = m.key
> > left outer join default.test_join n on a.key = n.key
> > left outer join default.test_join u on a.key = u.key
> > left outer join default.test_join v on a.key = v.key
> > left outer join default.test_join w on a.key = w.key
> > left outer join default.test_join x on a.key = x.key
> > left outer join default.test_join z on a.key = z.key
> >  
> >  
> > --
> > wzc1989
> > 已使用 Sparrow
> >  
> > 在 2013年3月29日星期五,上午9:34,Navis류승우 写道:
> >  
> > The problem is mixture of issues (HIVE-3411, HIVE-4209, HIVE-4212,
> > HIVE-3464) and still not completely fixed even in trunk.
> >  
> > Will be fixed shortly.
> >  
> > 2013/3/29 wzc <wzc1989@gmail.com (mailto:wzc1989@gmail.com)>:
> >  
> > The bug remains even if I apply the patch in HIVE-4206 :( The explain
> > result hasn't change.
> >  
> >  
> > 2013/3/28 Navis류승우 <navis.ryu@nexr.com (mailto:navis.ryu@nexr.com)>
> >  
> >  
> > It's a bug (https://issues.apache.org/jira/browse/HIVE-4206).
> >  
> > Thanks for reporting it.
> >  
> > 2013/3/24 wzc <wzc1989@gmail.com (mailto:wzc1989@gmail.com)>:
> >  
> > Recently we tried to upgrade our hive from 0.9 to 0.10, but found some
> > of
> > our hive queries almost 7 times slow. One of such query consists
> > multiple
> > table outer join on the same key. By looking into the query, we found
> > the
> > query plans generate by hive 0.9 and hive 0.10 are different. Here is
> > the
> > example:
> >  
> > testcase:
> >  
> > use default;
> > create table test_join (
> > `key` string,
> > `value` string
> > );
> >  
> > explain select
> > sum(a.value) val
> > from default.test_join a
> > left outer join default.test_join b on a.key = b.key
> > left outer join default.test_join c on a.key = c.key
> > left outer join default.test_join d on a.key = d.key
> > left outer join default.test_join e on a.key = e.key
> > left outer join default.test_join f on a.key = f.key
> > left outer join default.test_join g on a.key = g.key
> >  
> >  
> > the explain of hive 0.9:
> >  
> > STAGE DEPENDENCIES:
> >  
> > Stage-1 is a root stage
> >  
> > Stage-2 depends on stages: Stage-1
> >  
> > Stage-0 is a root stage
> >  
> > ...
> >  
> > Reduce Operator Tree:
> >  
> > Join Operator
> >  
> > condition map:
> >  
> > Left Outer Join0 to 1
> >  
> > Left Outer Join0 to 2
> >  
> > Left Outer Join0 to 3
> >  
> > Left Outer Join0 to 4
> >  
> > Left Outer Join0 to 5
> >  
> > Left Outer Join0 to 6
> >  
> > condition expressions:
> >  
> > 0 {VALUE._col1}
> >  
> > 1
> >  
> > 2
> >  
> > 3
> >  
> > 4
> >  
> > 5
> >  
> > 6
> >  
> > ......
> >  
> >  
> > while the explain of hive 0.10:
> >  
> > STAGE DEPENDENCIES:
> >  
> > Stage-6 is a root stage
> >  
> > Stage-1 depends on stages: Stage-6
> >  
> > Stage-2 depends on stages: Stage-1
> >  
> > Stage-0 is a root stage
> >  
> > ...
> >  
> > Reduce Operator Tree:
> >  
> > Join Operator
> >  
> > condition map:
> >  
> > Left Outer Join0 to 1
> >  
> > Left Outer Join0 to 2
> >  
> > condition expressions:
> >  
> > 0 {VALUE._col0} {VALUE._col1}
> >  
> > 1
> >  
> > 2
> >  
> > ...
> >  
> > Reduce Operator Tree:
> >  
> > Join Operator
> >  
> > condition map:
> >  
> > Left Outer Join0 to 1
> >  
> > Left Outer Join0 to 2
> >  
> > Left Outer Join0 to 3
> >  
> > Left Outer Join0 to 4
> >  
> > condition expressions:
> >  
> > 0 {VALUE._col9}
> >  
> > 1
> >  
> > 2
> >  
> > 3
> >  
> > 4
> >  
> > ....
> >  
> >  
> > It seems like hive 0.9 use only one stage/job to process all outer joins
> > but
> > hive 0.10 split them into two stage. When running such kind of query on
> > hive0.10 in production, in the second stage of outer join process, some
> > reducer stucks.
> >  
> > I can't find any param to change the query plain , can anyone give me
> > some
> > hint?
> >  
> > Thanks!  


Mime
View raw message