hive-user mailing list archives

From Stephen Sprague <sprag...@gmail.com>
Subject Re: Beeline throws OOM on large input query
Date Fri, 02 Sep 2016 13:53:44 GMT
hmmm.  so beeline blew up *before* the query was even submitted to the
execution engine?   one would think 16G would be plenty for an 8M row sql
statement.

some suggestions if you feel like going further down the rabbit hole.

1.  confirm your beeline java process is indeed running with expanded
memory (ps -ef | grep beeline and look for the last Xmx setting on the line)

2.  try the hive-cli (or the python one even.)  or "beeline -u
jdbc:hive2://" (local beeline - maybe that's different)

3.  chop down your 6K points to 3K or something smaller to see just where
the breaking point is.  does 1K points even work?  ie. determine how close
to the edge you are.
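for suggestion 1, here is a minimal sketch of pulling the last Xmx setting
out of the beeline command line, since the last one on the line is the one
to look at when the flag shows up more than once.  the function names
last_xmx and beeline_xmx are made up for illustration, and the grep pattern
assumes the usual -Xmx<number><unit> form.

```shell
# Print the last -Xmx setting found in the command-line string given as $1.
# Prints nothing if the string contains no -Xmx flag.
last_xmx() {
  printf '%s\n' "$1" | grep -o -- '-Xmx[0-9][0-9]*[kKmMgG]\{0,1\}' | tail -n 1
}

# Apply last_xmx to every running process whose command line mentions beeline.
# The [b] in the pattern keeps grep from matching its own process entry.
beeline_xmx() {
  ps -ef | grep '[b]eeline' | while IFS= read -r line; do
    last_xmx "$line"
  done
}
```

run beeline_xmx while beeline is up; if it does not print the 16g you set
through HADOOP_CLIENT_OPTS, the bigger heap never reached the java process.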

Cheers,
Stephen.

PS. i had never heard of a "theta" join before so i searched for it and found
this:
https://cwiki.apache.org/confluence/display/Hive/Theta+Join  and
this: https://issues.apache.org/jira/browse/HIVE-556 (looks like this came
first)

and still in "open" status i see. well you're not alone if that's any
solace!

maybe ping that Jira and see if Edward or Brock (or others) have any new
news on the topic as supporting theta joins sounds like the proper solution
to this whole rigamarole you find yourself in.

On Fri, Sep 2, 2016 at 6:12 AM, Adam <work.asr@gmail.com> wrote:

> I set the heap size using HADOOP_CLIENT_OPTS all the way to 16g and still
> no luck.
>
> I tried to go down the table join route but the problem is that the
> relation is not an equality so it would be a theta join which is not
> supported in Hive.
> Basically what I am doing is a geographic intersection against 6,000
> points so the where clause has 6000 points in it (I use a custom UDF for
> the intersection).
>
> To avoid the problem I ended up writing another version of the UDF that
> reads the point list from an HDFS file.
>
> It's a low priority I'm sure but I bet there are some inefficiencies in
> the query string handling that could be fixed.  When I traced the code it
> was doing all kinds of StringBuffer and String += type stuff.
>
> Regards,
>
