hive-user mailing list archives

From Adam <work....@gmail.com>
Subject RE: Beeline throws OOM on large input query
Date Sun, 04 Sep 2016 01:07:15 GMT
*Reply to Stephen Sprague*
*1) confirm your beeline java process is indeed running with expanded*
*memory*
I used the -XX:+PrintCommandLineFlags option, which showed:
    -XX:MaxHeapSize=17179869184
confirming the 16g setting.
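
As a quick sanity check of that figure, the reported byte count can be converted to GiB. This is only a sketch of the arithmetic; the variable names are illustrative and not part of any JVM tooling.

```python
# Convert the MaxHeapSize value printed by -XX:+PrintCommandLineFlags to GiB.
max_heap_bytes = 17179869184  # value copied from the JVM output above
BYTES_PER_GIB = 1024 ** 3     # 1 GiB = 2^30 bytes

max_heap_gib = max_heap_bytes / BYTES_PER_GIB
print(f"{max_heap_gib:g} GiB")
```

The result is exactly 16, which matches a 16g heap setting.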

*2) *
*try the hive-cli (or the python one even.)  or "beeline -u *
*jdbc:hive2://"*
I was using beeline's JDBC connect, issuing:
    !connect jdbc:hive2: ........

*3) chop down your 6K points to 3K or something smaller to see just where*
*the breaking point is*
I didn't bother, though it would be good information. I had already found a
workaround, and troubleshooting beeline wasn't my primary goal :)

*Reply to Markovitz, Dudu*
The query is basically finding geometry intersections.
If you are familiar with PostGIS, it is a Java version of the PostGIS
function ST_Intersects (http://postgis.net/docs/ST_Intersects.html) wrapped
in a Hive UDF.

We are checking intersection of a table's geometry column with a set of N
geometries (6000+ in this case).

*select from table*
*where st_intersects(table.geom, g1) OR st_intersects(table.geom, g2), etc.*
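
To illustrate how a query of this shape grows with N, here is a minimal sketch that generates the OR chain. The helper name build_intersects_query, the table name mytable, the SELECT * column list, and the g1..gN placeholders are all assumptions for illustration; the real query would carry actual geometry expressions in place of the placeholders.

```python
def build_intersects_query(table, geom_col, geom_exprs):
    """Build a SELECT whose WHERE clause ORs one st_intersects call per geometry."""
    if not geom_exprs:
        raise ValueError("need at least one geometry expression")
    predicates = [
        f"st_intersects({table}.{geom_col}, {g})" for g in geom_exprs
    ]
    # One predicate per line, joined with OR.
    return f"SELECT * FROM {table}\nWHERE " + "\n   OR ".join(predicates)


# Placeholder geometry expressions g1..g3 (hypothetical stand-ins).
geoms = [f"g{i}" for i in range(1, 4)]
query = build_intersects_query("mytable", "geom", geoms)
print(query)
```

With 6000+ geometries the generated statement contains 6000+ predicates, so the query text itself becomes very large before it is even submitted.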

Unfortunately, doing it with a table join requires a theta (non-equality)
join condition, which Hive doesn't support, something like

*select from table inner join reftable on st_intersects(table.geom,
reftable.geom)*

I tried pushing down the predicate, but that required a cross join, which was
not feasible for the huge table sizes.
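
A rough sketch of why the cross join blows up: every row of the big table gets paired with every reference geometry before the predicate filters anything. The table row count below is purely hypothetical (the actual size isn't stated here); only the 6000 figure comes from this thread.

```python
def cross_join_rows(table_rows, ref_rows):
    """Number of row pairs a cross join produces before any filtering."""
    return table_rows * ref_rows


table_rows = 1_000_000_000  # hypothetical size; the real table size is not given
ref_rows = 6000             # the "6000+" reference geometries mentioned above

pairs = cross_join_rows(table_rows, ref_rows)
print(f"{pairs:,} candidate pairs to evaluate with st_intersects")
```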
