hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Markovitz, Dudu" <dmarkov...@paypal.com>
Subject RE: Beeline throws OOM on large input query
Date Tue, 06 Sep 2016 13:06:34 GMT
Hi Adam

Are you familiar with MBR (Minimum Bounding Rectangle) and tessellations technics?
You might be able to significantly reduce the number of potentially intersected polygons.

Dudu

From: Adam [mailto:work.asr@gmail.com]
Sent: Sunday, September 04, 2016 4:07 AM
To: user@hive.apache.org
Subject: RE: Beeline throws OOM on large input query

Reply to Stephen Sprague
1) confirm your beeline java process is indeed running with expanded
memory
I used the -XX:+PrintCommandLineFlags which showed:
    -XX:MaxHeapSize=17179869184
confirming the 16g setting.

2)
try the hive-cli (or the python one even.)  or "beeline -u
jdbc:hive2://"
I was using the beeline jdbc connect:
  issuing: !connect jdbc:hive2: ........

3) chop down your 6K points to 3K or something smaller to see just where
the breaking point is
I didn't bother though it would be good information since I found a work around and troubleshooting
beeline wasn't my primary goal :)

Reply to Markovitz, Dudu
The query is basically finding geometry intersections.
If you are familiar with Postgis, it is a Java version of the Postgis function ST_Intersects
(http://postgis.net/docs/ST_Intersects.html) wrapped in a Hive UDF.

We are checking intersection of a table's geometry column with a set of N geometries (6000+
in this case).

select from table
where st_intersects(table.geom, g1) OR st_intersects(table.geom, g2), etc.

Unfortunately doing it with a table join requires a theta condition which Hive doesn't support,
something like

select from table inner join reftable on st_intersects(table.geom, reftable.geom)

I tried pushing down the predicate but that required a cross join which was not feasible for
the huge table sizes.




Mime
View raw message