hive-user mailing list archives

From John Omernik <j...@omernik.com>
Subject View Partition Pruning not Occurring during transform
Date Wed, 10 Oct 2012 19:04:11 GMT
Greetings all. I am trying to incorporate a TRANSFORM into a view, so that
we can abstract the transform script away from the user.



As a test, I have a table partitioned on day (formatted as YYYY-MM-DD) with
lots of partitions, and I tried this:

CREATE VIEW view_transform AS
SELECT TRANSFORM (day, ip) USING 'cat' AS (day, ip) FROM source_table;

The reason I used 'cat' in my test is that, if this works, I will distribute
my transform scripts to each node manually. I know every node has cat, so it
works as a test.

When I run

SELECT * FROM view_transform WHERE day = '2012-10-08';

10,432 map tasks get spun up.

If I rewrite the view to be

CREATE VIEW view_transform AS
SELECT TRANSFORM (day, ip) USING 'cat' AS (day, ip) FROM source_table
WHERE day = '2012-10-08';

Then only 16 map tasks get spun up. That is the desired behavior, but the
pruning is happening in the view definition, not in the query against the
view.
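
One way to check where the pruning happens without launching the job might
be to compare the query plans. This is only a sketch: it assumes that
EXPLAIN EXTENDED on your Hive version lists the input paths and partitions
the plan will read, and the exact output format varies between versions.

```sql
-- Filter applied outside the view. If pruning is not happening, the plan
-- would be expected to reference every partition of source_table.
EXPLAIN EXTENDED
SELECT * FROM view_transform WHERE day = '2012-10-08';

-- Same transform with the filter applied before TRANSFORM, for comparison.
-- Here the plan would be expected to reference only the 2012-10-08 partition.
EXPLAIN EXTENDED
SELECT TRANSFORM (day, ip) USING 'cat' AS (day, ip)
FROM source_table
WHERE day = '2012-10-08';
```

If the first plan lists all partitions and the second lists one, that would
match the map task counts above.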

Thus I wanted input on whether this should be considered a bug. That is,
should we be able to specify a partition filter on a view that uses a
transform and have normal pruning occur, even though the partition column
is also passed to the transform script? I think we should, and it is likely
doable somehow. This would be awesome for a number of situations where you
want to expose "transformed" data to analysts without the mess of having
them set up the transform script themselves.
