incubator-hcatalog-user mailing list archives

From Timothy Potter <thelabd...@gmail.com>
Subject Weird behavior with partition filter push-down not working in Pig 0.10 when producing more than one relation with different filters
Date Sat, 26 Jan 2013 19:48:13 GMT
I have a Pig script that loads data from HCatalog. I filter immediately
after the load, and my filter includes criteria on my partitions. The
push-down works as expected in this scenario. Here's an example of the
Pig code:

all_rows = load 'some_hive_table' using org.apache.hcatalog.pig.HCatLoader();
filtered_rows = foreach (filter all_rows by (datetime_partition >= '$ROWS_30_DAYS_AGO'))
   generate ...;

In this case, my partition field is datetime_partition.


However, I also need another set of rows from my "some_hive_table"
(actual name obfuscated) later in the script such as:

filtered_rows = foreach (filter all_rows by (datetime_partition >= '$ROWS_1_DAYS_AGO'))
   generate ...;

What I'm finding is that Pig ends up doing a full-table scan across
all partitions, i.e., the push-down doesn't occur.

I tried re-loading the table for the second filter, but that gave a
weird error: "Could not resolve org.apache.hcatalog.pig.HCatLoader
using imports: [com.dachisgroup.analytics.pig.storage., ,
org.apache.pig.builtin., org.apache.pig.impl.builtin.]" ... Here's the
code that produced the error:

all_rows_2ndpass = load 'some_hive_table' using org.apache.hcatalog.pig.HCatLoader();
filtered_rows = foreach (filter all_rows_2ndpass by (datetime_partition >= '$ROWS_1_DAYS_AGO'))
   generate ...;

Is this expected behavior with Pig 0.10? I suppose I could split up
the script into two parts, but that's not ideal.

Cheers,
Tim
