If I remember correctly, upgrading to Pig 0.9.3 fixes this. Or it's fixed in HCatalog 0.4.1; I can't remember which. Try Pig first, since 0.4.1 isn't out yet.
Hi All, I am currently using the following script. In certain scenarios the filter condition is not applied and the job ends up scanning the entire data set. A sample is given below.
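To check whether the filter reaches the loader before and after upgrading, one option is to have Pig print its plans for the final alias. This is only a sketch based on the script quoted in the question. It assumes `d` is a partition column of tableB, which the thread does not state. If a FILTER on `d` still shows up as a separate operator in the optimized logical plan, that may mean it was not handed to HCatLoader as a partition filter.

```pig
-- Same script as in the quoted question, with an EXPLAIN added at the end.
dim_referrer = LOAD 'tableA' USING org.apache.hcatalog.pig.HCatLoader();
source_data = LOAD 'tableB' USING org.apache.hcatalog.pig.HCatLoader();
-- Assumption: d is a partition column of tableB.
source_data_new = FILTER source_data BY d == '20120415';
joined_data_referrer = JOIN source_data_new BY referrer LEFT OUTER, dim_referrer BY referrer_url USING 'skewed';
-- Print the logical, physical, and MapReduce plans without running the job.
EXPLAIN joined_data_referrer;
```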
dim_referrer = LOAD 'tableA' USING org.apache.hcatalog.pig.HCatLoader();
source_data = LOAD 'tableB' USING org.apache.hcatalog.pig.HCatLoader();
source_data_new = FILTER source_data BY d == '20120415';
joined_data_referrer = JOIN source_data_new BY referrer LEFT OUTER, dim_referrer BY referrer_url USING 'skewed';
In this case, all records are scanned and the filter is not applied by HCatalog.
Shouldn't it apply the filter first and then run the sampling M/R job required for the "skewed" join? Is this a known issue? Any pointers would be of great help.