incubator-hcatalog-user mailing list archives

From Rajesh Balamohan <rajesh.balamo...@gmail.com>
Subject HCatalog + Pig filter issue
Date Wed, 16 May 2012 05:53:22 GMT
Hi All,

I am currently using the versions listed below. In certain scenarios the
filter condition is not applied, and the job ends up scanning the entire
data set. A sample script follows the version list.


Pig 0.9.0
HCatalog 0.4.0
Hadoop 0.20.20x

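-- Load both tables through HCatalog.
dim_referrer = LOAD 'tableA' USING org.apache.hcatalog.pig.HCatLoader();
source_data = LOAD 'tableB' USING org.apache.hcatalog.pig.HCatLoader();
-- Restrict tableB to a single value of d before the join.
source_data_new = FILTER source_data BY d == '20120415';
-- Skewed left outer join of the filtered data against the dimension table.
joined_data_referrer = JOIN source_data_new BY referrer LEFT OUTER,
    dim_referrer BY referrer_url USING 'skewed';
DUMP joined_data_referrer;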

In this case, all records of tableB are scanned; HCatalog does not apply
the filter.

Shouldn't it apply the filter first and then do the sampling M/R job
required for "skewed" join?

Is this a known issue? Any pointers would be of great help.



-- 
~Rajesh.B
