incubator-hcatalog-user mailing list archives

From David Capwell <dcapw...@gmail.com>
Subject Re: HCatalog + Pig filter issue
Date Wed, 16 May 2012 13:27:05 GMT
If I remember correctly, upgrading to Pig 0.9.3 fixes this. Or it's fixed in
HCatalog 0.4.1; I can't remember which. Try Pig first, since 0.4.1 isn't out.
On May 15, 2012 10:53 PM, "Rajesh Balamohan" <rajesh.balamohan@gmail.com>
wrote:

> Hi All,
>
> I am currently using the following versions. In certain scenarios the filter
> condition is not applied and the job ends up scanning the entire data set. A
> sample is given below.
>
>
> Pig 0.9.0
> HCatalog 0.4.0
> Hadoop 0.20.20x
>
> dim_referrer = LOAD 'tableA' USING org.apache.hcatalog.pig.HCatLoader();
> source_data = LOAD 'tableB' USING org.apache.hcatalog.pig.HCatLoader();
> source_data_new = FILTER source_data BY d =='20120415';
> joined_data_referrer = JOIN source_data_new BY referrer LEFT OUTER,
> dim_referrer BY referrer_url USING 'skewed';
> DUMP joined_data_referrer;
>
> In this case, all records are scanned and the filtering is not applied by
> HCatalog.
>
> Shouldn't it apply the filter first and then do the sampling M/R job
> required for "skewed" join?
>
> Is this a known issue? Any pointers would be of great help.
>
>
>
> --
> ~Rajesh.B
>
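The ordering asked about in the quoted message (apply the filter first, then run the sampling pass of the skewed join) can be modelled with a small Python sketch. It assumes `d` is a partition column of `tableB`, which the original message does not state, and every name in it (`prune_partitions`, `sample_key_counts`, `rows_scanned`) is hypothetical; it is not HCatalog's or Pig's API. It only illustrates why pruning partitions before sampling shrinks what the sampling job has to read.

```python
import random
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical in-memory model of a partitioned table:
# partition value of "d" -> rows in that partition. NOT HCatalog's API.
Table = Dict[str, List[dict]]


def prune_partitions(table: Table, predicate: Callable[[str], bool]) -> Table:
    """Keep only partitions whose key satisfies the predicate,
    which is what pushing the filter down to the loader would achieve."""
    return {d: rows for d, rows in table.items() if predicate(d)}


def sample_key_counts(table: Table, key: str, rate: float, seed: int = 0) -> Counter:
    """Model of the sampling pass a skewed join runs to estimate the
    join-key distribution. It only reads the partitions it is given."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    for rows in table.values():
        for row in rows:
            if rng.random() < rate:  # random() is in [0, 1), so rate=1.0 keeps every row
                counts[row[key]] += 1
    return counts


def rows_scanned(table: Table) -> int:
    """Number of rows a full pass over the table would read."""
    return sum(len(rows) for rows in table.values())


# Illustrative data only: two partitions of a table like tableB.
source_data: Table = {
    "20120414": [{"referrer": "a"}] * 1000,
    "20120415": [{"referrer": "b"}] * 10,
}
pruned = prune_partitions(source_data, lambda d: d == "20120415")

print(rows_scanned(source_data), rows_scanned(pruned))  # 1010 10
print(sample_key_counts(pruned, "referrer", 1.0))       # Counter({'b': 10})
```

In this model, the reported problem corresponds to the sampling pass being handed `source_data` instead of `pruned`, so it reads every partition even though the join itself only needs one.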
