incubator-hcatalog-user mailing list archives

From Rajesh Balamohan <rajesh.balamo...@gmail.com>
Subject Re: HCatalog + Pig filter issue
Date Wed, 16 May 2012 13:37:23 GMT
Thanks for the reply, David.

I tried with Pig 0.9.3 as well. It had the same issue.

Would 0.4.1 fix this?
On May 16, 2012 6:57 PM, "David Capwell" <dcapwell@gmail.com> wrote:

> If I remember correctly, upgrading to Pig 0.9.3 fixes this. Or it's fixed
> in HCat 0.4.1. I can't remember which. Try Pig first, since 0.4.1 isn't out yet.
> On May 15, 2012 10:53 PM, "Rajesh Balamohan" <rajesh.balamohan@gmail.com>
> wrote:
>
>> Hi All,
>>
>> I am currently using the versions listed below. In certain scenarios the filter
>> condition is not applied and the job ends up scanning the entire dataset. A sample
>> is given below.
>>
>>
>> Pig 0.9.0
>> HCatalog 0.4.0
>> Hadoop 0.20.20x
>>
>> dim_referrer = LOAD 'tableA' USING org.apache.hcatalog.pig.HCatLoader();
>> source_data = LOAD 'tableB' USING org.apache.hcatalog.pig.HCatLoader();
>> source_data_new = FILTER source_data BY d =='20120415';
>> joined_data_referrer = JOIN source_data_new BY referrer LEFT OUTER,
>> dim_referrer BY referrer_url using 'skewed';
>> dump joined_data_referrer;
>>
>> In this case, all records are scanned and the filtering is not applied by
>> HCatalog.
>>
>> Shouldn't it apply the filter first and then run the sampling M/R job
>> required for the "skewed" join?
>>
>> Is this a known issue? Any pointers would be of great help.
>>
>>
>>
>> --
>> ~Rajesh.B
>>
>
