incubator-hcatalog-user mailing list archives

From David Capwell <dcapw...@gmail.com>
Subject Re: HCatalog + Pig filter issue
Date Wed, 16 May 2012 15:53:16 GMT
Not at a computer right now, so I can't check JIRA, but this should be fixed
in hcat 0.4.1.

You should be able to compile trunk or branch 4.  I live off trunk and I
remember this being fixed a while ago.
On May 16, 2012 6:37 AM, "Rajesh Balamohan" <rajesh.balamohan@gmail.com>
wrote:

> Thanks for the reply David.
>
> I tried with pig 0.9.3 as well. It had the same issue.
>
> Would 0.4.1 fix this?
> On May 16, 2012 6:57 PM, "David Capwell" <dcapwell@gmail.com> wrote:
>
>> If I remember correctly, upgrading to pig 0.9.3 fixes this.  Or it's fixed
>> in hcat 0.4.1. Can't remember which. Try pig first since 0.4.1 isn't out.
>> On May 15, 2012 10:53 PM, "Rajesh Balamohan" <rajesh.balamohan@gmail.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> I am currently using the following setup. In certain scenarios the filter
>>> condition is not applied and the job ends up scanning the entire data set.
>>> A sample is given below.
>>>
>>>
>>> Pig 0.9.0
>>> HCatalog 0.4.0
>>> Hadoop 0.20.20x
>>>
>>> dim_referrer = LOAD 'tableA' USING org.apache.hcatalog.pig.HCatLoader();
>>> source_data = LOAD 'tableB' USING org.apache.hcatalog.pig.HCatLoader();
>>> source_data_new = FILTER source_data BY d == '20120415';
>>> joined_data_referrer = JOIN source_data_new BY referrer LEFT OUTER,
>>> dim_referrer BY referrer_url USING 'skewed';
>>> DUMP joined_data_referrer;
>>>
>>> In this case, all records are scanned and the filtering is not applied
>>> by HCatalog.
>>>
>>> Shouldn't it apply the filter first and then do the sampling M/R job
>>> required for "skewed" join?
>>>
>>> Is this a known issue? Any pointers would be of great help.
>>>
>>>
>>>
>>> --
>>> ~Rajesh.B
>>>
>>
