The issue was not specific to filter-union.
The fix was to run PushUpFilter before PartitionFilterOptimizer.
As this is not an hcat issue, it should not matter if you have an older hcat version. FYI, this bug was not there in pig 0.8.x.
Was it pig 0.9.0 or 0.9.1 that you used?
On 4/24/12 5:21 PM, Aniket Mokashi wrote:
Can you point me to the jira that fixes the filter-union problem (in pig)? I
haven't tried hcat-0.4 yet, good to know about that issue. I will keep a
On Tue, Apr 24, 2012 at 4:51 PM, Thejas Nair wrote:
Are you using pig 0.9 or 0.9.1?
If yes, can you try with pig 0.9.2?
Wondering if you are also hitting the issue that Thomas mentioned.
On 4/23/12 7:39 PM, Aniket Mokashi wrote:
Something similar I have noticed is -
A = load ...
B1 = filter A by cond1;
B2 = filter A by cond2;
B3 = filter A by cond3;
B = union B1, B2, B3;
does not push projection.
Is that expected?
Ideally, we should have a "strict" mode under hcatalog that, when
on, will avoid executing pig queries on the full (partitioned) table.
On Mon, Apr 23, 2012 at 7:32 PM, Rajesh Balamohan wrote:
Thanks for the quick response.
I am using HCatalog 0.4.
With simple PIG script it works great. HCatalog beautifully
only the relevant information. However, full scan happens
we have couple of additional joins and when we change the
order (we also use "using skewed").
Though we have looked into the debug logs, we saw the
number of records from the JobTracker's counters itself. Without
pruning, the m/r job was pretty much scanning the entire set
I am not sure if there is a corner case, where in "skewed"
trying to override the filtering.
On Tue, Apr 24, 2012 at 2:13 AM, Alan Gates wrote:
What version of HCatalog are you using? How do you know
it is scanning all the partitions, does it say so in the logs,
or are you getting all the records back?
And yes, HCat is supposed to do partition pruning so that it
only scans the required partitions.
On Apr 21, 2012, at 8:27 PM, Rajesh Balamohan wrote:
> Hi All,
> I have a hcatalog table "partitioned by (d string)".
> I have a couple of days' worth of data, and when I run "show partitions" it provides the correct data.
> However, when I run Pig with "filter a by d == '20120415'", it ends up scanning all data.
> Is this a known bug/enhancement in HCatalog? Ideally, shouldn't it scan only the d=20120415 directory?
> Any pointers would be of great help.
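
For reference, the setup described in this question might look like the following in Pig Latin. This is only a sketch: the table name `mytable` and the alias names are hypothetical, and the loader class `org.apache.hcatalog.pig.HCatLoader` is assumed from the HCatalog 0.4 release mentioned in the thread. Only the partition column `d` and the filter value come from the original message.

```pig
-- Hypothetical table name; the thread does not give the real one.
a = LOAD 'mytable' USING org.apache.hcatalog.pig.HCatLoader();

-- Filter on the partition column d, as in the original question.
-- Keeping this filter directly after the LOAD gives the optimizer
-- the simplest case for partition pruning.
b = FILTER a BY d == '20120415';

-- Print the plans for b without running the job.
EXPLAIN b;
```

When the script is actually run (for example by replacing `EXPLAIN b;` with `DUMP b;`), the record counters in the JobTracker, as used elsewhere in this thread, are one way to check whether only the d=20120415 partition was read.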