pig-dev mailing list archives

From "Cheolsoo Park (JIRA)" <j...@apache.org>
Subject [jira] [Created] (PIG-3510) New filter extractor fails with more than one filter statement
Date Wed, 09 Oct 2013 23:04:42 GMT
Cheolsoo Park created PIG-3510:
----------------------------------

             Summary: New filter extractor fails with more than one filter statement
                 Key: PIG-3510
                 URL: https://issues.apache.org/jira/browse/PIG-3510
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: 0.12.0
            Reporter: Cheolsoo Park
            Assignee: Cheolsoo Park
             Fix For: 0.12.1


This is a regression introduced by PIG-3461, the rewrite of the partition filter optimizer.
The following example demonstrates the problem:
{code:title=two filters}
b = FILTER a BY (dateint >= 20130901 AND dateint <= 20131001);
c = FILTER b BY (event_id == 419 OR event_id == 418);
{code}
{code:title=one filter}
b = FILTER a BY (dateint >= 20130901 AND dateint <= 20131001) AND (event_id == 419 OR event_id == 418);
{code}
Both dateint and event_id are partition columns. In the one-filter case, the whole expression
is pushed down, whereas in the two-filter case only (event_id == 419 OR event_id == 418) is
pushed down.

The reason is that the new filter extractor stores the pushdown expression in a single field,
so visiting the second statement overwrites the expression extracted from the first.
{code}
private Expression pushdownExpr = null;
{code}
The old filter extractor kept the pushdown expressions in an ArrayList and combined them with
AND at the end.
{code}
private ArrayList<Expression> pColConditions = new ArrayList<Expression>();
{code}
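The difference between the two approaches can be illustrated with a small standalone sketch. It is not the Pig code: conditions are modelled as plain strings instead of Expression objects, and the class and method names (FilterExtractorSketch, extractOverwriting, extractAccumulating) are hypothetical. Only the field names pushdownExpr and pColConditions are taken from the snippets above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the filter extractor; conditions are plain strings.
class FilterExtractorSketch {

    // Mimics the new extractor: one slot that every visited statement overwrites.
    static String extractOverwriting(List<String> filterConditions) {
        String pushdownExpr = null;
        for (String cond : filterConditions) {
            pushdownExpr = cond; // the condition from the previous statement is lost
        }
        return pushdownExpr;
    }

    // Mimics the old extractor: collect every condition, then join them with AND.
    static String extractAccumulating(List<String> filterConditions) {
        List<String> pColConditions = new ArrayList<String>();
        for (String cond : filterConditions) {
            pColConditions.add(cond);
        }
        if (pColConditions.isEmpty()) {
            return null; // nothing to push down
        }
        StringBuilder combined = new StringBuilder();
        for (String cond : pColConditions) {
            if (combined.length() > 0) {
                combined.append(" AND ");
            }
            combined.append("(").append(cond).append(")");
        }
        return combined.toString();
    }

    public static void main(String[] args) {
        // The two FILTER statements from the example, in visiting order.
        List<String> filters = Arrays.asList(
                "dateint >= 20130901 AND dateint <= 20131001",
                "event_id == 419 OR event_id == 418");
        System.out.println("overwriting:  " + extractOverwriting(filters));
        System.out.println("accumulating: " + extractAccumulating(filters));
    }
}
```

With the two statements from the example, the overwriting variant returns only the event_id condition, while the accumulating variant returns both conditions joined with AND, which matches what the one-filter case pushes down.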



--
This message was sent by Atlassian JIRA
(v6.1#6144)
