pig-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Travis Woodruff (JIRA)" <j...@apache.org>
Subject [jira] [Created] (PIG-5326) Issue with auto parallelism and scalar inputs in Tez
Date Wed, 03 Jan 2018 19:26:00 GMT
Travis Woodruff created PIG-5326:
------------------------------------

             Summary: Issue with auto parallelism and scalar inputs in Tez
                 Key: PIG-5326
                 URL: https://issues.apache.org/jira/browse/PIG-5326
             Project: Pig
          Issue Type: Bug
          Components: tez
            Reporter: Travis Woodruff


I'm getting a "Scalar has more than one row in the output" error with the following script:

{code}
a = LOAD 't' as (x:chararray);
b = GROUP a BY x PARALLEL 2;
c = GROUP b by group;
d = FOREACH (GROUP a ALL) GENERATE COUNT(a) as count;
e = FOREACH c GENERATE group, d.count;
DUMP e;
{code}

If I add a PARALLEL clause to {{c}}, the error goes away, so the issue seems to be related
to auto parallelism.

I'm not very familiar with Tez, so I'm not sure how things are supposed to work, the issue
seems to be related to the following (I know almost nothing about Tez so take this with a
grain of salt):

# {{PigGraceShuffleVertexManager}} calls {{VertexImpl.reconfigureVertex()}}, which configures
the parallelism of the vertex ({{VertexImpl.numTasks}})
# The {{InputSpec}} for the scalar input is created (via {{Edge.getDestinationSpec()}}) with
{{physicalInputCount}} equal to the parallelism set above
# The input is created (in {{LogicalIOProcessorRuntimeTask.createInput()}}) based on this
{{InputSpec}}.
# The resulting {{UnorderedKVInput}} creates a {{ShuffleManager}} with {{numInputs}} = {{numPhysicalInputs}}.
 
This creates a reader that reads the scalar input {{numPhysicalInputs}} times, which results
in the "Scalar has more than one row in the output" error in {{ReadScalarsTez}}.

When parallelism is specified explicitly, {{VertexImpl.reconfigureVertex()}} is never called,
and {{numPhysicalInputs}} remains as 1 for the scalar input.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message