hadoop-pig-dev mailing list archives

From "Dmitriy V. Ryaboy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1156) Add aliases to ExecJobs and PhysicalOperators
Date Wed, 16 Dec 2009 18:00:19 GMT

    [ https://issues.apache.org/jira/browse/PIG-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791489#action_12791489 ]

Dmitriy V. Ryaboy commented on PIG-1156:
----------------------------------------

Attached patch adds a new field, alias, to ExecJob and to PhysicalOperator.

The PhysicalOperator alias is set to the alias of the LogicalOperator that was compiled into
this PO. When multiple POs are needed to represent a single LogicalOperator, all of them get
the same alias. Note that this means *there is a one-to-many correspondence* between
LogicalOperator aliases and PhysicalOperators: a single alias may appear on several POs.
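
A rough sketch of that propagation, using stand-in classes rather than the actual Pig
classes or the attached patch. LogicalOp, PhysicalOp, and compile() are hypothetical names
chosen only for illustration:

```java
import java.util.ArrayList;
import java.util.List;

class AliasPropagationSketch {
    // Hypothetical stand-in for a LogicalOperator: an alias plus the number
    // of physical operators assumed to be needed to implement it.
    static final class LogicalOp {
        final String alias;
        final int physicalOpsNeeded;

        LogicalOp(String alias, int physicalOpsNeeded) {
            this.alias = alias;
            this.physicalOpsNeeded = physicalOpsNeeded;
        }
    }

    // Hypothetical stand-in for a PhysicalOperator carrying the new alias field.
    static final class PhysicalOp {
        final String name;
        final String alias;

        PhysicalOp(String name, String alias) {
            this.name = name;
            this.alias = alias;
        }
    }

    // Every physical operator produced for one logical operator
    // receives that logical operator's alias.
    static List<PhysicalOp> compile(LogicalOp lo) {
        List<PhysicalOp> out = new ArrayList<PhysicalOp>();
        for (int i = 0; i < lo.physicalOpsNeeded; i++) {
            out.add(new PhysicalOp("po" + i, lo.alias));
        }
        return out;
    }
}
```

Because several POs can share an alias, the alias alone does not identify a single
PhysicalOperator; it only points back to the Pig Latin relation it came from.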

POStore is assigned the alias of the relation being stored -- so, "store A into ...." will
have the alias 'A'.

ExecJob also gets an alias, which is assigned to it based on the alias of its POStore.

This allows us to call pigServer.executeBatch(), get a List of ExecJobs, and identify the
ExecJobs based on the name of the relation they stored -- allowing us to get appropriate result
iterators.
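
The lookup described above might look roughly like the following. This is a self-contained
sketch, not code from the patch: StoredJob is a hypothetical stand-in for ExecJob, the
accessor name getAlias() is an assumption, and results are plain Strings rather than Tuples
so the example runs without Pig on the classpath.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AliasLookupSketch {
    // Hypothetical stand-in for ExecJob; getAlias() is an assumed accessor name.
    interface StoredJob {
        String getAlias();
        Iterator<String> getResults();
    }

    // Helper that fakes a finished job storing the given rows under an alias.
    static StoredJob job(final String alias, final String... rows) {
        return new StoredJob() {
            public String getAlias() { return alias; }
            public Iterator<String> getResults() { return Arrays.asList(rows).iterator(); }
        };
    }

    // Index the jobs returned by a batch by the alias of the relation they stored.
    // If two jobs share an alias, the later one replaces the earlier one.
    static Map<String, StoredJob> indexByAlias(List<StoredJob> jobs) {
        Map<String, StoredJob> byAlias = new LinkedHashMap<String, StoredJob>();
        for (StoredJob j : jobs) {
            byAlias.put(j.getAlias(), j);
        }
        return byAlias;
    }

    public static void main(String[] args) {
        // Stands in for the List<ExecJob> returned by pigServer.executeBatch().
        List<StoredJob> jobs = Arrays.asList(job("A", "a1", "a2"), job("B", "b1"));
        Iterator<String> it = indexByAlias(jobs).get("B").getResults();
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

A map keyed by alias is just one option; a caller that only needs one relation could
equally scan the list and stop at the first matching alias.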

Note that adding aliases to PhysicalOperators will allow us to generate more meaningful plans
and error messages, as users will be able to correlate elements of the physical plan with
their Pig Latin script. This means we are a step closer to solving PIG-908.


> Add aliases to ExecJobs and PhysicalOperators
> ---------------------------------------------
>
>                 Key: PIG-1156
>                 URL: https://issues.apache.org/jira/browse/PIG-1156
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>             Fix For: 0.7.0
>
>         Attachments: pig_batchAliases.patch
>
>
> Currently, the way to use multi-query from Java is as follows:
> 1. pigServer.setBatchOn();
> 2. register your queries with pigServer
> 3. List<ExecJob> jobs = pigServer.executeBatch();
> 4. for (ExecJob job : jobs) { Iterator<Tuple> results = job.getResults(); }
> This will cause all stores to get evaluated in a single batch. However, there is no way
> to identify which of the ExecJobs corresponds to which store. We should add aliases by which
> the stored relations are known to ExecJob in order to allow the user to identify what the
> jobs correspond to.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

