pig-dev mailing list archives

From "Cheolsoo Park (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PIG-3555) Initial implementation of combiner optimization
Date Wed, 06 Nov 2013 11:07:18 GMT

     [ https://issues.apache.org/jira/browse/PIG-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-3555:

    Attachment: PIG-3555-1.patch

Attaching the first patch. Review Board link: https://reviews.apache.org/r/15261/

> Initial implementation of combiner optimization
> -----------------------------------------------
>                 Key: PIG-3555
>                 URL: https://issues.apache.org/jira/browse/PIG-3555
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>    Affects Versions: tez-branch
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: tez-branch
>         Attachments: PIG-3555-1.patch
> To support algebraic UDFs and others, a combiner is required. To start with, I am proposing
> the following initial implementation:
> * In Tez, the combiner runs as part of ShuffledMergedInput on edges, so multiple combine
>   plans (one per edge) need to be registered in a destination vertex. Each vertex is mapped
>   to a TezOperator in the Tez plan, so an array of combine plans will be stored in the
>   TezOperator that maps to a destination vertex.
> * To register combine plans in a TezOperator, we will run a CombinerOptimizer on the
>   Tez plan after TezCompiler generates it but before TezDagBuilder converts it into a DAG.
> * Finally, TezDagBuilder will insert combine plans into the payload of ShuffledMergedInput
>   while constructing a destination vertex.
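
The three steps above can be illustrated with a small toy model. This is only a sketch under stated assumptions: SketchOperator, optimize, and buildInputPayloads are hypothetical stand-ins, not the real TezOperator, CombinerOptimizer, or TezDagBuilder, and a map keyed by source name stands in for the array of combine plans. Combine plans and payloads are plain strings here, whereas the real ones would be serialized physical plans.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the proposal; every class and method here is hypothetical. */
public class CombinerSketch {

    /** Stand-in for an operator in the Tez plan; one operator maps to one vertex. */
    static class SketchOperator {
        final String name;
        final boolean usesAlgebraicUdf;              // assumed flag: is a combiner worthwhile?
        final List<SketchOperator> sources = new ArrayList<>();
        // One combine plan per incoming edge, keyed by the source operator's name.
        final Map<String, String> combinePlans = new LinkedHashMap<>();

        SketchOperator(String name, boolean usesAlgebraicUdf) {
            this.name = name;
            this.usesAlgebraicUdf = usesAlgebraicUdf;
        }
    }

    /** Optimizer step: runs after the plan is compiled and before the DAG is built. */
    static void optimize(List<SketchOperator> plan) {
        for (SketchOperator dest : plan) {
            if (!dest.usesAlgebraicUdf) {
                continue;                            // nothing to combine for this vertex
            }
            for (SketchOperator src : dest.sources) {
                // Register one combine plan on the destination for each incoming edge.
                dest.combinePlans.put(src.name, "combine[" + src.name + "->" + dest.name + "]");
            }
        }
    }

    /** DAG-builder step: copy each combine plan into the payload of the matching input. */
    static Map<String, String> buildInputPayloads(SketchOperator dest) {
        Map<String, String> payloads = new LinkedHashMap<>();
        for (SketchOperator src : dest.sources) {
            String plan = dest.combinePlans.get(src.name);
            payloads.put(src.name + "->" + dest.name, plan == null ? "no-combiner" : plan);
        }
        return payloads;
    }

    /** Two source operators feeding one destination that uses an algebraic UDF. */
    static List<SketchOperator> samplePlan() {
        SketchOperator load1 = new SketchOperator("load1", false);
        SketchOperator load2 = new SketchOperator("load2", false);
        SketchOperator group = new SketchOperator("group", true);
        group.sources.add(load1);
        group.sources.add(load2);
        List<SketchOperator> plan = new ArrayList<>();
        plan.add(load1);
        plan.add(load2);
        plan.add(group);
        return plan;
    }

    public static void main(String[] args) {
        List<SketchOperator> plan = samplePlan();
        optimize(plan);
        System.out.println(buildInputPayloads(plan.get(2)));
    }
}
```

The point of keying the plans by edge is that a destination vertex with several inputs may need a different combine plan for each of them, which is why the proposal stores a collection on the destination rather than a single plan.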
> This initial implementation will allow us to run algebraic UDFs. In the future, we can
> implement more optimizations for limit, order-by, etc. on top of this.
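
For readers unfamiliar with why algebraic UDFs depend on a combiner, the following is a minimal sketch of a COUNT-style function split into initial, intermediate, and final stages, the same three-stage shape that Pig's Algebraic interface describes. The class and method names are hypothetical and the code does not use Pig or Tez; each inner list simply stands in for the records seen by one upstream task.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of an algebraic COUNT; not Pig code. */
public class AlgebraicCountSketch {

    /** Initial stage: each input record contributes a partial count of 1. */
    static long initial(String record) {
        return 1L;
    }

    /** Intermediate stage (what a combiner would run): fold partial counts into one. */
    static long intermediate(List<Long> partials) {
        long sum = 0L;
        for (long p : partials) {
            sum += p;
        }
        return sum;
    }

    /** Final stage: for COUNT this is the same fold over the combined partials. */
    static long finalStage(List<Long> partials) {
        return intermediate(partials);
    }

    /** Counts records by combining inside each partition before the final merge. */
    static long countWithCombiner(List<List<String>> partitions) {
        List<Long> combined = new ArrayList<>();
        for (List<String> partition : partitions) {
            List<Long> partials = new ArrayList<>();
            for (String record : partition) {
                partials.add(initial(record));
            }
            // Only one value per partition crosses the edge instead of one per record.
            combined.add(intermediate(partials));
        }
        return finalStage(combined);
    }

    public static void main(String[] args) {
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("c"));
        System.out.println(countWithCombiner(partitions));
    }
}
```

Because the intermediate stage can be applied to any subset of partial results without changing the final answer, it is safe to run it on the shuffle edge, which is the role the combine plan plays in the proposal.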

