tajo-dev mailing list archives

From Jihoon Son <jihoon...@apache.org>
Subject Re: Parallel Aggregates
Date Mon, 15 Jun 2015 23:55:29 GMT
Hi Atri, thanks for your question.

First of all, if you have not already done so, I recommend that you read this article
<http://www.hadoopsphere.com/2015/02/technical-deep-dive-into-apache-tajo.html>
before you start the implementation. It was written by Hyunsik and describes
Tajo's overall infrastructure. After reading it, I think you will be able to
ask more detailed questions.

Here, I'll roughly list some important classes for aggregate implementation.

   - SQLParser.g4 contains our SQL parsing rules. It is written in ANTLR.
   - SQLAnalyzer is our parser, based on the rules defined in SQLParser.g4.
   - SQLAnalyzer translates a SQL query into a tree of Expr, which
   represents an algebraic expression.
   - LogicalPlanner translates the Expr tree into a LogicalPlan that
   logically describes how the given query will be executed.
   - GlobalPlanner translates the LogicalPlan into a MasterPlan
   (distributed query execution plan) that describes how the given query will
   be executed in a distributed cluster.
   - Once a MasterPlan is created, QueryMaster starts executing the query.
   A query consists of multiple stages, which are processed individually
   in some order.
      - For example, a simple aggregation query is executed in two stages:
      the first performs the aggregation in parallel, and the second combines
      the resulting aggregates. These stages are executed sequentially.
   - A stage is concurrently processed by multiple tasks, and is executed
   by TajoWorker.
   - Each task contains meta information about its input data and the
   LogicalPlan of the stage. This LogicalPlan is translated into a
   PhysicalExec by PhysicalPlanner.
   - PhysicalExec describes how the query is actually executed.
      - For example, there are two types of AggregationExec,
      i.e., HashAggregateExec and SortAggregateExec, for hash-based aggregation
      and sort-based aggregation, respectively.
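
To make the two-stage idea above concrete, here is a minimal, self-contained
sketch in plain Java. It is not Tajo code: the class and method names
(TwoStageAggregationSketch, Row, AvgState, partialAggregate, combine, run) are
all hypothetical, and a parallel stream merely stands in for the tasks of the
first stage. It only shows the general shape of a hash-based partial
aggregation per partition followed by a combining step, using AVG as the
example aggregate.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Illustrative only: none of these names come from the Tajo code base. */
class TwoStageAggregationSketch {

    /** A (group key, value) input row. */
    static final class Row {
        final String key;
        final long value;

        Row(String key, long value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Partial state for AVG: a running sum and a row count. */
    static final class AvgState {
        long sum;
        long count;

        void add(long value) { sum += value; count++; }

        void merge(AvgState other) { sum += other.sum; count += other.count; }

        double result() { return (double) sum / count; }
    }

    /** Stage 1: hash-based partial aggregation over a single partition. */
    static Map<String, AvgState> partialAggregate(List<Row> partition) {
        Map<String, AvgState> states = new HashMap<>();
        for (Row row : partition) {
            states.computeIfAbsent(row.key, k -> new AvgState()).add(row.value);
        }
        return states;
    }

    /** Stage 2: merge the partial states of all partitions, then finalize. */
    static Map<String, Double> combine(List<Map<String, AvgState>> partials) {
        Map<String, AvgState> merged = new HashMap<>();
        for (Map<String, AvgState> partial : partials) {
            for (Map.Entry<String, AvgState> e : partial.entrySet()) {
                merged.computeIfAbsent(e.getKey(), k -> new AvgState())
                      .merge(e.getValue());
            }
        }
        Map<String, Double> results = new HashMap<>();
        for (Map.Entry<String, AvgState> e : merged.entrySet()) {
            results.put(e.getKey(), e.getValue().result());
        }
        return results;
    }

    /** Runs stage 1 on every partition in parallel, then stage 2 once. */
    static Map<String, Double> run(List<List<Row>> partitions) {
        List<Map<String, AvgState>> partials = partitions.parallelStream()
                .map(TwoStageAggregationSketch::partialAggregate)
                .collect(Collectors.toList());
        return combine(partials);
    }

    public static void main(String[] args) {
        List<List<Row>> partitions = List.of(
                List.of(new Row("a", 1), new Row("b", 10)),
                List.of(new Row("a", 3), new Row("b", 20), new Row("b", 30)));
        // Group "a" averages (1 + 3) / 2 and group "b" averages (10 + 20 + 30) / 3.
        System.out.println(run(partitions));
    }
}
```

Note that the first stage emits a (sum, count) pair rather than a finished
average, because averages of averages are not correct in general. Any aggregate
that is split across two stages needs such a mergeable intermediate state.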

Best regards,
Jihoon

On Mon, Jun 15, 2015 at 11:32 PM, Atri Sharma <atri.jiit@gmail.com> wrote:

> Folks,
>
> I am looking into parallel aggregates/combining aggregates. I have a plan
> around it which I think can work.
>
> Please update me on current infrastructure and point me around the existing
> code base. Also, ideas would be most welcome around it.
>
> --
> Regards,
>
> Atri
> *l'apprenant*
>
