From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/DeveloperGuide" by AshishThusoo
Date Mon, 15 Dec 2008 20:20:48 GMT
Summaries of the various query processor components.

  === SerDe ===
  === MetaStore ===
  === Query Processor ===
+ The following are the main components of the Hive Query Processor:
+  * Parse and SemanticAnalysis (ql/parse) - This component contains the code for parsing
SQL, converting it into Abstract Syntax Trees, converting the Abstract Syntax Trees into Operator
Plans and finally converting the operator plans into a directed graph of tasks which are executed
by Driver.java.
+  * Optimizer (ql/optimizer) - This component contains some simple rule-based optimizations,
such as pruning unreferenced columns from table scans (column pruning), that the Hive Query
Processor applies while converting SQL to a series of map/reduce tasks.
+  * Plan Components (ql/plan) - This component contains the classes (called descriptors)
that the compiler (Parser, SemanticAnalysis and Optimizer) uses to pass information
to the operator trees, where it is used by the execution code.
+  * MetaData Layer (ql/metadata) - This component is used by the query processor to interface
with the MetaStore in order to retrieve information about tables, partitions and the columns
of the table. This information is used by the compiler to compile SQL to a series of map/reduce tasks.
+  * Map/Reduce Execution Engine (ql/exec) - This component contains all the query operators
and the framework that is used to invoke those operators from within the map/reduce tasks.
+  * Hadoop Record Readers, Input and Output Formatters for Hive (ql/io) - This component
contains the record readers and the input and output formatters that Hive registers with a Hadoop job.
+  * Sessions (ql/session) - A rudimentary session implementation for Hive.
+  * Type interfaces (ql/typeinfo) - This component provides all the type information for
table columns that is retrieved from the MetaStore and the SerDes.
+  * Hive Function Framework (ql/udf) - Framework and implementation of Hive operators, functions
and aggregate functions. This component also contains the interfaces that a user can implement
to create user-defined functions.
+  * Tools (ql/tools) - Some simple tools provided by the query processing framework. Currently,
this component contains the implementation of the lineage tool that can parse the query and
show the source and destination tables of the query.
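
As a rough illustration of the column pruning mentioned under the Optimizer, the sketch below keeps only the columns of a table scan that the query references. It is written in Python for brevity and is not Hive code: the function name `prune_columns` and the sample column names are hypothetical, and the real optimizer in ql/optimizer works on operator trees rather than on plain lists.

```python
def prune_columns(table_columns, referenced):
    """Return the table columns that the query references, in table order.

    Hypothetical helper for illustration only; it is not part of Hive.
    """
    wanted = set(referenced)
    return [col for col in table_columns if col in wanted]


if __name__ == "__main__":
    # A table with four columns, scanned by a query that only uses two of them.
    table = ["userid", "page_url", "ip", "ts"]
    print(prune_columns(table, ["ts", "userid"]))
```

Reading fewer columns from each table scan reduces the data that later map/reduce stages have to carry, which is why a rule this simple is still worthwhile.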
  ==== Compiler ====
  ==== Parser ====
  ==== TypeChecking ====

