drill-dev mailing list archives

From "Jason Altekruse (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-2077) Provide a clear starting point for new developers about what to start reading to learn about Drill
Date Tue, 27 Jan 2015 00:42:35 GMT
Jason Altekruse created DRILL-2077:

             Summary: Provide a clear starting point for new developers about what to start reading to learn about Drill
                 Key: DRILL-2077
                 URL: https://issues.apache.org/jira/browse/DRILL-2077
             Project: Apache Drill
          Issue Type: Improvement
            Reporter: Jason Altekruse
            Assignee: Jason Altekruse

As part of my package-level Javadocs posted in DRILL-1904, I tried to document the root
org.apache.drill.exec package. We should have some good information there, as well as in the
Markdown file in the Git repo, about the best place to start reading the code to understand
how Drill operates.

Here is a description I started. I think we want to make sure this is informative but concise.
I want to get the rest of the package docs in, so I am leaving this here as a TODO. Please
feel free to comment on, revise, or add to it.

 * A good place to start learning about Drill is exploring the query plans. A
 * Drill physical plan is defined as a connected graph of operators that read
 * and manipulate data. Operators are configured by implementations of the {@link
 * PhysicalOperator} interface. These query graphs are translated into a graph
 * of physical operators that will actually process data at query execution
 * time. The connections between these nodes are materialized as interfaces
 * where data is passed between different operators. As Drill is distributed
 * these connections can take the form of an RPC layer between the nodes in a
 * Drill cluster.
 * While physical plans can be written by hand, the primary interface for Drill
 * is SQL. Drill is targeted for compliance with the ANSI SQL 2003
 * specification. Query parsing and optimization are handled by Calcite, an
 * Apache incubator project that is also used for planning in Apache Hive. Drill
 * defines many planning rules and optimizations that plug into the Calcite
 * planning engine to generate optimal plans for the Drill engine.
 * Unlike most query systems, Drill is designed to query raw files without
 * a predefined catalog of metadata defining the types of data or columns
 * available in the dataset. To maintain performance in a flexible-schema
 * environment, Drill uses runtime code generation to compile custom Java
 * code as operators are notified of a change in schema.
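
To make the description above a bit more concrete, here is a rough, self-contained sketch of the two ideas it covers: a plan as a graph of operators that hand batches to one another, and an operator that rebuilds its evaluation logic when the incoming schema changes. None of these names (PlanSketch, Batch, Operator, Scan, Filter) are Drill classes; they are made up for illustration. The "compilation" step is only simulated by rebuilding a Java lambda, whereas Drill generates and compiles real code at runtime.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Illustrative only: hypothetical types, not part of the Drill code base.
final class PlanSketch {

    /** A batch of rows that all share one schema (an ordered list of column names). */
    public static final class Batch {
        public final List<String> schema;
        public final List<List<Object>> rows;
        public Batch(List<String> schema, List<List<Object>> rows) {
            this.schema = schema;
            this.rows = rows;
        }
    }

    /** A node in the operator graph; returns null once it has no more batches. */
    public interface Operator {
        Batch next();
    }

    /** Leaf operator that replays prepared batches, standing in for a file reader. */
    public static final class Scan implements Operator {
        private final Iterator<Batch> batches;
        public Scan(List<Batch> batches) { this.batches = batches.iterator(); }
        @Override public Batch next() { return batches.hasNext() ? batches.next() : null; }
    }

    /** Keeps rows whose numeric value in the given column exceeds the threshold. */
    public static final class Filter implements Operator {
        private final Operator child;
        private final String column;
        private final long threshold;
        private List<String> lastSchema;           // schema the current evaluator was built for
        private Predicate<List<Object>> evaluator; // stand-in for generated code
        private int rebuilds;

        public Filter(Operator child, String column, long threshold) {
            this.child = child;
            this.column = column;
            this.threshold = threshold;
        }

        public int rebuildCount() { return rebuilds; }

        @Override public Batch next() {
            Batch in = child.next();
            if (in == null) {
                return null;
            }
            if (!in.schema.equals(lastSchema)) {
                // Schema changed (or first batch): rebuild the evaluator for the new layout.
                final int idx = in.schema.indexOf(column);
                evaluator = row -> idx >= 0 && ((Number) row.get(idx)).longValue() > threshold;
                lastSchema = in.schema;
                rebuilds++;
            }
            List<List<Object>> kept = new ArrayList<>();
            for (List<Object> row : in.rows) {
                if (evaluator.test(row)) {
                    kept.add(row);
                }
            }
            return new Batch(in.schema, kept);
        }
    }

    public static List<Object> row(Object... values) { return Arrays.asList(values); }

    /** Pulls every batch from the root operator and collects the rows. */
    public static List<List<Object>> drain(Operator root) {
        List<List<Object>> all = new ArrayList<>();
        for (Batch b = root.next(); b != null; b = root.next()) {
            all.addAll(b.rows);
        }
        return all;
    }

    /** Two batches with different column orders, filtered on n > 3. */
    public static Filter demoPlan() {
        Batch first = new Batch(Arrays.asList("id", "n"),
                Arrays.asList(row("a", 5L), row("b", 1L)));
        Batch second = new Batch(Arrays.asList("n", "id"),
                Arrays.asList(row(7L, "c"), row(2L, "d")));
        return new Filter(new Scan(Arrays.asList(first, second)), "n", 3L);
    }

    public static void main(String[] args) {
        Filter plan = demoPlan();
        System.out.println(drain(plan));
        System.out.println("evaluator rebuilds: " + plan.rebuildCount());
    }
}
```

In this toy version the second batch arrives with its columns in a different order, so the filter notices the schema change and rebuilds its evaluator before processing it. A real engine would do the same thing at that point, but by generating and compiling specialized code rather than swapping a lambda.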

