drill-issues mailing list archives

From "Chris Westin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-2077) Provide a clear starting point for new developers about what to start reading to learn about Drill
Date Thu, 29 Jan 2015 18:52:35 GMT

    [ https://issues.apache.org/jira/browse/DRILL-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14297335#comment-14297335 ]

Chris Westin commented on DRILL-2077:

A frequent question in the forums has been how to add a new storage plugin. I think there
have been some decent responses to that in the past that we could lift out of the group
postings and add to this.

> Provide a clear starting point for new developers about what to start reading to learn
about Drill
> --------------------------------------------------------------------------------------------------
>                 Key: DRILL-2077
>                 URL: https://issues.apache.org/jira/browse/DRILL-2077
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Jason Altekruse
>            Assignee: Jason Altekruse
> As part of my package level javadocs posted in DRILL-1904 I tried to document the root
org.apache.drill.exec package. We should have some good information here as well as in the
markdown file on the git repo about the best place to start reading the code to understand
how drill operates.
> Here is a description I started. I think we want to make sure this is informative but
concise. I want to get in the rest of the package docs, so I am leaving this here as a TODO,
please feel free to comment, revise or add to this.
> {code}
>  * A good place to start learning about Drill is exploring the query plans. A
>  * Drill physical plan is defined as a connected graph of operators that read
>  * and manipulate data. Operators are configured by implementations of the {@link
>  * PhysicalOperator} interface. These query graphs are translated into a graph
>  * of physical operators that will actually process data at query execution
>  * time. The connections between these nodes are materialized as interfaces
>  * where data is passed between different operators. As Drill is distributed
>  * these connections can take the form of an RPC layer between the nodes in a
>  * Drill cluster.
>  *
>  * While physical plans can be written by hand, the primary interface for Drill
>  * is SQL. Drill is targeted for compliance with the ANSI SQL 2003
>  * specification. Query parsing and optimization are handled by Calcite, an
>  * Apache incubator project, also used for planning in Apache Hive. Drill
>  * defines many planning rules and optimizations that plug into the Calcite
>  * planning engine to generate optimal plans for the Drill engine.
>  *
>  * Unlike most query systems, Drill is designed to query raw files without
>  * a predefined catalog of metadata defining the types of data or columns
>  * available in the dataset. To maintain performance in a flexible schema
>  * environment, Drill uses runtime code generation to compile custom Java
>  * code as operators are notified of a change in schema.
> {code}
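
To make the "connected graph of operators" idea in the description above concrete, here is a minimal sketch of a toy plan in which a scan feeds a filter. All names in it (PlanSketch, Operator, Scan, Filter, execute, demo) are made up for illustration and are an assumption on my part; this is not Drill's PhysicalOperator interface, and it leaves out the RPC layer and runtime code generation entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical names throughout; this is NOT Drill's PhysicalOperator API.
class PlanSketch {

    /** One node in the toy plan: hands rows to its parent on request. */
    interface Operator {
        Iterator<Integer> open();
    }

    /** Leaf operator: reads rows from an in-memory stand-in for a data source. */
    static final class Scan implements Operator {
        private final List<Integer> rows;

        Scan(List<Integer> rows) {
            this.rows = rows;
        }

        @Override
        public Iterator<Integer> open() {
            return rows.iterator();
        }
    }

    /** Interior operator: keeps only the child rows that satisfy the condition. */
    static final class Filter implements Operator {
        private final Operator child;
        private final Predicate<Integer> condition;

        Filter(Operator child, Predicate<Integer> condition) {
            this.child = child;
            this.condition = condition;
        }

        @Override
        public Iterator<Integer> open() {
            List<Integer> kept = new ArrayList<>();
            for (Iterator<Integer> it = child.open(); it.hasNext();) {
                Integer row = it.next();
                if (condition.test(row)) {
                    kept.add(row);
                }
            }
            return kept.iterator();
        }
    }

    /** Drains the root operator, pulling rows up through the whole plan. */
    static List<Integer> execute(Operator root) {
        List<Integer> out = new ArrayList<>();
        root.open().forEachRemaining(out::add);
        return out;
    }

    /** Builds a two-node plan (scan -> filter) and runs it. */
    static List<Integer> demo() {
        Operator plan = new Filter(new Scan(Arrays.asList(1, 2, 3, 4)), row -> row % 2 == 0);
        return execute(plan);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [2, 4]
    }
}
```

The connection between the two nodes here is just a Java iterator; in a distributed setting the same edge could presumably be backed by a network exchange instead, which is the role the description assigns to the RPC layer.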

This message was sent by Atlassian JIRA
