Apache Drill Contribution Ideas

  • Fixing JIRAs
  • SQL functions
  • Support for new file format readers/writers
  • Support for new data sources
  • New query language parsers
  • Application interfaces
    • BI Tool testing
  • General CLI improvements
  • Eco system integrations
    • MapReduce
    • Hive views
    • YARN
    • Spark
    • Hue
    • Phoenix

Fixing JIRAs

This is a good place to begin if you are new to Drill. Feel free to pick issues from the Drill JIRA list. When you pick an issue, assign it to yourself, inform the team, and start fixing it.

For any questions, seek help from the team through the mailing list.

https://issues.apache.org/jira/browse/DRILL/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel

SQL functions

One of the next simplest places to start is implementing a DrillFunc. DrillFuncs are the way Drill expresses all scalar functions (UDF or system). First, file a JIRA for a DrillFunc that we don't yet have but should (referencing the capabilities of something like Postgres or SQL Server, or your own use case). Then try to implement it.

One example DrillFunc: ComparisonFunctions.java
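
To give a feel for the general shape of such a function, here is a minimal, self-contained sketch. It does not use Drill's real classes: `SimpleFunc`, `IntHolder`, and `AddIntsFunc` are hypothetical stand-ins defined in the snippet so that it compiles on its own. The assumption illustrated is that a scalar function receives its inputs and writes its output through holder fields, with a one-time setup step and a per-row evaluation step. See ComparisonFunctions.java for the actual annotations and conventions.

```java
// Hypothetical stand-ins that mimic the general shape of a Drill scalar function.
// These are NOT Drill's real classes; they exist only so the sketch compiles alone.

/** Mutable value holder, loosely modelled on the idea of a holder object. */
class IntHolder {
    int value;
}

/** Lifecycle assumed here: setup() once, then eval() once per input row. */
interface SimpleFunc {
    void setup();
    void eval();
}

/** Adds two integers; inputs and output are passed through holder fields. */
class AddIntsFunc implements SimpleFunc {
    IntHolder left = new IntHolder();   // plays the role of an input parameter
    IntHolder right = new IntHolder();  // plays the role of an input parameter
    IntHolder out = new IntHolder();    // plays the role of the output

    @Override
    public void setup() {
        // Nothing to initialise for a stateless function.
    }

    @Override
    public void eval() {
        out.value = left.value + right.value;
    }
}

class DrillFuncSketch {
    /** Drives the function by hand, the way an engine might for a single row. */
    static int addOnce(int a, int b) {
        AddIntsFunc f = new AddIntsFunc();
        f.setup();
        f.left.value = a;
        f.right.value = b;
        f.eval();
        return f.out.value;
    }
}
```

Keeping the logic inside `eval()` and the state inside holder fields is what would let an engine reuse one function instance across many rows.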

Additional ideas on functions that can be added to SQL support

  • MADlib integration
  • Machine learning functions
  • Approximate aggregate functions (such as those available in BlinkDB)

Support for new file format readers/writers

Currently Drill natively supports the text, JSON, and Parquet file formats when interacting with a file system. More readers and writers can be introduced by implementing custom storage plugins. Example formats include:

  • Sequence
  • RC
  • ORC
  • Protobuf
  • XML
  • Thrift

Support for new data sources

Writing a new file-based storage plugin, such as a JSON or text-based storage plugin, simply involves implementing a couple of interfaces. The JSON storage plugin is a good example.

You can refer to the GitHub commits for the MongoDB and HBase storage plugins for implementation details.

Focus on implementing or extending the classes listed below, using the corresponding MongoDB and HBase implementations as a guide. You can ignore the MongoDB plugin's optimizer rules for pushing predicates into the scan.

Initially, concentrate on basics:

  • AbstractGroupScan (MongoGroupScan, HBaseGroupScan)
  • SubScan (MongoSubScan, HBaseSubScan)
  • RecordReader (MongoRecordReader, HBaseRecordReader)
  • BatchCreator (MongoScanBatchCreator, HBaseScanBatchCreator)
  • AbstractStoragePlugin (MongoStoragePlugin, HBaseStoragePlugin)
  • StoragePluginConfig (MongoStoragePluginConfig, HBaseStoragePluginConfig)
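
The division of labour between a group scan, its sub-scans, and a record reader can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not Drill's actual API: the `Toy*` types and their methods are hypothetical, and the real classes carry more responsibilities than shown here. The idea is that a group scan describes the whole scan and splits it into units of work, and a record reader turns one unit of work into records.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical, heavily simplified model of a scan pipeline.
// None of these types are Drill classes.

/** One unit of work: a contiguous slice of the source rows. */
class ToySubScan {
    final int start;  // inclusive
    final int end;    // exclusive
    ToySubScan(int start, int end) { this.start = start; this.end = end; }
}

/** Describes the whole scan and splits it into units of work. */
class ToyGroupScan {
    private final int rowCount;
    ToyGroupScan(int rowCount) { this.rowCount = rowCount; }

    /** Splits the row range into at most `parallelism` near-equal sub-scans. */
    List<ToySubScan> split(int parallelism) {
        if (parallelism < 1) {
            throw new IllegalArgumentException("parallelism must be >= 1");
        }
        List<ToySubScan> parts = new ArrayList<>();
        int chunk = (rowCount + parallelism - 1) / parallelism;  // ceiling division
        for (int s = 0; s < rowCount; s += chunk) {
            parts.add(new ToySubScan(s, Math.min(s + chunk, rowCount)));
        }
        return parts;
    }
}

/** Reads the rows covered by one sub-scan from an in-memory "data source". */
class ToyRecordReader implements Iterator<String> {
    private final List<String> source;
    private final int end;
    private int next;

    ToyRecordReader(List<String> source, ToySubScan subScan) {
        this.source = source;
        this.next = subScan.start;
        this.end = subScan.end;
    }

    @Override public boolean hasNext() { return next < end; }

    @Override public String next() {
        if (!hasNext()) throw new java.util.NoSuchElementException();
        return source.get(next++);
    }
}

class ToyScanDemo {
    /** Runs every sub-scan in turn and collects all rows that were read. */
    static List<String> readAll(List<String> source, int parallelism) {
        List<String> out = new ArrayList<>();
        for (ToySubScan sub : new ToyGroupScan(source.size()).split(parallelism)) {
            ToyRecordReader reader = new ToyRecordReader(source, sub);
            while (reader.hasNext()) out.add(reader.next());
        }
        return out;
    }
}
```

In this toy model the sub-scans run one after another; in a distributed engine each one could be handed to a different node.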

Implement custom storage plugins for the following non-Hadoop data sources:

  • NoSQL databases (such as Mongo, Cassandra, and Couch)
  • Search engines (such as Solr, Lucidworks, and Elasticsearch)
  • SQL databases (such as MySQL and PostgreSQL)
  • Generic JDBC/ODBC data sources
  • HTTP URL

New query language parsers

Drill exposes strongly typed JSON APIs for logical and physical plans. Drill provides a SQL language parser today, but any language parser that can generate logical or physical plans can use Drill on the backend as a distributed, low-latency query execution engine, along with its support for self-describing data and complex/multi-structured data.

  • Pig parser: Use Pig as the language to query data from Drill. Great for existing Pig users.
  • Hive parser: Use HiveQL as the language to query data from Drill. Great for existing Hive users.

Application interfaces

Drill currently provides JDBC/ODBC drivers for applications to interact with it, along with a basic REST API and a C++ API. The following are a few possible application interface opportunities:

BI Tool testing

Drill provides JDBC/ODBC drivers to connect to BI tools. We need to make sure Drill works with all major BI tools. Quick sanity testing with your favorite BI tool is a good way to learn Drill and to uncover issues along the way.
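
Before testing a full BI tool, it can help to confirm that the JDBC path works at all. The sketch below is one way to do that using only the standard `java.sql` API. It assumes the Drill JDBC driver jar is on the classpath and that a Drillbit is reachable through ZooKeeper. The `DrillJdbcSanity` class is hypothetical, and the host, port, and query are placeholders to adapt to your cluster.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class DrillJdbcSanity {
    /** Builds a ZooKeeper-based Drill JDBC URL, e.g. jdbc:drill:zk=localhost:2181. */
    static String zkUrl(String zkHost, int zkPort) {
        return "jdbc:drill:zk=" + zkHost + ":" + zkPort;
    }

    /** Runs one query and returns the number of rows it produced. */
    static int countRows(String url, String sql) throws SQLException {
        // try-with-resources closes the result set, statement, and connection in order.
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            int rows = 0;
            while (rs.next()) {
                rows++;
            }
            return rows;
        }
    }
}
```

Calling `countRows` with a query you already know to be valid on your cluster gives a quick pass/fail signal: an `SQLException` points at a connectivity or driver problem rather than at the BI tool.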

General CLI improvements

Currently Drill uses SQLLine as the CLI. The goal of this effort is to improve the CLI experience by adding functionality such as executing statements from a file, writing results to a file, displaying version information, and so on.
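
As an illustration of the "execute statements from a file" idea, the sketch below splits the text of a SQL script into individual statements on semicolons, ignoring semicolons inside single-quoted string literals. `SqlScriptSplitter` is a hypothetical helper, not part of SQLLine or Drill, and it deliberately skips comments and other quoting rules.

```java
import java.util.ArrayList;
import java.util.List;

class SqlScriptSplitter {
    /** Splits script text into trimmed, non-empty statements separated by ';'. */
    static List<String> split(String script) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inString = false;  // true while inside a '...' literal
        for (int i = 0; i < script.length(); i++) {
            char c = script.charAt(i);
            if (c == '\'') {
                // A doubled '' escape toggles twice, so the state ends up unchanged.
                inString = !inString;
            }
            if (c == ';' && !inString) {
                addIfNotBlank(statements, current);
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        addIfNotBlank(statements, current);  // the last statement may lack a ';'
        return statements;
    }

    private static void addIfNotBlank(List<String> out, StringBuilder sb) {
        String s = sb.toString().trim();
        if (!s.isEmpty()) {
            out.add(s);
        }
    }
}
```

For example, the input `SELECT 'a;b' FROM t; SELECT 2;` yields two statements, `SELECT 'a;b' FROM t` and `SELECT 2`. Each statement could then be passed to the CLI's existing execution path.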

Eco system integrations

MapReduce

Allow the result set of a Drill query to be used as input to Hadoop MapReduce jobs.

Hive views

Query data from existing Hive views using Drill queries. Drill needs to parse the HiveQL view definitions and translate them appropriately (into Drill's SQL or logical/physical plans) to execute the requests.

YARN

https://issues.apache.org/jira/browse/DRILL-1170

Spark

Provide the ability to invoke Drill queries as part of Apache Spark programs. This lets Spark developers and users leverage the richness of Drill's query layer, both for data source access and as a low-latency execution engine.

Hue

Hue is a GUI for users to interact with various Hadoop ecosystem components (such as Hive, Oozie, Pig, HBase, and Impala). The goal of this project is to expose Drill as an application inside Hue so users can explore Drill metadata and run SQL queries.

Phoenix

Phoenix provides a low-latency query layer on HBase for operational applications. The goal of this effort is to explore opportunities for integrating Phoenix with Drill.
