drill-commits mailing list archives

From bridg...@apache.org
Subject [08/12] drill git commit: fold in review changes
Date Tue, 17 Mar 2015 21:02:49 GMT
fold in review changes


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/2b7773de
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/2b7773de
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/2b7773de

Branch: refs/heads/gh-pages
Commit: 2b7773de2ae66ee87f67db9d29b3ca411825b4c3
Parents: e73f2ec
Author: Kristine Hahn <khahn@maprtech.com>
Authored: Tue Mar 3 17:40:02 2015 -0800
Committer: Bridget Bevens <bbevens@maprtech.com>
Committed: Wed Mar 4 10:35:33 2015 -0800

----------------------------------------------------------------------
 _docs/009-datasources.md              |   2 -
 _docs/009-dev-custom-func.md          |  37 ----
 _docs/010-manage.md                   |  14 --
 _docs/011-develop.md                  |   9 -
 _docs/012-rn.md                       | 191 ------------------
 _docs/013-contribute.md               |   9 -
 _docs/014-sample-ds.md                |  10 -
 _docs/015-design.md                   |  13 --
 _docs/016-progress.md                 |   8 -
 _docs/018-bylaws.md                   | 170 ----------------
 _docs/connect/007-default-frmt.md     |  11 +-
 _docs/data-sources/001-hive-types.md  |   3 +-
 _docs/data-sources/002-hive-udf.md    |   3 +-
 _docs/data-sources/003-parquet-ref.md |   4 +-
 _docs/data-sources/004-json-ref.md    | 299 ++++++++++++++++++-----------
 15 files changed, 205 insertions(+), 578 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill/blob/2b7773de/_docs/009-datasources.md
----------------------------------------------------------------------
diff --git a/_docs/009-datasources.md b/_docs/009-datasources.md
index 77348a7..5400f22 100644
--- a/_docs/009-datasources.md
+++ b/_docs/009-datasources.md
@@ -8,8 +8,6 @@ Included in the data sources that  Drill supports are these key data sources:
 * MapR-DB
 * File system
 
-. . .
-
 Drill supports the following input formats for data:
 
 * CSV (Comma-Separated-Values)

http://git-wip-us.apache.org/repos/asf/drill/blob/2b7773de/_docs/009-dev-custom-func.md
----------------------------------------------------------------------
diff --git a/_docs/009-dev-custom-func.md b/_docs/009-dev-custom-func.md
deleted file mode 100644
index f8a6445..0000000
--- a/_docs/009-dev-custom-func.md
+++ /dev/null
@@ -1,37 +0,0 @@
----
-title: "Develop Custom Functions"
----
-
-Drill provides a high performance Java API with interfaces that you can
-implement to develop simple and aggregate custom functions. Custom functions
-are reusable SQL functions that you develop in Java to encapsulate code that
-processes column values during a query. Custom functions can perform
-calculations and transformations that built-in SQL operators and functions do
-not provide. Custom functions are called from within a SQL statement, like a
-regular function, and return a single value.
-
-## Simple Function
-
-A simple function operates on a single row and produces a single row as the
-output. When you include a simple function in a query, the function is called
-once for each row in the result set. Mathematical and string functions are
-examples of simple functions.
-
-## Aggregate Function
-
-Aggregate functions differ from simple functions in the number of rows that
-they accept as input. An aggregate function operates on multiple input rows
-and produces a single row as output. The COUNT(), MAX(), SUM(), and AVG()
-functions are examples of aggregate functions. You can use an aggregate
-function in a query with a GROUP BY clause to produce a result set with a
-separate aggregate value for each combination of values from the GROUP BY
-clause.
-
-## Process
-
-To develop custom functions that you can use in your Drill queries, you must
-complete the following tasks:
-
-  1. Create a Java program that implements Drill’s simple or aggregate interface, and compile a sources and a classes JAR file.
-  2. Add the sources and classes JAR files to Drill’s classpath.
-  3. Add the name of the package that contains the classes to Drill’s main configuration file, drill-override.conf. 

http://git-wip-us.apache.org/repos/asf/drill/blob/2b7773de/_docs/010-manage.md
----------------------------------------------------------------------
diff --git a/_docs/010-manage.md b/_docs/010-manage.md
deleted file mode 100644
index ec6663b..0000000
--- a/_docs/010-manage.md
+++ /dev/null
@@ -1,14 +0,0 @@
----
-title: "Manage Drill"
----
-When using Drill, you may need to stop and restart a Drillbit on a node, or
-modify various options. For example, the default storage format for CTAS
-statements is Parquet. You can modify the default setting so that output data
-is stored in CSV or JSON format.
-
-You can use certain SQL commands to manage Drill from within the Drill shell
-(SQLLine). You can also modify Drill configuration options, such as memory
-allocation, in Drill's configuration files.
-
-  
-

http://git-wip-us.apache.org/repos/asf/drill/blob/2b7773de/_docs/011-develop.md
----------------------------------------------------------------------
diff --git a/_docs/011-develop.md b/_docs/011-develop.md
deleted file mode 100644
index 2b9ce67..0000000
--- a/_docs/011-develop.md
+++ /dev/null
@@ -1,9 +0,0 @@
----
-title: "Develop Drill"
----
-To develop Drill, you compile Drill from source code and then set up a project
-in Eclipse for use as your development environment. To review or contribute to
-Drill code, you must complete the steps required to install and use the Drill
-patch review tool.
-
-

http://git-wip-us.apache.org/repos/asf/drill/blob/2b7773de/_docs/012-rn.md
----------------------------------------------------------------------
diff --git a/_docs/012-rn.md b/_docs/012-rn.md
deleted file mode 100644
index 25ec29e..0000000
--- a/_docs/012-rn.md
+++ /dev/null
@@ -1,191 +0,0 @@
----
-title: "Release Notes"
----
-## Apache Drill 0.7.0 Release Notes
-
-Apache Drill 0.7.0, the third beta release for Drill, is designed to help
-enthusiasts start working and experimenting with Drill. It also continues the
-Drill monthly release cycle as we drive towards general availability.
-
-This release is available as
-[binary](http://www.apache.org/dyn/closer.cgi/drill/drill-0.7.0/apache-
-drill-0.7.0.tar.gz) and
-[source](http://www.apache.org/dyn/closer.cgi/drill/drill-0.7.0/apache-
-drill-0.7.0-src.tar.gz) tarballs that are compiled against Apache Hadoop.
-Drill has been tested against MapR, Cloudera, and Hortonworks Hadoop
-distributions. There are associated build profiles and JIRAs that can help you
-run Drill against your preferred distribution
-
-### Apache Drill 0.7.0 Key Features
-
-  * No more dependency on UDP/Multicast - Making it possible for Drill to work well in the following scenarios:
-
-    * UDP multicast not enabled (as in EC2)
-
-    * Cluster spans multiple subnets
-
-    * Cluster has multihome configuration
-
-  * New functions to natively work with nested data - KVGen and Flatten 
-
-  * Support for Hive 0.13 (Hive 0.12 with Drill is not supported any more) 
-
-  * Improved performance when querying Hive tables and File system through partition pruning
-
-  * Improved performance for HBase with LIKE operator pushdown
-
-  * Improved memory management
-
-  * Drill web UI monitoring and query profile improvements
-
-  * Ability to parse files without explicit extensions using default storage format specification
-
-  * Fixes for dealing with complex/nested data objects in Parquet/JSON
-
-  * Fast schema return - Improved experience working with BI/query tools by returning metadata quickly
-
-  * Several hang related fixes
-
-  * Parquet writer fixes for handling large datasets
-
-  * Stability improvements in ODBC and JDBC drivers
-
-### Apache Drill 0.7.0 Key Notes and Limitations
-
-  * The current release supports in-memory and beyond-memory execution. However, you must disable memory-intensive hash aggregate and hash join operations to leverage this functionality.
-  * While the Drill execution engine supports dynamic schema changes during the course of a query, some operators have yet to implement support for this behavior, such as Sort. Other operations, such as streaming aggregate, may have partial support that leads to unexpected results.
-
-## Apache Drill 0.6.0 Release Notes
-
-Apache Drill 0.6.0, the second beta release for Drill, is designed to help
-enthusiasts start working and experimenting with Drill. It also continues the
-Drill monthly release cycle as we drive towards general availability.
-
-This release is available as [binary](http://www.apache.org/dyn/closer.cgi/inc
-ubator/drill/drill-0.5.0-incubating/apache-drill-0.5.0-incubating.tar.gz) and 
-[source](http://www.apache.org/dyn/closer.cgi/incubator/drill/drill-0.5.0-incu
-bating/apache-drill-0.5.0-incubating-src.tar.gz) tarballs that are compiled
-against Apache Hadoop. Drill has been tested against MapR, Cloudera, and
-Hortonworks Hadoop distributions. There are associated build profiles and
-JIRAs that can help you run Drill against your preferred distribution.
-
-### Apache Drill 0.6.0 Key Features
-
-This release is primarily a bug fix release, with [more than 30 JIRAs closed](
-https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&vers
-ion=12327472), but there are some notable features:
-
-  * Direct ANSI SQL access to MongoDB, using the latest [MongoDB Plugin for Apache Drill](/docs/mongodb-plugin-for-apache-drill)
-  * Filesystem query performance improvements with partition pruning
-  * Ability to use the file system as a persistent store for query profiles and diagnostic information
-  * Window function support (alpha)
-
-### Apache Drill 0.6.0 Key Notes and Limitations
-
-  * The current release supports in-memory and beyond-memory execution. However, you must disable memory-intensive hash aggregate and hash join operations to leverage this functionality.
-  * While the Drill execution engine supports dynamic schema changes during the course of a query, some operators have yet to implement support for this behavior, such as Sort. Other operations, such as streaming aggregate, may have partial support that leads to unexpected results.
-
-## Apache Drill 0.5.0 Release Notes
-
-Apache Drill 0.5.0, the first beta release for Drill, is designed to help
-enthusiasts start working and experimenting with Drill. It also continues the
-Drill monthly release cycle as we drive towards general availability.
-
-The 0.5.0 release is primarily a bug fix release, with [more than 100 JIRAs](h
-ttps://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&versi
-on=12324880) closed, but there are some notable features. For information
-about the features, see the [Apache Drill Blog for the 0.5.0
-release](https://blogs.apache.org/drill/entry/apache_drill_beta_release_see).
-
-This release is available as [binary](http://www.apache.org/dyn/closer.cgi/inc
-ubator/drill/drill-0.5.0-incubating/apache-drill-0.5.0-incubating.tar.gz) and 
-[source](http://www.apache.org/dyn/closer.cgi/incubator/drill/drill-0.5.0-incu
-bating/apache-drill-0.5.0-incubating-src.tar.gz) tarballs that are compiled
-against Apache Hadoop. Drill has been tested against MapR, Cloudera, and
-Hortonworks Hadoop distributions. There are associated build profiles and
-JIRAs that can help you run Drill against your preferred distribution.
-
-### Apache Drill 0.5.0 Key Notes and Limitations
-
-  * The current release supports in memory and beyond memory execution. However, you must disable memory-intensive hash aggregate and hash join operations to leverage this functionality.
-  * While the Drill execution engine supports dynamic schema changes during the course of a query, some operators have yet to implement support for this behavior, such as Sort. Others operations, such as streaming aggregate, may have partial support that leads to unexpected results.
-  * There are known issues with joining text files without using an intervening view. See [DRILL-1401](https://issues.apache.org/jira/browse/DRILL-1401) for more information.
-
-## Apache Drill 0.4.0 Release Notes
-
-The 0.4.0 release is a developer preview release, designed to help enthusiasts
-start to work with and experiment with Drill. It is the first Drill release
-that provides distributed query execution.
-
-This release is built upon [more than 800
-JIRAs](https://issues.apache.org/jira/browse/DRILL/fixforversion/12324963/).
-It is a pre-beta release on the way towards Drill. As a developer snapshot,
-the release contains a large number of outstanding bugs that will make some
-use cases challenging. Feel free to consult outstanding issues [targeted for
-the 0.5.0
-release](https://issues.apache.org/jira/browse/DRILL/fixforversion/12324880/)
-to see whether your use case is affected.
-
-To read more about this release and new features introduced, please view the
-[0.4.0 announcement blog
-entry](https://blogs.apache.org/drill/entry/announcing_apache_drill_0_4).
-
-The release is available as both [binary](http://www.apache.org/dyn/closer.cgi
-/incubator/drill/drill-0.4.0-incubating/apache-drill-0.4.0-incubating.tar.gz)
-and [source](http://www.apache.org/dyn/closer.cgi/incubator/drill/drill-0.4.0-
-incubating/apache-drill-0.4.0-incubating-src.tar.gz) tarballs. In both cases,
-these are compiled against Apache Hadoop. Drill has also been tested against
-MapR, Cloudera and Hortonworks Hadoop distributions and there are associated
-build profiles or JIRAs that can help you run against your preferred
-distribution.
-
-### Some Key Notes & Limitations
-
-  * The current release supports in memory and beyond memory execution. However, users must disable memory-intensive hash aggregate and hash join operations to leverage this functionality.
-  * In many cases,merge join operations return incorrect results.
-  * Use of a local filter in a join “on” clause when using left, right or full outer joins may result in incorrect results.
-  * Because of known memory leaks and memory overrun issues you may need more memory and you may need to restart the system in some cases.
-  * Some types of complex expressions, especially those involving empty arrays may fail or return incorrect results.
-  * While the Drill execution engine supports dynamic schema changes during the course of a query, some operators have yet to implement support for this behavior (such as Sort). Others operations (such as streaming aggregate) may have partial support that leads to unexpected results.
-  * Protobuf, UDF, query plan interfaces and all interfaces are subject to change in incompatible ways.
-  * Multiplication of some types of DECIMAL(28+,*) will return incorrect result.
-
-## Apache Drill M1 -- Release Notes (Apache Drill Alpha)
-
-### Milestone 1 Goals
-
-The first release of Apache Drill is designed as a technology preview for
-people to better understand the architecture and vision. It is a functional
-release tying to piece together the key components of a next generation MPP
-query engine. It is designed to allow milestone 2 (M2) to focus on
-architectural analysis and performance optimization.
-
-  * Provide a new optimistic DAG execution engine for data analysis
-  * Build a new columnar shredded in-memory format and execution model that minimizes data serialization/deserialization costs and operator complexity
-  * Provide a model for runtime generated functions and relational operators that minimizes complexity and maximizes performance
-  * Support queries against columnar on disk format (Parquet) and JSON
-  * Support the most common set of standard SQL read-only phrases using ANSI standards. Includes: SELECT, FROM, WHERE, HAVING, ORDER, GROUP BY, IN, DISTINCT, LEFT JOIN, RIGHT JOIN, INNER JOIN
-  * Support schema-on-read querying and execution
-  * Build a set of columnar operation primitives including Merge Join, Sort, Streaming Aggregate, Filter, Selection Vector removal.
-  * Support unlimited level of subqueries and correlated subqueries
-  * Provided an extensible query-language agnostic JSON-base logical data flow syntax.
-  * Support complex data type manipulation via logical plan operations
-
-### Known Issues
-
-SQL Parsing  
-Because Apache Drill is built to support late-bound changing schemas while SQL
-is statically typed, there are couple of special requirements that are
-required writing SQL queries. These are limited to the current release and
-will be correct in a future milestone release.
-
-  * All tables are exposed as a single map field that contains
-  * Drill Alpha doesn't support implicit or explicit casts outside those required above.
-  * Drill Alpha does not include, there are currently a couple of differences for how to write a query in In order to query against
-
-### UDFs
-
-  * Drill currently supports simple and aggregate functions using scalar, repeated and
-  * Nested data support incomplete. Drill Alpha supports nested data structures as well repeated fields. However,
-  * asd
-

http://git-wip-us.apache.org/repos/asf/drill/blob/2b7773de/_docs/013-contribute.md
----------------------------------------------------------------------
diff --git a/_docs/013-contribute.md b/_docs/013-contribute.md
deleted file mode 100644
index 42108b9..0000000
--- a/_docs/013-contribute.md
+++ /dev/null
@@ -1,9 +0,0 @@
----
-title: "Contribute to Drill"
----
-The Apache Drill community welcomes your support. Please read [Apache Drill
-Contribution Guidelines](/docs/apache-drill-contribution-guidelines) for information about how to contribute to
-the project. If you would like to contribute to the project and need some
-ideas for what to do, please read [Apache Drill Contribution
-Ideas](/docs/apache-drill-contribution-ideas).
-

http://git-wip-us.apache.org/repos/asf/drill/blob/2b7773de/_docs/014-sample-ds.md
----------------------------------------------------------------------
diff --git a/_docs/014-sample-ds.md b/_docs/014-sample-ds.md
deleted file mode 100644
index c6f51e1..0000000
--- a/_docs/014-sample-ds.md
+++ /dev/null
@@ -1,10 +0,0 @@
----
-title: "Sample Datasets"
----
-Use any of the following sample datasets provided to test Drill:
-
-  * [AOL Search](/docs/aol-search)
-  * [Enron Emails](/docs/enron-emails)
-  * [Wikipedia Edit History](/docs/wikipedia-edit-history)
-
-

http://git-wip-us.apache.org/repos/asf/drill/blob/2b7773de/_docs/015-design.md
----------------------------------------------------------------------
diff --git a/_docs/015-design.md b/_docs/015-design.md
deleted file mode 100644
index 474052e..0000000
--- a/_docs/015-design.md
+++ /dev/null
@@ -1,13 +0,0 @@
----
-title: "Design Docs"
----
-Review the Apache Drill design docs for early descriptions of Apache Drill
-functionality, terms, and goals, and reference the research articles to learn
-about Apache Drill's history:
-
-  * [Drill Plan Syntax](/docs/drill-plan-syntax)
-  * [RPC Overview](/docs/rpc-overview)
-  * [Query Stages](/docs/query-stages)
-  * [Useful Research](/docs/useful-research)
-  * [Value Vectors](/docs/value-vectors)
-

http://git-wip-us.apache.org/repos/asf/drill/blob/2b7773de/_docs/016-progress.md
----------------------------------------------------------------------
diff --git a/_docs/016-progress.md b/_docs/016-progress.md
deleted file mode 100644
index 680290e..0000000
--- a/_docs/016-progress.md
+++ /dev/null
@@ -1,8 +0,0 @@
----
-title: "Progress Reports"
----
-Review the following Apache Drill progress reports for a summary of issues,
-progression of the project, summary of mailing list discussions, and events:
-
-  * [2014 Q1 Drill Report](/docs/2014-q1-drill-report)
-

http://git-wip-us.apache.org/repos/asf/drill/blob/2b7773de/_docs/018-bylaws.md
----------------------------------------------------------------------
diff --git a/_docs/018-bylaws.md b/_docs/018-bylaws.md
deleted file mode 100644
index 2c35042..0000000
--- a/_docs/018-bylaws.md
+++ /dev/null
@@ -1,170 +0,0 @@
----
-title: "Project Bylaws"
----
-## Introduction
-
-This document defines the bylaws under which the Apache Drill project
-operates. It defines the roles and responsibilities of the project, who may
-vote, how voting works, how conflicts are resolved, etc.
-
-Drill is a project of the [Apache Software
-Foundation](http://www.apache.org/foundation/). The foundation holds the
-copyright on Apache code including the code in the Drill codebase. The
-[foundation FAQ](http://www.apache.org/foundation/faq.html) explains the
-operation and background of the foundation.
-
-Drill is typical of Apache projects in that it operates under a set of
-principles, known collectively as the _Apache Way_. If you are new to Apache
-development, please refer to the [Incubator
-project](http://incubator.apache.org/) for more information on how Apache
-projects operate.
-
-## Roles and Responsibilities
-
-Apache projects define a set of roles with associated rights and
-responsibilities. These roles govern what tasks an individual may perform
-within the project. The roles are defined in the following sections.
-
-### Users
-
-The most important participants in the project are people who use our
-software. The majority of our contributors start out as users and guide their
-development efforts from the user's perspective.
-
-Users contribute to the Apache projects by providing feedback to contributors
-in the form of bug reports and feature suggestions. As well, users participate
-in the Apache community by helping other users on mailing lists and user
-support forums.
-
-### Contributors
-
-All of the volunteers who are contributing time, code, documentation, or
-resources to the Drill Project. A contributor that makes sustained, welcome
-contributions to the project may be invited to become a committer, though the
-exact timing of such invitations depends on many factors.
-
-### Committers
-
-The project's committers are responsible for the project's technical
-management. Committers have access to a specified set of subproject's code
-repositories. Committers on subprojects may cast binding votes on any
-technical discussion regarding that subproject.
-
-Committer access is by invitation only and must be approved by lazy consensus
-of the active PMC members. A Committer is considered _emeritus_ by his or her
-own declaration or by not contributing in any form to the project for over six
-months. An emeritus committer may request reinstatement of commit access from
-the PMC which will be sufficient to restore him or her to active committer
-status.
-
-Commit access can be revoked by a unanimous vote of all the active PMC members
-(except the committer in question if he or she is also a PMC member).
-
-All Apache committers are required to have a signed [Contributor License
-Agreement (CLA)](http://www.apache.org/licenses/icla.txt) on file with the
-Apache Software Foundation. There is a [Committer
-FAQ](http://www.apache.org/dev/committers.html) which provides more details on
-the requirements for committers.
-
-A committer who makes a sustained contribution to the project may be invited
-to become a member of the PMC. The form of contribution is not limited to
-code. It can also include code review, helping out users on the mailing lists,
-documentation, etc.
-
-### Project Management Committee
-
-The PMC is responsible to the board and the ASF for the management and
-oversight of the Apache Drill codebase. The responsibilities of the PMC
-include
-
-  * Deciding what is distributed as products of the Apache Drill project. In particular all releases must be approved by the PMC.
-  * Maintaining the project's shared resources, including the codebase repository, mailing lists, websites.
-  * Speaking on behalf of the project.
-  * Resolving license disputes regarding products of the project.
-  * Nominating new PMC members and committers.
-  * Maintaining these bylaws and other guidelines of the project.
-
-Membership of the PMC is by invitation only and must be approved by a lazy
-consensus of active PMC members. A PMC member is considered _emeritus_ by his
-or her own declaration or by not contributing in any form to the project for
-over six months. An emeritus member may request reinstatement to the PMC,
-which will be sufficient to restore him or her to active PMC member.
-
-Membership of the PMC can be revoked by an unanimous vote of all the active
-PMC members other than the member in question.
-
-The chair of the PMC is appointed by the ASF board. The chair is an office
-holder of the Apache Software Foundation (Vice President, Apache Drill) and
-has primary responsibility to the board for the management of the projects
-within the scope of the Drill PMC. The chair reports to the board quarterly on
-developments within the Drill project.
-
-The term of the chair is one year. When the current chair's term is up or if
-the chair resigns before the end of his or her term, the PMC votes to
-recommend a new chair using lazy consensus, but the decision must be ratified
-by the Apache board.
-
-## Decision Making
-
-Within the Drill project, different types of decisions require different forms
-of approval. For example, the previous section describes several decisions
-which require 'lazy consensus' approval. This section defines how voting is
-performed, the types of approvals, and which types of decision require which
-type of approval.
-
-### Voting
-
-Decisions regarding the project are made by votes on the primary project
-development mailing list
-_[dev@drill.apache.org](mailto:dev@drill.apache.org)_. Where necessary, PMC
-voting may take place on the private Drill PMC mailing list
-[private@drill.apache.org](mailto:private@drill.apache.org). Votes are clearly
-indicated by subject line starting with [VOTE]. Votes may contain multiple
-items for approval and these should be clearly separated. Voting is carried
-out by replying to the vote mail. Voting may take four flavors.
-
 <table ><tbody><tr><td valign="top" >Vote</td><td valign="top" > </td></tr><tr><td valign="top" >+1</td><td valign="top" >'Yes,' 'Agree,' or 'the action should be performed.' In general, this vote also indicates a willingness on the behalf of the voter in 'making it happen'.</td></tr><tr><td valign="top" >+0</td><td valign="top" >This vote indicates a willingness for the action under consideration to go ahead. The voter, however will not be able to help.</td></tr><tr><td valign="top" >-0</td><td valign="top" >This vote indicates that the voter does not, in general, agree with the proposed action but is not concerned enough to prevent the action going ahead.</td></tr><tr><td valign="top" >-1</td><td valign="top" >This is a negative vote. On issues where consensus is required, this vote counts as a <strong>veto</strong>. All vetoes must contain an explanation of why the veto is appropriate. Vetoes with no explanation are void. It may also be appropriate for a -1 vote to include an alternative course of action.</td></tr></tbody></table>
-  
-All participants in the Drill project are encouraged to show their agreement
-with or against a particular action by voting. For technical decisions, only
-the votes of active committers are binding. Non binding votes are still useful
-for those with binding votes to understand the perception of an action in the
-wider Drill community. For PMC decisions, only the votes of PMC members are
-binding.
-
-Voting can also be applied to changes already made to the Drill codebase.
-These typically take the form of a veto (-1) in reply to the commit message
-sent when the commit is made. Note that this should be a rare occurrence. All
-efforts should be made to discuss issues when they are still patches before
-the code is committed.
-
-### Approvals
-
-These are the types of approvals that can be sought. Different actions require
-different types of approvals.
-
-<table ><tbody><tr><td valign="top" >Approval Type</td><td valign="top" > </td></tr><tr><td valign="top" >Consensus</td><td valign="top" >For this to pass, all voters with binding votes must vote and there can be no binding vetoes (-1). Consensus votes are rarely required due to the impracticality of getting all eligible voters to cast a vote.</td></tr><tr><td valign="top" >Lazy Consensus</td><td valign="top" >Lazy consensus requires 3 binding +1 votes and no binding vetoes.</td></tr><tr><td valign="top" >Lazy Majority</td><td valign="top" >A lazy majority vote requires 3 binding +1 votes and more binding +1 votes that -1 votes.</td></tr><tr><td valign="top" >Lazy Approval</td><td valign="top" >An action with lazy approval is implicitly allowed unless a -1 vote is received, at which time, depending on the type of action, either lazy majority or lazy consensus approval must be obtained.</td></tr></tbody></table>  
-  
-### Vetoes
-
-A valid, binding veto cannot be overruled. If a veto is cast, it must be
-accompanied by a valid reason explaining the reasons for the veto. The
-validity of a veto, if challenged, can be confirmed by anyone who has a
-binding vote. This does not necessarily signify agreement with the veto -
-merely that the veto is valid.
-
-If you disagree with a valid veto, you must lobby the person casting the veto
-to withdraw his or her veto. If a veto is not withdrawn, the action that has
-been vetoed must be reversed in a timely manner.
-
-### Actions
-
-This section describes the various actions which are undertaken within the
-project, the corresponding approval required for that action and those who
-have binding votes over the action. It also specifies the minimum length of
-time that a vote must remain open, measured in business days. In general votes
-should not be called at times when it is known that interested members of the
-project will be unavailable.
-
-<table ><tbody><tr><td valign="top" >Action</td><td valign="top" >Description</td><td valign="top" >Approval</td><td valign="top" >Binding Votes</td><td valign="top" >Minimum Length</td></tr><tr><td valign="top" >Code Change</td><td valign="top" >A change made to a codebase of the project and committed by a committer. This includes source code, documentation, website content, etc.</td><td valign="top" >Consensus approval of active committers, with a minimum of one +1. The code can be committed after the first +1</td><td valign="top" >Active committers</td><td valign="top" >1</td></tr><tr><td valign="top" >Release Plan</td><td valign="top" >Defines the timetable and actions for a release. The plan also nominates a Release Manager.</td><td valign="top" >Lazy majority</td><td valign="top" >Active committers</td><td valign="top" >3</td></tr><tr><td valign="top" >Product Release</td><td valign="top" >When a release of one of the project's products is ready, a vote is required to accept the release as an official release of the project.</td><td valign="top" >Lazy Majority</td><td valign="top" >Active PMC members</td><td valign="top" >3</td></tr><tr><td valign="top" >Adoption of New Codebase</td><td valign="top" >When the codebase for an existing, released product is to be replaced with an alternative codebase. If such a vote fails to gain approval, the existing code base will continue. This also covers the creation of new sub-projects within the project.</td><td valign="top" >2/3 majority</td><td valign="top" >Active PMC members</td><td valign="top" >6</td></tr><tr><td valign="top" >New Committer</td><td valign="top" >When a new committer is proposed for the project.</td><td valign="top" >Lazy consensus</td><td valign="top" >Active PMC members</td><td valign="top" >3</td></tr><tr><td valign="top" >New PMC Member</td><td valign="top" >When a committer is proposed for the PMC.</td><td valign="top" >Lazy consensus</td><td valign="top" >Active PMC members</td><td valign="top" >3</td></tr><tr><td valign="top" >Committer Removal</td><td valign="top" >When removal of commit privileges is sought. <em>Note: Such actions will also be referred to the ASF board by the PMC chair.</em></td><td valign="top" >Consensus</td><td valign="top" >Active PMC members (excluding the committer in question if a member of the PMC).</td><td valign="top" >6</td></tr><tr><td valign="top" >PMC Member Removal</td><td valign="top" >When removal of a PMC member is sought. <em>Note: Such actions will also be referred to the ASF board by the PMC chair.</em></td><td valign="top" >Consensus</td><td valign="top" >Active PMC members (excluding the member in question).</td><td valign="top" >6</td></tr><tr><td valign="top" >Modifying Bylaws</td><td valign="top" >Modifying this document.</td><td valign="top" >2/3 majority</td><td valign="top" >Active PMC members</td><td valign="top" >6</td></tr></tbody></table>
-

http://git-wip-us.apache.org/repos/asf/drill/blob/2b7773de/_docs/connect/007-default-frmt.md
----------------------------------------------------------------------
diff --git a/_docs/connect/007-default-frmt.md b/_docs/connect/007-default-frmt.md
index 31cfe29..9325bdb 100644
--- a/_docs/connect/007-default-frmt.md
+++ b/_docs/connect/007-default-frmt.md
@@ -57,4 +57,13 @@ steps:
             "location" : "/max/proddata",
             "writable" : true,
             "defaultInputFormat" : "json"
-        }
\ No newline at end of file
+        }
+
+## Querying Compressed JSON
+
+You can use Drill 0.8 and later to query compressed JSON in .gz files as well as uncompressed files having the .json extension. First, add the gz extension to a storage plugin, and then use that plugin to query the compressed file.
+
+      "extensions": [
+        "json",
+        "gz"
+      ]
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/drill/blob/2b7773de/_docs/data-sources/001-hive-types.md
----------------------------------------------------------------------
diff --git a/_docs/data-sources/001-hive-types.md b/_docs/data-sources/001-hive-types.md
index 34d5bb6..c6cdb90 100644
--- a/_docs/data-sources/001-hive-types.md
+++ b/_docs/data-sources/001-hive-types.md
@@ -1,5 +1,6 @@
+---
 title: "Hive-to-Drill Data Type Mapping"
-parent: "Data Sources"
+parent: "Data Sources and File Formats"
 ---
 Using Drill you can read tables created in Hive that use data types compatible with Drill. Drill currently does not support writing Hive tables. The following table shows Drill support for Hive primitive types:
 <table>

http://git-wip-us.apache.org/repos/asf/drill/blob/2b7773de/_docs/data-sources/002-hive-udf.md
----------------------------------------------------------------------
diff --git a/_docs/data-sources/002-hive-udf.md b/_docs/data-sources/002-hive-udf.md
index ba82145..266e433 100644
--- a/_docs/data-sources/002-hive-udf.md
+++ b/_docs/data-sources/002-hive-udf.md
@@ -1,5 +1,6 @@
+---
 title: "Deploying and Using a Hive UDF"
-parent: "Data Sources"
+parent: "Data Sources and File Formats"
 ---
 If the extensive Hive functions, such as the mathematical and date functions, which Drill supports do not meet your needs, you can use a Hive UDF in Drill queries. Drill supports your existing Hive scalar UDFs. You can do queries on Hive tables and access existing Hive input/output formats, including custom serdes. Drill serves as a complement to Hive deployments by offering low latency queries.
 

http://git-wip-us.apache.org/repos/asf/drill/blob/2b7773de/_docs/data-sources/003-parquet-ref.md
----------------------------------------------------------------------
diff --git a/_docs/data-sources/003-parquet-ref.md b/_docs/data-sources/003-parquet-ref.md
index 6fee4a6..aa2ff11 100644
--- a/_docs/data-sources/003-parquet-ref.md
+++ b/_docs/data-sources/003-parquet-ref.md
@@ -1,6 +1,6 @@
-
+---
 title: "Parquet Format"
-parent: "Data Sources"
+parent: "Data Sources and File Formats"
 ---
 ## Parquet Format
 [Apache Parquet](http://parquet.incubator.apache.org/documentation/latest) has the following characteristics:

http://git-wip-us.apache.org/repos/asf/drill/blob/2b7773de/_docs/data-sources/004-json-ref.md
----------------------------------------------------------------------
diff --git a/_docs/data-sources/004-json-ref.md b/_docs/data-sources/004-json-ref.md
index d60db06..6119e2e 100644
--- a/_docs/data-sources/004-json-ref.md
+++ b/_docs/data-sources/004-json-ref.md
@@ -1,6 +1,6 @@
-
+---
 title: "JSON Data Model"
-parent: "Data Sources"
+parent: "Data Sources and File Formats"
 ---
 Drill supports [JSON (JavaScript Object Notation)](http://www.json.org/), a self-describing data format. The data itself implies its schema and has the following characteristics:
 
@@ -12,11 +12,9 @@ Semi-structured JSON data often consists of complex, nested elements having sche
 
 Using Drill you can natively query dynamic JSON data sets using SQL. Drill treats a JSON object as a SQL record. One object equals one row in a Drill table. 
 
-Using Drill you can natively query dynamic JSON data sets using SQL. Drill treats a JSON object as a SQL record. One object equals one row in a Drill table. 
-
-Drill 0.8 and higher can  query compressed .gz files having JSON as well as uncompressed .json files.<<link to section>>.
+Drill 0.8 and later can [query compressed .gz files](/docs/drill-default-input-format#querying-compressed-json) that contain JSON as well as uncompressed .json files.
 
-n addition to the examples presented later in this section, see "How to Analyze Highly Dynamic Datasets with Apache Drill" (https://www.mapr.com/blog/how-analyze-highly-dynamic-datasets-apache-drill) for information about how to analyze a JSON data set.
+In addition to the examples presented later in this section, see "How to Analyze Highly Dynamic Datasets with Apache Drill" (https://www.mapr.com/blog/how-analyze-highly-dynamic-datasets-apache-drill) for information about how to analyze a JSON data set.
 
 ## Data Type Mapping
 JSON data consists of the following types:
@@ -69,24 +67,22 @@ Use all text mode to prevent the schema change error described in the previous s
 
 When you set this option, Drill reads all data from the JSON files as VARCHAR. After reading the data, use a SELECT statement in Drill to cast data as follows:
 
-* Cast [JSON numeric values](/docs/lession-2-run-queries-with-ansi-sql#return-customer-data-with-appropriate-data-types) to SQL types, such as BIGINT, DECIMAL, FLOAT, INTEGER, and SMALLINT.
+* Cast JSON numeric values to [SQL types](/docs/data-types), such as BIGINT, DECIMAL, FLOAT, INTEGER, and SMALLINT.
 * Cast JSON strings to [Drill Date/Time Data Type Formats](/docs/supported-date-time-data-type-formats).
 
-For example, apply a [Drill view] (link to view reference) to the data. 
-
-Drill uses [map and array data types](/docs/data-types) internally for reading and writing complex and nested data structures from JSON. <<true?>>
+Drill uses [map and array data types](/docs/data-types) internally for reading and writing complex and nested data structures from JSON. You can cast data in a map or array to return a value from the structure, as shown in [“Create a view on a MapR-DB table”](/docs/lession-2-run-queries-with-ansi-sql). “Query Complex Data” shows how to access nested arrays.
 
 ## Reading JSON
-To read JSON data using Drill, use a [file system storage plugin](link to plugin section) that defines the JSON format. You can use the `dfs` storage plugin, which includes the definition. 
+To read JSON data using Drill, use a [file system storage plugin](/docs/connect-to-a-data-source) that defines the JSON format. You can use the `dfs` storage plugin, which includes the definition. 
 
-JSON data is often complex. Data can be deeply nested and semi-structured. but [you can use workarounds ](link to section) covered later.
+JSON data is often complex. Data can be deeply nested and semi-structured, but you can use [workarounds](/docs/json-data-model#limitations-and-workarounds) covered later.
 
 Drill reads tuples defined in single objects, having no comma between objects. A JSON object is an unordered set of name/value pairs. Curly braces delimit objects in the JSON file:
 
     { name: "Apples", desc: "Delicious" }
     { name: "Oranges", desc: "Florida Navel" }
     
-To read and [analyze complex JSON](link to Analyzing JSON) files, use the FLATTEN and KVGEN functions. Observe the following guidelines when reading JSON files:
+To read and [analyze complex JSON](/docs/json-data-model#analyzing-json) files, use the FLATTEN and KVGEN functions. Observe the following guidelines when reading JSON files:
 
 * Avoid queries that return objects larger than ??MB (16?).
   These queries might be far less performant than those that return smaller objects.
@@ -122,34 +118,31 @@ Drill performs the following actions, as shown in the complete [CTAS command exa
    
 * Creates a directory using table name.
 * Writes the JSON data to the directory in the workspace location.
-   
-Observe the following size limitations pertaining to JSON objects:
 
-* Objects must be smaller than the chunk size.
-* Objects must be smaller than ?GB (2?) on 32- and some 64-bit systems.
-* Objects must be smaller than the amount of memory available to Drill.
 
 ## Analyzing JSON
 
-Generally, you query JSON files using the following syntax:
+Generally, you query JSON files using the following syntax, which includes a table qualifier. The qualifier is typically required for querying complex data:
 
 * Dot notation to drill down into a JSON map.
 
-        SELECT level1.level2. . . . leveln FROM <storage plugin location>`myfile.json`
+        SELECT t.level1.level2. . . . leveln FROM <storage plugin location>`myfile.json` t;
         
 * Use square brackets, array-style notation to drill down into a JSON array.
 
-        SELECT level1.level2[n][2] FROM <storage plugin location>`myfile.json`;
+        SELECT t.level1.level2[n][2] FROM <storage plugin location>`myfile.json` t;
     
   The first index position of an array is 0.
 
+Drill returns null when a document does not have the specified map or level.
+
 Using the following techniques, you can query complex, nested JSON:
 
-* Generate key/value pairs for loosely structured data
 * Flatten nested data 
+* Generate key/value pairs for loosely structured data
 
-### Generate Key/Value Pairs
-Use the ‘KVGen’ (Key Value Generator) with complex data that contains arbitrary maps consisting of dynamic and unknown element names, such as ticket_info in the following example:
+## Example: Flatten and Generate Key Values for Complex JSON
+This example uses the following data, which represents unit sales of tickets to events sold over a period of several days in different states:
 
     {
       "type": "ticket",
@@ -172,30 +165,53 @@ Use the ‘KVGen’ (Key Value Generator) with complex data that contains arbitr
       }
     }
     
+Take a look at the data in Drill:
 
-This query reads the data, and the output shows how Drill restructures it:
+    SELECT * FROM dfs.`/Users/drilluser/ticket_sales.json`;
+	+------------+------------+------------+------------+------------+
+	|    type    |  channel   |   month    |    day     |   sales    |
+	+------------+------------+------------+------------+------------+
+	| ticket     | 123455     | 12         | ["15","25","28","31"] | {"NY":"532806","PA":"112889","TX":"898999","UT":"10875"} |
+	| ticket     | 123456     | 12         | ["10","15","19","31"] | {"NY":"972880","PA":"857475","CA":"87350","OR":"49999"} |
+	+------------+------------+------------+------------+------------+
+	2 rows selected (0.041 seconds)
 
-    SELECT * FROM dfs.`/Users/drilluser/drill/apache-drill-0.8.0-SNAPSHOT/ticket_sales.json`;
-    
-    +------------+------------+------------+
-	|    type    |   venue    |   sales    |
-	+------------+------------+------------+
-	| ticket     | 123455     | {"12-10":532806,"12-11":112889,"12-19":898999,"12-21":10875} |
-	| ticket     | 123456     | {"12-10":87350,"12-19":49999,"12-21":857475,"12-15":972880} |
-	+------------+------------+------------+
-	2 rows selected (0.895 seconds)
+### Flatten JSON Data
+The flatten function breaks the following _day arrays from the JSON example file shown earlier into separate rows.
 
-`KVGen` turns the dynamic map into an array of key-value pairs where keys represent the dynamic element names.
+    "_day": [ 15, 25, 28, 31 ] 
+    "_day": [ 10, 15, 19, 31 ]
 
-    SELECT kvgen(sales) Revenue FROM dfs.`/Users/drilluser/drill/apache-drill-0.8.0-SNAPSHOT/ticket_sales.json`;
-    
-	+--------------+
-	|   Revenue    |
-	+--------------+
-	| [{"key":"12-10","value":532806},{"key":"12-11","value":112889},{"key":"12-19","value":898999},{"key":"12-21","value":10875}] |
-	| [{"key":"12-10","value":87350},{"key":"12-19","value":49999},{"key":"12-21","value":857475},{"key":"12-15","value":972880}] |
-	+--------------+
-	2 rows selected (0.341 seconds)
+Flatten the `_day` array of the ticket data into separate rows, one row for each day in the array, for a better view of the data. Flatten copies the related sales data from the JSON object onto each row. Using the all (*) wildcard as the argument to flatten is not supported and returns an error.
+
+    SELECT flatten(tkt._day) AS `day`, tkt.sales FROM dfs.`/Users/drilluser/ticket_sales.json` tkt;
+	+------------+------------+
+	|    day     |   sales    |
+	+------------+------------+
+	| 15         | {"NY":532806,"PA":112889,"TX":898999,"UT":10875} |
+	| 25         | {"NY":532806,"PA":112889,"TX":898999,"UT":10875} |
+	| 28         | {"NY":532806,"PA":112889,"TX":898999,"UT":10875} |
+	| 31         | {"NY":532806,"PA":112889,"TX":898999,"UT":10875} |
+	| 10         | {"NY":972880,"PA":857475,"CA":87350,"OR":49999} |
+	| 15         | {"NY":972880,"PA":857475,"CA":87350,"OR":49999} |
+	| 19         | {"NY":972880,"PA":857475,"CA":87350,"OR":49999} |
+	| 31         | {"NY":972880,"PA":857475,"CA":87350,"OR":49999} |
+	+------------+------------+
+	8 rows selected (0.072 seconds)
+
+### Generate Key/Value Pairs
+Use the kvgen (Key Value Generator) function to generate key/value pairs from complex data. Generating key/value pairs is often helpful when working with data that contains arbitrary maps consisting of dynamic and unknown element names, such as the ticket sales data by state. For example, kvgen breaks the sales data into keys and values that represent the states and the number of tickets sold:
+
+    SELECT kvgen(tkt.sales) AS state_sales FROM dfs.`/Users/drilluser/ticket_sales.json` tkt;
+	+-------------+
+	| state_sales |
+	+-------------+
+	| [{"key":"NY","value":532806},{"key":"PA","value":112889},{"key":"TX","value":898999},{"key":"UT","value":10875}] |
+	| [{"key":"NY","value":972880},{"key":"PA","value":857475},{"key":"CA","value":87350},{"key":"OR","value":49999}] |
+	+-------------+
+	2 rows selected (0.039 seconds)
+
+The purpose of the kvgen function is to allow queries against maps where the keys themselves represent data rather than a schema, as shown in the next example.
 
 ### Flatten JSON Data
 
@@ -218,33 +234,30 @@ This query reads the data, and the output shows how Drill restructures it:
 	8 rows selected (0.171 seconds)
 
 ### Example: Aggregate Loosely Structured Data
-Continuing with the previous example, make sure all text mode is set to false to sum numerical values. 
+Use flatten and kvgen together to analyze the data. Continuing with the previous example, make sure all text mode is set to false to sum numerical values. Drill returns an error if you attempt to sum data in all text mode.
 
     ALTER SYSTEM SET `store.json.all_text_mode` = false;
     
 Sum the ticket sales by combining the `sum`, `flatten`, and `kvgen` functions in a single query.
 
-    SELECT sum(tickettbl.tickets.`value`) AS Revenue 
-    FROM (SELECT flatten(kvgen(sales)) tickets 
-    FROM  dfs.`/Users/drilluser/drill/apache-drill-0.8.0-SNAPSHOT/ticket_sales.json` ) tickettbl;
-    
-	+------------+
-	|  Revenue   |
+    SELECT SUM(tkt.tot_sales.`value`) AS TotalSales FROM (SELECT flatten(kvgen(sales)) tot_sales FROM dfs.`/Users/drilluser/ticket_sales.json`) tkt;
+
+    +------------+
+	| TotalSales |
 	+------------+
 	| 3523273    |
 	+------------+
-	1 row selected (0.194 seconds)
-
+	1 row selected (0.081 seconds)
 
 ### Example: Aggregate and Sort Data
-Sum the ticket sales for each date in December, and sort by total sales in ascending order.
+Sum the ticket sales by state, group by state, and sort by total sales in ascending order.
 
-    SELECT `right`(tickettbl.tickets.key,2) December_Date, 
-    sum(tickettbl.tickets.`value`) Revenue 
-    FROM (select flatten(kvgen(sales)) tickets 
-    FROM dfs.`/Users/drilluser/drill/apache-drill-0.8.0-SNAPSHOT/ticket_sales.json`) tickettbl
-    GROUP BY `right`(tickettbl.tickets.key,2) 
-    ORDER BY Revenue;
+    SELECT `right`(tkt.tot_sales.key,2) State, 
+    SUM(tkt.tot_sales.`value`) AS TotalSales 
+    FROM (SELECT flatten(kvgen(sales)) tot_sales 
+    FROM dfs.`/Users/drilluser/ticket_sales.json`) tkt 
+    GROUP BY `right`(tkt.tot_sales.key,2) 
+    ORDER BY TotalSales;
 
 	+---------------+--------------+
 	| December_Date | Revenue      |
@@ -258,12 +271,44 @@ Sum the ticket sales for each date in December, and sort by total sales in ascen
 	5 rows selected (0.203 seconds)
 
 ### Example: Analyze a Map Field in an Array
-To access a map field in an array, use dot notation to drill down through the hierarchy of the JSON data to the field. The following example shows how to drill down to get the MAPBLKLOT property value the [City Lots San Francisco in .json](https://github.com/zemirco/sf-city-lots-json).
-
-![drill query flow]({{ site.baseurl }}/docs/img/json-workaround.png)
-
-        SELECT features[0].properties.MAPBLKLOT,  
-        FROM <storage location>.`citylots.json`;
+To access a map field in an array, use dot notation to drill down through the hierarchy of the JSON data to the field. The examples are based on the following [City Lots San Francisco in .json](https://github.com/zemirco/sf-city-lots-json) data, modified slightly as described in the empty array workaround in ["Limitations and Workarounds"](/docs/json-data-model#empty-array).
+
+    {
+      "type": "FeatureCollection",
+      "features": [
+        {
+          "type": "Feature",
+          "properties": {
+            "MAPBLKLOT": "0001001",
+            "BLKLOT": "0001001",
+            "BLOCK_NUM": "0001",
+            "LOT_NUM": "001",
+            "FROM_ST": "0",
+            "TO_ST": "0",
+            "STREET": "UNKNOWN",
+            "ST_TYPE": null,
+            "ODD_EVEN": "E"
+          },
+          "geometry": {
+            "type": "Polygon",
+            "coordinates": [ [
+              [ -122.422003528252475, 37.808480096967251, 0.0 ],
+              [ -122.422076013325281, 37.808835019815085, 0.0 ],
+              [ -122.421102174348633, 37.808803534992904, 0.0 ],
+              [ -122.421062569067274, 37.808601056818148, 0.0 ],
+              [ -122.422003528252475, 37.808480096967251, 0.0 ]
+            ] ]
+          }
+        },
+        {
+          "type": "Feature",
+          . . .
+
+This example shows you how to drill down using array notation plus dot notation in features[0].properties.MAPBLKLOT to get the MAPBLKLOT property value in the San Francisco city lots data:
+
+        SELECT features[0].properties.MAPBLKLOT FROM dfs.`/Users/drilluser/citylots.json`;
           
         +------------+
 		|   EXPR$0   |
@@ -272,10 +317,10 @@ To access a map field in an array, use dot notation to drill down through the hi
 		+------------+
 		1 row selected (0.163 seconds)
 		
-To access the second geometry coordinate of the first city lot in the San Francisco city lots, use dot notation and array indexing notation:
+To access the second geometry coordinate of the first city lot in the San Francisco city lots, use array indexing notation for the coordinates as well as the features:
 		
 		SELECT features[0].geometry.coordinates[0][1] 
-		FROM <storage location>.`citylots.json`;
+		FROM dfs.`/Users/drilluser/citylots.json`;
 		+------------+
 		|   EXPR$0   |
 		+------------+
@@ -285,15 +330,30 @@ To access the second geometry coordinate of the first city lot in the San Franci
 
 More examples of drilling down into an array are shown in ["Selecting Nested Data for a Column"](/docs/query-3-selecting-nested-data-for-a-column). 
 
+### Example: Flatten an Array of Maps using a Subquery
+By flattening the following JSON file, which contains an array of maps, you can evaluate the records of the flattened data. 
+
+    {"name":"classic","fillings":[ {"name":"sugar","cal":500} , {"name":"flour","cal":300} ] }
+
+    SELECT flat.fill FROM (SELECT flatten(t.fillings) AS fill FROM dfs.flatten.`test.json` t) flat WHERE flat.fill.cal  > 300;
+
+    +------------+
+	|    fill    |
+	+------------+
+	| {"name":"sugar","cal":500} |
+	+------------+
+	1 row selected (0.421 seconds)
+
+Use a table qualifier for column fields and functions when working with complex data sets. Currently, you must use a subquery when operating on a flattened column. Eliminating the subquery and table qualifier in the WHERE clause, for example `flat.fillings[0].cal > 300`, does not evaluate all records of the flattened data against the predicate and produces the wrong results.
+
 ### Example: Analyze Map Fields in a Map
-This example uses a WHERE clause to drill down to a third level of the following JSON hierarchy to get the Id and weight of the person whose max_hdl exceeds 160, use dot notation as shown in the query that follows:
+This example uses a WHERE clause to drill down to the third level of the following JSON hierarchy and return the last name and weight of the person whose max_hdl is less than 160:
 
-    {
+       {
 	    "SOURCE": "Allegheny County",
 	    "TIMESTAMP": 1366369334989,
 	    "birth": {
 	        "id": 35731300,
-	        "dur": 215923,
 	        "firstname": "Jane",
 	        "lastname": "Doe",
 	        "weight": "CATEGORY_1",
@@ -304,31 +364,36 @@ This example uses a WHERE clause to drill down to a third level of the following
 	            "max_hdl": 200
 	        }
 	    }
-	} . . .
-
-	SELECT tbl.birth.id AS Id, tbl.birth.weight AS Weight 
-	FROM dfs.`/Users/drilluser/drill/vitalstat.json` AS tbl 
-	WHERE tbl.birth.id IN (
-	SELECT tbl1.birth.id 
-	FROM dfs.`/Users/drilluser/drill/vitalstat.json` AS tbl1 
-	WHERE tbl1.birth.bearer.max_hdl > 160); 
-	
-	+------------+------------+
-	|     Id     |   Weight   |
-	+------------+------------+
-	| 35731300   | CATEGORY_1 |
-	+------------+------------+
-	1 row selected (1.424 seconds)
+	}
+        {
+            "SOURCE": "Marin County",
+            "TIMESTAMP": 1366369334,
+            "birth": {
+                "id": 35731309,
+                "firstname": "Somporn",
+                "lastname": "Thongnopneua",
+                "weight": "CATEGORY_2",
+                "bearer": {
+                    "father": "Jeiranan Thongnopneua",
+                    "ss": "208-25-2223",
+                    "max_ldl": 110,
+                    "max_hdl": 150
+                }
+            }
+        }
 
-## Querying Compressed JSON
+Use dot notation, for example `t.birth.lastname` and `t.birth.bearer.max_hdl`, to drill down to the nested level:
 
-You can use Drill 0.8 and later to query compressed JSON in .gz files as well as uncompressed files having the .json extension as described in Reading and Writing JSON Files<<link to section>>. First, add the gz extension to a storage plugin, and then use that plugin to query the compressed file.
+    SELECT t.birth.lastname AS Name, t.birth.weight AS Weight 
+    FROM dfs.`/Users/drilluser/vitalstat.json` t 
+    WHERE t.birth.bearer.max_hdl < 160;
 
-      "extensions": [
-        "json",
-        "gz"
-      ]
-<<Is this going to be in 0.8?>>
+    +------------+------------+
+    |    Name    |   Weight   |
+    +------------+------------+
+    | Thongnopneua | CATEGORY_2 |
+    +------------+------------+
+    1 row selected (0.142 seconds)
 
 ## Limitations and Workarounds
 In most cases, you can use a workaround, presented in the following sections, to overcome the following limitations:
@@ -337,6 +402,7 @@ In most cases, you can use a workaround, presented in the following sections, to
 * Complex nested data
 * Empty array
 * Lengthy JSON objects
+* Complex JSON objects
 * Nested column names
 * Schema changes
 * Selecting all in a JSON directory query 
@@ -344,12 +410,14 @@ In most cases, you can use a workaround, presented in the following sections, to
 ### Array at the root level
 Drill cannot read an array at the root level, outside an object.
 
-Workaround: Remove square brackets at the root of the object.
+Workaround: Remove square brackets at the root of the object, as shown in the following example.
+
+![drill query flow]({{ site.baseurl }}/docs/img/datasources-json-bracket.png)
 
 ### Complex nested data
 Drill cannot read some complex nested arrays unless you use a table qualifier.
 
-Workaround: To query n-level nested data, use table alias to remove ambiguity. The table alias is required; otherwise column names such as user_info are parsed as table names by the SQL parser. The qualifier is not needed for data that is not nested, as shown in the following example:
+Workaround: To query n-level nested data, use the table qualifier to remove ambiguity; otherwise, column names such as user_info are parsed as table names by the SQL parser. The qualifier is not needed for data that is not nested, as shown in the following example:
 
     {"dev_id": 0,
 	 "date":"07/26/2013",
@@ -388,33 +456,44 @@ For example, you cannot query the [City Lots San Francisco in .json](https://git
 After removing the extraneous square brackets in the coordinates array, you can drill down to query all the data for the lots.
 
 ### Lengthy JSON objects
+<<Jason will try to provide some statement about limits.>>
 
-Drilling down into lengthy JSON objects, having just a few or a single set of curly braces, requires flattening and generation of keys.
+### Complex JSON objects
+Complex arrays and maps can be difficult or impossible to query.
 
 Workaround: 
 
-Separate lengthy objects into many objects delimited by curly braces using the following functions:
+Separate lengthy objects into objects delimited by curly braces using the following functions:
  
-  * FLATTEN <<link to example>> separates a set of nested JSON objects into individual rows in a DRILL table.
-  * KVGEN <<link to example>> separates objects having more elements than optimal for querying.
+* [flatten](/docs/json-data-model#flatten-json-data) separates a set of nested JSON objects into individual rows in a Drill table.
+* [kvgen](/docs/json-data-model#generate-key-value-pairs) separates objects having more elements than optimal for querying.
+
   
 ### Nested Column Names 
 
 You cannot use reserved words for nested column names because Drill returns null if you enclose n-level nested column names in back ticks. The previous example encloses the date and time column names in back ticks because the names are reserved words. The enclosure of column names in back ticks works because the date and time columns belong to the first level of the JSON object.
 
+For example, the following object contains the reserved word key, which you need to rename to `_key` or another non-reserved word:
+
+    {
+      "type": "ticket",
+      "channel": 123455,
+      "_month": 12,
+      "_day": [ 15, 25, 28, 31 ],
+      "sales": {
+        "NY": 532806,
+        "PA": 112889,
+        "TX": 898999,
+        "UT": 10875,
+        "key": [ 78946, 39107, 76311 ]
+      }
+    }
+
 ### Schema changes
 Drill cannot read JSON files containing changes in the schema. For example, attempting to query an object having array elements of different data types causes an error:
 
-        . . .
-            "geometry": {
-                 "type": "Polygon",
-                 "coordinates": [
-                   [
-                     -122.42200352825247,
-                     37.80848009696725,
-                     0
-                   ],
-        . . .
+![drill query flow]({{ site.baseurl }}/docs/img/data-sources-schemachg.png)
+
 Drill interprets numbers that do not have a decimal point as BigInt values. In this example, Drill recognizes the first two coordinates as doubles and the third coordinate as a BigInt, which causes an error. 
                 
 Workaround: Set the `store.json.all_text_mode` property, described earlier, to true.
@@ -422,7 +501,7 @@ Workaround: Set the `store.json.all_text_mode` property, described earlier, to t
     ALTER SYSTEM SET `store.json.all_text_mode` = true;
 
 ### Selecting all in a JSON directory query
-Drill currently returns only fields common to all the files in a [directory query](link to basics tutorial) that selects all (SELECT *) JSON files.
+Drill currently returns only fields common to all the files in a [directory query](/docs/lesson-3-create-a-storage-plugin#query-multiple-files-in-a-directory) that selects all (SELECT *) JSON files.
 
 Workaround: Query each file individually.
 

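The `flatten(kvgen(sales))` aggregation documented in the 004-json-ref.md changes can be sanity-checked outside Drill. The following is a minimal Python sketch, not Drill code: the `kvgen` and `flatten` helpers are hypothetical stand-ins that only mimic the behavior the docs describe, and the sales figures are copied from the flatten output shown in the diff. Under those assumptions the grand total agrees with the 3523273 reported for the SUM query.

```python
# Sales maps copied from the two ticket records shown in the diff.
records = [
    {"sales": {"NY": 532806, "PA": 112889, "TX": 898999, "UT": 10875}},
    {"sales": {"NY": 972880, "PA": 857475, "CA": 87350, "OR": 49999}},
]


def kvgen(mapping):
    """Hypothetical stand-in for kvgen: turn a map into a list of key/value maps."""
    return [{"key": k, "value": v} for k, v in mapping.items()]


def flatten(rows, column):
    """Hypothetical stand-in for flatten: emit one row per element of an array column."""
    for row in rows:
        for element in row[column]:
            yield {**row, column: element}


# Rough equivalent of: SELECT flatten(kvgen(sales)) tot_sales FROM ...
subquery = ({"tot_sales": kvgen(r["sales"])} for r in records)
pairs = list(flatten(subquery, "tot_sales"))

# Rough equivalent of: SELECT SUM(tkt.tot_sales.`value`) AS TotalSales ...
total_sales = sum(p["tot_sales"]["value"] for p in pairs)

# Rough equivalent of the GROUP BY key / ORDER BY TotalSales query.
by_state = {}
for p in pairs:
    key = p["tot_sales"]["key"]
    by_state[key] = by_state.get(key, 0) + p["tot_sales"]["value"]
sorted_states = sorted(by_state.items(), key=lambda kv: kv[1])

print(total_sales)
print(sorted_states)
```

Because the sketch groups on the whole key, it corresponds to the state-based grouping in the updated query rather than the earlier date-based one.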
