drill-commits mailing list archives

From krish...@apache.org
Subject [01/11] drill git commit: edits for 1.4
Date Mon, 14 Dec 2015 23:48:52 GMT
Repository: drill
Updated Branches:
  refs/heads/gh-pages 161af8f5c -> 7c9401a32


edits for 1.4


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/a694d587
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/a694d587
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/a694d587

Branch: refs/heads/gh-pages
Commit: a694d587b4d688ee4e0d0d1d80bcfd85eb002949
Parents: a5ade47
Author: Kris Hahn <krishahn@apache.org>
Authored: Mon Dec 14 10:55:20 2015 -0800
Committer: Kris Hahn <krishahn@apache.org>
Committed: Mon Dec 14 15:46:37 2015 -0800

----------------------------------------------------------------------
 .../010-partition-pruning-introduction.md       |  4 ++--
 .../020-migrating-partitioned-data.md           | 20 ++++++++++++++------
 .../030-using-partition-pruning.md              |  2 +-
 _docs/rn/007-1.4.0-rn.md                        |  4 +---
 4 files changed, 18 insertions(+), 12 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill/blob/a694d587/_docs/performance-tuning/partition-pruning/010-partition-pruning-introduction.md
----------------------------------------------------------------------
diff --git a/_docs/performance-tuning/partition-pruning/010-partition-pruning-introduction.md b/_docs/performance-tuning/partition-pruning/010-partition-pruning-introduction.md
index 77c16d8..2a94e3d 100755
--- a/_docs/performance-tuning/partition-pruning/010-partition-pruning-introduction.md
+++ b/_docs/performance-tuning/partition-pruning/010-partition-pruning-introduction.md
@@ -8,12 +8,12 @@ Partition pruning is a performance optimization that limits the number of files
 The query planner in Drill performs partition pruning by evaluating the filters. If no partition filters are present, the underlying Scan operator reads all files in all directories and then sends the data to operators, such as Filter, downstream. When partition filters are present, the query planner pushes the filters down to the Scan if possible. The Scan reads only the directories that match the partition filters, thus reducing disk I/O.
 
 ## Using Partitioned Drill 1.1-1.2 Data
-Before using partitioned Drill 1.1-1.2 data in Drill 1.3, you need to migrate the data. Migrate Parquet data as described in "Migrating Partitioned Data". 
+Before using partitioned Drill 1.1-1.2 data in Drill 1.3, you need to migrate the data. Migrate Parquet data as described in ["Migrating Partitioned Data"]({{site.baseurl}}/docs/migrating-partitioned-data/).

 
 {% include startimportant.html %}Migrate only Parquet files that Drill generated.{% include endimportant.html %}
 
 ## Partitioning Data
-Prior to the release of Drill 1.1, partition pruning involved time-consuming manual setup tasks. Using the PARTITION BY clause in the CTAS command simplifies the process. "How to Partition Data" describes this process.
+Prior to the release of Drill 1.1, partition pruning involved time-consuming manual setup tasks. Using the PARTITION BY clause in the CTAS command simplifies the process. ["How to Partition Data"]{{site.baseurl}}(/docs/using-partition-pruning/#how-to-partition-data) describes this process.
 
 
 

http://git-wip-us.apache.org/repos/asf/drill/blob/a694d587/_docs/performance-tuning/partition-pruning/020-migrating-partitioned-data.md
----------------------------------------------------------------------
diff --git a/_docs/performance-tuning/partition-pruning/020-migrating-partitioned-data.md b/_docs/performance-tuning/partition-pruning/020-migrating-partitioned-data.md
index 729ab41..03f2cfb 100755
--- a/_docs/performance-tuning/partition-pruning/020-migrating-partitioned-data.md
+++ b/_docs/performance-tuning/partition-pruning/020-migrating-partitioned-data.md
@@ -16,14 +16,16 @@ The upgrade tool simply inserts a version number in the metadata to mark the fil
 
 <!-- The bug fix eliminated the risk of inaccurate metadata that could cause incorrect results when querying Hive- and Pig-generated Parquet files. No such risk exists with Drill-generated Parquet files. Querying Drill-generated Parquet files, regardless of the Drill version, yields accurate results. Drill-generated Parquet files, regardless of the Drill release, contain accurate metadata. -->
 
-## How to Migrate Data
-Use the [drill-upgrade tool](https://github.com/parthchandra/drill-upgrade)to modify one file at a time. The temp directory holds a copy of the file that is currently being modified for recovery in the event of a system failure. 
+
+## Preparing for the Migration
+Set aside sufficient time for the migration. In a test by Drill developers, it took 32 minutes to upgrade 1TB data in 840 files and 370 minutes to upgrade 100 GB data in 200k files. Although the size of files is a factor in the upgrade time, the number of files is the most significant factor.
 
 System administrators can write a shell script to run the upgrade tool simultaneously on multiple sub-directories.
 
-## Preparing for the Migration
-In a test by Drill developers, it took 32 minutes to upgrade 1TB data in 840 files and
-370 minutes to upgrade 100 GB data in 200k files. Although the size of files is a factor in the upgrade time, the number of files is the most significant factor.
+Back up the data to be migrated and create one or more `temp` directories as described in the next section before migrating the data.
+
+## How to Migrate Data
+Use the [drill-upgrade tool](https://github.com/parthchandra/drill-upgrade) to modify one file at a time. The `temp` directory or directories hold a copy for recovery of the file(s) currently being modified in the event of a system failure. Inspecting the `temp` directory can also indicate the success or failure of an unattended migration.
 
 To migrate Parquet data for use in Drill 1.3 that you partitioned and generated in Drill 1.1 or 1.2, follow these steps:
 
@@ -41,10 +43,16 @@ To migrate Parquet data for use in Drill 1.3 that you partitioned and generated
    `java -Dlog.path=/home/rchallapalli/work/drill-upgrade/upgrade.log -cp drill-upgrade-1.0-jar-with-dependencies.jar org.apache.drill.upgrade.Upgrade_12_13 --tempDir=maprfs:///drill/upgrade-temp maprfs:///drill/testdata/`
 
 ## Checking the Success of the Migration
+If you perform an unattended migration, check that the temp directory or directories are empty. Empty directories indicate success.
 
 ## Handling of Migration Failure
 
-If a network connection goes down, or if a user cancels the operation, the file that was being processed at the time of cancellation could be corrupted. So we should always copy the file back from the temp directory. Now if we re-run the upgrade tool, it will skip the files that it has already processed and only updates the remaining files.
+If a network connection goes down, or if a user cancels the operation, the file that was being processed at the time of cancellation could be corrupted. To recover from such a situation, perform the following steps:
+
+1. Copy the file back from the temp directory to your directory of Parquet files. 
+2. Re-run the upgrade tool.
+
+The tool skips the files that it has already processed and only updates the remaining files.
 
 
 

http://git-wip-us.apache.org/repos/asf/drill/blob/a694d587/_docs/performance-tuning/partition-pruning/030-using-partition-pruning.md
----------------------------------------------------------------------
diff --git a/_docs/performance-tuning/partition-pruning/030-using-partition-pruning.md b/_docs/performance-tuning/partition-pruning/030-using-partition-pruning.md
index 6660a39..e6620cb 100755
--- a/_docs/performance-tuning/partition-pruning/030-using-partition-pruning.md
+++ b/_docs/performance-tuning/partition-pruning/030-using-partition-pruning.md
@@ -7,7 +7,7 @@ In Drill 1.1.0 and later, if the data source is Parquet, no data organization ta
 
 ## How to Partition Data
 
-Write Parquet data using the [PARTITION BY]({{site.baseurl}}/docs/partition-by-clause/) clause in the CTAS statement. 
+In Drill 1.1.0 and later, write Parquet data using the [PARTITION BY]({{site.baseurl}}/docs/partition-by-clause/) clause in the CTAS statement. 
 
 The Parquet writer first sorts data by the partition keys, and then creates a new file when it encounters a new value for the partition columns. During partitioning, Drill creates separate files, but not separate directories, for different partitions. Each file contains exactly one partition value, but there can be multiple files for the same partition value. 
 

http://git-wip-us.apache.org/repos/asf/drill/blob/a694d587/_docs/rn/007-1.4.0-rn.md
----------------------------------------------------------------------
diff --git a/_docs/rn/007-1.4.0-rn.md b/_docs/rn/007-1.4.0-rn.md
index a5dca04..717e80b 100644
--- a/_docs/rn/007-1.4.0-rn.md
+++ b/_docs/rn/007-1.4.0-rn.md
@@ -5,9 +5,7 @@ parent: "Release Notes"
 
 **Release date:**  December 14, 2015
 
-Today, we're happy to announce the availability of Drill 1.4.0, providing the following bug fixes. 
-
-## Bug Fixes
+Today, we're happy to announce the availability of Drill 1.4.0, providing the following bug fixes and improvements. 
     
 <h2>        Sub-task
 </h2>

