impala-dev mailing list archives

From "Jim Apple (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-trunk) IMPALA-2840: Don't store table location in partition location
Date Wed, 13 Apr 2016 15:27:59 GMT
Jim Apple has uploaded a new patch set (#8).

Change subject: IMPALA-2840: Don't store table location in partition location
......................................................................

IMPALA-2840: Don't store table location in partition location

For a table with location "ABC", most partitions will have locations
like "ABC/DEF=2". The "ABC" part of the location does not need to be
stored in Catalog for each partition; we can compress it down to one
int in the common case.

This is done by stripping from each partition location the last N
directories (where N is the number of clustering columns) and storing
the resulting string in a cache of partition location prefixes. In the
cache, this location prefix string is mapped to an int.  Partition
locations are then stored as a tuple consisting of that int and a
suffix string; the partition location can be reconstructed as the
concatenation of the prefix string (from the cache) and the suffix.
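The scheme described above can be sketched roughly as follows. This is only an illustration, not the code in HdfsPartitionLocationCompressor.java: the class name PrefixCompressorSketch, the nested Location type, and the compress/decompress method names are all hypothetical, and the sketch assumes '/'-separated locations with no trailing slash.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the prefix cache; names are not taken from the patch.
final class PrefixCompressorSketch {
    // N, the number of clustering (partitioning) columns of the table.
    private final int numClusteringCols;
    // Cache of location prefixes: prefix string -> int, and int -> prefix string.
    private final Map<String, Integer> prefixToId = new HashMap<>();
    private final List<String> idToPrefix = new ArrayList<>();

    PrefixCompressorSketch(int numClusteringCols) {
        this.numClusteringCols = numClusteringCols;
    }

    // A compressed location: the int for the cached prefix, plus the suffix.
    static final class Location {
        final int prefixId;
        final String suffix;

        Location(int prefixId, String suffix) {
            this.prefixId = prefixId;
            this.suffix = suffix;
        }
    }

    // Strips the last N path components and stores the rest in the cache.
    Location compress(String location) {
        int split = location.length();
        for (int i = 0; i < numClusteringCols; i++) {
            int slash = location.lastIndexOf('/', split - 1);
            if (slash < 0) break;  // fewer than N components: keep what we have
            split = slash;
        }
        String prefix = location.substring(0, split);
        String suffix = location.substring(split);
        Integer id = prefixToId.get(prefix);
        if (id == null) {
            id = idToPrefix.size();
            idToPrefix.add(prefix);
            prefixToId.put(prefix, id);
        }
        return new Location(id, suffix);
    }

    // Reconstructs the location as prefix (from the cache) + suffix.
    String decompress(Location loc) {
        return idToPrefix.get(loc.prefixId) + loc.suffix;
    }

    public static void main(String[] args) {
        PrefixCompressorSketch c = new PrefixCompressorSketch(1);
        Location a = c.compress("ABC/DEF=2");
        Location b = c.compress("ABC/DEF=3");
        System.out.println(a.prefixId == b.prefixId);  // both share the "ABC" prefix entry
        System.out.println(c.decompress(a));
    }
}
```

In this sketch, a location that does not follow the usual layout still round-trips; it simply gets its own entry in the prefix cache instead of sharing the table's.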

Though this scheme was designed in the expectation that most
partitions will be stored in directories like
"/part_col_1=1.23/part_col_2=234/", it works even when that is not the
case.

TODO: Since each partition stores the literal values for the
partitioning columns, we could also elide the column names and values
when partitions are placed in directories like
"/part_col_1=1.23/part_col_2=234/".
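One possible shape for that TODO, purely as a sketch: rebuild the conventional suffix from the column names and literal values, and store nothing when the real suffix matches it. The class ConventionalSuffixSketch and its methods are hypothetical and not part of this change. The sketch also assumes that values appear in the path exactly as given (no escaping) and that there is no trailing slash.

```java
import java.util.List;

// Hypothetical sketch of the TODO; not part of the patch.
final class ConventionalSuffixSketch {
    // Builds "/col_1=val_1/col_2=val_2" from parallel lists of names and values.
    static String conventionalSuffix(List<String> colNames, List<String> values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < colNames.size(); i++) {
            sb.append('/').append(colNames.get(i)).append('=').append(values.get(i));
        }
        return sb.toString();
    }

    // Returns null (nothing to store) when the suffix is the conventional one,
    // otherwise returns the suffix unchanged so it can be stored as before.
    static String elide(String suffix, List<String> colNames, List<String> values) {
        return suffix.equals(conventionalSuffix(colNames, values)) ? null : suffix;
    }

    // Inverse of elide: a null stored value means "use the conventional suffix".
    static String restore(String stored, List<String> colNames, List<String> values) {
        return stored == null ? conventionalSuffix(colNames, values) : stored;
    }
}
```

A partition moved to a non-standard directory would keep its explicit suffix, so the fallback behaviour of the current scheme is preserved.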

Change-Id: I8c67b6ce0f83de2f5277a528a9ce67e47d638adb
---
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/com/cloudera/impala/analysis/LoadDataStmt.java
M fe/src/main/java/com/cloudera/impala/catalog/HdfsPartition.java
A fe/src/main/java/com/cloudera/impala/catalog/HdfsPartitionLocationCompressor.java
M fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
M fe/src/main/java/com/cloudera/impala/util/ListMap.java
M fe/src/test/java/com/cloudera/impala/planner/PlannerTestBase.java
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M tests/metadata/test_ddl.py
M tests/metadata/test_hdfs_encryption.py
12 files changed, 416 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/55/2355/8
-- 
To view, visit http://gerrit.cloudera.org:8080/2355
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8c67b6ce0f83de2f5277a528a9ce67e47d638adb
Gerrit-PatchSet: 8
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Jim Apple <jbapple@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Jim Apple <jbapple@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokhtar@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sailesh@cloudera.com>
