hive-dev mailing list archives

From "Hive QA (JIRA)" <>
Subject [jira] [Commented] (HIVE-6455) Scalable dynamic partitioning and bucketing optimization
Date Tue, 18 Feb 2014 15:48:20 GMT


Hive QA commented on HIVE-6455:

{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:

Test results:
Console output:

Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/' failed with exit status 1 and
output '+ [[ -n '' ]]
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-Build-1382/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
Reverted 'conf/hive-default.xml.template'
Reverted 'ql/src/test/results/clientnegative/udf_format_number_wrong2.q.out'
Reverted 'ql/src/test/results/clientnegative/udf_format_number_wrong4.q.out'
Reverted 'ql/src/test/results/clientnegative/udf_format_number_wrong6.q.out'
Reverted 'ql/src/test/results/clientnegative/udf_format_number_wrong1.q.out'
Reverted 'ql/src/test/results/clientpositive/udf_format_number.q.out'
Reverted 'ql/src/test/queries/clientnegative/udf_format_number_wrong6.q'
Reverted 'ql/src/test/queries/clientpositive/udf_format_number.q'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/udf/generic/'
++ awk '{print $2}'
++ egrep -v '^X|^Performing status on external'
++ svn status --no-ignore
+ rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/0.20S/target
shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target packaging/target
hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target
itests/test-serde/target itests/qtest/target itests/hive-unit/target itests/custom-serde/target
itests/util/target hcatalog/target hcatalog/storage-handlers/hbase/target hcatalog/server-extensions/target
hcatalog/core/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target
hwi/target common/target common/src/gen contrib/target service/target serde/target beeline/target
odbc/target cli/target ql/dependency-reduced-pom.xml ql/target
+ svn update

Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1569394.

At revision 1569394.
+ patchCommandPath=/data/hive-ptest/working/scratch/
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/
+ /data/hive-ptest/working/scratch/ /data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1

This message is automatically generated.


> Scalable dynamic partitioning and bucketing optimization
> --------------------------------------------------------
>                 Key: HIVE-6455
>                 URL:
>             Project: Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.13.0
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>              Labels: optimization
>         Attachments: HIVE-6455.1.patch
> The current implementation of dynamic partitioning works by keeping at least one record
> writer open per dynamic partition directory. With bucketing there can additionally be
> multispray file writers, which further increases the number of open record writers. The
> record writers for column-oriented file formats (such as ORC and RCFile) keep in-memory
> buffers (value buffers or compression buffers) open the whole time to buffer up rows and
> compress them before flushing to disk. Since these buffers are maintained per column, the
> amount of memory required at runtime grows as the number of partitions and the number of
> columns per partition increase. This often leads to OutOfMemory (OOM) errors in mappers
> or reducers, depending on the number of open record writers. Users often tune the JVM
> heap size (runtime memory) to get past such OOM issues.
> With this optimization, rows are sorted on the dynamic partition columns and the
> bucketing columns (in the case of bucketed tables) before being fed to the reducers.
> Since the partitioning and bucketing columns are sorted, each reducer can keep only one
> record writer open at any time, thereby reducing memory pressure on the reducers. This
> optimization scales well as the number of partitions and the number of columns per
> partition increase, at the cost of sorting on those columns.
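The trade-off the description makes can be sketched in a few lines of illustrative Python. This is not Hive's actual code: `RecordWriter` is a hypothetical stand-in for Hive's columnar writers, and the two functions only count how many writers must be open at once — the naive path keeps one per distinct partition, while the sorted path never needs more than one.

```python
class RecordWriter:
    """Hypothetical stand-in for a columnar record writer whose per-column
    buffers consume memory for as long as the writer stays open."""
    def __init__(self, partition):
        self.partition = partition
        self.rows = []

    def write(self, row):
        self.rows.append(row)

    def close(self):
        pass


def write_naive(rows):
    """Unsorted input: a writer stays open per distinct partition value,
    so peak open writers (and buffer memory) grow with partition count."""
    writers = {}
    for part, row in rows:
        writers.setdefault(part, RecordWriter(part)).write(row)
    peak_open = len(writers)
    for w in writers.values():
        w.close()
    return peak_open


def write_sorted(rows):
    """Input sorted on the partition key: when the key changes, the previous
    writer is closed before the next opens, so at most one is open at a time."""
    current = None
    peak_open = 0
    for part, row in sorted(rows, key=lambda r: r[0]):
        if current is None or current.partition != part:
            if current is not None:
                current.close()
            current = RecordWriter(part)
        current.write(row)
        peak_open = max(peak_open, 1)
    if current is not None:
        current.close()
    return peak_open
```

With three partitions interleaved in the input, the naive path holds three writers open simultaneously while the sorted path holds one — the extra cost being the sort itself, which in Hive's case is pushed into the shuffle to the reducers.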

This message was sent by Atlassian JIRA
