Date: Fri, 22 Aug 2014 09:14:11 +0000 (UTC)
From: "Hive QA (JIRA)"
To: hive-dev@hadoop.apache.org
Subject: [jira] [Commented] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes

[ https://issues.apache.org/jira/browse/HIVE-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106654#comment-14106654 ]

Hive QA commented on HIVE-7832:
-------------------------------

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12663613/HIVE-7832.2.patch

{color:red}ERROR:{color} -1 due to 28 failed/errored test(s), 6118 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_analyze
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_ptf
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_analyze
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_part_project
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_ptf
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_timestamp_funcs
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDictionaryThreshold
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.columnProjection[0]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.columnProjection[1]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.metaData[0]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.metaData[1]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.test1[1]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testPredicatePushdown[1]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testSeek[1]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testSnappy[1]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testStringAndBinaryStatistics[1]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testStripeLevelStats[1]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testTimestamp[1]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testUnionAndTimestamp[1]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testWithoutIndex[1]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testZeroCopySeek[1]
org.apache.hadoop.hive.ql.io.orc.TestOrcNullOptimization.testMultiStripeWithNull
{noformat}

Test results:
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/453/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/453/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-453/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 28 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12663613

> Do ORC dictionary check at a finer level and preserve encoding across stripes
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-7832
>                 URL: https://issues.apache.org/jira/browse/HIVE-7832
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 0.14.0
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>         Attachments: HIVE-7832.1.patch, HIVE-7832.2.patch
>
>
> Currently, the ORC dictionary check happens while a stripe is being written. Just before the stripe is written, if the ratio of dictionary entries to total non-null rows is greater than the threshold, the dictionary is discarded. Also, the decision of whether to use a dictionary is not preserved across stripes. When a column has too many distinct keys, this can lead to a costly O(log n) insertion for every value in each stripe, even though the dictionary is discarded again when the stripe is written.
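The improvement requested in this issue (check the dictionary ratio early instead of at stripe end, and keep the resulting encoding for later stripes) can be sketched in Java. This is a hypothetical, simplified illustration, not the code in HIVE-7832.2.patch: it assumes a single string column, uses a `TreeSet` as a stand-in for ORC's dictionary (to mirror the O(log n) insert cost), and the class name, the `checkAfterRows` parameter and the `flushStripe` method are all invented for illustration.

```java
import java.util.TreeSet;

/**
 * Hypothetical sketch (not Hive's ORC writer): choose dictionary or direct
 * encoding for one string column and keep the choice across stripes.
 */
public class DictionaryCheckSketch {
    public enum Encoding { DICTIONARY, DIRECT }

    private final double threshold;     // max allowed distinct/non-null ratio
    private final long checkAfterRows;  // hypothetical: rows seen before the early check
    private final TreeSet<String> dictionary = new TreeSet<>(); // O(log n) per insert
    private long nonNullRows = 0;
    private boolean decided = false;    // survives stripe boundaries
    private Encoding encoding = Encoding.DICTIONARY;

    public DictionaryCheckSketch(double threshold, long checkAfterRows) {
        this.threshold = threshold;
        this.checkAfterRows = checkAfterRows;
    }

    public void write(String value) {
        if (value == null) {
            return; // nulls do not count toward the ratio
        }
        nonNullRows++;
        if (encoding == Encoding.DIRECT) {
            return; // once direct encoding is chosen, skip dictionary inserts entirely
        }
        dictionary.add(value);
        if (!decided && nonNullRows >= checkAfterRows) {
            decide(); // finer-grained check: do not wait for the end of the stripe
        }
    }

    private void decide() {
        double ratio = (double) dictionary.size() / nonNullRows;
        if (ratio > threshold) {
            encoding = Encoding.DIRECT;
            dictionary.clear(); // discard the dictionary as soon as it is known to be useless
        }
        decided = true;
    }

    /** Ends the current stripe; the chosen encoding carries over to the next stripe. */
    public Encoding flushStripe() {
        if (!decided && nonNullRows > 0) {
            decide(); // stripe ended before checkAfterRows was reached
        }
        Encoding used = encoding;
        dictionary.clear(); // ORC dictionaries are per stripe, so start a fresh one
        nonNullRows = 0;
        return used;
    }

    public Encoding encoding() {
        return encoding;
    }

    public static void main(String[] args) {
        DictionaryCheckSketch writer = new DictionaryCheckSketch(0.8, 100);
        for (int i = 0; i < 100; i++) {
            writer.write("key" + i); // all distinct: ratio 1.0 exceeds 0.8
        }
        System.out.println("stripe 1: " + writer.flushStripe());
        for (int i = 0; i < 100; i++) {
            writer.write("same"); // low cardinality, but DIRECT is kept
        }
        System.out.println("stripe 2: " + writer.flushStripe());
    }
}
```

The design choice illustrated is that the decision flag outlives each stripe: once a high-cardinality column switches to direct encoding, later stripes never pay for dictionary inserts again. The trade-off, under these assumptions, is that a column whose cardinality drops in later stripes will not switch back to dictionary encoding.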