Date: Fri, 22 Aug 2014 23:48:11 +0000 (UTC)
From: "Hive QA (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes

[ https://issues.apache.org/jira/browse/HIVE-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107717#comment-14107717 ]

Hive QA commented on HIVE-7832:
-------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12663706/HIVE-7832.3.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 6119 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_part_project
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDictionaryThreshold
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testMemoryManagement[0]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testMemoryManagement[1]
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/468/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/468/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-468/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12663706

> Do ORC dictionary check at a finer level and preserve encoding across stripes
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-7832
>                 URL: https://issues.apache.org/jira/browse/HIVE-7832
>             Project: Hive
>          Issue Type: Improvement
>   Affects Versions: 0.14.0
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>         Attachments: HIVE-7832.1.patch, HIVE-7832.2.patch, HIVE-7832.3.patch
>
>
> Currently, the ORC dictionary check happens while the stripe is being written. Just before a stripe is written, if the ratio of dictionary entries to total non-null rows is greater than the threshold, the dictionary is discarded. Also, the decision of whether to use a dictionary is preserved across stripes. When there are too many distinct keys, this sometimes incurs a costly O(log n) insertion cost in each stripe.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
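
The dictionary check described in the quoted issue can be illustrated with a minimal sketch. This is not Hive's ORC writer code: the class name `DictionaryCheckSketch`, the method `keepDictionary`, and the threshold value 0.8 used in `main` are assumptions made for illustration. It only shows the stated rule, namely that the dictionary is discarded when the ratio of dictionary entries to non-null rows exceeds the threshold.

```java
// Illustrative sketch only; names and values are hypothetical, not Hive's actual API.
final class DictionaryCheckSketch {

    // Returns true when dictionary encoding should be kept for the column,
    // false when the dictionary should be discarded.
    static boolean keepDictionary(long distinctKeys, long nonNullRows, double threshold) {
        if (nonNullRows == 0) {
            // No non-null rows yet, so there is no ratio to evaluate (assumed behavior).
            return true;
        }
        double ratio = (double) distinctKeys / nonNullRows;
        // The issue text says the dictionary is discarded when the ratio is greater than the threshold.
        return ratio <= threshold;
    }

    public static void main(String[] args) {
        double threshold = 0.8; // assumed example value
        // Few distinct keys relative to rows: ratio 0.1, dictionary kept.
        System.out.println(keepDictionary(100, 1000, threshold));
        // Almost every row is distinct: ratio 0.95, dictionary discarded.
        System.out.println(keepDictionary(950, 1000, threshold));
    }
}
```

Running the check once per stripe, as the issue describes for the current behavior, means every row is inserted into the dictionary before the decision is made; checking at a finer level would let the writer give up on the dictionary earlier.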