Date: Sun, 19 Jun 2016 17:54:05 +0000 (UTC)
From: "Hive QA (JIRA)"
To: issues@hive.apache.org
Subject: [jira] [Commented] (HIVE-13985) ORC improvements for reducing the file system calls in task side

    [ https://issues.apache.org/jira/browse/HIVE-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15338661#comment-15338661 ]

Hive QA commented on HIVE-13985:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12811504/HIVE-13985.5.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10246 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constantPropagateForSubQuery
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_repair
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_table_nonprintable
org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testDelayedLocalityNodeCommErrorImmediateAllocation
org.apache.hadoop.hive.ql.metadata.TestHiveMetaStoreChecker.testPartitionsCheck
org.apache.hadoop.hive.ql.metadata.TestHiveMetaStoreChecker.testTableCheck
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/179/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/179/console
Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-179/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12811504 - PreCommit-HIVE-MASTER-Build

> ORC improvements for reducing the file system calls in task side
> ----------------------------------------------------------------
>
>                 Key: HIVE-13985
>                 URL: https://issues.apache.org/jira/browse/HIVE-13985
>             Project: Hive
>          Issue Type: Bug
>          Components: ORC
>    Affects Versions: 1.3.0, 2.2.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Prasanth Jayachandran
>         Attachments: HIVE-13985-branch-1.patch, HIVE-13985-branch-1.patch, HIVE-13985-branch-1.patch, HIVE-13985-branch-1.patch, HIVE-13985-branch-2.1.patch, HIVE-13985.1.patch, HIVE-13985.2.patch, HIVE-13985.3.patch, HIVE-13985.4.patch, HIVE-13985.5.patch
>
>
> HIVE-13840 fixed some issues with additional file system invocations during split generation. Similarly, this jira will fix issues with additional file system invocations on the task side. To avoid reading footers on the task side, users can set hive.orc.splits.include.file.footer to true, which serializes the ORC footers into the splits. However, this also serializes unneeded information, such as column statistics and other metadata, that is not required for reading an ORC split on the task side. We can reduce the payload of the ORC splits by serializing only the minimum required information (stripe information, types, compression details). A smaller split payload can also help avoid OOMs in the application master (AM) during split generation.
>
> This jira also addresses other issues concerning the AM cache. The local cache used by the AM is a soft reference cache, which can introduce unpredictability across multiple runs of the same query. We can cache the serialized footer in the local cache and also use a strong reference cache: caching the compact serialized form should avoid memory pressure, and a strong reference cache should give better predictability.
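The cache change described above (a strong reference cache holding serialized footers instead of a soft reference cache) can be illustrated with a minimal Java sketch. It assumes a byte-size budget and a key made of path, length and modification time; the class `FooterCacheSketch`, its `FileKey` and the budget are hypothetical and are not Hive's actual AM cache classes. Entries held through a `SoftReference` may be cleared by the garbage collector at any time, whereas entries here are only evicted in least-recently-used order once the budget is exceeded, so repeated runs of the same query see the same cache behavior.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/**
 * Hypothetical sketch, not Hive's actual AM cache: a strong-reference LRU cache of
 * serialized ORC footers, bounded by a byte budget instead of by GC pressure.
 */
public class FooterCacheSketch {

    /** Assumed cache key: path plus length and modification time, so a rewritten file misses. */
    static final class FileKey {
        final String path;
        final long length;
        final long modificationTime;

        FileKey(String path, long length, long modificationTime) {
            this.path = path;
            this.length = length;
            this.modificationTime = modificationTime;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof FileKey)) return false;
            FileKey k = (FileKey) o;
            return length == k.length && modificationTime == k.modificationTime
                    && path.equals(k.path);
        }

        @Override
        public int hashCode() {
            return Objects.hash(path, length, modificationTime);
        }
    }

    private final long maxBytes;
    private long currentBytes = 0;
    // accessOrder = true: iteration runs from least to most recently used entry.
    private final LinkedHashMap<FileKey, byte[]> entries = new LinkedHashMap<>(16, 0.75f, true);

    FooterCacheSketch(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    synchronized byte[] get(FileKey key) {
        return entries.get(key); // strong reference: never cleared by the GC on its own
    }

    synchronized void put(FileKey key, byte[] serializedFooter) {
        byte[] previous = entries.put(key, serializedFooter);
        if (previous != null) {
            currentBytes -= previous.length;
        }
        currentBytes += serializedFooter.length;
        // Evict least recently used entries until the cache fits its byte budget again.
        Iterator<Map.Entry<FileKey, byte[]>> it = entries.entrySet().iterator();
        while (currentBytes > maxBytes && it.hasNext()) {
            Map.Entry<FileKey, byte[]> eldest = it.next();
            currentBytes -= eldest.getValue().length;
            it.remove();
        }
    }

    synchronized long sizeInBytes() { return currentBytes; }

    synchronized int entryCount() { return entries.size(); }

    public static void main(String[] args) {
        FooterCacheSketch cache = new FooterCacheSketch(100);
        FileKey a = new FileKey("/warehouse/t/a.orc", 1000, 1L);
        FileKey b = new FileKey("/warehouse/t/b.orc", 2000, 1L);
        FileKey c = new FileKey("/warehouse/t/c.orc", 3000, 1L);
        cache.put(a, new byte[40]);
        cache.put(b, new byte[40]);
        cache.get(a);               // a becomes the most recently used entry
        cache.put(c, new byte[40]); // 120 bytes > 100, so b (least recently used) is evicted
        System.out.println(cache.get(a) != null); // true
        System.out.println(cache.get(b) != null); // false
        System.out.println(cache.entryCount() + " entries, " + cache.sizeInBytes() + " bytes"); // 2 entries, 80 bytes
    }
}
```

Bounding by bytes rather than by entry count is one way to keep the footprint predictable, since serialized footers can differ greatly in size from file to file.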
> One other improvement we can make: when hive.orc.splits.include.file.footer is set to false, the task side makes one additional file system call to get the size of the file. If we serialize the file length in the ORC split, this call can be avoided.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
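The two split-side ideas in the description, a trimmed payload and a serialized file length, can be sketched together in Java. This is a hypothetical illustration, not Hive's actual split class: `SplitPayloadSketch`, the `-1` sentinel meaning "length unknown", and the `LengthLookup` interface standing in for the file system are all assumptions, and type information is omitted for brevity. The split writes only stripe offsets and lengths, compression details and the file length; the task side consults the file system only when the length was not carried in the split.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a minimal ORC split payload; not Hive's actual split class. */
public class SplitPayloadSketch {

    static final class StripeInfo {
        final long offset;
        final long length;

        StripeInfo(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /** Hypothetical stand-in for the file system call that returns a file's length. */
    interface LengthLookup {
        long fileLength(String path) throws IOException;
    }

    final String path;
    final long fileLength; // assumed convention: -1 means "unknown, ask the file system"
    final String compressionKind;
    final int compressionBufferSize;
    final List<StripeInfo> stripes;

    SplitPayloadSketch(String path, long fileLength, String compressionKind,
                       int compressionBufferSize, List<StripeInfo> stripes) {
        this.path = path;
        this.fileLength = fileLength;
        this.compressionKind = compressionKind;
        this.compressionBufferSize = compressionBufferSize;
        this.stripes = stripes;
    }

    /** Writes only what the task side needs: no column statistics or other metadata. */
    void write(DataOutputStream out) throws IOException {
        out.writeUTF(path);
        out.writeLong(fileLength);
        out.writeUTF(compressionKind);
        out.writeInt(compressionBufferSize);
        out.writeInt(stripes.size());
        for (StripeInfo s : stripes) {
            out.writeLong(s.offset);
            out.writeLong(s.length);
        }
        out.flush();
    }

    static SplitPayloadSketch read(DataInputStream in) throws IOException {
        String path = in.readUTF();
        long fileLength = in.readLong();
        String compressionKind = in.readUTF();
        int bufferSize = in.readInt();
        int stripeCount = in.readInt();
        List<StripeInfo> stripes = new ArrayList<>(stripeCount);
        for (int i = 0; i < stripeCount; i++) {
            long offset = in.readLong();
            long length = in.readLong();
            stripes.add(new StripeInfo(offset, length));
        }
        return new SplitPayloadSketch(path, fileLength, compressionKind, bufferSize, stripes);
    }

    /** Task side: only call the file system when the split did not carry the length. */
    long resolveFileLength(LengthLookup fs) throws IOException {
        return fileLength >= 0 ? fileLength : fs.fileLength(path);
    }

    public static void main(String[] args) throws IOException {
        List<StripeInfo> stripes = new ArrayList<>();
        stripes.add(new StripeInfo(3, 1000));
        stripes.add(new StripeInfo(1003, 900));
        SplitPayloadSketch split =
                new SplitPayloadSketch("/warehouse/t/a.orc", 2048, "ZLIB", 262144, stripes);

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        split.write(new DataOutputStream(bytes));
        SplitPayloadSketch copy =
                read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));

        int[] calls = {0};
        LengthLookup fs = p -> { calls[0]++; return 4096L; };
        System.out.println("payload bytes: " + bytes.size()); // payload bytes: 74
        System.out.println(copy.resolveFileLength(fs));         // 2048
        System.out.println("file system calls: " + calls[0]);  // file system calls: 0
    }
}
```

With the length carried in the split, the round-tripped copy answers from its own payload and the stand-in file system is never consulted; a split built with `-1` would fall back to exactly one lookup.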