Date: Tue, 19 Apr 2016 20:04:25 +0000 (UTC)
From: "Rohit Dholakia (JIRA)"
To: issues@hive.apache.org
Subject: [jira] [Updated] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks

[ https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohit Dholakia updated HIVE-12049:
----------------------------------
    Attachment: HIVE-12049.25.patch

> Provide an option to write serialized thrift objects in final tasks
> -------------------------------------------------------------------
>
>                 Key: HIVE-12049
>                 URL: https://issues.apache.org/jira/browse/HIVE-12049
>             Project: Hive
>          Issue Type: Sub-task
>          Components: HiveServer2
>            Reporter: Rohit Dholakia
>            Assignee: Rohit Dholakia
>         Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, HIVE-12049.12.patch, HIVE-12049.13.patch, HIVE-12049.14.patch, HIVE-12049.15.patch, HIVE-12049.16.patch, HIVE-12049.17.patch, HIVE-12049.18.patch, HIVE-12049.19.patch, HIVE-12049.2.patch, HIVE-12049.25.patch, HIVE-12049.3.patch, HIVE-12049.4.patch, HIVE-12049.5.patch, HIVE-12049.6.patch, HIVE-12049.7.patch, HIVE-12049.9.patch, new-driver-profiles.png, old-driver-profiles.png
>
>
> For each fetch request to HiveServer2, we pay the penalty of deserializing the row objects and translating them into a different representation suitable for RPC transfer. In moderate- to high-concurrency scenarios, this can waste significant CPU and memory. If each final task instead writes the appropriate Thrift objects to its output files, HiveServer2 can simply stream batches of rows over the wire without the additional cost of deserialization and translation.
> This can be implemented with a new SerDe, which the FileSinkOperator uses to write Thrift-formatted row batches to the output file. Because {{hive.query.result.fileformat}} is pluggable, we can set it to SequenceFile and write each batch of Thrift-formatted rows as a single value blob. The FetchTask can then simply read the blob and send it over the wire. On the client side, the *DBC driver can read the blob and, since it is already in the format the driver expects, continue building the ResultSet exactly as it does in the current implementation.
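To make the proposed data flow concrete, here is a minimal sketch of the "serialize once in the task, pass the blob through the server, decode on the client" idea. It deliberately does not use Thrift, Hive's SerDe interface, or Hadoop's SequenceFile API: plain java.io streams stand in for the Thrift encoding, and the class and method names (RowBatchBlobSketch, encodeBatch, passThrough, decodeBatch) are hypothetical. It assumes every column value is a non-null String; each encoded batch corresponds to what would be stored as one SequenceFile value blob.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class RowBatchBlobSketch {

    // Task side (stand-in for the hypothetical result SerDe used by FileSinkOperator):
    // encode a whole batch column by column into one byte[] blob, exactly once.
    // Assumes all values are non-null Strings (writeUTF rejects null).
    static byte[] encodeBatch(List<String[]> rows, int numCols) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(numCols);
            out.writeInt(rows.size());
            for (int c = 0; c < numCols; c++) {
                for (String[] row : rows) {
                    out.writeUTF(row[c]);
                }
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Server side (stand-in for FetchTask): no per-row deserialization or
    // translation; the stored blob is forwarded as-is.
    static byte[] passThrough(byte[] storedBlob) {
        return storedBlob;
    }

    // Client side (stand-in for the driver): decode the columnar blob back into rows.
    static List<String[]> decodeBatch(byte[] blob) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob));
            int numCols = in.readInt();
            int numRows = in.readInt();
            List<String[]> rows = new ArrayList<>();
            for (int r = 0; r < numRows; r++) {
                rows.add(new String[numCols]);
            }
            for (int c = 0; c < numCols; c++) {
                for (int r = 0; r < numRows; r++) {
                    rows.get(r)[c] = in.readUTF();
                }
            }
            return rows;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String[]> batch = List.of(new String[]{"1", "alice"}, new String[]{"2", "bob"});
        byte[] blob = encodeBatch(batch, 2);
        for (String[] row : decodeBatch(passThrough(blob))) {
            System.out.println(String.join(",", row));
        }
    }
}
```

The point of the sketch is where the work happens: encoding cost is paid in the (distributed) task, while the server's fetch path only moves bytes, so its CPU and memory use no longer grow with per-row conversion.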