Date: Wed, 18 Nov 2015 00:53:11 +0000 (UTC)
From: "Alan Gates (JIRA)"
To: issues@hive.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-12443) Hive Streaming should expose encoding and serdes for testing

[ https://issues.apache.org/jira/browse/HIVE-12443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009957#comment-15009957 ]

Alan Gates commented on HIVE-12443:
-----------------------------------

Sorry, I wasn't clear. I don't want to intercept encode, I want to encode without writing.
In both RecordWriters today, the write method contains:

{code}
Object encodedRow = encode(record);
int bucket = getBucket(encodedRow);
updaters.get(bucket).insert(transactionId, encodedRow);
{code}

Both the encoding and the writing of the record are done in this one method. I want to be able to encode the record without writing it. Thus I want to take encode, which is private in both writers, move it up to AbstractRecordWriter, and make it public so I can call it from capybara without writing data into an updater. This will allow capybara to split the stream coming into HiveEndPoint, sending one copy to Hive as normal and using the other to load data into a benchmark.

> Hive Streaming should expose encoding and serdes for testing
> ------------------------------------------------------------
>
>                 Key: HIVE-12443
>                 URL: https://issues.apache.org/jira/browse/HIVE-12443
>             Project: Hive
>          Issue Type: Improvement
>          Components: Testing Infrastructure, Transactions
>    Affects Versions: 2.0.0
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>         Attachments: HIVE-12443.patch
>
>
> Currently, the way records passed into the Hive streaming RecordWriter are converted from the inbound format to Hive format is opaque. The encoding and writing are done in a single call to RecordWriter.write(). This is problematic for test tools that want to intercept the record stream and write it to a benchmark in addition to Hive.
> All existing RecordWriters have encode and getSerDe methods. I propose to expose these by making them public in AbstractRecordWriter, and making AbstractRecordWriter a public class (it is currently package private). This keeps the RecordWriter interface clean (stream writers will not need to call these methods directly) and avoids any backwards-incompatible changes. Having AbstractRecordWriter public is also desirable for anyone who wants to write their own RecordWriter.
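
For readers who want to see the shape of the proposal, here is a minimal, self-contained Java sketch of separating encode from write. It does not use the real Hive classes: SketchRecordWriter, SketchAbstractRecordWriter, SketchDelimitedWriter, EncodeSplitDemo and the tee helper are all hypothetical names invented for illustration, the comma-splitting "encoding" is a stand-in for whatever the SerDe does, and bucketing and updaters are replaced by a plain list. It only illustrates how a public encode on the abstract base class would let a test tool obtain the encoded row without going through write.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the streaming RecordWriter interface.
// Callers that only stream data never need to see encode().
interface SketchRecordWriter {
    void write(long transactionId, byte[] record);
}

// Hypothetical stand-in for AbstractRecordWriter with encode() made public.
abstract class SketchAbstractRecordWriter implements SketchRecordWriter {
    // Rows "written to Hive". A real writer would pick a bucket and
    // hand the row to that bucket's updater instead.
    final List<Object> written = new ArrayList<>();

    // Public, so a test tool can encode a record without writing it.
    public abstract Object encode(byte[] record);

    @Override
    public void write(long transactionId, byte[] record) {
        Object encodedRow = encode(record);
        written.add(encodedRow);
    }
}

// Assumed toy encoding: split a UTF-8 line on commas.
class SketchDelimitedWriter extends SketchAbstractRecordWriter {
    @Override
    public Object encode(byte[] record) {
        String text = new String(record, StandardCharsets.UTF_8);
        return Arrays.asList(text.split(",", -1));
    }
}

class EncodeSplitDemo {
    // Splits the stream: one encoded copy goes to the benchmark sink,
    // and the raw record goes to the writer as normal.
    static Object tee(SketchAbstractRecordWriter writer, long transactionId,
                      byte[] record, List<Object> benchmark) {
        Object encoded = writer.encode(record); // encode only, nothing is written
        benchmark.add(encoded);
        writer.write(transactionId, record);    // normal path, encodes again internally
        return encoded;
    }

    public static void main(String[] args) {
        SketchDelimitedWriter writer = new SketchDelimitedWriter();
        List<Object> benchmark = new ArrayList<>();
        tee(writer, 1L, "1,alice".getBytes(StandardCharsets.UTF_8), benchmark);
        System.out.println("benchmark rows: " + benchmark);
        System.out.println("written rows:   " + writer.written);
    }
}
```

In this sketch the record is encoded twice, once for the benchmark and once inside write. Whether that cost matters, and whether the real patch avoids it, is not something the comment above states.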