Date: Fri, 20 Dec 2013 14:44:15 +0000 (UTC)
From: "Deepak Kumar V (JIRA)"
To: dev@avro.apache.org
Reply-To: dev@avro.apache.org
Subject: [jira] [Created] (AVRO-1418) AvroMultipleOutputs should support sync-able writers

Deepak Kumar V created AVRO-1418:
------------------------------------

             Summary: AvroMultipleOutputs should support sync-able writers
                 Key: AVRO-1418
                 URL: https://issues.apache.org/jira/browse/AVRO-1418
             Project: Avro
          Issue Type: New Feature
            Reporter: Deepak Kumar V
            Priority: Minor

DataFileWriter provides APIs such as sync(), which emits a synchronization marker, so that DataFileReader can later use sync() or seek() to move to a particular synchronization point. AvroMultipleOutputs provides no way to invoke sync() on its individual writers, and its design prevents it from being extended to do so.

I) Provide a MarkableAvroMultipleOutputs that exposes a public API to invoke sync on a named output.
Ex:
public void sync(String namedOutput, String baseOutputPath) throws IOException, InterruptedException {}

To achieve this, AvroMultipleOutputs should be modified so that subclasses can add behavior. The following members must be marked protected instead of private:
1) private static void checkBaseOutputPath(String outputPath) {}
2) private static void checkNamedOutputName(JobContext job, String namedOutput, boolean alreadyDefined) {}
3) private TaskInputOutputContext context;
4) private Set namedOutputs;

II) AvroKeyValueRecordWriter, which AvroMultipleOutputs uses as the writer for each individual output, is likewise closed for extension. It must allow sync() to be invoked on its underlying writer. To achieve that, the following private member must be marked protected:
1) private final DataFileWriter mAvroFileWriter;

A MarkableAvroKeyValueRecordWriter must be provided that exposes a public API to invoke sync on its writer:
public void sync() throws IOException {}

III) Provide a MarkableAvroKeyValueOutputFormat that extends AvroKeyValueOutputFormat and uses MarkableAvroKeyValueRecordWriter.
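
To illustrate the shape of the proposal, here is a minimal sketch of the private-to-protected extension pattern described in items I and II. It does not use the real Avro or Hadoop classes: StubFileWriter, BaseRecordWriter, BaseMultipleOutputs, getWriter and the byte-counting logic are all hypothetical stand-ins invented for this example, and only the two sync() signatures are taken from the text above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for DataFileWriter: tracks a byte position and
// remembers where each synchronization marker ended.
class StubFileWriter {
    static final int MARKER_SIZE = 16; // Avro sync markers are 16 bytes long
    private long position = 0;
    private final List<Long> syncPositions = new ArrayList<>();

    void append(String datum) {
        position += datum.length();
    }

    // Emits a marker and returns the position just after it.
    long sync() {
        position += MARKER_SIZE;
        syncPositions.add(position);
        return position;
    }

    List<Long> syncPositions() {
        return syncPositions;
    }
}

// Stand-in for AvroKeyValueRecordWriter with its writer field made
// protected rather than private, as item II requests.
class BaseRecordWriter {
    protected final StubFileWriter fileWriter = new StubFileWriter();

    public void write(String key, String value) throws IOException {
        fileWriter.append(key + value);
    }
}

// Stand-in for the proposed MarkableAvroKeyValueRecordWriter.
class MarkableRecordWriter extends BaseRecordWriter {
    public void sync() throws IOException {
        fileWriter.sync(); // reachable only because the field is protected
    }
}

// Stand-in for AvroMultipleOutputs with writer lookup open to subclasses.
class BaseMultipleOutputs {
    protected final Map<String, MarkableRecordWriter> writers = new HashMap<>();

    protected MarkableRecordWriter getWriter(String namedOutput, String baseOutputPath) {
        return writers.computeIfAbsent(namedOutput + "/" + baseOutputPath,
                                       k -> new MarkableRecordWriter());
    }

    public void write(String namedOutput, String key, String value,
                      String baseOutputPath) throws IOException {
        getWriter(namedOutput, baseOutputPath).write(key, value);
    }
}

// Stand-in for the proposed MarkableAvroMultipleOutputs.
class MarkableMultipleOutputs extends BaseMultipleOutputs {
    public void sync(String namedOutput, String baseOutputPath)
            throws IOException, InterruptedException {
        getWriter(namedOutput, baseOutputPath).sync();
    }
}

class MarkableSketch {
    public static void main(String[] args) throws Exception {
        MarkableMultipleOutputs out = new MarkableMultipleOutputs();
        out.write("clicks", "k1", "v1", "part-a");
        out.sync("clicks", "part-a");
        out.write("clicks", "k2", "v2", "part-a");
        out.sync("clicks", "part-a");
        // Positions recorded after each marker for the "clicks" output.
        System.out.println(out.getWriter("clicks", "part-a").fileWriter.syncPositions());
    }
}
```

The point of the sketch is only that a subclass can reach the underlying writer once the relevant members are protected; how the real classes would create, cache and close their writers is not addressed here.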