avro-dev mailing list archives

From "Deepak Kumar V (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AVRO-1418) AvroMultipleOutputs should support sync-able writers
Date Fri, 20 Dec 2013 14:46:13 GMT

    [ https://issues.apache.org/jira/browse/AVRO-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13854003#comment-13854003
] 

Deepak Kumar V commented on AVRO-1418:
--------------------------------------

I can take up this issue, as I have already implemented this for my project and would like
to contribute it back to Avro.

> AvroMultipleOutputs should support sync-able writers
> ----------------------------------------------------
>
>                 Key: AVRO-1418
>                 URL: https://issues.apache.org/jira/browse/AVRO-1418
>             Project: Avro
>          Issue Type: New Feature
>            Reporter: Deepak Kumar V
>            Priority: Minor
>
> DataFileWriter supports APIs such as sync(), which emits a synchronization marker, so that
> DataFileReader can later use sync() or seek() to move to a particular synchronization
> point.
> AvroMultipleOutputs does not provide a way to invoke sync() on its individual writers.
> Besides, its design prevents it from being extended.
> I) Provide a MarkableAvroMultipleOutputs that exposes a public API to invoke sync on a
> named output.
> Ex: public void sync(String namedOutput, String baseOutputPath) throws IOException, InterruptedException {}
> To achieve the above, AvroMultipleOutputs should be modified to allow additional behavior.
> The following members must be marked protected instead of private:
> 1) private static void checkBaseOutputPath(String outputPath) {}
> 2) private static void checkNamedOutputName(JobContext job, String namedOutput, boolean alreadyDefined) {}
> 3) private TaskInputOutputContext<?, ?, ?, ?> context;
> 4) private Set<String> namedOutputs;
> II) AvroKeyValueRecordWriter, which AvroMultipleOutputs uses as the writer for its
> individual outputs, is likewise closed for extension. It must allow sync() to be invoked
> on its writer.
> To achieve that, the following private member must be marked protected:
> 1) private final DataFileWriter<GenericRecord> mAvroFileWriter;
> A MarkableAvroKeyValueRecordWriter must be provided that exposes a public API to invoke
> sync on its writer:
> public void sync() throws IOException {}
> III) Provide a MarkableAvroKeyValueOutputFormat that extends AvroKeyValueOutputFormat and
> uses MarkableAvroKeyValueRecordWriter.
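
The extension pattern described in items I and II can be sketched in plain Java without any
Avro or Hadoop dependency. This is only an illustration under stated assumptions, not the
patch itself: SketchFileWriter, SketchKeyValueRecordWriter, MarkableSketchRecordWriter, and
MarkableSketchMultipleOutputs are hypothetical stand-ins for DataFileWriter,
AvroKeyValueRecordWriter, and the proposed Markable classes, and the "#SYNC#" marker and
the markersWritten() helpers exist only so the behavior can be observed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for DataFileWriter: buffers records and can emit a marker.
class SketchFileWriter {
    private static final byte[] MARKER = "#SYNC#\n".getBytes(StandardCharsets.UTF_8);
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private int markersWritten = 0;

    void append(String record) throws IOException {
        out.write(record.getBytes(StandardCharsets.UTF_8));
    }

    // Emits a marker and returns the buffer size after it (illustrative only).
    long sync() throws IOException {
        out.write(MARKER);
        markersWritten++;
        return out.size();
    }

    int markersWritten() {
        return markersWritten;
    }
}

// Hypothetical stand-in for AvroKeyValueRecordWriter.
class SketchKeyValueRecordWriter {
    // Protected rather than private, mirroring the change item II requests.
    protected final SketchFileWriter fileWriter = new SketchFileWriter();

    public void write(String key, String value) throws IOException {
        fileWriter.append(key + "=" + value + "\n");
    }
}

// Stand-in for the proposed MarkableAvroKeyValueRecordWriter: exposes sync().
class MarkableSketchRecordWriter extends SketchKeyValueRecordWriter {
    public void sync() throws IOException {
        fileWriter.sync(); // reachable only because the field is protected
    }

    public int markersWritten() {
        return fileWriter.markersWritten();
    }
}

// Stand-in for the proposed MarkableAvroMultipleOutputs: one writer per output.
class MarkableSketchMultipleOutputs {
    private final Map<String, MarkableSketchRecordWriter> writers = new HashMap<>();

    private MarkableSketchRecordWriter writerFor(String namedOutput, String baseOutputPath) {
        String key = namedOutput + ":" + baseOutputPath;
        return writers.computeIfAbsent(key, k -> new MarkableSketchRecordWriter());
    }

    public void write(String namedOutput, String key, String value, String baseOutputPath)
            throws IOException {
        writerFor(namedOutput, baseOutputPath).write(key, value);
    }

    // Same shape as the signature given in item I.
    public void sync(String namedOutput, String baseOutputPath)
            throws IOException, InterruptedException {
        writerFor(namedOutput, baseOutputPath).sync();
    }

    // Test helper (an assumption of this sketch): markers emitted for one output.
    public int markersWritten(String namedOutput, String baseOutputPath) {
        MarkableSketchRecordWriter w = writers.get(namedOutput + ":" + baseOutputPath);
        return w == null ? 0 : w.markersWritten();
    }
}
```

The point of the sketch is that a subclass can add sync() only if the base class exposes its
underlying file writer, which is why the issue asks for private members to become protected.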



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
