avro-dev mailing list archives

From "Deepak Kumar V (JIRA)" <j...@apache.org>
Subject [jira] [Created] (AVRO-1418) AvroMultipleOutputs should support sync-able writers
Date Fri, 20 Dec 2013 14:44:15 GMT
Deepak Kumar V created AVRO-1418:
------------------------------------

             Summary: AvroMultipleOutputs should support sync-able writers
                 Key: AVRO-1418
                 URL: https://issues.apache.org/jira/browse/AVRO-1418
             Project: Avro
          Issue Type: New Feature
            Reporter: Deepak Kumar V
            Priority: Minor


DataFileWriter provides APIs such as sync(), which emits a synchronization marker, so
that a DataFileReader can later use sync() or seek() to move to a particular synchronization
point.
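
As a rough illustration of the idea only, the following self-contained sketch models a
writer that emits a marker on sync() and a reader that seeks to the returned position.
ToySyncWriter, ToySyncReader, the one-byte marker and the length-prefixed layout are all
hypothetical stand-ins, not Avro's actual classes or file format.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, NOT Avro's DataFileWriter.
class ToySyncWriter {
    // Toy one-byte marker; an assumption made to keep the sketch short.
    static final byte MARKER = (byte) 0xFF;
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    // Append an ASCII record shorter than 128 bytes, prefixed by its length.
    void append(String record) {
        byte[] bytes = record.getBytes(StandardCharsets.US_ASCII);
        out.write(bytes.length);
        out.write(bytes, 0, bytes.length);
    }

    // Emit a marker and return the position just after it, so a reader can seek there.
    long sync() {
        out.write(MARKER);
        return out.size();
    }

    byte[] toByteArray() {
        return out.toByteArray();
    }
}

// Hypothetical stand-in, NOT Avro's DataFileReader.
class ToySyncReader {
    private final byte[] data;
    private int pos = 0;

    ToySyncReader(byte[] data) {
        this.data = data;
    }

    // Jump to a position previously returned by ToySyncWriter.sync().
    void seek(long position) {
        pos = (int) position;
    }

    // Read every record from the current position to the end, skipping markers.
    List<String> readRemaining() {
        List<String> records = new ArrayList<>();
        while (pos < data.length) {
            if (data[pos] == ToySyncWriter.MARKER) {
                pos++;
                continue;
            }
            int len = data[pos++];
            records.add(new String(data, pos, len, StandardCharsets.US_ASCII));
            pos += len;
        }
        return records;
    }
}
```

Writing "a" and "b", calling sync(), then writing "c" lets a reader seek to the returned
position and read only "c", which is the behavior a task needs when it wants to start
reading at a known boundary.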

AvroMultipleOutputs does not provide a way to invoke sync() on its individual writers.
In addition, its design prevents it from being extended.

I) Provide a MarkableAvroMultipleOutputs that exposes a public API to invoke sync
on a named output.
Ex: public void sync(String namedOutput, String baseOutputPath) throws IOException, InterruptedException {}

To achieve this, AvroMultipleOutputs should be modified so that subclasses can add
behavior. The following members must be marked protected instead of private:
1) private static void checkBaseOutputPath(String outputPath) {}
2) private static void checkNamedOutputName(JobContext job, String namedOutput, boolean alreadyDefined) {}
3) private TaskInputOutputContext<?, ?, ?, ?> context;
4) private Set<String> namedOutputs;

II) AvroKeyValueRecordWriter, which AvroMultipleOutputs uses as the writer for its individual
outputs, is likewise closed for extension. It must allow sync() to be invoked on its writer.

To achieve that, the following private member must be marked protected:
1) private final DataFileWriter<GenericRecord> mAvroFileWriter;

A MarkableAvroKeyValueRecordWriter must be provided that exposes a public API to invoke sync()
on its writer:
public void sync() throws IOException {}

III) Provide a MarkableAvroKeyValueOutputFormat that extends AvroKeyValueOutputFormat and uses MarkableAvroKeyValueRecordWriter.
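
The shape of parts I and II can be sketched without Avro or Hadoop on the classpath. Every
class below is a hypothetical stand-in (StubFileWriter for DataFileWriter,
StubKeyValueRecordWriter for AvroKeyValueRecordWriter, and so on), and the per-output writer
map is an assumption about how writers might be looked up, not the actual
AvroMultipleOutputs implementation. Part III is omitted because it depends on the Hadoop
OutputFormat API.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Stand-in for DataFileWriter: it only counts how often sync() is called.
class StubFileWriter {
    private int syncCalls = 0;

    long sync() throws IOException {
        return ++syncCalls;
    }

    int syncCalls() {
        return syncCalls;
    }
}

// Stand-in for AvroKeyValueRecordWriter with the field made protected (item II.1).
class StubKeyValueRecordWriter {
    protected final StubFileWriter mAvroFileWriter = new StubFileWriter();
}

// The proposed subclass: the protected field lets it expose a public sync().
class MarkableStubKeyValueRecordWriter extends StubKeyValueRecordWriter {
    public void sync() throws IOException {
        mAvroFileWriter.sync();
    }

    int syncCalls() {
        return mAvroFileWriter.syncCalls();
    }
}

// Stand-in for MarkableAvroMultipleOutputs: one writer per (namedOutput, baseOutputPath).
class MarkableStubMultipleOutputs {
    private final Map<String, MarkableStubKeyValueRecordWriter> writers = new HashMap<>();

    // Hypothetical lookup; creates the writer on first use.
    private MarkableStubKeyValueRecordWriter writerFor(String namedOutput, String baseOutputPath) {
        return writers.computeIfAbsent(namedOutput + "|" + baseOutputPath,
                key -> new MarkableStubKeyValueRecordWriter());
    }

    // Same signature as the example in part I.
    public void sync(String namedOutput, String baseOutputPath)
            throws IOException, InterruptedException {
        writerFor(namedOutput, baseOutputPath).sync();
    }

    // Helper for inspection only: how many times a given output has been synced.
    int syncCount(String namedOutput, String baseOutputPath) {
        return writerFor(namedOutput, baseOutputPath).syncCalls();
    }
}
```

The point of the sketch is that a single visibility change on the writer field is enough for
a subclass to forward sync() without duplicating the parent class.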


--
This message was sent by Atlassian JIRA
(v6.1.4#6159)