avro-user mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Is it possible to append to an already existing avro file
Date Wed, 06 Feb 2013 00:08:41 GMT
The Jira is:

https://issues.apache.org/jira/browse/AVRO-1035

It is possible to append to an existing Avro file:

http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
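For reference, here is a minimal sketch of the local-file case. It assumes Avro's generic API and a made-up one-field `Event` schema; the class, method, and schema names are illustrative and do not come from this thread. The sketch writes and closes a file, reopens it with `DataFileWriter.appendTo(File)`, and reads the records back:

```java
import java.io.File;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class AppendToLocalFile {
  // Hypothetical example schema; any record schema behaves the same way.
  static final Schema SCHEMA = new Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"Event\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"}]}");

  static GenericRecord event(int id) {
    GenericRecord r = new GenericData.Record(SCHEMA);
    r.put("id", id);
    return r;
  }

  /** Writes two records, closes the file, reopens it with appendTo, adds one more, returns the total. */
  public static int appendAndCount(File file) throws IOException {
    // 1. Create the file normally and close it (header plus one data block).
    try (DataFileWriter<GenericRecord> writer =
             new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(SCHEMA))) {
      writer.create(SCHEMA, file);
      writer.append(event(1));
      writer.append(event(2));
    }
    // 2. Reopen the closed file. appendTo reads the existing header (schema and
    //    sync marker) instead of writing a second one, so the file stays valid.
    try (DataFileWriter<GenericRecord> writer =
             new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(SCHEMA)).appendTo(file)) {
      writer.append(event(3));
    }
    // 3. A single reader sees the records from both sessions.
    int count = 0;
    try (DataFileReader<GenericRecord> reader =
             new DataFileReader<>(file, new GenericDatumReader<GenericRecord>())) {
      for (GenericRecord ignored : reader) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) throws IOException {
    File file = File.createTempFile("events", ".avro");
    file.deleteOnExit();
    System.out.println(appendAndCount(file));
  }
}
```

This variant only accepts a `java.io.File`. That is the limitation raised later in the thread for files on other file systems such as HDFS.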

Should we close that issue as "fixed"?

Doug

On Fri, Feb 1, 2013 at 11:32 AM, Michael Malak <michaelmalak@yahoo.com> wrote:
> Was a JIRA ticket ever created regarding appending to an existing Avro file on HDFS?
>
> What is the status of such a capability, a year out from when the issue below was raised?
>
> On Wed, 22 Feb 2012 10:57:48 +0100, "Vyacheslav Zholudev" <vyacheslav.zholudev@gmail.com> wrote:
>
>> Thanks for your reply, I suspected this.
>>
>> I will create a JIRA ticket.
>>
>> Vyacheslav
>>
>> On Feb 21, 2012, at 6:02 PM, Scott Carey wrote:
>>
>>>
>>> On 2/21/12 7:29 AM, "Vyacheslav Zholudev" <vyacheslav.zholudev@gmail.com>
>>> wrote:
>>>
>>>> Yep, I saw that method as well as the stackoverflow post. However, I'm
>>>> interested in how to append to a file on an arbitrary file system, not
>>>> only on the local one.
>>>>
>>>> I want to get an OutputStream based on the Path and the FileSystem
>>>> implementation and then pass it to the Avro methods for appending.
>>>>
>>>> Is that possible?
>>>
>>> It is not possible without modifying DataFileWriter. Please open a JIRA
>>> ticket.
>>>
>>> It could not simply append to an OutputStream, since it must either:
>>> * Seek to the start to validate the schemas match and find the sync
>>> marker, or
>>> * Trust that the schemas match and find the sync marker from the last
>>> block
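The first option can be made concrete with a rough sketch. This is not DataFileWriter's actual code. It reads an object container file's header by hand: first the 4-byte magic `Obj\1`, then the metadata map (which holds `avro.schema`), then the 16-byte sync marker that every later block must repeat. The class and method names are hypothetical, and the sketch reads a local `File` for simplicity:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import org.apache.avro.Schema;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;

public class HeaderInspector {
  // Avro object container files start with the bytes 'O', 'b', 'j', 1.
  private static final byte[] MAGIC = {'O', 'b', 'j', 1};
  private static final int SYNC_SIZE = 16;

  /** Checks the header's schema against the expected one and returns the sync marker. */
  public static byte[] readSyncMarker(File file, Schema expected) throws IOException {
    try (InputStream in = new FileInputStream(file)) {
      BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(in, null);

      byte[] magic = new byte[MAGIC.length];
      decoder.readFixed(magic);
      if (!Arrays.equals(magic, MAGIC)) {
        throw new IOException("Not an Avro data file: " + file);
      }

      // Header metadata is an Avro map<string, bytes>; readMapStart/mapNext return
      // the item count of each block, and 0 marks the end of the map.
      Map<String, byte[]> meta = new HashMap<>();
      for (long n = decoder.readMapStart(); n != 0; n = decoder.mapNext()) {
        for (long i = 0; i < n; i++) {
          String key = decoder.readString();
          ByteBuffer value = decoder.readBytes(null);
          byte[] bytes = new byte[value.remaining()];
          value.get(bytes);
          meta.put(key, bytes);
        }
      }

      // Validate that the stored schema matches the one we intend to append with.
      byte[] schemaJson = meta.get("avro.schema");
      if (schemaJson == null) {
        throw new IOException("Header has no avro.schema entry");
      }
      Schema actual = new Schema.Parser().parse(new String(schemaJson, StandardCharsets.UTF_8));
      if (!actual.equals(expected)) {
        throw new IOException("Schema mismatch, file has: " + actual);
      }

      // The sync marker immediately follows the metadata map.
      byte[] sync = new byte[SYNC_SIZE];
      decoder.readFixed(sync);
      return sync;
    }
  }
}
```

Any appended block has to end with these same 16 bytes. A raw `fs.append` stream followed by a fresh header does not, which is consistent with the "Invalid sync!" error reported at the bottom of the thread.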
>>>
>>> DataFileWriter cannot refer to Hadoop classes such as FileSystem, but we
>>> could add something to the mapred module that takes a Path and
>>> FileSystem and returns something that implements an interface that
>>> DataFileWriter can append to.  This would be something that is both a
>>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html
>>> and an OutputStream, or has both an InputStream from the start of the
>>> existing file and an OutputStream at the end.
>>>
>>>> Thanks,
>>>> Vyacheslav
>>>>
>>>> On Feb 21, 2012, at 5:29 AM, Harsh J wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Use the appendTo feature of the DataFileWriter. See
>>>>>
>>>>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>>>>>
>>>>> For a quick setup example, read also:
>>>>>
>>>>> http://stackoverflow.com/questions/8806689/can-you-append-data-to-an-existing-avro-data-file
>>>>>
>>>>> On Tue, Feb 21, 2012 at 3:15 AM, Vyacheslav Zholudev
>>>>> <vyacheslav.zholudev@gmail.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Is it possible to append to an already existing Avro file that was
>>>>>> written and closed before?
>>>>>>
>>>>>> If I use
>>>>>> outputStream = fs.append(avroFilePath);
>>>>>>
>>>>>> then later on I get: java.io.IOException: Invalid sync!
>>>>>>
>>>>>> Probably because the schema is written twice, among other issues.
>>>>>>
>>>>>> If I use outputStream = fs.create(avroFilePath); then the Avro file
>>>>>> gets overwritten.
>>>>>>
>>>>>> Thanks,
>>>>>> Vyacheslav
>>>>>
>>>>> --
>>>>> Harsh J
>>>>> Customer Ops. Engineer
>>>>> Cloudera | http://tiny.cloudera.com/about
>
