avro-user mailing list archives

From Vyacheslav Zholudev <vyacheslav.zholu...@gmail.com>
Subject Re: Is it possible to append to an already existing avro file
Date Wed, 22 Feb 2012 09:57:48 GMT
Thanks for your reply, I suspected this. 

I will create a JIRA ticket.

Vyacheslav

On Feb 21, 2012, at 6:02 PM, Scott Carey wrote:

> 
> On 2/21/12 7:29 AM, "Vyacheslav Zholudev" <vyacheslav.zholudev@gmail.com>
> wrote:
> 
>> Yep, I saw that method as well as the stackoverflow post. However, I'm
>> interested in how to append to a file on an arbitrary file system, not
>> only on the local one.
>> 
>> I want to get an OutputStream based on the Path and the FileSystem
>> implementation and then pass it to the Avro methods for appending.
>> 
>> Is that possible?
> 
> It is not possible without modifying DataFileWriter. Please open a JIRA
> ticket.  
> 
> It could not simply append to an OutputStream, since it must either:
> * Seek to the start to validate the schemas match and find the sync
> marker, or
> * Trust that the schemas match and find the sync marker from the last block
> 
> DataFileWriter cannot refer to Hadoop classes such as FileSystem, but we
> could add something to the mapred module that takes a Path and FileSystem
> and returns something that implements an interface that DataFileWriter
> can append to.  This would be something that is both a SeekableInput
> (http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html)
> and an OutputStream, or has both an InputStream from the start of the
> existing file and an OutputStream at the end.
> 
> 
> 
> 
>> 
>> Thanks,
>> Vyacheslav
>> 
>> On Feb 21, 2012, at 5:29 AM, Harsh J wrote:
>> 
>>> Hi,
>>> 
>>> Use the appendTo feature of the DataFileWriter. See
>>> 
>>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>>> 
>>> For a quick setup example, read also:
>>> 
>>> http://stackoverflow.com/questions/8806689/can-you-append-data-to-an-existing-avro-data-file
>>> 
>>> On Tue, Feb 21, 2012 at 3:15 AM, Vyacheslav Zholudev
>>> <vyacheslav.zholudev@gmail.com> wrote:
>>>> Hi,
>>>> 
>>>> is it possible to append to an already existing avro file when it was
>>>> written and closed before?
>>>> 
>>>> If I use
>>>> outputStream = fs.append(avroFilePath);
>>>> 
>>>> then later on I get: java.io.IOException: Invalid sync!
>>>> 
>>>> Probably because the schema is written twice and some other issues.
>>>> 
>>>> If I use outputStream = fs.create(avroFilePath); then the avro file
>>>> gets overwritten.
>>>> 
>>>> Thanks,
>>>> Vyacheslav
>>> 
>>> 
>>> 
>>> -- 
>>> Harsh J
>>> Customer Ops. Engineer
>>> Cloudera | http://tiny.cloudera.com/about
>> 
> 
> 
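To illustrate why a plain append stream is not enough, here is a minimal sketch using a toy container format. It is NOT the Avro file format or the Avro API: the class ToyContainer, its methods, and the layout (magic bytes, one sync marker in the header, the same marker repeated after every block) are all hypothetical and only mirror the idea described above. appendTo() follows the first option Scott lists (seek to the start, recover the sync marker, then write at the end), while writeWithNewHeader(..., true) mimics the naive fs.append() approach that writes a second header.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy container format (hypothetical, not Avro): 4-byte magic, 16-byte sync
// marker, then blocks of [int length][payload][sync marker].
class ToyContainer {
    static final byte[] MAGIC = {'T', 'O', 'Y', '1'};
    static final int SYNC_SIZE = 16;

    // Writes a fresh header with a new random sync marker, then the records.
    // append=false creates/truncates the file; append=true mimics the naive
    // approach, which leaves a second header in the middle of the file.
    static void writeWithNewHeader(Path file, List<String> records, boolean append)
            throws IOException {
        byte[] sync = new byte[SYNC_SIZE];
        new SecureRandom().nextBytes(sync);
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            if (append) raf.seek(raf.length()); else raf.setLength(0);
            raf.write(MAGIC);
            raf.write(sync);
            writeBlocks(raf, records, sync);
        }
    }

    // Appends correctly: read the existing sync marker from the start of the
    // file, then seek to the end and write new blocks that reuse it.
    static void appendTo(Path file, List<String> records) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            byte[] magic = new byte[MAGIC.length];
            raf.readFully(magic);
            if (!Arrays.equals(magic, MAGIC)) throw new IOException("Not a toy container file");
            byte[] sync = new byte[SYNC_SIZE];
            raf.readFully(sync);
            raf.seek(raf.length());
            writeBlocks(raf, records, sync);
        }
    }

    private static void writeBlocks(RandomAccessFile raf, List<String> records, byte[] sync)
            throws IOException {
        for (String r : records) {
            byte[] payload = r.getBytes(StandardCharsets.UTF_8);
            raf.writeInt(payload.length);
            raf.write(payload);
            raf.write(sync);
        }
    }

    // Reads every record, checking the header's sync marker after each block.
    static List<String> readAll(Path file) throws IOException {
        ByteArrayInputStream raw = new ByteArrayInputStream(Files.readAllBytes(file));
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[MAGIC.length];
        in.readFully(magic);
        if (!Arrays.equals(magic, MAGIC)) throw new IOException("Not a toy container file");
        byte[] sync = new byte[SYNC_SIZE];
        in.readFully(sync);
        List<String> out = new ArrayList<>();
        byte[] seen = new byte[SYNC_SIZE];
        while (raw.available() > 0) {
            int len = in.readInt();
            if (len < 0 || len > raw.available() - SYNC_SIZE) {
                throw new IOException("Invalid block length");
            }
            byte[] payload = new byte[len];
            in.readFully(payload);
            in.readFully(seen);
            if (!Arrays.equals(seen, sync)) throw new IOException("Invalid sync!");
            out.add(new String(payload, StandardCharsets.UTF_8));
        }
        return out;
    }
}
```

In this sketch, a file created with writeWithNewHeader(path, records, false) and extended with appendTo(path, more) reads back cleanly, whereas extending it with writeWithNewHeader(path, more, true) makes readAll() throw an IOException once it reaches the second header. The correct path needs both read access at the start of the file and write access at its end, which is why a bare OutputStream from fs.append() cannot be enough on its own.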

