avro-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Is it possible to append to an already existing avro file
Date Wed, 06 Feb 2013 18:17:13 GMT
Hey Michael,

FileSystem#append() returns an FSDataOutputStream, which does extend the
regular java.io.OutputStream class, as seen in the API:
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FSDataOutputStream.html.

Here's a sample program that works on Hadoop 2.x in my tests:
https://gist.github.com/QwertyManiac/4724582
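The gist itself is not reproduced here. As a minimal sketch of the same idea (not Harsh's program), assuming an Avro release that has the `DataFileWriter.appendTo(SeekableInput, OutputStream)` overload Doug links, plus the avro-mapred artifact and Hadoop 2.x on the classpath: `FsInput` from avro-mapred adapts a Hadoop `Path` to Avro's `SeekableInput`, so the writer can read the existing header, while `FileSystem#append()` supplies an `FSDataOutputStream` positioned at the end of the file. The names `HdfsAvroAppend` and `appendRecords` are hypothetical. Whether `append()` succeeds depends on the filesystem: HDFS append support varies with Hadoop version and configuration, and not every `FileSystem` implementation supports it.

```java
import java.io.IOException;
import java.util.List;

import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.FsInput;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Hypothetical helper: appends records to an existing Avro container file via Hadoop's FileSystem API. */
public final class HdfsAvroAppend {
  private HdfsAvroAppend() {}

  public static void appendRecords(Configuration conf, Path path, List<GenericRecord> records)
      throws IOException {
    FileSystem fs = path.getFileSystem(conf);
    try (
        // SeekableInput over the existing file: lets the writer read its header
        // (schema, codec, sync marker) instead of writing a new one.
        FsInput in = new FsInput(path, conf);
        // Positioned at the current end of the file; FSDataOutputStream extends java.io.OutputStream.
        FSDataOutputStream out = fs.append(path);
        DataFileWriter<GenericRecord> writer =
            new DataFileWriter<>(new GenericDatumWriter<GenericRecord>())) {
      writer.appendTo(in, out);
      for (GenericRecord record : records) {
        writer.append(record);
      }
    } // writer is closed first: it flushes the last block and closes `out`
  }
}
```

Try-with-resources closes `writer` before `out` and `in`; closing the writer flushes the final block and closes the output stream, so the later closes should be harmless repeats.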

On Wed, Feb 6, 2013 at 9:00 AM, Michael Malak <michaelmalak@yahoo.com> wrote:
> I don't believe a Hadoop FileSystem is a Java OutputStream?
>
> --- On Tue, 2/5/13, Doug Cutting <cutting@apache.org> wrote:
>
>> From: Doug Cutting <cutting@apache.org>
>> Subject: Re: Is it possible to append to an already existing avro file
>> To: user@avro.apache.org
>> Date: Tuesday, February 5, 2013, 5:27 PM
>> It will work on an OutputStream that supports append.
>>
>> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(org.apache.avro.file.SeekableInput, java.io.OutputStream)
>>
>> So it depends on how well HDFS implements FileSystem#append(), not on
>> any changes in Avro.
>>
>> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#append(org.apache.hadoop.fs.Path)
>>
>> I have no recent personal experience with append in HDFS.  Does anyone
>> else here?
>>
>> Doug
>>
>> On Tue, Feb 5, 2013 at 4:10 PM, Michael Malak <michaelmalak@yahoo.com> wrote:
>> > My understanding is that this will append to a file on the local
>> > filesystem, but not to a file on HDFS.
>> >
>> > --- On Tue, 2/5/13, Doug Cutting <cutting@apache.org> wrote:
>> >
>> >> From: Doug Cutting <cutting@apache.org>
>> >> Subject: Re: Is it possible to append to an already existing avro file
>> >> To: user@avro.apache.org
>> >> Date: Tuesday, February 5, 2013, 5:08 PM
>> >> The Jira is:
>> >>
>> >> https://issues.apache.org/jira/browse/AVRO-1035
>> >>
>> >> It is possible to append to an existing Avro file:
>> >>
>> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>> >>
>> >> Should we close that issue as "fixed"?
>> >>
>> >> Doug
>> >>
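For a file on the local filesystem, the `appendTo(File)` overload Doug links is enough. A minimal sketch, assuming Avro 1.7.x on the classpath; the class `LocalAvroAppend`, its `write` method and the one-field `Event` schema are hypothetical. The first call writes a header with `create()`; later calls reopen the file with `appendTo()`, which reuses the schema, codec and sync marker already in the file instead of writing a second header.

```java
import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

/** Hypothetical example: create a local Avro file once, then append to it in later sessions. */
public final class LocalAvroAppend {
  static final Schema SCHEMA = new Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"Event\",\"fields\":[{\"name\":\"id\",\"type\":\"long\"}]}");

  private LocalAvroAppend() {}

  static void write(File file, long id) throws IOException {
    GenericRecord record = new GenericData.Record(SCHEMA);
    record.put("id", id);
    try (DataFileWriter<GenericRecord> writer =
             new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(SCHEMA))) {
      if (file.exists() && file.length() > 0) {
        // Reopen: read the existing header and continue after the last block.
        writer.appendTo(file);
      } else {
        // First write: emit a header (magic, metadata, freshly generated sync marker).
        writer.create(SCHEMA, file);
      }
      writer.append(record);
    }
  }
}
```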
>> >> On Fri, Feb 1, 2013 at 11:32 AM, Michael Malak <michaelmalak@yahoo.com> wrote:
>> >> > Was a JIRA ticket ever created regarding appending to an existing Avro
>> >> > file on HDFS?
>> >> >
>> >> > What is the status of such a capability, a year out from when the issue
>> >> > below was raised?
>> >> >
>> >> > On Wed, 22 Feb 2012 10:57:48 +0100, "Vyacheslav Zholudev"
>> >> > <vyacheslav.zholudev@gmail.com> wrote:
>> >> >
>> >> >> Thanks for your reply, I suspected this.
>> >> >>
>> >> >> I will create a JIRA ticket.
>> >> >>
>> >> >> Vyacheslav
>> >> >>
>> >> >> On Feb 21, 2012, at 6:02 PM, Scott Carey wrote:
>> >> >>
>> >> >>>
>> >> >>> On 2/21/12 7:29 AM, "Vyacheslav Zholudev" <vyacheslav.zholudev@gmail.com>
>> >> >>> wrote:
>> >> >>>
>> >> >>>> Yep, I saw that method as well as the stackoverflow post. However, I'm
>> >> >>>> interested in how to append to a file on an arbitrary file system, not
>> >> >>>> only on the local one.
>> >> >>>>
>> >> >>>> I want to get an OutputStream based on the Path and the FileSystem
>> >> >>>> implementation and then pass it to the Avro methods for appending.
>> >> >>>>
>> >> >>>> Is that possible?
>> >> >>>
>> >> >>> It is not possible without modifying DataFileWriter. Please open a JIRA
>> >> >>> ticket.
>> >> >>>
>> >> >>> It could not simply append to an OutputStream, since it must either:
>> >> >>> * Seek to the start to validate the schemas match and find the sync
>> >> >>>   marker, or
>> >> >>> * Trust that the schemas match and find the sync marker from the last
>> >> >>>   block
>> >> >>>
>> >> >>> DataFileWriter cannot refer to Hadoop classes such as FileSystem, but we
>> >> >>> could add something to the mapred module that takes a Path and
>> >> >>> FileSystem and returns something that implements an interface that
>> >> >>> DataFileWriter can append to.  This would be something that is both a
>> >> >>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html
>> >> >>> and an OutputStream, or has both an InputStream from the start of the
>> >> >>> existing file and an OutputStream at the end.
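Scott's first option amounts to reading the file's header. Per the Avro object container file specification, a header is the four magic bytes `Obj` followed by the byte 1, a metadata map (with entries such as `avro.schema` and `avro.codec`) and a 16-byte sync marker, and every data block is followed by that same marker. This JDK-only sketch decodes such a header; `AvroHeaderPeek` and its methods are hypothetical names, not Avro's API. It also suggests why `fs.append()` followed by `create()` likely fails with "Invalid sync!": `create()` writes a second header with a freshly generated marker in the middle of the file, while a reader only knows the marker from the first header.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical JDK-only reader for the header of an Avro object container file. */
public final class AvroHeaderPeek {
  private static final byte[] MAGIC = {'O', 'b', 'j', 1};

  /** Metadata map plus the 16-byte sync marker that follows it. */
  public static final class Header {
    public final Map<String, byte[]> meta;
    public final byte[] sync;

    Header(Map<String, byte[]> meta, byte[] sync) {
      this.meta = meta;
      this.sync = sync;
    }

    /** The writer's schema as JSON, or null if the entry is missing. */
    public String schemaJson() {
      byte[] json = meta.get("avro.schema");
      return json == null ? null : new String(json, StandardCharsets.UTF_8);
    }
  }

  private AvroHeaderPeek() {}

  public static Header read(InputStream source) throws IOException {
    DataInputStream in = new DataInputStream(source);
    byte[] magic = new byte[MAGIC.length];
    in.readFully(magic);
    if (!Arrays.equals(magic, MAGIC)) {
      throw new IOException("Not an Avro object container file");
    }
    // Metadata is an Avro map<bytes>: blocks of key/value pairs ending with a zero count.
    Map<String, byte[]> meta = new LinkedHashMap<>();
    for (long count = readLong(in); count != 0; count = readLong(in)) {
      if (count < 0) {   // a negative count is followed by the block's size in bytes
        count = -count;
        readLong(in);    // size not needed here
      }
      for (long i = 0; i < count; i++) {
        String key = new String(readBytes(in), StandardCharsets.UTF_8);
        meta.put(key, readBytes(in));
      }
    }
    byte[] sync = new byte[16];
    in.readFully(sync);
    return new Header(meta, sync);
  }

  /** Decodes an Avro long: a zig-zag encoded, little-endian base-128 varint. */
  public static long readLong(DataInputStream in) throws IOException {
    long n = 0;
    int shift = 0;
    int b;
    do {
      if (shift > 63) {
        throw new IOException("Malformed varint");
      }
      b = in.readUnsignedByte();   // EOFException if the stream ends early
      n |= (long) (b & 0x7F) << shift;
      shift += 7;
    } while ((b & 0x80) != 0);
    return (n >>> 1) ^ -(n & 1);
  }

  /** Decodes Avro bytes/string: a long length followed by that many bytes. */
  public static byte[] readBytes(DataInputStream in) throws IOException {
    long len = readLong(in);
    if (len < 0 || len > Integer.MAX_VALUE) {
      throw new IOException("Bad length " + len);
    }
    byte[] buf = new byte[(int) len];
    in.readFully(buf);
    return buf;
  }
}
```

In practice Avro's own `DataFileReader` parses the header for you (for example via `getSchema()` and `getMetaString("avro.codec")`); the sketch only makes the on-disk layout visible.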
>> >> >>>
>> >> >>>> Thanks,
>> >> >>>> Vyacheslav
>> >> >>>>
>> >> >>>> On Feb 21, 2012, at 5:29 AM, Harsh J wrote:
>> >> >>>>
>> >> >>>>> Hi,
>> >> >>>>>
>> >> >>>>> Use the appendTo feature of the DataFileWriter. See
>> >> >>>>>
>> >> >>>>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>> >> >>>>>
>> >> >>>>> For a quick setup example, read also:
>> >> >>>>>
>> >> >>>>> http://stackoverflow.com/questions/8806689/can-you-append-data-to-an-existing-avro-data-file
>> >> >>>>>
>> >> >>>>> On Tue, Feb 21, 2012 at 3:15 AM, Vyacheslav Zholudev
>> >> >>>>> <vyacheslav.zholudev@gmail.com> wrote:
>> >> >>>>>> Hi,
>> >> >>>>>>
>> >> >>>>>> Is it possible to append to an already existing avro file when it was
>> >> >>>>>> written and closed before?
>> >> >>>>>>
>> >> >>>>>> If I use
>> >> >>>>>>   outputStream = fs.append(avroFilePath);
>> >> >>>>>>
>> >> >>>>>> then later on I get: java.io.IOException: Invalid sync!
>> >> >>>>>>
>> >> >>>>>> Probably because the schema is written twice, among other issues.
>> >> >>>>>>
>> >> >>>>>> If I use outputStream = fs.create(avroFilePath); then the avro file
>> >> >>>>>> gets overwritten.
>> >> >>>>>>
>> >> >>>>>> Thanks,
>> >> >>>>>> Vyacheslav
>> >> >>>>>
>> >> >>>>> --
>> >> >>>>> Harsh J
>> >> >>>>> Customer Ops. Engineer
>> >> >>>>> Cloudera | http://tiny.cloudera.com/about
>> >> >
>> >>
>>



--
Harsh J
