avro-user mailing list archives

From Michael Malak <michaelma...@yahoo.com>
Subject Re: Is it possible to append to an already existing avro file
Date Thu, 07 Feb 2013 00:42:40 GMT
Thanks so much for the code -- it works great!

Since append requires a non-trivial amount of code, I suggest attaching that code to AVRO-1035, in the hope that someone will come up with an interface that needs just one line of user code to append.

--- On Wed, 2/6/13, Harsh J <harsh@cloudera.com> wrote:

> From: Harsh J <harsh@cloudera.com>
> Subject: Re: Is it possible to append to an already existing avro file
> To: user@avro.apache.org
> Date: Wednesday, February 6, 2013, 11:17 AM
> Hey Michael,
> 
> The stream returned by FileSystem#append() does implement the regular Java
> OutputStream interface, as seen in the API:
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FSDataOutputStream.html
>
> Here's a sample program that works on Hadoop 2.x in my tests:
> https://gist.github.com/QwertyManiac/4724582
> 
> On Wed, Feb 6, 2013 at 9:00 AM, Michael Malak <michaelmalak@yahoo.com> wrote:
> > I don't believe a Hadoop FileSystem is a Java OutputStream?
> >
> > --- On Tue, 2/5/13, Doug Cutting <cutting@apache.org> wrote:
> >
> >> From: Doug Cutting <cutting@apache.org>
> >> Subject: Re: Is it possible to append to an already existing avro file
> >> To: user@avro.apache.org
> >> Date: Tuesday, February 5, 2013, 5:27 PM
> >> It will work on an OutputStream that supports append.
> >>
> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(org.apache.avro.file.SeekableInput, java.io.OutputStream)
> >>
> >> So it depends on how well HDFS implements FileSystem#append(), not on
> >> any changes in Avro.
> >>
> >> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#append(org.apache.hadoop.fs.Path)
> >>
> >> I have no recent personal experience with append in HDFS. Does anyone
> >> else here?
> >>
> >> Doug
> >>
> >> On Tue, Feb 5, 2013 at 4:10 PM, Michael Malak <michaelmalak@yahoo.com> wrote:
> >> > My understanding is that will append to a file on the local filesystem,
> >> > but not to a file on HDFS.
> >> >
> >> > --- On Tue, 2/5/13, Doug Cutting <cutting@apache.org> wrote:
> >> >
> >> >> From: Doug Cutting <cutting@apache.org>
> >> >> Subject: Re: Is it possible to append to an already existing avro file
> >> >> To: user@avro.apache.org
> >> >> Date: Tuesday, February 5, 2013, 5:08 PM
> >> >> The Jira is:
> >> >>
> >> >> https://issues.apache.org/jira/browse/AVRO-1035
> >> >>
> >> >> It is possible to append to an existing Avro file:
> >> >>
> >> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
> >> >>
> >> >> Should we close that issue as "fixed"?
> >> >>
> >> >> Doug
> >> >>
> >> >> On Fri, Feb 1, 2013 at 11:32 AM, Michael Malak <michaelmalak@yahoo.com> wrote:
> >> >> > Was a JIRA ticket ever created regarding appending to an existing
> >> >> > Avro file on HDFS?
> >> >> >
> >> >> > What is the status of such a capability, a year out from when the
> >> >> > issue below was raised?
> >> >> >
> >> >> > On Wed, 22 Feb 2012 10:57:48 +0100, "Vyacheslav Zholudev"
> >> >> > <vyacheslav.zholudev@gmail.com> wrote:
> >> >> >
> >> >> >> Thanks for your reply, I suspected this.
> >> >> >>
> >> >> >> I will create a JIRA ticket.
> >> >> >>
> >> >> >> Vyacheslav
> >> >> >>
> >> >> >> On Feb 21, 2012, at 6:02 PM, Scott Carey wrote:
> >> >> >>
> >> >> >>>
> >> >> >>> On 2/21/12 7:29 AM, "Vyacheslav Zholudev"
> >> >> >>> <vyacheslav.zholudev@gmail.com> wrote:
> >> >> >>>
> >> >> >>>> Yep, I saw that method as well as the stackoverflow post. However,
> >> >> >>>> I'm interested in how to append to a file on an arbitrary file
> >> >> >>>> system, not only on the local one.
> >> >> >>>>
> >> >> >>>> I want to get an OutputStream based on the Path and the FileSystem
> >> >> >>>> implementation and then pass it to the Avro methods for appending.
> >> >> >>>>
> >> >> >>>> Is that possible?
> >> >> >>>
> >> >> >>> It is not possible without modifying DataFileWriter. Please open a
> >> >> >>> JIRA ticket.
> >> >> >>>
> >> >> >>> It could not simply append to an OutputStream, since it must either:
> >> >> >>> * Seek to the start to validate that the schemas match and find the
> >> >> >>>   sync marker, or
> >> >> >>> * Trust that the schemas match and find the sync marker from the
> >> >> >>>   last block
> >> >> >>>
> >> >> >>> DataFileWriter cannot refer to Hadoop classes such as FileSystem, but
> >> >> >>> we could add something to the mapred module that takes a Path and
> >> >> >>> FileSystem and returns something that implements an interface that
> >> >> >>> DataFileWriter can append to. This would be something that is both a
> >> >> >>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html
> >> >> >>> and an OutputStream, or has both an InputStream from the start of the
> >> >> >>> existing file and an OutputStream at the end.
> >> >> >>>
> >> >> >>>> Thanks,
> >> >> >>>> Vyacheslav
> >> >> >>>>
> >> >> >>>> On Feb 21, 2012, at 5:29 AM, Harsh J wrote:
> >> >> >>>>
> >> >> >>>>> Hi,
> >> >> >>>>>
> >> >> >>>>> Use the appendTo feature of the DataFileWriter. See
> >> >> >>>>>
> >> >> >>>>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
> >> >> >>>>>
> >> >> >>>>> For a quick setup example, read also:
> >> >> >>>>>
> >> >> >>>>> http://stackoverflow.com/questions/8806689/can-you-append-data-to-an-existing-avro-data-file
> >> >> >>>>>
> >> >> >>>>> On Tue, Feb 21, 2012 at 3:15 AM, Vyacheslav Zholudev
> >> >> >>>>> <vyacheslav.zholudev@gmail.com> wrote:
> >> >> >>>>>> Hi,
> >> >> >>>>>>
> >> >> >>>>>> is it possible to append to an already existing avro file when it
> >> >> >>>>>> was written and closed before?
> >> >> >>>>>>
> >> >> >>>>>> If I use
> >> >> >>>>>> outputStream = fs.append(avroFilePath);
> >> >> >>>>>>
> >> >> >>>>>> then later on I get: java.io.IOException: Invalid sync!
> >> >> >>>>>>
> >> >> >>>>>> Probably because the schema is written twice and some other issues.
> >> >> >>>>>>
> >> >> >>>>>> If I use outputStream = fs.create(avroFilePath); then the avro file
> >> >> >>>>>> gets overwritten.
> >> >> >>>>>>
> >> >> >>>>>> Thanks,
> >> >> >>>>>> Vyacheslav
> >> >> >>>>>
> >> >> >>>>> --
> >> >> >>>>> Harsh J
> >> >> >>>>> Customer Ops. Engineer
> >> >> >>>>> Cloudera | http://tiny.cloudera.com/about
> >> >> >
> >> >>
> >>
> 
> 
> 
> --
> Harsh J
> 
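
To illustrate the point made earlier in the thread, that a writer cannot simply append to an OutputStream but must first recover the sync marker from the existing file, here is a minimal sketch in Java. It uses a toy container format, not Avro's actual file layout: the class ToyContainer, its method names, and the header layout (a schema string followed by a random 16-byte sync marker) are all assumptions made up for illustration. The sketch suggests why writing a second header onto an append stream can produce an "Invalid sync!" style failure on read, while reading the existing header first and reusing its marker does not. In real Avro, DataFileWriter.appendTo(SeekableInput, OutputStream) plays roughly the role that append() plays here.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy container (hypothetical, NOT Avro's format): header, then blocks that each end with the sync marker. */
final class ToyContainer {
    static final int SYNC_SIZE = 16;

    /** Writes a fresh header (schema + new random sync marker) and returns the marker. */
    static byte[] writeHeader(OutputStream out, String schema) throws IOException {
        byte[] sync = new byte[SYNC_SIZE];
        new SecureRandom().nextBytes(sync);
        DataOutputStream data = new DataOutputStream(out);
        data.writeUTF(schema);
        data.write(sync);
        data.flush();
        return sync;
    }

    /** Writes one single-record block followed by the file's sync marker. */
    static void writeBlock(OutputStream out, String record, byte[] sync) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeUTF(record);
        data.write(sync);
        data.flush();
    }

    /** Append done properly: read schema and sync marker from the start, then write at the end. */
    static void append(File file, String schema, String record) throws IOException {
        byte[] sync = new byte[SYNC_SIZE];
        try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
            if (!in.readUTF().equals(schema)) {
                throw new IOException("Schema mismatch");
            }
            in.readFully(sync);
        }
        try (OutputStream out = new FileOutputStream(file, true)) {
            writeBlock(out, record, sync);
        }
    }

    /** Naive append: treats the append stream like a new file and writes a second header. */
    static void naiveAppend(File file, String schema, String record) throws IOException {
        try (OutputStream out = new FileOutputStream(file, true)) {
            byte[] sync = writeHeader(out, schema);
            writeBlock(out, record, sync);
        }
    }

    /** Reads every record, checking that each block ends with the sync marker from the header. */
    static List<String> readAll(File file) throws IOException {
        List<String> records = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
            in.readUTF(); // skip the schema
            byte[] sync = new byte[SYNC_SIZE];
            in.readFully(sync);
            byte[] seen = new byte[SYNC_SIZE];
            while (in.available() > 0) {
                records.add(in.readUTF());
                in.readFully(seen);
                if (!Arrays.equals(sync, seen)) {
                    throw new IOException("Invalid sync!");
                }
            }
        }
        return records;
    }
}
```

With this sketch, a file written with writeHeader and writeBlock and then extended with append reads back cleanly through readAll, whereas extending it with naiveAppend makes readAll throw, because the second header's marker does not match the first. This is also why the appending side needs both a readable view of the start of the file and an output stream positioned at its end.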