hadoop-common-user mailing list archives

From: Harsh J <ha...@cloudera.com>
Subject: Re: Writing click stream data to hadoop
Date: Thu, 31 May 2012 02:37:05 GMT
Thanks for correcting me there on the syncFs call, Luke. I seem to
have missed that method when searching the branch-1 code.

On Thu, May 31, 2012 at 6:54 AM, Luke Lu <llu@apache.org> wrote:
>
> SequenceFile.Writer#syncFs is in Hadoop 1.0.0 (actually since
> 0.20.205); it calls the underlying FSDataOutputStream#sync, which is
> actually hflush semantically (the data is not durable in the event of
> a data-center-wide power outage). The hsync implementation is not yet
> in 2.0; HDFS-744 just brought hsync into trunk.
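>
> For reference, a minimal sketch of the 1.0-era syncFs call (the path
> and key/value types here are illustrative, not from this thread):
>
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.fs.FileSystem;
>   import org.apache.hadoop.fs.Path;
>   import org.apache.hadoop.io.LongWritable;
>   import org.apache.hadoop.io.SequenceFile;
>   import org.apache.hadoop.io.Text;
>
>   public class SyncFsSketch {
>     public static void main(String[] args) throws Exception {
>       Configuration conf = new Configuration();
>       FileSystem fs = FileSystem.get(conf);
>       SequenceFile.Writer writer = SequenceFile.createWriter(
>           fs, conf, new Path("/tmp/clicks.seq"),
>           LongWritable.class, Text.class);
>       writer.append(new LongWritable(1L), new Text("click"));
>       // hflush semantics: readers can now see the record, but it is
>       // not guaranteed to be on disk on every datanode.
>       writer.syncFs();
>       writer.close();
>     }
>   }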
>
> __Luke
>
> On Fri, May 25, 2012 at 9:30 AM, Harsh J <harsh@cloudera.com> wrote:
> > Mohit,
> >
> > Not if you call sync (or hflush/hsync in 2.0) periodically to persist
> > your changes to the file. SequenceFile doesn't currently have a sync
> > API built into it (in 1.0 at least), but you can instead call sync on
> > the underlying output stream for the moment. This is possible to do
> > in 1.0 (just own the output stream yourself).
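> >
> > A minimal sketch of that "own the output stream" route, assuming the
> > 1.0 createWriter overload that accepts an FSDataOutputStream (path
> > and types are illustrative):
> >
> >   import org.apache.hadoop.conf.Configuration;
> >   import org.apache.hadoop.fs.FSDataOutputStream;
> >   import org.apache.hadoop.fs.FileSystem;
> >   import org.apache.hadoop.fs.Path;
> >   import org.apache.hadoop.io.LongWritable;
> >   import org.apache.hadoop.io.SequenceFile;
> >   import org.apache.hadoop.io.Text;
> >
> >   public class OwnedStreamSketch {
> >     public static void main(String[] args) throws Exception {
> >       Configuration conf = new Configuration();
> >       FileSystem fs = FileSystem.get(conf);
> >       // Create the stream ourselves so we keep a handle to sync on.
> >       FSDataOutputStream out = fs.create(new Path("/tmp/clicks.seq"));
> >       SequenceFile.Writer writer = SequenceFile.createWriter(
> >           conf, out, LongWritable.class, Text.class,
> >           SequenceFile.CompressionType.NONE, null);
> >       writer.append(new LongWritable(1L), new Text("click"));
> >       out.sync(); // persist what has been written so far
> >       writer.close();
> >       out.close(); // the writer does not own the stream here
> >     }
> >   }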
> >
> > Your use case also sounds like you may simply want to use Apache Flume
> > (Incubating) [http://incubator.apache.org/flume/], which already
> > provides these features and the WAL-like reliability you seek.
> >
> > On Fri, May 25, 2012 at 8:24 PM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:
> >> We get click data through API calls. I now need to send this data to our
> >> Hadoop environment. I am wondering if I could open one sequence file and
> >> write to it until it reaches a certain size. Once it's over the specified
> >> size I can close that file and open a new one. Is this a good approach?
> >>
> >> The only thing I worry about is what happens if the server crashes before
> >> I am able to cleanly close the file. Would I lose all the previous data?
> >
> >
> >
> > --
> > Harsh J

--
Harsh J
