hadoop-common-user mailing list archives

From Jason Venner <jason.hadoop@gmail.com>
Subject Re: File is closed but data is not visible
Date Wed, 12 Aug 2009 13:05:03 GMT
Are you explicitly calling close on the FSDataOutputStream that you received
from the FileSystem.create method?
It sounds like the close is actually happening in the finalizer method on
the object.

Can you post the relevant code, or provide a cut-down demonstrator?
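
A minimal sketch of the explicit-close pattern being asked about, against the
0.18-era FileSystem API (the class name and path are illustrative, not from the
thread): setting the stream reference to null only makes the stream eligible
for garbage collection, so the file is closed whenever the finalizer eventually
runs, possibly hours later; an explicit close() makes the data visible right
away.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Illustrative only: write one record and close the stream explicitly,
    // rather than dropping the reference and relying on the finalizer.
    public class ExplicitClose {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            FSDataOutputStream out = fs.create(new Path("/logs/bucket.log"));
            try {
                out.write("one log record\n".getBytes("UTF-8"));
            } finally {
                out.close();   // explicit close, not "out = null"
            }
        }
    }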

On Wed, Aug 12, 2009 at 5:57 AM, Palleti, Pallavi <
pallavi.palleti@corp.aol.com> wrote:

> Hi Jason,
>
> The file is neither visible via the Namenode UI nor via a program
> (checking whether the file exists).
>
> There is no caching happening at the application level. The application
> is pretty simple. We take Apache logs and try to put them into time
> buckets based on the logged time of the records. We create 4 files (one
> for every 15 minutes) for every hour. So, at the client side, we look at
> the logs, and if a record belongs to the current interval, we write it
> into the currently open HDFS file. If it belongs to a new interval, the
> old file is closed and a new file is created. I have been logging the
> times at which the file is created and closed at my client side, and I
> can see that the file is getting closed at the expected time. But when I
> look for the same file in the Hadoop cluster, it has still not been
> created, and if I wait for another 1 to 2 hours, I can see the file.
>
> Thanks
> Pallavi
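
A hypothetical sketch of the rotation logic Pallavi describes above (the class,
path scheme, and field names are assumptions, not her code): one open HDFS file
per 15-minute bucket, and rolling to a new bucket closes the previous file
first.

    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Hypothetical bucket writer: records for the current 15-minute
    // interval go to the open file; a record from a new interval closes
    // the old file and creates the next one.
    public class BucketWriter {
        private static final long BUCKET_MS = 15 * 60 * 1000L;

        private final FileSystem fs;
        private FSDataOutputStream out;   // currently open bucket file
        private long currentBucket = -1;  // bucket start time, ms since epoch

        public BucketWriter(FileSystem fs) {
            this.fs = fs;
        }

        public void write(long recordTimeMs, byte[] record) throws IOException {
            long bucket = recordTimeMs - (recordTimeMs % BUCKET_MS);
            if (bucket != currentBucket) {
                if (out != null) {
                    out.close();          // close the old interval's file
                }
                out = fs.create(new Path("/logs/bucket-" + bucket + ".log"));
                currentBucket = bucket;
            }
            out.write(record);
        }

        public void close() throws IOException {
            if (out != null) {
                out.close();
            }
        }
    }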
>
>
> -----Original Message-----
> From: Jason Venner [mailto:jason.hadoop@gmail.com]
> Sent: Wednesday, August 12, 2009 6:03 PM
> To: common-user@hadoop.apache.org
> Subject: Re: File is closed but data is not visible
>
> Is it possible that your application is caching some data and not
> refreshing it when you expect?
> The HDFS file visibility semantics are well understood, and your case
> does not fit with that understanding.
> A factor that hints strongly at this is that your file is visible via
> the Namenode UI; there is nothing special about that UI.
>
> On Tue, Aug 11, 2009 at 9:00 PM, Pallavi Palleti <
> pallavi.palleti@corp.aol.com> wrote:
>
> > Hi Raghu,
> >
> > The file doesn't appear in the cluster when I look at it from the
> > Namenode UI. Also, I have a monitor at the cluster side which checks
> > whether the file has been created and throws an exception when it has
> > not. And it threw an exception saying "File not found".
> >
> > Thanks
> > Pallavi
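
A hypothetical sketch of the cluster-side monitor mentioned above (the class
and method names are assumptions): check that the expected file exists in HDFS
and throw if it does not.

    import java.io.FileNotFoundException;
    import java.io.IOException;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Hypothetical monitor check: fail loudly when the expected bucket
    // file has not appeared in HDFS.
    public class BucketMonitor {
        public static void assertExists(FileSystem fs, Path expected)
                throws IOException {
            if (!fs.exists(expected)) {
                throw new FileNotFoundException("File not found: " + expected);
            }
        }
    }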
> > ----- Original Message -----
> > From: "Raghu Angadi" <rangadi@yahoo-inc.com>
> > To: common-user@hadoop.apache.org
> > Sent: Wednesday, August 12, 2009 12:10:12 AM GMT +05:30 Chennai,
> > Kolkata, Mumbai, New Delhi
> > Subject: Re: File is closed but data is not visible
> >
> >
> > Your assumption is correct. When you close the file, others can read
> > the data. There is no delay expected before the data is visible. If
> > there is an error, either write() or close() would throw it.
> >
> > When you say data is not visible, do you mean readers cannot see the
> > file or cannot see the data? Is it guaranteed that readers open the
> > file _after_ close returns on the writer?
> >
> > Raghu.
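
A small sketch of the ordering Raghu describes (the path is illustrative): once
close() returns on the writer, a reader that opens the file afterwards sees all
of the data.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Write, close, then read back: the reader is opened only after
    // close() has returned, so it sees the full contents.
    public class CloseThenRead {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path p = new Path("/tmp/close-then-read.txt");

            FSDataOutputStream out = fs.create(p);
            out.write("hello\n".getBytes("UTF-8"));
            out.close();                       // data is visible from here on

            FSDataInputStream in = fs.open(p); // opened after close returned
            byte[] buf = new byte[6];
            in.readFully(buf);                 // reads "hello\n"
            in.close();
        }
    }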
> >
> > Palleti, Pallavi wrote:
> > > Hi Jason,
> > >
> > > Apologies for missing the version information in my previous mail. I
> > > am using hadoop-0.18.3. I get the FSDataOutputStream object using
> > > fs.create(new Path(some_file_name)), where fs is a FileSystem object,
> > > and I close the file using close().
> > >
> > > Thanks
> > > Pallavi
> > >
> > > -----Original Message-----
> > > From: Jason Venner [mailto:jason.hadoop@gmail.com]
> > > Sent: Tuesday, August 11, 2009 6:24 PM
> > > To: common-user@hadoop.apache.org
> > > Subject: Re: File is closed but data is not visible
> > >
> > > Please provide information on what version of hadoop you are using
> > > and the method of opening and closing the file.
> > >
> > >
> > > On Tue, Aug 11, 2009 at 12:48 AM, Pallavi Palleti <
> > > pallavi.palleti@corp.aol.com> wrote:
> > >
> > >> Hi all,
> > >>
> > >> We have an application where we pull logs from an external server
> > >> (far from the hadoop cluster) to the hadoop cluster. Sometimes we
> > >> see a huge delay (of 1 hour or more) before the data actually
> > >> appears in HDFS, even though the file has been closed and the
> > >> variable set to null by the external application. I was under the
> > >> impression that when I close the file, the data gets reflected in
> > >> the hadoop cluster. In this situation it is even more complicated
> > >> to handle write failures, as the client gets the false impression
> > >> that the data has been written to HDFS. Kindly clarify whether my
> > >> perception is correct. If yes, could someone tell me what is
> > >> causing the delay in actually showing the data? In those cases, how
> > >> can we tackle write failures (due to temporary issues like a
> > >> datanode not being available or a disk being full) when there is no
> > >> way to figure out the failure at the client side?
> > >>
> > >> Thanks
> > >> Pallavi
> > >>
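
One hedged way to tackle the write-failure concern above, assuming (as Raghu's
reply above notes) that failures surface as IOExceptions from write() or
close(): treat any exception as "not written" and rewrite the whole file. The
class and parameters are illustrative, not part of the original application.

    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Hypothetical client-side retry: any IOException from create(),
    // write(), or close() means the file may not be durable, so rewrite
    // it from scratch (create with overwrite replaces a partial file).
    public class RetryingUploader {
        public static void upload(FileSystem fs, Path p, byte[] data,
                                  int attempts) throws IOException {
            IOException last = new IOException("no attempts made");
            for (int i = 0; i < attempts; i++) {
                try {
                    FSDataOutputStream out = fs.create(p, true); // overwrite
                    out.write(data);
                    out.close();
                    return;       // success: data is now visible to readers
                } catch (IOException e) {
                    last = e;     // temporary failure: try again
                }
            }
            throw last;           // surface the failure to the caller
        }
    }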
> > >
> > >
> > >
> >
> >
>
>
> --
> Pro Hadoop, a book to guide you from beginner to hadoop mastery,
> http://www.amazon.com/dp/1430219424?tag=jewlerymall
> www.prohadoopbook.com a community for Hadoop Professionals
>



-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
