hadoop-common-user mailing list archives

From Pallavi Palleti <pallavi.pall...@corp.aol.com>
Subject File is closed but data is not visible
Date Tue, 11 Aug 2009 07:48:51 GMT
Hi all,

We have an application that pulls logs from an external server (far from the Hadoop cluster)
into the Hadoop cluster. Sometimes we see a huge delay (an hour or more) before the data
actually becomes visible in HDFS, even though the external application has closed the file
and set the variable to null. I was under the impression that once I close the file, the data
is reflected in the Hadoop cluster. This makes write failures even harder to handle, because
the client gets the false impression that the data has been written to HDFS. Kindly
clarify whether my understanding is correct. If it is, could someone tell me what causes the
delay in showing the data? In those cases, how can we handle write failures (due to
temporary issues such as a data node being unavailable or a disk being full), given that we
have no way to detect the failure on the client side?

Thanks
Pallavi
