hadoop-hdfs-user mailing list archives

From David Rosenstrauch <dar...@darose.net>
Subject Re: HDFS File being written
Date Wed, 17 Aug 2011 18:17:30 GMT
Just my $0.02, but I think you really ought to push back and have 
whoever's creating the files upstream do it in one of the ways I 
described.  Relying on modification times will be far too error-prone.  
Think about it:  with your current setup, not only can't you reliably 
tell whether the creator has finished writing a file, you also can't 
tell, even once they have finished, whether the write completed 
successfully.  The creator could have aborted the write partway through, 
either deliberately or by accident, and you'd end up trying to process 
an incomplete file.

You really need to employ *some* method to reliably determine when a 
file is successfully uploaded, or you're going to wind up with a very 
buggy system.
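
For illustration, here is a minimal sketch of the write-to-.tmp-then-rename 
pattern.  It uses java.nio.file on a local filesystem purely as a stand-in 
so that it runs without a cluster; against HDFS the same shape would use 
FileSystem.create() for the temporary path and FileSystem.rename() to 
publish it.  The class name, method names, and the ".tmp" suffix constant 
are hypothetical, chosen only for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Local-filesystem stand-in for the pattern; not HDFS code.
final class TempThenRename {
    // Hypothetical suffix marking files that are still being written.
    static final String TMP_SUFFIX = ".tmp";

    /** Writer side: write under a temporary name, then rename to the final name. */
    static Path publish(Path dir, String finalName, byte[] data) throws IOException {
        Path tmp = dir.resolve(finalName + TMP_SUFFIX);
        Path dst = dir.resolve(finalName);
        // If this write fails or is aborted, only the .tmp file is left behind.
        Files.write(tmp, data);
        // Only a fully written file ever receives the final name.
        return Files.move(tmp, dst, StandardCopyOption.ATOMIC_MOVE);
    }

    /** Reader side: skip anything that still carries the temporary suffix. */
    static boolean isComplete(Path file) {
        return !file.getFileName().toString().endsWith(TMP_SUFFIX);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("ingest");
        Path out = publish(dir, "events.log", "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(out.getFileName() + " complete? " + isComplete(out));
    }
}
```

The point of the design is that the consumer never has to guess: a file 
with the final name was, by construction, written to the end.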


On 08/17/2011 01:41 PM, Adam Shook wrote:
> Sadly, I don't have control over naming the files.  They are being ingested in HDFS by
> powers out of my control.  I'll mess around with the modification times and see if I can get
> a good solution.  If anyone knows of a way that seems less hackish, I am all ears.
> Thanks, Adam
> -----Original Message-----
> From: David Rosenstrauch [mailto:darose@darose.net]
> Sent: Wednesday, August 17, 2011 1:22 PM
> To: hdfs-user@hadoop.apache.org
> Subject: Re: HDFS File being written
> On 08/17/2011 12:57 PM, Adam Shook wrote:
>> Hello All,
>> Is there any clean way to tell from the API (v0.20.2) that a file in HDFS is currently
>> being written to?  I've seen some exceptions before related to it, but I was hoping there
>> is a clean way and Google isn't turning anything up for me.
>> Thanks!
>> -- Adam
> You might be able to do it to some extent using
> FileStatus.getModificationTime()
> (http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileStatus.html#getModificationTime()),
> but this would really be a hack, IMO, and not something you should rely on.
> I think you'd be better off either a) writing the file to a temp
> directory, or b) writing it with a .tmp extension, and then moving or
> renaming it once the file write is complete.
> HTH,
> DR
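
The modification-time check mentioned in the quoted message could be 
sketched roughly as below.  This is only the heuristic itself: the 
timestamp would come from FileStatus.getModificationTime(), which returns 
milliseconds since the epoch, and the class name, method name, and 
five-minute quiet period are assumptions made up for this example.  It 
cannot tell a finished upload apart from a stalled or aborted one, which 
is why it should be treated as a hack.

```java
// Heuristic sketch only; names and thresholds are hypothetical.
final class QuietPeriodCheck {
    /**
     * Returns true if the file has not been modified for at least quietMillis.
     * A quiet file is not necessarily a complete file.
     */
    static boolean looksQuiet(long modificationTimeMillis, long nowMillis, long quietMillis) {
        return nowMillis - modificationTimeMillis >= quietMillis;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long quiet = 5 * 60 * 1000L; // assumed 5-minute quiet period
        // Last modified 10 minutes ago: passes the heuristic.
        System.out.println(looksQuiet(now - 10 * 60 * 1000L, now, quiet));
        // Last modified 1 second ago: treated as possibly still being written.
        System.out.println(looksQuiet(now - 1000L, now, quiet));
    }
}
```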
