hadoop-common-user mailing list archives

From Ronnie Dove <ron...@oceansync.com>
Subject Re: How to delete files older than X days in HDFS/Hadoop
Date Sun, 27 Nov 2011 04:26:25 GMT
Hello Raimon, 

I like the idea of being able to search through files on HDFS so that we can find keywords
or timestamp criteria, something that OceanSync will be doing in the future as a tool option.
The others have given you some great ideas, but I wanted to help you out from a Java API
perspective.  If you are a Java programmer, you would use FileSystem.listStatus(), which
returns the directory listing as a FileStatus[] array.  You would crawl through the FileStatus
array, checking whether each entry is a file or a directory.  If it is a file, you check its
timestamp using FileStatus.getModificationTime().  If it is a directory, you process it again
in the same way to check the contents of that directory.  This sounds tough, but in testing
it is fairly fast and accurate.  Below are the two APIs that are needed as part of this method:

    FileSystem.listStatus(Path)
    FileStatus.getModificationTime()
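
A minimal sketch of that walk, in case it is useful as a starting point (this assumes a
Hadoop client on the classpath; the DeleteOldFiles class name, the /datafolder path, and
the seven-day cutoff are my own illustrations, not part of the thread):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DeleteOldFiles {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Anything modified more than 7 days ago is eligible for deletion.
        long cutoff = System.currentTimeMillis() - 7L * 24 * 60 * 60 * 1000;
        deleteOlderThan(fs, new Path("/datafolder"), cutoff);
        fs.close();
    }

    // Walk the directory listing: recurse into directories, and delete
    // plain files whose modification time is before the cutoff.
    static void deleteOlderThan(FileSystem fs, Path dir, long cutoff) throws Exception {
        for (FileStatus status : fs.listStatus(dir)) {
            if (status.isDir()) {
                deleteOlderThan(fs, status.getPath(), cutoff);
            } else if (status.getModificationTime() < cutoff) {
                fs.delete(status.getPath(), false); // non-recursive delete
            }
        }
    }
}

Passing false to delete() keeps the call non-recursive, so the sketch only ever removes
individual files and never a whole directory tree by accident.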


Ronnie Dove
OceanSync Management Developer
"RDove" on irc.freenode.net #Hadoop

----- Original Message -----
From: Raimon Bosch <raimon.bosch@gmail.com>
To: common-user@hadoop.apache.org
Sent: Saturday, November 26, 2011 10:01 AM
Subject: How to delete files older than X days in HDFS/Hadoop


I'm wondering how to delete files older than X days with HDFS/Hadoop. On
Linux we can do it with the following command:

find ~/datafolder/* -mtime +7 -exec rm {} \;

Any ideas?
