hadoop-common-user mailing list archives

From Bill Graham <billgra...@gmail.com>
Subject Re: How to delete files older than X days in HDFS/Hadoop
Date Mon, 28 Nov 2011 04:37:01 GMT
If you're able to put your data in directories named by date (i.e.
yyyyMMdd), you can take advantage of the fact that the HDFS client will
return directories in sort order of the name, which returns the most recent
dirs last. You can then cron a bash script that deletes all but the last N
directories returned, where N is the number of days you want to keep.
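The keep-last-N idea above can be sketched as code. This is a minimal sketch in Java against a local directory standing in for the HDFS path (the class and method names are mine, purely illustrative); on a real cluster the listing would come from the HDFS client instead:

```java
import java.io.File;
import java.util.Arrays;

public class KeepLastN {
    // Delete all but the last N date-named (yyyyMMdd) subdirectories of base.
    // Lexicographic order equals chronological order for yyyyMMdd names,
    // which is the property the approach above relies on.
    static void keepLastN(File base, int n) {
        File[] dirs = base.listFiles(File::isDirectory);
        if (dirs == null) return;
        Arrays.sort(dirs);                         // oldest first, newest last
        for (int i = 0; i < dirs.length - n; i++) {
            deleteRecursively(dirs[i]);            // drop everything but the last n
        }
    }

    static void deleteRecursively(File f) {
        File[] children = f.listFiles();
        if (children != null) {
            for (File c : children) deleteRecursively(c);
        }
        f.delete();
    }
}
```

A cron'd shell script invoking the equivalent `hadoop fs` commands would follow the same sort-then-drop logic.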

On Sat, Nov 26, 2011 at 8:26 PM, Ronnie Dove <ronnie@oceansync.com> wrote:

> Hello Raimon,
> I like the idea of being able to search through files on HDFS so that we
> can find keywords or timestamp criteria, something that OceanSync will be
> doing in the future as a tool option.  The others have told you some great
> ideas, but I wanted to help you out from a Java API perspective.  If you are
> a Java programmer, you would use FileSystem.listStatus(), which returns
> the directory listing as a FileStatus[].  You would walk through the
> FileStatus array checking whether each FileStatus is a file or a
> directory.  If it is a file, you check its timestamp using
> FileStatus.getModificationTime().  If it is a directory, it is processed
> again recursively to check the contents of that directory.  This sounds
> tough, but in testing it is fairly fast and accurate.  Below are the two
> APIs needed for this method:
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileStatus.html
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html
> ________________________________
> Ronnie Dove
> OceanSync Management Developer
> http://www.oceansync.com
> "RDove" on irc.freenode.net #Hadoop
> ----- Original Message -----
> From: Raimon Bosch <raimon.bosch@gmail.com>
> To: common-user@hadoop.apache.org
> Cc:
> Sent: Saturday, November 26, 2011 10:01 AM
> Subject: How to delete files older than X days in HDFS/Hadoop
> Hi,
> I'm wondering how to delete files older than X days in HDFS/Hadoop. On
> Linux we can do it with the following command:
> find ~/datafolder/* -mtime +7 -exec rm {} \;
> Any ideas?
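The recursive timestamp walk Ronnie describes can be sketched as follows — here against the local filesystem rather than a live cluster, so java.io.File.lastModified() stands in for FileStatus.getModificationTime() and File.listFiles() for FileSystem.listStatus(); the class and method names are mine, for illustration only:

```java
import java.io.File;

public class DeleteOlderThan {
    // Walk dir recursively; delete any regular file whose modification time
    // is before cutoffMillis. On HDFS the same shape applies: recurse when
    // the FileStatus is a directory, otherwise compare getModificationTime()
    // against the cutoff and delete.
    static void deleteOlderThan(File dir, long cutoffMillis) {
        File[] entries = dir.listFiles();
        if (entries == null) return;               // not a directory, or I/O error
        for (File entry : entries) {
            if (entry.isDirectory()) {
                deleteOlderThan(entry, cutoffMillis);   // recurse into subdirectory
            } else if (entry.lastModified() < cutoffMillis) {
                entry.delete();                          // older than the cutoff
            }
        }
    }
}
```

For the "older than 7 days" case in the question, the cutoff would be `System.currentTimeMillis() - 7L * 24 * 60 * 60 * 1000`.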
