hadoop-common-user mailing list archives

From "Mich Talebzadeh" <m...@peridale.co.uk>
Subject Re: Identifying new files on HDFS
Date Wed, 25 Mar 2015 17:41:56 GMT

Have you considered taking a snapshot of the files at close of business, comparing it with
the next snapshot, and processing only the new ones? A simple shell script will do.
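
One way to sketch the comparison, shown here in Python rather than shell, and assuming that "snapshot" means a saved recursive listing of the directory (for example the output of `hdfs dfs -ls -R`). The function names and the sample path `/data/in` are hypothetical, and the parsing assumes the usual eight-column listing layout with the path in the last column:

```python
def parse_listing(text):
    """Return the set of regular-file paths found in `hdfs dfs -ls -R`-style output.

    Assumed layout per line (an assumption, not taken from the thread):
    permissions, replication, owner, group, size, date, time, path.
    """
    paths = set()
    for line in text.splitlines():
        # Split into at most 8 fields so a path containing spaces stays whole.
        fields = line.split(None, 7)
        # Skip summary lines ("Found N items") and directories ("d...").
        if len(fields) == 8 and fields[0].startswith("-"):
            paths.add(fields[7])
    return paths


def new_files(previous_listing, current_listing):
    """Return, sorted, the paths present in the current listing but not the previous one."""
    return sorted(parse_listing(current_listing) - parse_listing(previous_listing))


if __name__ == "__main__":
    import sys

    # Usage: python new_files.py yesterday.txt today.txt
    with open(sys.argv[1]) as prev, open(sys.argv[2]) as cur:
        for path in new_files(prev.read(), cur.read()):
            print(path)
```

The two input files could be produced by running something like `hdfs dfs -ls -R /data/in > today.txt` at close of business each day. Note that this only detects files whose paths are new; a file overwritten in place under the same name would not be reported.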


-----Original Message-----
From: Vijaya Narayana Reddy Bhoomi Reddy <vijaya.bhoomireddy@whishworks.com>
Date: Wed, 25 Mar 2015 09:55:57 
To: <user@hadoop.apache.org>
Reply-To: user@hadoop.apache.org
Subject: Identifying new files on HDFS


We have a requirement to process only new files in HDFS on a daily basis. I
am sure this is a common requirement in many ETL-style processing
scenarios. Is there a way to identify new files that have been
added to a path in HDFS? For example, assume some files have already been
present for some time, and I have added new files today. I want to
process only those new files. What is the best way to achieve this?

Thanks & Regards

*Vijay Bhoomireddy*, Big Data Architect


