hadoop-user mailing list archives

From Shahab Yunus <shahab.yu...@gmail.com>
Subject Re: File Reloading
Date Fri, 31 May 2013 15:30:04 GMT
I might not have understood your use case properly, so I apologize if this
misses the point.

But I think what you need here is something outside of Hadoop/HDFS. I am
presuming that your never-ending job needs to read the whole updated file
each time it processes it, right? You don't expect to read it piecemeal or
in chunks. If that is indeed the case, then your never-ending job can use
generic techniques to check whether the file's signature (a checksum, say)
or any other property (such as its modification time) has changed since
the last check, and process the file only if it has. Your file
writing/updating process can then update the file independently of the
reading/processing one.
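
To make the idea concrete, here is a minimal sketch of such a check,
written in Python against a local file rather than HDFS. The names
file_signature and ChangeDetector are made up for illustration; they are
not part of Hadoop. By default the signature is a SHA-256 digest of the
contents, with modification time plus size as a cheaper alternative. For a
file in HDFS, the equivalent properties should be available through the
Java API (FileStatus.getModificationTime() or FileSystem.getFileChecksum()),
but I have not wired that in here.

```python
import hashlib
import os


def file_signature(path, use_digest=True):
    """Return a value that differs when the file differs (hypothetical helper)."""
    if use_digest:
        # Content-based signature: unchanged if identical bytes are rewritten.
        digest = hashlib.sha256()
        with open(path, "rb") as handle:
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        return digest.hexdigest()
    # Cheaper metadata-based signature: modification time and size.
    stat = os.stat(path)
    return (stat.st_mtime_ns, stat.st_size)


class ChangeDetector:
    """Remembers the last signature seen for one file (hypothetical helper)."""

    def __init__(self, path, use_digest=True):
        self.path = path
        self.use_digest = use_digest
        self.last_signature = None  # nothing seen yet

    def has_changed(self):
        # True on the first call and whenever the signature differs
        # from the one recorded by the previous call.
        current = file_signature(self.path, self.use_digest)
        changed = current != self.last_signature
        self.last_signature = current
        return changed
```

The never-ending job would call has_changed() on each pass (or on a
timer) and re-read the file only when it returns True. Note that if the
writer updates the file in place, the reader may catch it half-written;
writing to a temporary name and renaming it into place is one common way
to avoid that.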


On Fri, May 31, 2013 at 11:23 AM, Adamantios Corais <
adamantios.corais@gmail.com> wrote:

> I am new to Hadoop, so I apologize beforehand for my very fundamental
> question.
> Let's assume that I have a file stored in Hadoop that gets updated
> once a day. Also assume that there is a task running at the back end of
> Hadoop that never stops. How could I reload this file so that Hadoop starts
> considering the updated values rather than the old ones?
