hadoop-mapreduce-user mailing list archives

From Adamantios Corais <adamantios.cor...@gmail.com>
Subject Re: File Reloading
Date Fri, 31 May 2013 15:51:02 GMT
@Raj: So updating the data and storing it at the same destination would
work?

@Shahab: The file is very small, so I expect to read it all at once. What
would you suggest?


On Fri, May 31, 2013 at 5:30 PM, Shahab Yunus <shahab.yunus@gmail.com> wrote:

> I might not have understood your use case properly, so I apologize for that.
>
> But I think what you need here is something outside of Hadoop/HDFS. I am
> presuming that you need to read the whole updated file when your
> never-ending job processes it, right? You don't expect to read it
> piecemeal or in chunks. If that is indeed the case, then your never-ending
> job can use generic techniques to check whether the file's signature or
> any other property has changed since the last time, and process the file
> only if it has. Your file-writing/updating process can update the file
> independently of the reading/processing one.
>
> Regards,
> Shahab
>
>
> On Fri, May 31, 2013 at 11:23 AM, Adamantios Corais <
> adamantios.corais@gmail.com> wrote:
>
>> I am new to Hadoop, so I apologize beforehand for my very fundamental
>> question.
>>
>> Let's assume that I have a file stored in Hadoop that gets updated once
>> a day. Also assume that there is a task running at the back end of
>> Hadoop that never stops. How could I reload this file so that Hadoop
>> starts using the updated values rather than the old ones?
>>
>
>
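
The change check Shahab describes could be sketched roughly as follows. This
is only an illustration, not code from the thread: the class name
ReloadingFile and its methods are made up, and it uses the local filesystem
through java.nio so that it runs on its own. Against HDFS, the same idea
would presumably use FileSystem.getFileStatus(path).getModificationTime()
and getLen() as the "signature". It assumes the file is small enough to read
in one go, as stated above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical helper: caches a small file and re-reads it only when its
 * modification time or size differs from what was seen at the last load.
 */
class ReloadingFile {
    private final Path path;
    private long lastModified = -1L; // modification time seen at the last load
    private long lastSize = -1L;     // file size seen at the last load
    private String contents;         // cached contents; null until first load
    private int reloads = 0;         // how many times the file was actually read

    public ReloadingFile(Path path) {
        this.path = path;
    }

    /** Returns the current contents, reading the file only if it has changed. */
    public synchronized String get() {
        try {
            long modified = Files.getLastModifiedTime(path).toMillis();
            long size = Files.size(path);
            if (contents == null || modified != lastModified || size != lastSize) {
                // Signature changed (or first call): read the whole file at once.
                contents = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
                lastModified = modified;
                lastSize = size;
                reloads++;
            }
            return contents;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Number of real reads so far; useful for checking that caching works. */
    public synchronized int reloadCount() {
        return reloads;
    }
}
```

The never-ending job would call get() each time it needs the values, so the
daily update is picked up on the next call without restarting anything. One
caveat with this sketch: if the writer updates the file in place, a reader
could see a half-written file, so it is probably safer for the writer to
write to a temporary name and then rename it over the old file.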
