Subject: Re: File Reloading
From: Adamantios Corais <adamantios.corais@gmail.com>
To: user@hadoop.apache.org
Date: Fri, 31 May 2013 17:51:02 +0200

@Raj: so, updating the data and storing them at the same destination would work?

@Shahab: the file is very small, and therefore I expect to read it all at once. What would you suggest?
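To make the @Raj point concrete: one way to "store at the same destination" is to write the new version to a staging path first and then swap it in, so the long-running reader never sees a half-written file. This is only a sketch of that common pattern with the plain FileSystem API, not something anyone in this thread has posted; the paths and the one-line payload are made-up placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplaceFile {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            Path target  = new Path("/data/lookup/latest.txt");      // hypothetical destination
            Path staging = new Path("/data/lookup/latest.txt.tmp");  // hypothetical staging path

            // Write the new version to the staging path first.
            FSDataOutputStream out = fs.create(staging, true /* overwrite */);
            out.writeBytes("key1\t42\nkey2\t17\n");                  // placeholder content
            out.close();

            // Swap it in. Plain FileSystem.rename() will not clobber an existing
            // destination on HDFS, so the old copy is deleted first; note there
            // is a brief window with no file at the target path.
            if (fs.exists(target)) {
                fs.delete(target, false);
            }
            fs.rename(staging, target);
        }
    }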
On Fri, May 31, 2013 at 5:30 PM, Shahab Yunus <shahab.yunus@gmail.com> wrote:

> I might not have understood your use case properly, so I apologize for that.
>
> But what I think you need here is something outside of Hadoop/HDFS. I am presuming that you need to read the whole updated file when you process it with your never-ending job, right? You don't expect to read it piecemeal or in chunks. If that is indeed the case, then your never-ending job can use generic techniques to check whether the file's signature or any other property has changed since the last time, and only process the file if it has changed. Your file-writing/updating process can update the file independently of the reading/processing one.
>
> Regards,
> Shahab
>
> On Fri, May 31, 2013 at 11:23 AM, Adamantios Corais <adamantios.corais@gmail.com> wrote:
>
>> I am new to Hadoop, so apologies beforehand for my very fundamental question.
>>
>> Let's assume that I have a file stored in Hadoop that gets updated once a day. Also assume that there is a task running on Hadoop in the background that never stops. How could I reload this file so that Hadoop starts considering the updated values rather than the old ones?
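To make Shahab's suggestion above concrete: the reading side could poll some cheap property of the file, such as its modification time, and re-read the whole thing only when that property changes. Again, this is just a rough sketch under the assumption that the file fits in memory; the path, the one-minute polling interval, and the idea of swapping in a fresh in-memory copy are illustrative assumptions, not anything prescribed by Hadoop:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WatchAndReload {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path file = new Path("/data/lookup/latest.txt");  // hypothetical path

            long lastSeen = -1L;
            List<String> current = new ArrayList<String>();

            while (true) {                                    // the "never-ending" part
                long mtime = fs.getFileStatus(file).getModificationTime();
                if (mtime != lastSeen) {
                    // The file changed since we last looked: re-read it in full
                    // (it is small) and swap in the fresh copy.
                    List<String> fresh = new ArrayList<String>();
                    FSDataInputStream in = fs.open(file);
                    BufferedReader reader = new BufferedReader(new InputStreamReader(in));
                    String line;
                    while ((line = reader.readLine()) != null) {
                        fresh.add(line);
                    }
                    reader.close();
                    current = fresh;
                    lastSeen = mtime;
                    System.out.println("Reloaded " + current.size() + " lines");
                }
                Thread.sleep(60 * 1000);                      // poll once a minute
            }
        }
    }

If the modification time is not trustworthy in your setup, comparing FileSystem.getFileChecksum() instead would be closer to the "signature" idea Shahab mentions, though it can return null on some filesystems.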