Subject: Re: File Reloading
From: Adamantios Corais <adamantios.corais@gmail.com>
To: user@hadoop.apache.org
Date: Fri, 31 May 2013 17:51:02 +0200

@Raj: so, updating the data and storing them at the same destination would work?

@Shahab: the file is very small, and therefore I expect to read it all at once. What would you suggest?
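To make the @Raj point concrete: one way to "store at the same destination" is to write the new version to a staging path first and then swap it in, so the long-running reader never sees a half-written file. This is only a sketch of that common pattern with the plain FileSystem API, not something anyone in this thread has posted; the paths and the one-line payload are made-up placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplaceFile {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            Path target  = new Path("/data/lookup/latest.txt");      // hypothetical destination
            Path staging = new Path("/data/lookup/latest.txt.tmp");  // hypothetical staging path

            // Write the new version to the staging path first.
            FSDataOutputStream out = fs.create(staging, true /* overwrite */);
            out.writeBytes("key1\t42\nkey2\t17\n");                  // placeholder content
            out.close();

            // Swap it in. Plain FileSystem.rename() will not clobber an existing
            // destination on HDFS, so the old copy is deleted first; note there
            // is a brief window with no file at the target path.
            if (fs.exists(target)) {
                fs.delete(target, false);
            }
            fs.rename(staging, target);
        }
    }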
On Fri, May 31, 2013 at 5:30 PM, Shahab Yunus <shahab.yunus@gmail.com> wrote:

> I might not have understood your use case properly, so I apologize for that.
>
> But what I think you need here is something outside of Hadoop/HDFS. I am presuming that you need to read the whole updated file when you process it with your never-ending job, right? You don't expect to read it piecemeal or in chunks. If that is indeed the case, then your never-ending job can use generic techniques to check whether the file's signature or any other property has changed since the last time, and only process the file if it has changed. Your file-writing/updating process can update the file independently of the reading/processing one.
>
> Regards,
> Shahab
>
> On Fri, May 31, 2013 at 11:23 AM, Adamantios Corais <adamantios.corais@gmail.com> wrote:
>
>> I am new to Hadoop, so apologies beforehand for my very fundamental question.
>>
>> Let's assume that I have a file stored in Hadoop that gets updated once a day. Also assume that there is a task running on Hadoop in the background that never stops. How could I reload this file so that Hadoop starts considering the updated values rather than the old ones?
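To make Shahab's suggestion above concrete: the reading side could poll some cheap property of the file, such as its modification time, and re-read the whole thing only when that property changes. Again, this is just a rough sketch under the assumption that the file fits in memory; the path, the one-minute polling interval, and the idea of swapping in a fresh in-memory copy are illustrative assumptions, not anything prescribed by Hadoop:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WatchAndReload {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path file = new Path("/data/lookup/latest.txt");  // hypothetical path

            long lastSeen = -1L;
            List<String> current = new ArrayList<String>();

            while (true) {                                    // the "never-ending" part
                long mtime = fs.getFileStatus(file).getModificationTime();
                if (mtime != lastSeen) {
                    // The file changed since we last looked: re-read it in full
                    // (it is small) and swap in the fresh copy.
                    List<String> fresh = new ArrayList<String>();
                    FSDataInputStream in = fs.open(file);
                    BufferedReader reader = new BufferedReader(new InputStreamReader(in));
                    String line;
                    while ((line = reader.readLine()) != null) {
                        fresh.add(line);
                    }
                    reader.close();
                    current = fresh;
                    lastSeen = mtime;
                    System.out.println("Reloaded " + current.size() + " lines");
                }
                Thread.sleep(60 * 1000);                      // poll once a minute
            }
        }
    }

If the modification time is not trustworthy in your setup, comparing FileSystem.getFileChecksum() instead would be closer to the "signature" idea Shahab mentions, though it can return null on some filesystems.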