From: David Ginzburg
To: mapreduce-user@hadoop.apache.org
Subject: Locks in M/R framework
Date: Mon, 13 Aug 2012 10:52:56 +0000
Hi,

I have an HDFS folder and an M/R job that periodically updates it by replacing the data with newly generated data.

I have a different M/R job that processes the data in the folder, either periodically or ad hoc.

Naturally, the second job sometimes fails when the data is replaced by newly generated data after its job plan, including the input paths, has already been submitted.

Is there an elegant solution?

My current thought is to query the JobTracker for running jobs and go over all the input files listed in each job's XML configuration, so that the swap blocks until the folder is no longer an input path of any currently running job.
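The blocking check described above can be sketched roughly as follows. This is a minimal, hypothetical helper (`SwapGuard`, `overlaps`, `inUse`, and `awaitFree` are illustrative names, not Hadoop APIs). It does not talk to the JobTracker itself: it assumes the caller supplies a snapshot mapping each running job ID to the comma-separated input path string from that job's configuration (in Hadoop 1.x this is assumed to be the `mapred.input.dir` property). It also assumes the target folder and the job input paths are written in the same form (both fully qualified, or both plain absolute paths), and it does not expand glob patterns.

```java
import java.util.Map;
import java.util.function.Supplier;

/**
 * Hypothetical helper: blocks a folder swap while any running job still
 * lists the folder (or something beneath it) as an input path.
 * It does not contact the JobTracker; the caller supplies a snapshot of
 * running jobs mapped to their comma-separated input path strings.
 */
final class SwapGuard {

    private SwapGuard() {}

    /** True if inputPath is targetDir itself or lies underneath it. */
    static boolean overlaps(String targetDir, String inputPath) {
        String target = stripTrailingSlash(targetDir);
        String input = stripTrailingSlash(inputPath);
        return input.equals(target) || input.startsWith(target + "/");
    }

    private static String stripTrailingSlash(String path) {
        return path.length() > 1 && path.endsWith("/")
                ? path.substring(0, path.length() - 1)
                : path;
    }

    /** True if any job in the snapshot reads from targetDir. */
    static boolean inUse(String targetDir, Map<String, String> inputDirsByJob) {
        for (String inputDirs : inputDirsByJob.values()) {
            // Naive split: assumes no path contains an (escaped) comma.
            for (String path : inputDirs.split(",")) {
                if (overlaps(targetDir, path.trim())) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Re-polls the snapshot every pollMillis until targetDir is unused. */
    static void awaitFree(String targetDir,
                          Supplier<Map<String, String>> runningJobInputs,
                          long pollMillis) throws InterruptedException {
        while (inUse(targetDir, runningJobInputs.get())) {
            Thread.sleep(pollMillis);
        }
    }
}
```

The update job would call something like `SwapGuard.awaitFree("/data/current", snapshotSupplier, 30_000)` right before replacing the folder, where `snapshotSupplier` is a hypothetical function that re-reads the running jobs' input paths on every call. Even so, this check-then-swap sequence leaves a small window: a job submitted after the last poll but before the swap is not protected.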
