From: Harsh J
Date: Mon, 16 Jun 2014 22:48:26 +0530
Subject: Re: Recover HDFS lease after crash
Reply-To: user@hadoop.apache.org

You are likely hitting this: https://issues.apache.org/jira/browse/HDFS-3848

On Mon, Jun 16, 2014 at 10:17 PM, Bogdan Raducanu wrote:
> Thanks. I tried to call recoverLease before doing fs.append. Now I'm getting
> only the AlreadyBeingCreatedException
> ("org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException):
> failed to create file /lease_fix for DFSClient_NONMAPREDUCE_394503315_1 for
> client 10.0.0.1 because current leaseholder is trying to recreate file.")
> once and then it seems to work.
>
> But it's curious why I'm getting that exception now. I traced it to this
> code, in FSNamesystem.java:
>
>     //
>     // We found the lease for this file. And surprisingly the original
>     // holder is trying to recreate this file. This should never occur.
>     //
>     if (!force && lease != null) {
>       Lease leaseFile = leaseManager.getLeaseByPath(src);
>       if ((leaseFile != null && leaseFile.equals(lease)) ||
>           lease.getHolder().equals(holder)) {
>         throw new AlreadyBeingCreatedException(
>           "failed to create file " + src + " for " + holder +
>           " for client " + clientMachine +
>           " because current leaseholder is trying to recreate file.");
>       }
>     }
>
> It seems to me that that exception will always be thrown because
> lease.getHolder().equals(holder) is always true. It should've been
> leaseFile.getHolder().equals(holder) perhaps.
>
>
> On Mon, Jun 16, 2014 at 5:47 PM, Ted Yu wrote:
>>
>> Please take a look at the following method in DFSClient:
>>
>>   /**
>>    * Recover a file's lease
>>    * @param src a file's path
>>    * @return true if the file is already closed
>>    * @throws IOException
>>    */
>>   boolean recoverLease(String src) throws IOException {
>>
>> Cheers
>>
>>
>> On Mon, Jun 16, 2014 at 8:26 AM, Anonymous wrote:
>>>
>>> Hello,
>>>
>>> I have a long running application that opens a file and periodically
>>> appends to it. If this application is killed and then restarted it cannot
>>> open the same file again for some time (~1 minute). First, it gets the
>>> AlreadyBeingCreated exception (which I guess means namenode doesn't yet know
>>> the program crashed) and then the RecoveryInProgress exception (which I
>>> guess means the namenode proceeded to close and release the file after
>>> inactivity). After about 1 minute it starts to work again.
>>>
>>> What is the correct way to recover from this? Is there API for recovering
>>> the lease and resuming appending faster? DFSClient sets a randomized client
>>> name. If it were to send the same client name as before the crash, would it
>>> receive a lease on the file faster?
>>>
>>> Thanks
>>
>>
>

--
Harsh J
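
For anyone finding this thread later, here is a minimal sketch of the "call recoverLease, then append" approach discussed above. It is an illustration under assumptions, not code from Hadoop. The LeaseRecoverer interface, the waitForLeaseRecovery helper, and the attempt and sleep values are all hypothetical names and numbers made up for this example. The only thing taken from the thread is the contract Ted quoted: recoverLease(src) returns true once the file is closed. The idea is to keep polling until it returns true and only then reopen the file for append.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

public class LeaseRecoveryRetry {

    /**
     * Hypothetical stand-in for a lease-recovery call such as the
     * DFSClient.recoverLease(String) method quoted in the thread.
     * Returns true once the file is closed.
     */
    @FunctionalInterface
    public interface LeaseRecoverer {
        boolean recoverLease(String src) throws IOException;
    }

    /**
     * Polls recoverLease until it reports the file as closed.
     * Returns the number of attempts used, or throws IOException
     * if the file is still open after maxAttempts calls.
     */
    public static int waitForLeaseRecovery(LeaseRecoverer recoverer, String src,
                                           int maxAttempts, long sleepMillis)
            throws IOException, InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (recoverer.recoverLease(src)) {
                return attempt; // file is closed; appending can be retried now
            }
            if (attempt < maxAttempts) {
                Thread.sleep(sleepMillis); // give the namenode time to finish recovery
            }
        }
        throw new IOException("lease on " + src + " not recovered after "
                + maxAttempts + " attempts");
    }

    public static void main(String[] args) throws Exception {
        // Fake recoverer for demonstration: reports "closed" on the third call.
        AtomicInteger calls = new AtomicInteger();
        LeaseRecoverer fake = src -> calls.incrementAndGet() >= 3;

        int attempts = waitForLeaseRecovery(fake, "/lease_fix", 10, 10L);
        System.out.println("file closed after " + attempts + " attempts");
        // In a real client, the append would be attempted only at this point.
    }
}
```

In a real application the lambda would delegate to whichever lease-recovery call your Hadoop version exposes, and the append would only be attempted after the helper returns. Whether this also avoids the one-off AlreadyBeingCreatedException described above depends on the issue Harsh linked, so treat the retry loop as a workaround sketch rather than a guaranteed fix.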