Subject: Re: Question about dead datanode
From: Jack Levin
To: user@hbase.apache.org
Cc: hbase-user
Date: Thu, 13 Feb 2014 13:53:28 -0800

This might be related:
http://hadoop.6.n7.nabble.com/Question-on-opening-file-info-from-namenode-in-DFSClient-td6679.html

> In hbase, we open the file once and keep it open. File is shared
> amongst all clients.
> Does it mean it's perma-cached if the datanode is dead?

-Jack
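[Editor's note: to make the caching behavior under discussion concrete, below is a minimal, self-contained sketch. It is NOT the actual DFSClient code; all class, method, and field names are illustrative. It only shows the idea of fetching block locations once when a file is opened and always answering from that cache, so a datanode that later dies keeps being handed out.]

import java.util.ArrayList;
import java.util.List;

public class StaleBlockLocationSketch {

    /** One cached block: its starting offset, length, and the datanode holding it. */
    static class CachedBlock {
        final long offset;
        final long length;
        final String datanode; // e.g. "10.101.5.5:50010"

        CachedBlock(long offset, long length, String datanode) {
            this.offset = offset;
            this.length = length;
            this.datanode = datanode;
        }
    }

    /** Block locations fetched from the "namenode" once, at open time, and never refreshed. */
    private final List<CachedBlock> cachedBlocks = new ArrayList<>();

    void open() {
        // Imagine this list came from the namenode when the file was opened.
        cachedBlocks.add(new CachedBlock(0L, 64_000_000L, "10.101.5.5:50010"));
        cachedBlocks.add(new CachedBlock(64_000_000L, 64_000_000L, "10.101.5.6:50010"));
    }

    /** Mirrors the "search cached blocks first" idea: no freshness or liveness check at all. */
    String datanodeFor(long offset) {
        for (CachedBlock b : cachedBlocks) {
            if (offset >= b.offset && offset < b.offset + b.length) {
                return b.datanode; // may be a datanode that died days ago
            }
        }
        return null; // the "block is not cached" case
    }

    public static void main(String[] args) {
        StaleBlockLocationSketch file = new StaleBlockLocationSketch();
        file.open();
        // Even if 10.101.5.5 went down on Sunday, the cache still hands it out:
        System.out.println("read at offset 42 goes to " + file.datanodeFor(42L));
    }
}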
On Thu, Feb 13, 2014 at 1:41 PM, Jack Levin wrote:

> As far as I can tell I am hitting this issue:
>
> http://grepcode.com/search/usages?type=method&id=repository.cloudera.com%24content%24repositories%24releases@com.cloudera.hadoop%24hadoop-core@0.20.2-320@org%24apache%24hadoop%24hdfs%24protocol@LocatedBlocks@findBlock%28long%29&k=u
>
>     // search cached blocks first
>     int targetBlockIdx = locatedBlocks.findBlock(offset);
>     if (targetBlockIdx < 0) { // block is not cached
>
> Our RS DFSClient is asking for a block on a dead datanode because the
> block is somehow cached in the DFSClient. It seems that after a DN dies,
> DFSClients in HBase 0.90.5 do not drop the cached references to those
> blocks. That seems like a problem. It would be good if there were a way
> for that cache to expire, because our dead DN has been down since Sunday.
>
> -Jack
>
> On Thu, Feb 13, 2014 at 11:23 AM, Stack wrote:
>
>> RS opens files and then keeps them open as long as the RS is alive. We're
>> failing the read of this replica and then succeeding in getting the block
>> elsewhere? Do you get that exception every time? What hadoop version, Jack?
>> Do you have short-circuit reads on?
>> St.Ack
>>
>> On Thu, Feb 13, 2014 at 10:41 AM, Jack Levin wrote:
>>
>>> I meant it's in the 'dead' list on the HDFS namenode page. Hadoop fsck /
>>> shows no issues.
>>>
>>> On Thu, Feb 13, 2014 at 10:38 AM, Jack Levin wrote:
>>>
>>>> Good morning --
>>>> I had a question: we have had a datanode go down, and it has been down
>>>> for a few days, yet HBase is still trying to talk to that dead datanode:
>>>>
>>>> 2014-02-13 08:57:23,073 WARN org.apache.hadoop.hdfs.DFSClient: Failed to
>>>> connect to /10.101.5.5:50010 for file
>>>> /hbase/img39/6388c3574c32c409e8387d3c4d10fcdb/att/2690638688138250544 for
>>>> block 805865
>>>>
>>>> So the question is: how come the RS is trying to talk to a dead datanode
>>>> that is even marked dead in HDFS?
>>>>
>>>> Isn't the RS just an HDFS client? It should not talk to an offlined HDFS
>>>> datanode that went down. This caused a lot of issues in our cluster.
>>>>
>>>> Thanks,
>>>> -Jack
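[Editor's note: the "ability for that cache to expire" Jack asks for could look something like the sketch below on the client side. This is a purely hypothetical illustration against made-up types; it does not describe any existing behavior or API of the 0.90.5-era HBase/HDFS code discussed in this thread.]

import java.util.concurrent.ConcurrentHashMap;

public class ExpiringLocationCache {

    /** A cached datanode location plus the time it was fetched from the namenode. */
    static class Entry {
        final String datanode;
        final long fetchedAtMillis;

        Entry(String datanode, long fetchedAtMillis) {
            this.datanode = datanode;
            this.fetchedAtMillis = fetchedAtMillis;
        }
    }

    private final ConcurrentHashMap<Long, Entry> byBlockId = new ConcurrentHashMap<>();
    private final long ttlMillis;

    ExpiringLocationCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Returns a cached datanode only if the entry is still fresh; otherwise null so the caller re-fetches. */
    String freshLocation(long blockId, long nowMillis) {
        Entry e = byBlockId.get(blockId);
        if (e == null || nowMillis - e.fetchedAtMillis > ttlMillis) {
            byBlockId.remove(blockId); // stale: force a fresh namenode lookup
            return null;
        }
        return e.datanode;
    }

    /** Called after a failed connect (like the WARN above): drop the bad entry immediately. */
    void invalidate(long blockId) {
        byBlockId.remove(blockId);
    }

    void put(long blockId, String datanode, long nowMillis) {
        byBlockId.put(blockId, new Entry(datanode, nowMillis));
    }
}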