Subject: Re: Question about dead datanode
From: Jack Levin
To: user@hbase.apache.org
Cc: hbase-user
Date: Thu, 13 Feb 2014 13:53:28 -0800

This might be related:
http://hadoop.6.n7.nabble.com/Question-on-opening-file-info-from-namenode-in-DFSClient-td6679.html

> In hbase, we open the file once and keep it open. File is shared
> amongst all clients.
> Does it mean it's perma-cached if the datanode is dead?

-Jack
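[Editor's note: to make the caching behavior under discussion concrete, below is a minimal, self-contained sketch. It is NOT the actual DFSClient code; all class, method, and field names are illustrative. It only shows the idea of fetching block locations once when a file is opened and always answering from that cache, so a datanode that later dies keeps being handed out.]

import java.util.ArrayList;
import java.util.List;

public class StaleBlockLocationSketch {

    /** One cached block: its starting offset, length, and the datanode holding it. */
    static class CachedBlock {
        final long offset;
        final long length;
        final String datanode; // e.g. "10.101.5.5:50010"

        CachedBlock(long offset, long length, String datanode) {
            this.offset = offset;
            this.length = length;
            this.datanode = datanode;
        }
    }

    /** Block locations fetched from the "namenode" once, at open time, and never refreshed. */
    private final List<CachedBlock> cachedBlocks = new ArrayList<>();

    void open() {
        // Imagine this list came from the namenode when the file was opened.
        cachedBlocks.add(new CachedBlock(0L, 64_000_000L, "10.101.5.5:50010"));
        cachedBlocks.add(new CachedBlock(64_000_000L, 64_000_000L, "10.101.5.6:50010"));
    }

    /** Mirrors the "search cached blocks first" idea: no freshness or liveness check at all. */
    String datanodeFor(long offset) {
        for (CachedBlock b : cachedBlocks) {
            if (offset >= b.offset && offset < b.offset + b.length) {
                return b.datanode; // may be a datanode that died days ago
            }
        }
        return null; // the "block is not cached" case
    }

    public static void main(String[] args) {
        StaleBlockLocationSketch file = new StaleBlockLocationSketch();
        file.open();
        // Even if 10.101.5.5 went down on Sunday, the cache still hands it out:
        System.out.println("read at offset 42 goes to " + file.datanodeFor(42L));
    }
}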
On Thu, Feb 13, 2014 at 1:41 PM, Jack Levin wrote:

> As far as I can tell I am hitting this issue:
>
> http://grepcode.com/search/usages?type=method&id=repository.cloudera.com%24content%24repositories%24releases@com.cloudera.hadoop%24hadoop-core@0.20.2-320@org%24apache%24hadoop%24hdfs%24protocol@LocatedBlocks@findBlock%28long%29&k=u
>
>     // search cached blocks first
>     int targetBlockIdx = locatedBlocks.findBlock(offset);
>     if (targetBlockIdx < 0) { // block is not cached
>
> Our RS DFSClient is asking for a block on a dead datanode because the
> block is somehow cached in the DFSClient. It seems that after a DN dies,
> DFSClients in HBase 0.90.5 do not drop the cached references to those
> blocks. That seems like a problem. It would be good if there were a way
> for that cache to expire, because our dead DN has been down since Sunday.
>
> -Jack
>
> On Thu, Feb 13, 2014 at 11:23 AM, Stack wrote:
>
>> RS opens files and then keeps them open as long as the RS is alive. We're
>> failing the read of this replica and then succeeding in getting the block
>> elsewhere? Do you get that exception every time? What hadoop version, Jack?
>> Do you have short-circuit reads on?
>> St.Ack
>>
>> On Thu, Feb 13, 2014 at 10:41 AM, Jack Levin wrote:
>>
>>> I meant it's in the 'dead' list on the HDFS namenode page. Hadoop fsck /
>>> shows no issues.
>>>
>>> On Thu, Feb 13, 2014 at 10:38 AM, Jack Levin wrote:
>>>
>>>> Good morning --
>>>> I had a question: we have had a datanode go down, and it has been down
>>>> for a few days, yet HBase is still trying to talk to that dead datanode:
>>>>
>>>> 2014-02-13 08:57:23,073 WARN org.apache.hadoop.hdfs.DFSClient: Failed to
>>>> connect to /10.101.5.5:50010 for file
>>>> /hbase/img39/6388c3574c32c409e8387d3c4d10fcdb/att/2690638688138250544 for
>>>> block 805865
>>>>
>>>> So the question is: how come the RS is trying to talk to a dead datanode
>>>> that is even marked dead in HDFS?
>>>>
>>>> Isn't the RS just an HDFS client? It should not talk to an offlined HDFS
>>>> datanode that went down. This caused a lot of issues in our cluster.
>>>>
>>>> Thanks,
>>>> -Jack
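[Editor's note: the "ability for that cache to expire" Jack asks for could look something like the sketch below on the client side. This is a purely hypothetical illustration against made-up types; it does not describe any existing behavior or API of the 0.90.5-era HBase/HDFS code discussed in this thread.]

import java.util.concurrent.ConcurrentHashMap;

public class ExpiringLocationCache {

    /** A cached datanode location plus the time it was fetched from the namenode. */
    static class Entry {
        final String datanode;
        final long fetchedAtMillis;

        Entry(String datanode, long fetchedAtMillis) {
            this.datanode = datanode;
            this.fetchedAtMillis = fetchedAtMillis;
        }
    }

    private final ConcurrentHashMap<Long, Entry> byBlockId = new ConcurrentHashMap<>();
    private final long ttlMillis;

    ExpiringLocationCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Returns a cached datanode only if the entry is still fresh; otherwise null so the caller re-fetches. */
    String freshLocation(long blockId, long nowMillis) {
        Entry e = byBlockId.get(blockId);
        if (e == null || nowMillis - e.fetchedAtMillis > ttlMillis) {
            byBlockId.remove(blockId); // stale: force a fresh namenode lookup
            return null;
        }
        return e.datanode;
    }

    /** Called after a failed connect (like the WARN above): drop the bad entry immediately. */
    void invalidate(long blockId) {
        byBlockId.remove(blockId);
    }

    void put(long blockId, String datanode, long nowMillis) {
        byBlockId.put(blockId, new Entry(datanode, nowMillis));
    }
}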