hbase-user mailing list archives

From Jack Levin <magn...@gmail.com>
Subject Re: Question about dead datanode
Date Thu, 13 Feb 2014 21:41:11 GMT
As far as I can tell I am hitting this issue:

http://grepcode.com/search/usages?type=method&id=repository.cloudera.com%24content%24repositories%24releases@com.cloudera.hadoop%24hadoop-core@0.20.2-320@org%24apache%24hadoop%24hdfs%24protocol@LocatedBlocks@findBlock%28long%29&k=u


DFSClient.java, lines 1581-1583
(http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/com.cloudera.hadoop/hadoop-core/0.20.2-320/org/apache/hadoop/hdfs/DFSClient.java#1581):

    // search cached blocks first
    int targetBlockIdx = locatedBlocks.findBlock(offset);
    if (targetBlockIdx < 0) { // block is not cached
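
For reference, LocatedBlocks.findBlock is essentially a binary search by
file offset over the client-side list of cached block locations. A minimal
sketch of the idea (the class and field names below are simplified
stand-ins, not the exact Hadoop 0.20 source):

    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;

    class BlockLookupSketch {
        // Simplified stand-in for o.a.h.hdfs.protocol.LocatedBlock.
        static class Block {
            final long start;  // file offset where this block begins
            final long len;    // block length in bytes
            Block(long start, long len) { this.start = start; this.len = len; }
        }

        // Index of the cached block containing 'offset', or a negative
        // insertion point when no cached block covers it -- the
        // "targetBlockIdx < 0  // block is not cached" branch above.
        static int findBlock(List<Block> sortedBlocks, long offset) {
            Block key = new Block(offset, 1);
            Comparator<Block> byRange = (a, b) -> {
                if (a.start + a.len <= b.start) return -1; // a entirely before b
                if (b.start + b.len <= a.start) return 1;  // a entirely after b
                return 0;  // ranges overlap: the offset falls inside this block
            };
            return Collections.binarySearch(sortedBlocks, key, byRange);
        }
    }

Note that the search only consults the in-memory list, so nothing in this
path ever notices that a replica's datanode has died.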


Our RS DFSClient is asking for a block on a dead datanode because the
block's location is still cached in the DFSClient. It seems that after a DN
dies, DFSClients in HBase 0.90.5 do not drop the cached references to the
blocks it held. That looks like a problem: it would be good if that cache
could expire, because our dead DN has been down since Sunday.
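
To make the ask concrete, here is a hedged sketch of that kind of expiry: a
wrapper that goes back to the NameNode when a lookup misses or the cached
locations are older than a TTL. Only getBlockLocations, findBlock, and get
mirror real ClientProtocol/LocatedBlocks calls; the class, field names, and
TTL value are hypothetical:

    import java.io.IOException;
    import org.apache.hadoop.hdfs.protocol.ClientProtocol;
    import org.apache.hadoop.hdfs.protocol.LocatedBlock;
    import org.apache.hadoop.hdfs.protocol.LocatedBlocks;

    class ExpiringBlockLocations {
        private static final long CACHE_TTL_MS = 60_000L; // hypothetical window

        private final ClientProtocol namenode;
        private final String src;        // file path being read
        private final long prefetchSize; // how many bytes of locations to fetch
        private LocatedBlocks cached;
        private long lastRefreshMs;

        ExpiringBlockLocations(ClientProtocol namenode, String src,
                long prefetchSize) throws IOException {
            this.namenode = namenode;
            this.src = src;
            this.prefetchSize = prefetchSize;
            refresh(0);
        }

        private void refresh(long offset) throws IOException {
            // Ask the NameNode again; it knows which replicas died with the DN.
            cached = namenode.getBlockLocations(src, offset, prefetchSize);
            lastRefreshMs = System.currentTimeMillis();
        }

        LocatedBlock blockAt(long offset) throws IOException {
            int idx = cached.findBlock(offset);
            boolean stale =
                System.currentTimeMillis() - lastRefreshMs > CACHE_TTL_MS;
            if (idx < 0 || stale) {  // not cached, or the cache is too old
                refresh(offset);
                idx = cached.findBlock(offset);
            }
            return cached.get(idx);
        }
    }

Even a coarse TTL like this would have bounded the damage here, since the
NameNode marked the DN dead days ago.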


-Jack




On Thu, Feb 13, 2014 at 11:23 AM, Stack <stack@duboce.net> wrote:

> RS opens files and then keeps them open as long as the RS is alive.  We're
> failing the read of this replica and then succeeding in getting the block
> elsewhere?  Do you get that exception every time?  What Hadoop version, Jack?
> You have short-circuit reads on?
> St.Ack
>
>
> On Thu, Feb 13, 2014 at 10:41 AM, Jack Levin <magnito@gmail.com> wrote:
>
> > I meant it's in the 'dead' list on the HDFS namenode page. Hadoop fsck /
> > shows no issues.
> >
> >
> > On Thu, Feb 13, 2014 at 10:38 AM, Jack Levin <magnito@gmail.com> wrote:
> >
> > >  Good morning --
> > > I had a question: we have had a datanode go down, and it's been down for
> > > a few days, yet HBase is still trying to talk to that dead datanode:
> > >
> > > 2014-02-13 08:57:23,073 WARN org.apache.hadoop.hdfs.DFSClient: Failed
> > > to connect to /10.101.5.5:50010 for file
> > > /hbase/img39/6388c3574c32c409e8387d3c4d10fcdb/att/2690638688138250544
> > > for block 805865
> > >
> > > So the question is: how come the RS is trying to talk to the dead
> > > datanode? It's even in the HDFS list.
> > >
> > > Isn't the RS just an HDFS client? Shouldn't it stop talking to an HDFS
> > > datanode that has gone down? This caused a lot of issues in our cluster.
> > >
> > > Thanks,
> > > -Jack
> > >
> >
>
