hbase-user mailing list archives

From Stanley Xu <wenhao...@gmail.com>
Subject Re: Error of "Got error in response to OP_READ_BLOCK for file"
Date Tue, 10 May 2011 12:44:26 GMT
Thanks J-D. I'm a little more confused now: it seems that when we have a
corrupt HBase table or some inconsistent data, we get lots of messages
like that. But even when the HBase table is healthy, we still see a few
lines of the same messages.

How can I tell whether they come from actual data corruption or just
from a transient miss in the scenario you mentioned?
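For instance, would checking whether the store file named in the error
still exists in HDFS tell the two cases apart? Something like the sketch
below is what I have in mind (the path is copied from our logs; if the
file was simply compacted away I would expect it to be gone, while a
genuinely corrupt file should still be there but unreadable):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CheckStoreFile {
        public static void main(String[] args) throws Exception {
            // Store file path taken from the "Cannot open filename" error.
            Path storeFile =
                new Path("/hbase/users/73382377/data/312780071564432169");

            // Picks up fs.default.name from the Hadoop config on the classpath.
            FileSystem fs = FileSystem.get(new Configuration());

            if (fs.exists(storeFile)) {
                // Still present but unreadable: suspect real corruption,
                // worth running "hadoop fsck" against the file.
                System.out.println("still exists, length="
                    + fs.getFileStatus(storeFile).getLen());
            } else {
                // Gone: most likely a stale reference left behind by a
                // compaction, i.e. the transient scenario, not corruption.
                System.out.println("no longer exists (probably compacted)");
            }
        }
    }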



On Tue, May 10, 2011 at 6:23 AM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:

> Very often the "cannot open filename" happens when the region in
> question was reopened somewhere else and that region was compacted. As
> to why it was reassigned, most of the time it's because of garbage
> collections taking too long. The master log should have all the
> required evidence, and the region server should print some "slept for
> Xms" (where X is some number of ms) messages before everything goes
> bad.
>
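> For example, a quick way to spot those pauses (just a sketch; the
> exact message text and log path depend on your setup):
>
>     import java.io.BufferedReader;
>     import java.io.FileReader;
>     import java.util.regex.Matcher;
>     import java.util.regex.Pattern;
>
>     public class FindLongPauses {
>         public static void main(String[] args) throws Exception {
>             // Flag sleeps longer than 10s in a region server log;
>             // long pauses right before the errors usually point at GC.
>             Pattern slept = Pattern.compile("slept for (\\d+)ms");
>             BufferedReader in = new BufferedReader(
>                 new FileReader("/var/log/hbase/regionserver.log"));
>             for (String line; (line = in.readLine()) != null; ) {
>                 Matcher m = slept.matcher(line);
>                 if (m.find() && Long.parseLong(m.group(1)) > 10000) {
>                     System.out.println(line);
>                 }
>             }
>             in.close();
>         }
>     }
>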
> Here are some general tips on debugging problems in HBase:
> http://hbase.apache.org/book/trouble.html
>
> J-D
>
> On Sat, May 7, 2011 at 2:10 AM, Stanley Xu <wenhao.xu@gmail.com> wrote:
> > Dear all,
> >
> > We have been using HBase 0.20.6 in our environment. It was pretty
> > stable for the last couple of months, but we have run into some
> > reliability issues since last week. Our situation is very similar to
> > the one in the following thread:
> >
> > http://search-hadoop.com/m/UJW6Efw4UW/Got+error+in+response+to+OP_READ_BLOCK+for+file&subj=HBase+fail+over+reliability+issues
> >
> > When we use an HBase client to connect to the table, it appears to
> > get stuck, and we can find log entries like
> >
> > WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to
> > /10.24.166.74:50010 for file /hbase/users/73382377/data/312780071564432169
> > for block -4841840178880951849: java.io.IOException: Got error in response
> > to OP_READ_BLOCK for file /hbase/users/73382377/data/312780071564432169
> > for block -4841840178880951849
> >
> > INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 40 on 60020,
> > call get([B@25f907b4, row=963aba6c5f351f5655abdc9db82a4cbd, maxVersions=1,
> > timeRange=[0,9223372036854775807), families={(family=data, columns=ALL})
> > from 10.24.117.100:2365: error: java.io.IOException: Cannot open filename
> > /hbase/users/73382377/data/312780071564432169
> > java.io.IOException: Cannot open filename
> > /hbase/users/73382377/data/312780071564432169
> >
> >
> > WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
> > 10.24.166.74:50010, storageID=DS-14401423-10.24.166.74-50010-1270741415211,
> > infoPort=50075, ipcPort=50020):
> > Got exception while serving blk_-4841840178880951849_50277 to /10.25.119.113:
> > java.io.IOException: Block blk_-4841840178880951849_50277 is not valid.
> >
> > on the server side.
> >
> > And if we do a flush and then a major compaction on .META., the
> > problem goes away, but it reappears some time later.
> >
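> > Concretely, the workaround we apply is the equivalent of this (a rough
> > sketch against the 0.20 client API, the same as running flush '.META.'
> > and major_compact '.META.' in the hbase shell):
> >
> >     import org.apache.hadoop.hbase.HBaseConfiguration;
> >     import org.apache.hadoop.hbase.client.HBaseAdmin;
> >
> >     public class CompactMeta {
> >         public static void main(String[] args) throws Exception {
> >             HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());
> >             admin.flush(".META.");        // asynchronous flush request
> >             admin.majorCompact(".META."); // asynchronous major compaction
> >         }
> >     }
> >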
> > At first we guessed it might be an xceiver problem, so we raised the
> > xceiver limit to 4096, as described in the link here:
> > http://ccgtech.blogspot.com/2010/02/hadoop-hdfs-deceived-by-xciever.html
> >
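> > (That meant adding the following to hdfs-site.xml on every datanode
> > and restarting them; note the property name really is spelled
> > "xcievers":)
> >
> >     <property>
> >       <name>dfs.datanode.max.xcievers</name>
> >       <value>4096</value>
> >     </property>
> >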
> > But we still get the same problem. It looks like a restart of the
> > whole HBase cluster fixes the problem for a while, but we cannot keep
> > restarting the servers forever.
> >
> > I am waiting online and will really appreciate any reply.
> >
> >
> > Best wishes,
> > Stanley Xu
> >
>
