hbase-user mailing list archives

From Bi,hongyu—mike <boyl...@gmail.com>
Subject Re: client call scan on some region hang
Date Wed, 04 Feb 2015 08:00:36 GMT
Thanks Ted :)

2015-02-04 11:19 GMT+08:00 Ted Yu <yuzhihong@gmail.com>:

> Will integrate patch once QA run finishes.
>
> Thanks
>
> On Tue, Feb 3, 2015 at 5:40 PM, Bi,hongyu—mike <boylook@gmail.com> wrote:
>
> > Hi Ted,
> > Sorry for the late response.
> > I just filed a JIRA for this issue: https://issues.apache.org/jira/browse/HBASE-12957
> > Thanks
> >
> > 2015-01-07 23:40 GMT+08:00 Ted Yu <yuzhihong@gmail.com>:
> >
> > > In 0.98, HRegionServer is annotated with @InterfaceAudience.Private
> > > In 1.0+, it is annotated with @InterfaceAudience.LimitedPrivate(HBaseInterfaceAudience.TOOLS)
> > >
> > > FYI
> > >
> > > On Tue, Jan 6, 2015 at 7:55 PM, Bi,hongyu—mike <boylook@gmail.com> wrote:
> > >
> > > > Thanks Ted, I didn't notice that ;P
> > > >
> > > > 2015-01-07 11:47 GMT+08:00 Ted Yu <yuzhihong@gmail.com>:
> > > >
> > > > > In the master and branch-1 branches, there is no 'GetResponse get()' method in HRegionServer anymore.
> > > > >
> > > > > FYI
> > > > >
> > > > > On Tue, Jan 6, 2015 at 7:26 PM, Bi,hongyu—mike <boylook@gmail.com> wrote:
> > > > >
> > > > > > Hi Ted,
> > > > > >
> > > > > > KeyOnlyFilter may improve the scan speed, but I don't think the scan would finish within leaseTimeout in this case.
> > > > > > From HRegionServer#get I see:
> > > > > >
> > > > > > HRegion region = getRegion(regionName);
> > > > > >
> > > > > > Here getRegion may throw NotServingRegionException, which is exactly what isSuccessfulScan needs to detect,
> > > > > > and HRegionServer#get can return as soon as possible.
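The argument above (a KeyOnlyFilter makes each returned cell smaller but does not shorten the walk) can be illustrated with a toy model. This is a sketch under loud assumptions, not HBase code: the region is a plain sorted Ruby array, `Cell` and `probe_first_live` are hypothetical names, and it assumes, as the thread describes, that expired cells are skipped on the server before any filter sees them.

```ruby
# Toy model only: a "region" is an array of cells sorted by row key.
# Cell and probe_first_live are hypothetical stand-ins, not HBase APIs.
Cell = Struct.new(:row, :ts, :value)

# Walk the region from its first cell until one live (non-expired) cell turns up,
# roughly what a batch=1 / caching=1 scan must do before it can return anything.
# Returns [cells_examined, bytes_returned].
def probe_first_live(cells, now:, ttl:, key_only:)
  examined = 0
  cells.each do |cell|
    examined += 1
    # Assumption: expired cells are dropped before any filter runs.
    next if now - cell.ts > ttl
    # A KeyOnlyFilter-like step keeps the key and drops the value.
    bytes = cell.row.bytesize
    bytes += cell.value.bytesize unless key_only
    return [examined, bytes]
  end
  [examined, 0]
end

now = 1_000
ttl = 100
# 50,000 expired cells (ts = 0), then one live cell at the end of the region.
region = (0...50_000).map { |i| Cell.new(format("row%06d", i), 0, "x" * 64) }
region << Cell.new("row999999", now, "x" * 64)

full = probe_first_live(region, now: now, ttl: ttl, key_only: false)
keys = probe_first_live(region, now: now, ttl: ttl, key_only: true)
puts "full cells: examined=#{full[0]} bytes=#{full[1]}"
puts "key only:   examined=#{keys[0]} bytes=#{keys[1]}"
```

In this model both probes examine the same 50,001 cells; only the payload shrinks (73 bytes versus 9), which matches the reply's point that KeyOnlyFilter alone would not bring the scan under the lease timeout.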
> > > > > >
> > > > > > 2015-01-07 11:00 GMT+08:00 Ted Yu <yuzhihong@gmail.com>:
> > > > > >
> > > > > > > For isSuccessfulScan(), I see:
> > > > > > >
> > > > > > >   scan.setBatch(1)
> > > > > > >   scan.setCaching(1)
> > > > > > >   scan.setFilter(FirstKeyOnlyFilter.new())
> > > > > > >
> > > > > > > How about adding a KeyOnlyFilter as well?
> > > > > > >
> > > > > > > On Tue, Jan 6, 2015 at 6:37 PM, Bi,hongyu—mike <boylook@gmail.com> wrote:
> > > > > > >
> > > > > > > > Thanks Ted,
> > > > > > > > I finally resolved the issue. The root cause is that region_mover calls isSuccessfulScan to scan from the start key of the moved region, and that region is filled with lots of expired cells, so the scan appears to hang.
> > > > > > > > I think isSuccessfulScan is only meant to test whether the moved region is readable; why not use a get instead, which may avoid this case?
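The proposal to probe the moved region with a get rather than a scan can be sketched with the same kind of toy model. Again this is hypothetical, not region_mover.rb or HBase internals: `scan_probe` walks forward from the region's start key until it meets a live cell, while `get_probe` seeks to one row (modeled with `Array#bsearch_index`) and stops at the end of that row, returning an empty result if everything there has expired. In the real system the readability signal would come from whether the region server serves the request at all (for example, not throwing NotServingRegionException), not from the returned cells.

```ruby
# Toy model only: a sorted array of cells stands in for a region.
# scan_probe / get_probe are hypothetical illustrations, not HBase or region_mover APIs.
Cell = Struct.new(:row, :ts)

def live?(cell, now, ttl)
  now - cell.ts <= ttl
end

# Scan-style probe: seek to start_row, then keep going across rows
# until the first live cell. Returns [result, cells_examined].
def scan_probe(cells, start_row, now, ttl)
  i = cells.bsearch_index { |c| c.row >= start_row } || cells.size
  examined = 0
  while i < cells.size
    examined += 1
    return [:row_found, examined] if live?(cells[i], now, ttl)
    i += 1
  end
  [:no_rows, examined]
end

# Get-style probe: seek to one row and look only at that row's cells.
def get_probe(cells, row, now, ttl)
  i = cells.bsearch_index { |c| c.row >= row } || cells.size
  examined = 0
  while i < cells.size && cells[i].row == row
    examined += 1
    return [:row_found, examined] if live?(cells[i], now, ttl)
    i += 1
  end
  [:empty_result, examined]
end

now, ttl = 1_000, 100
# First 20,000 rows hold only expired cells (3 versions each), then one live row.
region = (0...20_000).flat_map do |r|
  (0...3).map { Cell.new(format("row%06d", r), 0) }
end
region << Cell.new("row999999", now)

start_key = "row000000"
scan_result = scan_probe(region, start_key, now, ttl)
get_result = get_probe(region, start_key, now, ttl)
p scan_result
p get_result
```

Here the scan-style probe has to examine every expired cell between the start key and the first live row (60,001 cells), so its cost grows with the amount of expired data, whereas the get-style probe is bounded by the size of a single row (3 cells).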
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > 2015-01-06 20:59 GMT+08:00 Ted Yu <yuzhihong@gmail.com>:
> > > > > > > >
> > > > > > > > > Can you pastebin the region server log?
> > > > > > > > >
> > > > > > > > > While the scan is being performed, can you get a jstack and pastebin it?
> > > > > > > > >
> > > > > > > > > 0.94.15 is an old release; any chance of an upgrade?
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > On Jan 6, 2015, at 2:34 AM, Bi,hongyu—mike <boylook@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Sorry, I forgot to attach the version: 0.94.15.
> > > > > > > > > >
> > > > > > > > > > And calling compact (as well as flushing the region many times) from the hbase shell didn't take effect; no compaction happened.
> > > > > > > > > >
> > > > > > > > > > 2015-01-06 18:26 GMT+08:00 Bi,hongyu—mike <boylook@gmail.com>:
> > > > > > > > > >
> > > > > > > > > >> Scan debug log:
> > > > > > > > > >> 15/01/06 18:20:56 DEBUG client.ClientScanner: Creating scanner over T starting at key 'Rowx'
> > > > > > > > > >> 15/01/06 18:20:56 DEBUG client.ClientScanner: Advancing internal scanner to startKey at 'Rowx'
> > > > > > > > > >> 15/01/06 18:20:56 DEBUG client.MetaScanner: Scanning .META. starting at row=XXXX for max=10 rows using org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@427b7b5d
> > > > > > > > > >> 15/01/06 18:20:56 DEBUG client.HConnectionManager$HConnectionImplementation: Cached location for <THAT REGION> is RS_IP:60020
> > > > > > > > > >> ......
> > > > > > > > > >> 15/01/06 18:21:16 DEBUG zookeeper.ClientCnxn: Got ping response for sessionid: 0x3499df682b076cf after 0ms
> > > > > > > > > >> 15/01/06 18:21:36 DEBUG zookeeper.ClientCnxn: Got ping response for sessionid: 0x3499df682b076cf after 0ms
> > > > > > > > > >> 15/01/06 18:21:56 DEBUG zookeeper.ClientCnxn: Got ping response for sessionid: 0x3499df682b076cf after 0ms
> > > > > > > > > >> 15/01/06 18:21:56 DEBUG zookeeper.ClientCnxn: Reading reply sessionid:0x3499df682b076cf, packet:: clientPath:null serverPath:null finished:false header:: 9,4  replyHeader:: 9,21519728740,-101 request:: '/hbase/table/T,F  response::
> > > > > > > > > >> 15/01/06 18:21:56 DEBUG client.HConnectionManager$HConnectionImplementation: Removed <THAT REGION> for tableName=T from cache because of Rowx
> > > > > > > > > >> 15/01/06 18:21:56 DEBUG client.HConnectionManager$HConnectionImplementation: Cached location for <THAT REGION> is RS_IP:60020
> > > > > > > > > >> 15/01/06 18:21:56 DEBUG client.ClientScanner: Advancing internal scanner to startKey at 'Rowx'
> > > > > > > > > >>
> > > > > > > > > >> 2015-01-06 18:09 GMT+08:00 Bi,hongyu—mike <boylook@gmail.com>:
> > > > > > > > > >>
> > > > > > > > > >>> Write traffic is OK:
> > > > > > > > > >>> 2015-01-06 17:46:01,127 WARN org.apache.hadoop.hbase.ipc.SecureServer: (responseTooSlow): {"processingtimems":68,"call":"multi(Region=Rx of 149 actions and first row key= Rowx), rpc version=1, client version=29, methodsFingerPrint=-1105746420","client":"IP:port}
> > > > > > > > > >>>
> > > > > > > > > >>> Scans on that region are slow:
> > > > > > > > > >>> 015-01-06 16:23:25,087 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
> > > > > > > > > >>> org.apache.hadoop.hbase.ipc.CallerDisconnectedException: Aborting on region Rx, call next(8002464006782223710, 1, 0), rpc version=1, client version=29, methodsFingerPrint=-1771721648 from 10.201.202.31:31285 after 87821 ms, since caller disconnected
> > > > > > > > > >>>        at org.apache.hadoop.hbase.ipc.HBaseServer$Call.throwExceptionIfCallerDisconnected(HBaseServer.java:438);
> > > > > > > > > >>>
> > > > > > > > > >>> hbase hfile -r 'Rx' -p does produce the result.
> > > > > > > > > >>>
> > > > > > > > > >>> 2015-01-06 18:03 GMT+08:00 Bi,hongyu—mike <boylook@gmail.com>:
> > > > > > > > > >>>
> > > > > > > > > >>>> Hi all,
> > > > > > > > > >>>>
> > > > > > > > > >>>> There's one region that accepts write requests but not scans.
> > > > > > > > > >>>> If I scan that region I get a scanner lease timeout (60s by default), while I can scan other regions of the same table and get the result in less than 10ms (our slow RPC threshold is 10ms).
> > > > > > > > > >>>>
> > > > > > > > > >>>> hbck reports OK, and I used the "hbase hfile" tool to check that region's storefiles and the region itself; both extract the result.
> > > > > > > > > >>>>
> > > > > > > > > >>>> So I have no idea what is going on...
> > > > > > > > > >>>> Any help will be appreciated, many thanks!
> > > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
