hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Azuryy Yu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-7251) Avoid flood logs during client disconnect during batch get operation
Date Sat, 01 Dec 2012 14:25:58 GMT

     [ https://issues.apache.org/jira/browse/HBASE-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Azuryy Yu updated HBASE-7251:
-----------------------------

    Description: 
Background:

A smart guy in the company want to read data from the HBASE in batch, the code like the following:(just
demonstrate, not runnable):

	List<Get> gets = new ArrayList<Get>();

	for(int i =0; i < n; ++i){
		gets.add("some row key here");

		if (i % 10000 == 0){
			Results[] results = htable.get(gets);
			//process results here.  so delete some code
		}
	}



Yes, you know that, this guy forgot "gets.clear()"  after each "htable.get()" operation in
his code.

One region server becomes very slow, and crashed after 30mins becauseof OOM, but we got 15GB
log file.
there are flood logs as following:

ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
org.apache.hadoop.hbase.ipc.CallerDisconnectedException: Aborting call multi(org.apache.hadoop.hbase.client.MultiAction@49540d8d),
rpc version=1, client version=29, methodsFingerPrint=-56040613 from 10.1.1.1:57933 after 3980
ms, since caller disconnected
        at org.apache.hadoop.hbase.ipc.HBaseServer$Call.throwExceptionIfCallerDisconnected(HBaseServer.java:436)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3468)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3425)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3449)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4198)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4171)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1993)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3410)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1409)

Fix:
Server is stop "get" but cannot exit from the "for" loop, so write flood logs here.
My patch just log one exception log instead of flood logs. 
Importantly, server stop processing immediately if client timeout or disconnect.

Test:
I used this guy's wrong code read data( NO "gets.clear()" ) from the HBASE, when it becomes
very slow to get results, I pressed ctrl+C, then there is only ONE CallerDisconnectedException
exception log and the server stop reading immediately, LOG generate the last log entry:

 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60020 caught a ClosedChannelException,
this means that the server was processing a request but the client went away. The error message
was: null




  was:
Background:

A smart guy in the company want to read data from the HBASE in batch, the code like the following:(just
demonstrate, not runnable):

	List<Get> gets = new ArrayList<Get>();

	for(int i =0; i < n; ++i){
		gets.add("some row key here");

		if (i % 10000 == 0){
			Results[] results = htable.get(gets);
			//process results here. 
			//delete some code
		}
	}



Yes, you know that, this guy forgot "gets.clear()"  after each "htable.get()" operation in
his code.

One region server becomes very slow, and crashed after 30mins becauseof OOM, but we got 15GB
log file.
there are flood logs as following:

ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
org.apache.hadoop.hbase.ipc.CallerDisconnectedException: Aborting call multi(org.apache.hadoop.hbase.client.MultiAction@49540d8d),
rpc version=1, client version=29, methodsFingerPrint=-56040613 from 10.204.8.148:57933 after
3980 ms, since caller disconnected
        at org.apache.hadoop.hbase.ipc.HBaseServer$Call.throwExceptionIfCallerDisconnected(HBaseServer.java:436)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3468)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3425)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3449)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4198)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4171)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1993)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3410)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1409)

Fix:
Server is stop "get" but cannot exit from the "for" loop, so write flood logs here.
My patch just log one exception log instead of flood logs. 
Importantly, server stop processing immediately if client timeout or disconnect.

Test:
I used this guy's wrong code read data( NO "gets.clear()" ) from the HBASE, when it becomes
very slow to get results, I pressed ctrl+C, then there is only ONE CallerDisconnectedException
exception log and the server stop reading immediately, LOG generate the last log entry:

 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60020 caught a ClosedChannelException,
this means that the server was processing a request but the client went away. The error message
was: null




    
> Avoid flood logs during client disconnect during batch get operation
> --------------------------------------------------------------------
>
>                 Key: HBASE-7251
>                 URL: https://issues.apache.org/jira/browse/HBASE-7251
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.2
>            Reporter: Azuryy Yu
>              Labels: patch
>             Fix For: 0.94.3
>
>         Attachments: HBASE-7251.patch
>
>
> Background:
> A smart guy in the company want to read data from the HBASE in batch, the code like the
following:(just demonstrate, not runnable):
> 	List<Get> gets = new ArrayList<Get>();
> 	for(int i =0; i < n; ++i){
> 		gets.add("some row key here");
> 		if (i % 10000 == 0){
> 			Results[] results = htable.get(gets);
> 			//process results here.  so delete some code
> 		}
> 	}
> Yes, you know that, this guy forgot "gets.clear()"  after each "htable.get()" operation
in his code.
> One region server becomes very slow, and crashed after 30mins becauseof OOM, but we got
15GB log file.
> there are flood logs as following:
> ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
> org.apache.hadoop.hbase.ipc.CallerDisconnectedException: Aborting call multi(org.apache.hadoop.hbase.client.MultiAction@49540d8d),
rpc version=1, client version=29, methodsFingerPrint=-56040613 from 10.1.1.1:57933 after 3980
ms, since caller disconnected
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Call.throwExceptionIfCallerDisconnected(HBaseServer.java:436)
>         at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3468)
>         at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3425)
>         at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3449)
>         at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4198)
>         at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4171)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1993)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3410)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:601)
>         at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1409)
> Fix:
> Server is stop "get" but cannot exit from the "for" loop, so write flood logs here.
> My patch just log one exception log instead of flood logs. 
> Importantly, server stop processing immediately if client timeout or disconnect.
> Test:
> I used this guy's wrong code read data( NO "gets.clear()" ) from the HBASE, when it becomes
very slow to get results, I pressed ctrl+C, then there is only ONE CallerDisconnectedException
exception log and the server stop reading immediately, LOG generate the last log entry:
>  WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60020 caught a ClosedChannelException,
this means that the server was processing a request but the client went away. The error message
was: null

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message