phoenix-dev mailing list archives

From "Samarth Jain (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (PHOENIX-2408) Update statistics fails to complete
Date Fri, 04 Dec 2015 22:34:11 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15042350#comment-15042350 ]

Samarth Jain edited comment on PHOENIX-2408 at 12/4/15 10:34 PM:
-----------------------------------------------------------------

Spent the last couple of days trying to figure out what is going on here. On my laptop (1
region server), I loaded a table with 400 million rows distributed over 8 regions. I added
logging in a few places to see what is going on. I see errors like these in my logs on the
server side:

Exception caught in post scanner open for scan: 4. Exception: org.apache.hadoop.hbase.ipc.CallerDisconnectedException:
Aborting on region TESTXYZ,\x04\x00\x00\x00\x00\x00\x00\x00\x00,1449215361195.5fa492cebc9f25b9602ecaf1d4601daf.,
call org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl@3efcf4dd after 121324
ms, since caller disconnected
	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:4144)
	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:4061)
	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:4048)
	at org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.doPostScannerOpen(UngroupedAggregateRegionObserver.java:288)
	at org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:191)
	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$52.call(RegionCoprocessorHost.java:1305)
	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1619)
	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1694)
	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperationWithResult(RegionCoprocessorHost.java:1658)
	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postScannerOpen(RegionCoprocessorHost.java:1300)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3214)
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30946)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2093)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
	at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
	at java.lang.Thread.run(Thread.java:745)

It looks like org.apache.hadoop.hbase.ipc.CallerDisconnectedException is a regular IOException
and not a DoNotRetryIOException. As a result, BaseScannerRegionObserver#doPostScannerOpen()
re-throws a regular IOException back to the client, which triggers retries. These retries,
however, never succeed, and we end up retrying the default number of times (31).
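
To make the retry behavior above concrete, here is a minimal, self-contained sketch. It does
not use the HBase client: the two exception classes are local stand-ins for the HBase types
named above, and the Call interface, the attempts() helper, and the RetrySketch class are
hypothetical names made up for illustration. The limit of 31 is simply the number quoted above.

```java
import java.io.IOException;

class RetrySketch {
    // Local stand-ins for the HBase exception types discussed above.
    // The real classes live in org.apache.hadoop.hbase and are NOT used here.
    static class DoNotRetryIOException extends IOException {
        DoNotRetryIOException(String msg) { super(msg); }
    }

    static class CallerDisconnectedException extends IOException {
        CallerDisconnectedException(String msg) { super(msg); }
    }

    // Hypothetical shape of a remote call that may fail with an IOException.
    interface Call {
        void run() throws IOException;
    }

    // Hypothetical retry loop: returns how many attempts were made.
    static int attempts(Call call, int maxAttempts) {
        int made = 0;
        while (made < maxAttempts) {
            made++;
            try {
                call.run();
                return made;            // success, stop retrying
            } catch (DoNotRetryIOException e) {
                return made;            // marked non-retriable, fail fast
            } catch (IOException e) {
                // any other IOException is treated as retriable: loop again
            }
        }
        return made;                    // attempts exhausted
    }

    public static void main(String[] args) {
        int retriable = attempts(() -> {
            throw new CallerDisconnectedException("caller disconnected");
        }, 31);
        int failFast = attempts(() -> {
            throw new DoNotRetryIOException("do not retry");
        }, 31);
        System.out.println("plain IOException subclass: " + retriable + " attempts");
        System.out.println("DoNotRetryIOException: " + failFast + " attempts");
    }
}
```

In this sketch, a call that always fails with a plain IOException subclass uses up every
attempt, while one that fails with the do-not-retry type stops after the first.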

One thought I had was that I may be maxing out the I/O on my laptop SSD. But reducing
the number of region server handler threads from the default to 2 (to limit the I/O) didn't
help either.

Will keep digging.


> Update statistics fails to complete
> -----------------------------------
>
>                 Key: PHOENIX-2408
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2408
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>            Assignee: Samarth Jain
>             Fix For: 4.7.0
>
>
> On a production cluster, when UPDATE STATISTICS is run, it fails to complete.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
