hbase-issues mailing list archives

From "ramkrishna.s.vasudevan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16604) Scanner retries on IOException can cause the scans to miss data
Date Tue, 27 Sep 2016 04:25:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15525026#comment-15525026 ]

ramkrishna.s.vasudevan commented on HBASE-16604:
------------------------------------------------

bq. Thus, ClientScanner will re-open a new RegionScanner by sending a new scan request and get a new scanner name.
Let me check this; I may have been wrong.
bq. The heap will be reset correctly, because the region scanner is closed for good. A completely new RegionScanner will be constructed from scratch.
Ok.
bq. Is it the case that if the scanner is already closed, shipped() will not free up the blocks?
The problem here is that the finally block resets the rpcCallBack with the RpcShippedCallBack, but since the lease has already been removed we never add the lease back, so the LeaseExpiry logic, which is what actually returns the blocks, never runs. Anyway, I think I have a better approach to fix this now after seeing your comments. Will be back here.
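
To make the failure mode concrete, here is a minimal, hypothetical sketch of the flow I am describing (simplified names, not the actual RSRpcServices/Leases code; a "lease" here is just a named expiry action):
{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the flow described above -- not the real
// region server code.
public class LeaseResetSketch {

  static final Map<String, Runnable> leases = new HashMap<>();
  static Runnable rpcCallback;

  static void scan(String scannerName, boolean failMidway) {
    // The lease is removed for the duration of the scan RPC.
    Runnable expiryAction = leases.remove(scannerName);
    boolean scannerClosed = false;
    try {
      if (failMidway) {
        scannerClosed = true; // scanner is closed for good on the exception path
        throw new RuntimeException("simulated IOException during reseek");
      }
    } catch (RuntimeException e) {
      System.out.println("scan failed: " + e.getMessage());
    } finally {
      // The callback is reset unconditionally, but the lease is only added
      // back when the scanner is still open. On the exception path nothing
      // will ever run expiryAction, which is what returns the blocks.
      rpcCallback = () -> System.out.println("shipped(): blocks would be freed here");
      if (!scannerClosed && expiryAction != null) {
        leases.put(scannerName, expiryAction);
      }
    }
  }

  public static void main(String[] args) {
    leases.put("scanner-1", () -> System.out.println("lease expired: blocks returned"));
    scan("scanner-1", true);
    System.out.println("lease re-registered? " + leases.containsKey("scanner-1")); // false
  }
}
{code}
The point is only that on the exception path nothing ever re-registers the lease, so the expiry action that would have returned the blocks never fires.
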
bq. Yes, I have checked that in other contexts where we close the scanner in case of exception, we still call the coprocessor methods.
Ok. If you have verified it then it is fine. 
Thanks a lot [~enis].

> Scanner retries on IOException can cause the scans to miss data 
> ----------------------------------------------------------------
>
>                 Key: HBASE-16604
>                 URL: https://issues.apache.org/jira/browse/HBASE-16604
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, Scanners
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 1.2.4
>
>         Attachments: HBASE-16604-branch-1.3-addendum.patch, hbase-16604_v1.patch, hbase-16604_v2.patch, hbase-16604_v3.branch-1.patch, hbase-16604_v3.patch
>
>
> Debugging an ITBLL failure, where the Verify did not "see" all the data in the cluster, I've noticed that if we end up getting a generic IOException from the HFileReader level, we may end up missing the rest of the data in the region. I was able to manually test this, and this stack trace helps to understand what is going on:
> {code}
> 2016-09-09 16:27:15,633 INFO  [hconnection-0x71ad3d8a-shared--pool21-t9] client.ScannerCallable(376):
Open scanner=1 for scan={"loadColumnFamiliesOnDemand":null,"startRow":"","stopRow":"","batch":-1,"cacheBlocks":true,"totalColumns":1,"maxResultSize":2097152,"families":{"testFamily":["testFamily"]},"caching":100,"maxVersions":1,"timeRange":[0,9223372036854775807]}
on region region=testScanThrowsException,,1473463632707.b2adfb618e5d0fe225c1dc40c0eabfee.,
hostname=hw10676,51833,1473463626529, seqNum=2
> 2016-09-09 16:27:15,634 INFO  [B.fifo.QRpcServer.handler=5,queue=0,port=51833] regionserver.RSRpcServices(2196):
scan request:scanner_id: 1 number_of_rows: 100 close_scanner: false next_call_seq: 0 client_handles_partials:
true client_handles_heartbeats: true renew: false
> 2016-09-09 16:27:15,635 INFO  [B.fifo.QRpcServer.handler=5,queue=0,port=51833] regionserver.RSRpcServices(2510):
Rolling back next call seqId
> 2016-09-09 16:27:15,635 INFO  [B.fifo.QRpcServer.handler=5,queue=0,port=51833] regionserver.RSRpcServices(2565):
Throwing new ServiceExceptionjava.io.IOException: Could not reseek StoreFileScanner[HFileScanner
for reader reader=hdfs://localhost:51795/user/enis/test-data/d6fb1c70-93c1-4099-acb7-5723fc05a737/data/default/testScanThrowsException/b2adfb618e5d0fe225c1dc40c0eabfee/testFamily/5a213cc23b714e5e8e1a140ebbe72f2c,
compression=none, cacheConf=blockCache=LruBlockCache{blockCount=0, currentSize=1567264, freeSize=1525578848,
maxSize=1527146112, heapSize=1567264, minSize=1450788736, minFactor=0.95, multiSize=725394368,
multiFactor=0.5, singleSize=362697184, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false,
cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false,
prefetchOnOpen=false, firstKey=aaa/testFamily:testFamily/1473463633859/Put, lastKey=zzz/testFamily:testFamily/1473463634271/Put,
avgKeyLen=35, avgValueLen=3, entries=17576, length=866998, cur=/testFamily:/OLDEST_TIMESTAMP/Minimum/vlen=0/seqid=0]
to key /testFamily:testFamily/LATEST_TIMESTAMP/Maximum/vlen=0/seqid=0
> 2016-09-09 16:27:15,635 DEBUG [B.fifo.QRpcServer.handler=5,queue=0,port=51833] ipc.CallRunner(110):
B.fifo.QRpcServer.handler=5,queue=0,port=51833: callId: 26 service: ClientService methodName:
Scan size: 26 connection: 192.168.42.75:51903
> java.io.IOException: Could not reseek StoreFileScanner[HFileScanner for reader reader=hdfs://localhost:51795/user/enis/test-data/d6fb1c70-93c1-4099-acb7-5723fc05a737/data/default/testScanThrowsException/b2adfb618e5d0fe225c1dc40c0eabfee/testFamily/5a213cc23b714e5e8e1a140ebbe72f2c,
compression=none, cacheConf=blockCache=LruBlockCache{blockCount=0, currentSize=1567264, freeSize=1525578848,
maxSize=1527146112, heapSize=1567264, minSize=1450788736, minFactor=0.95, multiSize=725394368,
multiFactor=0.5, singleSize=362697184, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false,
cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false,
prefetchOnOpen=false, firstKey=aaa/testFamily:testFamily/1473463633859/Put, lastKey=zzz/testFamily:testFamily/1473463634271/Put,
avgKeyLen=35, avgValueLen=3, entries=17576, length=866998, cur=/testFamily:/OLDEST_TIMESTAMP/Minimum/vlen=0/seqid=0]
to key /testFamily:testFamily/LATEST_TIMESTAMP/Maximum/vlen=0/seqid=0
> 	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:224)
> 	at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
> 	at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:312)
> 	at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:268)
> 	at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:815)
> 	at org.apache.hadoop.hbase.regionserver.StoreScanner.seekAsDirection(StoreScanner.java:803)
> 	at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:625)
> 	at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
> 	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5736)
> 	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5887)
> 	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5674)
> 	at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2434)
> 	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32295)
> 	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2127)
> 	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
> 	at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
> 	at org.apache.hadoop.hbase.ipc.RpcExecutor$2.run(RpcExecutor.java:108)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: foo bar baz
> 	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:207)
> 	... 17 more
> 2016-09-09 16:27:15,653 INFO  [hconnection-0x71ad3d8a-shared--pool21-t10] client.ScannerCallable(261):
Got exception making request scanner_id: 1 number_of_rows: 100 close_scanner: false next_call_seq:
0 client_handles_partials: true client_handles_heartbeats: true renew: false to region=testScanThrowsException,,1473463632707.b2adfb618e5d0fe225c1dc40c0eabfee.,
hostname=hw10676,51833,1473463626529, seqNum=2
> java.io.IOException: java.io.IOException: Could not reseek StoreFileScanner[HFileScanner
for reader reader=hdfs://localhost:51795/user/enis/test-data/d6fb1c70-93c1-4099-acb7-5723fc05a737/data/default/testScanThrowsException/b2adfb618e5d0fe225c1dc40c0eabfee/testFamily/5a213cc23b714e5e8e1a140ebbe72f2c,
compression=none, cacheConf=blockCache=LruBlockCache{blockCount=0, currentSize=1567264, freeSize=1525578848,
maxSize=1527146112, heapSize=1567264, minSize=1450788736, minFactor=0.95, multiSize=725394368,
multiFactor=0.5, singleSize=362697184, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false,
cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false,
prefetchOnOpen=false, firstKey=aaa/testFamily:testFamily/1473463633859/Put, lastKey=zzz/testFamily:testFamily/1473463634271/Put,
avgKeyLen=35, avgValueLen=3, entries=17576, length=866998, cur=/testFamily:/OLDEST_TIMESTAMP/Minimum/vlen=0/seqid=0]
to key /testFamily:testFamily/LATEST_TIMESTAMP/Maximum/vlen=0/seqid=0
> 	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:224)
> 	at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
> 	at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:312)
> 	at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:268)
> 	at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:815)
> 	at org.apache.hadoop.hbase.regionserver.StoreScanner.seekAsDirection(StoreScanner.java:803)
> 	at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:625)
> 	at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
> 	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5736)
> 	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5887)
> 	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5674)
> 	at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2434)
> 	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32295)
> 	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2127)
> 	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
> 	at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
> 	at org.apache.hadoop.hbase.ipc.RpcExecutor$2.run(RpcExecutor.java:108)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: foo bar baz
> 	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:207)
> 	... 17 more
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> 	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
> 	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
> 	at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:332)
> 	at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:256)
> 	at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:1)
> 	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
> 	at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:364)
> 	at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:1)
> 	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
> 	at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:65)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException):
java.io.IOException: Could not reseek StoreFileScanner[HFileScanner for reader reader=hdfs://localhost:51795/user/enis/test-data/d6fb1c70-93c1-4099-acb7-5723fc05a737/data/default/testScanThrowsException/b2adfb618e5d0fe225c1dc40c0eabfee/testFamily/5a213cc23b714e5e8e1a140ebbe72f2c,
compression=none, cacheConf=blockCache=LruBlockCache{blockCount=0, currentSize=1567264, freeSize=1525578848,
maxSize=1527146112, heapSize=1567264, minSize=1450788736, minFactor=0.95, multiSize=725394368,
multiFactor=0.5, singleSize=362697184, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false,
cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false,
prefetchOnOpen=false, firstKey=aaa/testFamily:testFamily/1473463633859/Put, lastKey=zzz/testFamily:testFamily/1473463634271/Put,
avgKeyLen=35, avgValueLen=3, entries=17576, length=866998, cur=/testFamily:/OLDEST_TIMESTAMP/Minimum/vlen=0/seqid=0]
to key /testFamily:testFamily/LATEST_TIMESTAMP/Maximum/vlen=0/seqid=0
> 	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:224)
> 	at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
> 	at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:312)
> 	at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:268)
> 	at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:815)
> 	at org.apache.hadoop.hbase.regionserver.StoreScanner.seekAsDirection(StoreScanner.java:803)
> 	at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:625)
> 	at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
> 	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5736)
> 	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5887)
> 	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5674)
> 	at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2434)
> 	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32295)
> 	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2127)
> 	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
> 	at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
> 	at org.apache.hadoop.hbase.ipc.RpcExecutor$2.run(RpcExecutor.java:108)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: foo bar baz
> 	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:207)
> 	... 17 more
> 	at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1225)
> 	at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
> 	at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
> 	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:32741)
> 	at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:214)
> 	... 9 more
> 2016-09-09 16:27:15,963 INFO  [B.priority.fifo.QRpcServer.handler=8,queue=0,port=51824]
regionserver.RSRpcServices(2196): scan request:region { type: REGION_NAME value: "hbase:meta,,1"
} scan { start_row: "testScanThrowsException,,99999999999999" max_versions: 1 cache_blocks:
true small: true reversed: true caching: 1 } number_of_rows: 1 close_scanner: true client_handles_partials:
true client_handles_heartbeats: true
> 2016-09-09 16:27:15,965 INFO  [B.priority.fifo.QRpcServer.handler=8,queue=0,port=51824]
regionserver.RSRpcServices(2549): class org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanResponse:cells_per_result:
4 scanner_id: 15 more_results: false stale: false partial_flag_per_result: false more_results_in_region:
true heartbeat_message: false
> 2016-09-09 16:27:15,967 INFO  [B.fifo.QRpcServer.handler=0,queue=0,port=51833] regionserver.RSRpcServices(2196):
scan request:scanner_id: 1 number_of_rows: 100 close_scanner: false next_call_seq: 0 client_handles_partials:
true client_handles_heartbeats: true renew: false
> 2016-09-09 16:27:15,967 INFO  [B.fifo.QRpcServer.handler=0,queue=0,port=51833] regionserver.RSRpcServices(2549):
class org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanResponse:scanner_id: 1 more_results:
true ttl: 60000 stale: false more_results_in_region: false
> 2016-09-09 16:27:15,968 INFO  [B.fifo.QRpcServer.handler=1,queue=0,port=51833] regionserver.RSRpcServices(2196):
scan request:scanner_id: 1 number_of_rows: 0 close_scanner: true client_handles_partials:
true client_handles_heartbeats: true
> 2016-09-09 16:27:15,968 INFO  [B.fifo.QRpcServer.handler=1,queue=0,port=51833] regionserver.RSRpcServices(2549):
class org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanResponse:scanner_id: 1 more_results:
false
> 2016-09-09 16:27:15,971 INFO  [main] hbase.HBaseTestingUtility(1111): Shutting down minicluster
> {code}
> The ScannerCallable is retryable if it gets an IOException. Before the HBASE-13662 fix, the retry would still fail with an OutOfOrderScannerException, which then caused the scanner stack to be reset and thus masked the issue. After HBASE-13662 we can retry the same scan, but if we are doing a {{reseek()}} that fails in the middle with an IOException, the KVHeaps and the scanner state on the server side are left dirty.
> It seems to be much safer to reset the scanner state in case of IOExceptions.
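> A minimal sketch of the safer behavior suggested above, assuming we simply close the region scanner and drop its id on any IOException so the client's retry has to open a fresh scanner rather than continuing against a dirty KVHeap (hypothetical names, not the actual patch):
> {code}
> import java.io.Closeable;
> import java.io.IOException;
> import java.util.List;
> import java.util.Map;
> import java.util.concurrent.ConcurrentHashMap;
> 
> // Hypothetical sketch, not the real RSRpcServices: on an IOException from
> // the scanner, discard it entirely instead of keeping half-advanced state.
> public class ResetScannerOnIOExceptionSketch {
> 
>   interface RegionScannerLike extends Closeable {
>     boolean next(List<String> results) throws IOException;
>   }
> 
>   static final Map<Long, RegionScannerLike> scanners = new ConcurrentHashMap<>();
> 
>   static boolean scan(long scannerId, List<String> results) throws IOException {
>     RegionScannerLike scanner = scanners.get(scannerId);
>     if (scanner == null) {
>       // Forces the client to re-open, similar in spirit to an UnknownScannerException.
>       throw new IOException("unknown scanner " + scannerId + ", please re-open");
>     }
>     try {
>       return scanner.next(results);
>     } catch (IOException e) {
>       // Safer behavior: the scanner state may be dirty, so never serve it again.
>       scanners.remove(scannerId);
>       try {
>         scanner.close();
>       } catch (IOException ignored) {
>         // best effort
>       }
>       throw e; // the retry will hit the "unknown scanner" path and re-open
>     }
>   }
> }
> {code}
> The key point is that after an IOException the server never serves another next() from the same scanner id, so the retried scan cannot silently skip the rows a dirty heap would have stepped over.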



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
