Date: Sun, 8 Jun 2014 06:52:02 +0000 (UTC)
From: "Anoop Sam John (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-11295) Long running scan produces OutOfOrderScannerNextException

    [ https://issues.apache.org/jira/browse/HBASE-11295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14021126#comment-14021126 ]

Anoop Sam John commented on HBASE-11295:
----------------------------------------

You can see in ClientScanner that we handle the out-of-order exception there. We do not throw it back to the application layer immediately. Instead, we recreate the scanner with the start row set to the last row previously fetched. If that attempt also throws the same exception without fetching anything, we stop and throw it back to the application; otherwise we would end up retrying forever. (Please note that the number of retries with the same scanner ID is finite.) This may be what is happening in your case, since you sleep in the filter.
If you have such a long-running scenario (one next call takes a long time, perhaps because of complex filtering), try reducing scanner caching (default is 100) and/or increasing the client timeout. I don't think there is any problem in the code. The log shows the client retrying on the out-of-order exception.

> Long running scan produces OutOfOrderScannerNextException
> ---------------------------------------------------------
>
>                 Key: HBASE-11295
>                 URL: https://issues.apache.org/jira/browse/HBASE-11295
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.96.0
>            Reporter: Jeff Cunningham
>         Attachments: OutOfOrderScannerNextException.tar.gz
>
>
> Attached Files:
> HRegionServer.java - instrumented from 0.96.1.1-cdh5.0.0
> HBaseLeaseTimeoutIT.java - reproducing JUnit 4 test
> WaitFilter.java - Scan filter (extends FilterBase) that overrides filterRowKey() to sleep during invocation
> SpliceFilter.proto - Protobuf definition for WaitFilter.java
> OutOfOrderScann_InstramentedServer.log - instrumented server log
> Steps.txt - this note
> Set up:
> In HBaseLeaseTimeoutIT, create a scan, set the given filter (which sleeps in the overridden filterRowKey() method) on the scan, and scan the table.
> This is done in test client_0x0_server_150000x10().
> Here's what I'm seeing (see also attached log):
> A new request comes into the server (ID 1940798815214593802 - RpcServer.handler=96) and a RegionScanner is created for it, cached by ID, immediately looked up again, and the cached RegionScannerHolder's nextCallSeq is incremented (now at 1).
> The RegionScan thread goes to sleep in WaitFilter#filterRowKey().
> A short (variable) period later, another request comes into the server (ID 8946109289649235722 - RpcServer.handler=98) and the same series of events happens to this request.
> At this point both RegionScanner threads are sleeping in WaitFilter.filterRowKey().
> After another period, the client retries another scan request which thinks its next_call_seq is 0. However, HRegionServer's cached RegionScannerHolder thinks the matching RegionScanner's nextCallSeq should be 1.
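
The sequence mismatch described in the report can be illustrated with a small, self-contained model. This is only a sketch and not HBase code: ServerModel, OutOfOrderException, openScanner(), next() and expectedFor() are hypothetical names invented for the illustration. It assumes the server bumps its per-scanner counter as soon as it accepts a call, before the slow scan work finishes, as the report describes. A client that times out waiting for the reply never advances its own counter, so its retry carries a stale sequence number.

```java
import java.util.HashMap;
import java.util.Map;

public class NextCallSeqSketch {

    /** Hypothetical stand-in for HBase's OutOfOrderScannerNextException. */
    static class OutOfOrderException extends RuntimeException {
        OutOfOrderException(String message) {
            super(message);
        }
    }

    /** Hypothetical, simplified model of the region server's scanner cache. */
    static class ServerModel {
        // scanner ID -> sequence number the server expects on the next call
        private final Map<Long, Long> expectedSeq = new HashMap<>();
        private long nextScannerId = 1;

        long openScanner() {
            long id = nextScannerId++;
            expectedSeq.put(id, 0L);
            return id;
        }

        void next(long scannerId, long clientSeq) {
            Long expected = expectedSeq.get(scannerId);
            if (expected == null) {
                throw new IllegalArgumentException("unknown scanner " + scannerId);
            }
            if (clientSeq != expected) {
                throw new OutOfOrderException(
                    "expected nextCallSeq " + expected + " but client sent " + clientSeq);
            }
            // Assumption: the counter is bumped before the (possibly slow) scan runs.
            expectedSeq.put(scannerId, expected + 1);
        }

        long expectedFor(long scannerId) {
            return expectedSeq.get(scannerId);
        }
    }

    public static void main(String[] args) {
        ServerModel server = new ServerModel();
        long id = server.openScanner();

        long clientSeq = 0;
        // The server accepts seq 0 and now expects 1. Imagine the reply is lost
        // to a client-side timeout, so the client never increments clientSeq.
        server.next(id, clientSeq);

        try {
            // The retry still carries seq 0, which the server rejects.
            server.next(id, clientSeq);
        } catch (OutOfOrderException e) {
            System.out.println("retry rejected: " + e.getMessage());
        }
    }
}
```

In this model the second call is rejected because the two counters have diverged. That is consistent with the advice in the comment: making each next call finish within the client timeout (smaller scanner caching, longer timeout) keeps the client from retrying with a stale sequence number in the first place.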