Date: Wed, 13 Apr 2011 09:15:26 -0700
Subject: Re: A possible bug in the scanner.
From: Gary Helmling
To: user@hbase.apache.org

Hi Vidhya,

So it sounds like the timeout thread is timing out the scanner when it takes
more than 60 seconds reading through the large column family store file
without returning anything to the client? Even without the TTL expiration
being applied, I think I've heard of this in other cases where a very
restrictive filter was used on a large table scan.

If this is the case, it certainly seems like we should handle it better. We
could do something as simple as refreshing the scanner timestamp every X
rows when iterating server side. I'll check the code and open a JIRA (if we
don't already have one).

Thanks for detailing the problem.

--gh


On Wed, Apr 13, 2011 at 7:44 AM, Vidhyashankar Venkataraman <
vidhyash@yahoo-inc.com> wrote:

> Hi
> We had enabled scanner caching but I don't think it is the same issue
> because scanner.next in this case is blocking: the scanner is busy in the
> region server but hasn't returned anything yet since a row to be returned
> hasn't been found yet (all rows have expired but are still there since they
> haven't been compacted yet).
>
> Vidhya
>
> On 4/13/11 1:44 AM, "Ted Yu" wrote:
>
> Have you read the following thread?
> "ScannerTimeoutException when a scan enables caching, no exception when it
> doesn't"
> Did you enable caching? If not, it is a different issue.
>
> On Wed, Apr 13, 2011 at 12:40 AM, Vidhyashankar Venkataraman <
> vidhyash@yahoo-inc.com> wrote:
>
> > (This could be a known issue. Please let me know if it is).
> >
> > We had a set of uncompacted store files in a region. One of the column
> > families had a store file of 5 GB. The other column families were pretty
> > small (a few megabytes at most).
> >
> > It so turned out that all these files had rows whose TTL had expired. Now
> > when this region was scanned (which should yield an empty result set), we
> > got scanner timeouts and UnknownScannerExceptions.
> >
> > And when we tried scanning the region without the large column family, the
> > scanner returned safely with no result.
> >
> > So, I major compacted it and the scan started working correctly.
> >
> > So it looks like timeouts happen if the scanner does not return any output
> > for a specified time.
> > Which isn't exactly the correct thing to do, because it could be the case
> > that the scanner was indeed busy but it just so happened that there are no
> > rows yet to return to the client.
> >
> > We can try increasing the scanner timeout, but this doesn't resolve the
> > underlying problem. Is this a known issue?
> >
> > Thank you
> > Vidhya
> >
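
For illustration only, here is a rough sketch of the "refresh the scanner
timestamp every X rows" idea from the reply at the top of this message. It is
not HBase code: the Lease class, the scan method, and the simulated clock are
all hypothetical stand-ins, and the 60-second period is taken from the
discussion above. It only shows why a scan that finds no matching rows can
outlive its lease, and how a periodic renewal while iterating would avoid that.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch; none of these names are HBase classes or APIs.
final class LeaseRefreshSketch {

    /** Stand-in for a scanner lease that expires unless renewed. */
    static final class Lease {
        private final long periodMs;
        private long expiresAtMs;

        Lease(long periodMs, long nowMs) {
            this.periodMs = periodMs;
            this.expiresAtMs = nowMs + periodMs;
        }

        void renew(long nowMs) {
            expiresAtMs = nowMs + periodMs;
        }

        boolean isExpired(long nowMs) {
            return nowMs >= expiresAtMs;
        }
    }

    /**
     * Iterates over rows on a simulated clock, where examining one row costs
     * msPerRow. The lease is renewed every refreshEvery rows (0 = never).
     * Returns the number of rows accepted by the filter, or throws
     * IllegalStateException if the lease expires while the scan is busy.
     */
    static int scan(List<String> rows, Predicate<String> keep, long msPerRow,
                    long leasePeriodMs, int refreshEvery) {
        long now = 0;
        Lease lease = new Lease(leasePeriodMs, now);
        int examined = 0;
        int matched = 0;
        for (String row : rows) {
            now += msPerRow; // simulated cost of reading one row
            examined++;
            if (lease.isExpired(now)) {
                throw new IllegalStateException(
                        "lease expired after " + examined + " rows");
            }
            if (refreshEvery > 0 && examined % refreshEvery == 0) {
                lease.renew(now); // the scanner is busy, so keep it alive
            }
            if (keep.test(row)) {
                matched++;
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        // 100 rows that are all filtered out, as with expired TTLs.
        List<String> rows = Collections.nCopies(100, "expired");
        System.out.println("with refresh: matched="
                + scan(rows, r -> false, 1000L, 60000L, 10));
        try {
            scan(rows, r -> false, 1000L, 60000L, 0);
        } catch (IllegalStateException e) {
            System.out.println("without refresh: " + e.getMessage());
        }
    }
}
```

With a renewal every 10 rows the first call completes with zero matches; with
no renewal the second call fails once the simulated clock reaches the lease
period, even though the scan was making progress the whole time.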