Subject: Re: silently aborted scans when using hbase.client.scanner.max.result.size
From: Jean-Daniel Cryans <jdcryans@gmail.com>
To: user@hbase.apache.org
Date: Thu, 26 Jul 2012 10:44:25 -0700

Damn! Well, that's a big bug then, but it seems that HBASE-2214 would
fix it since the client would pass its own max size? Although, reading
the patch, it doesn't seem so: if it wasn't configured on the client
and it wasn't passed on the Scan, then the region server will pick up
its own configured value.

In the patch:

-    this.maxScannerResultSize = conf.getLong(
+    if (scan.getMaxResultSize() > 0) {
+      this.maxScannerResultSize = scan.getMaxResultSize();
+    } else {
+      this.maxScannerResultSize = conf.getLong(
           HConstants.HBASE_CLIENT_SCANNER_MAX_RESULT_SIZE_KEY,
           HConstants.DEFAULT_HBASE_CLIENT_SCANNER_MAX_RESULT_SIZE);
+    }

If in the else clause you set the new value on the scan, then the
region server would always receive the right amount of data.
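Something like this (just a sketch on top of that hunk, assuming the
setter that goes with the patch's Scan.getMaxResultSize()):

    if (scan.getMaxResultSize() > 0) {
      this.maxScannerResultSize = scan.getMaxResultSize();
    } else {
      this.maxScannerResultSize = conf.getLong(
          HConstants.HBASE_CLIENT_SCANNER_MAX_RESULT_SIZE_KEY,
          HConstants.DEFAULT_HBASE_CLIENT_SCANNER_MAX_RESULT_SIZE);
      // Ship the resolved value on the Scan itself, so the region
      // server applies exactly the same limit as the client instead
      // of falling back to its own config.
      scan.setMaxResultSize(this.maxScannerResultSize);
    }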
Then you have to wonder why the region server would even set its own
value, since it's just likely to cause trouble. Or maybe it's the
client that shouldn't care. I'll add a comment to that jira too.

J-D

On Thu, Jul 26, 2012 at 1:05 AM, Ferdy Galema wrote:
> Thanks man!! It is really that simple! That is crazy. I've been
> running this property server-side only for such a long time, but
> never really experienced the effects until using a higher caching
> value (which is perfectly explainable). Wherever this property is
> mentioned, it surely must be documented that it is critical to set
> it on both server and client, unless you enjoy missing rows at
> random.
>
> Thanks again.
> Ferdy
>
> On Wed, Jul 25, 2012 at 9:07 PM, Jean-Daniel Cryans wrote:
>
>> That looks nasty.
>>
>> Could it be that your client doesn't know about the max result
>> size? Looking at ClientScanner.next(), we iterate while this is
>> true:
>>
>>     } while (remainingResultSize > 0 && countdown > 0 &&
>>         nextScanner(countdown, values == null));
>>
>> Let's say the region server returns fewer rows than requested,
>> like 1240, but the caching is set to 1241. The remaining size
>> would still be higher than zero and so would the countdown (its
>> value would be 1), so it's going to try to get the next scanner.
>> If you have just one region, it stops right there. And that's
>> exactly your case: one region, and the config not set on the
>> client side.
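>>
>> To make that concrete with your numbers (a toy model of the exit
>> condition, not the real client code):
>>
>>     // The client never saw hbase.client.scanner.max.result.size,
>>     // so its byte budget is effectively unlimited:
>>     long remainingResultSize = Long.MAX_VALUE;
>>     int countdown = 1241;  // hbase.client.scanner.caching
>>
>>     // The region server stops at its own 90100100-byte limit and
>>     // returns only 1240 rows:
>>     countdown -= 1240;     // countdown == 1
>>
>>     // Both remainingResultSize and countdown are still positive,
>>     // so the loop condition sends the client to nextScanner() for
>>     // the *next* region instead of calling next() again on this
>>     // one. With a single region there is no next scanner, and the
>>     // scan ends silently short.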
>>
>> J-D
>>
>> On Wed, Jul 25, 2012 at 5:04 AM, Ferdy Galema wrote:
>> > I was experiencing aborted scans under certain conditions. In
>> > these cases I was simply missing so many rows that only a
>> > fraction was inputted, without warning. After lots of testing I
>> > was able to pinpoint and reproduce the error when scanning over
>> > a single region, single column family, single store file. So
>> > really just a single (major_compacted) storefile. I scan over
>> > this region using a single Scan in a local jobtracker context.
>> > (So not mapreduce, although that has exactly the same
>> > behaviour.) Finally, I noticed that the number of input rows
>> > depends on the hbase.client.scanner.caching property. See the
>> > following example runs that scan over this region with a
>> > specific start and stop key:
>> >
>> > -Dhbase.client.scanner.caching=1      inputrows=1506
>> > -Dhbase.client.scanner.caching=10000  inputrows=1240
>> > -Dhbase.client.scanner.caching=1240   inputrows=1506
>> > -Dhbase.client.scanner.caching=1241   inputrows=1240
>> >
>> > This is weird huh? So setting the caching to 1241 in this case
>> > aborts the scan silently. Removing the stoprow yields the same
>> > amount. Setting the caching to 1 with no stoprow yields all rows
>> > (several hundreds of thousands).
>> >
>> > Neither the client nor the regionserver logs any warning
>> > whatsoever. I had hbase.client.scanner.max.result.size set to
>> > 90100100. After removing this property it all works like a
>> > charm!!! All rows are properly inputted, regardless of
>> > hbase.client.scanner.caching. As an extra verification I checked
>> > the regionserver for the responseTooLarge warnings I would
>> > expect once the property is removed, and indeed they show up:
>> >
>> > 2012-07-25 11:46:52,889 WARN org.apache.hadoop.ipc.HBaseServer:
>> > IPC Server handler 8 on 60020, responseTooLarge for:
>> > next(-1937592840574159040, 10000) from x.x.x.x:39398: Size: 338.1m
>> > 2012-07-25 11:47:14,359 WARN org.apache.hadoop.ipc.HBaseServer:
>> > IPC Server handler 9 on 60020, responseTooLarge for:
>> > next(-1937592840574159040, 10000) from x.x.x.x:39407: Size: 186.6m
>> >
>> > So, anyone know what this could be? I am willing to debug this
>> > behaviour at the regionserver level, but before I do I want to
>> > make sure I am not running into something that has already been
>> > solved. This is on hbase-0.90.6-cdh3u4, using snappy.
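>> >
>> > (For reference, the property only lived in the server's
>> > hbase-site.xml here. If you do want the limit, the client-side
>> > equivalent would be roughly the following, where "mytable" is
>> > just a placeholder:
>> >
>> >     Configuration conf = HBaseConfiguration.create();
>> >     // Mirror the server's limit so ClientScanner's byte
>> >     // accounting stops at the same point the region server does.
>> >     conf.setLong("hbase.client.scanner.max.result.size", 90100100L);
>> >     HTable table = new HTable(conf, "mytable");
>> >
>> > That way both sides agree on when a batch is full.)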