Date: Thu, 9 Aug 2012 16:30:27 +0530
Subject: Re: multi-threaded HTablePool, incrementColumnValue, compaction and large data set
From: Sambit Tripathy
To: user@hbase.apache.org

So, did you have any success with this problem? You could try asynchbase, the HBase client used in OpenTSDB.

On Mon, Jan 16, 2012 at 6:46 AM, Neil Yalowitz wrote:

> I'm seeing something unusual here and wanted to check whether it has
> occurred for any other HBase 0.90 users. I've read several emails on this
> list that recommend NOT using multi-threading in an MR job, so that's
> certainly under consideration. If anyone could share their experiences
> with multi-threading in an MR job, it would be very helpful. We are
> testing both implementations (with threading and without), and it is the
> threaded solution that causes the problem.
>
> We are processing log files with Puts in the map phase and a follow-up
> incrementColumnValue() to a separate "counts" table in the reducer. The
> reduce phase uses multi-threading: the reducer initializes an HTablePool
> in setup(), starts threads in reduce() (via a Java
> BlockingQueue/CompletionService) which perform the incrementColumnValue()
> and, depending on the value returned, create a Put for the "counts"
> table, and in cleanup() performs a completionService.take() whose result
> is ignored, then flushes the Puts queued by the threads.
>
> There are no issues for approximately the first 100GB of data inserted.
> After approximately 100GB, however, every subsequent job freezes during
> the reduce phase.
> What I see happening: at some point the reduce tasks (where the
> incrementColumnValue() takes place) hang and are eventually killed with
> the reason "task client has not responded for 600 seconds". The counters
> in the reduce job grow briefly, but then all the tasks' counters stop
> increasing and the tasks are eventually killed.
>
> Oddly, the problem does not occur if compaction is completely disabled
> (not just major compaction, but also setting
> hbase.hstore.compactionThreshold = 9999999 and
> hbase.hstore.blockingStoreFiles = 9999999).
>
> Could there be a bug in HTablePool with large data sets and compaction?
> Again, this works as expected for approximately the first 100 jobs (1GB
> each) but consistently fails after that. Also, to repeat: the problem
> does not occur with ALL compaction disabled.
>
> This is a difficult problem to describe, but I'm hoping someone may have
> feedback and/or similar experiences. I can provide code examples if
> anyone is curious.
>
> Neil Yalowitz
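For readers following along: the threading pattern Neil describes (submit increment tasks through a CompletionService in reduce(), then drain the results in cleanup()) can be sketched with plain java.util.concurrent. This is only a skeleton under assumptions, not Neil's actual code; a hypothetical in-memory AtomicLong stands in for the HBase incrementColumnValue() call so the sketch is self-contained and runnable:

```java
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

// Skeleton of the reduce-phase threading pattern described in the thread.
// The real job would call HTable.incrementColumnValue() inside the task;
// here a hypothetical AtomicLong stands in for the "counts" table.
public class ReducerSketch {
    static final AtomicLong countsTable = new AtomicLong(); // stand-in for the HBase counter

    public static void main(String[] args) throws Exception {
        // setup(): create the worker pool (analogous to initializing HTablePool)
        ExecutorService pool = Executors.newFixedThreadPool(4);
        CompletionService<Long> completion = new ExecutorCompletionService<>(pool);

        // reduce(): submit one increment task per record
        int submitted = 0;
        for (int i = 0; i < 100; i++) {
            completion.submit(() -> countsTable.incrementAndGet()); // "incrementColumnValue"
            submitted++;
        }

        // cleanup(): take() every result (ignored), ensuring all increments finished
        for (int i = 0; i < submitted; i++) {
            completion.take().get();
        }
        pool.shutdown();
        System.out.println(countsTable.get()); // prints 100
    }
}
```

One detail worth checking in a real job: cleanup() must take() exactly as many results as were submitted, otherwise the task can block forever on take() and look exactly like the 600-second hang described above.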
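For anyone wanting to reproduce the "compaction fully disabled" configuration Neil mentions, the two settings go in hbase-site.xml. This is a debugging sketch only; pushing these limits to extreme values effectively disables compaction and store-file blocking and is not advisable in production:

```xml
<!-- hbase-site.xml: debugging sketch only, per the thread above.
     Extreme values effectively disable minor compaction and the
     store-file blocking limit; do not use in production. -->
<property>
  <name>hbase.hstore.compactionThreshold</name>
  <value>9999999</value>
</property>
<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>9999999</value>
</property>
```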