Subject: Re: Using Scans in parallel
From: Bryan Keller
Date: Sun, 9 Oct 2011 11:03:39 -0700
To: Bryan Keller
Cc: user@hbase.apache.org

On further thought, it seems this might be a serious issue, as two unrelated processes within an application may be scanning the same table at the same time.

On Oct 9, 2011, at 10:59 AM, Bryan Keller wrote:

> I was not able to get consistent results using multiple scanners in parallel on a table. I implemented a counter test that used 8 scanners in parallel on a table with 2m rows with 2k+ columns each, and the results were not consistent. There were no errors thrown, but the count was off by as much as 2%. Using a single thread gave the same (correct) result every run.
>
> I tried various approaches, such as creating an HTable and opening a connection per thread, but I was not able to get stable results.
> I would do some testing before using parallel scanners as described here.
>
>
> On Oct 5, 2011, at 10:11 PM, lars hofhansl wrote:
>
>> That's part of it; the other part is to get the region demarcations.
>> You can also just get the smallest and largest key of the table and pick other demarcations for your scans. Then your individual scans will likely cover multiple regions and regionservers.
>>
>> Your threading model depends on your needs. If you're interested in lowest latency, you want to keep your regionservers busy for each query.
>> What exactly that means depends on your setup. Maybe you split up the overall scan so that no more than N scans are active at any regionserver.
>>
>> If you're more interested in overall predictability, you might not want to parallelize each scan too much.
>>
>> ----- Original Message -----
>> From: Sam Seigal
>> To: user@hbase.apache.org; lars hofhansl
>> Cc: "hbase-user@hadoop.apache.org"
>> Sent: Wednesday, October 5, 2011 6:18 PM
>> Subject: Re: Using Scans in parallel
>>
>> So the whole point of getting the region locations is to ensure that there is one thread per region server?
>>
>> On Wed, Oct 5, 2011 at 4:42 PM, lars hofhansl wrote:
>>> Hi Sam,
>>>
>>> There were some attempts to build this in. In the end I think the exact patterns are different based on what one is trying to achieve.
>>> Currently what you can do is get all the region locations (HTable.getRegionLocations). From the HRegionInfos you can get the regions' start and end keys.
>>> Now you can issue parallel scans for as many regions as you want (by creating a Scan object with its start and stop rows set to the region's start and end keys).
>>> You probably want to group the regions by regionserver and have one thread per region server, or something.
>>>
>>> -- Lars
>>> ________________________________
>>> From: Sam Seigal
>>> To: hbase-user@hadoop.apache.org
>>> Sent: Wednesday, October 5, 2011 4:29 PM
>>> Subject: Using Scans in parallel
>>>
>>> Hi,
>>>
>>> Is there a known way to do Scans in parallel (in different threads even) and then sort/combine the output?
>>>
>>> For a row key like:
>>>
>>> prefix-event_type-event_id
>>> prefix-event_type-event_id
>>>
>>> I want to declare two scan objects (for say event_id_type foo):
>>>
>>> Scan 1 => 0-foo
>>> Scan 2 => 1-foo
>>>
>>> execute the scans in parallel (maybe even in different threads) and then merge the results?
>>>
>>> Thank you,
>>>
>>> Sam
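[Editor's note] For reference, below is a rough sketch of the approach Lars describes, written against the 0.90-era client API (HTable.getRegionLocations(), HRegionInfo start/end keys, one Scan per region, and one HTable per worker thread since HTable is not thread safe). The table name, pool size, and caching value are placeholders chosen for illustration, not anything from the thread, and given Bryan's report above the parallel total should be validated against a single-threaded scan before being trusted.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

/** Hypothetical sketch: one Scan per region, executed by a fixed thread pool. */
public class ParallelRegionCount {

    public static void main(String[] args) throws Exception {
        final Configuration conf = HBaseConfiguration.create();
        final String tableName = "mytable";   // placeholder table name

        // Fetch the region boundaries once, from a throwaway HTable instance.
        HTable meta = new HTable(conf, tableName);
        List<HRegionInfo> regions =
                new ArrayList<HRegionInfo>(meta.getRegionLocations().keySet());
        meta.close();

        ExecutorService pool = Executors.newFixedThreadPool(8);
        List<Future<Long>> counts = new ArrayList<Future<Long>>();

        for (final HRegionInfo region : regions) {
            counts.add(pool.submit(new Callable<Long>() {
                public Long call() throws Exception {
                    // HTable is not thread safe, so each worker opens its own.
                    HTable table = new HTable(conf, tableName);
                    try {
                        // Scan exactly this region's key range; an empty end key
                        // means "to the end of the table" for the last region.
                        Scan scan = new Scan(region.getStartKey(), region.getEndKey());
                        scan.setCaching(1000);   // placeholder caching value
                        ResultScanner scanner = table.getScanner(scan);
                        long rows = 0;
                        for (Result r : scanner) {
                            rows++;              // merge/aggregate per-range results here
                        }
                        scanner.close();
                        return rows;
                    } finally {
                        table.close();
                    }
                }
            }));
        }

        long total = 0;
        for (Future<Long> f : counts) {
            total += f.get();
        }
        pool.shutdown();

        // Per Bryan's report in this thread, compare this total against a
        // single-threaded scan before relying on the parallel version.
        System.out.println("Parallel row count: " + total);
    }
}

To group work by regionserver instead, as Lars suggests, the same region list could be bucketed by the server names in getRegionLocations() before submitting, with one task (or a bounded number of tasks) per server.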