From: Marc Sturm
To: "user@hbase.apache.org"
Subject: question about composite rowKey and performance difference between getScanner() and get(Get[])
Date: Wed, 3 Dec 2014 20:31:20 +0000
Message-ID: <5E77A44B25C5AB439A6894E4BD7FDE5E300ABCA3@NYSGMBXA03.a.wcmc-ad.net>

Hi,

I have a many-to-many relationship that I am trying to model in HBase, and I want to be sure I am not missing anything, so please let me know or point me to the right documentation.

Let's say I have an A-to-B many-to-many relationship. The query takes A's unique id as its parameter and returns all the B unique ids related to A, with their properties and values.

The first solution I found uses two tables. In the first table, the rowKey is A's unique id and the column identifiers are the unique ids of the Bs related to A. In the second table, the rowKeys are B unique ids and the columns contain the property values. So the query takes two steps: it first does a get on A to collect all the B unique ids, and then does a second get on B, passing an array of B rowkeys as the parameter.
When I run the second query, latency can be much higher on the first run and then good (low) on subsequent runs with the same parameter. I believe that's a caching issue...

The second solution is a single table with a composite rowkey equal to A unique id + B unique id; I will then have duplicate rows for each B unique id. But when I do a scan on just the first part of the rowKey (A unique id), the response time is more consistent and lower.

So, my questions are threefold:

1) Which way is the best?
2) What is the performance difference between a scan and a get with multiple rowkeys? (I think the scan is faster because the data is not, or is less, "distributed".)
3) How can we make the get with multiple rowkeys more consistent?

Thank you for your help,
Marc
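
For illustration, the composite-rowkey layout and the scan on the A part of the key can be sketched without the HBase client at all. The sketch below models a table as a list of byte-ordered keys (HBase sorts rowkeys lexicographically by byte) and selects one contiguous key range for a given A id. Everything named here (`composite_key`, `prefix_stop_row`, `prefix_scan`, the `SEP` separator byte) is hypothetical and not part of any HBase API; the separator in particular is an assumption added so that A id "a1" does not also match "a10", and fixed-width ids would serve the same purpose.

```python
from bisect import bisect_left

# Assumed separator between the A id and the B id. It must never occur
# inside an A id; this is an illustration choice, not an HBase requirement.
SEP = b"\x00"


def composite_key(a_id: bytes, b_id: bytes) -> bytes:
    """Build the composite rowkey: A unique id + separator + B unique id."""
    return a_id + SEP + b_id


def prefix_stop_row(prefix: bytes) -> bytes:
    """Return the smallest key that sorts after every key starting with prefix.

    An empty result means "no upper bound" (scan to the end of the table).
    """
    trimmed = prefix.rstrip(b"\xff")  # trailing 0xff bytes cannot be incremented
    if not trimmed:
        return b""
    return trimmed[:-1] + bytes([trimmed[-1] + 1])


def prefix_scan(sorted_keys, rows, prefix):
    """Return (key, value) pairs for all keys that start with prefix.

    sorted_keys must be in byte order, which is how a sorted store lays
    rows out; the matching rows therefore form one contiguous slice.
    """
    start = bisect_left(sorted_keys, prefix)
    stop_row = prefix_stop_row(prefix)
    stop = bisect_left(sorted_keys, stop_row) if stop_row else len(sorted_keys)
    return [(key, rows[key]) for key in sorted_keys[start:stop]]


if __name__ == "__main__":
    # Toy data: B properties are duplicated under every A they relate to.
    rows = {
        composite_key(b"a1", b"b1"): {"color": "red"},
        composite_key(b"a1", b"b7"): {"color": "blue"},
        composite_key(b"a10", b"b3"): {"color": "green"},
        composite_key(b"a2", b"b1"): {"color": "red"},
    }
    sorted_keys = sorted(rows)
    for key, props in prefix_scan(sorted_keys, rows, b"a1" + SEP):
        print(key, props)
```

In this model the scan touches one contiguous range of keys, whereas the two-table approach looks up each B rowkey separately, and those keys may be scattered across the key space. That matches the intuition in question 2, although the sketch says nothing about actual HBase latencies or caching.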