From: Rupinder Singh <rsingh@care.com>
To: "user@hive.apache.org"
CC: "user@hbase.apache.org"
Subject: RE: Very poor read performance with composite keys in hbase
Date: Tue, 30 Apr 2013 18:03:15 +0000
Content-Type: multipart/alternative; boundary="_000_FCB1703CBF9D9F4E930F44680631697E3F6ED607S1P5DAG1AEXCHPR_"
MIME-Version: 1.0

--_000_FCB1703CBF9D9F4E930F44680631697E3F6ED607S1P5DAG1AEXCHPR_
Content-Type: text/plain; charset="us-ascii"

Here it is:

select * from event where key.name='Signup' and key.dateCreated='2013-03-06 16:39:55.353' and key.uid='7af4c330-5988-4255-9250-924ce5864e3bf';

From: kulkarni.swarnim@gmail.com [mailto:kulkarni.swarnim@gmail.com]
Sent: Tuesday, April 30, 2013 11:25 PM
To: user@hive.apache.org
Cc: user@hbase.apache.org
Subject: Re: Very poor read performance with composite keys in hbase

Can you show your query that is taking 700 seconds?

On Tue, Apr 30, 2013 at 12:48 PM, Rupinder Singh <rsingh@care.com> wrote:

Hi,

I have an hbase cluster where I have a table with a composite key. I map this table to a Hive external table using which I insert/select data into/from this table:

CREATE EXTERNAL TABLE event(key struct<name:string,dateCreated:string,uid:string>, {more columns here})
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY '~'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key, other columns ")
TBLPROPERTIES ("hbase.table.name" = "event");

The table has about 10 million rows. When I do a select * using all 3 components of the key, essentially selecting just 1 row, the response time is almost 700 sec, which seems pretty bad.

For comparison purpose, I created another table with a simple string key, and the rest of the columns etc same. The key is a string UUID. Table has same number of column families and same number of rows.

CREATE EXTERNAL TABLE test_event(key string, blah blah.....
TBLPROPERTIES ("hbase.table.name" = "test_event");

When I select a single row from this table by doing select * where key='something', the response time is 35 sec.

This seems to indicate that in case of composite keys, there is a full table scan happening. This seems weird.

What am I missing here? Is there something special I need to do to get good read performance if I am using composite keys ?

Insert performance in both cases is comparable and is as per expectation.

Any help is appreciated.

Here is the env spec:

Amazon EMR
Hbase Cluster- 3 core nodes with 7.5 GB RAM each, 2 CPUs of 2.2 GHz each. Master 7.5 GB RAM, 2 CPUs of 2.2 GHz each
Hive Cluster - 3 core nodes 3.75 GB RAM each, 1 CPU of 1.8 GHz. Master 3.75 GB RAM, 1 CPU of 1.8 GHz

Thanks
Rupinder

This email is intended for the person(s) to whom it is addressed and may contain information that is PRIVILEGED or CONFIDENTIAL. Any unauthorized use, distribution, copying, or disclosure by any person other than the addressee(s) is strictly prohibited. If you have received this email in error, please notify the sender immediately by return email and delete the message and any attachments from your system.

--
Swarnim

--_000_FCB1703CBF9D9F4E930F44680631697E3F6ED607S1P5DAG1AEXCHPR_
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

--_000_FCB1703CBF9D9F4E930F44680631697E3F6ED607S1P5DAG1AEXCHPR_--
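The thread does not confirm why the composite-key query is slow; the full-table-scan explanation is the poster's own guess. As a rough illustration of that guess only, here is a small Python sketch (plain standard library, not the Hive or HBase API). It assumes the three struct fields are stored as a single row key joined by '~', as the COLLECTION ITEMS TERMINATED BY '~' clause in the message suggests, and it uses a sorted Python list as a stand-in for HBase's sorted row keys. All function names are hypothetical.

```python
import bisect

DELIM = "~"  # assumed to match COLLECTION ITEMS TERMINATED BY '~'


def make_row_key(name, date_created, uid):
    """Join the three struct fields into one row-key string (assumed layout)."""
    return DELIM.join((name, date_created, uid))


def split_row_key(row_key):
    """Split a row key back into (name, dateCreated, uid).

    Assumes none of the fields contains the delimiter itself.
    """
    name, date_created, uid = row_key.split(DELIM)
    return name, date_created, uid


def full_scan(sorted_keys, name, date_created, uid):
    """Filter on the individual fields: every key is decoded and compared."""
    examined = 0
    hits = []
    for key in sorted_keys:
        examined += 1
        if split_row_key(key) == (name, date_created, uid):
            hits.append(key)
    return hits, examined


def point_lookup(sorted_keys, name, date_created, uid):
    """Rebuild the complete key first, then binary-search the sorted keys."""
    target = make_row_key(name, date_created, uid)
    i = bisect.bisect_left(sorted_keys, target)
    if i < len(sorted_keys) and sorted_keys[i] == target:
        return [target]
    return []


if __name__ == "__main__":
    # Synthetic keys for illustration; not data from the thread.
    when = "2013-03-06 16:39:55.353"
    keys = sorted(make_row_key("Signup", when, f"uid-{i:05d}") for i in range(1000))
    hits, examined = full_scan(keys, "Signup", when, "uid-00042")
    print("scan hits:", hits, "keys examined:", examined)
    print("lookup hits:", point_lookup(keys, "Signup", when, "uid-00042"))
```

Both functions return the same row, but the scan touches every key while the lookup goes straight to one. If the field-level predicates (key.name, key.dateCreated, key.uid) are evaluated only after each row is deserialized, the query would behave like full_scan, which would be consistent with the 700 sec versus 35 sec gap reported in the message.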