From: Andrew Purtell
Date: Fri, 30 Sep 2011 09:49:34 -0700 (PDT)
Subject: Re: Hbase-Hive integration performance issues
To: "dev@hbase.apache.org", HBase User

I believe this is the latest status:

    https://issues.apache.org/jira/browse/HIVE-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Suggest following up to dev@hive.apache.org and/or user@hive.apache.org.

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back.
- Piet Hein (via Tom White)


>________________________________
>From: Matthew Tovbin
>To: HBase User
>Cc: Hbase Dev
>Sent: Friday, September 30, 2011 5:49 AM
>Subject: Re: Hbase-Hive integration performance issues
>
>Hello guys,
>
>Any updates on the issue? Anyone?! ;))
>
>Best regards,
>  Matthew Tovbin =)
>
>On Tue, Sep 20, 2011 at 09:41, Matthew Tovbin wrote:
>
>>  Thanks Jean and Sandy.
>>
>>  I have hive 0.7.1, and according to this patch
>> https://issues.apache.org/jira/browse/HIVE-1226 at least exact match
>> queries like "...where id = '12345'-123' " or partial pushdown "...where id
>> like "12345%" should work, but I didn't notice it.
>>
>> Matthew.
>>
>> On Mon, Sep 19, 2011 at 20:37, Sandy Pratt wrote:
>>
>>> I suffered the same letdown a little while ago. I believe this is the
>>> relevant JIRA:
>>>
>>> https://issues.apache.org/jira/browse/HIVE-1643
>>>
>>> I'd also like to see Hive be able to limit scans to particular HBase
>>> version ranges, but I don't know if that's even planned.
>>>
>>> Sandy
>>>
>>> > -----Original Message-----
>>> > From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of
>>> > Jean-Daniel Cryans
>>> > Sent: Monday, September 19, 2011 09:58
>>> > To: user@hbase.apache.org
>>> > Subject: Re: Hbase-Hive integration performance issues
>>> >
>>> > (replying to user@, dev@ in BCC)
>>> >
>>> > AFAIK the HBase handler doesn't have the wits to understand that you are
>>> > doing a prefix scan and thus limit the scan to only the required rows.
>>> > There's a bunch of optimizations like that that need to be done.
>>> >
>>> > I'm pretty sure Pig does the same thing, but don't take my word on it.
>>> >
>>> > J-D
>>> >
>>> > On Sun, Sep 18, 2011 at 4:12 AM, Matthew Tovbin wrote:
>>> > > Hi guys,
>>> > >
>>> > > I've got a table in Hbase, let's say "tbl", and I would like to query it
>>> > > using Hive. Therefore I mapped the table to Hive as follows:
>>> > >
>>> > > CREATE EXTERNAL TABLE tbl(id string, data map) STORED
>>> > > BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>>> > > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,data:")
>>> > > TBLPROPERTIES("hbase.table.name" = "tbl");
>>> > >
>>> > > Queries like "select * from tbl", "select id from tbl", "select id,
>>> > > data from tbl" are really fast.
>>> > > But queries like "select id from tbl where substr(id, 0, 5) = "12345""
>>> > > or "select id from tbl where data["777"] IS NOT NULL" are incredibly
>>> > > slow.
>>> > >
>>> > > By contrast, when running from the Hbase shell: "scan 'tbl', {
>>> > > COLUMNS=>'data', STARTROW=>'12345', ENDROW=>'12346'}" or "scan 'tbl', {
>>> > > COLUMNS=>'data', "FILTER" =>
>>> > > FilterList.new([qualifierFilter('777')])}"
>>> > > it is lightning fast!
>>> > >
>>> > > When I looked into the mapred job generated by Hive on the jobtracker I
>>> > > discovered that "map.input.records" counts ALL the items in the Hbase
>>> > > table, meaning the job makes a full table scan before it even starts
>>> > > any mappers!!
>>> > > Moreover, I suspect it copies all the data from the Hbase table to the
>>> > > HDFS mapper tmp input folder before execution.
>>> > >
>>> > > So, my questions are: Why doesn't the HBase storage handler for Hive
>>> > > translate Hive queries into the appropriate HBase functions? Why does it
>>> > > scan all the records and then slice them using the "where" clause? How
>>> > > can it be improved?
>>> > > Is Pig's integration better in this case?
>>> > >
>>> > >
>>> > > Some additional information about the tables:
>>> > > Table description in Hbase:
>>> > > jruby-1.6.2 :011 >  describe 'tbl'
>>> > > DESCRIPTION
>>> > >  {NAME => 'users', FAMILIES => [{NAME => 'data', BLOOMFILTER => 'ROWCOL',
>>> > >  REPLICATION_SCOPE => '0', COMPRESSION => 'LZO', VERSIONS => '3',
>>> > >  TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
>>> > >  BLOCKCACHE => 'true'}]}
>>> > > ENABLED
>>> > >  true
>>> > >
>>> > > Table description in Hive:
>>> > > hive> describe tbl;
>>> > > OK
>>> > > id string from deserializer
>>> > > data map from deserializer
>>> > > Time taken: 0.08 seconds
>>> > >
>>> > > Best regards,
>>> > >  Matthew Tovbin =)
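J-D's explanation of the missing prefix-scan optimization can be made concrete. Below is a minimal Python sketch of the idea; it is not code from Hive's HBaseStorageHandler or from HBase, and every function name in it is hypothetical. It turns a row-key predicate such as id LIKE '12345%' into the same start/stop pair Matthew passes to the HBase shell (STARTROW '12345', ENDROW '12346'), then simulates a range scan by binary-searching to the start key instead of reading every row. A sorted Python list stands in for the HBase table, whose rows are kept in lexicographic byte order.

```python
import bisect
from typing import List, Optional, Tuple


def prefix_stop_row(prefix: bytes) -> Optional[bytes]:
    """Return the smallest row key that sorts after every key starting with
    `prefix`, or None if no such bound exists (scan to the end of the table).

    Trailing 0xFF bytes cannot be incremented, so they are dropped first;
    then the last remaining byte is incremented by one.
    """
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return None
    return trimmed[:-1] + bytes([trimmed[-1] + 1])


def like_prefix_to_scan_range(pattern: str) -> Optional[Tuple[bytes, Optional[bytes]]]:
    """Map a row-key predicate of the form LIKE 'abc%' to a [start, stop) range.

    Returns None when the pattern is not a plain prefix match, in which case
    a (hypothetical) handler would have to fall back to a full table scan.
    """
    if not pattern.endswith("%"):
        return None
    prefix = pattern[:-1]
    # Wildcards or escapes inside the prefix make it more than a simple range.
    if not prefix or any(ch in prefix for ch in "%_\\"):
        return None
    start = prefix.encode("utf-8")
    return start, prefix_stop_row(start)


def range_scan(sorted_keys: List[bytes], start: bytes,
               stop: Optional[bytes]) -> List[bytes]:
    """Simulate a scan over lexicographically sorted row keys: binary-search
    to `start` and stop before `stop`, touching no rows outside the range."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = len(sorted_keys) if stop is None else bisect.bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


if __name__ == "__main__":
    keys = sorted([b"9", b"12346", b"12345", b"123456", b"12344-9", b"12345-123"])
    start, stop = like_prefix_to_scan_range("12345%")
    print(start, stop)                    # b'12345' b'12346'
    print(range_scan(keys, start, stop))  # [b'12345', b'12345-123', b'123456']
```

Only key predicates map onto a scan range this way; a predicate like data["777"] IS NOT NULL would instead need a column or qualifier filter, as in Matthew's second shell command.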