Date: Thu, 13 Sep 2012 15:35:25 +0100
Subject: Re: Hbase Scan - number of columns make the query performance way different
From: Shengjie Min
To: user@hbase.apache.org

In my case, I am not feeding hbase result to mapred, it's just pure hbase
scan, returning all columns vs two columns makes huge difference to me.

On 13 September 2012 15:29, Doug Meil wrote:

>
> Hi there, I don't know the specifics of your environment, but ...
>
> http://hbase.apache.org/book.html#perf.reading
> 11.8.2. Scan Attribute Selection
>
> ... describes paying attention to the number of columns you are returning,
> particularly when using HBase as a MR source. In short, returning only
> the columns you need means you are reducing the data transferred between
> the RS and the client and the number of KV's evaluated in the RS, etc.
>
>
> On 9/13/12 10:12 AM, "Shengjie Min" wrote:
>
> >Hi,
> >
> >I found an interesting difference between hbase scan query.
> >
> >I have a hbase table which has a lot of columns in a single column family.
> >eg. let's say I have a users table, then userid, username, email .... etc
> >etc 15 fields all together are in the single columnFamily.
> >
> >if you are familiar with RDBMS,
> >
> >query 1: select * from users
> >vs
> >query 2: select userid, username from users
> >
> >in mysql, these two has a difference, the query 2 will be obviously faster,
> >but two queries won't give you a huge difference from performance
> >perspective.
> >
> >In Hbase, I noticed that:
> >
> >query 3: scan 'users', // this is basically return me all 15 fields
> >vs
> >query 4: scan 'users', {COLUMNS=>['cf:userid','cf:username']} // this is return me only two fields: userid , username
> >
> >query 3 here takes way longer than query 4, Given a big data set. In my
> >test, I have around 1,000,000 user records. You are talking about query 3 -
> >100 secs VS query 4 - a few secs.
> >
> >
> >Can anybody explain to me, why the width of the resultset in HBASE can
> >impact the performance that much?
> >
> >
> >Shengjie Min
>

-- 
All the best,
Shengjie Min
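
To put a rough number on the "data transferred between the RS and the client" point quoted above, here is a minimal Python sketch that estimates serialized bytes per row for a 15-column scan versus a 2-column scan. It assumes the classic HBase KeyValue layout without tags, in which every cell repeats the row key, family, qualifier, timestamp and type. The row key, the 13 extra qualifier names and the 20-byte values are hypothetical, chosen only for illustration. It models payload size only and says nothing about RPC round trips, scanner caching or block reads, so it is not meant to reproduce the timings reported in this thread.

```python
# Rough per-row payload estimate for a wide scan vs. a column-restricted scan.
# Assumed layout (classic KeyValue, no tags):
#   4-byte key length + 4-byte value length
#   key = 2-byte row length + row + 1-byte family length + family
#         + qualifier + 8-byte timestamp + 1-byte type
#   followed by the value bytes.

def keyvalue_size(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Serialized size in bytes of one cell under the assumed layout."""
    key_len = 2 + len(row) + 1 + len(family) + len(qualifier) + 8 + 1
    return 4 + 4 + key_len + len(value)


def row_size(row: bytes, family: bytes, cells: dict, columns=None) -> int:
    """Total size of the cells a scan would return for one row.

    cells maps qualifier -> value; columns, if given, restricts the result
    to those qualifiers (analogous to COLUMNS => [...] in the shell).
    """
    if columns is None:
        selected = cells
    else:
        selected = {q: cells[q] for q in columns if q in cells}
    return sum(keyvalue_size(row, family, q, v) for q, v in selected.items())


# Hypothetical row: 15 fields in one column family, each value 20 bytes long.
ROW = b"user0000001"
FAMILY = b"cf"
CELLS = {b"userid": b"x" * 20, b"username": b"x" * 20}
for i in range(13):
    CELLS[b"field%02d" % i] = b"x" * 20

full_bytes = row_size(ROW, FAMILY, CELLS)                              # like query 3
narrow_bytes = row_size(ROW, FAMILY, CELLS, [b"userid", b"username"])  # like query 4

if __name__ == "__main__":
    print("all columns :", full_bytes, "bytes per row")
    print("two columns :", narrow_bytes, "bytes per row")
    print("ratio       :", full_bytes / narrow_bytes)
```

Because the row key and family name are carried in every cell, the per-row payload under these assumptions grows roughly in proportion to the number of columns returned, and so does the number of KVs the RS has to evaluate.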