Date: Sat, 30 Mar 2013 09:37:45 -0700
Subject: Re: Understanding scan behaviour
From: Ted Yu
To: "user@hbase.apache.org"

See javadoc of Scan:

   * @param stopRow row to stop scanner before (exclusive)
   */
  public Scan(byte [] startRow, byte [] stopRow) {

On Sat, Mar 30, 2013 at 8:25 AM, Mohit Anchlia wrote:

> Thanks, that's a good point about last byte being max :)
>
> When I query 1234555..1234556 do I also get row for 1234556 if one exists?
>
> On Sat, Mar 30, 2013 at 6:55 AM, Asaf Mesika wrote:
>
> > Yes.
> > Watch out for last byte being max
> >
> > On Fri, Mar 29, 2013 at 7:31 PM, Mohit Anchlia wrote:
> >
> > > Thanks everyone, it's really helpful. I'll change my prefix filter to end
> > > row. Is it necessary to increment the last byte? So if I have hash of
> > > 1234555 my end key should be 1234556?
> > >
> > > On Thu, Mar 28, 2013 at 11:20 PM, ramkrishna vasudevan <ramkrishna.s.vasudevan@gmail.com> wrote:
> > >
> > > > Mohit,
> > > >
> > > > It is always better to go with start row and end row if you know
> > > > what they are.
> > > > Just add one byte more to the actual end row (inclusive row) and form the
> > > > end key. This will narrow down the search.
> > > >
> > > > Remember the byte comparison is the way that HBase scans.
> > > > Regards
> > > > Ram
> > > >
> > > > On Fri, Mar 29, 2013 at 11:18 AM, Li, Min wrote:
> > > >
> > > > > Hi, Mohit,
> > > > >
> > > > > Try using ENDROW. STARTROW&ENDROW is much faster than PrefixFilter.
> > > > >
> > > > > "+" ascii code is 43
> > > > > "," ascii code is 44
> > > > >
> > > > > scan 'SESSIONID_TIMELINE', {LIMIT => 1,STARTROW => '++++', ENDROW=>'+++,'}
> > > > >
> > > > > Min
> > > > >
> > > > > -----Original Message-----
> > > > > From: Mohit Anchlia [mailto:mohitanchlia@gmail.com]
> > > > > Sent: Friday, March 29, 2013 1:18 AM
> > > > > To: user@hbase.apache.org
> > > > > Subject: Re: Understanding scan behaviour
> > > > >
> > > > > Could the prefix filter lead to a full table scan? In other words, is
> > > > > PrefixFilter applied after fetching the rows?
> > > > >
> > > > > Another question I have is say I have row key abc and abd and I search for
> > > > > row "abc", is it always guaranteed to be the first key when returned from
> > > > > scanned results? If so I can always put a condition in the client app.
> > > > >
> > > > > On Thu, Mar 28, 2013 at 9:15 AM, Ted Yu wrote:
> > > > >
> > > > > > Take a look at the following in
> > > > > > hbase-server/src/main/ruby/shell/commands/scan.rb
> > > > > > (trunk)
> > > > > >
> > > > > > hbase> scan 't1', {FILTER => "(PrefixFilter ('row2') AND
> > > > > > (QualifierFilter (>=, 'binary:xyz'))) AND (TimestampsFilter ( 123, 456))"}
> > > > > >
> > > > > > Cheers
> > > > > >
> > > > > > On Thu, Mar 28, 2013 at 9:02 AM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:
> > > > > >
> > > > > > > I see then I misunderstood the behaviour. My keys are id + timestamp so
> > > > > > > that I can do a range type search. So what I really want is to return a
> > > > > > > row where id matches the prefix. Is there a way to do this without having
> > > > > > > to scan large amounts of data?
> > > > > > >
> > > > > > > On Thu, Mar 28, 2013 at 8:26 AM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:
> > > > > > >
> > > > > > > > Hi Mohit,
> > > > > > > >
> > > > > > > > "+" ascii code is 43
> > > > > > > > "9" ascii code is 57.
> > > > > > > >
> > > > > > > > So "+9" is coming after "++". If you don't have any row with the exact
> > > > > > > > key "+++++", HBase will look for the first one after this one. And in
> > > > > > > > your case, it's +9hC\xFC\x82s\xABL3\xB3B\xC0\xF9\x87\x03\x7F\xFF\xF.
> > > > > > > >
> > > > > > > > JM
> > > > > > > >
> > > > > > > > 2013/3/28 Mohit Anchlia :
> > > > > > > > > My understanding is that the row key would start with +++++ for instance.
> > > > > > > > >
> > > > > > > > > On Thu, Mar 28, 2013 at 7:53 AM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:
> > > > > > > > >
> > > > > > > > >> Hi Mohit,
> > > > > > > > >>
> > > > > > > > >> I see nothing wrong with the results below. What would you have expected?
> > > > > > > > >>
> > > > > > > > >> JM
> > > > > > > > >>
> > > > > > > > >> 2013/3/28 Mohit Anchlia :
> > > > > > > > >> > I am running 92.1 version and this is what happens.
> > > > > > > > >> >
> > > > > > > > >> > hbase(main):003:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW => 'sdw0'}
> > > > > > > > >> > ROW                    COLUMN+CELL
> > > > > > > > >> >  s\xC1\xEAR\xDF\xEA&\x89\x91\xFF\x1A^\xB6d\xF0\xEC\x column=SID_T_MTX:\x00\x00Rc, timestamp=1363056261106, value=PAGE\x09\x091363056252990\x09\x09/
> > > > > > > > >> >  7F\xFF\xFE\xC2\xA3\x84Z\x7F
> > > > > > > > >> > 1 row(s) in 0.0450 seconds
> > > > > > > > >> > hbase(main):004:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW => '------'}
> > > > > > > > >> > ROW                    COLUMN+CELL
> > > > > > > > >> >  -\xA1\xAF>r\xBD\xE2L\x00\xCD*\xD7\xE8\xD6\x1Dk\x7F\ column=SID_T_MTX:\x00\x00hF, timestamp=1363384706714, value=PAGE\x09239923973\x091363384698919\x09/
> > > > > > > > >> >  xFF\xFE\xC2\x8F\xF0\xC1\xBF
> > > > > > > > >> > row(s) in 0.0500 seconds
> > > > > > > > >> > hbase(main):005:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW => '++++'}
> > > > > > > > >> > ROW                    COLUMN+CELL
> > > > > > > > >> >  +9hC\xFC\x82s\xABL3\xB3B\xC0\xF9\x87\x03\x7F\xFF\xF column=SID_T_MTX:\x00\x00<2, timestamp=1364404155426, value=PAGE\x09\x091364404145275\x09 \x09/
> > > > > > > > >> >  E\xC2S-\x08\x1F
> > > > > > > > >> > 1 row(s) in 0.0640 seconds
> > > > > > > > >> > hbase(main):006:0>
> > > > > > > > >> >
> > > > > > > > >> > On Wed, Mar 27, 2013 at 9:23 PM, ramkrishna vasudevan <ramkrishna.s.vasudevan@gmail.com> wrote:
> > > > > > > > >> >
> > > > > > > > >> >> Same question, same time :)
> > > > > > > > >> >>
> > > > > > > > >> >> Regards
> > > > > > > > >> >> Ram
> > > > > > > > >> >>
> > > > > > > > >> >> On Thu, Mar 28, 2013 at 9:53 AM, ramkrishna vasudevan <ramkrishna.s.vasudevan@gmail.com> wrote:
> > > > > > > > >> >>
> > > > > > > > >> >> > Could you give us some more insights on this?
> > > > > > > > >> >> > So you mean when you set the row key as 'azzzaaa', though this row does
> > > > > > > > >> >> > not exist, the scanner returns some other row? Or it is giving you a row
> > > > > > > > >> >> > that does not exist?
> > > > > > > > >> >> >
> > > > > > > > >> >> > Or you mean it is doing a full table scan?
> > > > > > > > >> >> >
> > > > > > > > >> >> > Which version of HBase and what type of filters are you using?
> > > > > > > > >> >> > Regards
> > > > > > > > >> >> > Ram
> > > > > > > > >> >> >
> > > > > > > > >> >> > On Thu, Mar 28, 2013 at 9:45 AM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:
> > > > > > > > >> >> >
> > > > > > > > >> >> >> I have key in the form of "hashedid + timestamp" but when I run scan I get
> > > > > > > > >> >> >> rows for almost every value. For instance if I run scan for 'azzzaaa' that
> > > > > > > > >> >> >> doesn't even exist even then I get the results.
> > > > > > > > >> >> >>
> > > > > > > > >> >> >> Could someone help me understand what might be going on here?
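
The stop-row advice in this thread (STARTROW plus an exclusive ENDROW, increment the last byte, watch out for a last byte that is already at its maximum) can be sketched in plain Java. This is only an illustration under assumptions: PrefixStopRow and stopRowForPrefix are hypothetical names, not part of the HBase API, and the code uses only the JDK. It assumes row keys compare as unsigned bytes, as Ram notes above, and that an empty stop row is treated as "no upper bound".

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper (not an HBase class): derives the exclusive stop row
// for "all keys starting with this prefix".
final class PrefixStopRow {

    /**
     * Returns the smallest key that sorts after every key beginning with
     * prefix, under unsigned byte-wise ordering. Returns an empty array when
     * no finite bound exists (empty prefix, or every byte is 0xFF).
     */
    static byte[] stopRowForPrefix(byte[] prefix) {
        // Trailing 0xFF bytes cannot be incremented, so drop them first.
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            // Assumption: an empty stop row means "scan to the end".
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // this byte is below 0xFF, so no overflow
        return stop;
    }

    public static void main(String[] args) {
        byte[] start = "++++".getBytes(StandardCharsets.US_ASCII);
        byte[] stop = stopRowForPrefix(start);
        // '+' is 43 and ',' is 44, so this prints "+++,"
        System.out.println(new String(stop, StandardCharsets.US_ASCII));
    }
}
```

The start key and the computed stop key would then go to the Scan(byte [] startRow, byte [] stopRow) constructor quoted at the top of this thread, or to STARTROW and ENDROW in the shell. Because the stop row is exclusive, a row whose key is exactly 1234556 is not returned by a 1234555..1234556 scan.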