Mailing-List: contact user-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hbase.apache.org
Received-SPF: pass (nike.apache.org: domain of yuzhihong@gmail.com designates
 209.85.217.182 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAOT3TWp22s53Cyx0xH-14vhjcrJEpZh5WX4iuw+yCUDoZSg8MA@mail.gmail.com>
References: 
 <CAOT3TWrHOwkKAyutfm1m7pf=VxW0bcdJJNxpzh8BzP-1r0o6vw@mail.gmail.com>
	<CAAT7MkoAr7X=yt6uoBbvu_59-P-K0dy-0brWOkJPLCPK9PrbDw@mail.gmail.com>
	<CAAT7Mkqtor6vsqUy0pbg7hTi6F-AtSfbMKM14uCkXwySc7mbng@mail.gmail.com>
	<CAOT3TWo4Zgexu0cHizSLfOQLPhpxrOSRYTEZjAZc-ghxGMB91w@mail.gmail.com>
	<CAPQV63UJ-WnwfOmUQ-tNN6YMQE9tdBfpg9bwFFLuEnURnudZig@mail.gmail.com>
	<CAOT3TWp5DDpJ3-nh9szypkeFacdeZZD9jpPz8sV_QD1dXUnxiA@mail.gmail.com>
	<CAPQV63Uz7-v_P=+12nvgLSBHKNWnYHHBKnH2R+h2BTSgMqVPUw@mail.gmail.com>
	<CAOT3TWq_Go8dBpp4KdzRH7-qD1ytac=k=tSB-gzd2VSWjDEcgA@mail.gmail.com>
	<CALte62zWhJ+JJ_rReNkzWXBCoENzh1LxwyNmjmUY53nayD5cfw@mail.gmail.com>
	<CAOT3TWq9d422o-8bA6qC2_Y6c158jFCBM-hS_S6xLvftqHvp+w@mail.gmail.com>
	<9ABAF877128BEB4E8E8C7F8ADD9F30977B2A1F@PROD-EXCH-M3.corp.microstrategy.com>
	<CAAT7MkoNoqo12du-iRuYnnRPGnQpsRiko13rHff6sNREJtPKnw@mail.gmail.com>
	<CAOT3TWoOT6Eg3_5JM1GAnAs8BpL=h2Tztd9WwxQ8bquzV2qBXQ@mail.gmail.com>
	<CA+r7Yv=PK-TEA6Ea3LqADMQWEPBXcg5ez_4PFKnAY23ZSZw2pg@mail.gmail.com>
	<CAOT3TWp22s53Cyx0xH-14vhjcrJEpZh5WX4iuw+yCUDoZSg8MA@mail.gmail.com>
Date: Sat, 30 Mar 2013 09:37:45 -0700
Message-ID: 
 <CALte62yLSy9jh=7d=_GZfXDMCEp8c+eC1gx_XTGfQv+wiT+Ncw@mail.gmail.com>
Subject: Re: Understanding scan behaviour
From: Ted Yu <yuzhihong@gmail.com>
To: "user@hbase.apache.org" <user@hbase.apache.org>
Content-Type: multipart/alternative; boundary=e89a8f234659e98a1504d9270126

--e89a8f234659e98a1504d9270126
Content-Type: text/plain; charset=ISO-8859-1

The stop row is exclusive, so a scan of 1234555..1234556 does not return the
row 1234556. See the javadoc of Scan:

   * @param stopRow row to stop scanner before (exclusive)
   */
  public Scan(byte [] startRow, byte [] stopRow) {
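
In case it helps, here is a rough sketch of how an end key could be derived
from a prefix, including the "last byte being max" case mentioned earlier in
the thread. It is plain Java with no HBase dependency. The class and method
names (StopRowSketch, stopRowForPrefix, inRange) are made up for illustration
and are not part of the HBase API. It assumes row keys compare as unsigned
bytes in lexicographic order and that an empty stop row means "scan to the
end of the table".

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not an HBase class: models a [startRow, stopRow) scan range.
final class StopRowSketch {

    // Unsigned lexicographic comparison of two row keys.
    static int compareRows(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Smallest key that sorts after every key starting with the given prefix.
    // Trailing 0xFF bytes cannot be incremented, so they are dropped first.
    static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            // Prefix is empty or all 0xFF: no upper bound exists.
            // Assumption: an empty stop row stands for "end of table".
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++;
        return stop;
    }

    // True if row falls in [startRow, stopRow); the stop row itself is excluded.
    static boolean inRange(byte[] row, byte[] startRow, byte[] stopRow) {
        return compareRows(row, startRow) >= 0
            && (stopRow.length == 0 || compareRows(row, stopRow) < 0);
    }

    public static void main(String[] args) {
        byte[] start = "1234555".getBytes(StandardCharsets.US_ASCII);
        byte[] stop = stopRowForPrefix(start);
        System.out.println("stop row: " + new String(stop, StandardCharsets.US_ASCII));
        System.out.println("stop row returned? " + inRange(stop, start, stop));
    }
}
```

With a start row of 1234555 the computed stop row is 1234556, and that row
itself falls outside the range, which matches the "(exclusive)" in the
javadoc above.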


On Sat, Mar 30, 2013 at 8:25 AM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:

> Thanks, that's a good point about last byte being max :)
>
> When I query 1234555..1234556, do I also get the row for 1234556 if one exists?
>
> On Sat, Mar 30, 2013 at 6:55 AM, Asaf Mesika <asaf.mesika@gmail.com>
> wrote:
>
> > Yes.
> > Watch out for last byte being max
> >
> >
> > On Fri, Mar 29, 2013 at 7:31 PM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:
> >
> > > Thanks everyone, it's really helpful. I'll change my prefix filter to end
> > > row. Is it necessary to increment the last byte? So if I have a hash of
> > > 1234555 my end key should be 1234556?
> > >
> > >
> > > On Thu, Mar 28, 2013 at 11:20 PM, ramkrishna vasudevan <
> > > ramkrishna.s.vasudevan@gmail.com> wrote:
> > >
> > > > Mohit,
> > > >
> > > > It is always better to go with a start row and end row if you know
> > > > what they are.
> > > > Just add one more byte to the actual end row (the inclusive row) to form
> > > > the end key.  This will narrow down the search.
> > > >
> > > > Remember that HBase scans by byte comparison.
> > > > Regards
> > > > Ram
> > > >
> > > > On Fri, Mar 29, 2013 at 11:18 AM, Li, Min <mili@microstrategy.com> wrote:
> > > >
> > > > > Hi, Mohit,
> > > > >
> > > > > Try using ENDROW. STARTROW&ENDROW is much faster than PrefixFilter.
> > > > >
> > > > > "+" ascii code is 43
> > > > > "," ascii code is 44
> > > > >
> > > > > scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW => '++++', ENDROW => '+++,'}
> > > > >
> > > > > Min
> > > > >
> > > > > -----Original Message-----
> > > > > From: Mohit Anchlia [mailto:mohitanchlia@gmail.com]
> > > > > Sent: Friday, March 29, 2013 1:18 AM
> > > > > To: user@hbase.apache.org
> > > > > Subject: Re: Understanding scan behaviour
> > > > >
> > > > > Could the prefix filter lead to a full table scan? In other words, is
> > > > > PrefixFilter applied after fetching the rows?
> > > > >
> > > > > Another question I have: say I have row keys abc and abd and I search
> > > > > for row "abc". Is it always guaranteed to be the first key in the
> > > > > scan results? If so, I can always put a condition in the client app.
> > > > >
> > > > > On Thu, Mar 28, 2013 at 9:15 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > >
> > > > > > Take a look at the following in
> > > > > > hbase-server/src/main/ruby/shell/commands/scan.rb
> > > > > > (trunk)
> > > > > >
> > > > > >   hbase> scan 't1', {FILTER => "(PrefixFilter ('row2') AND
> > > > > >     (QualifierFilter (>=, 'binary:xyz'))) AND (TimestampsFilter (123, 456))"}
> > > > > >
> > > > > > Cheers
> > > > > >
> > > > > > On Thu, Mar 28, 2013 at 9:02 AM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:
> > > > > >
> > > > > > > I see, then I misunderstood the behaviour. My keys are id + timestamp
> > > > > > > so that I can do a range type search. So what I really want is to
> > > > > > > return a row where the id matches the prefix. Is there a way to do
> > > > > > > this without having to scan large amounts of data?
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Mar 28, 2013 at 8:26 AM, Jean-Marc Spaggiari <
> > > > > > > jean-marc@spaggiari.org> wrote:
> > > > > > >
> > > > > > > > Hi Mohit,
> > > > > > > >
> > > > > > > > "+" ascii code is 43
> > > > > > > > "9" ascii code is 57.
> > > > > > > >
> > > > > > > > So "+9" is coming after "++". If you don't have any row with the
> > > > > > > > exact key "+++++", HBase will look for the first one after this one.
> > > > > > > > And in your case, it's
> > > > > > > > +9hC\xFC\x82s\xABL3\xB3B\xC0\xF9\x87\x03\x7F\xFF\xF.
> > > > > > > >
> > > > > > > > JM
> > > > > > > >
> > > > > > > > 2013/3/28 Mohit Anchlia <mohitanchlia@gmail.com>:
> > > > > > > > > My understanding is that the row key would start with +++++, for
> > > > > > > > > instance.
> > > > > > > > >
> > > > > > > > > On Thu, Mar 28, 2013 at 7:53 AM, Jean-Marc Spaggiari <
> > > > > > > > > jean-marc@spaggiari.org> wrote:
> > > > > > > > >
> > > > > > > > >> Hi Mohit,
> > > > > > > > >>
> > > > > > > > > >> I see nothing wrong with the results below. What would you have
> > > > > > > > > >> expected?
> > > > > > > > >>
> > > > > > > > >> JM
> > > > > > > > >>
> > > > > > > > >> 2013/3/28 Mohit Anchlia <mohitanchlia@gmail.com>:
> > > > > > > > >>  > I am running 92.1 version and this is what happens.
> > > > > > > > >> >
> > > > > > > > >> >
> > > > > > > > > >> > hbase(main):003:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW => 'sdw0'}
> > > > > > > > > >> > ROW  COLUMN+CELL
> > > > > > > > > >> >  s\xC1\xEAR\xDF\xEA&\x89\x91\xFF\x1A^\xB6d\xF0\xEC\x7F\xFF\xFE\xC2\xA3\x84Z\x7F
> > > > > > > > > >> >   column=SID_T_MTX:\x00\x00Rc, timestamp=1363056261106,
> > > > > > > > > >> >   value=PAGE\x09\x091363056252990\x09\x09/
> > > > > > > > > >> > 1 row(s) in 0.0450 seconds
> > > > > > > > > >> > hbase(main):004:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW => '------'}
> > > > > > > > > >> > ROW  COLUMN+CELL
> > > > > > > > > >> >  -\xA1\xAF>r\xBD\xE2L\x00\xCD*\xD7\xE8\xD6\x1Dk\x7F\xFF\xFE\xC2\x8F\xF0\xC1\xBF
> > > > > > > > > >> >   column=SID_T_MTX:\x00\x00hF, timestamp=1363384706714,
> > > > > > > > > >> >   value=PAGE\x09239923973\x091363384698919\x09/
> > > > > > > > > >> >   row(s) in 0.0500 seconds
> > > > > > > > > >> > hbase(main):005:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW => '++++'}
> > > > > > > > > >> > ROW  COLUMN+CELL
> > > > > > > > > >> >  +9hC\xFC\x82s\xABL3\xB3B\xC0\xF9\x87\x03\x7F\xFF\xFE\xC2S-\x08\x1F
> > > > > > > > > >> >   column=SID_T_MTX:\x00\x00<2, timestamp=1364404155426,
> > > > > > > > > >> >   value=PAGE\x09\x091364404145275\x09 \x09/
> > > > > > > > > >> > 1 row(s) in 0.0640 seconds
> > > > > > > > > >> > hbase(main):006:0>
> > > > > > > > >> >
> > > > > > > > >> >
> > > > > > > > >> > On Wed, Mar 27, 2013 at 9:23 PM, ramkrishna vasudevan <
> > > > > > > > >> > ramkrishna.s.vasudevan@gmail.com> wrote:
> > > > > > > > >> >
> > > > > > > > >> >> Same question, same time :)
> > > > > > > > >> >>
> > > > > > > > >> >> Regards
> > > > > > > > >> >> Ram
> > > > > > > > >> >>
> > > > > > > > >> >> On Thu, Mar 28, 2013 at 9:53 AM, ramkrishna vasudevan <
> > > > > > > > >> >> ramkrishna.s.vasudevan@gmail.com> wrote:
> > > > > > > > >> >>
> > > > > > > > >> >> > Could you give us some more insights on this?
> > > > > > > > > >> >> > So you mean when you set the row key as 'azzzaaa', though this row
> > > > > > > > > >> >> > does not exist, the scanner returns some other row?  Or it is
> > > > > > > > > >> >> > giving you a row that does not exist?
> > > > > > > > >> >> >
> > > > > > > > >> >> > Or you mean it is doing a full table scan?
> > > > > > > > >> >> >
> > > > > > > > > >> >> > Which version of HBase and what type of filters are you using?
> > > > > > > > >> >> > Regards
> > > > > > > > >> >> > Ram
> > > > > > > > >> >> >
> > > > > > > > >> >> >
> > > > > > > > > >> >> > On Thu, Mar 28, 2013 at 9:45 AM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:
> > > > > > > > >> >> >
> > > > > > > > > >> >> >> I have keys in the form of "hashedid + timestamp", but when I run
> > > > > > > > > >> >> >> a scan I get rows for almost every value. For instance, if I run
> > > > > > > > > >> >> >> a scan for 'azzzaaa', which doesn't even exist, I still get results.
> > > > > > > > >> >> >>
> > > > > > > > > >> >> >> Could someone help me understand what might be going on here?
> > > > > > > > >> >> >>
> > > > > > > > >> >> >
> > > > > > > > >> >> >
> > > > > > > > >> >>
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

--e89a8f234659e98a1504d9270126--